AMCC 3Ware 9650SE Serial ATA Controller Series Review

This article resumes our series of RAID controller tests, which has been temporarily absent from our regular review schedule. Today we are going to talk about a new lineup from AMCC 3Ware – multi-channel RAID controllers for PCI Express.

by Aleksey Meyev
05/15/2008 | 11:40 AM

We are resuming our RAID controller reviews with the 3Ware 9650SE-16ML model to pay our respects to a company that has undergone a lot of changes recently. Most notably, it has been swallowed up by AMCC. The company transformed and its products changed, too: they gained an external cache memory module in one generation only to lose it again in the next.


Keeping up with the times, 3Ware products acquired SATA2 support, transitioned to the PCI Express bus, and learned to work with the now-fashionable RAID6. The outcome of all this is the 3Ware 9650SE series. One controller from this series will be discussed in this review.

Closer Look at AMCC 3Ware 9650SE-16ML

The 9650SE-16ML is not the senior model in the series. It has 16 ports for hard disks (four multilane connectors, each of which supports up to four devices) and 256 megabytes of DDR2-533 SDRAM. The senior model is called 9650SE-24M8 and it has 24 ports and 512 megabytes of memory. And if you need low-profile controllers, you’ll find them in this series too, but they support no more than eight ports. All the controllers use the PCI Express interface: the junior model with two ports is equipped with PCI Express x1, the 4- and 8-port models with PCI Express x4, and the senior models with PCI Express x8. Of course, the senior models are compatible with the narrower buses. Every version except for the junior 9650SE-2LP supports a battery module for the cache memory.


Besides the controller, the box contains a standard selection of accessories: a user manual, a disc with drivers, and four interface cables.


The controller’s processor is an AMCC PowerPC 405CR chip. 3Ware doesn’t specify its frequency, but judging by the marking it must be 266MHz. Two eight-channel Marvell 88SX6081-BCZ1 SATA-II chips are responsible for the disks. This is a noteworthy change – 3Ware used to employ its own chips before.

The controller supports a long list of operating systems including 32-bit and 64-bit versions of Microsoft Windows 2003/XP/2000, Red Hat Linux, SuSE Linux, Fedora Linux, Linux kernels 2.4 and 2.6, and FreeBSD. You can find the full list at the manufacturer’s website and download the latest drivers and firmware from there.

The 3Ware 9650SE-16ML supports RAID 0, 1, 5, 6, 10, 50 as well as Single Disk mode (in which the controller presents the attached HDDs as independent disks) and JBOD. There is one peculiarity about RAID6: the controller doesn’t allow using this array type with fewer than five HDDs. You cannot build a four-disk RAID6 either through the BIOS or in the 3DM program. We were really perplexed at that because four disks is the theoretical minimum for a RAID6 (unlike RAID5, this array type needs one more disk’s worth of capacity for the second checksum). Still, the controller would not do it, and we had to contact the developer’s tech support for an explanation.

We found out that the prohibition of four-disk RAID6 has no technical foundation. It is all about marketing, actually. The reasoning is that since the performance of a four-disk RAID6 is lower than that of a four-disk RAID10, the user should not be given the opportunity to build a four-disk RAID6 at all.

As a matter of fact, RAID10 is less reliable than RAID6. A RAID6 can survive the failure of any two disks, whereas data on a RAID10 only survive if the two malfunctioning disks belong to different mirror pairs (to remind you, RAID10 is a stripe of mirror pairs). The logic of 3Ware’s marketing people really escapes us.
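To make the difference concrete, here is a small counting sketch of our own (a simplified model, not anything from 3Ware’s documentation) that enumerates every two-disk failure combination for a four-disk array:

```python
# Simplified model: which two-disk failures does a 4-disk array survive?
from itertools import combinations

disks = [0, 1, 2, 3]
mirror_pairs = [(0, 1), (2, 3)]   # RAID10 here is a stripe of these two mirror pairs

def raid10_survives(failed):
    # RAID10 dies only if both disks of the same mirror pair fail
    return not any(set(pair) <= set(failed) for pair in mirror_pairs)

def raid6_survives(failed):
    # RAID6 tolerates the loss of any two disks
    return len(failed) <= 2

for failed in combinations(disks, 2):
    print(failed, "RAID10:", raid10_survives(failed), "RAID6:", raid6_survives(failed))
# RAID10 survives 4 of the 6 possible two-disk failures; RAID6 survives all 6.
```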

The controller supports such technologies as SATA2, eighth-generation StorSwitch architecture, 64-bit LBA addressing, and StreamFusion for optimizing multithreaded requests. It comes with the 3DM version 2 program that can be used to manage the hard disks and arrays under Windows.

The program offers information about the controller and the disks attached to it as well as about the battery status. It also displays system warnings and controller messages. Its main purpose is managing arrays and adjusting the controller’s settings in the appropriate sections of the Management tab.

We won’t describe the settings because you can see them clearly enough in the screenshots above. We used this program to manage the arrays and the controller throughout our tests and found it to be an easy-to-use tool indeed. Although it lacks a pretty interface, it offers all the options in a logical manner, and you don’t need a user manual to find your way around. The available features are labeled in such a way as to be intuitively understood by anyone with a basic understanding of RAID technology.

Testbed and Methods

The following benchmarks were used: Intel IOMeter, FC-Test, and WinBench 99.

Testbed configuration:

The controller’s firmware was version FE9X 3.08.02.005 and the driver version was 3.00.02.140. 3DM version 2 was used for monitoring and synchronizing the arrays.

The controller was installed into a PCI Express x8 slot, but the 3DM program reported that it was working in PCI Express x4 compatibility mode. Still, the 10Gbps of raw bandwidth available in that mode (roughly 1GB/s of effective throughput) was quite enough for our arrays, which consisted of four disks at most.

We didn’t test RAID6 because the 3Ware 9650SE-16ML didn’t allow us to build a RAID6 array out of four disks.


The Western Digital WD740GD (Raptor 2) hard disks were installed into the standard drive cages of the SC5200 system case.

The controller was set to the Performance mode for maximum performance during the tests. This mode allows deferred writing and look-ahead reading for both the controller (in its own buffer memory) and the disks. The Performance mode should not be used without a cache battery because there is a high risk of losing data in case of a power failure.

Performance in Intel IOMeter

Database Pattern

In the Database pattern the disk array processes a stream of requests to read and write 8KB random-address data blocks. The share of write requests changes from 0% to 100% in the course of the test, while the request queue depth varies from 1 to 256.

We’ll discuss the results for queue depths of 1, 16 and 256.

At high percentages of reads the arrays all have similar performance. The differences can only be seen in the right part of the diagram, where deferred writing influences the results. The speed of the RAID0 arrays increases with the number of disks in them, but a strict proportion can only be observed at 80% or more writes.

It’s interesting to watch the behavior of the mirrored arrays, RAID1 and RAID10. The former stays ahead of the single disk up to high percentages of writes. The same is true for the RAID10 in comparison with the two-disk RAID0, but it has a sudden slump at 60% writes.

The RAID5 arrays are no good at writing data due to the checksum calculation overhead. Take note that the performance of this array type depends but little on the number of disks in it. With RAID5 technology each random write operation involves two read operations, two XOR operations, and two writes. All these disk operations are performed on just two disks irrespective of the total number of disks in the array. The controller can reduce the disk load by reading a full stripe when processing sequential data, but has to stick to the 2R-2X-2W method for random addresses.
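As an illustration of this 2R-2X-2W sequence, here is a minimal sketch of the generic RAID5 read-modify-write scheme (our own simplification, not 3Ware’s firmware; the read and write callbacks are hypothetical helpers that address one block on one disk):

```python
# Minimal sketch of a RAID5 read-modify-write for a single random write.

def xor_blocks(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def raid5_random_write(data_disk, parity_disk, offset, new_data, read, write):
    old_data = read(data_disk, offset)        # read No. 1
    old_parity = read(parity_disk, offset)    # read No. 2
    # new parity = old parity XOR old data XOR new data
    new_parity = xor_blocks(xor_blocks(old_parity, old_data), new_data)  # two XORs
    write(data_disk, offset, new_data)        # write No. 1
    write(parity_disk, offset, new_parity)    # write No. 2
```

Only two disks are touched, which is why adding more disks to the array does not speed up random writes.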

When the queue is increased to 16 requests, the picture is different because the controller has got the opportunity to load all the disks of the array. The RAID0 arrays show a proportional performance growth depending on the number of disks under any load. Well, the four-disk RAID0 doesn’t speed up as much as you could expect looking at the same-type arrays with fewer disks. These arrays also increase their speed at high percentages of writes but this trend is not so conspicuous here because the left part of the graphs has risen up, too.

The mirrored RAID1 and RAID10 arrays are faster at reading. They are always ahead of the single disk and two-disk RAID0, respectively, in every test mode except for writes-only. Take note that the RAID10 proves to be the fastest array in the reads-only mode, outperforming the four-disk RAID0. The RAID1 is considerably faster than the two-disk RAID0 in this mode.

The RAID5 arrays are about as fast as the RAID0 arrays with the same number of disks at random reading but slow down abruptly as soon as there are write requests to be processed. As a result, they are slower than the single disk or any other array at 70% and more writes. Interestingly, the three-disk RAID5 is somewhat faster than the four-disk RAID5 in the writes-only mode.

There are few changes in the standings when the request queue is increased further. We can note that all the arrays deliver even higher speeds when processing read requests. The four-disk RAID0 and RAID5 are ahead of the RAID10 in the reads-only mode.

Here are diagrams that show the performance of each array at five different queue depths:

Random Read & Write Patterns

Now we’ll see how the arrays’ performance in the random read and write modes depends on the data chunk size.

The arrays all deliver similar performance when reading small data chunks. The single disk and the two-disk RAID1 are somewhat faster than the others. The different array types begin to differ in performance as the data chunk size grows. Generally speaking, arrays with more disks are somewhat faster. The number of disks being equal, RAID0 is the fastest array type, RAID5 is second best, followed by the mirrored arrays. The RAID10 breaks this rule, getting a performance boost at 2MB data chunks.

It’s simpler with writing because the standings of the arrays do not change throughout the test. As could be expected, the RAID5 arrays have the lowest write speeds. The RAID1 is faster than they are, yet still slower than the single disk. The RAID10 is better, and the RAID0 arrays are better still, ranked according to the number of disks per array.

Disk Response Time

IOMeter sends a stream of requests to read and write 512-byte data blocks with a request queue depth of 1 for 10 minutes. The results are sorted by write response time.

While the read response time seems to be the same for every array type, there is a logical trend at writing. The RAID5 arrays have the highest write response, and it almost doesn’t depend on the number of disks in the array. This must be due to the sophisticated algorithm for checksum processing: it is performed by the XOR processor and takes about the same time for any number of disks. The rest of the results resemble the Random Write test: the RAID1 is slower than the single disk, which is in turn outperformed by the RAID10, while the RAID0 arrays are in the lead.

Sequential Read & Write Patterns

IOMeter is sending a stream of read and write requests with a request queue depth of 4. The size of the requested data block is changed each minute, so that we could see the dependence of the array’s sequential read/write speed on the size of the data block. This test is indicative of the highest speed the array can achieve.

We’ve got a nearly ideal picture of maximum read speeds. The RAID5 arrays with N disks are as fast as the RAID0 arrays with N-1 disks. The mirrored arrays with 2*N disks are exactly as fast as RAID0 arrays with N disks (e.g. the two-disk RAID1 equals the single disk). You can see a characteristic thing with the mirrored arrays: their speed grows somewhat on very large data chunks because the controller begins to read data from both disks of each mirror.
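The relationship is easy to express numerically. Below is a rough back-of-the-envelope helper of our own (not the controller’s algorithm; the per-disk streaming speed is an assumed parameter):

```python
# Rough estimate of peak sequential read speed for the array types in this review.

def expected_seq_read_mbps(array_type: str, disks: int, per_disk_mbps: float) -> float:
    if array_type == "RAID0":
        return disks * per_disk_mbps        # every disk streams useful data
    if array_type == "RAID5":
        return (disks - 1) * per_disk_mbps  # one disk's worth of capacity holds checksums
    if array_type in ("RAID1", "RAID10"):
        return (disks // 2) * per_disk_mbps  # one disk per mirror pair, unless the
                                             # controller also reads the second half
    raise ValueError(f"unknown array type: {array_type}")

# With an assumed 70 MB/s per disk, a four-disk RAID5 and a three-disk RAID0
# both come out at about 210 MB/s.
```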

Note also that the many-disk arrays only show their advantage with 8KB and larger blocks. With smaller data blocks the arrays differ but slightly.

Sequential writing is just as good as sequential reading: everything works well. We don’t see any negative influence of the controller’s algorithms. That’s an excellent result that doesn’t need any comments.

Multithreaded Read & Write Patterns

The multithreaded tests simulate a situation when there are one to four clients accessing the virtual disk at the same time – the clients’ address zones do not overlap. We’ll discuss diagrams for a request queue of 1 as the most illustrative ones.

Here is the first surprise, and a rather unpleasant one. None of the arrays can reach a read speed of 80MBps with one thread, which is a very low result, especially for the RAID0 arrays. The read speed of the single disk is 63MBps, so the performance gain is very small. You can also note another oddity here: the RAID0 and RAID5 arrays with many disks are slower than the same-type arrays with fewer disks.

When the number of threads is increased, the mirrored arrays go ahead. The speed of the two-disk RAID1 indicates that data are being read at full speed from both disks of the mirror. The RAID10 delivers an even higher speed, but without the neat correlation the RAID1 shows. The other array types are still very slow – even slower than when reading one thread. The standings have become more logical, though: the RAID5 arrays are faster than the RAID0 arrays, and within each of these array types the speed grows with the number of disks in the array.

Of course, these low results may be also due to the HDDs’ algorithms that are allowed to work in the controller’s Performance mode. The Areca 1220 controller had better results in this test, however.

There are no surprises at writing. The two-disk RAID1 is as fast as the single disk. The two-disk RAID0, the three-disk RAID5 (the equivalent of one disk’s capacity holds the checksum rather than data, even though the checksum is actually distributed evenly across all the disks) and the four-disk RAID10 are two times faster. The three-disk RAID0 and the four-disk RAID5 are three times as fast as the single disk. The four-disk RAID0 is four times as fast as the single disk.

The standings do not change at two threads – the arrays just slow down somewhat. All the arrays, save for RAID5, survive the addition of more threads well enough. They only lose a little more speed. The RAID5 type slows down to the level of the single disk at three and four threads. This must be the processor’s fault.

Web-Server, Fileserver, Workstation Patterns

The controller is tested under loads typical of servers and workstations. The server tests are the most interesting to us since that is the most typical application for a RAID controller. Workstations may also have a high disk load, though.

The names of the patterns are self-explanatory. The Workstation pattern is used with the full capacity of the drive as well as with a 32GB partition. The request queue is limited to 32 requests in the Workstation pattern.

The results are presented as performance ratings. For the File-Server and Web-Server patterns the performance rating is the average speed of the drive under every load. For the Workstation pattern we use the following formula:

Rating (Workstation) = Total I/O (queue=1)/1 + Total I/O (queue=2)/2 + Total I/O (queue=4)/4 + Total I/O (queue=8)/8 + Total I/O (queue=16)/16.
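For clarity, here is the same formula as a tiny helper of our own (the figures in the example are made up):

```python
# Workstation rating: Total I/O at each queue depth, weighted by 1/depth.

def workstation_rating(total_io_by_queue: dict) -> float:
    # total_io_by_queue maps queue depth -> Total I/O measured at that depth
    return sum(total_io_by_queue[q] / q for q in (1, 2, 4, 8, 16))

# Example with made-up figures:
# workstation_rating({1: 150, 2: 210, 4: 290, 8: 360, 16: 420}) -> 398.75
```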

Every array is faster than the single disk at request queue depths other than 1. The two-disk RAID1 behaves in a curious way: it is but slightly slower than the leaders at a queue depth of 2 but doesn’t speed up much as the queue gets longer. As a result, it shows the lowest results at long request queue depths. The four-disk RAID10 is in the lead up to a queue depth of 4 requests, being a little faster than the four-disk RAID0. The latter overtakes the RAID10 at longer queue depths, though. The RAID10 is always ahead of the three-disk RAID0.

The RAID5 arrays are not as fast as the RAID0 arrays with the same number of disks but are overall good at this type of load. The RAID0 arrays boast good performance scalability depending on the number of disks. The scalability of RAID5 is somewhat lower.

The performance ratings agree with the diagrams. Since performance at short queue depths carries a bigger weight, the RAID10 scores more points than the four-disk RAID0. Overall, the standings are the same as with the controllers we have tested before. Take note that the three-disk RAID5 and the two-disk RAID0 and RAID1 have very similar ratings.

Since the Web-Server pattern contains no write requests, the mirrored arrays are at their best thanks to their ability to alternate requests between the disks of a mirror pair. As a result, the four-disk RAID10 wins the test at every request queue depth. The two-disk RAID1 is ahead of the two-disk RAID0. The RAID5 arrays are good at this load type, too, delivering the same performance as RAID0 arrays with the same number of disks.

The Web-Server ratings agree with the diagrams and don’t need special comments.

Emulating a user working in multiple applications, this test favors RAID0 arrays, and the 3Ware 9650SE is no exception. The RAID10 is quite competitive at a queue depth of 2, but the four-disk RAID0 has no rivals at long queue depths, whereas the RAID10 is then about as fast as the three-disk RAID0. Take note that the four-disk RAID0 is only faster than the three-disk RAID0 at queue depths longer than 2.

The two-disk RAID1 is almost as fast as the two-disk RAID0. RAID5 is not the right array type for this load. These arrays have low speed and low performance scalability.

The standings generally reflect the arrays’ performance at long queue depths although the RAID10 is slightly ahead of the three-disk RAID0 while the RAID1 arrays are slower than the same-size RAID0 arrays.

The reduction of the test zone to 32GB doesn’t lead to serious changes in the graphs: the above-mentioned trends have become more pronounced and the speeds of the arrays are higher.

There are but small changes in the standings of the arrays: the three-disk RAID0 almost catches up with the RAID10, and the four-disk RAID5 overtakes the RAID1.

Performance in FC-Test

For this test two 32GB partitions are created on the virtual disk of the RAID array and formatted in NTFS and then in FAT32. After that a file-set is created on the disk. It is then read from the disk, copied within the same partition and then copied into the other partition. The time taken to perform these operations is measured and the speed of the array is calculated. The Windows and Programs file-sets consist of a large number of small files whereas the other three patterns (ISO, MP3, and Install) include a few large files each.
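FC-Test itself is a Windows utility, but the measurement principle is simple. Here is a rough Python sketch of our own (not the actual FC-Test code) that copies a file-set and reports the average speed:

```python
# Illustration only: copy a directory tree and report the average speed in MB/s.
import shutil
import time
from pathlib import Path

def copy_speed_mbps(src_dir: str, dst_dir: str) -> float:
    files = [p for p in Path(src_dir).rglob("*") if p.is_file()]
    total_bytes = sum(p.stat().st_size for p in files)
    start = time.perf_counter()
    for src in files:
        dst = Path(dst_dir) / src.relative_to(src_dir)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src, dst)
    elapsed = time.perf_counter() - start
    return total_bytes / elapsed / 2**20
```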

This test produces too much data, so we will only discuss the results of the Install, ISO and Programs patterns which illustrate the most characteristic use of the arrays. For better readability the results are colored according to the array type.

FAT32 File System

FAT32 results come first.

As might have been expected, the RAID0 arrays cope best with writing files. Curiously, they show very good scalability with the ISO and Install patterns but deliver similar results with the small files of the Programs pattern. The RAID5 arrays are fast with the Programs file-set too, but slower than the single disk with the other file-sets.

As for the mirrored arrays, the RAID1 is about as fast as the single disk whereas the RAID10 is similar to the two-disk RAID0.

There are again some problems with reading: no array can get even twice as fast as the single disk. The RAID1 matches the speed of the single disk while the other arrays deliver similar performance irrespective of the number of disks in them. Interestingly, the four-disk RAID0 turns out to be slower than the three-disk RAID0 with two out of the three file-sets. The controller really seems to have some problems with reading.

The RAID0 arrays are best at copying within the same partition, the four-disk RAID0 again having problems with scalability. The mirrored arrays behave like they did at writing: the RAID1 and the RAID10 are as fast as the single disk and the two-disk RAID0, respectively.

The RAID5 arrays are very sensitive to the particular file-set: the larger the files, the faster these arrays are. For example, in the ISO pattern they are as fast as the RAID0 arrays with one disk fewer. As the files get smaller, their performance gets closer to that of the single disk.

Copying into another partition doesn’t affect the standings much: they are about the same as when copying within the same partition.

NTFS File System

Now we’ll run the test again, this time over NTFS partitions.

It’s overall the same as in FAT32 but with minor changes: the RAID5 arrays do not speed up in the Programs pattern and are always slower than the single disk, and the performance of the mirrored arrays has become lower relative to the single disk and the RAID0 arrays. The speeds of all the arrays are lower in every test mode in comparison with FAT32.

The arrays are still rather slow at reading. Comparing them, we can see three differences from the FAT32 results: the RAID10 has become faster relative to the other arrays. The RAID5 arrays show a constant speed irrespective of the number of disks in them. And the most surprising thing, the two-disk RAID0 is the best among the RAID0 arrays!

The standings haven’t changed in comparison with FAT32 although the speeds have become somewhat lower.

Performance in WinBench 99

We use WinBench 99 to record the data-transfer graph for each array:

We’ll compare the data-transfer rates at the beginning and end of the virtual disks:

Interestingly, these speeds are exactly the same as in the Sequential Read test. They do not agree with what we’ve seen in the read tests from FC-Test.

Conclusion

The 3Ware 9650SE-16ML controller is good overall, yet is not free from certain drawbacks. The biggest problem is the low speed of reading files from arrays that have a high theoretical read speed. It is complete nonsense: the controller manages to write files on RAID0 arrays faster than to read the same files from them! We also have some gripes about the scalability of RAID0 arrays as well as about the performance of RAID5. On the other hand, the controller delivers nearly perfect performance in the sequential read and write test and copes well with distributing the load between the disks of mirrored arrays. Thus, we guess the best application for this controller is mirrored arrays of RAID1 and RAID10 types.