3ware 9500S-8 SATA RAID Controller Review

We resume testing multi-channel SATA RAID controller cards. Today we introduce a not very new, but certainly revolutionary solution from 3ware: the 3ware 9500S-8 controller. Find out what is so revolutionary about it in our detailed review!

by Alexander Yuriev
08/06/2006 | 09:25 AM

The 3ware 9500S-8 truly deserves the “revolutionary” title. For the first time in many years, 3ware has used a replaceable SDRAM module with ECC support instead of a few megabytes of SRAM. The controller’s cache memory has finally become comparable with the combined cache buffers of the hard disk drives that can be attached to it.

The 9000 series of 3ware controllers, just like the previous product line-ups, consists of models supporting 4, 8 and 12 HDDs. As usual, we took the middle model for our tests.

The larger controller cache should definitely have a positive effect on performance under high workloads. Moreover, the manufacturer claims that the 3ware 9500S-8 can deliver up to 400MB/s read speed and up to 100MB/s write speed in a RAID5 array. Of course, we had to check these claims for ourselves.

Closer Look

As we have already mentioned, the 3ware 9500S-8 controller supports up to 8 Serial ATA/150 devices. It can build arrays of the following types: 0, 1, 5, 10, 50 and JBOD. It connects to the system over a 64-bit PCI bus running at 66MHz. As for cache memory, the controller comes with 128MB of ECC SDRAM by default, which can be upgraded to 256MB.
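As a rough sanity check on the bus, the theoretical peak of a 64-bit, 66MHz PCI link can be worked out directly. The sketch below assumes one transfer per clock cycle and ignores protocol overhead; the function name is ours and serves only as an illustration.

```python
def pci_peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Theoretical peak in MB/s, assuming one bus-wide transfer per clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


# 64-bit bus at 66MHz, as stated for the 9500S-8
bus_peak = pci_peak_bandwidth_mb_s(64, 66)

# Eight Serial ATA/150 links, 150MB/s each at the interface level
sata_aggregate = 8 * 150

print(f"PCI peak: {bus_peak:.0f} MB/s, SATA aggregate: {sata_aggregate} MB/s")
```

The bus peak of roughly 528MB/s sits above the claimed 400MB/s read speed but well below the combined interface rate of the eight SATA links, so the PCI bus rather than the drive interface sets the upper limit.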

Here is today’s hero:

Like all other eight-channel 3ware controllers, the 3ware 9500S-8 has the traditional compact (half-length) design, with the SATA connectors stacked in two rows. The front side of the card is almost completely covered with chips, from the controller chip itself to the memory and BIOS chips. However, the developers did a great job on the overall component layout and even moved some parts to the reverse side of the PCB, leaving enough room for the four additional SATA connectors and corresponding chips of the 12-port model.

Here are the major controller specifications for your reference:

As usual, the 3ware controller left a very favorable impression. Let’s find out if it remains the same after the test session.


Testbed and Methods

Our testbed was configured as follows:

We tested the controller in Intel IOMeter 2003.02.15. We used FileServer and WebServer patterns in our Intel IOMeter tests.

These patterns are intended for measuring the performance of the disk subsystem under workloads typical of file and web servers.

We also used the WorkStation pattern, created by Sergey Romanov (a.k.a. GReY). It is based on the disk subsystem workload statistics given in the StorageReview Testbed 3 description. The statistics for the NTFS5 file system were gathered in three operational modes: Office, Hi-End and Boot-up.

This pattern shows how well the controller performs in a typical Windows environment.

Lastly, we checked out the controller’s ability to process sequential read/write requests of variable size and its performance in the DataBase pattern, which loads the disk subsystem with SQL-like requests.

Our controller was tested with firmware version 9.02 and drivers from the 2.4.1.40 set. To monitor array status and synchronize the arrays we used the 3DM Disk Management tool.

The controller was installed into a PCI-X/133MHz slot (although it only supports 64-bit PCI at 66MHz).

WD740GD (Raptor2) hard disk drives were installed into the default baskets of the SC5200 case and were fastened with four screws at the bottom of the drive.

During our test session we enabled lazy writing (write-back caching) on the hard drives themselves. We switched the controller’s own caching mode (WriteBack or WriteThrough) as needed.


Performance in Intel IOMeter

DataBase Pattern

This pattern sends a stream of requests to read and write 8KB random-address data blocks. By changing the ratio of reads to writes we can check how well the controller’s driver sorts them out. The results of the controller in WriteBack mode are presented below; for your convenience we have split them into three groups:

Let’s take a look at the graphs showing the dependence of the controller’s speed on the percentage of write requests. The graphs are grouped by queue depth:

Under linear workload (a queue of one request), all the arrays perform very close to one another in RandomRead mode, at the beginning of the graph, because the array algorithms have nothing to optimize.

As the probability of writes increases, the performance of a single HDD grows gradually: the HDD lazy write algorithms kick in. The performance of RAID0 arrays increases with the number of drives in the array, but this growth is proportional to the drive count only at the highest write probability.

The performance of RAID5 arrays should drop as the probability of writes increases, but we see a completely different picture. The RAID5 graphs behave very much like the RAID0 ones and are most likely shaped by the dominating lazy write algorithms of the HDDs. Nevertheless, when there are few drives in the array and the probability of writes is relatively high, RAID5 speed does decrease as the share of writes grows.
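The expectation that RAID5 slows down with more writes comes from the parity update. In the textbook read-modify-write scheme, a small write has to read the old data and the old parity, recompute the parity and write both blocks back, so one logical write turns into four disk operations. The sketch below shows that parity arithmetic for a generic RAID5 stripe; it is not 3ware’s firmware, and the function names are illustrative.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: remove the old data from parity, then add the new data."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)


# A stripe on a 3-drive RAID5: two data blocks plus one parity block
d0 = bytes([1, 2, 3, 4])
d1 = bytes([5, 6, 7, 8])
parity = xor_blocks(d0, d1)

# Overwrite d0 without touching d1
new_d0 = bytes([9, 9, 9, 9])
new_parity = updated_parity(d0, parity, new_d0)

# The shortcut must match a full recomputation over the stripe
assert new_parity == xor_blocks(new_d0, d1)
```

With WriteBack caching the controller can postpone and combine these extra operations, which is one plausible reason the write penalty is hard to see at low queue depths.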

The RAID1 mirrored array disappointed us: it turned out slower than the single HDD in almost all modes. This was a pretty unexpected result, because all 3ware controllers use their proprietary Twinstor technology to speed up reading from mirrored arrays. This technology uses the history of previous requests to determine which HDD will process the current request faster, thus speeding up the entire array. In our previous experience with 3ware controllers, Twinstor has always had a positive effect. Hopefully, this algorithm will start working properly under higher loads.
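The review describes Twinstor only as using request history to pick the drive that will answer faster. One way such a scheme could work is to remember where each mirror was last positioned and send the next read to the nearest one. The class below is a hypothetical illustration of that idea under this assumption, not 3ware’s actual algorithm.

```python
class MirrorReadScheduler:
    """Hypothetical read balancer for a mirror: pick the drive whose last
    serviced address is closest to the requested one."""

    def __init__(self, drives: int = 2) -> None:
        # Last logical block address serviced by each drive (the "history")
        self.last_lba = [0] * drives

    def pick_drive(self, lba: int) -> int:
        drive = min(range(len(self.last_lba)),
                    key=lambda d: abs(self.last_lba[d] - lba))
        self.last_lba[drive] = lba
        return drive


scheduler = MirrorReadScheduler()
choices = [scheduler.pick_drive(lba) for lba in (1000, 10, 900)]
print(choices)
```

With only one outstanding request there is little history to exploit and only one drive can be busy at a time, which fits the observation that the mirror gains nothing under linear workload.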

The performance graphs of another mirrored array – RAID10 – just like the graphs for RAID0 and RAID5, show mostly the influence of the HDD lazy writing algorithms.


Now let’s check out the performance of our arrays with higher workload:

The increase in workload doesn’t really change the performance picture. The graphs, however, are now shaped by the array algorithms rather than the HDD algorithms, which was the case under linear workload. The single HDD graph remained practically the same, because it reflects only the lazy write algorithms of the HDD.

The RAID0 graphs look similar to the single HDD graph and scale in proportion to the number of hard drives in the array. However, unlike under linear workload, we can see the proportion already in RandomRead mode, and it becomes more evident as the number of write requests increases. Only the 7-HDD graph doesn’t fit into this picture: it shows significant performance drops at 0% and 60% of writes.

The RAID1 and RAID10 mirrored arrays speed up significantly in RandomRead mode, because the absence of write requests and a sufficient number of random-address read requests let Twinstor technology show its best. As the share of write requests increases, Twinstor loses its efficiency, because it can only optimize read requests. On the other hand, the lazy write algorithms of the hard disk drives show more as the share of writes grows, so the performance of the RAID10 array in RandomWrite mode gets even higher. All in all, the RAID1 array is always faster than a single HDD at a queue depth of 16 requests, and a RAID10 array of 2n HDDs is always faster than a RAID0 array of n HDDs in all modes.

RAID5 arrays process read requests as efficiently as RAID0 arrays. However, as the number of writes grows, RAID5 arrays slow down significantly. RAID5 performance is directly proportional to the number of hard drives in the array. Only the RAID5 array of 3 HDDs doesn’t fit the overall picture.


Increasing the workload to 256 requests per queue doesn’t change the overall situation any further. The RAID0 graphs repeat the shape of the single HDD graph, scaled in proportion to the number of HDDs in the array. This indicates that StorSwitch copes just fine with sorting the requests and passing them to the corresponding HDD. Here we can disregard the small performance drops on the 6-HDD and 7-HDD graphs :) RAID5 and RAID10 arrays are fast and their performance is proportional to the number of hard disk drives in the array. Just as with 16 requests, the RAID5 array of 3 HDDs stands out here.


We tested only 4-HDD arrays in WriteThrough mode. The results are summed up in the table below:

To compare the performance of 4-HDD RAID arrays with different caching algorithms we will put together our traditional table. The number in each table cell is the ratio of the array speed in WriteBack mode to its speed in WriteThrough mode. If the result is less than 1 (these numbers are usually highlighted red, but this time there are none), WB-caching is not efficient here. If the result is bigger than 1 (these numbers are highlighted blue), WB-caching has had a positive effect on performance. If the result equals 1.0, WB and WT caching are equally efficient in this work mode.
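The cell values and their highlighting follow a simple rule that can be sketched as below. Rounding to one decimal place is our assumption, based on the “equals 1.0” wording, and the sample speeds are made up for illustration; they are not measurements from this review.

```python
def wb_to_wt_ratio(wb_speed: float, wt_speed: float) -> float:
    """Ratio of WriteBack speed to WriteThrough speed, rounded to one decimal."""
    return round(wb_speed / wt_speed, 1)


def highlight(ratio: float) -> str:
    """Colour of a table cell according to the rule described above."""
    if ratio < 1.0:
        return "red"    # WB-caching hurts
    if ratio > 1.0:
        return "blue"   # WB-caching helps
    return "none"       # both modes are equally efficient


# Made-up speeds in operations per second: (WriteBack, WriteThrough)
samples = [(750.0, 100.0), (98.0, 100.0), (80.0, 100.0)]
for wb, wt in samples:
    ratio = wb_to_wt_ratio(wb, wt)
    print(ratio, highlight(ratio))
```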

As expected, disabling WB-caching in the BIOS reduces the array performance in all work modes. However, this is the first time our results table doesn’t have a single red number in it. It means that the WriteBack mode of the 3ware 9500S-8 controller has only a positive effect on its performance. The behavior of the RAID10 array in RandomRead mode is pretty strange, though: with 0% writes the performance of this array type should not depend on the caching algorithm.

Now let’s take a look at the graphs showing the array performance with different caching algorithms. For your convenience we have built the graphs for each array type in WriteThrough and WriteBack modes at queue depths of 1, 16 and 256 requests:

With the controller cache disabled, the RAID0 array slows down a lot even when there are few write requests. As the share of writes increases, the performance difference grows too, reaching 7.5x. The more requests there are in the queue, the smaller the difference between the WriteBack and WriteThrough modes. Still, WB-caching pays off in all work modes except RandomRead, where there are no write requests and hence lazy writing algorithms cannot affect the performance.


The situation is quite similar when we disable the controller cache for the RAID5 array. The biggest performance drop with WT-caching enabled, 236%, takes place under linear workload. As the number of requests in the queue increases, the performance difference between the two caching types shrinks. However, just as with the RAID0 array, WB-caching is highly advantageous in all work modes with write requests.

WT-caching has a negative effect on RAID10 performance in all work modes that include write requests. The maximum performance drop was 353%. However, as we have already mentioned above, disabling the controller cache also leads to a performance drop in RandomRead mode, when there are no write requests in the queue at all, which should not happen.

So, we can conclude that enabling WT-caching reduces the performance significantly. Even so, it is hard to say which caching type is preferable: reliability is higher in WriteThrough mode, but the performance boost of WriteBack mode is definitely more tempting :)


Sequential Read and Write Patterns

Now let’s look into sequential reading and writing.

As you already know, Intel IOMeter sends a stream of read/write requests to the array at a request queue depth of 4. Every minute the size of the data block changes, so by the end of the test we can see the dependence of the linear read/write speed on the data block size. The results (the controller data transfer rate for each data block size) are summed up in the tables below. We will first look at WriteBack mode:

Now let’s check the graphs showing the dependence of the controller read speed on the size of the data block:

It looks like there is a conflict between the controller’s read-ahead algorithms and our workload…

Let’s compare the results obtained for the sequential reading on 4-HDD arrays in WriteBack and WriteThrough modes:


Now it is time for sequential writing. First come the complete results tables:

The next batch of tests emulates the work of real devices.


FileServer and WebServer Patterns

We will start with a pattern emulating the work of the fileserver disk subsystem. First we will discuss the results in WriteBack mode:

These are the graphs showing the dependence of the controller performance on the request queue depth:

RAID0, RAID5 and RAID10 arrays scale beautifully with the number of hard disk drives in the array, even when the queue is relatively short. RAID1 is always faster than a single hard drive, and RAID10 of 2n HDDs is always faster than RAID0 of n HDDs. Since the FileServer pattern contains only 20% of writes, the Twinstor algorithm works perfectly well.


We are going to compare the arrays by calculating their performance ratings. Since each load has the same probability, the performance rating is the averaged speed of the array under all types of load:

As expected, the single hard disk drive is the slowest, while the RAID0 array of 8 hard disk drives is the fastest. We were pleasantly surprised by the RAID10 performance: each RAID10 array of 2n HDDs was always faster than the RAID0 array of (n+1) HDDs.
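The FileServer rating used for this ranking is a plain average: every queue depth counts equally. A minimal sketch follows, with made-up speeds that are not taken from this review and a function name of our own.

```python
from statistics import mean
from typing import Sequence


def fileserver_rating(speeds: Sequence[float]) -> float:
    """Equal-weight rating: the arithmetic mean of the speeds measured
    at each queue depth."""
    return mean(speeds)


# Made-up speeds (operations per second) at increasing queue depths
example_speeds = [100.0, 200.0, 300.0]
print(fileserver_rating(example_speeds))
```

Because all loads weigh the same, an array that only pulls ahead at deep queues can still earn a high rating here.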

Now let’s compare the results for WriteBack and WriteThrough modes:

Even with only 20% of write requests in the queue, disabling the controller cache slows it down significantly.

Depending on the type of array, the performance in WriteThrough mode is 1.5-2 times lower. So, you have to decide between the reliability and speed…


Now let’s check out the WebServer pattern. As always, WriteBack results come first:

The array graphs remained almost the same as in the FileServer pattern. We can only point out a performance drop on the 6-HDD RAID0 graph at a queue depth of 64 requests. However, since there are no write requests in the WebServer pattern, RAID5 and RAID10 became much faster.


Now let’s once again resort to our rating system calculated according to the same formula as in FileServer pattern:

Just like in the previous pattern, the single hard disk drive is the slowest, while the situation in the leading group has become slightly different. 100% share of reads makes RAID10 arrays faster than RAID1 arrays of the same number of hard drives.

Now let’s look at the results of four-hard-disk arrays obtained with different types of caching algorithms enabled:

In fact, the type of controller caching should not affect the performance at all if there are no write requests in the queue. This is what we see with the RAID5 and RAID0 arrays. However, RAID10 slows down by half once the controller cache is disabled. We have already seen exactly the same strange phenomenon in the DataBase pattern. Therefore, we dare suppose that with a RAID10 array the controller doesn’t switch between WriteBack and WriteThrough modes correctly.


WorkStation Pattern

The last pattern we are going to discuss today is the WorkStation pattern that emulates the intensive user work in different applications on NTFS5 file system.

First come the results for WriteBack mode:

The situation is pretty common for RAID0 arrays: the more drives there are, the faster the array is. RAID5 arrays show similar results, although the performance of the 5-HDD array suddenly drops at a queue depth of 2 requests. The situation with RAID10 arrays is even worse: the arrays of 6 and 8 hard disk drives perform almost the same. RAID1 is faster than a single HDD in all work modes, but it starts slowing down as the workload increases. The performance of the single HDD, on the contrary, grows under high workload and drops under low workload.


We compare the different RAID arrays by calculating their performance rating using the following formula:

Performance = Total I/O (queue=1)/1 + Total I/O (queue=2)/2 + Total I/O (queue=4)/4 + Total I/O (queue=8)/8 + Total I/O (queue=16)/16 + Total I/O (queue=32)/32.
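The formula divides each result by its queue depth, so short queues, which are typical of workstation use, dominate the rating. A minimal sketch follows; the function name and the sample numbers are ours and are not results from this review.

```python
from typing import Mapping

QUEUE_DEPTHS = (1, 2, 4, 8, 16, 32)


def workstation_rating(total_io: Mapping[int, float]) -> float:
    """Sum of Total I/O at each queue depth divided by that queue depth."""
    return sum(total_io[depth] / depth for depth in QUEUE_DEPTHS)


# Made-up Total I/O values keyed by queue depth
sample = {1: 100.0, 2: 200.0, 4: 400.0, 8: 800.0, 16: 1600.0, 32: 3200.0}
print(workstation_rating(sample))
```

In this sample every term contributes 100, even though the raw result at 32 requests is 32 times the one at a single request.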

Once again the single hard disk drive is the slowest of all. RAID0 arrays are ranked according to the number of hard drives in them, and so are RAID5 arrays. However, RAID5 made of 8 HDDs is slower than RAID0 of 3 HDDs. The performance of the RAID10 arrays of 6 and 4 HDDs is pretty high, while the 8-HDD array could have performed better.

Let’s see how the WriteBack caching algorithm of the controller affects its performance with 4-HDD arrays:

Switching to WriteThrough mode doesn’t change the overall situation, although the overall performance rating drops by 45% on average.


In conclusion I would like to offer you the linear read graphs for each array tested:

Conclusion

The larger cache memory allowed the 3ware 9500S-8 controller to improve its performance significantly, especially in work modes with a high share of write requests. I would also like to stress the higher controller performance with RAID5 arrays, thanks to a more powerful XOR processor working at 150MHz.

Very soon we are going to introduce a newer controller from 3ware that is already available on the market, the 9550SX. But that is another story…