Promise FastTRAK S150 SX4 SATA-RAID Controller Review

Today we would like to introduce to you one more multi-channel (4 and up) controller from Promise. It proved to be an excellent solution, especially efficient in the benchmarks imitating real-life applications. Anyway, find out the details now in our new article!

by Alexander Yuriev, Nikita Nikolaichev
06/17/2004 | 02:07 PM

Continuing with our study of currently available multi-channel (four and more) Serial ATA RAID controllers, we came across the FastTRAK S150 SX4 card from Promise Technology. The name of this controller suggests kinship with the FastTRAK S150 TX4 (see our Promise FastTRAK S150 TX4 Controller Review). Well, they do have something in common – they both support four SerialATA-150 devices, and this fact explains the similar-sounding model names. As for the difference, the FastTRAK S150 SX4 seems to be a more advanced version because of the RAID5 support it provides, but we will also see some deeper differences on closer inspection.

So, again, this controller supports up to four SerialATA-150 devices (and arrays of up to 1 terabyte capacity). The types of arrays supported are standard, including 0, 1, 10, 5 and JBOD. The controller’s ability to work in a 32-bit/66MHz PCI 2.2 slot provides a maximum theoretical bandwidth of 266MB/s. The FastTRAK S150 SX4 allows using up to 256MB of cache memory (64MB being the necessary minimum). Moreover, the controller’s PCB carries chips of Ferroelectric RAM for keeping a transactions log and integrated General-Purpose I/O ports for controlling the device. 
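The 266MB/s figure follows directly from the bus width and clock. Here is a minimal sketch of that arithmetic, assuming the nominal PCI clocks of 33.33 and 66.66 MHz and one transfer per clock; the function name is ours and is used for illustration only.

```python
def pci_peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak theoretical PCI throughput in MB/s, assuming one transfer per clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


# Truncate to whole MB/s, which is how the figures are usually quoted.
for clock_mhz in (33.33, 66.66):
    print(clock_mhz, "MHz:", int(pci_peak_bandwidth_mb_s(32, clock_mhz)), "MB/s")
```

This is a ceiling for the whole bus, so all four drives and the host share it.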

The FastTRAK S150 SX4 also allows expanding the array’s capacity and changing its level while online. It supports two caching modes (write-back and write-through) and anticipatory reading (read-ahead) based on application and data types. It also batches commands and merges interrupts to minimize their number and optimize their execution. Finally, it supports load balancing (for mirrored RAID arrays only) and elevator seek (reordering commands according to the data location on the disk).
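To illustrate what elevator seek means, here is a minimal sketch of the classic elevator (SCAN) ordering. Promise does not document the actual firmware algorithm, so this only shows the general idea; the function name and the sample addresses are hypothetical.

```python
def elevator_order(pending_lbas, head_lba):
    """Order requests SCAN-style: sweep upward from the head position, then back down."""
    ahead = sorted(lba for lba in pending_lbas if lba >= head_lba)
    behind = sorted((lba for lba in pending_lbas if lba < head_lba), reverse=True)
    return ahead + behind


# Hypothetical queue of block addresses with the head currently at block 50.
print(elevator_order([95, 10, 60, 40, 120], head_lba=50))
```

Serving requests in this order replaces many long random seeks with one sweep across the platter in each direction.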

Closer Look

The controller is not small, yet electronic components densely cover its entire PCB surface. A PDC 20621, an XOR processor, resides in the center. To the left, there is a 168-pin DIMM slot – the memory module sits parallel to the controller PCB. Four SerialATA connectors are located along the upper edge of the card.

Judging from the outward appearance, the FastTRAK S150 SX4 has much more in common with the FastTRAK SX4000 card than with the FastTRAK S150 TX4. Well, it is true: the FastTRAK S150 SX4 is actually a FastTRAK SX4000, which has been trained to talk to SATA drives via Marvell converters.

The basic parameters of the controller are listed below:

Specifications

RAID

JBOD, 0, 1, 5 and 10

Serial ATA

4 SATA 150

PCI

66MHz/32-bit PCI bus support.
Compatible with PCI 2.2 standard.
Data transfer rate up to 266MB/sec.

Large LBA

Supports arrays with the overall storage capacity up to 1 terabyte.

ACPI

One slot for 168-pin DIMM with ECC support or up to 256MB non-ECC memory (the minimum requirement is 64MB).


Testbed and Methods

The testbed was configured as follows:

We used the following benchmarking software:

We created one partition for the total capacity of the array in WinBench 99. We ran WinBench tests seven times each and took the best result for further analysis.

We used FileServer and WebServer patterns in Intel IOMeter.

These patterns help to test the disk subsystem performance under a workload typical for file- and web-servers.

The Workstation pattern was created by our author Sergey GReY Romanov, based on the statistical data about the disk subsystem load given in the SR Testbed 3 description. The statistical data were gathered for the NTFS5 file system in three operational modes: Office, Hi-End and Boot-up.

This pattern shows how well the controller performs in a typical Windows environment.

Lastly, we checked out the controller’s ability to process sequential variable-size read/write requests and its performance in the Database pattern, which loads the disk subsystem with SQL-like requests.

Our controller had firmware version 1.1.0.15, and we used the driver from the same pack (version 1.1.0.15). The PAM utility helped us control and synchronize the arrays. We installed the controller into a PCI-X/133MHz slot (although it only supports 32-bit/66MHz PCI at maximum).

WD360GD (Raptor) hard disk drives were installed into the rails of the SC5200 system case and fastened at the bottom.

The FastTRAK S150 SX4 controller can change the lazy write status for the hard disk drives by means of the proprietary PAM software. Thus, unlike with the 3ware 8506-8 Escalade controller (see our 3ware Escalade 8506-8 SATA RAID Controller Review for details), for example, we can definitely state that lazy write was enabled for the hard disk drives during the main test program, and we changed the driver’s caching mode (write-back or write-through) as needed. We performed all the tests using 256MB of cache memory.


Performance in Intel IOMeter DataBase Pattern

Traditionally, we begin by checking the arrays’ ability to process mixed streams of requests.

This pattern sends a mixed stream of requests to read and write 8KB random-address data blocks. By changing the ratio of reads to writes, we can find out how good the controller driver is at sorting the requests out.
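The shape of such a request stream can be sketched in a few lines. This is not IOMeter's own code: the function name, the fixed seed and the 1GB test area are assumptions made for illustration, and only the 8KB block size and the random addressing come from the pattern description.

```python
import random

BLOCK_SIZE = 8 * 1024  # the pattern uses 8KB blocks


def database_pattern(num_requests, write_percent, area_bytes, seed=0):
    """Build a list of (operation, byte offset, size) tuples at random block-aligned addresses."""
    rng = random.Random(seed)
    total_blocks = area_bytes // BLOCK_SIZE
    requests = []
    for _ in range(num_requests):
        operation = "write" if rng.random() * 100 < write_percent else "read"
        offset = rng.randrange(total_blocks) * BLOCK_SIZE
        requests.append((operation, offset, BLOCK_SIZE))
    return requests


# Hypothetical run: 30% writes over a 1GB test area.
stream = database_pattern(1000, write_percent=30, area_bytes=1 << 30)
print(sum(1 for op, _, _ in stream if op == "write"), "writes out of", len(stream))
```

Setting write_percent to 0 gives the Random Read mode, and setting it to 100 gives the Random Write mode.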

The following table contains the results for the write-through caching mode:

The following diagrams show the dependence of the data-transfer rate on the reads/writes ratio for different request queue depths. For better reading, I drew two diagrams for two groups of arrays:

Under linear workload, almost all the arrays show similar speeds at the beginning of the graph (Random Read mode), with the RAID1 and RAID10 a little faster than the others. But as the percentage of write requests grows, the RAID1 and RAID10 arrays perform more like the single drive and the two-disk RAID0, respectively, and even worse in the Random Write mode (100% writes). The FastTRAK S150 SX4 controller seems to handle mirror arrays with an algorithm similar to the TwinStor technology from 3ware. This technology keeps a request history log and determines which drive in the array would process the current request faster, which brings an overall advantage in read speed. The maximum performance growth is 15% in this case (in the Random Read mode). By the way, recall that the FastTRAK S150 TX4 didn’t alternate requests between the disks of a mirrored pair (see our Promise FastTRAK S150 TX4 Controller Review for details) – so Promise has definitely made some progress in this respect.
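The idea of choosing the faster drive of a mirror can be sketched as follows. This is neither 3ware's TwinStor nor Promise's actual algorithm, neither of which is published in detail; it is a simplified illustration in which the "history" is just the last address each drive visited, and the class and method names are hypothetical.

```python
class MirrorReadBalancer:
    """Send each read to the mirror member whose head is assumed to be closest."""

    def __init__(self, num_drives=2):
        # Last block address served by each drive (the simplest possible history).
        self.last_lba = [0] * num_drives

    def pick_drive(self, lba):
        """Return the index of the drive with the shortest estimated seek."""
        drive = min(range(len(self.last_lba)),
                    key=lambda d: abs(self.last_lba[d] - lba))
        self.last_lba[drive] = lba
        return drive

    def note_write(self, lba):
        """A write goes to every member, so all heads end up at the same place."""
        self.last_lba = [lba] * len(self.last_lba)


balancer = MirrorReadBalancer()
for address in (1000, 10, 900, 20):
    print("read", address, "-> drive", balancer.pick_drive(address))
```

The note_write method also hints at why the advantage disappears as the share of writes grows: every write moves both heads to the same position, so there is nothing left to choose between.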

The performance of the JBOD array goes up as more write requests appear in the queue, thanks to the lazy write algorithms of the hard disk drive itself. The speed of the RAID0 grows with the number of disks in the array, but this proportion is only felt when the percentage of writes is high. The RAID5 arrays slow down considerably as the writes percentage grows. At the same time, there is no big difference between the speeds of the three- and four-disk RAID5s. That’s rather strange.


The workload is getting heavier, and we see the arrays perform differently. The “mirrored” arrays (RAID1 and RAID10) are always faster than the single drive and the two-disk RAID0, respectively, save for the Random Write mode. The maximum difference in speed (in the Random Read mode) is 75%, which is close to the results of 3ware’s TwinStor technology. At the same time, their speeds diminish as the writes percentage grows (contrary to what we had under linear workload), because the number of requests is high enough under this workload for the controller’s optimized reading from the mirror to be more efficient than the optimization of write requests by the hard disk drives. The RAID0 arrays show excellent scalability depending on the number of disks per array across all the work modes. The higher workload didn’t affect the RAID5 arrays, however. They stubbornly perform exactly like one another, and we may note that the speed of the four-disk RAID5 array is normal (because it equals the speed of the four-disk RAID0 in the Random Read mode), while the speed of the three-disk RAID5 is abnormally high!

The workload of 256 requests doesn’t affect the ranking. The mirror arrays, RAID1 and RAID10, have twice the speed of the single drive and the two-disk RAID0, respectively, in the Random Read mode. Their advantage decreases at higher writes percentages and comes to naught at the Random Write point. The graphs of the RAID0 and JBOD arrays are practically “parallel”. The speeds of the RAID5 arrays decline smoothly as more writes appear in the queue and seem to correspond to the speed of the four-disk RAID5.


Now let’s view the results of the same test, but in the write-back mode. First, the table:

And the diagrams:

I’m going to single out discrepancies in the graphs compared to the write-through mode. The numbers will follow.

First of all, the graphs have bends now which are imposed by another optimization, the controller’s lazy write.

The second noticeable difference from the write-through mode is that the RAID5 arrays are speeding up when the writes percentage grows. For the first time we see the RAID5 arrays of three and four disks showing different speeds, although in three modes only. Their acceleration is explained by the fact that write requests are now additionally optimized by the controller’s cache. The mirrors, RAID1 and RAID10, are always faster than the JBOD and the two-disk RAID0, respectively, save for the Random Write mode (100% write requests).


Under a load of 16 simultaneous requests, the RAID0 arrays increase their speed in response to higher writes percentages and scale nicely with the number of disks per array. The mirrored arrays – RAID1 and RAID10 – are again faster than the JBOD and the two-disk RAID0, respectively, and again, save for the Random Write mode. Unlike in the write-through mode, however, their speeds only decrease until 80% writes and then start to grow. This seems to be the result of the controller optimizing write requests. This is also why the RAID5 arrays lost just a little speed at 70% writes and even sped up somewhat at 80% writes. I’d also like to note that the RAID5 arrays at last showed different speeds in the Random Read mode, but this difference diminished later and vanished completely at 70% writes. That’s again a mysterious phenomenon.

Under a workload of 256 requests, the three-disk RAID0 slows down when there are many writes to be performed. The RAID5 arrays don’t have that sharp curve in their graphs when the writes percentage is high. Otherwise, it’s all the same as under a workload of 16 requests.


Now, let’s compare the speeds of all the RAID arrays in different caching modes. We fill the table with ratios of the controller’s speed in the WB mode to its speed in the WT mode. A higher number means higher efficiency of WB caching in this mode. If the number is below 1 (marked with red), WB caching is harmful. If the number is above 1 (marked with blue), WB caching brings a performance gain. If you see “1.0”, then WB and WT caching modes are equally useful.
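The table entries can be reproduced with a small calculation. The sketch below uses made-up speeds rather than our measured values; the function names are hypothetical, and rounding to two decimals is an assumption about how the table is presented.

```python
def wb_wt_ratio(wb_speed, wt_speed):
    """Ratio of write-back speed to write-through speed, rounded to two decimals."""
    return round(wb_speed / wt_speed, 2)


def color(ratio):
    """Marking used in the table: red for a loss, blue for a gain, black for no change."""
    if ratio < 1:
        return "red"
    if ratio > 1:
        return "blue"
    return "black"


def gain_percent(ratio):
    """Speed change in percent; negative values mean WB caching is harmful."""
    return round((ratio - 1) * 100)


# Made-up speeds in I/O operations per second, for illustration only.
for wb, wt in ((283.0, 100.0), (94.0, 100.0), (100.0, 100.0)):
    ratio = wb_wt_ratio(wb, wt)
    print(ratio, color(ratio), gain_percent(ratio))
```

A ratio of 2.83 therefore corresponds to a gain of 183%, and a ratio of 0.94 to a loss of 6%.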

Overall, it seems like WB caching is good for all the arrays, except the RAID5. The details follow:

WB caching has a positive impact on the JBOD, two- and four-disk RAID0s, RAID1 and RAID10. This influence gets stronger when there are more write requests in the queue and gets weaker when the workload is small. The maximum speed gain of the above-mentioned arrays is 32%, 25%, 18%, 31% and 29%, respectively. We see black numbers mostly in the Random Read mode, where there are no writes and caching cannot affect anything. WB caching inhibits the performance of the RAID1 and RAID10 arrays under a workload of 256 requests, but the loss is only 1% and such workloads are rare in practice. There is also one more mode for the RAID10 array (60% writes under linear workload) where WB caching is harmful, but everywhere else it brings certain benefits.

The controller’s lazy writing positively affects the performance of the four-disk RAID5 under small and average loads. The maximum speed growth is 183%! However, under higher workloads, WB caching becomes harmful, although the speed loss is only 6%.

For the three-disk RAID5, the border between positive and negative effects of WB caching goes along the line between the Random Read mode under linear load and Random Write mode under the maximum load. The maximum speed gain is 190%; the maximum speed loss is 12%. The maximum gain is natural: all arrays benefit most from the controller’s lazy write at the Random Write point. The maximum loss, however, falls on the Random Read mode, and that’s strange: there are no write requests here and WB caching cannot affect anything. This is another “peculiarity” of this RAID5.

For the three-disk RAID0, this border between positive and negative effects of WB caching goes along the line between the Random Read mode under the maximum load and Random Write mode under linear load. Thus, the status of the controller’s lazy write doesn’t affect the speed in the Random Read mode, but the efficiency of WB caching declines at high writes percentages.

As you see, WB caching is not always useful. It usually brings benefits at small loads, but at big loads it is either useless or harmful. That’s why you should decide whether to use it or not depending on the type of the array and the expected load on the controller.


Performance in Intel IOMeter Sequential Read & Write Patterns

The array receives a stream of read/write requests with a request queue depth of 4. Every minute the size of the data block changes, so we can see the dependence of the linear read/write speed on the size of the data block. The results are listed in the following table:

For better reading, we divide the arrays into two groups in the diagrams. The graphs represent the dependence of the sequential read speed on the data chunk size:

The arrays reach their maximum speeds at different sizes of the data block, when the controller can split a big request into several small ones for the HDDs of the array to process them in parallel. There is a clear dependence of the speed on the number of disks only with the two- and three-disk RAID0s. The four-disk RAID0 doesn’t follow suit.

When discussing mixed streams of requests, we supposed that the algorithms of reading from a mirrored pair use alternating requests, but we see now that this algorithm negatively affects the speed of the RAID1 and RAID10 during sequential reading. The read speed of the RAID1 array is always lower than that of the single drive, while the RAID10 lags behind the two-disk RAID0.

The graphs of the RAID5 arrays coincide like in the DataBase pattern.

Now we enable WB caching and see what happens (in fact, caching shouldn’t bring in any changes, because we have no write requests in this pattern):

And really, none of the arrays has changed its speed compared to the write-through mode. The only exception is the three-disk RAID5, whose maximum speed is now even lower than that of the RAID10.


Now, let’s do some sequential writing:

And the graphs:

All the arrays reach their maximum speeds at big chunks only, when the controller can split the request up for the drives of the array to process it in parallel. But the RAID0 arrays all reach the same maximum, while the four-disk RAID0 is the fastest at 64 and 128KB requests.

The graph of the RAID1 fully repeats the graph of the single drive, while the RAID10 draws a line which is very similar to the graph of the two-disk RAID0. The RAID5 arrays are the slowest, save for the two biggest request sizes where they reach the level of the RAID10.

Now we enable WB caching and watch for any changes. Now we have 100% writes and this should cause some substantial changes:

The graphs of the RAID1 and JBOD are the same, because RAID1 works like a single drive when there are no read requests. But why the other graphs coincide is a mystery. Judging by the “smoothness” of the graphs, we may suppose that the speed of all the arrays is limited by the speed of the controller’s central chip (or rather by the data transfer rate between the central chip and the cache memory).

The next patterns simulate real-life loads. Let’s see how the controller copes with them.


Performance in Intel IOMeter File Server & Web Server Patterns

These patterns simulate the operation of the disk subsystem of a typical file- or Web-server. First goes the File Server pattern, in write-through and write-back modes:

The following graphs show the dependence of the performance on the request queue depth:

The RAID0 arrays are highly scalable. The RAID1 and RAID10 are always faster than the single drive and the two-disk RAID0, respectively. The speeds of the RAID5 arrays are the same, like in the previous tests.

On enabling WB caching, nearly all the arrays speed up. The speeds of the three- and four-disk RAID5s are different now, unlike in the write-through mode.

We calculate the performance rating for each array by averaging the controller speed under each workload. This allows comparing the arrays:
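As a minimal sketch of this rating, the function below takes the plain arithmetic mean over the tested workloads. The function name and the sample numbers are hypothetical and are not taken from our tables.

```python
def average_rating(speed_by_queue_depth):
    """Plain arithmetic mean of the speeds measured at each request queue depth."""
    return sum(speed_by_queue_depth.values()) / len(speed_by_queue_depth)


# Made-up speeds (I/O operations per second) keyed by queue depth.
sample = {1: 100.0, 4: 200.0, 16: 300.0}
print(average_rating(sample))
```

Every workload carries the same weight here, so heavy queue depths influence the rating as much as light ones.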

The four-disk RAID0 took the first place in performance, but only because the File Server pattern has 20% of writes, which RAID0 performs faster than RAID10. RAID1 fell behind the two-disk RAID0 in the write-through mode, but outperformed it in the write-back mode. RAID5 arrays of three and four disks swap their places in different caching modes.


The next pattern simulates the disk subsystem of a typical Web-server. The motto of this pattern is: no write requests! First, the tables:

Here are the graphs:

The lack of write requests results in no differences between the WT and WB caching modes. As usual, there is only one exception, the three-disk RAID5, which slows down in the write-back mode. The mirror arrays and the RAID5 arrays are extremely fast in this pattern.

We calculate the performance rating for the WebServer pattern the same way as for the FileServer one:

As you see, all the arrays show the same speeds in the two caching modes (save for the three-disk RAID5). The arrays that use mirroring as well as the RAID5s are faster than the RAID0 arrays of the same number of disks.


Performance in Intel IOMeter Workstation Pattern

The Workstation pattern imitates intensive work of a user in various applications in the NTFS5 file system. The results for the write-through and write-back modes are as follows:

The graphs show the dependence of the speed on the request queue depth:

The RAID0 arrays reach their maximum speeds at small queue depths, but only the two- and three-disk arrays scale with the number of disks per array. The maximum speed of the four-disk RAID0 is just a little higher than that of the three-disk RAID0. The speed of the RAID10 is similar to and sometimes higher than the speed of the four-disk RAID0. The RAID1 is always faster than the single drive, even considering the performance loss at 2+ request queue depths. The speeds of the RAID5 arrays are the same in the write-through mode – that’s a kind of tradition for the FastTRAK S150 SX4 controller.

After enabling the lazy write, the four-disk RAID0 performs faster, but it still doesn’t fit into the scale of the two- and three-disk RAID0s. The four-disk RAID5 is slower than the three-disk RAID5 at request queue depths of 4 and smaller, but faster at bigger depths.

We calculate the performance rating for the Workstation pattern by the following formula:

Performance Rating = Total I/O (queue=1)/1 + Total I/O (queue=2)/2 + Total I/O (queue=4)/4 + Total I/O (queue=8)/8 + Total I/O (queue=16)/16 + Total I/O (queue=32)/32
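The formula translates directly into code. In this sketch the function name and the sample values are hypothetical; the queue depths and the 1/depth weights come from the formula itself.

```python
QUEUE_DEPTHS = (1, 2, 4, 8, 16, 32)


def workstation_rating(total_io_by_queue_depth):
    """Weighted sum of Total I/O, each result divided by its request queue depth."""
    return sum(total_io_by_queue_depth[depth] / depth for depth in QUEUE_DEPTHS)


# Made-up Total I/O values, for illustration only.
sample = {1: 64.0, 2: 64.0, 4: 64.0, 8: 64.0, 16: 64.0, 32: 64.0}
print(workstation_rating(sample))
```

Dividing by the queue depth gives the most weight to short queues, so a result at a queue depth of 1 counts 32 times as much as the same result at a depth of 32.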

The four-disk RAID5 took the last place, because of the high percentage of writes and a high probability of short requests. The three-disk RAID5 was faster than the single drive due to the relatively high speed on short requests. For the same reason, RAID1 and RAID10 outperformed RAID0s of the same number of disks.


Performance in WinBench 99

WinBench is going to be our last test today. This benchmarking set has always helped us evaluate the disk subsystem performance in desktop applications in NTFS and FAT32 file systems. Test results for the NTFS file system are listed first, in the write-through and write-back modes:

The following table compares the arrays in two integral subtests: Business Disk Winmark and High-End Disk Winmark:

The overall picture for the two modes looks like this: RAID0 arrays took the first three places, while the RAID5s are the slowest. All the arrays are faster with WB caching than in the write-through mode, although not by much. However, the three-disk RAID5 and the two-disk RAID0 are faster in the write-back mode than the four-disk RAID5 and the three-disk RAID0, respectively. Note also that the single drive is faster than the RAID1 in this mode.


Now, let’s see what we have in FAT32, in WB and WT modes:

The arrays have the following speeds in Business Disk Winmark and High-End Disk Winmark:

The arrays ranked just like they did in NTFS, but the speeds grew significantly. When we change the status of lazy writing, the arrays keep their respective places.


The linear read speeds are the same for NTFS and FAT32, so we will show you just two diagrams for both file systems:

The linear read speed is not supposed to depend on the status of lazy writing, as there are no write requests here. However, the RAID5 does show a dependence on WB caching. The speed grows at the beginning of the array and drops at the end. That’s a curious array!

The graphs of linear reading are given by the following links:

Conclusion

The Promise FastTRAK S150 SX4 controller we have reviewed today is a good piece of hardware. It is especially good in tests that simulate real-life applications. Our only question regarding it concerns the results of the three-disk RAID5: its performance was higher than expected with WT caching (or maybe we tested a three-disk RAID5 once again instead of a four-disk configuration?).

The low performance with sequential requests and the negligible influence of a large cache (256MB) are topics that only specialists take interest in. We are preparing a separate article to cover these issues, where we will examine the speed of the Promise FastTRAK S150 SX4 with 64, 128 and 256MB modules for the cache memory.

Currently, the manufacturer’s website offers drivers for Windows 2003/XP/2000, SuSE Linux, Red Hat 8x and 9x Linux. The Russian-language site (www.promise.ru) claims there is a driver for FreeBSD 4.8 Beta, but www.promise.com only says “FreeBSD 4.7 coming soon”.
