by Alexander Yuriev, Nikita Nikolaichev
12/23/2003 | 11:28 AM
This controller is a close relative of a solution we have already reviewed: the Promise FastTRAK TX4000 IDE RAID Controller. That is why many of the things we are going to talk about today may seem familiar to you. Still, the WD Raptor drives, which we use in every review, add a bit of spice to each SerialATA RAID controller coverage.
I believe that many of you hope secretly that SATA RAID controllers will manage to defeat the SCSI ones one day :)
So, what will Promise surprise us with today? Let’s find out…
As you may have guessed from the name, the FastTRAK S150 TX4 controller supports up to 4 SerialATA/150 devices. The list of supported RAID arrays is pretty standard: RAID 0, RAID 1, RAID 0+1, JBOD. Thanks to the controller’s ability to work in a 66MHz PCI 2.3 slot, its maximum theoretical bandwidth is 266MB/s. The FastBuild BIOS optimizes drive performance depending on the drive type, supports system bootup from an array of any level, ensures compatibility with the BBS, offers a very convenient menu system for array creation and configuration, and supports simultaneous multiple-array configurations. It is also possible to update the BIOS via software. The Promise Array Management utility (PAM) allows monitoring the status of the array, the hard disk drives and the controller either locally or remotely (via TCP/IP). The utility also reports errors, allows restoring failed devices, synchronizes the data in fault-tolerant arrays, reports errors or array status by e-mail, and monitors the S.M.A.R.T. status of the active hard disk drives.
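The 266MB/s figure follows from the bus parameters: a 32-bit bus moves 4 bytes per clock, and a nominal 66MHz PCI clock is 66.67MHz. A minimal sketch of that arithmetic (the function name is ours, not anything from Promise):

```python
def pci_peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak theoretical throughput: one full-width transfer per bus clock."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


# Nominal "33MHz" and "66MHz" PCI clocks are 100/3 and 200/3 MHz.
pci33 = pci_peak_bandwidth_mb_s(32, 100 / 3)
pci66 = pci_peak_bandwidth_mb_s(32, 200 / 3)

# Four SerialATA/150 ports could in theory deliver more than the bus carries.
sata_ports_total = 4 * 150

print(int(pci33), int(pci66), sata_ports_total)
```

Note that the four ports together (600MB/s on paper) exceed what even a 66MHz slot can carry, so the bus rather than the SerialATA links sets the ceiling.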
This is what this monster looks like:
As we can see, the controller is of usual light-green color with dark-green wiring on the PCB, which is noticeable through the lacquer surface. The Promise PDC20319 controller chip is covered with a heatsink. Two connectors for LED-indicators are located aside. Four SerialATA connectors and the BIOS chip are evenly spread along the top edge of the PCB and make the card look pretty light, with a lot of free space. At the same time, the controller card is quite small (it is a low-profile solution) and if we compare the size of our Promise FastTRAK S150 TX4 with that of the FastTRAK TX4000, we will have no doubt that SATA is another step on the way to freeing more space inside the PC case.
Here are the device specifications for your reference:
Promise FastTRAK S150 TX4 Specifications
JBOD, 0, 1, 0+1
4 SATA 150
Supports PCI bus: 32bit 33/66MHz PCI.
48bit Large LBA (Logical Block Addressing) – supports devices with over 137GB storage capacity each.
S1, S3 and S4
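The 137GB figure in the specifications is the ceiling of the older 28-bit LBA scheme with 512-byte sectors, which 48-bit LBA removes. A quick check of the arithmetic (the helper name is ours):

```python
SECTOR_BYTES = 512  # classic ATA sector size


def lba_capacity_bytes(address_bits: int) -> int:
    """Largest capacity addressable with the given number of LBA bits."""
    return (2 ** address_bits) * SECTOR_BYTES


limit_28 = lba_capacity_bytes(28)
limit_48 = lba_capacity_bytes(48)

print(limit_28 / 10**9)   # about 137.4 (decimal GB)
print(limit_48 // 2**50)  # 128 (PiB)
```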
Our testbed was configured as follows:
We used the following software:
For WinBench99 tests the hard drive was formatted as one partition with the default cluster size. The WinBench tests were run seven times each; the best result was then taken for further analysis.
To compare the hard disk drives performance in Intel IOMeter we used the FileServer and WebServer patterns:
These patterns are intended to measure the disk subsystem performance under workloads typical of file- and web-servers.
Our colleague, Sergey Romanov aka GreY, developed a WorkStation pattern for Intel IOMeter based on StorageReview's study of the disk subsystem workload in ordinary Windows applications. The pattern was built from the average IPEAK statistics StorageReview provided for the Office, High-End and Bootup work modes in the NTFS5 file system, as mentioned in the Testbed3 description.
The pattern serves to determine the attractiveness of the HDDs for an ordinary Windows user.
Well, and in the end we checked the ability of the drives to work with sequential write and read requests of variable size, and tested the drive’s performance in DataBase pattern, which imitates the work of the disk subsystem with SQL-like requests.
The controller featured BIOS version 1.00.0.37. We used the driver version 1.00.0.37. To control the arrays status and to synchronize the arrays with one another we used a special PAM utility (Promise Array Management) version 22.214.171.124.
The controller was installed into the PCI-X/133MHz slot (even though it supports only PCI32 33/66MHz). The WD360GD Raptor hard disk drives were installed into the default chassis of SC5200 case and were fastened at the bottom with four screws.
During the major test session we enabled lazy writing for all drives. The driver requests caching modes (WriteBack and WriteThrough) were changed on the fly according to the situation.
As usual we will start the discussion of our controller performance with the pattern producing the biggest bunch of data.
As you remember, in this pattern we test how fast the controller can process a mixed stream of requests including reads and writes of 8KB data blocks with random addresses. By changing the reads-to-writes ratio we can figure out how well the controller driver can sort the reads from the writes.
As always, a detailed performance chart in WriteThrough mode comes first:
Now have a look at the graphs:
Note that under linear workload (one outgoing request) the performance of the RAID 1 array of two hard disk drives is almost identical to that of JBOD, while the RAID 01 array runs as fast as RAID 0 of two HDDs.
It looks as if the controller in Write Through mode “didn’t slow down the requests processing” but distributed them among the drives immediately, that is why read requests interleaving between the drives of the mirrored pair simply didn’t work at all.
At the same time, as the writes share increases, the major contribution to the array performance is made not by the controller but by the WD360GD hard drive. To be more exact, by the lazy write algorithms of the Raptors…
During reading (when there are no writes at all), all arrays demonstrate very similar performance, i.e. their speed actually depends on the access time value of the WD Raptor drives.
Workload increase changes the performance picture. The mirror (RAID 1 array) is always faster than a single hard disk drive even in RandomWrite (100% writes). Since we have been obtaining the same results throughout all the experiments, we wouldn’t consider the performance difference in this case a measuring error.
RAID 01 array is faster than RAID 0 of two HDDs everywhere except RandomWrite. In case of low write requests share (less than 40%), it even manages to outperform RAID 0 array of four drives. Theoretically, RAID 0 is the fastest array type and RAID 01 should be slower during reads. However, there is a special rule according to which the requests interleaving for RAID 0 array takes place: if the request falls into the address space of a given hard disk drive, then this HDD is responsible for processing it. But in case of random requests it can happen that two or more requests in a row will be sent to one and the same drive. In this case, this HDD will be loaded with work while the other drives will be idling. As a result the array performance will be limited by the single HDD performance. For an array including mirrored pairs the reads (and maybe even writes) inside the pair will be strictly shared between the two drives, i.e. the drives will process these requests in turns independent of the data block size (in fact, the HDDs can take turns in a bit “smarter” way). The HDDs are loaded more evenly this way, which leads to the increase in the average array performance.
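The argument above can be illustrated with a toy probability model. This is only a sketch under strong assumptions of our own: read-only load, uniformly random addresses, requests no larger than a stripe, and mirrors that strictly alternate. It says nothing about how the Promise driver actually schedules requests. It counts how many drives have at least one request to work on for a given queue depth.

```python
from math import comb


def raid0_busy_drives(n_drives: int, queue_depth: int) -> float:
    """Expected number of drives holding at least one of the queued requests
    when every request lands on a uniformly random drive."""
    return n_drives * (1 - (1 - 1 / n_drives) ** queue_depth)


def mirrored_busy_drives(n_pairs: int, queue_depth: int) -> float:
    """Expected busy drives when each request picks a mirrored pair at random
    and the two drives of a pair take turns (so k requests keep min(k, 2) busy)."""
    p = 1 / n_pairs
    per_pair = 0.0
    for k in range(queue_depth + 1):
        prob_k = comb(queue_depth, k) * p ** k * (1 - p) ** (queue_depth - k)
        per_pair += prob_k * min(k, 2)
    return n_pairs * per_pair


for depth in (1, 2, 4, 16):
    print(depth,
          round(raid0_busy_drives(2, depth), 2),     # RAID 0, two drives
          round(raid0_busy_drives(4, depth), 2),     # RAID 0, four drives
          round(mirrored_busy_drives(1, depth), 2),  # RAID 1
          round(mirrored_busy_drives(2, depth), 2))  # RAID 01, four drives
```

In this model a four-drive RAID 01 keeps more drives busy than a four-drive RAID 0 at small queue depths, and the gap closes as the queue grows, which matches the direction of the effect described above.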
Having taken a look at the graphs for 256 requests workload I was really pleased, frankly speaking: here they are, the perfect performance rates! The graphs for RAID 0 array and a single hard disk drive are nearly parallel. The mirrored RAID 1 array is twice as fast as a single drive during RandomRead and then little by little slows down until the graphs merge at RandomWrite. RAID 01 and RAID 0 arrays of two drives behave just the same way.
And now let’s have a look at another big table: the controller performance in WriteBack mode:
Now the graphs come:
Even under linear workload RAID 1 array is faster than a single drive, and RAID 01 array outperforms RAID 0 of two drives. As we remember, in WT mode we didn’t see anything like that. It means that WriteBack mode (WB, write requests caching performed by the driver) allows increasing the performance of mirrored arrays even under linear workload.
With a workload of 16 requests the advantages of mirrored arrays (RAID 1 and RAID 01) over JBOD and RAID 0 of two drives respectively are even more evident. RAID 0 arrays run pretty slowly when the share of writes is low, just like in WriteThrough mode. However, in this case the performance drop is much greater. Look how jagged the graph for RAID 0 is! You can notice that for arrays of two, three or four drives the performance drop disappears when we hit 20%, 30% and 40% writes share respectively. By the way, as you remember, we saw this stepping effect in WT mode as well, so it is not the WB algorithms that produced it here. Since we have never seen an effect like that when testing any other SerialATA RAID controllers, I dare suppose that the Promise controller drivers are responsible for it.
In case of 256 requests queue depth the situation doesn’t look any different from what we observed in WriteThrough mode.
And now let us compare the performance of all RAID arrays with different caching algorithms involved. We will just make one more table, where each cell contains the coefficient obtained as controller performance in WB mode divided by controller performance in WT mode. This way, this coefficient will reflect the efficiency of WB-caching in each particular situation. If the coefficient is smaller than 1 (these results are highlighted in red), it means WB-caching doesn’t make sense here. And if the coefficient is bigger than 1 (these results are given in blue), then WB-caching has a positive effect on the performance in this case.
If you see “1.0”, then WT and WB caching are equally efficient in this mode.
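For clarity, here is how such a coefficient table can be computed. The numbers in the usage lines are invented for illustration only, and rounding to two decimals is our assumption, not necessarily what was used for the table.

```python
def wb_wt_coefficient(wb_iops: float, wt_iops: float) -> float:
    """Performance in WriteBack mode divided by performance in WriteThrough mode."""
    return round(wb_iops / wt_iops, 2)


def highlight(coefficient: float) -> str:
    """Colour used for a table cell: red if WB hurts, blue if WB helps."""
    if coefficient < 1:
        return "red"
    if coefficient > 1:
        return "blue"
    return "none"


# Hypothetical sample values, not measurements from this review.
samples = {"RandomRead": (97.0, 100.0), "RandomWrite": (120.0, 100.0)}
for mode, (wb, wt) in samples.items():
    c = wb_wt_coefficient(wb, wt)
    print(mode, c, highlight(c))
```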
Now I would like to make a few comments about these results. In all arrays driver WB-caching resulted in a performance drop in RandomRead mode and a performance increase in RandomWrite mode, and slowed down the controller when the queue depth was 256 requests.
In case of a single hard disk drive, the WB-caching efficiency increases as the share of write requests grows, and decreases as the share of writes drops. That is why the influence of WB-caching on JBOD performance should be considered from various viewpoints. On the one hand, WB-caching raises the performance in some modes and lowers it in others. On the other hand, the performance drop never gets beyond 3%, while the performance increase sometimes hits 20%.
For all other arrays enabled WB-caching reduces the performance mostly in RandomRead mode when the requests queue depth is 256 requests. The maximum performance drop caused by enabled WB-caching notched 5%, and the maximum performance improvement reached 25%. If you have been checking the table attentively enough, you may have noticed red numbers under linear workload, as well as in case of 64 requests. However, in these cases the performance drops not more than by 2%.
Under linear workload with an even share of reads and writes the RAID 01 array slows down by about 10%. This mode seems to be the hardest for the driver, which is why this spot falls out of the overall picture. Making the decision for a complex array must be a really tough task, especially with even shares of read and write requests.
So, the arrays working with enabled WB-caching in mixed modes with dominating reads always slow down in RandomRead and never slow down in RandomWrite.
As we know from our previous reviews of Promise controllers, enabled WB-caching slows down controller response a little bit during requests processing. The reason of this slow-down is evident for writing operations, because the controller driver spends processor time to think about the most optimal caching strategy (searching through the cached writes trying to find out if there are any requests to be processed together with the newly arriving ones, or arranging the cached requests in the most optimal way for further processing). However, I cannot find a reasonable explanation for the slow-down during reads processing…
Here we see the performance drop in RandomRead mode. As soon as write requests appear in the queue, the performance starts growing up. Moreover, unlike Promise FastTRAK TX4000, the positive effect brought by WB-caching grows smaller as the queue depth increases, i.e. when the controller has something to choose from.
In the general case, when the controller processes writes, WB-caching can be considered highly positive. 256-request queues are very rare in real life, so enabled WB-caching is very unlikely to harm the system performance. I can think of only one example right now when you’d better not enable WB-caching: a web-server that receives almost no write requests at all.
Now let’s see how well the controller will cope with sequential reading and writing. And of course, it is also very interesting to find out if the caching algorithms (WB/WT) will affect the performance in this case, too.
With the help of Intel IOMeter program we sent a stream of reads/writes with queue depth equal to 4. Once per minute the data block size changes. As a result, we get the dependence of the linear read and write speeds on the data block size in the end of the test session. The obtained results are summed up in the tables below:
Unlike with the Promise FastTRAK TX4000 controller, we see a very big difference between the graphs for WriteBack and WriteThrough, so I suggest considering them separately. For a better picture let’s split the tested RAID arrays into two smaller groups.
As we remember, some manufacturers force requests interleaving between the two drives of the mirrored pair and do it even during linear reading. This way, a RAID 1 array behaves very much like a RAID 0 array and the read speed from the array can theoretically double! However, the tests show that the Promise FastTRAK S150 TX4 controller uses this algorithm only in case of RandomRead (just like all the other Promise controllers we have reviewed).
The read speed from RAID 1 array is almost always lower than the read speed from a single hard disk drive, which is a very upsetting thing, I should say.
RAID 01 array is always behind RAID 0 of two drives and falls even behind JBOD when the data blocks are 2KB-8KB big. Besides, the graphs for RAID 01 and RAID 1 show a few performance drops when we work with 32KB and 256KB data blocks respectively.
Now let’s check the situation with enabled WB-caching:
At first sight the graphs look a little worse than in case of WriteThrough-caching: they are pretty curved and not that nice and smooth any more. However, if we consider absolute values, then the performance will be higher during small data blocks processing, and at least not lower during larger data blocks processing. Even though each graph has a spot or two where the performance drops down compared with what we saw in case of WT, we can still conclude that WB-caching has overall positive influence on the performance of RAID 0 and JBOD during linear reading of small data blocks, and hardly has any influence on the large data blocks read speed at all.
The use of WB-caching increases the performance of mirrored arrays when the processed data blocks are smaller than 8KB. If we are reading large data blocks then WB-caching doesn’t affect the read speed at all.
As a result, a single HDD and RAID 1 array demonstrate similar read speed for smaller than 8KB data blocks. As the data blocks grow bigger, the HDD speed starts to act as a limiting factor, so that the influence of WB-caching disappears.
Now let’s have a look at sequential writing:
Well, and here are the first problems. RAID 0 array of two hard disk drives performs almost twice as fast as a single HDD, which is quite acceptable. However, as we come to a three-disk RAID 0 array, its performance appears almost identical to that of a two-disk one. And the 4-HDD RAID 0 array even falls a little behind the others when we write data blocks bigger than 16KB. As you remember, we saw something of the kind during the Intel SRCS14L controller test session (see our Intel SRCS14L Four-Channel SerialATA RAID Controller Review). Although, it was the insufficient cache that was the one to blame then, while in case of our today’s FastTRAK S150 TX4 controller (it features a software cache) we should probably blame the driver.
RAID 1 array is always behind a single hard disk drive, and RAID 01 array working with 8KB data blocks starts falling behind all the other arrays, which is actually not at all surprising, since it has to do twice as much work.
Now let’s check how enabled WB-caching affects the performance of our system:
Just like in case of RandomRead enabled WB-caching allows speeding up small data blocks processing quite significantly.
The write speed reaches its maximum when we work with very small data blocks and doesn’t change even when they grow bigger. As we remember, it happens because the controller driver uses the CPU resources to “stick together” the requests for sequentially located data. So, the driver sends to the disk the requests of pretty big size, which is very convenient for both: the disk drive and the interface.
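The idea of “sticking together” sequential requests can be sketched as follows. This is a generic illustration of write coalescing, not the Promise driver's actual algorithm, and the 128KB cap on a merged request is an arbitrary assumption.

```python
def coalesce(requests, max_bytes=128 * 1024):
    """Merge (offset, length) write requests that are back-to-back on disk,
    keeping each merged request at or below max_bytes."""
    merged = []
    for offset, length in requests:
        if merged:
            last_offset, last_length = merged[-1]
            adjacent = last_offset + last_length == offset
            if adjacent and last_length + length <= max_bytes:
                # Extend the previous request instead of issuing a new one.
                merged[-1] = (last_offset, last_length + length)
                continue
        merged.append((offset, length))
    return merged


# 256 sequential 1KB writes collapse into two 128KB requests.
small_writes = [(i * 1024, 1024) for i in range(256)]
print(coalesce(small_writes))
```

With requests merged this way, the drive sees the same large transfers whether the application writes in 1KB or 64KB blocks, which is consistent with the flat write-speed curves described above.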
The curious thing about it is that the RAID 0 array of four drives is always slower than the other RAID 0 arrays.
RAID 01 array is the slowest with the data blocks of any size, and RAID 1 array runs as fast as a single drive in all cases except 1KB and 2KB data blocks processing.
Now we will have a look at those patterns, which are closer to real life situations.
These are the patterns imitating the work of file and web servers.
Although the results in the tables are pretty close, we should still check the graphs, just to make sure we haven’t missed anything.
Mirrored arrays (RAID 1 and RAID 01) perform beautifully fast. RAID 1 array runs almost as fast as RAID 0 array of two drives, except the 16 requests queue depth. This performance drop can be observed in both: WT and WB modes, but in case of enabled WB-caching the performance loss is less severe. RAID 01 outperforms all other arrays, except RAID 0 of four drives, and when the queue is 4 or 16 requests deep, then it shows its real best.
Let’s apply our great rating system to compare the performance of RAID arrays of different types with one another. Assuming that all the workloads are equally probable, we will calculate the rating as average performance under all types of workload:
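In other words, the rating is a plain arithmetic mean over the tested queue depths. A minimal sketch with made-up numbers (the values and queue depths below are placeholders, not results from this review):

```python
from statistics import mean


def equal_weight_rating(io_by_queue_depth: dict) -> float:
    """Average performance, treating every workload as equally probable."""
    return mean(io_by_queue_depth.values())


# Hypothetical I/O-per-second figures keyed by queue depth.
sample = {1: 100.0, 4: 180.0, 16: 260.0, 64: 300.0, 256: 310.0}
print(equal_weight_rating(sample))
```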
RAID 0 array of four drives managed to retain the performance leadership only because FileServer pattern carries some writes, which are usually processed faster on RAID 0 than on RAID 01. RAID 1 array fell behind RAID 0 of two HDDs because of the same 20% of writes, even though the graphs of these two arrays were almost coinciding. Besides, the rating system demonstrates that even so few writes are more than enough to boost the performance in case of enabled WB-caching.
Now let’s take a look at the WebServer pattern, which is remarkable for having no writes at all. First come a few tables:
And now a few graphs:
As we see, the performance difference between the WT and WB modes is very small, even smaller than in FileServer pattern. However, there is one more general difference, which you can clearly notice from the graphs. In WebServer the graphs for a single hard disk drive and RAID 0 arrays start from the same point. The reason is very simple: under linear workload RAID 0 arrays lose their advantage of parallel HDD functioning. The arriving request is simply sent to the corresponding drive and each HDD works separately (if the requested data block is smaller than the stripe).
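The point about requests smaller than the stripe can be made concrete with a simple address-mapping sketch. The 64KB stripe size is an assumption chosen for illustration; the review does not state the stripe size used.

```python
def drives_touched(offset: int, length: int, stripe_size: int, n_drives: int):
    """Return the RAID 0 member drives that a request of `length` bytes
    starting at byte `offset` has to access."""
    first_stripe = offset // stripe_size
    last_stripe = (offset + length - 1) // stripe_size
    return sorted({stripe % n_drives for stripe in range(first_stripe, last_stripe + 1)})


KB = 1024
STRIPE = 64 * KB  # assumed stripe size

print(drives_touched(0, 8 * KB, STRIPE, 4))        # a small request hits one drive
print(drives_touched(60 * KB, 8 * KB, STRIPE, 4))  # one crossing a stripe boundary hits two
print(drives_touched(0, 256 * KB, STRIPE, 4))      # a large request hits all four
```

With only one outstanding small request, a single drive works at a time, so a RAID 0 array cannot beat a single drive at queue depth 1.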
The absence of write requests allows taking full advantage of the mirrored arrays, and of course, they can’t miss this great opportunity. RAID 01 array is at the leading edge, while RAID 1 outperforms RAID 0 of two drives.
The rating system where we compare the performance of various RAID arrays (we calculate the rating the same way we did it for FileServer) shows the same picture as the graphs above. WB-caching results in an evident performance drop. The reason is also very simple: no write requests. Caching doesn’t have any positive effect in this case, while the driver still uses processor time. So, the overall performance decreases.
The WorkStation pattern imitates the user’s intensive work in different applications on the NTFS5 file system.
Now let’s consider the WT and WB graphs separately.
RAID 0 arrays boast a noticeable performance growth; however, the dependence on the number of drives involved starts to show only when the queue is over 4 requests. The RAID 01 array outperforms the RAID 0 array of four drives, but suffers a performance drop at queue=16, just like RAID 1.
As we have already shown above, the use of WB-caching results in a performance boost. The RAID 01 array outpaces the RAID 0 array of four HDDs at queue=2.
To compare the performance of different types of RAID arrays and analyze the influence of the WB-caching, we will again apply our rating system. The rating for WorkStation pattern is calculated according to the following formula:
Performance = Total I/O (queue=1)/1 + Total I/O (queue=2)/2 + Total I/O (queue=4)/4 + Total I/O (queue=8)/8 + Total I/O (queue=16)/16 + Total I/O (queue=32)/32
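The formula weights each result by the inverse of its queue depth, so short queues (typical of workstation use) dominate the rating. A minimal sketch of the calculation, using invented input values:

```python
def workstation_rating(total_io_by_queue: dict) -> float:
    """Sum of Total I/O at each queue depth divided by that queue depth."""
    return sum(total_io / queue for queue, total_io in total_io_by_queue.items())


# Hypothetical Total I/O values for queue depths 1 to 32, not measured data.
sample = {1: 100.0, 2: 200.0, 4: 400.0, 8: 800.0, 16: 1600.0, 32: 3200.0}
print(workstation_rating(sample))
```

In this made-up example every term contributes 100, which shows how much more a given gain at queue=1 counts than the same gain at queue=32.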
The large share of writes made RAID 1 lag behind RAID 0 of two drives, while the RAID 01 array managed to retain the second position even though the 3-HDD RAID 0 was close behind it. As we see, enabled WB-caching results in a 6-10% performance increase.
In conclusion we will run a few tests from WinBench99 package, which serves as a tool for measuring the HDD performance in desktop applications.
We will compare the performance of different RAID arrays in two integral tests: Business Disk WinMark and High-End Disk WinMark:
As we have expected, the first three positions are occupied by RAID 0 arrays of four, three and two hard disk drives. RAID 01 array is right behind the two-drive RAID 0. I would like to draw your attention to the competition between RAID 1 array and a single hard disk drive: in case of WT-caching a single HDD is faster than RAID 1, while in case of WB-caching RAID 1 manages to get ahead.
RAID 01 array is again behind all RAID 0 arrays, while RAID 1 fell behind a single HDD.
Since the linear read speeds are the same for NTFS and FAT32 file systems, we will consider only one diagram.
As usual, here are the linear read graphs (in fact, the graphs are very similar for both caching systems, that is why we would like to offer you only one set, taken in WT mode):
The linear read graphs for RAID 1 and RAID 01 show that the controller does its best to optimize the reading from mirrored pairs, though it hardly succeeds, unfortunately… :(
In order to test the controller’s ability to ensure proper data security if one of the array drives fails, we modeled this emergency situation for the HDDs of RAID 1 and RAID 01 arrays.
Just like in our Intel SRCS14L Four-Channel SerialATA RAID Controller Review we imitated the HDD failure by disconnecting the SATA power cable from the drive. The array status was monitored with the help of PAM (Promise Array Management program).
About a minute after the “failure”, PAM reported it and changed the array status from “Functional” to “Critical”. In order to avoid annoying the people around me (the program started emitting very irritating beeps), I restored the power supply immediately. After that the program started restoring the array. I suppose that this is not quite correct: what if I had reconnected the same failed drive?
Restoring the arrays under workload (when the controller not only checks the data integrity and restores the lost data, but also processes the users’ requests) usually takes a lot of time. I decided not to waste so much time. Therefore, I eliminated the additional controller workload (I terminated the running test) and measured the time it took the controller to restore the array completely.
It took the controller half an hour to restore RAID 1 array, while RAID 01 array needed a little over an hour.
According to the benchmark results, the Promise FastTRAK S150 TX4 controller is an excellent product. However, despite the overall great impression, we need to be objective to the end, so let’s recall the controller’s behavior in all the tests we ran.
DataBase: The controller demonstrated excellent performance scalability of RAID 0 arrays depending on the number of hard disk drives used, and excellent performance of RAID 1 and RAID 01. At the same time, under moderate workloads with pretty low write shares we observed a “stepping” effect on the graphs. Besides, the RAID 01 array could suddenly start losing its speed… All this means that in these modes some algorithms of the controller driver get activated and try to optimize all ongoing requests. All in all, these algorithms seem to be tuned for a different type of workload.
By the way, since we came to speak about the settings. It suddenly occurred to me: why don’t RAID controller makers make it easier for their own software developers by creating presets in the controller drivers just like we see in the drivers of some graphics cards? Why should they load the driver with a hard task of deciding what type of workload is currently involved, if it can be simply set in the driver?
Well, after I tried to imagine the settings pages of FC-test and WinBench99 I almost gave up the idea… But back to our Promise controller and its performance.
SequentialRead and SequentialWrite: The controller performed very fast only when working with RAID 0 arrays. Unfortunately, it slowed down as it came to RAID 1 and RAID 01.
FileServer and WebServer: The controller was beyond any competition. The brilliant speed of the arrays with data redundancy is worth mentioning separately, although we noticed some strange performance drops for the RAID 1 array working in the FileServer pattern with a queue of 16 requests.
WorkStation: The controller proved pretty fast here, although large share of writes didn’t allow the controller to show its very best in RAID 1 and RAID 01.
In WinBench tests the controller also proved highly scalable when working in RAID 0 arrays, while RAID 1 and RAID 01 didn’t demonstrate anything special.
All in all, the benchmark results indicate that this controller is a great solution for small servers, rather than high-performance workstations. The only problem we faced during this test session, namely low linear read speed in some testing modes, could have been the result of the controller’s incorrect functioning when installed into a PCI-X 133MHz slot.
We will continue keeping an eye on the BIOS and driver updates!