by Nikita Nikolaichev
09/13/2003 | 07:58 AM
The stir around the SerialATA interface reached its peak this summer: a lot of freshly released hard disk drives and controllers with SerialATA support occupied our minds (and testbeds) for a long time. However, I found a way out of this “vicious” circle (for a short while) and returned to reviewing the more popular ATA RAID controllers. Why do I say more popular? Because ATA drives are still much cheaper than SATA ones and are not much slower than the newcomers.
We will start our new tour of RAID controller investigations with the Promise FastTRAK TX4000. Judging by its features, this controller is positioned in between the dual-channel FastTRAK TX2000 and the powerful four-channel SX4000. Even though you can also connect four HDDs to the TX2000 controller, our experience shows that four-channel controllers are always faster than their dual-channel counterparts when working with RAID arrays of three or four hard disk drives (see our article called Four-Channel ATA RAID Controllers Comparison for more details). The SX4000 controller with its four channels belongs to a different weight category than the TX4000, because the SX4000 features an XOR processor onboard and hence can support hardware RAID 5 arrays.
The TX4000 controller should replace the very fast but, unfortunately, rather conflict-prone FastTRAK100 TX4 controller (see our Promise FastTrak100 TX4 IDE RAID Controller Review). The bridge layout used on that controller (two PDC20270 ATA chips connected via a PCI-PCI bridge from Intel) made it possible to avoid putting two HDDs on one cable, because each HDD could be connected to a separate ATA channel. However, users kept running into incompatibilities between their mainboards and the controller card. As far as I understand, the problems were caused by the Intel PCI-PCI bridge. To be fair, I should say that this bridge was developed by DEC engineers and Intel got it only after the merger. However, this is all in the past now…
So, Promise launched a four-channel single-chip ATA/133 RAID controller chip, the PDC20619, and a corresponding TX4000 controller card based on it. The set of supported RAID arrays is pretty standard: RAID 0, RAID 1, RAID 0+1, JBOD. Because the controller can work in a 66MHz PCI 2.3 slot, the maximum theoretical bandwidth available to it is 266MB/sec.
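For those who like to check the arithmetic, here is a minimal sketch of where a figure like 266MB/sec comes from. It assumes a 32-bit bus (an assumption inferred from the quoted number, not stated above) and one transfer per clock cycle; the function name is made up for illustration.

```python
def pci_peak_bandwidth_mb_s(clock_mhz: float, bus_width_bits: int) -> float:
    """Theoretical peak: one transfer of bus_width_bits on every clock cycle."""
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * bytes_per_transfer


# Nominal PCI clocks are 33.33 and 66.66 MHz; the 32-bit width is an assumption.
for clock in (33.33, 66.66):
    print(f"{clock} MHz x 32 bit: {pci_peak_bandwidth_mb_s(clock, 32):.1f} MB/sec")
```

This is a ceiling for the whole bus, not a speed the drives behind the controller are expected to reach.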
In the lower right corner of the controller PCB we see a BIOS chip and connectors for LEDs.
The stripe block size for arrays was set to 64KB. For WinBench tests, the arrays were formatted in FAT32 and NTFS as a single partition with the default cluster size. The benchmarks were run five times each; the average result was calculated for further analysis. The HDDs were not cooled down between the tests. When building RAID arrays we increased the number of drives by adding new drives to the IDE channels one by one.
For our testing we used the following benchmarks:
Our testbed was configured as follows:
RAID arrays were created of four Maxtor 6L020L0 hard disk drives (FW: A93.0500). The AAM and write check were disabled.
The controller was tested with the BIOS version 1.00.0.26 and with drivers version 1.00.0.21. We used a special Promise Array Management utility version 126.96.36.199 to control the array status and manage the caching driver work modes. Actually, a very funny thing happened to me when I first started working with this utility. After I installed the equipment, the utility recognized the controller model name correctly, but when I started using it to build RAID arrays, it offered me RAID 5 among the available array types! At first I was very happy, thinking: could it be that Promise had implemented software RAID 5 just like HighPoint did a while ago? However, when I clicked “Create array”, the PC sobbed quietly and rebooted. Well, at least it was worth a try…
The entire test session was repeated twice: with the help of PAM utility we changed the work mode of the caching driver (WriteBack / WriteThrough).
We will start with the hardest pattern of all: the DataBase.
Just in case, let me remind you that here we check the controller’s ability to work with a mixed stream of requests containing both reads and writes of 8KB data blocks at random addresses. By changing the reads-to-writes ratio, we can figure out how well the controller driver sorts out read and write requests.
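To make the idea of such a mixed stream more tangible, here is a rough sketch of how one could be generated. This is not how IOMeter does it internally; the `Request` type, the function name and the parameters are hypothetical and serve only as an illustration.

```python
import random
from dataclasses import dataclass

BLOCK_SIZE = 8 * 1024  # 8KB requests, as in the DataBase pattern


@dataclass
class Request:
    op: str      # "read" or "write"
    offset: int  # byte offset on the volume, aligned to BLOCK_SIZE
    size: int    # request size in bytes


def database_pattern(n_requests, write_share, volume_bytes, seed=0):
    """Build a stream of 8KB requests at random addresses.

    write_share is the fraction of writes: 0.0 gives pure random reads,
    1.0 gives pure random writes.
    """
    rng = random.Random(seed)
    n_blocks = volume_bytes // BLOCK_SIZE
    stream = []
    for _ in range(n_requests):
        op = "write" if rng.random() < write_share else "read"
        offset = rng.randrange(n_blocks) * BLOCK_SIZE
        stream.append(Request(op, offset, BLOCK_SIZE))
    return stream


# Example: 1000 requests with 30% writes on a hypothetical 1GB test volume.
stream = database_pattern(1000, 0.3, 1024 ** 3)
print(sum(r.op == "write" for r in stream), "writes out of", len(stream))
```

Sweeping `write_share` from 0.0 to 1.0 corresponds to moving from RandomRead to RandomWrite.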
As usual, see the table with results in WriteThrough mode:
Now come a few more illustrative graphs:
Note that under a linear workload (one outgoing request), the single HDD proved faster than the RAID 1 array, and the RAID 01 array was faster than the RAID 0 arrays of two and three hard drives only when the share of reads was pretty big.
As the workload on the controller increased, the situation became completely different. The mirror (RAID 1 array) is much faster than a single drive in those modes where reads dominate, and the RAID 01 array in its turn outperforms the RAID 0 array of two HDDs in almost all situations. In RandomRead mode (with no write requests) the RAID 01 array appeared faster than the RAID 0 array of four drives!
All in all, all the advantages of FastTRAK100 TX4 have been fully inherited by the FastTRAK TX4000.
Increasing the workload up to 256 outgoing requests changes the situation a little bit (making the results of RAID 0 arrays a bit better). However, the things are more or less clear anyway: data mirroring arrays turn out very efficient when the share of writes is pretty low.
And now comes another big table showing the controller performance in WriteBack mode:
Here are the graphs illustrating it, take a look:
Note that even when the workload is linear, RAID 1 array is a little faster than a single HDD. In other words, even in this mode the controller driver distributes the read requests between the two HDDs of the mirrored pair.
When the workload reaches 16 requests, we see that RAID 1 array is always a little faster than a single drive. Even in RandomWrite! RAID 01 array also looks not bad at all, though it yielded a little bit to a two-disk RAID 0 in RandomWrite.
With the workload of 256 requests, the picture doesn’t change that much: everything the controller could optimize, has already been done.
At first glance the performance of different array types during WriteThrough and WriteBack caching varied a little bit. But how big was this difference? Let’s try to compare the performance of all arrays in different caching modes.
Hey, this is easier said than done… Of course, we could make 18 diagrams with two graphs on each (6 types of RAID arrays under three types of workloads). However, I doubt that you would be very excited about it. Neither would I, to tell the truth. :)
After a little bit of thinking, it seemed to me I found a really elegant solution (I have always been very modest, you know :) ). I made a third table, where each cell contains the WB value divided by the WT value. In other words, the result of this division can be considered an efficiency coefficient for WB caching in each particular case. If the coefficient is smaller than 1 (marked in red), then WB caching doesn’t do any good here. If the coefficient is greater than 1 (marked in blue), then WB caching pushed the performance up, i.e. it is helpful.
If the coefficient is equal to 1.0, then WT and WB caching are equally efficient for this particular case.
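The calculation behind such a table can be sketched in a few lines. The numbers below are made-up placeholders, not our measurements, and the function name and dictionary keys are hypothetical; the only thing taken from the text is the rule "WB value divided by WT value".

```python
def wb_efficiency(wt_results, wb_results, digits=2):
    """Divide each WB result by the matching WT result and classify it."""
    table = {}
    for case, wt_value in wt_results.items():
        coefficient = round(wb_results[case] / wt_value, digits)
        if coefficient > 1.0:
            verdict = "blue: WB helps"
        elif coefficient < 1.0:
            verdict = "red: WB hurts"
        else:
            verdict = "equal"
        table[case] = (coefficient, verdict)
    return table


# Placeholder values only, keyed by (array type, workload).
wt = {("RAID 1", "queue=1"): 100.0,
      ("RAID 0 x2", "queue=256"): 200.0,
      ("JBOD", "queue=16"): 150.0}
wb = {("RAID 1", "queue=1"): 110.0,
      ("RAID 0 x2", "queue=256"): 166.0,
      ("JBOD", "queue=16"): 150.0}

for case, (coefficient, verdict) in wb_efficiency(wt, wb).items():
    print(case, coefficient, verdict)
```

Rounding before the comparison is a deliberate choice here: a cell is treated as "equal" when it would be printed as 1.0 in the table.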
This is a pretty curious thing, don’t you think so?
Of course, for a single HDD enabling WB-caching in the driver didn’t cause any performance boost. On the contrary, in RandomRead mode its performance even got a little bit lower.
However, two-drive RAID 0 arrays definitely benefited from enabled WB-caching. The maximum performance boost for this array reached 10%. At the same time, we can’t disregard the group of red cells on the right. The maximum drop caused by enabled WB-caching was 17%. Even though this performance reduction was detected only in a mode that is pretty hard to reach in practice (256 outgoing requests), it made a very unpleasant impression.
For RAID 0 arrays of three or four hard disk drives, enabling WB-caching doesn’t matter much. The maximum benefit they get from it is a 2% performance growth. We got the impression that in these modes the driver optimizations do not work at all, and the performance difference should be explained solely by measurement error.
The influence of WB-caching on the RAID 01 array performance appeared twofold. On the one hand, we see about a 5% performance reduction under small workloads, while on the other hand WB-caching provides about a 5-6% performance increase as the workload grows.
The effect of WB-caching on RAID 1 array performance was simply amazing. Even under a linear workload with a large share of writes, the performance rose by well over 20%. The maximum bonus we managed to observe as a result of WB-caching equaled 36%!!!
However, even in this tropical heaven we see a few reds. Why did they turn up here? Note that they gathered in the left part of the table, i.e. within the “responsibility” of write requests. However, none of the red numbers can be seen in the RandomRead field. So, the array slows down when WB-caching is enabled in mixed modes with mostly read requests.
As we know from the previous reviews of Promise controllers, enabling WB-caching slightly increases the controller response time during request processing. With write requests the reasons for this delay are evident: the controller driver spends some of the valuable processor time on caching strategy planning (it tries to find a request among the deferred ones that could be “combined” with the ongoing requests). However, it is still a mystery why the controller slowed down when working with read requests…
Today however, we do not see any slowdown in RandomRead mode, but as soon as write requests pop up the controller loses speed. And it manages to regain speed only when the writes share increases and the WB-caching starts working, or when the requests queue depth gets bigger, i.e. when the controller driver has a rich choice in front of it. Does it sound logical to you?
Well, now that we have analyzed the results of this pattern, we see that enabled WB-caching affects only a few combinations of the array type and the number of HDDs in it. Of course, all caching algorithms are developed to work with a pair of drives in RAID 0 or RAID 1. These algorithms also worked in the RAID 01 array, because the structure of this array type implies the mirroring of a stripe-group pair. One thing is absolutely clear though: WB-caching doesn’t seem to be of any value to RAID 01. Therefore, I would suggest not enabling it, just in case :)
The efficiency of WB-caching for RAID 0 and RAID 1 arrays depends a lot on the request type. For instance, RAID 0 array with enabled WB-caching works perfectly well when the reads share is big enough, but doesn’t feel quite at home with a lot of write requests and heavy workload. The situation with RAID 1 array is just the opposite.
Well, let’s see how well the controller will cope with sequential reading/writing. Of course, we are also very curious to find out if the caching algorithms (WB/WT) affect the read/write speed in this case.
IOMeter sends a stream of reads and writes to the array with a queue depth of 4. Once per minute the data block size changes, so after the test is complete we can see how the linear read or write speed depends on the data block size. The results obtained (the data transfer rate provided by the controller as a function of the data block size) are summed up in the table below:
If we compare the controller performance in WB and WT modes, we will see that there is hardly any difference! Now come the graphs for read speed in WB mode:
Well, the way performance scales with the number of drives in an array is very typical of this controller, but it can achieve the maximum read speed only when the request is super-big. Besides, the four-HDD array didn’t reach 160MB/sec. A little later, though, we will see that the problem lies with the HDDs and not with the controller.
Let’s compare the read speed from RAID 1 array with the read speed from the JBOD drive, and the read speed from RAID 01 array with the read speed from RAID 0 array of two HDDs. As we remember, some manufacturers alternate read requests between the disk drives of the mirrored pair. This way RAID 1 array during reads becomes very similar to RAID 0 array, and thus its performance can (theoretically) double!
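The trick itself is easy to picture. Below is a minimal sketch of round-robin read distribution across a mirrored pair; it illustrates the general idea only and is not the algorithm of any particular manufacturer's driver. The function name is hypothetical.

```python
from itertools import cycle


def alternate_reads(block_numbers, n_mirrors=2):
    """Assign consecutive read requests to the mirror members in turn."""
    queues = [[] for _ in range(n_mirrors)]
    for disk, block in zip(cycle(range(n_mirrors)), block_numbers):
        queues[disk].append(block)
    return queues


# Eight sequential blocks split across a two-disk mirror:
print(alternate_reads(range(8)))  # [[0, 2, 4, 6], [1, 3, 5, 7]]
```

Each drive serves only half of the requests, which is where the theoretical doubling comes from; whether a real drive gains anything from skipping every other block is a separate question.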
However, as soon as we cast a glance at the graphs, it becomes evident that the Promise controller doesn’t resort to any of these tricks. The read speed from the RAID 1 array is always lower than from a single hard disk drive, and the RAID 01 array is a little slower than the RAID 0 array made of two hard drives. Besides, the speed graphs for RAID 1 and RAID 01 arrays show certain drops when the drives work with 64-128KB and 64-256KB data blocks.
Now let’s turn to SequentialWrite:
It is really exciting that the write speed of the array with various caching settings also didn’t differ that much. Although as we have seen in our Dual-Channel SerialATA RAID Controllers Roundup the differences should be pretty significant especially for a Promise controller.
Just look, what a nice picture! I have even shed a few tears: the controller showed ideal scalability. Of course, Maxtor 6L020L0 hard disk drives should be acting as “good boys” during writes, so that the controller is capable of squeezing everything possible out of them.
Now let’s compare the performance of RAID 1 array and JBOD, and RAID 01 array and RAID 0 array of two drives.
As you see, RAID 1 array is just as fast as a single HDD during writes, while RAID 01 array yields just a little bit to RAID 0 of two drives when processing 8KB and 16KB blocks.
The tests in SequentialRead and SequentialWrite patterns showed that WB-caching algorithms implemented in TX4000 controller differ a lot from those we discussed in our Dual-Channel SerialATA RAID Controllers Roundup.
Now comes the WorkStation pattern. It should imitate intensive work in different applications in NTFS5 file system.
In order to compare the performance of different RAID array types and to try evaluating the efficiency of WB-caching, we will make up a diagram with the array performance ratings. These ratings are calculated according to the following formula:
Performance = Total I/O (queue=1)/1 + Total I/O (queue=2)/2 + Total I/O (queue=4)/4 + Total I/O (queue=8)/8 + Total I/O (queue=16)/16 + Total I/O (queue=32)/32
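In code, the formula above boils down to a weighted sum. This is a minimal sketch; the function name is hypothetical and the input numbers are placeholders rather than measured results.

```python
def workstation_rating(total_io_by_queue):
    """Sum Total I/O divided by queue depth, so shallow queues weigh the most."""
    return sum(total_io / depth for depth, total_io in total_io_by_queue.items())


# Placeholder Total I/O values for queue depths 1 to 32.
sample = {1: 100, 2: 200, 4: 400, 8: 800, 16: 1600, 32: 3200}
print(workstation_rating(sample))  # 600.0
```

The 1/queue weighting means that the result at a queue depth of 1 counts 32 times more than the result at a queue depth of 32.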
As you see, RAID 0 arrays of three and four hard drives again proved to be the fastest, which actually correlates with WinBench results. However, unlike in WinBench, the RAID 01 array appeared to be far ahead of the RAID 0 array of two drives and is almost pushing at the back of the RAID 0 array of three drives! Besides, the RAID 1 array also managed to get somewhat ahead of the single HDD, although unfortunately, it didn’t defeat the dual-drive RAID 0.
Note that WB-caching appeared quite beneficial for almost all array types (because the WorkStation pattern contains a big share of write requests).
The last patterns we are going to take a look at imitate the work of the disk subsystem in a file- or web-server.
Again we will use the performance rating to compare the results shown by different types of RAID arrays. However, for server patterns we will simply average the Total I/O values obtained for all types of workload (considering all workloads to be equally probable).
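For comparison with the WorkStation formula, the server rating is a plain arithmetic mean. A minimal sketch with a hypothetical function name and placeholder numbers:

```python
from statistics import mean


def server_rating(total_io_by_queue):
    """Average Total I/O over all workloads, treating each as equally probable."""
    return mean(total_io_by_queue.values())


# Placeholder Total I/O values for three queue depths.
print(server_rating({1: 100, 16: 200, 256: 300}))  # 200
```

Here deep queues count just as much as shallow ones, which suits a server that is expected to be busy most of the time.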
Of course, the mirroring arrays, RAID 1 and RAID 01, feel at home under this workload. Just look at the RAID 01 array! It simply ripped into pieces the RAID 0 of three HDDs! RAID 0 of four drives managed to retain the leading positions only because FileServer pattern has some write requests, which RAID 0 array performs much faster than RAID 01. RAID 1 laid itself out but finally managed to outperform RAID 0 of two drives.
Now let’s check how our tested arrays will behave in WebServer pattern, which has absolutely no writes (the server is supposed to have no dynamic content and simply delivers the requested files).
The absence of writes among the processed requests exerts great influence on the overall situation. As a result, RAID 01 array takes the leading position in the race, and RAID 1 array catches up with RAID 0 of three drives.
Summing up the results of the DataBase and WebServer patterns, we can conclude that in case of random reads the Promise TX4000 controller splits the requests evenly between the HDDs of the mirrored pair.
Now let’s pass over to less synthetic benchmarks, namely to the WinBench99 package. This test has served perfectly well for HDD performance evaluation in desktop applications. Of course, the file sizes as well as the applications themselves have changed a lot over the past few years, but this is the only test of the kind available today, so we can’t just give it up.
Ok, let’s not dwell on the table for long and go directly to performance comparison in two integral tests: Business Disk WinMark and High-End Disk WinMark:
Well, there is not much to talk about, to tell the truth… As we expected, the first two prizes were won by RAID 0 arrays of three and four HDDs respectively. The third prize was split between the RAID 01 array, which proved faster with enabled WT-caching, and the RAID 0 array of two drives, which performed more efficiently with enabled WB-caching. We should also mention the competition between the RAID 1 array and a single HDD: with WT-caching a single drive is faster than the RAID 1 array, and as soon as we shift to WB-caching the latter takes the lead.
If you remember, we have already seen something similar when we talked about the DataBase pattern in Intel IOMeter. See the corresponding paragraphs above.
In FAT32 everything appeared much simpler than in NTFS: RAID 1 array is always faster than a single HDD, and RAID 01 array is always slower than RAID 0 of two drives.
Promise FastTRAK TX4000 controller made a very good impression.
First, even with the first version of the BIOS and drivers it showed very high performance. As for weak spots, it is pretty difficult to find any right away.
Second, the first version of the controller BIOS and drivers seemed to be free of bugs, so that we didn’t face any problems with the testing session (except for a funny incident with the PAM, described in the beginning).
Today’s hero has an indisputable advantage over the FastTRAK100 TX4: it features a simpler design and as a result costs less. Hopefully, the TX4000 won’t have too many compatibility issues. However, the fact that the PDC20619 chip is currently integrated by many mainboard manufacturers, including Intel, means quite a lot to us…
P.S.: Our next story will be about the HighPoint RocketRAID 404 controller, which now supports RAID 5.