Promise FastTrak TX4300 Serial ATA RAID Controller Review

It’s been a while since we offered you controller card reviews, but today we are going to correct this mistake. We would like to introduce to you a new four-channel Serial ATA RAID controller from the Promise FastTrak series – the TX4300. Are you building a fast and reliable storage subsystem? Check out the review, it might be something you need.

by Alexander Yuriev , Nikita Nikolaichev
10/20/2005 | 04:09 PM

As you may have already guessed from the title of today’s article, we are going to discuss the new four-channel controller from Promise: FastTrak TX4300. The new model differs from its predecessor, the Promise FastTrak TX4200, in its faster data transfer rate between the controller and the HDD: 3Gbit/s (against 1.5Gbit/s for the FastTrak TX4200; for details see our article called Promise FastTrak TX4200 4-Port Serial ATA RAID Controller Review).

The controller is shipped in Promise’s brand name box:


But the controller card itself turned out to be very small, nearly half the size of the FastTrak TX4200:

Unfortunately, the HDD connectors do not face the same direction, which can make it difficult to route all the cabling inside low-profile (1U) servers. Above the chip there are connectors for hard disk drive activity indicators (individual as well as combined). There is also a connector for the information cable of the SuperSwap 4100 chassis (this cable serves to monitor the rotation speed of the fan inside this chassis and the HDD temperatures).

The controller is bundled with everything necessary: four SATA cables, a power supply converter (which is needed less and less as more power supply units ship with the corresponding connectors), a bracket used to install the controller into a low-profile system case, a user’s guide and a CD with the drivers.

The controller complies with the Serial ATA II Extensions to Serial ATA 1.0a specification and can support up to 4 Serial ATA hard disk drives (supporting up to 3Gbit/s data transfer rates).

Besides the Native Command Queuing Technology (NCQ), it also supports Tagged Command Queuing (TCQ).

The supported RAID array types are pretty standard: RAID 0, 1, 10 and JBOD. Since the controller can work in the 32bit/66MHz PCI 2.2 slot, the maximum possible theoretical bandwidth this controller provides is 266MB/s.
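The 266MB/s figure follows directly from the bus width and clock. Here is a minimal sketch of that arithmetic; the function name is ours, and we assume the nominal 66MHz PCI clock of 66.67MHz (200/3) and decimal megabytes:

```python
def pci_peak_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak theoretical throughput of a parallel bus moving one word per clock.

    Returns decimal megabytes per second (bytes per transfer * MHz).
    Real-world throughput is lower because of bus protocol overhead.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * clock_mhz


# 32-bit slot at the nominal 66.67MHz clock (assumption: 200/3 MHz)
pci_32_66 = pci_peak_bandwidth_mb_s(32, 200 / 3)
print(f"32bit/66MHz PCI: {pci_32_66:.1f} MB/s")
```

Keep in mind that this is the ceiling for all four drives together, since they share the same bus.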


Testbed and Methods

Testbed configuration:

We used the following benchmarking software:

We used FileServer and WebServer patterns in our Intel IOMeter tests.

These patterns are intended for measuring the performance of the disk subsystem under workload typical of file- and web- servers.

We also use the WorkStation pattern, created by Sergey Romanov (a.k.a. GReY). It is based on the statistical data about the disk subsystem workload as given in the StorageReview Testbed 3 description . The statistical data for the NTFS5 file system are gathered in three operational modes: Office, Hi-End and Boot-up.

This pattern shows how well the controller performs in a typical Windows environment.

Lastly, we checked out the controller’s ability to process sequential read/write requests of variable size and its performance in the DataBase pattern, which loads the disk subsystem with SQL-like requests.

Our controller was tested with the firmware version 2.00.0.31 and driver version 1.00.0.36. The controller was installed into the PCI-X/133MHz slot (although it only supports PCI64 66MHz). WD740GD (Raptor) hard disk drives were installed into the rails of the SC5200 system case and fastened at the bottom with four screws.


Performance in Intel IOMeter DataBase Pattern

As usual we will start with the results obtained during mixed requests processing. In this pattern we will check how well the controller can handle read and write requests for 8KB data blocks with random addresses. By changing the read-to-write ratio we will be able to see how efficient the controller driver is at sorting the requests.

The table with the overall performance results comes first:

Now take a look at the graphs showing the dependence of the data transfer rates on the relative density of writes. Each graph corresponds to a different requests queue depth: 1, 16 and 256 requests:

Under the linear workload in RandomRead mode all arrays perform similarly. As the share of write requests increases, the HDDs can perform lazy writing more efficiently so the overall array performance improves. The maximum results for all RAID arrays are obtained in RandomWrite mode as we have expected.

The RAID 1 mirror array graph is almost the same as the single HDD graph when the share of writes is small. As the number of writes grows, the RAID 1 array slows down and falls behind the single HDD. The second mirrored array, RAID 10, is slower than the 2-HDD RAID 0 array in all modes.

As the workload grows higher, array performance starts depending more on the array type and the number of hard disk drives it is built of, regardless of the share of reads and writes among the processed requests.

Note that all mirrored arrays are faster than RAID 0 arrays of the same number of hard disk drives in RandomRead mode, but slower in RandomWrite mode.

The controller can manipulate the read requests and send each of them to the most ready/convenient/well-rested hard disk drive of the mirrored pair, so as to achieve the highest performance and efficiency. However, when it comes to writing, the situation is just the opposite. To ensure that the data is identical on both hard drives of the mirrored pair, the controller has to receive a positive write confirmation from each of the drives, which causes additional delays (ideally, it would also have to perform a control read, though).
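The asymmetry between reads and writes on a mirror can be illustrated with a deliberately simplified model. This is only a sketch under our own assumptions (it is not how the Promise driver is actually implemented): a read completes as soon as the quicker drive answers, while a write completes only when the slower drive has confirmed it. The function names and the service times are hypothetical.

```python
def mirror_read_time(drive_times_ms):
    """Read: the controller may pick whichever drive can answer soonest."""
    return min(drive_times_ms)


def mirror_write_time(drive_times_ms):
    """Write: the controller waits for confirmation from every drive."""
    return max(drive_times_ms)


# Hypothetical per-drive service times for one request, in milliseconds
times = [4.0, 7.5]
print("read :", mirror_read_time(times), "ms")
print("write:", mirror_write_time(times), "ms")
```

In this model a mirror can never write faster than its slowest drive, which agrees with the RandomWrite results discussed above.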

As the queue depth increases to 256 requests, the situation doesn’t really change. The only exception is the modes dominated by read requests, where the performance grows significantly thanks to TCQ technology.


Performance in Intel IOMeter Sequential Read and Write Patterns

Now let’s pass over to sequential reading and writing. The array receives a stream of read/write requests with a request queue depth of 4. Every minute the size of the data block changes, so we can see the dependence of the linear read/write speed on the size of the data block by the end of the test session. The sequential read results (the dependence of the controller data transfer rate on the data block size) are listed in the following table:

Let’s take a look at the graphs showing the dependence of the array performance on the data block size:

The advantages of RAID 0 arrays made of a lot of hard disk drives start showing only for the large data blocks, i.e. when the controller can split large chunks of data into a few smaller ones and use HDDs in parallel. RAID 0 arrays look pretty good: they all reach their maximum performance with 64KB data blocks already (and the 2-HDD array shows its maximum speed with a 16MB data block). We can clearly see that the read speed of the array depends on the number of HDDs involved, and the 4-HDD array performs close to the maximum bus bandwidth (266MB/s).
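Why small blocks cannot benefit from several drives is easy to see if we model how a striped array maps a request onto its disks. The sketch below is a generic RAID 0 address mapping, not the controller's actual firmware logic; the function name and the 64KB stripe size are our assumptions (the stripe size used in the tests is not stated here).

```python
def split_request(offset, length, stripe_size, n_disks):
    """Split one logical request into per-disk chunks for a RAID 0 array.

    Returns a list of (disk_index, offset_on_disk, chunk_length) tuples.
    """
    chunks = []
    end = offset + length
    while offset < end:
        stripe_no = offset // stripe_size       # index of the stripe unit
        within = offset % stripe_size           # position inside that unit
        disk = stripe_no % n_disks              # units rotate across the disks
        disk_offset = (stripe_no // n_disks) * stripe_size + within
        chunk_len = min(stripe_size - within, end - offset)
        chunks.append((disk, disk_offset, chunk_len))
        offset += chunk_len
    return chunks


STRIPE = 64 * 1024  # assumed stripe size, for illustration only

for size_kb in (4, 64, 256):
    parts = split_request(0, size_kb * 1024, STRIPE, n_disks=4)
    disks = sorted({disk for disk, _, _ in parts})
    print(f"{size_kb}KB request touches disks {disks}")
```

A request smaller than one stripe unit lands on a single drive, so only queued requests or large blocks can keep all the drives of the array busy at once.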

The RAID 1 and RAID 10 mirrored arrays turned out less attractive here. In most test modes they are slower than a single HDD and a RAID 0 array of two HDDs respectively.


Now let’s take a look at the performance of the Promise FastTrak TX4300 controller during sequential writing. The data transfer rates shown by the controller in this test for different data block sizes are given in the table below:

Now let’s look at the dependence graphs:

Only the single hard disk drive and RAID 0 of two HDDs performed impeccably here. The graphs for RAID 0 arrays of 3 and 4 HDDs coincide with that for RAID 0 of 2 HDDs, and the mirrored RAID 1 and RAID 10 arrays fall behind a single hard drive in all test modes.

To tell the truth, this is a pretty unusual picture. It looks more like the result of certain optimizations for write requests processing introduced in the controller driver. In our purely synthetic pattern the optimizations evidently provided a negative result.


Performance in Intel IOMeter FileServer and WebServer Patterns

At first let’s check the controller performance in the file-server storage subsystem:

Let’s build a graph showing the dependence of the arrays speeds on the request queue depth:

Since the share of writes is only 20%, all arrays demonstrate pretty good results. The performance of RAID 0 arrays proves very scalable depending on the number of HDDs involved. The mirrored RAID 1 and RAID 10 arrays are notably faster than a single HDD and a RAID 0 array of 2 HDDs respectively.

For a better comparison of different RAID arrays performance we will use our traditional rating system. Provided all workloads are considered equally probable, we will calculate the general performance rating index as the average performance during requests processing under all types of workload:
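The averaging described above can be sketched in a few lines. The function name and the numbers below are purely illustrative placeholders, not our measurements:

```python
def average_rating(io_per_sec_by_queue_depth):
    """Rating index: plain mean of the results over all workloads,
    treating every queue depth as equally probable."""
    values = list(io_per_sec_by_queue_depth.values())
    return sum(values) / len(values)


# Hypothetical results (operations per second) keyed by queue depth
example = {1: 100.0, 4: 200.0, 16: 300.0}
print(average_rating(example))
```

Since every workload gets the same weight, the arrays that do well under heavy load (deep queues) influence this index as much as the lightly loaded ones.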

RAID 0 arrays made of 3 and 4 HDDs take the first two lines of our ranking system, and RAID 1 and RAID 10 appeared far ahead of the single hard disk drive and a RAID 0 array of 2 drives, correspondingly.


Now let’s take a look at how efficient our controller can be in a web-server pattern:

This is a truly dramatic sight. The RAID 0 of 4 drives and RAID 10 array perform quite expectedly. The single hard disk drive and a RAID 0 array of 2 HDDs are much slower than they should be, while the performance of RAID 1 is almost normal.

The first thing that occurs to you is probably the fact that there has been some mistake made during the testing process. Let me deny that supposition completely right away. Our Intel IOMeter test script implies that all the subtests are performed one after another and the system is rebooted between the test sessions only. So, the WebServer pattern is performed within the same session with the FileServer pattern (first we run FileServer and then WebServer, without rebooting the system in-between).

So, if the controller goes through the FileServer pattern with excellent results and then stumbles over the WebServer pattern, you shouldn’t blame the testers: the system and RAID settings do not get changed between the two pattern tests. All settings are adjusted BEFORE the script starts running. And the complete testing cycle for the script with certain settings is about 14 hours non-stop (not taking into account the short breaks needed to reboot the system).

So, the reason for these abnormal performance results should probably lie in the controller drivers.

In our rating system built following the same rules as we have just described during our FileServer discussion the situation looks as follows:

RAID 0 array of 4 HDDs and RAID 10 array look very close to what we have just seen in FileServer pattern. The results demonstrated by other arrays are pretty confusing, I should say.


Performance in Intel IOMeter WorkStation Pattern

Now let’s pass over to WorkStation pattern, which imitates active user work in various applications in NTFS5 file system:

As always here follows the graph showing the dependence of arrays performance on the requests queue depth:

The situation is pretty typical of RAID 0 arrays: the more HDDs are used to build the array, the faster it processes the requests. As for RAID 1 and RAID 10 arrays, they start off with the speeds close to those of RAID 0 arrays made of 2 and 4 HDDs respectively, but as the workload increases, they perform more like a single HDD and a RAID 0 array of 2 HDDs.

Let’s compare the performance of RAID arrays of different types. The ratings for the WorkStation pattern will be calculated according to the formula below:

Performance Rating = Total I/O (queue=1)/1 + Total I/O (queue=2)/2 + Total I/O (queue=4)/4 + Total I/O (queue=8)/8 + Total I/O (queue=16)/16 + Total I/O (queue=32)/32
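Here is the same formula as a short sketch. The function name is ours, and the sample numbers are placeholders rather than measured results:

```python
QUEUE_DEPTHS = (1, 2, 4, 8, 16, 32)


def workstation_rating(total_io):
    """Sum of Total I/O at each queue depth divided by that queue depth,
    so results at short queues weigh the most."""
    return sum(total_io[q] / q for q in QUEUE_DEPTHS)


# Hypothetical Total I/O values keyed by queue depth
example = {1: 120.0, 2: 150.0, 4: 180.0, 8: 200.0, 16: 210.0, 32: 215.0}
print(round(workstation_rating(example), 2))
```

The weights 1, 1/2, 1/4 and so on reflect that a workstation spends most of its time with short request queues, so the result at queue=1 counts 32 times as much as the result at queue=32.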

RAID 0 arrays ranked themselves depending on the number of hard drives used. The performance of RAID 10 array appeared higher than that of RAID 0 array of 2 HDDs. A similar situation occurs between RAID 1 array and a single hard drive.


Performance during Multi-Threaded Writing / Reading

In this pattern we test the controller’s ability to perform multi-threaded sequential writing/reading. Since it is the first time we are using this type of workload to test controller cards and hard disk drives, it makes sense to tell you a little bit more about the major testing principles and methodology.

So, the task is to emulate simultaneous workload on the storage subsystem imposed by a few applications that have requested “large files”. There will be a special test agent of the IOMeter program that will emulate the above mentioned applications (in Intel IOMeter terms it is called Worker ). This agent will read/write a sequence of 64KB blocks of data starting with some initial segment. Let’s call this process thread .

We are increasing the number of outgoing requests from the Worker (from 1 to 8 with the increment equal to 1). This allows us to study how well the controller can reorganize requests (i.e. combine several requests for sequential data into a single request).

By increasing the number of Workers, we make it harder for the storage subsystem, because in real-life situation several simultaneously working programs will compete with one another for the priority access to the hard disk drives. Each Worker processes its own data (i.e. the addresses of the requested data blocks do not coincide by different Workers).
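To make the workload easier to picture, here is a rough sketch of the addresses such Workers could generate. It is only an illustration under our own assumptions: the test area is divided evenly between the Workers, and their requests arrive in strict round-robin order (in reality the order depends on scheduling). The function names are hypothetical and are not part of IOMeter.

```python
BLOCK = 64 * 1024  # each request is one 64KB block


def worker_offsets(worker_id, n_workers, region_bytes, n_requests):
    """Sequential 64KB offsets for one Worker inside its own private zone."""
    zone = region_bytes // n_workers          # assumed even split of the area
    if n_requests * BLOCK > zone:
        raise ValueError("requests would spill into another Worker's zone")
    start = worker_id * zone
    return [start + i * BLOCK for i in range(n_requests)]


def interleaved_stream(n_workers, region_bytes, n_requests):
    """Offsets in the order the array might see them (round-robin assumption)."""
    per_worker = [worker_offsets(w, n_workers, region_bytes, n_requests)
                  for w in range(n_workers)]
    return [per_worker[w][i]
            for i in range(n_requests)
            for w in range(n_workers)]


# Two Workers on a 1GB test area, three requests each
for offset in interleaved_stream(2, 1 << 30, 3):
    print(offset // BLOCK)
```

Each Worker's own stream is strictly sequential, but once the streams are interleaved the drive heads have to jump between distant zones.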

We will consider two cases: when all programs read the data and all programs write the data. Of course, there can be a lot of different workload combinations, but so far we decided to take into consideration the following ones:


Let’s take a look at the performance difference between various arrays when the workload equals 1 outgoing request (the situation closest to reality):


To make the diagrams easier to read we decided to mark all RAID 0 arrays with the same color. Despite this fact you can easily distinguish between the RAID 0 arrays with different numbers of HDDs: the arrays made of more drives are closer to the top of the diagram (come first).

Of course, with a workload of only 1 request we didn’t see the read speed of a single thread (application) scale with the number of hard drives in the array. A single HDD and the RAID 1 array performed equally fast, and the RAID 10 array outperformed even RAID 0 of 4 HDDs… And it happened not because RAID 10 read successfully from all disks of the array at a time, but because the RAID 0 array failed to perform better. It looks like neither the controller nor the hard drives support an aggressive read-ahead feature.

However, when we moved to two simultaneous threads, there was a surprise waiting for us: RAID 1 sky-rocketed!

Two threads got split ideally between the two drives of the array and the read speed almost doubled. Of course, in case of two threads the HDD heads have to keep moving all the time between the two working zones, so you’d better forget about high linear read speed from the array in this case. But what about the read ahead function, which serves for these particular situations? Unfortunately, the hard disk drives we used in our test session are optimized for server needs, i.e. the main objective is to ensure short access time. So, the bigger chunks of data the HDD is going to “swallow” upon each read request, the more time it will require to process random requests.

If we turn to the results tables above we will see that if the workload exceeds 1 outgoing request, the arrays start performing much faster: it is either the TCQ of the hard drives or the requests optimization of the controller. The requests sent to the drives start looking more like linear reading.

In case of three and four simultaneous threads we see very clear scalability of the RAID 0 array speed from the number of HDDs in the array.

Let’s see what we’ve got during sequential writing:

And when we write data, the worst results are demonstrated by the mirrored arrays: in almost all test modes these arrays appeared slower than a single HDD as well as a RAID 0 array of 2 drives.

However, RAID 0 arrays coped with the single and two threads processing very well. No wonder, since the hard disk drives we used feature very big cache buffer, which is used up almost entirely for lazy writing. In case of three and four simultaneous threads RAID 0 array performance again proves scalable depending on the number of hard drives in the array.

So, the first multi-threaded testing experiments with RAID controllers gave us a lot of food for thought. We hope you also found a lot of useful information here.


Performance in FC-Test

And now let’s take a look at the performance of Promise FastTrak TX4300 in a “single-user” mode.

We stick to our traditional methodology of using FC-Test: we create two logical volumes, 32GB each, on the array and format them in NTFS and then in FAT32. We create a set of files on the first volume, and then this pattern is read from the array, then copied into a folder on the same partition (copy-near – inside one and the same logical volume), and finally copied onto another partition (copy-far).

The test system is rebooted between the tests to avoid the influence of the OS’s caching on the results. We use five file patterns here:

NTFS File System

Let’s start with NTFS. We’re going to examine the results of each test action for each pattern independently due to the abundance of data we have received. The first action is the creation of a set of files on the array.

We will take a look at the diagrams composed for the three most interesting patterns:

As we have expected, RAID 0 arrays show the best results here. We see that the obtained results are scalable depending on the number of HDDs in the array.

The write speed onto RAID 1 array is lower than that onto a single HDD, which we have seen in the previous tests. RAID 10 also didn’t strike us with any outstanding performance.

Let’s check out the read speed:

File reading was a true disappointment: RAID 0 arrays made of many hard drives didn’t live up to our expectations. The RAID 0 array of two drives performed best of all here.


Now we are passing over to copy speed.

Copying is actually the work of two simultaneous threads: one is reading and the other is writing. And RAID 0 arrays turned out to be best suited for this job. Again, scalability is evident.

The second batch of copy tests doesn’t bring us any surprises. RAID 0 arrays are ahead, mirrored arrays are slightly behind a single HDD and a RAID 0 array of 2.

Now let’s enlarge our cluster! :)


FAT32 File System

Is it my imagination or has the performance truly increased? Looks like things are getting better and better :)

But during reading we again seem to be hitting against the glass wall :(

Almost all arrays have frozen before some unattainable bar. Is it the driver issue or some hardware limitation?


File copying again was a success.

Well, RAID array tests in FC-Test benchmarking suite turned out to be pretty informative. Now all we have to do is test some other controller card and compare the results. Or wait for the next driver revision.

Performance in WinBench99

This time we decided to refrain from running all the WinBench tests. We only took the linear read speed graphs for your reference:


Conclusion

Well, according to the benchmark results, Promise FastTrak TX4300 controller proved a pretty good solution. RAID 0 arrays demonstrate very good scalability of the array performance on the number of HDDs in an array.

The controller performed very stably in some tests, however, we would really like to run the tests again with a new version of the driver before making any final conclusions.

Speaking about the innovations introduced in the new controller, we have to admit that FastTrak TX4300 is not dramatically different from FastTrak TX4200, so if you are a happy owner of a TX4200, you do not need to rush to upgrade to the TX4300. And if you are considering getting yourself something to enhance and speed up your storage subsystem, then Promise FastTrak TX4300 is a good choice at this time.

Some of you believe that contemporary mainboards offer a sufficient number of SATA ports and allow you to build RAID arrays from the connected HDDs. However, since computer hardware changes so rapidly, a chipset implementation of a RAID array may not allow you to move all the software from the old system to the new one that easily. In this case an external RAID controller with the forever-young PCI interface may be the right choice. Unless you enjoy reinstalling Windows once a month, of course :)