by Alexander Yuriev, Nikita Nikolaichev
12/16/2004 | 01:43 PM
We are returning once again to the Raptor 2 drive from Western Digital and its relationship with Tagged Command Queuing (TCQ). The reason for stirring up the dying embers is simple: we have got hold of a Talon ZL4-150 controller from Pacific Digital Corp.
This manufacturer is notable for being the only company to produce ATA controllers with TCQ support. That made us think that a SATA controller from Pacific Digital would be an ideal match for the Raptor 2, whose ATA origins we have already discussed in great detail in our WD740GD aka Raptor 2 Hard Disk Drive Review.
As you can guess, the Talon ZL4-150 controller supports four Serial ATA-150 devices and can combine them into arrays of the following types: JBOD, RAID0, RAID1 and RAID10. A note: at the time of our tests the controller did not allow building a three-disk RAID0 array.
Talon ZL4-150 SATA Controller Features
RAID levels: JBOD, 0, 1 and 10
Ports: 4 Serial ATA-150
Host interface: 32-bit 33/66MHz PCI bus
Maximum array size: up to 1 petabyte
The controller comes with a user manual, drivers and utilities (on a CD and a diskette), and four SATA cables.
This is what the controller looks like:
The main chip is hidden under the heatsink in the center of the PCB, with a flash memory chip to the right of it. The controller has a low profile because the SATA connectors are “sunken” relative to the card’s top edge. Near the SATA connectors on the PCB you can make out four identical chips. We are not surprised to see Marvell 88i8030 converters here, as we have met them a few times already.
In other words, the Talon ZL4-150 is in fact a PATA RAID controller made to support SATA drives by means of PATA-SATA converters from Marvell. Well, this doesn’t scare us at all :).
As we have said before, the Marvell chip doesn’t prevent the drive and controller from working with tagged commands, so the use of a converter should not be considered a drawback. Let’s see the controller in action.
We used the following tests:
For WinBench 99 we created a single partition spanning the entire capacity of the array, then ran each WinBench test seven times and took the best result.
To compare the speeds of the arrays in IOMeter, we used File Server and Web Server patterns:
These patterns help to measure the disk subsystem performance under a load typical for file and web servers.
Our own Workstation pattern, created by Romanov Sergey aka GReY, is based on the disk access statistics given in the StorageReview Testbed 3 description. The statistics were gathered on the NTFS5 file system in the Office, Hi-End and Boot-up operating modes.
This pattern shows how well the controller performs in a typical Windows environment.
Lastly, we checked the controller’s ability to process sequential read/write requests of variable size, as well as its performance in the Database pattern, which bombards the disk subsystem with SQL-like requests.
Our controller had a version 3.08 BIOS and we used a version 126.96.36.199 driver. The controller was installed into a 133MHz PCI-X slot (although the controller itself only supports 66MHz PCI).
WD740GD (Raptor 2) hard disk drives were installed into the rails of the SC5200 system case and fastened at the bottom.
The TCQ support option is controlled through a special Windows-based utility, rather than through the controller’s BIOS. Unfortunately, the utility doesn’t allow setting each drive independently, but uses the same setup for all the attached devices. The second unpleasant thing that directly affected the number of our tests is that the utility selects “TCQ Enabled” after each system reboot, irrespective of what you’ve selected manually before. We will discuss the problem later on.
We only tested the controller with “TCQ Off” in one mode (a JBOD made of a single drive).
The controller has no memory of its own (not counting its 16 kilobytes of FIFO buffers), so it offers no caching-related settings. You also cannot control the deferred write mode of the attached hard disk drives. Judging by the results of the arrays, the controller does not disable the drives’ deferred writes.
We start by checking the controller’s ability to reorder commands, using a special pattern that consists of requests to read random-address sectors.
We steadily increase the load on the disk subsystem (the number of outstanding requests) from 1 to 256 in steps of 1. The presence or absence of TCQ is judged by how the performance (the speed at which the drive processes incoming requests) grows with the request queue depth.
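To illustrate why a deeper queue can help a drive that reorders commands, here is a minimal sketch in Python. It is a toy model, not the firmware logic of the WD740GD or of the Talon driver: it assumes the drive always serves the outstanding request whose address is nearest to the current head position, it ignores rotational latency, and the names (`mean_seek_distance`, `span`) and the address range are made up for the example.

```python
import random


def mean_seek_distance(queue_depth, n_requests=2000, span=1_000_000, seed=1):
    """Average head travel per request when the nearest of `queue_depth`
    outstanding random-address reads is always served first.

    Toy model (assumption): seek cost is proportional to address distance.
    """
    rng = random.Random(seed)
    pending = [rng.randrange(span) for _ in range(queue_depth)]
    head = 0
    travelled = 0
    for _ in range(n_requests):
        # Pick the outstanding request closest to the current head position.
        nxt = min(pending, key=lambda lba: abs(lba - head))
        pending.remove(nxt)
        travelled += abs(nxt - head)
        head = nxt
        # Keep the queue full, i.e. a constant number of outstanding requests.
        pending.append(rng.randrange(span))
    return travelled / n_requests


for depth in (1, 2, 4, 8, 16, 32):
    print(depth, round(mean_seek_distance(depth)))
```

With a queue depth of 1 the model has nothing to choose from and degenerates into in-order execution, so the average travel per request is largest there and shrinks as the queue deepens. That is the kind of growth with queue depth that this test looks for.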
We have already used this type of load to check the TCQ support of the WD740GD on the Promise SX8 controller, but back then we had an engineering sample of that controller and could not measure the gain precisely.
Besides the results of the Talon ZL4-150 controller, the diagram also shows the results of the same WD740GD drives on a Promise S150 TX4 controller, which does not support TCQ.
As you can see, enabling TCQ brings a considerable performance bonus. Most importantly for users, this bonus is noticeable even at small loads. The results of the WD740GD on the Talon ZL4-150 controller with TCQ disabled look similar to its results on the Promise SATAII 150 SX8 controller (see our Raptor 2 review).
With TCQ disabled the behavior is the same: up to a queue depth of 32 requests, commands cannot be executed out of order at either the drive level or the controller driver level (see the green graph). After that, the controller’s speed grows suddenly as its driver starts to sort the requests.
The “TCQ On” graph of the Talon controller resembles the graph we got earlier with the Promise controller: there’s a flat stretch around the load of 32 requests and there are two “humps” at high loads. So, TCQ does work on Pacific Digital’s controller, too.
Meanwhile, the results with disabled TCQ support have no practical meaning as the controller just stifles the performance of the drives in this case.
Let’s now see how the Talon ZL4-150 can handle a mixed stream of requests.
The controller processes a stream of requests to read and write 8KB random-address data blocks. By changing the ratio of reads to writes in the stream, we can determine how well the controller’s driver sorts the requests.
The results of this pattern follow below:
The following diagrams show the dependence of the data-transfer speed on the percentage of writes for different request queue depths. Since the number of supported RAID array types is rather limited, we divide the diagrams by queue depth only.
Under a linear load the arrays have the same speed in the Random Read mode, but as soon as write requests appear, lazy writing and the type of the array begin to influence the speed noticeably. The number of drives in the RAID0 array only affects its speed at high percentages of writes.
The RAID0 array is slower than the single drive when write requests make up more than 50 percent of the stream. The RAID10 behaves in the opposite way to the two-disk RAID0. It’s hard to identify the effect of mirroring under this load.
As for the effect of Tagged Command Queuing, it is negative. Why? TCQ hurts under a linear load because some time is spent processing the requests, yet this processing brings no particular advantage. At high percentages of writes this is compensated by the drives’ lazy writing algorithms, but otherwise the single drive with TCQ enabled is slower than the same drive without TCQ, as you can clearly see.
Under a higher load the RAID0 arrays show scalability in the number of drives per array, but the speed of the two-disk RAID0 goes down after the 40-percent-writes mark.
The RAID10 array behaves well enough: its speed decreases smoothly as more write requests appear in the stream. The RAID10 is slightly slower than the four-disk RAID0 in the Random Read mode, but in the Random Write mode it is faster than the two-disk RAID0. This suggests that the RAID10 alternates the incoming read requests between the drives of each mirror pair.
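The idea of alternating reads within a mirror pair can be sketched as follows. This is only an illustration of the general technique, assuming a simple round-robin policy; how the Talon ZL4-150 actually distributes reads is not documented here, and the names `dispatch_reads`, `disk0` and `disk1` are hypothetical.

```python
from collections import Counter
from itertools import cycle


def dispatch_reads(n_reads, mirror=("disk0", "disk1"), alternate=True):
    """Count how many reads each drive of a mirror pair receives.

    alternate=True  -> round-robin between both copies (assumed policy)
    alternate=False -> every read goes to the first drive only
    """
    chooser = cycle(mirror) if alternate else cycle(mirror[:1])
    return Counter(next(chooser) for _ in range(n_reads))


print(dispatch_reads(100))                   # both drives share the reads
print(dispatch_reads(100, alternate=False))  # one drive does all the work
```

With alternation each drive serves half of the reads, so a mirror can approach the random-read speed of a stripe with the same number of drives. Without it, a mirror reads no faster than a single drive.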
Unfortunately, we can’t say the same about the other mirroring array, the RAID1. Its speed is always lower than that of the single drive, especially at a high percentage of writes.
The diagram suggests that a load of 16 requests is already enough for the TCQ technology to showcase its advantages. At a high percentage of reads, the drive without TCQ is noticeably slower than the same drive with TCQ enabled.
Well, recalling how the Talon controller “disables” the TCQ support, we should say that these data cannot be an indication of a “significant advantage” of TCQ.
Moreover, the effect of lazy writing becomes more important when there are fewer reads to be performed, and starting from the 60-percent-writes mark the speed of the drive almost doesn’t depend on TCQ.
Increasing the load further doesn’t change the rankings but emphasizes the drawbacks of some arrays. The slump in the graph of the two-disk RAID0 has become even bigger, so this array is just a little faster than the single drive, while the RAID1 is slower than the single drive nearly everywhere in this test. Curiously enough, the drive without TCQ is faster than the drive with TCQ when there are many writes to be performed.
The results of the Talon ZL4-150 controller in the mixed-requests pattern show that alternating read requests between the drives of a mirror pair doesn’t work for the RAID1 array, so this type of array is only useful for protecting your data. Alas, a RAID1 won’t give any speed bonus over a single drive.
The RAID10 array, on the contrary, works fine and helps to increase the speed of reading random addresses. The four-disk RAID0 array has a high speed (faster than on any SATA RAID controller we’ve tested so far), while the two-disk RAID0 has problems with high percentages of writes.
Turning TCQ on usually brings a speed gain, but the value of this gain is small compared to the previous test.
Let’s watch the controller doing sequential reading and writing.
IOMeter sends a stream of read/write requests with a queue depth of four. Every minute the size of the data block changes, which gives us the dependence of the linear read/write speed on the data block size:
The following diagram shows the dependence of the controller’s read speed on the size of the data block.
We thought RAID arrays comprised of a different number of drives (especially RAID0 arrays) would have different read speeds, but… There are two distinct groups: 1) the RAID1 and the two JBOD arrays with TCQ on and off and 2) the two-, four-disk RAID0 arrays and the RAID10. We can’t really say why the arrays behave as they do. :) Numbers like 60 or 77MB/s can’t be related to the read speed of the single drive (it is somewhere around 68MB/s) or to the limitations of the PCI bus.
One thing is certain, though. We can’t see any profit from TCQ here at all.
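As for the PCI bus limitation mentioned above, the theoretical peak bandwidth of a PCI bus can be estimated as the bus width multiplied by the clock frequency. The sketch below is plain arithmetic with a hypothetical helper name; it uses the nominal 33 and 66MHz clocks from the controller’s specifications and ignores protocol overhead, so real-life throughput is lower.

```python
def pci_peak_mb_per_s(bus_width_bits, clock_mhz):
    """Theoretical peak PCI bandwidth in MB/s: bytes per cycle x MHz."""
    return bus_width_bits / 8 * clock_mhz


print(pci_peak_mb_per_s(32, 33))  # 132.0
print(pci_peak_mb_per_s(32, 66))  # 264.0
```

Even the 33MHz figure is well above the 60 and 77MB/s plateaus, which is consistent with the bus not being the limiting factor in this test.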
Next goes sequential writing; here’s the table:
We plot a diagram as we did for sequential reading.
The graphs of the single drive with TCQ enabled and disabled coincide completely, but the rest of the graphs are simply awful.
First, all the arrays are slower than the single drive starting from 4KB data blocks. Second, the speeds of the RAID0 arrays are similar, and the two-disk array is often faster than the four-disk one. Third, the performance of the mirroring arrays (RAID1 and RAID10) is terrible.
Overall, the Talon ZL4-150 doesn’t have the best possible results in synthetic patterns. Let’s see what it has to show us in patterns that imitate real-life workloads.
Let’s start with the File Server pattern:
The next diagram shows the dependence of the array speed on the request queue depth:
The RAID0 arrays scale excellently with the number of drives per array, even at small queue depths. The RAID1 and the RAID10 are always considerably faster than the single drive and the two-disk RAID0, respectively. Since write requests make up only 20 percent of the File Server pattern, the mirroring algorithms of the RAID1 and the RAID10 work well enough. By the way, this is the first test where the RAID1 is confidently faster than the single drive. To compare the arrays, we calculate their performance ratings by averaging the controller’s speeds under all loads.
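The averaging described above can be written down in a few lines of Python. The function name, the set of queue depths and the numbers in `sample` are hypothetical and serve only to show the arithmetic; they are not measured results.

```python
from statistics import mean


def performance_rating(io_by_queue_depth):
    """Average the Total I/O values measured at every queue depth."""
    return mean(io_by_queue_depth.values())


# Hypothetical figures, for illustration only (queue depth -> Total I/O).
sample = {1: 100.0, 4: 150.0, 16: 200.0, 64: 250.0, 256: 300.0}
print(performance_rating(sample))  # 200.0
```

Every load counts equally in such a rating, so an array that is strong only at deep queues can still rank high.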
That’s just an ideal picture: the four-disk RAID0 takes the first position. The RAID10 and RAID1 are much faster than the two-disk RAID0 and the single drive, respectively. The drive with disabled TCQ has the worst performance here.
Next goes the Web Server pattern:
And the graphs:
As in the File Server pattern, the RAID0 arrays boast excellent scalability with the number of drives per array, even at small request queue depths. The lack of write requests allows the mirroring RAID1 and RAID10 arrays to show their best. The speed of the RAID10 almost equals that of the four-disk RAID0, while the RAID1 is faster than the two-disk RAID0 in every mode.
As in the previous pattern, the sorting of requests adds considerably to the speed.
We calculate the performance rating here the same way as in the File Server pattern:
The rankings differ from those in the File Server pattern, but they are quite as expected. Since the Web Server pattern includes read requests only, the mirroring arrays have a certain advantage over the others. That’s why the four-disk RAID0 is just a tiny bit better than the RAID10, and the RAID1 is ahead of the two-disk RAID0. The drive with disabled TCQ is the worst of all, as in the previous pattern.
The Workstation pattern emulates a user who is working intensively in various applications on the NTFS5 file system:
The next diagram shows you the dependence of the array speed on the request queue depth:
The RAID0 arrays scale well starting from a queue depth of four requests. The RAID10 is faster than the two-disk RAID0 in every operational mode. The other mirroring array (RAID1) outperforms the single drive at short queues only: its speed drops suddenly at a queue depth of four requests, and it falls behind the single drive from 16 requests onward.
The drive with disabled TCQ is slower than the rest of the arrays even on single requests.
We calculate the performance rating for the Workstation pattern by the following formula:
Performance Rating = Total I/O (queue=1)/1 + Total I/O (queue=2)/2 + Total I/O (queue=4)/4 + Total I/O (queue=8)/8 + Total I/O (queue=16)/16 + Total I/O (queue=32)/32
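The formula translates directly into code. In this sketch the function name and the numbers in `sample` are hypothetical; only the weighting scheme comes from the formula above.

```python
def workstation_rating(total_io):
    """Sum Total I/O over queue depths 1..32, each divided by its depth."""
    return sum(total_io[q] / q for q in (1, 2, 4, 8, 16, 32))


# Hypothetical figures, for illustration only (queue depth -> Total I/O).
sample = {1: 100.0, 2: 200.0, 4: 400.0, 8: 800.0, 16: 1600.0, 32: 3200.0}
print(workstation_rating(sample))  # 600.0
```

Dividing by the queue depth means that a result at a queue of one request weighs 32 times more than the same result at a queue of 32, so the rating favors arrays that are fast at short queues.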
Since short queues carry more weight in this rating, the RAID1 array ranks above the single drive, while the big percentage of writes in this pattern makes the other arrays line up as they did in the File Server pattern.
WinBench 99 evaluates the performance of the disk subsystem in desktop applications on the NTFS and FAT32 file systems.
NTFS comes first:
And here are the results of the two integral subtests, Business and High-End Disk Winmark:
The most astonishing of the results is the speed of the single drive with TCQ disabled: it seems like TCQ greatly impedes the controller in this mode. This may be due to the file system, but we had no TCQ-related problems in the Workstation pattern that works in NTFS, too.
As expected, the RAID0 arrays are the fastest of all the array types. The mirroring arrays (RAID1 and RAID10) took the last places.
Let’s now switch to FAT32:
Unlike in NTFS, the rankings vary depending on the subtest: if we sort the arrays by their speeds in Business Disk Winmark, they would line up in the same order as in NTFS. This once again proves that the bonus from TCQ comes irrespective of the file system.
And if we sort the drives by the speed in High-End Disk Winmark (as we did in the diagram above), the single drive without TCQ will be right after the single drive with enabled TCQ – the rankings look quite logical in this case.
Since the linear read speeds are the same for NTFS and FAT32, we offer one diagram for both file systems:
The speed of the single drive is the same with TCQ enabled or disabled. The RAID1 and RAID10 arrays are slower than the single drive and the two-disk RAID0, respectively. The speed of the four-disk RAID0 is rather low.
Here are data-transfer graphs for each of the arrays:
The Talon ZL4-150 is an odd controller: it had problems in half of our synthetic tests, yet its TCQ does work and the RAID0 arrays show excellent speeds in some benchmarks. The controller is also very good in tests that emulate server-like workloads; its current driver seems to be optimized primarily for servers.
Currently you can download version 1.1 of the driver from the manufacturer’s website. The user manual says the Talon ZL4-150 controller can work in Windows 2003/XP/2000 and Linux, but the minimal system requirements only mention Microsoft Windows 2000 with Service Pack 2 and higher, Windows 2003 and Windows XP with Service Pack 1 and higher. We could find no drivers or utilities for Linux.