by Alexander Yuriev, Nikita Nikolaichev
08/31/2006 | 09:18 PM
Today we’ll test a new eight-channel SATA II RAID controller from Areca, and we guess most of you have never heard of this company, although you may have come across its products: for some reason, Areca controllers are sold in Europe under the Tekram brand. The only difference from Areca’s original controllers is the text on the box.
But we’ve got an original Areca without a trace of Tekram:
The eight-channel ARC-1220 controller is based on an Intel IOP333 processor (working at an impressive 500MHz!) and supports up to eight Serial ATA II drives (with a data-transfer rate of 3Gb/s). The controller comes with 128 megabytes of onboard memory (which cannot be expanded); the battery backup unit is optional. The controller allows building the following types of RAID arrays: 0, 1, 3, 5, 6, 10 and JBOD. It is designed as a PCI Express card (the PCI-X version of this controller has the model name ARC-1120).
The controller can be switched into plain SATA controller mode through its software. That is, you can disable all of its RAID functionality, but you will hardly want to do so: $700 seems a bit too much for a simple Serial ATA controller.
And now we want to tell you about the new array type, RAID6, which is supported by the Areca controller. There’s in fact nothing extraordinary about RAID6. It is a further development of the RAID5 concept of distributed parity, but two parity blocks are written for each stripe rather than one. It can be illustrated with the following figure:
Here, D1, D2, etc. denote data, and P1, P2 denote parity for the D1 and D2 data blocks, respectively.
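The dual-parity idea can be sketched in code. What follows is the textbook P+Q scheme (plain XOR for the first parity block, a Reed-Solomon syndrome over GF(2^8) for the second); Areca does not document its firmware internals, so the function names and field polynomial here are our own illustrative choices:

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by x^8+x^4+x^3+x^2+1 (0x11D)."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1D          # reduce modulo the field polynomial
        b >>= 1
    return result

def raid6_parity(data_blocks):
    """Return the (P, Q) parity bytes for one byte position across N data disks."""
    p, q = 0, 0
    g = 1                      # g^i, the generator power for disk i (g = 2)
    for d in data_blocks:
        p ^= d                 # P: plain XOR parity, same as in RAID5
        q ^= gf_mul(g, d)      # Q: Reed-Solomon syndrome, independent of P
        g = gf_mul(g, 2)       # advance to the next power of the generator
    return p, q
```

Because P and Q are computed by two different formulas, the two of them together give two independent equations, which is what lets the controller solve for any two missing disks.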
Thus, the minimum number of disks required to build a RAID6 array is 4. The useful capacity of a RAID6 array is (N-2) × (the capacity of one disk), which is of course smaller than that of a RAID5 array consisting of the same number of disks.
The disk usage coefficient is the same in a four-disk RAID6 as in a RAID10, i.e. 0.5, but it becomes quite reasonable with RAID6 arrays that consist of more disks.
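The capacity figures above are easy to verify (assuming all disks are the same size; the function name is ours, for illustration only):

```python
def usable_capacity(n_disks: int, disk_gb: int, level: str) -> int:
    """Usable capacity of an array of n_disks equal drives of disk_gb each."""
    if level == "RAID0":
        return n_disks * disk_gb
    if level == "RAID5":
        return (n_disks - 1) * disk_gb   # one disk's worth of parity
    if level == "RAID6":
        return (n_disks - 2) * disk_gb   # two disks' worth of parity
    if level in ("RAID1", "RAID10"):
        return n_disks // 2 * disk_gb    # everything is mirrored
    raise ValueError(level)
```

For four disks, RAID6 and RAID10 both give a disk-usage coefficient of 2/4 = 0.5, but with eight disks RAID6 reaches 6/8 = 0.75 while RAID10 stays at 0.5.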
Why store a second checksum for each stripe? To enable the array to survive a simultaneous failure of two disks without losing any data! (Note that the second parity block is not a plain copy of the first: it is computed by a different formula, which is exactly what makes recovery from a double failure possible.) For arrays consisting of inexpensive and, accordingly, not very reliable disks, RAID6 seems to be a reasonable compromise between storage capacity and reliability. But what about its speed? Theoretically, it shouldn’t be slower than a RAID5 array since it is the checksum calculation that takes the most time, whereas the extra write is easily made up for by the controller’s and/or disks’ cache memory. We’ll check this out right now.
The following benchmarks were used:
We used File Server and Web Server patterns with IOMeter:
With these patterns we can emulate the disk subsystem load typical of file and web servers.
The Workstation pattern was created by our author Sergey Romanov aka GReY based on the disk access statistics published in the SR Testbed3 description. The statistics were collected for NTFS5 in Office, Hi-End and Boot-up modes.
A drive’s performance in this pattern is indicative of how appealing it is for an ordinary Windows user.
And finally we checked the drives out at processing a mixed stream of read and write requests in the Database pattern, which emulates the disk subsystem load under SQL-like requests.
The controller was tested with the version 1.02 drivers and was installed into a PCI Express x8 slot. The WD740GD (Raptor) drives were installed into the standard rails of the SC5200 system case.
We’ll start out as usual by testing the controller under mixed request streams.
In this test we send a mixed stream of requests to read and write random-address 8KB data blocks. By changing the ratio of reads to writes we can see how efficiently the requests are reordered by the controller’s driver.
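The request mix just described can be approximated with a short generator (this is our own sketch of the workload shape, not IOMeter itself; the function name and parameters are hypothetical):

```python
import random

def make_requests(n, write_pct, span_blocks, block=8 * 1024, seed=0):
    """Generate n random-address requests of `block` bytes each.

    write_pct is the share of writes (0..100); span_blocks is the number
    of 8KB-aligned positions in the test area.
    """
    rng = random.Random(seed)
    ops = []
    for _ in range(n):
        op = "write" if rng.random() * 100 < write_pct else "read"
        offset = rng.randrange(span_blocks) * block   # 8KB-aligned address
        ops.append((op, offset, block))
    return ops
```

Sweeping write_pct from 0 to 100 reproduces the series of modes used in the test, from Random Read to Random Write.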
The results of this test are shown below:
Let’s view them as graphs. The dependence of the data-transfer speed on the percentage of writes is shown for request queue depths of 1, 16 and 256. For better readability we divide the arrays into two groups:
The arrays all have very close speeds in the Random Read mode. A disk can perform deferred writing more effectively when there are more write requests in the queue, so the speed of the single drive grows as the share of writes increases. The speed of RAID0 depends on the number of disks in the array, but it is only in modes with a large share of writes that the RAID0 speed is exactly proportional to the number of disks.
The mirrored RAID1 and the RAID5 and 6 arrays have very close speeds in every mode. The RAID10 looks preferable to them, but only due to its RAID0 constituent, we guess. This load must not be big enough for the algorithms implemented in those arrays to give any performance gain, while some time is still spent processing them.
Under higher load we can see that the speed of RAID0 is proportional to the number of disks per array even in the 100% reads mode. But unlike the single drive, RAID0 doesn’t speed up much as the percentage of write requests grows. Note also the different algorithms the JBOD and RAID0 arrays use with regard to write requests.
The speeds of all the arrays from this group fall as the writes percentage grows. With the mirrored RAID1 and RAID10 arrays it is due to their increased read speed thanks to the intelligent selection of the optimal (considering the current position of the read/write heads) disk from a mirrored couple. The RAID5 and RAID6 are quite fast at reading, but slow down on write requests because they have to spend time creating checksums when processing them. Strangely enough, the RAID6 is a little faster than the RAID5 in all the modes!
It’s simpler with the mirrored arrays. If you compare the RAID1 and the RAID10 with the single drive and the two-disk RAID0, respectively, you can see that the mirrored arrays are faster in modes with a high probability of a read request and slower in modes with a high probability of a write request. So, we can claim that the intelligent algorithm for reading from mirrored arrays implemented in the Areca ARC-1220 works well under such loads.
Under an even greater load, the RAID0 arrays slow down as the percentage of writes grows. Their speeds are still proportional to the number of disks per array, though.
The arrays from this group behave almost in the same manner as under the lower load.
This pattern helps us examine the controller in sequential read and write modes. IOMeter sends the drive a stream of read and write requests with a request queue depth of 4. Every minute the size of the data block is changed so that we can see the dependence of the sequential read/write speed on the data block size.
The controller’s read speed relative to the data block size is shown in the following table:
The RAID arrays are divided in two groups in the following diagrams for better comparison.
The advantages of RAID0 arrays that consist of many disks show up only when the requested data block is big enough, i.e. when the controller can split large blocks into a few smaller ones and use the disks in parallel. In this case, the RAID0 arrays perform very well. The 2-, 3- and 4-disk arrays reach their maximum speeds on 8KB, 16KB and 32KB data blocks, respectively. The arrays’ scalability is perfect: the read speed grows along with the number of drives per array.
The controller’s performance is perfect again: the graphs of the mirrored RAID1 and RAID10 arrays coincide with the graphs of the single drive and two-disk RAID0, respectively. The speed of the three- and four-disk RAID5 is the same as of the two- and three-disk RAID0, respectively. The RAID6 performs exactly like the two-disk RAID0 does.
Generally speaking, we should see this picture with any controller in this mode because array algorithms can’t have any effect on performance here, yet we seldom see it in practice.
Now let’s check the controller out at sequential writing. The dependence of the controller’s write speed on the data block size is shown in the table:
Here are diagrams for the two groups of arrays:
This group of arrays performs even better at sequential writing than at sequential reading. The speeds are in fact the same, and the arrays reach their maximum speeds on exactly the same data blocks.
The speed of the RAID1 and RAID10 is nearly the same as the speed of the single drive and the two-disk RAID0, respectively, when there are no read operations to be performed. The graphs of the RAID5 and RAID6 arrays look somewhat worse, but these arrays cannot but behave like that in 100% write modes!
We’ll now check the controller out in a test mode that emulates the typical load on the disk subsystem of a file and web server.
The file server emulation comes first.
This table can be represented graphically as follows:
Write requests make up only 20% of this pattern, so all the arrays have good results. The RAID0 arrays show very good speed scalability relative to the number of disks per array. The speeds of the RAID1 and RAID10 are close to the speeds of the two- and four-disk RAID0, respectively. This means the algorithm of optimized reading from a mirror works fine here. The RAID5 and RAID6 have good results, too.
We will compare the different arrays by calculating their performance ratings. All loads are equally probable, so the overall rating is the average speed of the array under all possible loads.
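Since all loads are taken as equally probable, the rating reduces to a plain mean of the measured speeds (the function name here is ours, for illustration):

```python
def server_rating(speeds):
    """Overall rating: the mean of the array's Total I/O under every measured load."""
    return sum(speeds) / len(speeds)
```

Each entry in `speeds` would be the array’s Total I/O result under one request queue depth.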
The four-disk RAID0 enjoys a big advantage over the rest of the arrays. The RAID10 is somewhat slower, yet holds an honorable second place. The RAID5 and RAID6 are a little behind the three-disk RAID0. Interestingly, the RAID6 has a higher rating than the four-disk RAID5. The RAID1 takes the next-to-last place, but is not far ahead of the single drive.
And now let’s see how the controller performs in a pattern that emulates a web-server disk load:
The graphs of the RAID0 arrays haven’t changed much in comparison with the File Server pattern. The RAID5 and RAID6 have become faster because the Web Server pattern doesn’t include write requests and reading is the optimal operating mode for these array types. For the same reason, the mirrored RAID1 and RAID10 arrays that use a special algorithm for optimized reading from mirror couples are almost everywhere faster than the two- and four-disk RAID0, respectively.
We calculated performance ratings for the arrays by the same formula as for the File Server pattern.
The RAID10, RAID6 and RAID5 made good use of the opportunity to work without any write requests. Although the four-disk arrays all have very close speeds, the RAID10 is the best. The four-disk RAID5 proved to be faster than the RAID6 while the four-disk RAID0 didn’t even make it into the top three. The three-disk RAID0 is slower than the three-disk RAID5 whereas the RAID1 is quite far ahead of the two-disk RAID0.
The Workstation pattern emulates a user who’s working in various applications in the NTFS5 file system:
It’s simple with the RAID0 arrays: the more disks an array has, the faster it processes the requests. The RAID1 is far faster than the single drive, and the RAID10 is everywhere a little bit faster than the two-disk RAID0. The RAID5 and RAID6 aren’t very fast because the Workstation pattern has quite a lot of random write requests, which slow those arrays down considerably.
We’ll calculate the performance ratings of the different RAID arrays for the Workstation pattern by the following formula:
Performance = Total I/O (queue=1)/1 + Total I/O (queue=2)/2 + Total I/O (queue=4)/4 + Total I/O (queue=8)/8 + Total I/O (queue=16)/16 + Total I/O (queue=32)/32
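The formula above weights results under shorter queues more heavily; in code it is simply (the function name is our own):

```python
def workstation_rating(total_io):
    """Workstation rating from the formula above.

    total_io maps queue depth (1, 2, 4, ..., 32) to the measured Total I/O;
    each result is divided by its queue depth, so short queues dominate.
    """
    return sum(io / depth for depth, io in total_io.items())
```

For example, an array with the same Total I/O of 100 at depths 1, 2 and 4 scores 100 + 50 + 25 = 175.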
Quite expectedly, the random write requests in this pattern have made the RAID5 and RAID6 arrays slower even than the single drive. The RAID0 arrays line up according to the number of disks per array. The mirrored RAID10 is slower than the three-disk RAID0. The RAID1 was faster than the single drive everywhere except under linear load, but in our performance rating the result under a shorter request queue has a bigger weight. That’s why the performance rating of the single drive is higher than that of the RAID1.
With this pattern we will explore the controller’s ability to perform multi-threaded sequential reading and writing by emulating several applications demanding large files all at the same time. IOMeter’s test agent (called Worker) that emulates an application sequentially reads/writes 64KB blocks of data starting from a certain sector. By increasing the number of requests coming from one Worker (from 1 to 8 in steps of 1) we examine the ability of the drive or controller to reorder requests (to merge several requests to sequentially located data into one). By increasing the number of Workers we put a heavier load on the disk subsystem, as in a real environment where several applications are competing for access to the drives. Each Worker works with its own data (i.e. the data addresses requested by the Workers differ).
The diagram below shows the speed of the arrays at a load of 1 request as the most probable in real environments.
The RAID0 arrays are all marked with the same color, the RAID5 arrays with another, and the mirroring arrays with a third color. The single drive and the RAID6 arrays have their own colors, too. The order of the arrays in the diagram corresponds to their order in the legend: the higher the bar, the more disks there are in the array.
At a load of one request per one thread the RAID0 and RAID5 arrays show good scalability: their speed is higher in proportion to the number of disks per array. The single drive performs almost as fast as the RAID1. The read speeds of the RAID10, RAID6 and the three-disk RAID5 almost coincide, too.
The arrays all slow down when processing two threads simultaneously, although the RAID0 and RAID5 still show performance scalability depending on the number of disks per array. With two threads, the hard drives’ read/write heads have to be constantly moving between the two work zones, so you can’t hope to achieve the linear speed. The HDDs we use in the tests are optimized for server use, i.e. have a very small access time. That’s why you shouldn’t expect look-ahead reading from them.
The speeds of all the arrays grow a little as we add more threads.
And here are the results for writing:
The writing results look better. At a load of one request in one thread, the arrays are somewhat slower than at reading, but there is performance scalability of RAID0 and RAID5 arrays depending on the number of disks per array. The single drive’s speed almost coincides with the speed of the RAID1. The RAID10 is as fast as the two-disk RAID0.
But at two threads the write speed of every array is much higher than at reading. Even though the speed of almost each array becomes lower when we add more threads, it still remains higher than at reading when three and four threads are being processed!
Now we’re going to check the controller in a single-user environment that operates with files rather than sectors.
We stick to our traditional FC-Test methodology: we create two 32GB logical volumes on the array and format them in NTFS and then in FAT32. We create a set of files on the first volume; this file set is then read from the array, copied into a folder on the same partition (Copy Near, within one and the same logical volume), and finally copied onto the other partition (Copy Far).
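The sequence of operations can be sketched as follows (our own rough approximation in Python, not the actual FC-Test tool; the function names and file-set parameters are illustrative):

```python
import os
import shutil
import time

def timed(label, fn):
    """Run fn(), print its wall-clock time and return it."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.3f} s")
    return elapsed

def fc_test(vol1, vol2, n_files=8, file_size=256 * 1024):
    """Create a file set on vol1, read it back, then Copy Near and Copy Far."""
    data = os.urandom(file_size)
    src = os.path.join(vol1, "fileset")
    os.makedirs(src)

    def create():
        for i in range(n_files):
            with open(os.path.join(src, f"file{i:03d}.bin"), "wb") as f:
                f.write(data)

    def read_back():
        for name in os.listdir(src):
            with open(os.path.join(src, name), "rb") as f:
                while f.read(1 << 20):   # drain the file in 1MB chunks
                    pass

    timed("create", create)
    timed("read", read_back)
    timed("copy near", lambda: shutil.copytree(src, os.path.join(vol1, "near")))
    timed("copy far", lambda: shutil.copytree(src, os.path.join(vol2, "far")))
```

Here `vol1` and `vol2` stand in for the two logical volumes created on the array.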
The test system is rebooted before each test to avoid the influence of the OS’s caching on the results. We use five file patterns here:
Let’s start with NTFS. We’re going to examine the results of each test action for each pattern independently due to the abundance of data. The first action is the creation of a set of files on the array.
We created diagrams for the three most interesting file-sets:
As might have been expected, the RAID0 arrays have the highest results. The RAID5 arrays are rather fast in the Install and ISO patterns and are very slow only in the Programs pattern. The RAID0 arrays show performance scalability depending on the number of disks per array with every file-set, while the RAID5 arrays do so only with the Install and ISO patterns.
Reading goes next:
Performance scalability of the RAID0 and RAID5 arrays can still be observed. All the arrays are surprisingly very fast with the ISO pattern.
And here are the results of the copying test:
The four-disk RAID0 has an inexplicable performance slump when copying the ISO pattern. In the other patterns the RAID0 shows performance scalability. The RAID5 arrays are slower, but also scale up in performance depending on the number of disks per array.
The results of the Copy Far test do not differ much from the Copy Near results, including the slump in speed of the four-disk RAID0 with the ISO pattern.
Now we switch to the FAT32 file system:
The speeds are generally higher in comparison with NTFS, except for the inexplicable performance hit the four-disk RAID0 suffers when processing the ISO file-set.
The read speed has grown only slightly in comparison with the NTFS file system.
The three-disk RAID0 suffers a speed slump twice in the file copy tests.
We only took the linear read graphs in WinBench 99:
The Areca ARC-1220 controller has turned in superb results in today’s tests! The RAID5 and RAID0 arrays show excellent scalability of performance relative to the number of drives per array and are generally very fast. The controller keeps data secure in fault-tolerant arrays (RAID1, RAID5, RAID6 and RAID10) and, interestingly, the more fault-tolerant RAID6 has performed comparably to the four-disk RAID5 in most of the patterns. The mirroring arrays obviously use a special algorithm for optimized reading from a mirror, which greatly improves performance when processing random requests.
The controller’s excellent performance in FC-Test needs a separate mention. We’ve never yet seen a controller that would be as fast at processing real files as under synthetic (i.e. ideal) loads!
A fast cache, a high XOR processor frequency and the lack of dubious “optimizations” are the ingredients of this success! Renowned manufacturers of RAID controllers can learn something from the young, yet very talented, Areca.