HighPoint RocketRAID 4320 SAS RAID Controller Review

Today we will talk about a SAS RAID controller from HighPoint that belongs to the highest-performance controller series from this maker. It is based on a popular Intel IOP348 processor with 1.2 GHz frequency. Read more in our review!

by Aleksey Meyev
06/23/2009 | 11:51 AM

We are continuing our series of reviews of RAID controllers with SAS support. Today we will talk about a product from HighPoint. Like other first-tier manufacturers, this company has kept up with the transition of SAS hard disk drives from the category of specialized, fast products into widespread server solutions, and has released appropriate controllers in not one but two series: RocketRAID 43xx and 26xx. The former series includes controllers with onboard cache memory and a full-featured processor that computes checksums for RAID5 and RAID6. The RocketRAID 26xx series offers simpler products that compute checksums in the driver, i.e. using the computing capabilities of the server. Roughly speaking, they are the well-known RocketRAID 20xx controllers extended to support SAS drives as well as SATA ones. Interestingly, all of them communicate across four PCI Express lanes whereas the 43xx series uses PCI Express x8.

 

Of course, we are more interested in RocketRAID 43xx which is a logical continuation of the RocketRAID 35xx SATA controller series. It is equipped with a PCI Express interface (PCI-X has been quickly abandoned by all manufacturers) and has a similar design. The processor frequency has increased from 800MHz to 1.2GHz, and the processor now has two cores. Heat dissipation has grown, too: the previous series was cooled passively whereas every model of the SAS-compatible series comes with a cooling fan. Some features have been lost in the process of evolution, though. For example, the RocketRAID 43xx series currently offers models with no more than 8 ports whereas the RocketRAID 35xx series included a 24-port model. You will have to use expanders in order to connect a large number of hard disks.

Closer Look at HighPoint RocketRAID 4320

The HighPoint RocketRAID 4320 controller has eight internal ports. In other words, it is equipped with two SFF-8087 connectors. The series also includes two 8-port models (with two external connectors and with one external and one internal connector) and two 4-port ones (with an internal or external connector).

Besides the controller proper, the box contains a user manual, a disc with drivers, two cables (each for up to four devices), and a low-profile bracket. Yes, this is a low-profile card like every other in its series, excepting the model with two external connectors.

Take note of the network connector. It is meant for remote administration, making a nice addition to standard management tools (BIOS during startup, special software, or remote access via the OS running on the server).

All controllers of this series share the same processor, but have different amounts of onboard memory. The RocketRAID 4320 is not lucky in this respect. It only has 256MB like the 4-port models whereas the 8-port controllers with external connectors are equipped with 512 megabytes. HighPoint does not declare the memory speed, describing it only as “DDR-II ECC” but we can be more specific: the controller carries Qimonda HYB18TS12160B2F-3S chips with a frequency of 667MHz. RocketRAID 43xx series controllers support the same array types as the predecessor series, namely:

RAID60, a stripe of RAID6 arrays, is missing from the list, but such a peculiar combination is not in much demand. Instead, there is RAID3, quite a rare type of RAID these days. It is close to RAID5, being its ancestor in fact, but differs from the latter in two ways: checksums are stored on one dedicated disk (with RAID5, checksums are distributed uniformly among all the disks of the array) and all data are split into 1-byte blocks. You can easily guess why this array type is unpopular. The data segmentation method causes performance problems, while dedicating one disk to checksums guarantees that disk a much higher load than any other disk in the array.
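The difference in checksum placement can be illustrated with a minimal sketch. The function names and the rotation order below are assumptions made for illustration only; they do not describe how the controller actually lays out its arrays.

```python
from collections import Counter


def parity_disk(stripe: int, disks: int, rotate: bool) -> int:
    """Index of the disk that holds the checksum for a given stripe.

    rotate=False models a dedicated checksum disk (RAID3-style);
    rotate=True models rotating checksums (RAID5-style). The rotation
    order is an arbitrary choice for this illustration.
    """
    return (disks - 1 - stripe % disks) if rotate else disks - 1


def parity_load(stripes: int, disks: int, rotate: bool) -> list[int]:
    """Count how many checksum blocks land on each disk."""
    counts = Counter(parity_disk(s, disks, rotate) for s in range(stripes))
    return [counts[d] for d in range(disks)]


raid3_load = parity_load(stripes=8, disks=4, rotate=False)
raid5_load = parity_load(stripes=8, disks=4, rotate=True)
print(raid3_load)  # [0, 0, 0, 8]
print(raid5_load)  # [2, 2, 2, 2]
```

With a dedicated disk, every checksum update hits the same drive; with rotation, the updates are spread evenly.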

The controller came to us without a battery backup unit but was kind enough not to disable its caching algorithms for that reason. For our part, we want to remind you once again that information (and recovery thereof) is costly, so you should not use your RAID controller without a BBU. In case of a power failure, a lot of data held in the cache can be lost, including housekeeping data. We can run controllers without BBUs for test purposes, but would never do so in real-life applications.

Testbed and Methods

The following benchmarks were used:

Testbed configuration:

The controller was installed into the mainboard’s PCI-Express x8 slot. We used Fujitsu MBA3073RC hard disk drives for this test session. They were installed into the standard boxes of the SC5200 system case and fastened with four screws at the bottom. The controller was tested with four and eight HDDs in the following modes:

As we try to cover all possible array types, we will publish the results of degraded arrays. A degraded array is a redundant array in which one or more disks (depending on the array type) have failed but the array still stores data and performs its duties.
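To show why a degraded checksum-based array can still serve data, here is a minimal sketch of single-checksum (RAID5-style) recovery using XOR. The block contents and helper name are made up for illustration; the second checksum of RAID6 relies on different math that is not shown here, and a degraded RAID10 simply reads from the surviving mirror.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks of one stripe and their checksum.
data = [b"\x10\x20", b"\x0f\x0f", b"\xaa\x55"]
parity = xor_blocks(data)

# The disk holding data[1] fails: rebuild its block from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])  # True
```

Every read that touches the failed disk therefore turns into reads from all the remaining disks plus a computation, which is why degraded arrays are expected to be slower.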

For comparison’s sake, we publish the results of a single Fujitsu MBA3073RC hard disk on an LSI SAS3041E-R controller as a kind of a reference point. We want to note that this combination of the HDD and controller has one known problem: its speed of writing in FC-Test is very low.

The stripe size is set at 64KB for each array type.

We used the latest BIOS available at the time of tests on the manufacturer’s website for the controller and installed the latest drivers. The BIOS was version 1.2.12.11 and the driver was version 1.2.19.4.

Performance in Intel IOMeter

Database Patterns

In the Database pattern the disk array is processing a stream of requests to read and write 8KB random-address data blocks. The ratio of reads to writes is changing from 0% to 100% (stepping 10%) throughout the test while the request queue depth varies from 1 to 256.
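The sweep described above can be written down as a simple test matrix. This is only a sketch: the write percentages follow the text, but the intermediate queue depths are an assumption (only the range from 1 to 256 is stated).

```python
from itertools import product

# Share of writes: 0% to 100% in 10% steps, as described in the text.
WRITE_PERCENTAGES = range(0, 101, 10)

# Assumed queue-depth steps; only the end points 1 and 256 are given.
QUEUE_DEPTHS = [1, 4, 16, 64, 256]

# Each entry is one measurement point: (queue depth, write percentage).
test_matrix = list(product(QUEUE_DEPTHS, WRITE_PERCENTAGES))
print(len(test_matrix))  # 55
```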

We will be discussing graphs and diagrams but you can view the data in tabled format using the following links:

Everything is all right with the RAID0 and RAID10 arrays at a queue depth of 1. Deferred writing works properly, so write performance scales very well with the number of disks in each array. To remind you, a RAID10 should write as fast as a RAID0 built out of half the number of disks because both disks of each mirror pair write the same data synchronously.

The RAID5 and RAID6 arrays have problems at a queue depth of 1 request. The arrays all perform in the same way (the degraded RAID6 are worse than the others, though) and are slower than the single disk. The controller seems to be processing data requests without caring to cache them. But why is there such a sudden performance growth at pure writing then? Does it mean that the controller’s firmware wakes up and tries to cache requests after all? If so, it does that inefficiently. Overall, this is a serious problem.

When the queue is 16 requests long, the controller delivers a predictable performance growth and shows the specific behavior of its firmware with arrays of different types. For example, we can note that the controller can effectively choose which disk of a mirror pair can read the requested data faster. Thanks to that, the RAID10 arrays are ahead of the RAID0 ones at high percentages of reads. The degraded RAID10 is incapable of that. After the loss of a disk the controller stops looking for the “luckier” disk in a mirror pair (although it could still do that in the healthy pairs).
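One plausible way to pick the “luckier” disk is to send each read to the mirror whose head is currently closest to the requested address. The sketch below is a hypothetical heuristic written for illustration; the actual firmware logic is not documented and may work differently.

```python
def pick_mirror_disk(head_positions: list[int], target_lba: int) -> int:
    """Return the index of the mirror disk whose head is nearest the target."""
    return min(range(len(head_positions)),
               key=lambda i: abs(head_positions[i] - target_lba))


# Hypothetical current head positions (as block addresses) of a mirror pair.
heads = [1_000, 900_000]
served_by = []
for lba in (850_000, 5_000, 870_000):
    disk = pick_mirror_disk(heads, lba)
    served_by.append(disk)
    heads[disk] = lba  # the chosen disk's head moves to the requested address
print(served_by)  # [1, 0, 1]
```

A scheme like this shortens the average seek for reads, but it cannot help writes, which must always go to both disks of the pair.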

The checksum-based arrays accelerate to normal level at reading but still have problems with writing. They write slowly, especially the degraded arrays. The healthy RAID5 and RAID6 built out of eight disks are also not fast enough for arrays based on disks with such a low response time as our Fujitsu MBA3073RC.

When the queue is 256 requests long, the degraded RAID10 is ahead of the 4-disk RAID0, save at pure writing. The driver seems to be looking for the luckier disk in the healthy mirror pairs at such long queue depths. Still, the performance hit of the degraded array in comparison with the healthy one is obvious at high percentages of reads.

In the second group of arrays all the degraded arrays look poor. Their performance is low at both reading and writing. The healthy RAID6 are no good, either. Even the 8-disk RAID6 is worse at writing than the single HDD.

Disk Response Time

IOMeter is sending a stream of requests to read and write 512-byte data blocks with a request queue depth of 1 for 10 minutes. The disk subsystem processes over 60 thousand requests, so the resulting response time doesn’t depend on the amount of cache memory.
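At a queue depth of 1 only one request is outstanding at any time, so the mean response time follows directly from the test duration and the number of completed requests. The sketch below uses the round figures from the text; the function name is our own, and host-side overhead is ignored.

```python
def mean_response_ms(test_seconds: float, completed_requests: int) -> float:
    """Mean response time in milliseconds at a queue depth of 1."""
    return test_seconds * 1000 / completed_requests


# 10 minutes and 60 thousand requests give the upper bound implied by the text.
upper_bound_ms = mean_response_ms(test_seconds=600, completed_requests=60_000)
print(upper_bound_ms)  # 10.0
```

Since more than 60 thousand requests are processed, the mean response time stays below 10 milliseconds.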

The RAID10 boast a lower response time at reading than the single HDD thanks to effective reading from mirrors. Note that the degraded array is just a little bit worse than the healthy ones. Its response time is very good, too.

The other arrays are behind the single HDD: the 8-disk arrays are half a millisecond slower, and the 4-disk ones are a full millisecond slower. The degraded arrays are at the bottom of the diagram, the RAID6 with two failed disks being the worst. Restoring data from two checksums is not easy.

The more total cache an array has, the lower its response time. This is confirmed by the results of the RAID0 and RAID10 arrays. The checksum-based arrays have problems, though. They are up to five times as slow as the single HDD. This must be another aspect of the problem we have seen in the Database test at the shortest queue depth.

Random Read & Write Patterns

Now we’ll see the dependence of the arrays’ performance in random read and write modes on the data chunk size.

We will discuss the results of the arrays at processing random-address data in two variants, based on our updated methodology. For small data chunks we will draw graphs showing the dependence of the number of operations per second on the data chunk size. For large chunks we will compare performance in terms of data-transfer rate in megabytes per second. This approach helps us evaluate the disk subsystem’s performance in two typical scenarios. Working with small data chunks is typical of databases, where the number of operations per second is more important than sheer speed. Working with large data blocks is nearly the same as working with small files, and the traditional measurement of speed in megabytes per second is more relevant for such load.
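The two ways of presenting the results are related by a simple conversion: the data-transfer rate is the number of operations per second multiplied by the chunk size. The sketch below uses made-up input numbers purely to illustrate the arithmetic; they are not measurements from this review.

```python
def iops_to_mbps(iops: float, chunk_bytes: int) -> float:
    """Convert operations per second into megabytes per second (1 MB = 10**6 bytes)."""
    return iops * chunk_bytes / 1_000_000


# Hypothetical small-chunk load: many operations, little data moved.
small = iops_to_mbps(iops=1000, chunk_bytes=8 * 1024)
# Hypothetical large-chunk load: few operations, a lot of data moved.
large = iops_to_mbps(iops=50, chunk_bytes=8 * 1024 * 1024)
print(small)  # 8.192
print(large)  # 419.4304
```

The example shows why operations per second are the more telling number for small chunks, while megabytes per second are more telling for large ones.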

We will start out with reading.

The efficient reading from the RAID10 mirror pairs shows up again in the test of reading in small data chunks.

The checksum-based arrays go very close to each other. We can note a couple of facts here. First, the arrays with more disks are somewhat faster even when reading very small blocks (smaller than the size of a stripe). And second, the degraded RAID6 with two failed disks slows down greatly at reading, so you may want to switch to RAID10 if you suspect your disks are going to fail often.

Take note of how long the RAID10 remains in the lead. It is as fast as the 8-disk RAID0 even at a data chunk size of 8MB (comparable to the size of a document, musical file or hi-res photo). The degraded RAID10 is somewhat ahead of the 4-disk RAID0.

The 8-disk RAID5 and RAID6 go neck and neck, including the degraded variants. The 4-disk RAID6 is a disappointment as its speed hardly differs from that of the single HDD even on large data chunks.

Random writing goes next.

Everything is good in this group of arrays. We see nearly ideal scalability here.

The RAID5 and RAID6 arrays have serious problems when writing in small data chunks. This must be the explanation of the high write response time and the Database results. You can see it clearly with the 8-disk arrays: when the data block size is reduced to 8KB, they suffer a sudden performance hit. Moreover, their performance keeps falling as the data chunks get smaller, although the normal behavior for such arrays is to accelerate steadily. The 4-disk and degraded arrays are especially poor. If the controller behaves like that with RAID3, a RAID3 array would be downright sluggish.
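Small writes are inherently expensive for checksum-based arrays because the checksum must be updated along with the data. A common approach is read-modify-write: read the old data and old checksum, then write the new data and new checksum. The sketch below illustrates the idea for a single XOR checksum; it is a generic illustration with made-up block contents, not a description of this controller’s firmware.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


# A hypothetical stripe with three data blocks and one checksum block.
stripe = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
old_parity = xor(xor(stripe[0], stripe[1]), stripe[2])

# Small write to block 1: two reads (old data, old checksum)
# and two writes (new data, new checksum) instead of one write.
new_block = b"\xff\x00"
new_parity = xor(xor(old_parity, stripe[1]), new_block)
stripe[1] = new_block

# The shortcut gives the same result as recomputing over the whole stripe.
full_recompute = xor(xor(stripe[0], stripe[1]), stripe[2])
print(new_parity == full_recompute)  # True
```

A controller normally hides this overhead by collecting small writes in its cache, which is why poor small-block write results point to the caching logic.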

There are some inexplicable fluctuations of performance when the RAID0 and RAID10 arrays are writing in large data blocks, but these are not serious enough to cause any problems.

The controller copes very well with writing large data blocks to RAID5 and RAID6 arrays. Moreover, the degraded arrays hardly differ from their healthy counterparts. Thus, you can store large files on these arrays but should avoid keeping small files on them.

Sequential Read & Write Patterns

IOMeter is sending a stream of read and write requests with a request queue depth of 4. The size of the requested data block is changed each minute, so that we could see the dependence of an array’s sequential read/write speed on the size of the data block. This test is indicative of the highest speed a disk array can achieve.

The read graphs are excellent! You can see good scalability and it’s all right with small data blocks. The simultaneous reading of large data blocks from both disks of mirror pairs is obvious.

It is all right with the healthy RAID5 and RAID6 arrays, too. The degraded arrays behave in an interesting way. They are inferior to their healthy counterparts in speed but only because they achieve their top speeds on somewhat larger data blocks. However, they are comparable to the healthy arrays in terms of max speed, which is a very satisfying performance.

The RAID0 and RAID10 are somewhat worse at linear writing than at linear reading. This is due to the surprisingly low performance of the 4-disk RAID0 and the inexplicable fluctuations of speed of the 8-disk RAID10 whose max speed looks like an accidental achievement.

The RAID5 and RAID6 arrays have problems with writing, too. Every array likes large data chunks. Even a 1MB block seems to be not big enough for the arrays to show their full speed although a full stripe equals only 512KB even with eight disks. The RAID5 arrays additionally have performance fluctuations with certain data block sizes, which indicates flaws in the controller’s firmware.
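The full-stripe figure can be checked with a little arithmetic. With a 64KB stripe unit, eight disks give 512KB in total, and the amount of user data per full stripe is smaller still once the checksum blocks are subtracted. The helper below is our own illustration.

```python
def full_stripe_data_kb(disks: int, checksum_disks: int, unit_kb: int = 64) -> int:
    """User data held by one full stripe, in kilobytes."""
    return (disks - checksum_disks) * unit_kb


sizes = {
    "RAID0, 8 disks": full_stripe_data_kb(8, 0),
    "RAID5, 8 disks": full_stripe_data_kb(8, 1),
    "RAID6, 8 disks": full_stripe_data_kb(8, 2),
    "RAID5, 4 disks": full_stripe_data_kb(4, 1),
    "RAID6, 4 disks": full_stripe_data_kb(4, 2),
}
for name, kb in sizes.items():
    print(f"{name}: {kb}KB")
```

Even in the largest case a 1MB request covers at least two full stripes, so the arrays would be expected to reach full speed at that block size.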

Multithreaded Read & Write Patterns

The multithreaded tests simulate a situation when there are one to four clients accessing the virtual disk at the same time, the number of outstanding requests varying from 1 to 8. The clients’ address zones do not overlap. We will discuss diagrams for a request queue of 1 as the most illustrative ones. When the queue is 2 or more requests long, the speed doesn’t depend much on the number of applications.
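Non-overlapping address zones can be pictured as an even split of the virtual disk among the clients. The sketch below is an assumption about how such zones might be laid out; the exact layout used by the test is not described here.

```python
def client_zones(total_blocks: int, clients: int) -> list[range]:
    """Split the address space into equal, non-overlapping zones, one per client."""
    zone = total_blocks // clients
    return [range(i * zone, (i + 1) * zone) for i in range(clients)]


zones = client_zones(total_blocks=1000, clients=4)
for number, zone in enumerate(zones, start=1):
    print(f"client {number}: blocks {zone.start}..{zone.stop - 1}")
```

Because each client reads or writes sequentially inside its own zone, the array sees several sequential streams at once and the drive heads have to travel between them.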

When the arrays are reading a single thread, they have predictable standings and deliver high speeds. Of course, you can only see the maximum speeds at a queue depth longer than 1 (like on other controllers we have tested so far in our labs), yet the HighPoint RocketRAID 4320 is very good even at the shortest queue.

The addition of a second thread lowers the speed of every array with one exception. Judging by the increased speed of the 4-disk RAID10, this array identifies the two threads and sends each thread to a separate disk of a mirror pair. Unfortunately, the 8-disk RAID10 does not do the same. So, this feature of the controller’s firmware is not stable.

Funnily enough, the RAID10 arrays are both faster with three threads than with four, although one might think that it would be easier for them to parallelize an even number of threads.

The degraded arrays do not have serious performance hits but are slower than their healthy counterparts.

When the arrays are writing one thread, their standings are logical and predictable and their speeds are high.

When there are two threads to be processed, most of the arrays slow down at the same rate, but there are exceptions: the RAID0 arrays and the ordinary and degraded RAID5 have a smaller performance hit.

There are no changes in the standings when a third and a fourth thread are added. The speeds just get lower.

Web-Server, File-Server, Workstation Patterns

The controllers are tested under loads typical of servers and workstations.

The names of the patterns are self-explanatory. The request queue is limited to 32 requests in the Workstation pattern. Of course, Web-Server and File-Server are nothing but general names. The former pattern emulates the load of any server that is working with read requests only whereas the latter pattern emulates a server that has to perform a certain percent of writes.

Thanks to efficiently selecting the more suitable disk in a mirror pair, the RAID10 arrays are considerably faster than the same-size RAID0 when there are only read requests to be performed. The degraded RAID10 is slower than its healthy counterpart but always ahead of the 4-disk RAID0 and, at low loads, even ahead of the 8-disk RAID0.

It is all simple and clear here: the more disks the array has, the higher its performance. And it does not really matter what type of array (RAID5 or RAID6) you use.

The degraded RAID5 and RAID6 both suffer a twofold performance hit due to the loss of one disk. The loss of a second disk in the RAID6 makes it almost as slow as the single HDD.

The performance ratings show that the RAID10 enjoy a large advantage over same-size arrays of other types at pure reading.

The standings change somewhat with the addition of write requests. The RAID10 are still in the lead under low loads but the RAID0 arrays go ahead at a queue depth of more than 32 requests.

There are significant changes in the second group of arrays, too. The 8-disk RAID6 falls behind the 8-disk RAID5 although the 4-disk arrays of these types are equals. The performance of the degraded RAID6 with one failed disk is much lower than that of the healthy array.

Thanks to good results at short queue depths the RAID10 are but slightly slower than the RAID0 in terms of performance ratings. The RAID5 and RAID6 arrays are slower than in the previous test.

The Workstation pattern has a large portion of writes and greatly different data block sizes. The RAID0 arrays are superior here. The RAID10 can only win at very short queue depths.

There are no changes among the checksum-based arrays. The standings are almost exactly like in the File-Server pattern.

The RAID10 are better according to our performance rating formula.

Take note that not only the degraded but also all of the healthy arrays, except for the 8-disk RAID5, are inferior to the single HDD at such load.

When the test zone is limited to 32GB, the arrays all accelerate and even the degraded RAID6 with two failed disks is ahead of the single HDD. The RAID0 arrays are now obviously better than the RAID10: the reduction of the test zone lowers the time to access data and reduces the effect of selecting the “luckier” disk in a mirror pair.

Performance in FC-Test

For this test two 32GB partitions are created on the virtual disk of the RAID array and formatted in NTFS and then in FAT32. Then, a file-set is created on it. The file-set is then read from the array, copied within the same partition and then copied into another partition. The time taken to perform these operations is measured and the speed of the array is calculated. The Windows and Programs file-sets consist of a large number of small files whereas the other three patterns (ISO, MP3, and Install) include a few large files each.

We’d like to note that the copying test is indicative of the array’s behavior under complex load. In fact, the array is working with two threads (one for reading and one for writing) when copying files.

This test produces too much data, so we will only discuss the results of the Install, ISO and Programs patterns in NTFS which illustrate the most characteristic use of the arrays. You can use the links below to view the other results:

Everything is normal in the Create test except that the degraded RAID10 differs too much from the healthy array for no apparent reason. Perhaps it was trying to write to the failed disk and was waiting for the latter to respond. The arrays are generally as fast as on competitor controllers, but after the results of the sequential writing test we might have expected them to perform better.

Interestingly, the RAID5 and RAID6 arrays are faster than the RAID0 and RAID10 on large files and comparable to them on other file-sets. The degraded arrays are surprisingly good. They are generally as fast as the healthy arrays (excepting the RAID5 with one failed disk in the ISO pattern).

The speeds are considerably higher at reading! Of course, they are not as high as in the sequential reading test, yet the RAID0’s 585MBps with large files is a very good speed (it will take only 8 seconds to read a DVD image, for example). Surprisingly, the degraded RAID10 suffers almost no performance hit.
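The DVD estimate is easy to verify. The sketch assumes a single-layer DVD image of 4.7GB and decimal megabytes; both are our assumptions, as the text does not state the image size.

```python
DVD_IMAGE_BYTES = 4_700_000_000   # assumed single-layer DVD image, 4.7GB
READ_SPEED_BPS = 585 * 1_000_000  # 585MBps from the RAID0 result above

seconds = DVD_IMAGE_BYTES / READ_SPEED_BPS
print(round(seconds, 1))  # 8.0
```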

We have high speeds when reading large files from the healthy RAID5 and RAID6. The degraded arrays are much slower.

There is something odd with small files: the degraded arrays gain the lead and we can’t explain this.

Copying within the same partition or between two partitions is almost the same here: the larger the files, the better the RAID0 are than the RAID10. The degraded RAID10 is not fast. It must be limited by its poor write speed.

The second group of arrays does not show anything exceptional at copying, either. The 8-disk arrays are faster than the 4-disk ones on large files. The RAID5 are ahead of the RAID6 just as expected.

Performance in WinBench 99

Finally, here are data-transfer graphs recorded in WinBench 99.

And these are the data-transfer graphs of the RAID arrays built on the HighPoint RocketRAID 4320 controller:

The following diagram compares the read speeds of the arrays at the beginning and end of the partitions created on them:

Everything seems to be all right here at first sight except that the degraded RAID5 is for some reason slower than the RAID6 without one disk, and only in the faster section of the partition. If you take a look at the graph, you can see a flat stretch in its left part, which indicates some bottleneck. The interface offers enough bandwidth, so we have only two explanations: the firmware algorithms are not perfect in terms of restoring data from checksums and they overload the processor or there is a bottleneck due to the very fact of data recovery through multiple read operations. We think the first explanation is more probable. By the way, you can see a similar horizontal stretch in the graph of the RAID6 with two failed disks.

Conclusion

The HighPoint RocketRAID 4320 is a product of mixed merits, just like the competitors we tested before. We have not seen an ideal RAID controller yet, although we are still searching for one. This particular model is good in three aspects. First, it is very good at reading from mirror pairs in RAID10 arrays. It can effectively select the “luckier” disk (i.e. the one with the lower time to access the requested data) or read large files from both disks simultaneously. The only thing this controller lacks to be ideal for this array type is the ability to effectively parallelize multithreaded load. Second, the HighPoint RocketRAID 4320 is very good at processing files in FC-Test. The 8-disk arrays were as fast as 500MBps. And third, this controller copes with degraded arrays well.

On the downside is the controller’s work with RAID5 and RAID6. These arrays are slow at low loads. Additionally, there are problems when writing to these arrays in small data blocks. If such loads are important to you, you should consider other controllers. Otherwise, the HighPoint RocketRAID 4320 is going to make a good buy.

There is only one brand of SAS RAID controllers we have not yet covered in our reviews, but we’ve already got the missing model for tests. Thus, we will soon be able to compare six controllers from six brands with each other.