by Alexander Yuriev, Nikita Nikolaichev
12/23/2005 | 07:22 AM
We’re going to review the new eight-channel SATA II RAID controller from LSI Logic today, the model name MegaRAID 300-8X. This controller not only expands the MegaRAID series quantitatively (it supports eight disks against the ex-senior model’s six) but qualitatively, too. It supports the new SATA II interface and the high-speed PCI-X bus, has a large cache buffer and a battery to power the cache chips in case of emergency server shutdown. All these features make the new controller a top-end product, of course.
The sheer number of disks supported still plays a very important role, however, and the MegaRAID SATA 300-8X makes it possible to build a reliable and fault-tolerant disk subsystem. For example, the MegaRAID SATA 150-6 allowed building only three RAID1 arrays, leaving no free “hot-spare” drive ready to replace a failed drive in any of the arrays, whereas the new controller makes the following configuration possible:
Of course, if you want to store the database on a RAID10 array (and you are sure to want it if the percentage of write operations with the database is going to be higher than… well, you’ll learn the exact number shortly), the system software and the logs are going to share the same array:
Earlier, the mainboard’s disk controller had to be employed or another RAID controller had to be added to the server to enable such disk configurations, but now you can make do with just a single eight-channel controller, with obvious advantages:
The drawbacks come naturally from the advantages. A centralized disk subsystem is more sensitive to a failure of its keystone, the controller. But we are responsible people and always have a spare controller just in case, aren’t we? :)
Returning to the MegaRAID SATA 300-8X, its functionality has also grown because it now supports the faster PCI-X bus. The new interface is very welcome since the controller supports more disks and the interface bandwidth of each disk has doubled. Clocked at 133MHz, PCI-X can theoretically move twice as much data as 66MHz PCI-64 (1066MB/s against 533MB/s). However, we should note that the peak bandwidth of the bus interface is still smaller than the combined interface bandwidth of the eight disks (8 x 300MB/s = 2400MB/s).
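The arithmetic behind these figures can be checked with a short sketch. It assumes that both buses are 64 bits wide and transfer one word per clock, and that the nominal clocks are 133.33MHz and 66.66MHz; the function name is ours and purely illustrative.

```python
# Back-of-the-envelope check of the bandwidth figures quoted in the text.
# Assumption: both PCI-X and PCI-64 move one 64-bit word per clock cycle.
BUS_WIDTH_BITS = 64


def bus_peak_mb_s(clock_mhz: float, width_bits: int = BUS_WIDTH_BITS) -> float:
    """Theoretical peak bandwidth in MB/s for a parallel bus."""
    return clock_mhz * width_bits / 8


pci_x_133 = bus_peak_mb_s(133.33)  # nominal "133MHz" PCI-X clock
pci_64_66 = bus_peak_mb_s(66.66)   # nominal "66MHz" PCI-64 clock
sata_total = 8 * 300               # eight SATA-300 ports, interface rate only

print(f"PCI-X 133MHz: {pci_x_133:.1f} MB/s")
print(f"PCI-64 66MHz: {pci_64_66:.1f} MB/s")
print(f"8 x SATA-300: {sata_total} MB/s")
print(f"Disk interfaces exceed the bus by a factor of {sata_total / pci_x_133:.2f}")
```

Keep in mind that 300MB/s is the interface rate of a SATA-300 port, not the sustained speed of a drive, so the bus is less of a limit in practice than the ratio suggests.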
The amount of the controller’s own cache memory has doubled over the previous-generation models. The MegaRAID SATA 300-8X carries 128 megabytes of DDR SDRAM with ECC. We couldn’t find a mention of the memory speed in the documentation, but judging by the marking on the chips, we suppose it works at 166MHz, i.e. as DDR333.
The LSIiBBU01 battery, if fully charged, can power the cache memory chips for 72 hours on an emergency shutdown of the server. The small letter “i” in the model name means “intelligent” – the battery is compatible with the Smart Battery Data Specification and can report information about itself and its current charge level to the controller.
You can also use a simpler LSIBBU03 battery which is not intelligent and would keep the cache data safe for only 32 hours when the power is off.
The controller supports SATA-150 as well as SATA-300 drives and allows uniting them into RAID arrays of level 0, 1, 10, 5, and 50. It also supports Tagged Command Queuing and Native Command Queuing, so the good old Raptor 2 is not forgotten.
In the next section we’ll have a closer look at the device.
The device ships in a small black-and-blue box.
The box contains the controller:
And the traditional accessories (eight 1-meter-long SATA cables, installation guide and software):
The MegaRAID SATA 300-8X can work in the following environments:
Drivers for these operating systems as well as driver updates can be found on the manufacturer’s website.
These are the controller’s two main chips:
The I/O chip (bus interface, XOR operations, etc.) is on the left; the eight-channel SATA II chip from Marvell, responsible for communication with the drives, is on the right.
Here’s a larger view of the Intel IOP331 chip:
As we found in the documentation on Intel’s website, this processor can work at frequencies up to 800MHz, but we could not find its actual frequency on LSI Logic’s website. The marking on the chip seems to indicate a frequency of 250MHz (we also take into account the characteristics of the Intel SRCS28X controller, which is a twin brother of the MegaRAID SATA 300-8X).
The processor is equipped with a dual-port DDR SDRAM controller; the maximum memory size supported is 2GB, but as we told you above, the controller carries 128MB of cache memory and you can’t expand this amount further. Well, 128 megabytes of cache is quite enough for controllers of this class.
We mentioned the battery above and here it is, already installed on the controller:
This battery gave us some trouble as will be explained below…
The battery is mounted on a small card that carries all that intelligent electronics and the card is fastened to the controller with screws that someone didn’t forget to put into the battery pack. The battery powers the cache and communicates with the controller through a trapezoid connector you can see in the top left corner of the first snapshot of the controller.
One more thing must be mentioned – some sticky stuff was applied to the controller’s disk connectors to improve the grip and make the connection more reliable. The solution seems useful considering that some system cases from Intel have abnormally strong fans. The sticky material tends to wear off with use, as you can see on some connectors in the last snapshot. So once you’ve connected the cables to the controller, do not touch them if everything works right! :)
But it’s time now we got to our business.
Our testbed was configured as follows:
We tested the controller in FC-Test 1.0 build 13 and IOMeter 2003.02.15. IOMeter was used to check the controller’s ability to process sequential requests to read/write data blocks of a varying size and to measure the controller’s speed in the Database pattern that consists of SQL-like requests. The performance of different arrays was compared in File Server, Web Server and Workstation patterns.
For FC-Test we used our five standard file-sets (Install, ISO, MP3, Programs and Windows) which we wrote to the array, read from it and then copied.
The LSI MegaRAID SATA 300-8X controller worked with firmware FW_813l and version 5.49 drivers. It was installed into a PCI-X/133MHz slot. The WD740GD hard disk drives (Raptor) were installed into an AXX6SATADB rack (SATA 6-Hot-Swap Drive Bay Upgrade Kit). We performed our tests on four drives only, so we didn’t examine RAID50 arrays.
The controller’s settings would take a whole new review to describe. LSI’s SATA controllers come with a very powerful BIOS that offers a staggering amount of various settings that are hard to learn the particulars of even if you’ve got the documentation. Most often we just carried out a series of experiments to pinpoint a new bottleneck and it took a huge amount of time, but eventually we settled on the following combination of settings:
We set the stripe size at 64KB for RAID0 and RAID5 arrays. The controller comes with a large cache buffer so we gave it a chance to show its ability to predict the nature of requests to the arrays by selecting “Adaptive Read Ahead” as the Read Policy.
We also permitted the controller to perform lazy writing to the array using the cache memory. Lazy writing was permitted for the disks, too, in another section of the controller’s BIOS.
The controller’s Cache Policy was set at “Direct”, but as we learned from our experiments, this menu item has very little, if any, effect on the controller’s performance.
We call these the “maximum performance” settings since we only had time to explore the most the controller could do.
Of course, you may want to prohibit the drives from lazy writing in order to improve the reliability of the disk subsystem, or to set the Read Policy at “Normal” to improve the controller’s speed in web-server mode, but it’s just impossible for us to test each and every possible combination of settings.
So we tested the controller at the max-performance settings, and let those who can do more! :)
As an illustration of the hard life of a hardware tester: we didn’t have an LSIiBBU01 battery at first and the controller wouldn’t work with Write Policy = Write Back, which was wise as far as keeping the data on the array safe was concerned. More exactly, the controller did allow setting Write Policy at “Write Back”, but this setting only worked until the next reboot of the server! Our test methodology implies frequent system reboots (the test script begins with a reboot command!) and we spent several days just trying to find out why the controller’s performance remained the same with Write Back and Write Through caching!
So we performed two full test cycles, browsing through all possible types of arrays that could be built out of four hard disk drives.
The modes with enabled and disabled lazy write of the controller are referred to as WriteBack and WriteThrough, respectively. The numbers in the tables refer to all modes, but the diagrams for WriteThrough mode were only built for the four-disk arrays and only in IOMeter tests.
We traditionally start out by checking the controller’s operation with mixed streams of requests.
This pattern sends a stream of requests to read and write 8KB random-address data blocks. By changing the ratio of reads to writes we can check how well the controller’s driver can sort them out. The results of the controller in WriteBack mode are presented in the table:
Let’s view these numbers as diagrams, which will show the dependence of the controller’s speed on the percentage of write requests for queue depths of 1, 16 and 256 requests. For better readability we divide the arrays into two groups.
Under linear load, the arrays have similar speeds in Random Read mode. When there are more write requests to be processed, the efficiency of lazy writing grows and so does the speed of the single drive.
The speed of the RAID0 arrays also grows with the number of disks per array, but it doesn’t scale exactly proportionally to the number of disks even in Random Write mode (there’s a smaller difference between the two- and three-disk arrays than between the three- and four-disk ones).
The mirroring arrays (RAID1 and RAID10) seem to alternate the requests between the two disks of the mirror because their performance improves (above that of the JBOD and the two-disk RAID0) at higher percentages of reads. When the percentage of writes is high, the alternating algorithm is not efficient, as exemplified by the RAID1 whose performance drops suddenly at 70-90% of writes.
In theory, the RAID5 performance should improve as there are more writes to be performed. In this case, however, the speed of both RAID5 arrays remains almost the same in all test modes. Moreover, the three- and four-disk RAID5 do not differ much from each other…
The load becomes heavier:
The higher load makes the dependence between the performance of the array and the number of drives in it conspicuous even at 100% reads, but the situation seems illogical at high percentages of writes. There are two distinctly different groups of arrays (1- and 2-disk as opposed to 3- and 4-disk ones) as if their cache policy were different. Can the controller assign different cache quotas for the arrays? This is a logical supposition as the cache buffer can be divided either proportionally to the number of the drives attached to the controller or to the number and type of the arrays. Since the arrays with fewer drives use lazy writing more aggressively, we suppose that we deal with the first case here.
This effect may actually be due to our method of testing RAID controllers. To make sure we complete the tests with the same hard disk drives (and we deliberately select hard drives from the same series and with the same firmware version), we begin to test the controller with arrays made of the maximum number of disks and then disconnect the unnecessary disks (we physically uninstall them from the rack). After a system reboot the controller reads the configuration of the physical drives and arrays and sees only as many drives as we want to have in the array, so it may well increase the quota for arrays with fewer drives.
In real life, however, there is a high probability that the maximum number of drives the controller supports is attached to it from the start (why else would you buy this exact controller?) and the quotas are assigned depending on this number, irrespective of what arrays you are going to unite the attached drives into. In this case the performance of the arrays may differ from what we’ve got in our tests. Well, we just have to content ourselves with the fact that it is virtually impossible to test the controller in each and every operating situation possible.
Let’s now see what the RAID5 and RAID10 arrays have to show.
The RAID5 arrays are slower at this load than the mirroring arrays. Of course, the performance of all the arrays is going down as the percentage of writes becomes higher, but we couldn’t have expected that the performance of the RAID1 would be near that of the three-disk RAID5.
At high percentages of read requests the RAID1 and RAID10 are faster due to the intelligent selection of the optimal (for the particular request) disk from the mirrored pair. But why are the RAID5 arrays so slow at writing?
Since we again do not see any big difference between the RAID5 arrays made of different numbers of disks at high percentages of writes, we begin to suspect the XOR processor to be the weak link. The checksum calculations seem to be the bottleneck.
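To illustrate why every small write costs a RAID5 array extra work, here is a minimal sketch of the textbook parity scheme. It is generic RAID5 logic, not the firmware of this controller; the block contents and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe of a 4-disk RAID5: three data blocks plus one parity block.
d0, d1, d2 = b"\x11" * 4, b"\x22" * 4, b"\x44" * 4
parity = xor_blocks(d0, d1, d2)

# A small write to d1 is a read-modify-write: the old data and the old
# parity are read, the new parity is computed, then both are written back.
new_d1 = b"\xaa" * 4
new_parity = xor_blocks(parity, d1, new_d1)

# The shortcut gives the same parity as recomputing it over the full stripe.
assert new_parity == xor_blocks(d0, new_d1, d2)

# If the disk holding d1 fails, its block is rebuilt from the survivors.
assert xor_blocks(d0, d2, new_parity) == new_d1
```

In this scheme one logical write turns into two reads, one XOR pass and two writes, whatever the number of disks in the array, which fits the picture of RAID5 write speed hardly depending on the disk count.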
If we compare the mirroring RAID1 and RAID10 arrays with the single drive and the two-disk RAID0, respectively, we see that, just as at linear load, the mirroring arrays are faster when a read request is more probable and slower when a write request is more probable.
The behavior of the arrays doesn’t change much at a queue of 256 requests.
Here we also offer the controller’s results at Write Policy = Write Through.
Now let’s see if it makes sense to buy the battery, i.e. to enable Write Back mode. The number in each table cell is the ratio of the array speed in WriteBack mode to its speed in WriteThrough mode. A bigger number is indicative of a higher value of the WriteBack caching policy (and of the battery, too!)
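The ratio described above is simple to compute. The sketch below uses placeholder speeds that are NOT measurements from this review; the function name and the sample dictionary are hypothetical.

```python
def writeback_gain(wb_speed: float, wt_speed: float) -> float:
    """Ratio of the WriteBack speed to the WriteThrough speed.

    A value above 1 means write-back caching helps,
    a value below 1 means it hurts.
    """
    return wb_speed / wt_speed


# Placeholder figures for illustration only, not results from our tests.
sample = {
    "array A": (1200.0, 400.0),  # (WriteBack, WriteThrough) operations/s
    "array B": (450.0, 500.0),
}

for name, (wb, wt) in sample.items():
    print(f"{name}: WriteBack/WriteThrough = {writeback_gain(wb, wt):.2f}")
```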
Write-caching influences the performance of all the arrays more when there’s a higher percentage of writes and a longer request queue. It helps the RAID0 and RAID10 arrays perform faster, but the RAID5 is slower under high loads when write-caching is enabled.
We built diagrams with four disk arrays in WriteThrough and WriteBack modes for 1, 16 and 256 queue depths.
Quite expectedly, the performance of the RAID0 array degrades in all modes, except Random Read, when write-caching is disabled. The maximum performance hit amounts to 478%, at linear load.
It is more complicated with the RAID5 array. Disabling write-caching at low loads leads to a performance hit. At high loads, on the contrary, you may have a speed gain by turning it off. Keep it in mind, however, that the maximum speed loss from disabled caching is 165%, while the maximum speed gain is only 15%.
We will not venture a guess as to why the RAID5 array may be slower with enabled lazy writing.
The speed of the RAID10 array always decreases if you disable caching, and the higher the percentage of write requests, the bigger the loss.
So we can say that the WriteBack caching policy greatly improves the speed of processing write requests (once again, you must insert the battery into the controller to enable this mode). The only exception is the RAID5 array at very high loads, which are virtually impossible in practice.
IOMeter is sending a stream of read/write requests to the array with request queue depth = 4. Every minute the size of the data block changes, so we can see the dependence of the linear read/write speed on the data block size. The results for WriteBack mode are tabled below:
The diagrams below show the performance/number-of-disks correlation for two groups of RAID arrays.
The RAID0 arrays consisting of more drives enjoy an advantage only when the requested data block is large, i.e. when the controller can split the large data block into several smaller ones and use the hard drives in parallel. The RAID0 arrays did well in this test. The 2- and 3-disk arrays reached their maximum speed as early as the 64KB data block size, and the 4-disk array did so on 128KB blocks. The read speed scales with the number of disks per array almost ideally.
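The splitting of a large request across the drives can be sketched as follows. This assumes plain round-robin striping with the 64KB stripe size we selected; the controller’s real mapping may differ, and the function name is ours.

```python
STRIPE_KB = 64  # stripe size we selected for the RAID0 and RAID5 arrays


def disks_touched(offset_kb: int, size_kb: int, n_disks: int,
                  stripe_kb: int = STRIPE_KB) -> set:
    """Return the disks a request hits under simple round-robin striping."""
    first_stripe = offset_kb // stripe_kb
    last_stripe = (offset_kb + size_kb - 1) // stripe_kb
    return {stripe % n_disks for stripe in range(first_stripe, last_stripe + 1)}


# How many disks of a 4-disk RAID0 serve one aligned request of a given size.
for size_kb in (16, 64, 128, 256):
    busy = disks_touched(0, size_kb, n_disks=4)
    print(f"{size_kb:>4}KB request -> {len(busy)} disk(s) busy")
```

A single request no larger than the stripe lands on one disk only, so in this test the smaller blocks can load several disks only thanks to the four requests that are outstanding at once.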
The mirroring RAID1 and RAID10 arrays perform much worse, their speeds being similar to that of the single drive and the 2-disk RAID0, respectively. However in some modes, especially at large data block sizes, the read speeds of the mirroring arrays are far below the speeds of the arrays the mirrors are made up of.
The RAID5 arrays don’t look very good, either. Their behavior at 64KB and smaller blocks is quite explicable, but the read speed slump on 128KB blocks is beyond our understanding.
The WriteThrough results are given in the table below:
We will next compare the performance of the 4-disk arrays at different caching policies.
Caching should have no effect on the results since there are no write requests here. This is generally so, yet there are one or two data block sizes for each array where the difference between its WriteBack and WriteThrough speeds is bigger than might be explained by measurement errors.
Next we will check the controller’s speed at sequential writing. The speeds of the arrays at different data block sizes are shown below, in WriteBack mode:
We again split the arrays in two groups and create diagrams to show the dependence of the array speed on the data block size.
Just like at sequential reading, the advantages of the RAID0 arrays with more hard drives show up only at large data block sizes. And while the 2-disk RAID0 reached its maximum speed on 64KB blocks, the 3-disk array did the same on 256KB blocks only, and the 4-disk array did not reach the maximum speed even on 1024KB data blocks, possibly limited by the performance of the I/O processor.
The speeds of the mirroring RAID1 and RAID10 arrays almost coincide with the speeds of the single drive and the 2-disk RAID0, respectively, now that there are no read requests to be served. The RAID5 arrays speed up towards 64KB data blocks and then from 128KB blocks onwards.
You can compare these numbers to the WriteThrough mode results:
Here’s a comparative diagram for 4-disk arrays:
100% write requests is the ideal case to show how write-caching influences the performance of an array. Every array is much slower when caching is off!
Let’s now see the controller work in modes typical for the disk subsystems of a file- and web-server. The file-server load comes first:
The following diagrams represent the same numbers visually:
There are only 20% of write requests in this pattern, so all the arrays have very good speeds. The RAID0 performance scales with the number of drives in the array. The speeds of the RAID1 and the RAID10 are near those of the 2-disk and 4-disk RAID0, respectively, which means the algorithm for optimal reading from a mirror works perfectly here. So the 4-disk RAID5 is the only one to act up – at the queue depth of 256. There’s something wrong with this array at high loads in all the patterns.
We are going to compare the arrays by calculating their performance ratings. Since each load has the same probability, the performance rating is the averaged speed of the array under all loads:
The 4-disk RAID0 is quite far ahead of the others. The RAID10 takes the second place and the 3- and 4-disk RAID5 arrays are a little slower than the 2- and 3-disk RAID0 arrays, respectively. The RAID1 is on the last but one position, but is still much faster than the single drive.
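The rating used in this comparison is a plain average, as the following sketch shows. The queue depths and speeds here are placeholders chosen for illustration, not our measurements.

```python
from statistics import mean


def performance_rating(speed_by_queue: dict) -> float:
    """Average speed over all loads, each load having the same weight."""
    return mean(speed_by_queue.values())


# Placeholder numbers: queue depth -> operations per second.
example = {1: 100.0, 4: 200.0, 16: 300.0, 64: 400.0, 256: 500.0}
print(f"Rating: {performance_rating(example):.1f}")
```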
Let’s see if these results change in WriteThrough caching mode.
Even 20% of write requests is enough for all the arrays to perform differently in WriteBack mode than in WriteThrough. The RAID5 acts up again at the queue of 256 requests.
This is how the caching mode affects the performance rating of an array:
So it is clear that turning WriteBack caching off in modes with some write requests results in a performance reduction.
Next goes the web-server-emulating pattern:
The shape of the graphs of the RAID0 arrays hasn’t changed much as compared with the File Server pattern, but the speed of the arrays has become lower. The RAID5 arrays have become much faster since the Web Server pattern has no write requests and is the optimal operating mode for such arrays. This is also why the mirroring RAID1 and RAID10 arrays, which alternate read requests between the disks of the mirrored pair, are faster than the 2- and 4-disk RAID0 arrays.
We calculated the performance ratings of the arrays like in the File Server pattern, i.e. by averaging their speeds under different loads:
The RAID10 and RAID5 seized the opportunity to work without any write requests, and the RAID5 arrays are not only one step higher in the results table but have also nearly reached the performance of the RAID0 arrays built out of the same number of disks. The RAID10 has the highest performance, while the RAID1 is far ahead of the 2-disk RAID0.
Here are the results of the controller in WriteThrough mode:
These graphs show that lazy writing has no effect on the performance of the arrays in this mode, i.e. when there are no write requests at all.
We’d also want to single out the brilliant performance of the RAID10 array which is faster than the RAID0 at all loads!
Lazy writing should not affect the speed of an array in this writes-free pattern and this is exactly what we see in the diagram.
The Workstation pattern emulates the user’s intensive work in various applications in NTFS5.
It’s funny but we found a curious thing in the arrays’ behavior at very low loads – the characteristic ledge in the graph. We saw this ledge when we first reviewed the WD740GD hard disk drive in our article called WD740GD aka Raptor 2 Hard Disk Drive Review.
It’s all normal with the RAID0 array – the more drives it consists of, the faster it is. The performance of the RAID1 is much higher than that of the single drive, while the RAID10 is always, even if by the narrowest margin, faster than the 3-disk RAID0. The RAID5 arrays are not very fast since the Workstation pattern contains many random write requests which negatively affect the performance of RAID5.
We compare the different RAID arrays by calculating their performance rating by the following formula:
Performance = Total I/O (queue=1)/1 + Total I/O (queue=2)/2 + Total I/O (queue=4)/4 + Total I/O (queue=8)/8 + Total I/O (queue=16)/16 + Total I/O (queue=32)/32.
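The formula above translates directly into code. In this sketch the function name and the input dictionary are ours, and the input values are placeholders rather than results from the review.

```python
QUEUE_DEPTHS = (1, 2, 4, 8, 16, 32)


def workstation_rating(total_io: dict) -> float:
    """Sum of Total I/O at each queue depth divided by that queue depth.

    Dividing by the queue depth gives the short queues, which are the
    most typical for a workstation, the greatest weight.
    """
    return sum(total_io[q] / q for q in QUEUE_DEPTHS)


# Placeholder input: the same Total I/O of 320 at every queue depth.
flat = {q: 320.0 for q in QUEUE_DEPTHS}
print(f"Rating: {workstation_rating(flat):.1f}")
```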
It was to be expected that the RAID5 arrays would only be faster than the single drive – because of the write requests. The RAID0 arrays rank up according to the number of drives they are comprised of. The mirroring RAID1 and RAID10 take their places right after the RAID0 arrays with the same number of drives.
Let’s see how the controller’s caching policy affects the performance of the arrays in this pattern:
Disabled caching affects the speed of each array negatively – the performance is lower by almost a third.
With this pattern we will explore the controller’s ability to perform multi-threaded sequential reading and writing by emulating several applications demanding large files all at the same time. IOMeter’s test agent (called Worker) that emulates an application sequentially reads/writes 64KB blocks of data starting from a certain sector. By increasing the number of requests coming from one Worker (from 1 to 8 stepping 1) we examine the ability of the drive or controller to reorder the requests (to glue several requests to sequentially located data together into one request). By increasing the number of the Workers we put a heavier load on the disk subsystem – like in the real environment when several applications are competing to access the drives. Each Worker works with its own data (i.e. the data addresses requested by the Workers are different).
The diagram below shows the speed of the arrays at a load of 1 request as the most probable in real environment. The RAID0 arrays are all marked with the same color, the RAID5 arrays with another, and the mirroring arrays with a third color. The order of the arrays in the diagram corresponds to their order in the legend: the higher the bar, the more disks there are in the array.
The read speed does not clearly depend on the number of drives with RAID0 or RAID5 at the 1-request load. The single drive proved to be faster than the RAID1, while the RAID10 had the highest speed. Considering that the RAID0 didn’t do any better, neither the controller nor the drives perform anticipatory reading.
When there are two threads to be processed simultaneously, the RAID1 is on top (it’s the ideal case for it because each disk gives out its own data). The other arrays have become slower, although the speed/number-of-disks correlation with RAID0 and RAID5 has become apparent. Of course, when there are two threads, the heads of the hard drives have to constantly move between the two operational zones, so the array can’t have the same speed as at linear reading.
The hard disk drives we use in our tests are server-optimized, i.e. have a very small access time. That’s why you shouldn’t expect anticipatory reading from them. But it is not quite clear why the controller doesn’t do it when we had selected Adaptive Read Ahead in the settings. Where’s the adaptability of the algorithm?
When there are even more threads, the speeds of the RAID0 and RAID5 arrays grow, while the mirroring arrays slow down.
Next goes the multi-threaded writing test.
The mirroring arrays were not very successful at writing. They are slower than the single drive and the 2-disk RAID0 almost everywhere. This load also almost killed the RAID5 arrays – the 4-disk RAID5 did well when processing one thread, but that was the only gleam of intelligence.
The RAID0 arrays of two and four disks processed one and two threads quite well, but the 3-disk array managed to fall behind the others. With three and four threads we again see the speed of the 2- and 4-disk RAID0 arrays scale up depending on the number of disks per array while the same 3-disk RAID0 again slows down inexplicably.
If you are interested, here are the results of the controller in WriteThrough mode:
Now we’re going to check the controller in a single-user environment that operates with files rather than sectors.
We stick to our traditional methodology of using FC-Test: we create two logical volumes, 32GB each, on the array and format them in NTFS and then in FAT32. We create a set of files on the first volume, and then this pattern is read from the array, then copied into a folder on the same partition (Copy Near, within one and the same logical volume), and finally copied onto another partition (Copy Far).
The test system is rebooted before each test to avoid the influence of the OS’s caching on the results. We use five file patterns here:
Let’s start with NTFS. We’re going to examine the results of each test action for each pattern independently due to the abundance of data. The first action is the creation of a set of files on the array.
We built diagrams for the three most curious cases:
The RAID0 arrays are quite expectedly the fastest. The RAID5 ones are rather fast at processing the Install and ISO patterns, but slow down on the Programs files which are too small. And still, the performance of the RAID0 and RAID5 arrays scales up depending on the number of the disks in the array as can be seen for each test pattern.
Writing to a RAID1 array takes more time than writing to the single drive, and the RAID10 doesn’t have a very high write speed, either.
Next goes the reading test:
And the results are rather strange. The speeds of the 3- and 4-disk RAID0 and RAID5 arrays are very similar and, in fact, disappointing. Can the Adaptive Read Ahead setting be spoiling the day once again?
The best performance is delivered by the 2-disk RAID0 and the RAID10 (which can be viewed as a 2-disk RAID0 at sequential reading). The read speed of the RAID1 is the same as the speed of the single drive.
Let’s now check the copying speed:
The RAID0 is the best at copying files and its speed scales up well depending on the number of disks in the array. However the speed is still not so very high – today’s single desktop drives copy data much faster within themselves.
The RAID5 arrays are somewhat slower, but their speed also scales up depending on the number of disks per array. The mirroring arrays are only noticeably faster than the single drive and the 2-disk RAID0, respectively, when they process the Programs pattern.
The results of the second test cycle do not differ much from the previous numbers.
Now we switch to FAT32.
The speeds of the drives are higher than in NTFS, except for the incomprehensible slowdown of the 2-disk RAID0 with the Install pattern.
The read speed of the RAID1 is almost the same as that of the single drive. The other arrays have almost the same speeds.
We have no questions about the copying test.
Here are the results of the controller with disabled write-caching:
We just took the linear read graphs in WinBench 99:
And we have absolutely no complaints about these graphs. :)
The MegaRAID SATA 300-8X has performed generally well in our tests. It was especially good in the patterns that emulated the load on a file and web server.
The mirroring arrays are all right, too, but it was not all so well with RAID5. The XOR processor may be too slow or there may be some other reason, but the performance of the RAID5 array does not depend on the number of disks per array at high percentages of write requests. Alas, we only had four hard drives and could not examine this array type in more detail.
The only explanation that we can think of is that the controller was intended for 7200rpm SATA drives and the I/O processor was selected to match the typical access time of such drives. The high disk access time of the drives rather than the insufficient computational capacity of the XOR processor would be the bottleneck in that case.
So our recommendation is simple. Use this controller to build RAID1 and RAID10 arrays and make sure you have the LSIiBBU01 battery!