by Aleksey Meyev
05/30/2009 | 11:07 AM
LSI currently offers three series of SAS RAID controllers: Entry Line, Value Line and Feature Line. The Entry Line MegaRAID SAS 82xx series is the baseline not only in its name but also in its hardware properties. Its controllers do not have dedicated processors or buffer memory and do not support RAID6. Thus, this series is meant for people who don’t need maximum performance or high data security. The other two series offer full-fledged controllers but the Feature Line 88xx series is more functional. It comes with more memory and offers external connectors. Thus, it supports more devices connected via an expander.
But today we are going to talk about a controller from the Value Line. Its model name is 8708EM2. Like many other LSI controllers, it has a twin that sells under the name of Intel SRCSASBB8I. You can see for yourself that these two controllers are much more than just similar-looking.
The Value Line currently includes four controllers based on the latest generation of chips. The models differ in the number of ports (8 or 4) and form-factor: the EM2 models are shorter, MD2-compliant and have a PCI Express x8 interface whereas the ELP models are longer and use PCI Express x4.
Following the general revision of its corporate style, the company has changed the product package design. Instead of gloomy blue and black hues the box is now painted a sprightly mix of orange and white. The box may seem even gaudy but at least it is sure to catch the customer’s eye.
The controller is based on the LSISAS1078 RAID-on-chip processor, which has PowerPC architecture and a clock rate of 500MHz. This does not look high after the Adaptec ASR-5805 with its 1.2GHz dual-core chip but, on the other hand, the 3ware 9690SA-8I controller coped well with RAID arrays at half that frequency by exploiting the advantages of its processor architecture. So, let us wait for the practical tests before we judge this controller’s speed.
As opposed to the 88xx series controllers that are equipped with 512MB or 256MB of memory, the 8708EM2 comes with only 128 megabytes of DDR2 SDRAM clocked at 667MHz.
The box contains a brief user manual, a disc with drivers, a back-panel bracket, and two cables. Each cable allows connecting up to four devices to each of the controller’s SFF-8087 connectors.
The controllers from LSI’s two senior product series support all the popular array types including:
There is everything you need to build a RAID for your particular applications. The lack of RAID1E and RAID5EE, let alone even more exotic array types, should not be a problem for most users.
As usual, we ran the controller with a BBU and strongly recommend that you use one, too. Your data, and the time you would spend recovering it, are worth far more than this simple accessory.
We used an LSIiBBU06, which is fastened right to the controller’s PCB with three small standoffs. Take note that the other 87xx and 88xx series controllers employ different BBUs.
And finally, we’d like to say a few words about the software aspect. The manufacturer’s website offers a broad choice of drivers for various OSes. You can also download OS-specific versions of MegaCLI, the exclusive tool for managing the controller and arrays from within an OS. Its functionality is broad enough that you will never need to enter the controller’s BIOS, but its interface is far from friendly: the menus are heavily context-dependent, and it is hard to find the item responsible for a particular operation. One gets used to the menus over time, though.
The following benchmarks were used:
The controller was installed into the mainboard’s PCI-Express x8 slot. We used Fujitsu MBA3073RC hard disk drives for this test session. They were installed into the standard boxes of the SC5200 system case and fastened with four screws at the bottom. The controller was tested with four and eight HDDs in the following modes:
As we try to cover all possible array types, we will publish the results of degraded arrays. A degraded array is a redundant array in which one or more disks (depending on the array type) have failed but the array still stores data and performs its duties.
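For reference, the fault-tolerance rules behind the degraded configurations we test can be summed up in a short sketch (the names and structure below are our own illustration, not anything from LSI’s tools):

```python
# Fault tolerance of the array types covered in this review.
# For RAID10, one failed disk is always survivable; more are survivable
# only if they hit different mirror pairs, so we list the guaranteed figure.
RAID_FAULT_TOLERANCE = {
    "RAID0":  0,  # striping only: any disk failure destroys the array
    "RAID10": 1,  # guaranteed to survive one failed disk
    "RAID5":  1,  # single parity: survives exactly one failed disk
    "RAID6":  2,  # dual parity: survives any two failed disks
}

def is_degraded_but_alive(raid_level: str, failed_disks: int) -> bool:
    """True if the array has lost disks yet still stores data."""
    return 0 < failed_disks <= RAID_FAULT_TOLERANCE[raid_level]

print(is_degraded_but_alive("RAID6", 2))  # True: dual parity survives two failures
print(is_degraded_but_alive("RAID5", 2))  # False: single parity does not
```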
For comparison’s sake, we publish the results of a single Fujitsu MBA3073RC hard disk on an LSI SAS3041E-R controller as a kind of a reference point. We want to note that this combination of the HDD and controller has one problem: its speed of writing in FC-Test is very low.
By default the controller suggests a stripe size of 64KB. This is the size we always use in our tests, so we did not have to change the default setting.
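To illustrate what the stripe size means, here is a minimal, textbook sketch of how a striped array maps a logical byte offset to a disk (our own simplification, not LSI’s actual firmware logic):

```python
STRIPE_SIZE = 64 * 1024  # the controller's default stripe size, in bytes

def locate_block(offset_bytes: int, n_disks: int):
    """Map a logical byte offset on a RAID0 array to (disk index, byte
    offset on that disk), assuming simple round-robin striping."""
    stripe_index = offset_bytes // STRIPE_SIZE
    within_stripe = offset_bytes % STRIPE_SIZE
    disk = stripe_index % n_disks        # stripes rotate across the disks
    row = stripe_index // n_disks        # full rows of stripes already laid out
    return disk, row * STRIPE_SIZE + within_stripe

# The first 64KB goes to disk 0, the next 64KB to disk 1, and so on;
# after eight stripes on an 8-disk array we wrap back to disk 0.
print(locate_block(0, 8))                 # (0, 0)
print(locate_block(64 * 1024, 8))         # (1, 0)
print(locate_block(8 * 64 * 1024, 8))     # (0, 65536)
```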
We used the latest BIOS available at the manufacturer’s website for the controller and installed the latest drivers. The BIOS was version 9.1.1-0013 and the driver was version 126.96.36.199.
In the Database pattern the disk array is processing a stream of requests to read and write 8KB random-address data blocks. The ratio of read to write requests is changing from 0% to 100% (stepping 10%) throughout the test while the request queue depth varies from 1 to 256.
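As a rough illustration of this load, here is a simplified stand-in for the Database pattern (the function and its parameters are our own, not part of IOMeter):

```python
import random

def database_pattern(reads_percent, n_requests, span_bytes,
                     block=8 * 1024, seed=0):
    """Yield (op, offset) pairs mimicking the Database pattern:
    random-address 8KB requests with a given read/write mix.
    A simplified stand-in for IOMeter, for illustration only."""
    rng = random.Random(seed)
    n_blocks = span_bytes // block
    for _ in range(n_requests):
        op = "read" if rng.randrange(100) < reads_percent else "write"
        yield op, rng.randrange(n_blocks) * block  # block-aligned offset

# A 70%-reads mix over a 1GB span, as one step of the 0-100% sweep:
requests = list(database_pattern(70, 1000, 1 << 30))
print(len(requests))  # 1000
```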
We’ll be discussing graphs and diagrams but you can view the data in tabled format using the following links:
Everything would be perfectly normal at a queue depth of 1 if it were not for the 8-disk RAID10. Both the healthy and degraded arrays are too slow at 100% writes. The 4-disk RAID10 suffers no performance hit at 100% writes and, as a result, delivers the same speed as the 8-disk RAID10. We cannot find an explanation for this.
The checksum-based arrays are all right, too. The low response time of the HDDs we use and the high performance of the controller help the RAID arrays show high speeds at high percentages of writes. Let us remind you that RAID arrays built out of SATA drives on previous-generation controllers slowed down, rather than accelerated, in this test as the percentage of writes increased.
Comparing the LSI controller with the other controllers we have tested in our labs, we can note that the LSI handles the degraded arrays well while the healthy RAID5 and RAID6 arrays built out of a large number of disks are not very good on it. Is it the processor’s fault we wonder?
We increase the queue depth to 16 outstanding requests and see the graphs rise. The RAID10 arrays have problems at writing again; it looks like deferred writing is turned off for them. On the other hand, they can effectively find the luckier disk in a mirror: the RAID10 arrays are much faster than the same-size RAID0 at high percentages of reads.
The degraded RAID10 behaves in an interesting manner. It goes neck and neck with the healthy array at writing but slows down at reading because it does not have the opportunity to choose the better disk in one of the mirrors. Still, it is better than the RAID0 at pure reading, which is an excellent result.
The RAID5 and RAID6 arrays are not quite good at writing, either. Their performance is too low at high percentages of writes, and the 4-disk arrays lose to the single HDD. The competitor controllers did better in this test.
The arrays are all right at reading, though. Take note of the performance hit provoked by the degradation: the 8-disk arrays with one failed disk are only as fast as the 4-disk arrays at reading.
The further increase in queue depth does not help the RAID10 arrays: they still show no deferred writing. The degraded array does not feel good at all: the loss of one disk has a fatal effect on its read speed.
The RAID5 and RAID6 arrays draw very neat, nice-looking graphs at the maximum queue depth. However, their performance at high percentages of writes is somewhat lower than we might expect based on the results of the competitor controllers. The degraded arrays have good read speeds, by the way. The controller seems to have a high-performance processor and effective firmware algorithms, but it has too little memory to do deferred writing efficiently. The competitor controllers had more memory on board.
IOMeter is sending a stream of requests to read and write 512-byte data blocks with a request queue depth of 1 for 10 minutes. The disk subsystem processes over 60 thousand requests, so the resulting response time doesn’t depend on the amount of cache memory.
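The measured figure is simply the mean service time per request over the whole run, as in this illustrative helper (the sample values are hypothetical):

```python
def mean_response_time_ms(latencies_ms):
    """Average service time per request; over tens of thousands of requests
    the cache's contribution averages out (illustrative helper, not IOMeter)."""
    return sum(latencies_ms) / len(latencies_ms)

# Hypothetical per-request read latencies for a single 15K RPM SAS drive:
print(mean_response_time_ms([5.5, 6.5, 6.0, 6.0]))  # 6.0
```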
The read response time of the RAID10 arrays is lower than that of the single HDD thanks to effective reading from the mirror pairs. The 4-disk array is 1 millisecond faster, which is a large gap considering that the single HDD has a read response of 6 milliseconds. We can’t but applaud the programmers of the controller’s firmware, yet it is unclear why it is the 4-disk rather than the 8-disk array that turns in the best result.
The arrays of the other types are worse than the single HDD, the 8-disk ones being better than the 4-disk ones. This is all normal, however, because the controller cannot be absolutely transparent in terms of response time. It is only unclear why the RAID0 has the maximum response time among the different array types.
Theoretically, the write response time is determined by the size of the combined cache of the controller and HDDs: the requests go into the cache and then scatter among the disks. Here, the arrays all comply with the theory, except for the 4-disk RAID0, whose response time is too high (it should equal that of the 8-disk RAID10). The controller’s write response is overall far from record-breaking: the competitor products come with more cache.
Now we’ll see the dependence of the disk subsystems’ performance in random read and write modes on the data chunk size.
We will discuss the results of the disk subsystems at processing random-address data in two ways, based on our updated methodology. For small data chunks we will draw graphs showing how the number of operations per second depends on the data chunk size. For large chunks we will compare performance in terms of data-transfer rate in megabytes per second. This approach helps us evaluate the disk subsystem’s performance in two typical scenarios: working with small data chunks is typical of databases, where the number of operations per second matters more than sheer speed, whereas working with large data blocks is close to working with small files, for which the traditional measurement of speed in megabytes per second is more relevant.
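The two views are related through the block size; a small sketch of the conversion (our own helper, assuming decimal megabytes):

```python
def iops_to_mbps(iops: float, block_bytes: int) -> float:
    """Convert operations per second into megabytes per second for a
    given block size - the two views of performance used in this section."""
    return iops * block_bytes / 1_000_000

def mbps_to_iops(mbps: float, block_bytes: int) -> float:
    """The reverse conversion, for large-block workloads."""
    return mbps * 1_000_000 / block_bytes

# 200 ops/s on 8KB blocks is only ~1.6 MB/s: IOPS is the telling metric
# for databases, while MB/s suits large, file-like transfers.
print(iops_to_mbps(200, 8 * 1024))         # 1.6384
print(mbps_to_iops(100, 2 * 1024 * 1024))  # ~47.7 ops/s on 2MB blocks
```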
We will start out with reading.
The efficient reading from mirror pairs shows up in the test of reading in small data chunks: this is the reason why the controller can yield up to 200 operations per second on the RAID10. Even the degraded RAID10 has acceptable performance.
There is some confusion among the checksum-based arrays: the 4-disk RAID5 speeds up on 2KB data blocks while the degraded RAID6 (with two failed disks) suffers a performance hit on 8KB data blocks. Take note that the 8-disk arrays enjoy a larger advantage over the others as the data chunks grow bigger.
The speed of sequential reading becomes an important factor when the arrays are processing large data blocks. And we’ve got some odd results here. The 4-disk RAID0 can’t leave the single HDD behind whatever the data block size. The 8-disk RAID0 is no different from the other arrays even on 2MB data blocks, although it should be faster at that size.
The healthy RAID5 and RAID6 are all right in terms of speed: their graphs are a treat for our eyes. But the degraded arrays are only as fast as the single HDD – we want more!
Writing comes next.
This diagram usually shows scalability according to the principle “the more disks and the more cache, the higher the resulting performance”. The LSI controller has it its own way: the 8-disk RAID10 arrays behave oddly, slowing down on every small data block except the smallest one (512 bytes). As a result, they are occasionally slower than the 4-disk array! However, the 4-disk RAID0 is even worse: its speed is no higher than that of the single HDD. At the same time, the 8-disk RAID0 and the 4-disk RAID10 show normal performance.
The rotated-parity arrays are all right. The graphs are indicative of proper scalability, but the controller still lacks the cache memory to be competitive with the opponent products we have tested before.
The arrays go on behaving oddly when writing in large data blocks. This time the 8-disk RAID10 (both healthy and degraded) are good whereas the 8-disk RAID0 depends too much on the size of the data chunk. Particularly, it shows a strong dislike of 512KB data chunks. The 4-disk RAID0 is slower at writing than the single HDD, and the 4-disk RAID10 is not much better, either.
The RAID6 arrays show problems when writing in large data blocks. The speed is too low, except for the degraded RAID6 with one failed disk.
IOMeter is sending a stream of read and write requests with a request queue depth of 4. The size of the requested data block is changed each minute, so that we could see the dependence of an array’s sequential read/write speed on the size of the data block. This test is indicative of the highest speed a disk array can achieve.
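Before looking at the graphs, it helps to keep the theoretical ceilings in mind. A back-of-the-envelope sketch, assuming perfect scaling and an assumed per-drive outer-zone speed:

```python
def theoretical_sequential_mbps(single_disk_mbps: float, level: str,
                                n_disks: int) -> float:
    """Upper bound on sequential read speed, assuming the array scales
    perfectly with its data disks (an estimate, not a guarantee)."""
    if level == "RAID0":
        data_disks = n_disks
    elif level == "RAID5":
        data_disks = n_disks - 1   # one disk's worth of parity
    elif level == "RAID6":
        data_disks = n_disks - 2   # two disks' worth of parity
    elif level == "RAID10":
        data_disks = n_disks // 2  # each mirror pair holds one copy of the data
    else:
        raise ValueError(level)
    return single_disk_mbps * data_disks

# With an assumed ~125 MB/s outer-zone speed per drive, eight disks in
# RAID0 top out near the ~1000 MB/s observed in this test:
print(theoretical_sequential_mbps(125, "RAID0", 8))  # 1000.0
```

A controller that also reads from the second disk of each mirror, as this one tries to on very large blocks, can in principle exceed the RAID10 figure above.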
The 8-disk RAID0 reaches almost 1000MBps on large data blocks, which is an excellent result. The scalability is all right, too. The controller even tries to read from both disks in the mirror pairs on very large data blocks, but the performance gain is small. The degraded RAID10 passes this test with almost no loss of speed.
The checksum-based arrays cope with reading well: their speeds are just what you can expect if you know the speed of the single HDD. The degraded arrays are surprisingly slow here: the controller is not quite good at restoring data from checksums.
We see inexplicable fluctuations of speed of every array at writing. The 8-disk RAID0 does not reach its theoretical speed. The 4-disk RAID10 and RAID0 have problems with medium-sized and large data chunks, respectively. The top speed of the 8-disk RAID0 is good, but it is the slowest of all on small data blocks and we have no explanation for that.
The RAID5 and RAID6 arrays have problems with writing, too. There are flaws in the controller’s firmware. This is the only explanation why the degraded arrays are superior to the healthy ones. The 8-disk RAID6 is too slow on large data blocks.
The multithreaded tests simulate a situation when there are one to four clients accessing the virtual disk at the same time, the number of outstanding requests varying from 1 to 8. The clients’ address zones do not overlap. We’ll discuss diagrams for a request queue of 1 as the most illustrative ones. When the queue is 2 or more requests long, the speed doesn’t depend much on the number of applications.
Like other RAID controllers of its class with these HDDs, the LSI 8708EM2 produces its top speed only at request queue depths longer than 1. When there is one read thread, the 8-disk arrays show the best performance, the only exception being the 8-disk RAID10. As usual, the picture changes dramatically when we add a second thread. The same three 8-disk arrays still have high speeds, but the 4-disk RAID0 has sped up twofold and now competes with the leaders. Interestingly, it has left behind the 8-disk RAID0, which has suffered a twofold performance hit. So, can the controller process two threads in parallel in mirror pairs? Perhaps it can, but it does not always do so.
The degraded RAID5 and RAID6 survive the addition of a second thread nicely. They even increase their speed. It looks like the second thread provides an opportunity for them to increase the queue and optimize their operation.
The 8-disk RAID10 wakes up at three threads and outperforms its 4-disk counterpart. The degraded arrays are still very good while the 8-disk RAID0 takes first place: the additional threads make up for the short queue depth for it, so it does not slow down at all.
Save for the RAID10, all of the 8-disk arrays, including the degraded ones, accelerate at four threads. This controller obviously likes to work with high loads and many disks. The RAID10 arrays lose their quickness, finding it hard to handle four read threads simultaneously.
The controller gives us yet another portion of oddities at writing: the degraded arrays are in the lead when processing one write thread. The 8-disk RAID0 is the slowest of all.
The second write thread provokes changes in the standings, but we don’t see any logic in them. Some arrays get faster and some slower, irrespective of the number of disks or array type.
When we add more write threads, the picture becomes somewhat clearer. The RAID5 and RAID6 arrays consisting of a large number of disks cope best with multithreaded writing. The RAID10 do not like this load and are even slower than the single HDD.
The controllers are tested under loads typical of servers and workstations.
The names of the patterns are self-explanatory. The request queue is limited to 32 requests in the Workstation pattern. Of course, Web-Server and File-Server are nothing but general names. The former pattern emulates the load of any server that is working with read requests only whereas the latter pattern emulates a server that has to perform a certain percent of writes.
The 8-disk RAID10 is unrivalled at pure reading as it can effectively read from both disks of a mirror couple. But you can note that the failure of one disk worsens its performance greatly.
The RAID5 and RAID6 arrays go neck and neck. The failure of one disk lowers their performance by 50%. The loss of a second disk cuts the performance of the RAID6 by another 50%.
The performance ratings indicate that the number of disks per array is more important under such load than the array type. However, RAID10 is preferable, the other parameters being equal, if you are looking for maximum speed.
The RAID10 is not a leader anymore when there are write requests to be performed. It is still good at short queue depths but loses to the RAID0 at long queue depths.
While the 4-disk RAID5 and RAID6 are similar in terms of performance, the 8-disk RAID5 is ahead of the same-size RAID6. The degraded arrays slow down greatly, yet even the RAID6 with two failed disks performs better than on competitor controllers.
Lower queue depths have higher weights in our performance rating formula, but this doesn’t help the RAID10 get closer to the RAID0. The checksum-based arrays find this test difficult: processing the checksums has a negative effect on their performance.
The standings are the same here as in the File-Server test. The positions of the two degraded arrays have changed somewhat: the RAID10 is now closer to the healthy RAID10, and the RAID6 with two failed disks has sunk to the level of the single HDD.
The RAID6 arrays have moved down to lower places. We don’t think they should be used in highly loaded workstations, though.
When the test zone is limited to 32GB, each array increases its gap from the single HDD.
For this test two 32GB partitions are created on the virtual disk of the RAID array and formatted in NTFS and then in FAT32. Then, a file-set is created on it. The file-set is then read from the array, copied within the same partition and then copied into another partition. The time taken to perform these operations is measured and the speed of the array is calculated. The Windows and Programs file-sets consist of a large number of small files whereas the other three patterns (ISO, MP3, and Install) include a few large files each.
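The reported speed is simply the file-set size divided by the time the operation took, as in this illustrative helper (the figures below are hypothetical):

```python
def fc_test_speed_mbps(fileset_bytes: int, seconds: float) -> float:
    """FC-Test reports speed as total file-set size divided by the time
    the operation (create, read, or copy) took. Simple illustration."""
    return fileset_bytes / seconds / 1_000_000

# A hypothetical example: a 600MB file-set read in 4 seconds -> 150 MB/s.
print(fc_test_speed_mbps(600_000_000, 4.0))  # 150.0
```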
We’d like to note that the copying test is indicative of the array’s behavior under complex load. In fact, the array is working with two threads (one for reading and one for writing) when copying files.
This test produces too much data, so we will only discuss the results of the Install, ISO and Programs patterns in NTFS which illustrate the most characteristic use of the arrays. You can use the links below to view the other results:
The arrays take their proper standings when creating files but their speeds are rather low. Of course, the RAID0’s 254MBps looks good, but it is an 8-disk array and we could expect it to be three times as fast!
The same goes for the second set of arrays: the standings are normal except for the too modest results of the 8-disk RAID6, but the speeds are overall too low.
Thus, writing is definitely a weak point of this controller, at least with the current version of its firmware.
It is somewhat better at reading, but there are a few odd things here. For example, the 8-disk RAID0 is inferior to the 4-disk RAID0 with the Install pattern. The RAID10 arrays all have the same speed with this pattern, which is strange.
Reading from the checksum-based arrays is not without surprises, either. The Install file-set is read faster by the 4-disk RAID5 than by the 8-disk one, contrary to what we might expect.
Take note that the degraded arrays deliver faster speeds with the large files of the ISO pattern than the single HDD, but lose to the latter with small files.
The 8-disk RAID0 is unexpectedly good at copying files within the same partition. While the other arrays show similar speeds, matching same-type arrays built on competitor controllers, this array delivers twice the others’ performance.
The RAID5 and RAID6 arrays pass the copying test successfully. The standings are logical and the arrays all deliver high speeds, except for the 4-disk RAID5 in the ISO pattern.
The results do not change much when the arrays have to copy into another partition. Every array copes well with this job.
Finally, here are data-transfer graphs recorded in WinBench 99. We could not record it for the degraded RAID6 with two failed disks as the program would issue an error message.
And these are the data-transfer graphs of the RAID arrays built on the LSI 8708EM2 controller:
The following diagram compares the read speeds of the arrays at the beginning and end of the partitions created on them:
Everything is neat in this diagram. The speed grows from array to array in a predictable way and with almost ideal scalability (the 8-disk RAID5 spoils the picture somewhat).
The LSI MegaRAID SAS 8708EM2 RAID-controller is very good at reading. It prefers heavy loads and arrays with quite a lot of disks. It is very effective in choosing what disk in a mirror pair will take less time to access the required data. It is also good at reading from degraded arrays: its processor is fast enough to achieve a high speed even from a RAID6 with two failed disks. Multithreaded reading is fast on this controller, too.
Writing is, on the contrary, the weak point of the LSI MegaRAID SAS 8708EM2 due to the small amount of cache memory or firmware flaws. This controller won’t be a good choice if your applications do a lot of writing.