by Ilya Gavrichenkov
12/23/2002 | 12:00 AM
The IT industry sees few revolutions, for a very simple reason. The market is oriented toward cost-effectiveness rather than raw performance, while most breakthroughs are expensive and at first usually fail to match even the performance of the existing previous-generation high-end solutions. Still, although cost-effectiveness and the price-to-performance ratio may be the market's guiding factors, the market also vitally needs to move forward. That's why the industry is constantly advancing into new fields, as long as they are not unacceptably expensive.
The last few years of memory market development are a good example. The natural flow of the DRAM industry was interrupted by the sudden emergence of the Intel & Rambus factor. We won't dwell upon this story now: its details are well known to everyone (see our article called Rambus: the Story of One Company). Let's just recall the outcome: the expensive, proprietary standard, which could boast no cost-effectiveness because it had been developed by a single company, was rejected by the market in favor of DDR SDRAM. DDR SDRAM defeated RDRAM in an absolutely fair game (at least, on DDR's part). So now there is a question: what standard should succeed it?
Considering all this, there could be no doubt about the choice the market was going to make. The market needed something faster than DDR, but at the same time as compatible as possible with the already built infrastructure, so that the transition would be as smooth and inexpensive as possible. The name we got captures the gist: DDR II. JEDEC (to be more exact, the Future DRAM Task Group formed by its JC-42 committee, which deals with DRAM) started developing it as far back as April 1998, when DDR itself hadn't even reached the PC market. By now, all the major DRAM manufacturers have already rolled out their DDR II chips, and the group includes over a hundred companies, whose interests range from making CPUs to producing test equipment for memory modules.
This process is quite the opposite of RDRAM's development, as it is as open as possible. The working group is open even to non-members of JEDEC, and the standard itself was strictly positioned as open and royalty-free from the very beginning. Of course, this means lower production costs and larger production volumes.
The keystones of the new specification are very logical. The starting point is not the memory as such, but the system's (CPU and chipset) requirements for the memory. Of course, there are measures to reduce cost and power consumption. Of course, the developers will also try to meet the other desired requirements: keeping it backward compatible with DDR and making sure no special, non-standard interfaces are needed. And of course, they will do their best to improve performance by, say, increasing the bus efficiency.
A separate point is the encouragement to take all the best that is already available: draft materials on SLDRAM, SRAM and so on. And this is not just wishful thinking. Back at the beginning of 1999, SLDRAM Inc. announced that it was winding up its work on SLDRAM (by that time the research and development had consumed up to $4 billion) and would put its efforts (and all the results it had already obtained) into DDR II.
The first question on the agenda is the notorious backward compatibility of DDR II with DDR. If it hadn't been for that, there would have been no need to go to all this trouble: they might have been better off developing some brand-new specification. But in many respects this backward compatibility will only matter to the engineers: the DIMM module changes its appearance again. The 184-pin DDR module will be replaced with a 232-pin DDR II one.
On the whole, if we look into the details, we will find that plenty of work has been done to keep the present infrastructure alive. The controller's command set hasn't been replaced but expanded, so that one and the same memory controller can work with both DDR and DDR II. We have the same principle of transferring data in packets along the data bus, the same four-bank structure and the same memory page size. We should admit that the industry seems to shun the word "revolution" after the Rambus affair.
Well, compatibility is important, but it's not everything. Otherwise, we might as well stick to DDR, as it's 100% compatible with itself. The general aim remains the same: performance growth, and several things have been done to achieve it. The first, and most evident, is that the clock rate is a little higher. This "a little" stems from the largely unchanged architecture: you can't expect a sudden frequency boost from nearly the same thing.
Anyway, after 200MHz (400MHz) PC3200 DDR, the clock frequencies of the first mass DDR II chips will start at 200MHz (400MHz) and 266MHz (533MHz). With the standard 64-bit memory bus, this gives us 3.2-4.3GB/sec of bandwidth. Of course, the frequencies will keep growing: 333MHz (667MHz) chips are on the schedule already, so we are to see 5.4GB/sec quite soon. And keeping in mind that there will be no single-channel chipsets left even in the PC market by the time DDR II arrives, two memory channels will provide up to 10.8GB/sec of bandwidth! That will be enough for future processors as well as for AGP 3.0 and PCI-X. By the way, the bus bandwidth of Prescott, the next-generation x86 CPU from Intel, will be exactly 5.4GB/sec. A perfect hit.
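The arithmetic behind these figures is easy to check. Here is a quick sketch (the helper function name is ours) that derives peak bandwidth from the effective data rate, bus width and channel count:

```python
def peak_bandwidth_gb_s(effective_mhz, bus_width_bits=64, channels=1):
    """Peak bandwidth: transfers per second times bytes per transfer."""
    return effective_mhz * 1e6 * (bus_width_bits // 8) * channels / 1e9

print(peak_bandwidth_gb_s(400))               # PC3200 DDR: 3.2 GB/sec
print(peak_bandwidth_gb_s(533))               # first DDR II chips: ~4.3 GB/sec
print(peak_bandwidth_gb_s(667, channels=2))   # dual-channel 667MHz: ~10.7 GB/sec
```

The small mismatch with the round 10.8GB/sec figure above is just rounding of the 667MHz data rate.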
This frequency growth became possible thanks to improved manufacturing technologies as well as some evolutionary changes in the core, such as the doubling of the data prefetch, which is now 4 packets instead of 2. As a result, the core runs at a quarter of the speed of the memory bus. Take a 266MHz chip with a resulting external data rate of 533MHz as an example: its core frequency will be 133MHz, which poses no problem for any DRAM maker today.
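In other words, the 4-packet prefetch decouples the core clock from the external data rate. A minimal illustration (the function is ours, not from the spec):

```python
def core_frequency_mhz(data_rate_mhz, prefetch=4):
    # Each internal core cycle fetches `prefetch` data packets,
    # so the core only needs to run at data_rate / prefetch.
    return data_rate_mhz / prefetch

print(core_frequency_mhz(533))               # DDR II at 533MHz: core at ~133MHz
print(core_frequency_mhz(400, prefetch=2))   # DDR at 400MHz: core at 200MHz
```

Note how the comparison with DDR's 2-packet prefetch shows why DDR II can reach higher data rates from a similar core.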
One more similar innovation is that commands can now be executed on any rising edge of the clock, provided they don't conflict with the preceding ones. As a result, the command bus can now work at half the frequency of the data bus. In short, everything doubles inside the doubled overall frequency.
Before going further, let's recall some basic concepts. The data stored in a chip is organized as several divisions, or banks, which are in turn split into pages; a page is a two-dimensional array (table). Among the key parameters that determine memory performance at a given bandwidth are CAS (column address strobe) and RAS (row address strobe) latencies. They stand for the number of clock cycles necessary to access the required column and row, respectively. Their intersection gives us the memory cell to be read from or written to.
Now, among the architectural changes we can mention such things as Posted CAS and the formula "write latency = read latency - 1", which help to utilize the bus more efficiently. The time interval between requests to the row and column of the data array (RAS-to-CAS delay, tRCD) is at least 13ns for DDR, which leads to a loss of about four clocks at the frequencies of DDR II chips. The Posted CAS mode and the additive latency concept were introduced to combat these losses. They allow executing RAS and CAS commands with a certain overlap, practically without any pauses.
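The "about four clocks" figure follows directly from rounding tRCD up to whole bus clocks; a small check (the helper is our own):

```python
import math

def trcd_in_clocks(trcd_ns, bus_clock_mhz):
    # The clock period in ns is 1000 / MHz; tRCD must be rounded up
    # to a whole number of clock cycles.
    period_ns = 1000.0 / bus_clock_mhz
    return math.ceil(trcd_ns / period_ns)

print(trcd_in_clocks(13, 133))   # DDR-266 (133MHz clock): 2 clocks
print(trcd_in_clocks(13, 266))   # DDR II at a 266MHz clock: 4 clocks
```

The same absolute delay thus costs twice as many cycles at DDR II clock rates, which is exactly what Posted CAS and additive latency try to hide.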
Let's dwell a little upon latencies. They are a sore spot of today's DRAM architectures and shouldn't be underestimated, as the RDRAM example shows: high memory access latencies result in poor performance in benchmarks. DDR may look better than RDRAM in this respect, but not by much. Just look: in the few years from the i486 to the Pentium II, the CPU clock rate grew by more than 10 times, and so did the peak memory bandwidth. But memory access latencies got five times higher! So, although it's not obvious, the industry is moving in the direction shown by Rambus: bandwidth over latency.
The balance between the two should be maintained, though, and there are measures taken: the aforementioned Posted CAS and "write latency = read latency - 1". The experience gained in developing such specifications as VCM (Virtual Channel Memory) RAM and ESDRAM Lite is also being put to use. In both cases, the idea is to create a small cache (8Kbit per bank) that would allow doing without tRCD, which slows down the memory subsystem at higher frequencies, and avoiding time penalties when hitting the wrong page or bank. But here we can see no progress yet: in the available DDR II chips, tRCD is still at least 15ns.
And now the last point concerning latencies: there are no half-clock latencies in DDR II. Combined with the restrictions on command conflicts and the continuous four-packet data transfer, this somewhat reduces the cost of testing the end products and thus their production cost. Moreover, the specs claim that the layouts of the chips and DIMM modules are optimized for value mainboards with fewer slots.
One more lesson the industry learned while watching Rambus's torments concerns power consumption requirements. Well, the requirements are quite obvious; Rambus just didn't bother to meet them. We should acknowledge, though, that DDR DIMM modules do have heatsinks, which first appeared on RIMMs. On the whole, this was one more task the DDR II developers confronted.
Everything came out very straightforwardly here. First, as we have already mentioned, the 4x prefetch allows reducing the chips' core frequency to more than acceptable values, and the direct correspondence between clock rate and voltage still holds true. Second, the supply voltage was lowered from today's 2.5V to 1.8V.
So, we have come to the last important architectural issue, which is still very interesting: the developers are considering introducing autocalibration. The process begins with writing a certain data set into the chip via a slow write protocol. The slow protocol has to be used right after chip initialization, as the chip is not calibrated yet and thus cannot accept four-bit "bursts". Then comes a command that turns on an extended register set and begins the tuning process by changing the output impedance of the circuit. The system then tries to read back the pre-written data set. If more tuning is necessary, the calibration procedure is repeated. The same tuning is done for timings. So the point is: a given module is tuned to a given system environment! Rather a subtle way of combating circuit instability, which grows alongside circuit complexity. Unfortunately, this aspect of the specification is still quite vague.
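Since the specification is vague here, the following is only a hypothetical sketch of that calibration loop: write a pattern via the slow protocol, attempt a fast read-back, adjust the output impedance and repeat until the data survives the round trip. All class and method names are invented for illustration.

```python
class DDR2ChipModel:
    """Toy model of a chip whose fast reads only work once it is tuned."""
    CALIBRATED_IMPEDANCE = 5        # assumed tuning target, not a spec value

    def __init__(self):
        self.impedance = 0
        self.cells = {}

    def slow_write(self, addr, data):
        # Uncalibrated, single-word writes always succeed.
        self.cells[addr] = data

    def fast_read(self, addr):
        # Four-packet bursts are unreliable until the chip is tuned.
        if self.impedance < self.CALIBRATED_IMPEDANCE:
            return None
        return self.cells.get(addr)

def autocalibrate(chip, pattern=((0, 0xA5), (1, 0x5A))):
    for addr, data in pattern:
        chip.slow_write(addr, data)
    steps = 0
    while any(chip.fast_read(a) != d for a, d in pattern):
        chip.impedance += 1         # tweak output impedance and retry
        steps += 1
    return steps

chip = DDR2ChipModel()
print(autocalibrate(chip))          # number of tuning steps needed
```

The real mechanism presumably also iterates over timing parameters, but the read-back-and-retry structure is the essence of what the article describes.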
Well, the samples of 512Mbit DDR II chips presented by Micron, Samsung and Elpida give us a more or less clear idea of the mass market up to the second half of 2003:
Structure: 32x4, 16x8, 8x16;
Core/chip voltage: 1.8V;
Power consumption: 304mW;
Extra features: external circuits resistance regulation.
Well, that looks quite nice. We have got a product that combines increased frequency/performance (although we might wish the growth were higher) with reduced power consumption. The latter is most important, as the current trend is to make computer devices and peripherals as small as possible: take PDAs as an example. FBGA packaging instead of today's TSOP-II allows placing the chips closer to one another on the PCB and provides better overall signal stability. It's also a more standardized option, which matters too. According to Semico Research, three years ago 69% of all DRAM chips went into PCs, while in a couple of years it will be only 46%. All the rest will go into communication equipment, consumer electronics and mobile devices.
Well, yeah - and graphics cards! High speeds are required here even more than on mainboards, while the circuitry is somewhat simpler: you needn't ensure that modules from various manufacturers work with your card, and the like. The number of parties involved is a little smaller, too.
And they managed to jointly ratify a DDR II-based graphics memory standard, GDDR-2/3, at the end of 2002. Whether it's 2 or 3 depends on who's talking. NVIDIA and Co call it GDDR-2, with 1Gbit/sec of bandwidth per output (compare with 400Mbit/sec per output for 400MHz DDR II chips!). ATI calls it GDDR-3, which is exactly the same spec but at higher frequencies, so the bandwidth here is 1.4Gbit/sec per output. Simple arithmetic tells us that we are dealing with effective frequencies of 1GHz and 1.4GHz, respectively! And that's not wishful thinking: Samsung should already be producing limited volumes of 1GHz GDDR-2 chips by the time you are reading this article.
Moreover, graphics card makers need one more DDR II variation, only this time they don't want the utmost clock rates, but low cost and efficiency. Here is the recipe: take the DDR II specification, cut off everything you can do without, and get DDR-2M, a mobile variant with the required features. ATI and Elpida tried this recipe, but it's not quite clear yet whether the market will accept their private initiative.
So, we see the old story repeating once again. If the market gets carte blanche and isn't pressed upon, the price-to-performance ratio wins. Moreover, it will stay this way in the future. DDR III is already shaping up and … yeah, no revolutions! The voltage drops even lower, to 1.2-1.5V. The resulting frequency climbs even higher, from 800 to 1500MHz, so the module bandwidth will be 12GB/sec instead of 6.4GB/sec. But the daintiest bit is that one more tendency is sure to continue: DDR II chips cost about the same as DDR, and DDR III is going to stay within this rule, too.
Well, we are getting a little ahead of ourselves, I assume. Let's first meet DDR II, which should reach the mass market starting in the second half of 2003. At about that time we are also going to see the first chipsets supporting DDR II, like SiS656. Intel is only planning to introduce its dual-channel DDR II chipset in 2004. In the meanwhile, VIA should be entering the scene with its dual-channel P4X800, somewhere around the fourth quarter of next year.