Anyone who is into CPU overclocking, or simply interested in the technologies implemented in modern CPUs, will surely name cooling as the most pressing problem in this industry. The problem concerns Intel and AMD processors alike. Today's processors consume a lot of power and generate a lot of heat, so it is necessary not only to take the heat away from the CPU die, but also to remove it from the PC case, so that the air cooling the CPU heatsink stays as cool as possible.
As Intel launches new, more powerful Pentium 4 models, the company updates its thermal requirements for system integrators. Among other things, these requirements specify the maximum allowed CPU core temperature, the maximum allowed air temperature inside the PC case, the recommended thermal performance of the CPU cooler, and so on. Intel guarantees stable operation of its CPUs only when these requirements are met. To be fair, we should add that AMD sets similar requirements for Athlon XP based computers.
The question, however, is how well these requirements are actually met. With "brand-name" PCs from Hewlett-Packard or Dell, we can be sure that the manufacturer chose the chassis, cooler and other components with their thermal characteristics in mind. But users who assemble their PCs themselves may never even have heard of any specific thermal requirements from Intel or AMD. Moreover, all the research and calculations done by PC, case and cooler makers can come to nothing once the user starts overclocking the CPU. An overclocked CPU consumes much more power and dissipates much more heat, so the die temperature jumps up and the entire system may become unstable.
Some instability during overclocking is only natural. It is rather odd, though, when an overclocked CPU works much slower than it is supposed to. Paradoxical as it may seem at first sight, this situation is quite typical of Intel Pentium 4 processors. The cause lies in the CPU temperature or, more precisely, in the Thermal Control Circuit, which adjusts CPU performance depending on the die temperature. In today's article we will try to find out how the Thermal Control Circuit works and check how Pentium 4 performance depends on die temperature.
So, Intel implemented a new technology, the Thermal Control Circuit, in its Pentium 4 CPUs. It is intended to ensure stable operation and to protect the CPU against overheating. Every Pentium 4 has two built-in thermal diodes. One reports the CPU temperature to the mainboard's hardware monitoring system. The other is placed at the hottest spot of the die, next to the ALUs, and is part of the Thermal Monitor circuit.
AMD Athlon XP CPUs also feature a thermal diode, but the two processors differ a lot here. The Athlon XP thermal diode reports the CPU temperature to the mainboard, where a special logic unit processes the data and shuts the PC down when the temperature exceeds a certain critical value. Of course, all unsaved data is lost in this case. Thermal monitoring in the Pentium 4 follows a different principle: the system should keep working stably even when the CPU temperature reaches the critical value, and it should shut down only in an emergency. This means that the CPU must keep itself from heating up further while continuing to run all applications stably.
This idea was implemented by integrating into the Pentium 4 core an additional circuit, the Thermal Monitor, which compares the current temperature with a certain critical value, and the Thermal Control Circuit logic, which regulates CPU heat dissipation. The Thermal Monitor works by comparing two electric currents: one flows through the thermal diode, the other comes from an independent reference source. The diode's resistance depends on its temperature, so the current flowing through it changes with the CPU core temperature. Comparing this current with the reference one shows whether the critical temperature has been reached. The Thermal Monitor's job is therefore quite simple: if the temperature at the hottest spot of the CPU exceeds the critical value, it asserts the PROCHOT# signal, enabling the Thermal Control Circuit to reduce CPU heat dissipation and prevent further temperature growth.
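The comparison the Thermal Monitor performs can be sketched in a few lines. This is only an illustration under loudly stated assumptions: the linear current-versus-temperature model and its constants (I0_UA, SLOPE_UA_PER_C) are hypothetical, and the function names are ours, not Intel's. The point is the principle: the reference current is fixed at the value the diode would carry at the critical temperature, so "diode current at or above reference" means "critical temperature reached".

```python
# Hypothetical linear sensor model; the constants are made up for
# illustration and are not Intel's diode characteristic.
I0_UA = 100.0          # hypothetical diode current at 0 °C, microamps
SLOPE_UA_PER_C = 0.5   # hypothetical change in current per °C


def diode_current_ua(temp_c: float) -> float:
    """Current through the (modelled) thermal diode at a given die temperature."""
    return I0_UA + SLOPE_UA_PER_C * temp_c


def prochot_asserted(temp_c: float, critical_c: float) -> bool:
    """Compare the diode current with a fixed reference current.

    The reference equals the current the diode would carry at the critical
    temperature, so reaching it means the threshold has been reached.
    """
    reference_ua = diode_current_ua(critical_c)  # fixed once, at calibration
    return diode_current_ua(temp_c) >= reference_ua


for t in (65.0, 71.5, 72.0, 80.0):
    print(t, prochot_asserted(t, critical_c=72.0))
```

The 72.0°C threshold in the usage lines is just an example value here; the real threshold is calibrated per CPU.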
There are many misconceptions about the Thermal Control Circuit mechanism. The most common one holds that a Pentium 4 reduces its nominal clock rate when it overheats: if it works at 2.2GHz, it will drop its frequency to 1.8GHz or even lower. This is not quite correct if we mean the nominal CPU frequency set by its frequency generator. To make this clear, let us first recall how the clock rate is generated in a Pentium 4 2.8GHz CPU.
Suppose the mainboard sends a 133MHz clock (strictly speaking, 133.33MHz) to the CPU. This frequency is multiplied by a coefficient, which equals 21 for the Pentium 4 2.8GHz. The resulting 2.8GHz is the nominal frequency stated in the Pentium 4 marking and reported by programs like WCPUid. It determines the working frequency of the processor's arithmetic units, and it is this frequency that the Thermal Control Circuit can affect. When the temperature is normal, the ALUs receive the full 2800MHz. But when the CPU temperature rises above a certain value, the Thermal Monitor asserts its PROCHOT# signal and enables the Thermal Control Circuit. The latter modulates the clock signal sent to the CPU core, determining how many clock cycles should be omitted to reduce CPU heat dissipation. The modulation of the clock signal sent to the CPU is shown in the following diagram:
As a result, some of the 2.8GHz clock cycles are excluded: they are generated by the CPU multiplier unit, but left out by the null-cycle control logic enabled by the PROCHOT# signal. The CPU ALUs therefore receive a lower effective frequency. CPU performance drops, and so does its heat dissipation, although the mainboard and the internal clock generator keep producing the same 2.8GHz. Intel claims the resulting frequency can be as low as 30-50% of the nominal, depending on the CPU model.
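The arithmetic of the last two paragraphs can be sketched as follows: the nominal clock is the bus clock times the multiplier, and the effective clock is the nominal one scaled by the share of cycles that actually reach the core. The even spacing of null cycles in gated_clock() is our illustrative assumption, not Intel's documented pattern, and all function names are hypothetical.

```python
import math
from fractions import Fraction

# Nominal clock = bus clock x multiplier. The "133MHz" bus clock is really
# 133 1/3 MHz, so a Fraction keeps the result exact.
BUS_MHZ = Fraction(400, 3)
MULTIPLIER = 21
NOMINAL_MHZ = BUS_MHZ * MULTIPLIER  # exactly 2800 MHz


def gated_clock(total_cycles: int, skip_share: Fraction) -> list[bool]:
    """Per-cycle mask: True = cycle delivered to the core, False = null cycle.

    Null cycles are spread evenly; this spacing is an assumption made for
    illustration only.
    """
    mask = []
    skipped = 0
    for n in range(1, total_cycles + 1):
        # Number of null cycles that should have occurred after n cycles.
        due = math.floor(n * skip_share)
        if due > skipped:
            mask.append(False)
            skipped += 1
        else:
            mask.append(True)
    return mask


def effective_mhz(mask: list[bool]) -> Fraction:
    """Nominal clock scaled by the share of cycles actually delivered."""
    return NOMINAL_MHZ * sum(mask) / len(mask)


mask = gated_clock(8, Fraction(1, 2))
print(["on" if c else "null" for c in mask])
print(float(NOMINAL_MHZ), float(effective_mhz(mask)))  # 2800.0 1400.0
```

Note that the mainboard and multiplier still produce 2800MHz in this model; only the delivered share changes, which is why WCPUid-like tools keep reporting the nominal value.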
As the temperature goes down, the Thermal Control Circuit starts returning the CPU to its nominal mode by reducing the number of null clock cycles, thus raising the effective CPU frequency (here we mean the internal frequency, which is set by the processor multiplier unit and modulated by the Thermal Control Circuit). As soon as the core temperature falls about 1°C (the so-called temperature hysteresis) below the critical value, the Thermal Monitor stops asserting PROCHOT#. The Thermal Control Circuit then stops generating null cycles, and the effective frequency equals the nominal one: 2800MHz in our case.
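The hysteresis described above can be modelled as a tiny state machine. This is a simplified sketch: it tracks only the on/off state of PROCHOT#, not the gradual change in the number of null cycles, and the class name and the 1.0°C default are our assumptions based on the "about 1°C" figure.

```python
class ThermalMonitor:
    """PROCHOT# with hysteresis (simplified sketch).

    Assumption: the signal is asserted at or above the critical temperature
    and released only once the die cools more than hysteresis_c below it.
    """

    def __init__(self, critical_c: float, hysteresis_c: float = 1.0) -> None:
        self.critical_c = critical_c
        self.hysteresis_c = hysteresis_c
        self.prochot = False

    def update(self, temp_c: float) -> bool:
        """Feed one temperature reading and return the resulting PROCHOT# state."""
        if not self.prochot and temp_c >= self.critical_c:
            self.prochot = True
        elif self.prochot and temp_c < self.critical_c - self.hysteresis_c:
            self.prochot = False
        return self.prochot


monitor = ThermalMonitor(critical_c=72.0)
readings = [71.0, 72.0, 71.5, 71.2, 70.8, 71.5]
print([monitor.update(t) for t in readings])
# [False, True, True, True, False, False]
```

Without the hysteresis band, a die hovering around the threshold would switch throttling on and off with every small fluctuation; the band keeps the signal asserted until the CPU has clearly cooled down.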
Now the question is: which temperature is considered critical, that is, at what point does the Thermal Monitor enable the Thermal Control Circuit? It differs among Pentium 4 models. Moreover, Intel claims that the integrated thermal diodes are calibrated individually for each CPU at the manufacturing stage. Once the critical value for the Thermal Monitor is set, it cannot be modified.
The Thermal Control Circuit can be enabled by an application via ACPI registers or by the mainboard BIOS. When it is enabled by software, the Thermal Control Circuit can work in "On-Demand" mode: it can be engaged at any temperature, and the application can set the ratio of null to effective cycles. The share of null clock cycles may vary from 12.5% to 87.5% of the total. Note that the Thermal Control Circuit is disabled by default in all Pentium 4 CPUs, so it must be enabled by the mainboard BIOS at boot-up, or later in the operating system by drivers or other special software.
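The text gives only the endpoints of the on-demand range, 12.5% and 87.5%. The sketch below assumes a setting in steps of 12.5% (seven steps, 1..7, cover that range exactly); the integer "setting" and the function names are hypothetical and do not describe a real register layout. It shows how much effective clock a 2800MHz CPU would keep at each step.

```python
from fractions import Fraction

STEP = Fraction(1, 8)  # assumed 12.5% step between the documented endpoints


def null_share(setting: int) -> Fraction:
    """Share of null cycles for a hypothetical on-demand setting 1..7."""
    if not 1 <= setting <= 7:
        raise ValueError("setting must be 1..7, i.e. 12.5% .. 87.5% null cycles")
    return setting * STEP


def on_demand_mhz(nominal_mhz: int, setting: int) -> Fraction:
    """Effective clock left for the core after the requested null cycles."""
    return nominal_mhz * (1 - null_share(setting))


for s in range(1, 8):
    print(f"{float(null_share(s)):.1%} null -> {float(on_demand_mhz(2800, s)):.0f} MHz")
```

At the 87.5% end only one cycle in eight reaches the core, so a 2800MHz CPU would effectively run at 350MHz.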
So, what does the Thermal Control Circuit mean in practice? Take a Pentium 4 CPU that receives insufficient cooling: there may be a poor-quality cooler, no thermal paste between the heatsink and the CPU surface, or a system case packed with "hot" expansion cards and lacking system fans. All this leads to CPU overheating, so the processor works slower than it would with an efficient cooler in a similar system built into a more expensive case with better airflow. The same problem may arise during overclocking. An overclocked CPU generates much more heat than at its nominal frequency, so an overclocked but insufficiently cooled CPU may well work even slower than at its nominal clock rate.
For example, suppose we overclock a Pentium 4 2.2GHz to 2.8GHz without improving the cooling. The CPU temperature quickly exceeds the critical value, the Thermal Monitor engages the Thermal Control Circuit, and the CPU starts skipping clock cycles. As a result, during boot-up and in WCPUid-like programs the user will see the CPU running at 2.8GHz, although its effective performance may be even lower than that of a Pentium 4 2.2GHz.
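How much throttling does it take for the overclocked chip to fall behind a stock 2.2GHz one? A minimal sketch, assuming performance is proportional to the effective core clock (this ignores the faster bus and memory an FSB overclock also brings, so it is only a rough estimate); the function name is ours.

```python
from fractions import Fraction


def break_even_null_share(overclocked_mhz: int, reference_mhz: int) -> Fraction:
    """Share of null cycles at which the throttled overclocked CPU's effective
    clock falls to the reference CPU's nominal clock.

    Simplification: performance is assumed proportional to the effective core
    clock.
    """
    return 1 - Fraction(reference_mhz, overclocked_mhz)


share = break_even_null_share(2800, 2200)
print(f"{float(share):.1%}")  # 21.4%
```

So under this simplification, once the Thermal Control Circuit skips more than about one cycle in five, the "2.8GHz" CPU is effectively slower than a stock Pentium 4 2.2GHz, well within the modulation range quoted above.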
Of course, the Thermal Control Circuit is not as powerful as it may seem. It is quite efficient at preventing further temperature growth during overheating, but it cannot keep the system alive if, for example, the CPU cooler fails. To protect the CPU from damage in such cases, the second thermal diode built into the CPU tracks another temperature threshold. When this threshold is reached, the CPU is no longer "slowed down"; instead, the THERMTRIP# signal is sent and the system shuts down. This second threshold is still below the fatal temperature, which should protect the CPU from thermal damage in emergencies. Since all the diodes and circuitry of the Thermal Monitor and the Thermal Control Circuit are integrated into the CPU core and are independent of the mainboard, this anti-overheating system is very fast: a temperature comparison takes only a few nanoseconds, so it will save your CPU even if you remove the cooler from a running processor.
For your information: according to Intel, the temperature at which the THERMTRIP# signal is sent is about 135°C.
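Putting the two thresholds together gives three regimes. The sketch below classifies a die temperature accordingly; the 135°C value is the Intel figure quoted above, while the PROCHOT# threshold is passed in because it is calibrated per CPU (72.0°C in the usage lines is only an example). Hysteresis is deliberately left out to keep it short, and the function name is hypothetical.

```python
THERMTRIP_C = 135.0  # Intel figure quoted in the text


def protection_state(temp_c: float, prochot_c: float) -> str:
    """Classify a die temperature into normal / throttling / shutdown.

    prochot_c is the CPU-specific Thermal Monitor threshold; hysteresis is
    ignored in this simplified sketch.
    """
    if temp_c >= THERMTRIP_C:
        return "THERMTRIP#: shut down"
    if temp_c >= prochot_c:
        return "PROCHOT#: skip clock cycles"
    return "normal: full clock"


for t in (60.0, 80.0, 140.0):
    print(t, protection_state(t, prochot_c=72.0))
```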
Testbed and Methods
In our tests we wanted to find the activation threshold of the Thermal Monitor and check how Pentium 4 performance depends on its temperature.
The testbed configuration was as follows:
- Intel Pentium 4 3.06GHz CPU;
- GlacialTech Igloo 4310 Pro cooler;
- ASUS P4PE mainboard;
- 256MB PC2100 DDR SDRAM;
- GeForce4 MX440-8x graphics card;
- IBM DTLA 15GB 7200rpm HDD;
- SoundBlaster Live! Value sound card;
- CD-ROM 24x drive;
- InWin J-536 case (we turned off the system fan);
- Windows XP Professional operating system.
To see the Thermal Monitor and Thermal Control Circuit in action, we had to raise the CPU temperature smoothly up to the critical value while watching its performance. There was no point in turning the CPU cooler off or removing it completely, since the CPU temperature would then grow too fast for us to observe the dynamics in detail. So we developed a special way to raise the CPU temperature. First of all, we slowed down the CPU cooler fan with a Zalman FanMate regulator.
We reduced the cooler fan speed from its nominal 4500rpm to 2200rpm. This lowered the cooling efficiency so that the CPU temperature rose considerably and then stabilized at a certain level. Had we turned the cooler off completely, the temperature would have kept growing quickly until the overheating protection shut the system down. But we did not stop there. Simply reducing the fan speed would not give us the smooth CPU temperature growth we needed: the lower the fan speed, the lower the cooler efficiency, and at the lowest speeds the efficiency depends on the fan speed in a highly non-linear way.
So we chose a different, simpler approach. If the CPU runs under a steady workload with the same cooler throughout the entire experiment, its temperature depends linearly on the air temperature inside the system case, which in turn depends on the ambient air temperature. So we can heat the CPU up smoothly by steadily raising the room temperature.
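A simple steady-state model makes this reasoning explicit. It rests on our own simplifying assumptions, not on measurements: under the steady workload the CPU dissipates a constant power $P$, the cooler has a constant thermal resistance $\theta$ (in °C/W) between the die and the case air, and the PC's own heat raises the case air by a constant $\Delta T_{\text{case}}$ above the room temperature:

```latex
T_{\text{case}} = T_{\text{room}} + \Delta T_{\text{case}}, \qquad
T_{\text{CPU}} = T_{\text{case}} + P\,\theta
\;\;\Longrightarrow\;\;
T_{\text{CPU}} = T_{\text{room}} + \Delta T_{\text{case}} + P\,\theta,
\qquad
\frac{\partial T_{\text{CPU}}}{\partial T_{\text{room}}} = 1 .
```

In this idealized picture every degree added to the room temperature adds one degree at the die, but only while $P$ stays constant, that is, until the Thermal Control Circuit starts cutting the CPU's power.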
It is quite hard to change the room temperature with at least 1°C precision, and this is exactly the precision we needed for our experiment. However, we could emulate the environment with a heat chamber. For this purpose we used a SANYO MIR-253 medical incubator. It looks like a cupboard with 162x50x70 dimensions and a 254-liter capacity, and we put our ATX system case inside. The cables for the display, keyboard, mouse and power supply were routed through a special opening in the incubator wall. This way the PC case sits in a thermally isolated environment where the air temperature does not depend on the room temperature and can be set manually. The MIR-253 incubator combines a heating element with a refrigerating unit like those used in home fridges. It consumes 220W of power and can maintain an internal temperature from -10°C to +50°C, which can be set with 0.1°C precision. During our experiments, however, we found this precision unattainable: the PC was a constant and powerful heat source that tended to raise the chamber temperature considerably, so the temperature inside the incubator always fluctuated by about 0.5-1.0°C.
So, with the MIR-253 incubator we were able to change the ambient temperature with 1°C precision, and the CPU temperature should rise in step with the temperature inside the incubator.
We decided to test CPU performance and its temperature dependence in a real application rather than in a synthetic benchmark. For our test session we used the Unreal Tournament 2003 game and the FRAPS utility, which shows the frame rate of a running Direct3D application in real time. Unlike the performance measuring tools built into Unreal Tournament 2003 or Quake3 Arena, FRAPS displays the fps rate in big digits, which is very helpful when the application runs in a low-resolution window. FRAPS does consume some CPU resources and thus affects the results, but since we did not reboot the testbed during our experiments, this overhead stayed the same throughout, and the results obtained can be regarded as credible.
We modified the Unreal Tournament settings to minimize the graphics card workload and make the gaming performance depend more on the CPU. Unreal Tournament 2003 ran in a separate window at 320x240 resolution and 16-bit color depth. We ran the DM-Asbestos level in "Instant Action" mode. All bots were disabled so that they could not affect the rendering speed. After the game started, we chose a spot where the fps rate remained constant, stopped the player there, and from then on did not touch the mouse or keyboard, so that the game window stayed unchanged. This way we measured the rendering speed of one and the same scene in a real gaming application, and this speed was expected to change with the CPU temperature. Since the game ran in a separate window, we could simultaneously track the CPU temperature with ASUS PC Probe, the hardware monitoring utility bundled with the mainboard. All we had to do after starting Unreal Tournament 2003 was raise the temperature inside the incubator and write down the numbers. You will see the results a bit later; now let's talk about…
Hyper-Threading Technology and CPU Temperature
As mentioned above, we used the newest Intel Pentium 4 3.06GHz CPU in our testbed. This processor supports Hyper-Threading technology (for more details please refer to our article called Intel Pentium 4 3.06GHz CPU with Hyper-Threading Technology: Killing Two Birds with a Stone…). Today we would like to look at this technology from another angle: its impact on CPU heat dissipation. On the one hand, thanks to Hyper-Threading the effectively used core area of the Pentium 4 3.06GHz is 5% larger than in slower models (such as the Pentium 4 2.8GHz), which should theoretically lead to higher heat dissipation. On the other hand, one physical processor is now seen by the system as two logical CPUs and is therefore loaded more efficiently. We could not say for sure how this would affect heat dissipation. So instead of guessing, we simply tested the Intel Pentium 4 3.06GHz in our system with Hyper-Threading enabled and disabled in the mainboard BIOS. The room temperature during the tests was 20°C. The fan of the GlacialTech Igloo 4310 Pro cooler was set to its maximum speed. We "heated up" the processor with the CPU Burn utility, SiSoft Sandra 2003 and Unreal Tournament.
The first program loads the processor more heavily than any other burn-in test. We ran CPU Burn for 10 minutes and recorded the CPU temperature. However, this is a rather old program, released before Pentium 4 processors with Hyper-Threading technology appeared.
SiSoft Sandra 2003 is another story: it handles Hyper-Threading properly. We used its Burn-in module with a looped CPU Multimedia Benchmark test and measured the CPU temperature after 57 runs.
Since we were going to test our system with Unreal Tournament, we first had to find out when this game heats the CPU more: with Hyper-Threading enabled or disabled. We ran the game with the settings described above for 10 minutes and then measured the CPU temperature. All temperature values were taken with the ASUS PC Probe utility.
As the results show, the processor temperature in idle mode is considerably lower with Hyper-Threading enabled. It is also lower in applications that know nothing about this technology. Meanwhile, an application optimized for HT uses the CPU pipelines more effectively, so the processor heats up more. Since the CPU temperature in Unreal Tournament was higher with HT disabled, we decided to turn HT off for our experiment with the Thermal Monitor and Thermal Control Circuit. This way we can heat the CPU up more.
Now let's get to the main point of the article: how Pentium 4 performance depends on its temperature. The initial air temperature in the incubator was 28°C, the initial CPU temperature was 69°C, and performance in Unreal Tournament 2003 was about 115fps. From then on, we steadily raised the air temperature, recording four parameters: the air temperature in the incubator, the air temperature inside the PC case, the CPU temperature and the CPU performance.
Let's see what we have got. The game speed does not change until the CPU temperature exceeds 72°C (Segment 1). From that point on, Unreal Tournament slows down dramatically. We had expected the Thermal Control Circuit to reduce the CPU speed in proportion to the temperature. However, there are segments where the CPU temperature remains constant (Segments 2 and 3) while the gaming performance keeps falling. This means the Pentium 4 can resist the growth of its core temperature and hold it steady for some time. In these segments the temperature depends on the speed, rather than the other way round. So the Thermal Control Circuit does its job well.
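Why can the temperature stay flat while speed falls? An idealized feedback model reproduces the flat segments. Every number below except the 72°C threshold is hypothetical, chosen only so that the first case roughly resembles our 28°C / 69°C starting point. The assumptions: the dissipated power scales linearly with the share of delivered cycles ("duty"), and the Thermal Control Circuit settles at the highest duty that keeps the die at or below the critical temperature, down to some minimum. The model captures only the plateaus, not the full measured curve.

```python
def steady_state(room_c: float,
                 critical_c: float = 72.0,
                 case_rise_c: float = 15.0,
                 full_power_rise_c: float = 26.0,
                 min_duty: float = 0.4) -> tuple[float, float]:
    """Equilibrium of an idealized throttling loop (all parameters hypothetical).

    duty is the share of clock cycles still delivered (1.0 = no throttling).
    CPU temperature = room + case_rise_c + full_power_rise_c * duty.
    Returns (cpu_temp_c, duty).
    """
    # Duty that would hold the die exactly at the critical temperature.
    duty = (critical_c - room_c - case_rise_c) / full_power_rise_c
    # No throttling below the threshold; no more than min_duty of throttling.
    duty = max(min_duty, min(1.0, duty))
    cpu_c = room_c + case_rise_c + full_power_rise_c * duty
    return cpu_c, duty


for room in (28, 31, 35, 40, 45, 50):
    cpu, duty = steady_state(room)
    print(f"room {room} °C -> CPU {cpu:.1f} °C, relative speed {duty:.0%}")
```

Between roughly 31°C and 45°C room temperature the model's CPU sits at 72°C while its relative speed falls; once the duty hits its floor, the die temperature starts rising with the room again.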
Overall, during the test the air temperature in the incubator went from 27°C up to 50°C, the air temperature inside the system case from 44°C to 63°C, and the CPU temperature from 69°C to 85°C. Our experiment showed that the critical temperature for the Pentium 4 3.06GHz was 72°C: at this point the Thermal Monitor asserted PROCHOT# and the Thermal Control Circuit started slowing the CPU down. As we raised the CPU temperature 13°C above the critical value, CPU performance fell more than twofold: the Unreal Tournament 2003 frame rate dropped from 115fps to 49fps. Unfortunately, we could not continue the test any further, since the MIR-253 incubator can provide at most +50°C in its chamber. So all we could do was turn off the CPU cooler. After that, the CPU temperature rose to 94°C in less than a minute and the system shut down. The Unreal Tournament 2003 frame rate did not change during this time, which made us suspicious. It led us to suppose that a 2.3-fold performance drop was the most the Intel Pentium 4 3.06GHz could throttle in our case.
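The figures in the previous paragraph can be checked with a few lines of arithmetic. The only assumption is that in our CPU-bound setup the frame rate tracks CPU speed.

```python
# Figures reported in the text.
fps_before, fps_after = 115, 49
critical_c, final_cpu_c = 72, 85

slowdown = fps_before / fps_after   # how many times slower
retained = fps_after / fps_before   # share of the original frame rate

print(f"{final_cpu_c - critical_c} °C above critical")
print(f"{slowdown:.2f}x slower, {retained:.1%} of the original frame rate, "
      f"{1 - retained:.1%} lost")
# 13 °C above critical
# 2.35x slower, 42.6% of the original frame rate, 57.4% lost
```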
However, we decided to double-check this supposition using the CPU RightMark 2 RC3 test. This benchmark shows CPU performance in real time, which lets us track it even while performance is dropping during the test.
The graph built by CPU RightMark changes too rapidly for our previous approach of raising the ambient temperature. To check the Pentium 4 performance drop with CPU RightMark, we had to heat the CPU up quickly, but without driving it straight past the critical point. We could not take the CPU cooler off, since the temperature would then shoot up to the critical value and the Thermal Control Circuit would cut CPU performance at once, without giving us the answers we wanted. So we chose another way to "heat up" the CPU: turning off the cooler fan. In this case the CPU temperature grows fast enough for RightMark to show the effect, yet not so fast as to hang the system. As a result, we got the following performance graph. It does not show the CPU temperature, but we do not really need it: we are more interested in the limit of CPU speed adjustment.
The graph confirms that this limit really exists. The Thermal Control Circuit kept reducing the CPU speed for some time, but then performance stayed at the same level even though the temperature kept rising. We then turned the cooler back on so that the CPU would not hang, and its performance returned to the initial level. So we have a 2.7-fold performance drop and can claim that this is the most the Thermal Control Circuit of the Pentium 4 3.06GHz can do.
To make the experiment really "pure", we set the ambient temperature to -7°C, turned off the CPU cooler fan and started the system again. We thought the low air temperature would let the CPU work stably, but we were wrong: soon after we started Unreal Tournament, the CPU temperature reached 94°C and the PC shut down.
The temperature inside the MIR-253 incubator had reached 50°C. When we opened the door, we felt a flow of dry, warm air, and the CPU heatsink was too hot to touch. Still, our Pentium 4 processor did not burn out, even though the air temperature inside the system case had risen to 63°C.
That said, the Thermal Monitor and Thermal Control Circuit technologies used in the Pentium 4 are not a universal remedy for overheating. Our test results prove that the Pentium 4 does resist temperature growth, but at quite a high cost: its performance is more than halved. Besides, our tests showed that the Thermal Control Circuit cannot keep the Pentium 4 3.06GHz stable when the CPU cooler fails.
At the same time, the Pentium 4 is highly tolerant of rising ambient temperature and will surely work stably in freezing winter as well as in hot summer weather. The Pentium 4 will probably keep working even when all other processors hang, though in this case its performance will be somewhat lower than usual.
Intel claims that the effective internal frequency, after modulation by the Thermal Control Circuit, may be 50% lower than the nominal. Our tests show that the performance drop during overheating exceeds 50%. Still, it is smaller than the 87.5% Intel specifies for the on-demand Thermal Control Circuit mode enabled via software.
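To put these percentages side by side, it helps to convert everything into "share of full speed left". A minimal sketch, assuming performance is proportional to the share of delivered clock cycles (an approximation, since the game is not purely CPU-bound). The slowdowns are the two measured in this article; the other two figures are the Intel numbers quoted above.

```python
from fractions import Fraction

# Slowdowns measured in this article (times slower than unthrottled).
measured = {
    "Unreal Tournament 2003": Fraction(115, 49),
    "CPU RightMark": Fraction(27, 10),
}

# Intel figures quoted in the text, as the share of speed that remains.
intel_auto_remaining = Fraction(1, 2)          # "may be 50% lower than the nominal"
on_demand_min_remaining = 1 - Fraction(7, 8)   # 87.5% null cycles leave 12.5%

for name, slowdown in measured.items():
    print(f"{name}: {float(1 / slowdown):.1%} of full speed left")

print(f"Intel, automatic mode: {float(intel_auto_remaining):.1%} left")
print(f"On-demand maximum: {float(on_demand_min_remaining):.1%} left "
      f"({float(1 / on_demand_min_remaining):.0f}x slower)")
```

Both measured results (about 43% and 37% of full speed left) fall between Intel's 50% figure and the 12.5% (eightfold slowdown) floor of the on-demand mode.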
Our tests also revealed why Pentium 4 processors sometimes work slower than expected. Even a CPU that is not overclocked can overheat because of an ineffective cooler, a small system case without system fans and/or too many "hot" expansion cards, or simply a high room temperature. Whatever the cause of overheating, the Thermal Control Circuit will trade performance for temperature. And our tests showed that even a small core temperature increase, from 72 to 75°C, can cause a 10% performance drop in the Pentium 4 3.06GHz.
If so, the CPU cooling problem takes quite another turn. Nothing now prevents cooler makers from claiming that their cooling solutions will make your PC work not only quietly and stably, but also really fast. And it does make sense, you know…