IBM PowerPC G5: Another World (page 14)


Category: CPU

by Victor Kartunov

[ 01/27/2004 | 11:36 AM ]



Part 6: Execution Units

But let’s continue our exploration of the processor’s internals. We have now reached the execution units, and things here are not as simple as they seem at first glance. The PowerPC970 contains two integer units (IU), each paired with a Load/Store unit. The number of units alone is nothing exceptional: the Pentium 4 has two fast ALUs that run at twice the core clock (plus one slow ALU for certain types of operations), the G4+ also has two kinds of integer units (simple ones for operations such as addition and a complex one for operations such as integer division), and the Athlon 64 even has three ALUs. But IBM has added a few twists to the architecture.

The two IUs in the PowerPC970 are not exactly identical. Both handle simple operations such as addition and subtraction in exactly the same way, but for more complex operations there is some specialization: for example, all divisions in the PowerPC970 are performed by IU2. IBM describes the two units as “slightly different”, which only makes the picture more confusing. Unfortunately, I couldn’t find any information on which unit sorts instructions by type and sends them to the appropriate IU. Nor is it clear what happens when a group of IOPs contains several division instructions. Will they all wait for IU2? If so, this is a clear bottleneck in the architecture. Alternatively, the decoder may try to keep the groups as balanced as possible; this seems more likely, but it raises the question of how good the decoder is at rearranging IOPs within groups.
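The bottleneck question can be made concrete with a toy model. The sketch below is purely illustrative and rests on assumptions IBM has not confirmed: each IU accepts one IOP per cycle, simple operations may go to either unit, divisions may go only to IU2, and groups issue strictly one after another. The function names and the "add"/"div" labels are hypothetical.

```python
def issue_cycles(group):
    """Cycles to issue one group of independent integer IOPs on two IUs.

    Toy model (assumption): each IU accepts one IOP per cycle, "div" may
    only go to IU2, and any other IOP may go to either unit.
    """
    pending = list(group)
    cycles = 0
    while pending:
        cycles += 1
        # IU2: prefer a waiting division, since only IU2 can execute it.
        if "div" in pending:
            pending.remove("div")
        else:
            pending.pop()
        # IU1: take the first non-division IOP, if there is one.
        for i, op in enumerate(pending):
            if op != "div":
                del pending[i]
                break
    return cycles


def total_cycles(groups):
    """Hypothetical: groups issue strictly one after another."""
    return sum(issue_cycles(g) for g in groups)


clustered = [["div", "div", "div", "div"], ["add", "add", "add", "add"]]
balanced = [["div", "div", "add", "add"], ["div", "div", "add", "add"]]
print(total_cycles(clustered))  # 6
print(total_cycles(balanced))   # 4
```

Under these assumptions, the same eight IOPs take 6 cycles when all four divisions are packed into one group (IU1 sits idle while IU2 works through them) but only 4 cycles when the decoder spreads them evenly, which is why group balancing would matter.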

To our great disappointment, IBM also does not disclose any information on instruction latencies and throughput for the PowerPC970. All we know is that some simple instructions take one clock cycle (not counting the decode stages, of course), while others take several clocks. Independent IOPs can start every clock cycle, while IOPs that depend on each other can start no more often than every second cycle. We can, however, estimate a few things from the pipeline stage diagram: for instance, the most likely latency for accessing data in the L1 cache is 4 clocks.
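These figures can be turned into a rough issue-timing estimate. The sketch below is a simplification under stated assumptions: a single in-order issue slot (the real core dispatches whole groups), a 2-cycle spacing between dependent simple IOPs, and the estimated 4-cycle L1 load latency. The `LATENCY` table and the function name are hypothetical.

```python
# Assumed values from the text: dependent simple IOPs start at least
# 2 cycles apart; an L1 load is estimated at 4 cycles.
LATENCY = {"simple": 2, "load": 4}


def issue_schedule(iops):
    """Return the cycle at which each IOP starts, in program order.

    iops: list of (kind, deps), where deps are indices of earlier IOPs
    whose results this IOP needs. Assumes one IOP can start per cycle.
    """
    start = []
    for i, (_, deps) in enumerate(iops):
        # No earlier than one cycle after the previous IOP started.
        earliest = start[i - 1] + 1 if i else 0
        # Wait until each producer's result is available.
        for d in deps:
            earliest = max(earliest, start[d] + LATENCY[iops[d][0]])
        start.append(earliest)
    return start


independent = [("simple", [])] * 4
chain = [("simple", []), ("simple", [0]), ("simple", [1]), ("simple", [2])]
load_use = [("load", []), ("simple", [0])]
print(issue_schedule(independent))  # [0, 1, 2, 3]
print(issue_schedule(chain))        # [0, 2, 4, 6]
print(issue_schedule(load_use))     # [0, 4]
```

In this model a chain of dependent simple IOPs runs at half the rate of independent ones, and an instruction that consumes a load result waits the full estimated L1 latency.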

Next come the Load/Store units. Here the PowerPC970 departs somewhat from the approach we see in x86 processors (Pentium 4, Athlon 64). An x86 processor includes two units for integer and floating-point loads and stores. For example, according to data from arstechnica.com, the Pentium 4 divides the work as follows:

  • All LOAD: LSU (Load/Store Unit);
  • Integer STORE: LSU;
  • Floating-point STORE: FSTORE;
  • Vector STORE: FSTORE.

The author of that article himself seems to have some doubts about whether this “division of labor” is correct, and it is indeed quite hard to confirm or refute. In the PowerPC970, by contrast, these units are identical: two units handle all types of loads and stores. Some uncertainty remains about vector operations, though, since the predecessor of the family, the POWER4 processor, had no vector unit. So it is not quite clear how many vector load/store operations, and of which kinds, the corresponding units of the PowerPC970 can perform. There is probably some specialization here too, with one unit responsible for Vector LOAD and the other for Vector STORE, but that is only a guess. Another possibility is that these units perform only Vector LOAD, while Vector STORE is handled by the execution units of the pipeline. The worst case would be a single unit responsible for all vector memory operations.
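The competing hypotheses can be compared with a small scheduling model. This is a sketch under explicit assumptions, not a description of real hardware: every load/store unit accepts one operation per cycle, operations are independent, the unit names (`LSU1`, `LSU2`, `VPU`, `FSTORE`) and model labels are hypothetical, and the Pentium 4 routing simply follows the arstechnica.com list above.

```python
LSUS = {"LSU1", "LSU2"}


def pentium4(op, kind):
    # Routing from the list above: all loads and integer stores go to
    # the LSU; floating-point and vector stores go to FSTORE.
    if op == "load" or kind == "int":
        return {"LSU"}
    return {"FSTORE"}


def ppc970(vector_model):
    """Return a routing function for one hypothesis about vector ops.

    Scalar loads/stores may use either of the two identical units.
    """
    def route(op, kind):
        if kind != "vec":
            return LSUS
        if vector_model == "any":           # vector ops treated like scalar ones
            return LSUS
        if vector_model == "split":         # one unit loads, the other stores
            return {"LSU1"} if op == "load" else {"LSU2"}
        if vector_model == "store_in_vpu":  # stores handled in the vector pipeline
            return LSUS if op == "load" else {"VPU"}
        if vector_model == "single":        # worst case: one unit does everything
            return {"LSU1"}
        raise ValueError(f"unknown model: {vector_model}")
    return route


def cycles_needed(ops, route):
    """Greedy estimate of cycles to issue independent memory ops.

    Each unit issues at most one op per cycle; the most constrained
    ops (fewest allowed units) are scheduled first each cycle.
    """
    pending = [route(op, kind) for op, kind in ops]
    cycles = 0
    while pending:
        cycles += 1
        busy, leftover = set(), []
        for allowed in sorted(pending, key=len):
            free = sorted(allowed - busy)
            if free:
                busy.add(free[0])
            else:
                leftover.append(allowed)
        pending = leftover
    return cycles


mix = [("load", "vec")] * 6 + [("store", "vec")] * 2
for model in ("any", "split", "store_in_vpu", "single"):
    print(model, cycles_needed(mix, ppc970(model)))  # 4, 6, 3, 8
print("pentium4", cycles_needed(mix, pentium4))      # 6
```

For this load-heavy mix the single-unit hypothesis is indeed the slowest, while the ranking of the other variants shifts as the ratio of vector loads to stores changes.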
