
University of Michigan: High-Performance Microprocessor Project
Cascade has also been developing an area-distributed pad router. This tool works within Cascade's design system, Epoch, and enables the designer to plan and optimize a chip using area pads. Presently the tool allows an array of area pads to be distributed across a design and routed. A preliminary version of this tool, Eggo, has been delivered to the University of Michigan.
During the next year, Cascade will fully support the University of Michigan with their area pad designs. Work will continue on the Eggo area pad routing tool, and a top-down floorplanner will be enhanced to assist in the placement of the area pad drivers. Cascade will also enhance their timing analysis tool, Tactic, to support domino logic for this project. This work is scheduled to be completed by February 1997.
University of Michigan researchers, building on previous work by their colleagues, developed a new timing analysis model for domino logic. The timing model was presented to representatives of Cascade, and a proposal was offered for incorporating this timing model into their static timing analysis tool, Tactic.
Development of a bus interface chip (PIP) was initiated. The PIP will form the interface between the MMUs, the main memory, and the PCI bus. A detailed specification is being drafted.
The cell library described above has been used in the design of an ALU, which has been taped out for fabrication. This chip, which includes over 100,000 transistors, measures 5.7 mm x 6.0 mm. It implements over 150 PowerPC instructions in domino logic. The ALU test chip is designed to allow high-speed testing on a lower-speed tester; on-chip buffers at the inputs and outputs allow for slow scan-in and scan-out of data with full-speed operation during the test.
In addition to the ALU, a sub-nanosecond access time 4K-byte SRAM and a 600-ps access time 32x32 register file chip have been designed. Simulation results from these circuits have already provided information which will change the microarchitecture of the final system; test results will qualify the cell set for use in the final processor chips. A 3-port floating point register file has also been designed. The 2 read and 1 write port register file has a 64 by 32 bit organization. The chip core measures 3.6 mm x 0.8 mm. Simulations indicate a read access time of 0.62 ns with a 1.5 V supply and 1 ns clock period. Power is estimated to be 500 mW at the rated conditions. We believe that this design achieves the best power-delay product of any sub-nanosecond register file to date.
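The power-delay figure implied by the simulated numbers above can be worked out directly. The short sketch below assumes that the read access time (0.62 ns) is the delay used in the product; the report does not state how the figure of merit was computed, and the function name is purely illustrative.

```python
def power_delay_product(power_w, delay_s):
    """Return the power-delay product (energy per access) in joules."""
    return power_w * delay_s

# Simulated values quoted in the text: 500 mW, 0.62 ns read access.
# Using the access time as the delay term is an assumption.
pdp_joules = power_delay_product(0.5, 0.62e-9)
print(f"power-delay product: {pdp_joules * 1e12:.0f} pJ")
```

Under that assumption the register file dissipates roughly 310 pJ per read access at the rated 1.5 V supply.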
The architecture team has simulated the performance of the SPEC92 and SPEC95 benchmarks on different PUMA pipeline organizations, and continues to improve the simulator in order to evaluate different FXU-FPU and memory hierarchy organizations.
The GCC compiler has been modified to target the PUMA instruction set. The modifications made so far prevent the compiler from generating some of the unsupported PowerPC instructions. Further modifications will ensure that GCC generates only instructions in the PUMA instruction set.
The Mayo group gave us access to a program still under development which extracts the effective R, L, G and C parameters of the transmission lines as a function of frequency. We improved the part of the program which extracts the propagation constant, and are developing a program which de-embeds the probe sites at each end of the lines in order to get better values of the actual line parameters. These results are being compared with simulation parameters determined from the dimensions and structural parameters of the MCM interconnects, using Quad Design's XTX program; preliminary results indicate reasonable agreement. This information will be used in simulation of the MCM interconnect signals.
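As an illustration of the extraction step, the standard transmission-line relations R + jwL = gamma * Z0 and G + jwC = gamma / Z0 recover the per-unit-length parameters from the propagation constant and characteristic impedance at each frequency. The sketch below is not the Mayo program; the function names and the sample line values are hypothetical and are used only to check that the relations round-trip.

```python
import cmath
import math

def rlgc_from_gamma_z0(gamma, z0, freq_hz):
    """Recover per-unit-length R, L, G, C at one frequency.

    gamma: complex propagation constant (1/m)
    z0:    complex characteristic impedance (ohm)
    """
    omega = 2 * math.pi * freq_hz
    series = gamma * z0   # equals R + j*omega*L
    shunt = gamma / z0    # equals G + j*omega*C
    return series.real, series.imag / omega, shunt.real, shunt.imag / omega

def gamma_z0_from_rlgc(r, l, g, c, freq_hz):
    """Forward model, used here only to generate self-consistent test data."""
    omega = 2 * math.pi * freq_hz
    z = complex(r, omega * l)   # series impedance per unit length
    y = complex(g, omega * c)   # shunt admittance per unit length
    return cmath.sqrt(z * y), cmath.sqrt(z / y)

# Hypothetical line parameters at 1 GHz (not measured MCM data).
gamma, z0 = gamma_z0_from_rlgc(5.0, 3e-7, 1e-3, 1e-10, 1e9)
r, l, g, c = rlgc_from_gamma_z0(gamma, z0, 1e9)
```

Repeating the calculation over a frequency sweep gives R, L, G and C as functions of frequency, which is the form needed for comparison against field-solver output.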
A second MCM, designated MCM2, will be designed with six chips mounted on a 2-inch substrate. We are developing an ALU chip in CGaAs technology which will be flip-chip attached with gold bumps (this chip is further described above under Circuit Design). The design includes area array placement of the pads, which will aid verification of the advantages expected with this design technique. We will also develop power and thermal models for the efficient placement of power and thermal pads.
A Microtester in CMOS technology will interface with the SRAM chips, two of which are mounted on the upper surface of the MCM as bare die (if available) and two in single-chip packages on the back side of the MCM. This will exercise the memory management scheme for the final MCM system. The MCM substrate will consist of signal and power layers which can be attached to chips mounted on the back side.
The third MCM, MCM3, will contain the prototype processor, including the FXU, FPU, IMMU, DMMU, and caches, and will interface to a PCI bus. It will be a two-sided MCM on an aluminum nitride substrate, with the Si cache ICs mounted on the lower side, and the GaAs processor chips and the high density interconnects mounted on the upper side.
The complete system includes the above-mentioned PCB2 with components and the host processor provided by the Motorola Power Stack. Through the PCI interface and PIP, the host supplies data to the SDRAM and signals the MMT to begin the test procedures. The MMT transfers data from the SDRAM through the PIP to the SRAM. Next, the MMT reads the data from the SRAM and performs basic operations (e.g., XOR, ADD) before storing the results to a new location in SDRAM. When the transfer is signaled complete, the host reads the SDRAM to verify that the correct data changes are stored.
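The test sequence just described can be modeled in software as a simple behavioral sketch. Everything in the block below is an assumption made for illustration: the 32-bit word width, the XOR key, the address layout, and all function names are hypothetical and do not describe the actual MMT or PIP hardware.

```python
WORD_MASK = 0xFFFFFFFF  # assumed 32-bit data words

def host_load(sdram, base, pattern):
    """Host writes the test pattern into SDRAM (via PCI and the PIP)."""
    for i, word in enumerate(pattern):
        sdram[base + i] = word & WORD_MASK

def mmt_run(sdram, sram, src, dst, count, key):
    """MMT copies SDRAM to SRAM, then XORs each word back to a new SDRAM area."""
    for i in range(count):
        sram[i] = sdram[src + i]                        # SDRAM -> PIP -> SRAM
    for i in range(count):
        sdram[dst + i] = (sram[i] ^ key) & WORD_MASK    # operate, store back

def host_verify(sdram, dst, pattern, key):
    """Host reads SDRAM and checks that the expected changes were stored."""
    return all(sdram[dst + i] == ((w ^ key) & WORD_MASK)
               for i, w in enumerate(pattern))

pattern = [0x00000000, 0xFFFFFFFF, 0x12345678, 0xDEADBEEF]
sdram, sram = {}, {}
host_load(sdram, 0, pattern)
mmt_run(sdram, sram, src=0, dst=len(pattern), count=len(pattern), key=0xA5A5A5A5)
test_passed = host_verify(sdram, len(pattern), pattern, 0xA5A5A5A5)
```

Writing the results to a separate SDRAM region, as the text specifies, lets the host compare them against the original pattern without the source data having been overwritten.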
For further information, please see the papers "Software-managed address translation" and "Simplifying virtual memory management with hardware segmentation".