PUMA


Project Summary For Fiscal Year 99


ARPA Contract Number: DAAH04-94-G-0327

Title: Area-Interconnected CGaAs Microprocessor


Contractor: University of Michigan
Subcontractors: Motorola Government and Space Technology Group
Duet Technologies, previously Cascade Design Automation
Principal Investigators: Richard B. Brown
2403 EECS - 1301 Beal Ave.
University of Michigan
Ann Arbor, MI 48109-2122
PH: (734) 763-4207
FX: (734) 763-9324
Email: brown@umich.edu
Trevor N. Mudge
2222 EECS - 1301 Beal Ave.
University of Michigan
Ann Arbor, MI 48109-2122
PH: (734) 764-0203
FX: (734) 763-4617
Email: tnm@umich.edu
Ronald J. Lomax
2302 EECS - 1301 Beal Ave.
University of Michigan
Ann Arbor, MI 48109-2122
PH: (734) 936-2972
FX: (734) 763-9324
Email: rjl@umich.edu
Karem A. Sakallah
2213 EECS - 1301 Beal Ave.
University of Michigan
Ann Arbor, MI 48109-2122
PH: (734) 936-1350
FX: (734) 763-4617
Email: karem@umich.edu
Objective

The primary focus of this project is the development of a radiation-hard, complementary GaAs (CGaAs) PowerPC microprocessor with flip-chip, area-I/O packaging. This processor, called PUMA, is ideally suited to space applications because of its low power-delay product and excellent radiation hardness. The project demonstrates semiconductor and packaging technologies that are appropriate for aerospace applications, but the circuits, design methodologies and packaging approaches it develops will also be vital in advanced CMOS processes. Through a related AASERT project, the processor will be integrated into a prototype system.
Approach

This project has explored the use of complementary GaAs technology for implementing large digital circuits such as microprocessors. During the course of the project, the CGaAs process was modified by adding a low-temperature GaAs buffer below the channel, making the circuits resistant to single-event upset (CGaAs was already immune to latchup and intrinsically hard to total radiation dose). This radiation-hard technology was used to implement an embeddable processor for space applications. Accomplishing this required a number of developments: circuit designs and a full cell library, including an SRAM compiler; a redesign of the PowerPC microarchitecture targeting a lower integration level; design automation software to support CGaAs; and packaging, clock distribution and I/O schemes that allow multichip systems to operate efficiently.

The first test circuits designed in this project were fabricated and characterized to validate circuit topologies such as DCFL, domino, and various tri-state designs in the CGaAs technology; to verify SPICE models; to compare alternative circuit design topologies; and to demonstrate critical subcircuits. Multichip modules were also designed and tested to characterize their interconnect and to evaluate various I/O schemes.

A CMOS version (TSMC 0.35-micron process) of the 32-bit superscalar PowerPC microprocessor was then designed to verify the architecture. The CMOS design has been fabricated and packaged, and testing is well under way (so far, no problems have been found). The CGaAs design has also been completed, and those chips have been fabricated and packaged; they will be tested as soon as testing of the CMOS processor is complete.

The processor chip, which includes an area I/O pad array in addition to peripheral pads, will be flip-chip assembled with gold bumps onto a fine-pitch MCM-L board that connects it to the level-1 cache, a memory management unit, a PCI interface, and the level-2 cache. Its area I/O array is the densest and finest-pitch array ever assembled on MCM-L. The remainder of this project and an accompanying AASERT will complete the system design and demonstrate the prototype in a desktop computer.

Recent Accomplishments

A complementary GaAs PowerPC microprocessor was designed, fabricated and packaged. Like CMOS, CGaAs has both p- and n-type devices, but it operates from a 0.9- to 1.5-volt power supply, which keeps dynamic power (proportional to the square of the supply voltage) very low. The HEMT devices produce logic gates with short delays because of the high carrier mobility in their InGaAs channels. The very low power-delay product and excellent radiation hardness make this an ideal processor for space applications.

An architecture incorporating a number of advanced features, such as superscalar execution and a two-level dynamic branch predictor, was developed for designs with small transistor budgets. The processor has a small on-chip primary instruction cache and a larger off-chip primary data cache. Implemented in the current 0.5-micron process and running standard benchmark programs, the PUMA processor should achieve 0.77 instructions per cycle at a clock rate above 200 MHz, which corresponds to more than 150 million instructions per second. Other embedded processors could benefit both from the PUMA studies, which showed that some advanced features give a good performance return on the area and power they require, and from the efficient circuit designs used to implement those features.
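
The idea behind a two-level dynamic branch predictor can be illustrated with a generic sketch. This is not PUMA's predictor: the global-history organization, the 4-bit history length, and the class and function names are assumptions chosen for illustration. A shift register of recent branch outcomes (the first level) selects one of several 2-bit saturating counters (the second level), so the same branch can be predicted differently depending on the path that led to it.

```python
class TwoLevelPredictor:
    """Generic global-history two-level predictor (illustrative assumption,
    not the PUMA design)."""

    def __init__(self, history_bits: int = 4):
        self.history_bits = history_bits
        self.history = 0  # recent branch outcomes, newest in bit 0 (1 = taken)
        # One 2-bit saturating counter per history pattern, starting at
        # "weakly not taken" (1). Values 2 and 3 predict taken.
        self.counters = [1] * (1 << history_bits)

    def predict(self) -> bool:
        return self.counters[self.history] >= 2

    def update(self, taken: bool) -> None:
        c = self.counters[self.history]
        self.counters[self.history] = min(c + 1, 3) if taken else max(c - 1, 0)
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask


def accuracy(outcomes, history_bits=4):
    """Fraction of branches predicted correctly over an outcome sequence."""
    p = TwoLevelPredictor(history_bits)
    hits = 0
    for taken in outcomes:
        hits += p.predict() == taken
        p.update(taken)
    return hits / len(outcomes)


# A loop branch taken three times and then falling through, repeated 50 times.
pattern = [True, True, True, False] * 50
print(f"with history: {accuracy(pattern):.2f}, "
      f"single counter: {accuracy(pattern, history_bits=0):.2f}")
```

With four bits of history, each position in the loop sees a distinct history pattern, so the predictor learns the exit iteration after a short warm-up; a single 2-bit counter (zero history bits) mispredicts every loop exit.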

A methodology, applicable to any semiconductor technology, was developed to quantitatively evaluate semiconductor processes for optimal scaling. The PUMA processor was designed in the Motorola 0.5-micron CGaAs process, which had been developed by shrinking the gate length of a 0.7-micron process; it was clear that the design rules needed to be adjusted as they were scaled. As dimensions shrink below 0.18 micron, linear scaling will become very costly in CMOS as well. The methodology guides process engineers in scaling design rules in the most cost-effective way to reach a desired objective. At its heart is the automated exploration of the design-rule space using a process-independent, optimizing SRAM compiler developed in this project. A cost/benefit analysis of the CGaAs design rules showed that, under a fixed spending cap, nonlinear scaling yields greater improvements in area and performance than linear scaling.
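
The exploration idea can be sketched as a search over a small design-rule space under a spending cap. In this toy model, which is entirely hypothetical, a two-term cell-area formula stands in for the project's optimizing SRAM compiler, and the rule names, baseline values and shrink costs are invented for illustration. "Linear" scaling shrinks every rule by the same step; "nonlinear" scaling may shrink each rule independently.

```python
from itertools import product

# Hypothetical design rules (arbitrary units) and the cost of shrinking each
# rule by one 10% step. All numbers are illustrative assumptions.
BASE = {"metal_pitch": 10.0, "poly_pitch": 8.0, "contact": 4.0}
STEP_COST = {"metal_pitch": 3.0, "poly_pitch": 1.0, "contact": 5.0}
STEPS = (0, 1, 2)  # shrink each rule by 0%, 10% or 20%


def cell_area(rules):
    # Toy stand-in for an SRAM compiler: width set by metal pitch and
    # contacts, height set by poly pitch and contacts.
    width = 2 * rules["metal_pitch"] + rules["contact"]
    height = 3 * rules["poly_pitch"] + rules["contact"]
    return width * height


def evaluate(steps):
    """Return (area, cost) for one shrink step per rule, in BASE order."""
    rules = {k: BASE[k] * (1 - 0.1 * s) for k, s in zip(BASE, steps)}
    cost = sum(STEP_COST[k] * s for k, s in zip(BASE, steps))
    return cell_area(rules), cost


def best_under_cap(cap, linear):
    """Smallest-area (area, steps) whose shrink cost stays within cap."""
    if linear:
        candidates = [(s,) * len(BASE) for s in STEPS]
    else:
        candidates = product(STEPS, repeat=len(BASE))
    feasible = []
    for steps in candidates:
        area, cost = evaluate(steps)
        if cost <= cap:
            feasible.append((area, steps))
    return min(feasible)


print("linear:   ", best_under_cap(9.0, linear=True))
print("nonlinear:", best_under_cap(9.0, linear=False))
```

With these made-up numbers, the nonlinear search spends its budget on the cheap, high-leverage pitch rules and leaves the expensive contact rule alone, which is the kind of trade-off the methodology is meant to expose. Real costs and area figures would come from the process and from the SRAM compiler itself.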

A gold bumping process that produces precisely sized bumps with the desired aspect ratios has been developed in our lab as part of a flip-chip packaging scheme. Test chips with thousands of bumps at pitches as small as 50 microns are being assembled on MCMs. The processor chip includes a 315-pin area I/O pad array with a 6-mil (about 152-micron) pad pitch, in addition to 288 pads in a staggered peripheral ring. This packaging approach is appropriate for military and aerospace applications now, and will be important for the commercial CMOS systems predicted by the SIA (Semiconductor Industry Association) roadmap.
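
The pitch figures can be put on a common footing with a little arithmetic. The sketch below converts the 6-mil pitch to microns and compares areal pad density with the 50-micron test-chip pitch. It assumes a fully populated square grid, which is a simplification: the actual arrangement of the 315 area pads is not described here, and the function names are hypothetical.

```python
import math

MIL_TO_UM = 25.4  # 1 mil = 0.001 inch = 25.4 micrometres


def pads_per_mm2(pitch_um):
    """Areal pad density of a fully populated square grid (an assumption)."""
    return (1000.0 / pitch_um) ** 2


def min_square_side_mm(n_pads, pitch_um):
    """Side of the smallest square grid at this pitch that holds n_pads."""
    per_side = math.ceil(math.sqrt(n_pads))
    return per_side * pitch_um / 1000.0


puma_pitch_um = 6 * MIL_TO_UM
print(f"6 mil = {puma_pitch_um:.1f} um")
print(f"{pads_per_mm2(puma_pitch_um):.1f} vs {pads_per_mm2(50.0):.1f} pads/mm^2")
print(f"315 pads need at least {min_square_side_mm(315, puma_pitch_um):.2f} mm per side")
```

Under the full-grid assumption, a 50-micron pitch packs roughly nine times as many pads into a given area as the 6-mil pitch does.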

Gunning transceiver, differential-voltage, and switched-current I/O interfaces have been fabricated in CGaAs and tested. Test results indicate that these CGaAs circuits can support bit rates of at least 650 Mb/s per pin, a figure limited by the test setup rather than by the circuits.

An advanced CGaAs transceiver based on switched-current techniques has been designed. The receiver actively terminates the input line in its characteristic impedance using an active current mirror. The transmitted current pulse is 1.5 mA. The receiver is biased by a feedback circuit that compensates for process variations and power-supply differences between the transmitting and receiving chips by adjusting the bias levels of the receiving chip. Simulations indicate that the circuit can support 1.2 Gb/s per pin while dissipating only 4.5 mW from a 1.5 V supply. The circuits are small, so they can be implemented efficiently in high-speed chip-to-chip interfaces.

A CMOS version of the switched-current transceiver was designed, fabricated and tested. It was implemented in a 3.3 V process but designed to operate from a 2.5 V supply, emulating future CMOS processes with lower ratios of supply voltage (Vdd) to threshold voltage (Vth). The link achieved speeds ranging from 880 Mb/s at 5 mW with a 2.25 V supply to 1.05 Gb/s at 9 mW with a 3 V supply. Its low power, small size, and tolerance of process and voltage variation are exactly what future CMOS chip-to-chip communication needs.
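
A common figure of merit for such links is energy per bit, which is simply power divided by bit rate (1 mW at 1 Gb/s is 1 pJ/bit). The sketch below applies it to the figures quoted for the two switched-current transceivers, taking each power figure at face value as the total for one link. The CGaAs number comes from simulation while the CMOS numbers are measured, so the comparison is only indicative; the names in the code are illustrative.

```python
# (label, bit rate in Gb/s, power in mW), as quoted in the text
LINKS = [
    ("CGaAs switched-current (simulated, 1.5 V)", 1.2, 4.5),
    ("CMOS switched-current (measured, 2.25 V)", 0.88, 5.0),
    ("CMOS switched-current (measured, 3 V)", 1.05, 9.0),
]


def energy_per_bit_pj(rate_gbps, power_mw):
    # mW / (Gb/s) = 1e-3 J/s / 1e9 bit/s = 1e-12 J/bit, i.e. pJ/bit
    return power_mw / rate_gbps


for label, rate, power in LINKS:
    print(f"{label}: {energy_per_bit_pj(rate, power):.2f} pJ/bit")
```

This gives 3.75 pJ/bit for the simulated CGaAs link and about 5.7 and 8.6 pJ/bit at the two CMOS operating points.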

A CGaAs delay-locked loop (DLL) was designed to explore the effects of low supply voltage and limited headroom on phase-noise performance. Simulations indicate that the DLL will operate at 500 MHz with a peak jitter of 88 ps. DLLs are important components of I/O interfaces because they allow the receiver to synchronize to the transmitter clock.
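
The locking behavior of a DLL can be sketched with a first-order, discrete-time behavioral model. On each update, the phase detector measures how far the delayed clock edge is from the next reference edge, and the delay line is adjusted by a fraction of that error until the delay equals one clock period (2000 ps at 500 MHz). The loop gain, starting delay and function name are assumptions, and the model captures none of the jitter or headroom effects that the CGaAs design was built to study.

```python
def simulate_dll(period_ps=2000.0, initial_delay_ps=1400.0, gain=0.25, steps=40):
    """First-order behavioral DLL (illustrative assumption): each update
    moves the delay-line setting a fraction `gain` of the way toward one
    full reference period. Returns the delay after every update."""
    delay = initial_delay_ps
    history = [delay]
    for _ in range(steps):
        phase_error = period_ps - delay  # > 0: the delayed edge arrives early
        delay += gain * phase_error      # loop filter nudges the control
        history.append(delay)
    return history


trace = simulate_dll()
print(f"final delay: {trace[-1]:.3f} ps")  # approaches the 2000 ps period
```

For scale, the simulated 88 ps of peak jitter is 4.4% of that 2000 ps period.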

A CGaAs PLL clock generator that operates at up to 800 MHz from a 1.5 V supply, with 120 ps of phase jitter, was designed and tested. The design remained operational at supply voltages as low as 0.8 V. Step-up (frequency-multiplying) clock generators such as this are important components of modern microprocessor systems.

An accurate phase-jitter simulation method, which incorporates a phase-jitter model into transient simulations, has been developed. The resulting tool can be used to analyze circuits for phase jitter, allowing designers to optimize them for this important parameter.
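
One generic way to fold jitter into a time-domain simulation is to perturb each oscillator period with a random sample drawn from a noise model. The sketch below is not the project's simulator: the independent Gaussian period noise, the 800 MHz period and the function names are assumptions. It shows why jitter must be simulated in the transient rather than added afterwards: in a free-running oscillator each edge is timed from the previous one, so the errors accumulate like a random walk.

```python
import random
import statistics


def oscillator_edges(n_cycles, period_ps, sigma_ps, seed=0):
    """Rising-edge times of a free-running oscillator in which every period
    is perturbed by independent Gaussian jitter (an assumed noise model)."""
    rng = random.Random(seed)
    t = 0.0
    edges = [t]
    for _ in range(n_cycles):
        t += period_ps + rng.gauss(0.0, sigma_ps)  # next edge timed from the last
        edges.append(t)
    return edges


def accumulated_error(edges, period_ps, k):
    """Deviation of edge k from its ideal position k * period_ps."""
    return edges[k] - k * period_ps


# Hypothetical case: 1250 ps period (800 MHz), 1 ps of jitter per cycle.
PERIOD = 1250.0
errors = [
    accumulated_error(oscillator_edges(100, PERIOD, 1.0, seed=s), PERIOD, 100)
    for s in range(2000)
]
# For a random walk, the spread after 100 cycles should be near 1 ps * sqrt(100) = 10 ps.
print(f"std of error after 100 cycles: {statistics.pstdev(errors):.1f} ps")
```

In a PLL, the loop feedback bounds this accumulation, and that interaction between noise and loop dynamics is exactly what a transient simulation with an embedded jitter model can capture.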

A low-noise PLL clock generator employing current-steering logic was designed, fabricated and tested in a 0.5-micron, 3.3 V CMOS process. This design, which benefited from the availability of the jitter simulator, was also targeted at a lower supply voltage in order to study and develop low-headroom circuits. Measured results show a maximum frequency of nearly 800 MHz with a 1.8 V supply, an absolute phase jitter of less than 60 ps, and an RMS cycle-to-cycle phase jitter of 10 ps. This is the best phase-jitter performance of any published PLL, and it was achieved with low-voltage techniques that will be directly applicable to future CMOS circuits.
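
The two jitter figures measure different things, and the sketch below shows how each can be computed from a list of rising-edge times. The definitions used here (peak time-interval error against an ideal clock, and the RMS change between consecutive periods) are one common convention; the text does not state exactly which definitions were used in the measurements, and the edge times are hypothetical.

```python
import math


def jitter_metrics(edges, nominal_period):
    """Return (peak absolute jitter, RMS cycle-to-cycle jitter).
    - Absolute jitter (time-interval error): each edge's offset from an
      ideal clock of the nominal period that starts at edges[0].
    - Cycle-to-cycle jitter: the change between consecutive periods."""
    tie = [t - (edges[0] + k * nominal_period) for k, t in enumerate(edges)]
    periods = [b - a for a, b in zip(edges, edges[1:])]
    c2c = [b - a for a, b in zip(periods, periods[1:])]
    peak_abs = max(abs(e) for e in tie)
    rms_c2c = math.sqrt(sum(d * d for d in c2c) / len(c2c))
    return peak_abs, rms_c2c


# Hypothetical 800 MHz edge times (ps) with one edge arriving 10 ps late.
edges = [0.0, 1250.0, 2510.0, 3750.0, 5000.0]
peak, rms = jitter_metrics(edges, 1250.0)
print(peak, round(rms, 2))  # 10.0 14.14
```

A single late edge stretches one period and shrinks the next, so it shows up twice in the cycle-to-cycle sequence, which is why the RMS cycle-to-cycle value here exceeds the peak absolute error.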

A transistor-level micro-placement tool called TEMPO was developed for two-dimensional leaf-cell synthesis. It generates custom-quality layouts for high-performance logic families such as cascode voltage switch logic, pass-transistor logic, and domino CMOS. It does so through powerful transformations such as dynamic geometry sharing through transistor chaining, and arbitrary geometry merging. TEMPO enables cell libraries to be migrated quickly to new fabrication processes.
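
Transistor chaining, one of the transformations named above, is classically formulated as an Euler-path problem: treat each net as a vertex and each transistor as an edge between its source and drain nets; an unbroken row of transistors with shared diffusions is then a path that uses every edge once. The sketch below implements that classic formulation with Hierholzer's algorithm. It is a simplified, one-dimensional illustration, not TEMPO's two-dimensional algorithm, and the function name and netlist format are hypothetical.

```python
from collections import defaultdict


def diffusion_chain(transistors):
    """Order (name, net_a, net_b) transistors so that each neighbouring pair
    shares a source/drain net, letting their diffusions merge. Returns the
    list of names, or None if no single unbroken chain exists."""
    if not transistors:
        return []
    adj = defaultdict(list)  # net -> [(transistor name, net at other terminal)]
    for name, a, b in transistors:
        adj[a].append((name, b))
        adj[b].append((name, a))
    odd = [net for net, edges in adj.items() if len(edges) % 2 == 1]
    if len(odd) not in (0, 2):
        return None  # an Euler path needs zero or two odd-degree nets
    start = odd[0] if odd else next(iter(adj))
    used, path = set(), []
    stack = [(start, None)]  # (current net, transistor used to reach it)
    while stack:
        net, via = stack[-1]
        while adj[net] and adj[net][-1][0] in used:
            adj[net].pop()  # discard transistors already placed
        if adj[net]:
            name, other = adj[net].pop()
            used.add(name)
            stack.append((other, name))
        else:
            stack.pop()
            if via is not None:
                path.append(via)
    if len(path) != len(transistors):
        return None  # the nets do not form one connected diffusion graph
    return path[::-1]


# NAND2 n-type pull-down stack: out -A- n1 -B- gnd
print(diffusion_chain([("A", "out", "n1"), ("B", "n1", "gnd")]))  # ['A', 'B']
```

When no Euler path exists, a real layout tool would break the row into several chains (one diffusion break per extra pair of odd-degree nets) rather than give up, as this sketch does.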

A constructive logic synthesis tool named M31 interleaves the traditionally separate stages of technology-independent logic restructuring and technology-dependent library binding. M31 is based on a Boolean decomposition strategy that ties together the structural properties of the functions being synthesized, the structural attributes of the implementation network, and the functional content of the target library. The resulting implementations are consistently smaller and faster than those generated by conventional logic synthesis, and they can be incrementally modified to create variants with other area/speed trade-offs.
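
To give a concrete sense of what Boolean decomposition means, the sketch below implements a classical test, Ashenhurst's simple disjoint decomposition: a function f can be written as g(h(bound variables), free variables) exactly when its decomposition chart has at most two distinct columns. This is a textbook building block, not M31's algorithm, and the function names are illustrative.

```python
from itertools import product


def column_multiplicity(f, n_vars, bound):
    """Number of distinct columns in the decomposition chart of f for the
    given bound set. f takes a tuple of n_vars bits; bound is a tuple of
    variable indices. Each column lists f over all free-variable values."""
    free = [i for i in range(n_vars) if i not in bound]
    columns = set()
    for b_vals in product((0, 1), repeat=len(bound)):
        column = []
        for f_vals in product((0, 1), repeat=len(free)):
            x = [0] * n_vars
            for i, v in zip(bound, b_vals):
                x[i] = v
            for i, v in zip(free, f_vals):
                x[i] = v
            column.append(f(tuple(x)))
        columns.add(tuple(column))
    return len(columns)


def has_simple_disjoint_decomposition(f, n_vars, bound):
    # f(X) = g(h(bound vars), free vars) exists iff at most two distinct columns
    return column_multiplicity(f, n_vars, bound) <= 2


def and_or(x):    # f = ab + c
    return (x[0] & x[1]) | x[2]


def majority(x):  # f = ab + bc + ca
    return int(x[0] + x[1] + x[2] >= 2)


print(has_simple_disjoint_decomposition(and_or, 3, (0, 1)))    # True: h = ab
print(has_simple_disjoint_decomposition(majority, 3, (0, 1)))  # False
```

A decomposition found this way exposes a sub-function h that can be matched against a library cell, which is the kind of link between function structure and library content that the text describes.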

A methodology and tools have been developed for minimizing the effects of capacitively coupled crosstalk. Because they use an accurate and consistent empirical model of wiring resources and constraints, the tools make coupled noise and delay predictable, and thus avoidable. A congestion-driven placement algorithm was developed to reduce the incidence of capacitive coupling, and a global route embedder was developed to guide the detailed router toward meeting timing and noise constraints.
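
The textbook first-order models below show why coupling capacitance makes both noise and delay depend on the activity of neighbouring wires. They are not the project's empirical wiring model, and the capacitance values and function names are hypothetical.

```python
def coupled_noise_peak(vdd, c_couple, c_ground):
    """Charge-sharing bound on the glitch induced on a quiet victim by a
    full-swing aggressor: Vdd * Cc / (Cc + Cg). It ignores the victim's
    driver, which pulls the glitch back down, so it is pessimistic."""
    return vdd * c_couple / (c_couple + c_ground)


def effective_load(c_ground, c_couple, miller_factor):
    """Switching-dependent load seen by the victim driver: factor 0 if the
    neighbour switches the same way, 1 if it is quiet, 2 if it switches
    the opposite way."""
    return c_ground + miller_factor * c_couple


# Hypothetical victim net: 40 fF to ground, 10 fF to one neighbour, 1.5 V swing.
print(round(coupled_noise_peak(1.5, 10.0, 40.0), 3))      # 0.3 (volts)
print([effective_load(40.0, 10.0, k) for k in (0, 1, 2)])  # [40.0, 50.0, 60.0]
```

If delay is taken as roughly proportional to load capacitance, the plus-or-minus 20% swing in effective load here becomes a comparable spread in delay. This is why the placement and routing tools try to control coupling rather than simply budgeting for the worst case.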

Current Plan

As this project comes to an end, several important tasks remain. First, the CGaAs processor must be tested; this will provide evidence of the capabilities of complementary GaAs for space applications.

The packaging technology based on gold bumps will be demonstrated with test chips on MCM substrates. These parts are being assembled, and will be tested to evaluate yield vs. bump pitch, bump aspect ratio, and number of bumps.

Finally, an MCM-L-based system is being designed with the CGaAs FXU, a CMOS cache, and an FPGA-based memory management system and PCI interface. This work will be completed under a related AASERT project.

Technology Transition

This project has been carried out in close collaboration with Motorola Government and Space Technology Group and Cascade Design Automation. CGaAs process advances made in this project were of immediate benefit to US Government customers using the same process for digital circuits (contact Mike LaMacchia, 602-441-3071). In the past year, because of financial pressures and the collapse of the Celestri program, Motorola stopped offering the digital CGaAs process. The technology still exists at Motorola, and we hope that the demonstration of our CGaAs PowerPC will renew interest in this process, which holds much promise for space applications. We appreciate that Motorola fabricated our final wafers even though it had decided to end CGaAs production. Duet Technologies purchased Cascade Design Automation this past year and turned it into a cell library provider.

We have worked with Todd Weatherford at the Naval Postgraduate School (408-656-3044) on radiation-hardness issues. We are collaborating with 3M on flip-chip assembly and with Multek on MCM-L technology. We have also worked with SPEC (Garry McMillan, 512-306-1100), sharing the costs of wafers, masks and packaging for CGaAs runs. We have collaborated with Capt. Gerald J. Trombley at the National Security Agency, and met with researchers from Wright Laboratory (Dr. Charles L. A. Cerny) and Phillips Laboratory (Capt. Kenneth G. Merkel) to discuss CGaAs technology and circuits.


PUMA Project Home Page