
Fault Tolerance and Reliability Techniques for High-Density Random-Access Memories (Hardcover)
by Kanad Chakraborty, Pinaki Mazumder
To learn more about the book, go to
Amazon.com at the following web-site:
http://www.amazon.com/Tolerance-Reliability-Techniques-High-Density-Random-Access/dp/0130084654
Editorial Reviews
Book Info
Surveys the latest research and field-proven techniques for every form
of memory fault tolerance, including manufacturing, online, and
field-related fault tolerance. Authors focus on practical circuit and
design solutions.
From the Back Cover
The state of the art in fault-tolerant RAM development and production.
· Embedded RAM for SoC design: practical circuit and layout design principles and techniques
· State-of-the-art manufacturing, online, and field-related fault tolerance
· Structured custom design solutions for self-testable/self-repairable embedded RAMs
· Includes extensive illustrations and examples, plus a compendium of 500+ research papers
Next-generation electronic devices require advanced new nanofabrication
CMOS technologies, and in these environments today's processing
techniques simply will not produce adequate yields. To improve RAM
reliability without compromising performance, cost, or space
requirements, engineers are turning to advanced fault-tolerant
techniques. In this book, Kanad Chakraborty and Pinaki Mazumder survey
the latest research and field-proven techniques for every form of
memory fault tolerance, including manufacturing, online, and
field-related fault tolerance.
Coverage includes:
· Embedded RAM for SoC design: practical circuit and layout design principles and techniques
· New research into the mechanisms underlying soft and hard failures
· Understanding the impact of scaling on reliability
· Modeling and analysis of manufacturing yield
· Manufacturing fault tolerance: built-in self-diagnosis and repair, reconfiguration, repair via EEPROM switches, flexible redundancy, and more
· Techniques for mitigating radiation-induced single-event effects
· Field fault tolerance: error correcting codes and associated circuit techniques
· Structured custom design solutions for self-testable and self-repairable embedded RAMs: circuit and physical design
Chakraborty and Mazumder
focus on practical circuit and design solutions, presenting
extensive illustrations and explaining device physics and circuit
design theory
in a reader-friendly manner. They also provide a compendium of more
than 500
research papers on memory fault tolerance and reliability. Whether
you're a
design engineer, test engineer, manufacturer, or researcher, this is a
comprehensive resource for building next-generation RAM with
next-generation
reliability.
Modern Semiconductor Design Series
About the Author
KANAD CHAKRABORTY is currently Member of Technical Staff, Agere Systems
Research (Communications Systems Technology Lab). He was formerly a
software engineer and researcher with IBM's Electronic Design
Automation Lab. His contributions include development of novel
fault-tolerant memory architectures, algorithms for multiport memory
testing, new design automation approaches, and neural network
applications.
PINAKI MAZUMDER is
Professor in the Department
of Electrical Engineering and Computer Science,
Excerpt. © Reprinted by permission. All
rights
reserved.
1.
Preface
This book deals with the
study of
fault-tolerance and reliability techniques for semiconductor
random-access
memories. Topics in this book include: reliability testing and
prediction;
diagnosis, repair and reconfiguration; single-event effects and their
mitigation; use of error-correcting codes; yield analysis; and physical
design
issues for built-in self-repairable embedded RAMs.
This book is written primarily for academic researchers and practicing
engineers working in the design and test of high-density random-access
memories (RAMs) of the twenty-first century. It introduces readers to
state-of-the-art diagnosis, repair, redundancy, hardening, and error
correction schemes for RAMs.
The
book may also be used as a supplementary text for undergraduate and
graduate
courses on VLSI fault tolerance and reliability.
Presently,
application-specific integrated
circuits (ASICs) and high-performance
microprocessors
such as Itanium and Compaq Alpha processors use a total of almost 75%
of chip
real estate for accommodating various types of embedded memories. For
example,
the Compaq Alpha EV7 chip shown on the front cover employs 135 million
transistors for RAMs alone, while the
entire chip has
152 million transistors. As the integration level increases to nearly 1
billion
transistors within a decade or so, as projected by the Semiconductor
Industry
Association (SIA) Roadmap, the relative silicon area occupied by
embedded memories will tend toward 97% or even more. The
ever-increasing need for myriad memory blocks within a VLSI chip, with
a view to improving system throughput through larger and multilevel
caches, indicates that the reliability of a complex VLSI chip will
depend largely on the reliability of these embedded memory blocks. With
device dimensions moving rapidly toward the ultimate physical limits of
device scaling, which lie in the regime of feature sizes of 50 nm or
so, a host of complex failure modes are expected to occur in memory
circuits. The goal of this book is to
establish the
need for appropriate fault-tolerant and reliable design techniques that
cover
the entire spectrum of chip design, from system architectures to
nanofabrication. We discuss all these techniques in a systematic
manner. Future generations of giant VLSI circuits could be manufactured
at lower cost and achieve higher field reliability if these
fault-tolerance and reliability techniques were incorporated while
building embedded memories. Readers of the book will discover with us
that for
the highest levels of reliability and fault tolerance of such memories
in field
application, soft error correction and scrubbing are not adequate,
since
leakage currents produced by deep-submicron process technologies and
exacerbated by energetic ions in terrestrial and space environments can
cause
hard errors to accumulate over time. For reliable operation, such
errors need
to be repaired in the field using built-in self-repair, the importance
of which
is growing every day.
Organization
The book is organized as
follows.
Chapter 1 establishes the
need for quality and
reliability testing and prediction and describes the mechanisms
underlying hard
and soft failures. The impact of scaling on reliability is explained,
models for predicting reliability are described, and techniques for
safeguarding against failures and achieving fault tolerance are
discussed.
Chapter 2 deals with
manufacturing fault
tolerance and examines the work done over the past two decades on
diagnosis, repair, and reconfiguration of RAMs.
We
describe diagnosis algorithms, repair algorithms, reconfiguration
techniques,
repair using flash EEPROM switches, flexible redundancy, built-in
self-diagnosis
(BISD) and built-in self-repair (BISR), built-in redundancy analysis
(BIRA),
and case studies of BISR architectures.
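As a rough illustration of the kind of decision a repair algorithm has to make, the sketch below applies the widely used "must-repair" rule to a list of faulty cells: a row holding more faults than there are spare columns left can only be fixed with a spare row, and vice versa. The function name, the fault-list format, and the greedy fallback for the remaining faults are assumptions made for this example; they are not taken from the algorithms covered in Chapter 2, and the greedy step may miss repairs that an exhaustive search would find.

```python
from collections import Counter

def allocate_spares(faults, spare_rows, spare_cols):
    """Return (rows, cols) to replace, or None if this heuristic finds no repair.

    faults: iterable of (row, col) coordinates of faulty cells (hypothetical format).
    """
    remaining = set(faults)
    rows, cols = set(), set()
    while remaining:
        rows_left = spare_rows - len(rows)
        cols_left = spare_cols - len(cols)
        row_count = Counter(r for r, _ in remaining)
        col_count = Counter(c for _, c in remaining)

        # Must-repair rule: a line with more faults than the spares of the
        # opposite kind can only be covered by a spare of its own kind.
        forced_row = next((r for r, n in row_count.items() if n > cols_left), None)
        forced_col = next((c for c, n in col_count.items() if n > rows_left), None)

        if forced_row is not None:
            if rows_left == 0:
                return None  # forced row but no spare row left
            rows.add(forced_row)
        elif forced_col is not None:
            if cols_left == 0:
                return None  # forced column but no spare column left
            cols.add(forced_col)
        else:
            # Greedy fallback (an assumption of this sketch): replace the
            # line that currently covers the most faults.
            best_row, row_faults = row_count.most_common(1)[0]
            best_col, col_faults = col_count.most_common(1)[0]
            if row_faults >= col_faults:
                rows.add(best_row)
            else:
                cols.add(best_col)

        # Drop every fault now covered by a chosen spare line.
        remaining = {(r, c) for r, c in remaining
                     if r not in rows and c not in cols}
    return rows, cols
```

For instance, with one spare row and one spare column, a row containing three faults is forced onto the spare row, after which a single leftover fault elsewhere is forced onto the spare column.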
Chapter 3 describes
radiation-induced
single-event effects and their mitigation techniques geared toward
reliability
enhancement. The topics examined include particles causing single-event
effects, basic mechanisms for nondestructive and destructive
single-event
effects in RAMs, factors that affect the
soft error
rate (SER), mitigation and hardening techniques, description of
experiments for
studying soft error rates and charge collection in memory devices, and
modeling
and simulation of charge collection. It is shown that radiation can
cause not
only soft errors but also hard errors, such as single-event gate
rupture (SEGR)
and single-event burnout (SEB), which eventually warrant hard repair
and reconfiguration of memory devices.
Chapter 4 introduces the
reader to online
testing and the techniques used in the implementation of
error-correcting codes
for RAMs. Such techniques are useful for
reliable and
fault-tolerant operation during field use. This chapter delves into the
theory
of error-correction coding (ECC) and describes fault-tolerant design
techniques
such as bit scattering, sparing, complement/recomplement,
consecutive correction and prestorage
protection. We
also describe ECC implementations (both on-chip and off-chip), and
reliability
evaluation and simulation of ECC-equipped memory.
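To make the idea of single-error correction concrete, here is a minimal sketch of a Hamming(7,4) code, a textbook code that corrects any single flipped bit in a 7-bit word. It is chosen only for brevity; the codes and circuit techniques discussed in Chapter 4 may well differ, and the function names are assumptions of this example.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword positions 1..7 hold: p1, p2, d1, p4, d2, d3, d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position is 0 if none was detected."""
    c = list(codeword)
    # Recompute each parity check; together the results form the syndrome.
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # 1-based index of the flipped bit
    if position:
        c[position - 1] ^= 1  # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], position
```

A nonzero syndrome directly names the position of the flipped bit, which is why a read path can correct the word without any search.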
Chapter 5 describes yield
modeling and
analysis techniques for fabrication processes. We describe simple
statistical models for yield estimation, such as cluster models; yield
loss mechanisms; the importance of negative binomial cluster models;
critical area simulation and yield computation; the effects on yield of
hardware redundancy, error-correcting codes, defect density, defect
characteristics, and device scaling; and the relationship between yield
and reliability. We also describe hardware
and
software techniques for yield management and improvement.
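As a small illustration of why clustering matters in yield estimation, the sketch below compares the standard Poisson yield formula, Y = exp(-A·D), with the standard negative binomial form, Y = (1 + A·D/α)^(-α), where A is the critical area, D the average defect density, and α the clustering parameter. The function names and the numbers used are assumptions chosen for illustration, not values from the book.

```python
import math

def poisson_yield(area, defect_density):
    """Yield when defects are spread uniformly at random (no clustering)."""
    return math.exp(-area * defect_density)

def negative_binomial_yield(area, defect_density, alpha):
    """Yield when defects cluster; a smaller alpha means stronger clustering."""
    return (1.0 + area * defect_density / alpha) ** (-alpha)

# Illustrative values only: 1 cm^2 of critical area, 0.5 defects per cm^2.
area, density = 1.0, 0.5
print(f"Poisson:                  {poisson_yield(area, density):.3f}")
for alpha in (0.5, 2.0, 10.0):
    print(f"Negative binomial a={alpha:<4}: "
          f"{negative_binomial_yield(area, density, alpha):.3f}")
```

For the same average defect density, the clustered model predicts a higher yield than the Poisson model, because clustered defects tend to land on the same dies; as α grows large the two models converge.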
Chapter 6 describes the
issues underlying a
structured custom design solution, comprising both circuit design and
physical
design, for built-in self-testable and self-repairable embedded RAMs. A
custom layout generator, BISRAMGEN, has been used to study the
characteristics of circuits that
would be
needed for fast memory access, high bandwidth, and low-overhead (in
terms of
both area and delay) BIST and BISR. Circuit techniques and BIST/BISR
solutions
are studied, their usefulness is analyzed, and the ensuing testability,
yield,
reliability, and cost benefits are investigated. This chapter also
includes a
new table-driven optimization approach for self-repairable RAM design,
and a
new algorithm for floorplanning
rectangular
components of a built-in self-repairable RAM array.
Usefulness of the book
Semiconductor memories,
particularly RAMs, have always occupied a
very important place in
electronic circuits, from memory cards in board-level circuits, and
embedded
memory modules used in application-specific integrated circuits (ASICs), to microelectronic devices used in
spacecraft.
Nowadays, large quantities of embedded RAM cores (including SRAM, DRAM,
and flash memories) are being used extensively in systems-on-a-chip
(SoCs). The importance of reliability and quality testing, fault
tolerance, diagnostic fault coverage, self-repair, and online error
correction of such memories is paramount, because embedded memories
have pins that are difficult to probe externally for test and repair.
These topics are described in Chapters
1, 2, 4,
and 6. Accurate analysis of processing yields and effective yield
management
techniques, described in Chapter 5, are very important in reducing the
manufacturing cost and in increasing the field reliability of memory
devices. A vast majority of field-related problems nowadays are caused
by ionizing radiation, for memory devices used in both spacecraft and
terrestrial
electronics. We describe in Chapter 3 the basic mechanisms for these
problems,
and the techniques used for mitigating them and hardening memory
devices.
An article published last
year (September 5,
2001) by Vincent Ratford of Virage
Logic Corp., in EE Design (2001 CMP Media Inc.), provides an
interesting
perspective on BIST and BISR. While BIST has been called the future of SoC technology that will save SoC
(also FPGA and ASIC) from the ruin of inferior yields, BISR is being
hailed as
a substantial cost saver in the near future. Ratford
gave a typical example as follows: suppose that a company builds an
xDSL modem chip in a 0.18 µm process incorporating 5 Mb of SRAM on an
8 × 8 mm die, and manufactures 1 million units in the first year. Let
us further assume an average
selling price
of $25.00 per unit and a per-unit wafer cost of $2200. The wafer defect
density
is projected at 0.4 for memory and 0.3 for logic (the greater defect
density
for memory can be attributed to a higher density of transistors in the
memory).
Without BIST/BISR, die yield would be approximately 64%, compared to
82% yield
with BISR. Also, use of BIST/BISR instead of external testing and
repair could
produce total cost savings of about $500,000. The yield increase due to
BISR
alone can create an additional $2.4 million in savings. Such a project,
estimated at $25 million, would therefore witness up to 12% cost
savings (about
$3 million) with BIST and BISR technologies.
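The figures quoted in this example can be cross-checked with a few lines of arithmetic. The sketch below only recombines the numbers given in the text; the variable names are ours, and nothing here attempts to reproduce the yield model behind the 64% and 82% figures.

```python
# Figures quoted in the example above.
project_cost = 25_000_000        # estimated project cost, dollars
test_repair_savings = 500_000    # BIST/BISR instead of external test and repair
yield_savings = 2_400_000        # savings from the BISR yield increase
yield_without_bisr = 0.64
yield_with_bisr = 0.82

# Total savings and their share of the project cost.
total_savings = test_repair_savings + yield_savings
savings_fraction = total_savings / project_cost

# Relative increase in good dies per wafer implied by the two yields.
relative_yield_gain = yield_with_bisr / yield_without_bisr - 1

print(f"Total savings: ${total_savings:,}")
print(f"Share of project cost: {savings_fraction:.1%}")
print(f"Relative gain in good dies: {relative_yield_gain:.1%}")
```

The two savings add up to $2.9 million, or 11.6% of the project cost, which matches the rounded "about $3 million" and "up to 12%" quoted above.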
With deep-submicron CMOS
processing
technologies, feature sizes are shrinking below 0.1 µm. In such
technologies, static and dynamic RAM devices are operating
at much
lower supply voltages (e.g., 1 V) and have much smaller capacitances
(e.g., a
few fF) than in the past. As a result,
these memory
devices are very vulnerable to radiation-induced problems affecting
data
storage (described in Chapter 3) and low manufacturing yields
(described in
Chapters 5 and 6) due to even minor process variations. Therefore, a
design
engineer would want to learn about state-of-the-art processing and
circuit
techniques for RAMs that would produce
fault
tolerance, both at the time of manufacture (i.e., high processing
yield) and
during field use (i.e., high reliability). A test engineer would be
interested
in learning about fault diagnosis algorithms that would aid in self-repair, and circuit techniques that would
produce
practical self-test and self-repair solutions. These topics are
described in
Chapters 2 and 6. The book focuses on design issues and circuit
techniques. The
style of presentation is simple and is devoid of intricate details of
device
physics or circuit design theory. Our objective is to provide guidance
to design
and test engineers, manufacturers, and researchers on practical ways of
implementing high-yielding and high-reliability RAM architectures,
without
overwhelming them with a lot of theoretical issues.
Each chapter is provided
with a comprehensive
set of problems designed to stimulate readers to delve into research
papers
that go beyond the scope of the book. A sample solution to one problem
is
provided in each chapter. These problems are intended to provide a
reinforcing
experience to the reader. Most problems are accompanied by hints in the
form of
pointers to published articles. Also, this book has a lot of
illustrations,
most of which have been borrowed from recent publications, some with
modifications, for improved clarity.
This book presents a
compendium of the
state-of-the-art literature on diverse aspects of fault tolerance and
reliability of random-access memories, spanning about 500 research
papers
published in the last few decades. Although considerable effort has
been
invested to make sure that the book is devoid of glaring errors, we do
not
claim infallibility. The reader is requested to report any error to
either or
both of us.
Kanad Chakraborty (kanadc@agere.com), Murray Hill, New Jersey
Pinaki Mazumder (mazum@eecs.umich.edu), Ann Arbor, Michigan