## **EECS 373**



## Design of Microprocessor-Based Systems

http://web.eecs.umich.edu/~prabal/teaching/eecs373



**Prabal Dutta** University of Michigan











1 cm<sup>3</sup> [IPSN'12]





[ISSCC'12]

Lecture 1: Introduction

January 8, 2015



What is an embedded system?



```
- Has MPU
                           - Peripherals
                           - Low prwer
- Is app-specific
                           - Small
- Has no OS
                           - Mobile / networked
  Real-time
                           - Cost constrained
  Integrated in device
  Fixed 1/0 / not expandable
 warny about memory mant
```

## Embedded, Everywhere





## Embedded, Everywhere - Fitbit





### Embedded, Everywhere - WattVision on Kickstarter





wattvision digital sensor front mount





## What is driving the embedded everywhere explosion?

## **Outline**



**Technology Trends** 

**Design Questions** 

Course Administrivia

Tools Overview/ISA Start

Moore's Law (a statement about economics): IC transistor count doubles every 18-24 mo





## Flash memory scaling: Rise of density & volumes; Fall (and rise) of prices









Figure-1 32Gb MLC NAND Flash contract price trend



Source: DRAMeXchange, Nov. 2009.

## Hendy's "Law": Pixels per dollar doubles annually





Credit: Barry Hendy/Wikipedia

## Dennard Scaling made transistors fast and low-power: So everything got better!



\*0.65V

#### Design of Ion-Implanted MOSFET's with **Very Small Physical Dimensions**

ROBERT H. DENNARD, MEMBER, IEEE, FRITZ H. GAENSSLEN, HWA-NIEN YU, MEMBER, IEEE, V. LEO RIDEOUT, MEMBER, IEEE, ERNEST BASSOUS, AND ANDRE R. LEBLANC, MEMBER, IEEE

#### Classic Paper

This paper considers the design, fabrication, and characteriza-tion of very small MOSFET switching devices suitable for digital integrated circuits using dimensions of the order of  $1 \mu$ . Scaling relationships are presented which show how a conventional MOS-FET can be reduced in size. An improved small device structure is presented that uses ion implantation to provide shallow source and drain regions and a nonuniform substrate doping profile. Onedimensional models are used to predict the substrate doping profile and the corresponding threshold voltage versus source voltage characteristic. A two-dimensional current transport model is used to predict the relative degree of short-channel effects for different device parameter combinations. Polysilicon-gate MOSFET's with channel lengths as short as 0.5  $\mu$  were fabricated, and the device characteristics measured and compared with predicted values. The performance improvement expected from using these very small devices in highly miniaturized integrated circuits is projected

#### I LIST OF SYMBOLS

| α                              | Inverse semilogarithmic slope of sub-  |
|--------------------------------|----------------------------------------|
| C.                             | threshold characteristic.              |
|                                |                                        |
| D                              | Width of idealized step function pro-  |
|                                | file for channel implant.              |
| $\Delta W_F$                   | Work function difference between       |
| ,                              | gate and substrate.                    |
| $\epsilon_{Si}, \epsilon_{ox}$ | Dielectric constants for silicon and   |
|                                | silicon dioxide.                       |
| $I_d$                          | Drain current.                         |
| k                              | Boltzmann's constant.                  |
| 6                              | Unitless scaling constant.             |
| L                              | MOSFET channel length.                 |
| $\mu_{\text{wiff}}$            | Effective surface mobility.            |
| $n_{ii}$                       | Intrinsic carrier concentration.       |
| $N_a$                          | Substrate acceptor concentration.      |
| $\Psi_s$                       | Band bending in silicon at the onset   |
|                                | of strong inversion for zero substrate |
|                                | voltage.                               |
| sTr                            |                                        |
| $Ψ_b$                          | Built-in junction potential.           |

This paper is reprinted from IEEE JOURNAL OF SOLID-STATE CIRCUITS, vol. SC-9, no. 5, pp. 256-268, October 1974.

Publisher Item Identifier S 0018-9219(99)02196-9.





(b)

Fig. 1. Illustration of device scaling principles with  $\kappa = 5$ . (a) Conventional commercially available device structure. (b) Scaled-down device structure.

portion of the region in the silicon substrate under the gate electrode. For switching applications, the most undesirable "short-channel" effect is a reduction in the gate threshold voltage at which the device turns on, which is aggravated

(a)

0018-0210/00\$10.00 @ 1000 TEFF





Fig. 2. Experimental drain voltage characteristics for (a) conventional, and (b) scaled-down structures shown in Fig. 1 normalized to W/L = 1.

## **Dennard Scaling...is Dead**





#### Industry's ride is over

Source: Joe Cross, DARPA MTO

#### The past: Dennard's Scaling



$$P_{\text{density}} = N_g C_{load} V^2 f$$

= power per unit area

 $N_{\rho}$  = CMOS gates/unit area

 $\overline{C_{load}}$  = capacitive load/CMOS gate

V = supply voltage

f = clock frequency

k = scaling factor

k = typically 1.4 per geometry shrink

1/k = device feature scaling factor (typically 0.7 per geometry shrink)

#### For each generation/geometry shrink:

$$P_{\text{density (scaling)}} = (k^2)(1/k)(1/k^2)(k) = 1$$

Double the transistors (functionality) and increase the clock speed 40% per generation with the same power

#### Today: Dennard's Scaling is dead



$$P_{\text{density (scaling)}} = (k^2)(1/k)(1/k^2)(k) = k^2 \approx 2$$

But, power density cannot increase!

This physics is limiting COTS power efficiency to well below what we need for embedded sensor processing applications

Approved for Public Release, Distribution Unlimited

## And the Party's Over...



## Decades of exponential performance growth stalled in 2004



Source: NRC, The Future of Computing Performance, Game Over or Next Level?

## Not so fast! Bell's Law of Computer Classes: A new computing class roughly every decade







Number Crunching Data Storage

productivity interactive

streaming information to/from physical world

#### year

"Roughly every decade a new, lower priced computer class forms based on a new programming platform, network, and interface resulting in new usage and the establishment of a new industry."



## **MEMS Accelerometers:** Rapidly falling price and power





 $10~\mu\text{A}$  @ 10~Hz @ 6~bits [ST Microelectronics, annc. 2009]  $_{16}$ 

#### **MEMS Accelerometer in 2012**







1.8 μA @ 100 Hz @ 2V supply!

ADXL362 [Analog Devices, 2012]

## **MEMS Gyroscope Chip**





## Energy harvesting and storage: Small doesn't mean powerless...





RF [Intel]





Clare Solar Cell



Thin-film batteries

Piezoelectric



**Electrostatic Energy** Harvester [ICL]





**CEDRAT Technologies** 



Thermoelectric Ambient **Energy Harvester [PNNL]** 





## Bell's Law, Take 2: Corollary to the Laws of Scale







Intel® 4004 processor
Introduced 1971
Initial clock speed

108 KHz
Number of transistors

2,300
Manufacturing technology

10µ



Quad-Core Intel® Xeon® processor
Quad-Core Intel® Core™2 Extreme processor
Introduced 2006
Intel® Core™2 Quad processors
Introduced 2007
Initial clock speed

2.66 GHz
Number of transistors

582,000,000
Manufacturing technology



Introduced 2008
Initial clock speed
106 kHz @ 0.5V Vdd
Number of transistors
92,499
Manufacturing technology
0.18 µ

**UMich Phoenix Processor** 

Photo credits: Intel, U. Michigan

## **Outline**



**Technology Trends** 

**Design Questions** 

Course Administrivia

Tools Overview/ISA Start



Why study 32-bit MCUs and FPGAs?

### MCU-32 and PLDs are tied in embedded market share







# What distinguishes a Microprocessor from an FPGA?



## **MPU**



The Cortex M3's Thumbnail architecture looks like a conventional Arm processor. The differences are found in the Harvard architecture and the instruction decode that handles only Thumb and Thumb 2 instructions.



General structure of an FPGA



## Modern FPGAs: best of both worlds!



## Is the party really over?





Technology landscape: move past power limitations, effectively utilize concurrency



Concurrency only path left - National Academy of Science report



Why study the ARM architecture (and the Cortex-M3 in particular)?

## Lots of manufacturers ship ARM products





























## ARM is the big player



- ARM has a huge market share
  - As of 2011 ARM has chips in about 90% of the world's mobile handsets
  - As of 2010 ARM has chips in 95% of the smartphone market, 10% of the notebook market
    - Expected to hit 40% of the notebook market in 2015.
  - Heavy use in general embedded systems.
    - Cheap to use
      - ARM appears to get an average of 8¢ per device (averaged over cheap and expensive chips).
    - Flexible
      - Spin your own designs.



What differentiates these products from one another?

#### The difference is...







## **Outline**



**Technology Trends** 

**Design Questions** 

**Course Overview** 

Tools Overview/ISA Start

## W' 15 Instructional Staff (see homepage for contact info, office hours)





Matt Smith Lab Instructor



Prabal Dutta Instructor



Pat Pannuto



John Connolly



Chris Fulara



Alex Laberge



Ryan Wooster

## My research interests





## Course goals



- Learn to implement embedded systems including hardware/software interfacing.
- Learn to design embedded systems and how to think about embedded software and hardware.
- *Design and build* non-trivial projects involving both hardware and software.

## **Prerequisites**



- EECS 270: Introduction to Logic Design
  - Combinational and sequential logic design
  - Logic minimization, propagation delays, timing
- EECS 280: Programming and Intro Data Structures
  - C programming
  - Algorithms (e.g. sort) and data structures (e.g. lists)
- EECS 370: Introduction to Computer Organization
  - Basic computer architecture
  - CPU control/datapath, memory, I/O
  - Compiler, assembler

## **Topics**



## Memory-mapped I/O

- The idea of using memory addressed to talk to input and output devices.
  - Switches, LEDs, hard drives, keyboards, motors

## Interrupts

- How to get the processor to become "event driven" and react to things as they happen.

## Working with analog signals

- The real world isn't digital!
- Common peripheral devices and interfaces
  - Serial buses, timers, etc.

## Example: Memory-mapped I/O





#### This is important.

- It means our software can tell the hardware what to do.
  - In lab 3 you'll design hardware on an FPGA which will control a motor.
    - But more importantly, that hardware will be designed so the software can tell the hardware exactly what to do with the motor. All by simply writing to certain memory locations!
- In the same way, the software can read memory locations to access data from sensors etc...

## Example: Anatomy of a timer system



#### **Grades**



```
Item Weight
======
Labs (7) 25%
Project 25%
Exams 35% (15% midterm; 20% final)
HW/Guest talks 10%
Oral presentation 4%
Course Evaluation 1%
```

- Project & Exams tend to be the differentiators
- Class median is generally a B+

#### Time



- Assume you are going to spend a lot of time in this class.
  - 2-3 hours/week in lecture (we cancel a few classes during project time)
  - 8-12 hours/week working in lab
    - Expect more during project time; some labs are a bit shorter.
  - ~20 hours (total) working on homework
  - ~20 hours (total) studying for exams.
  - ~8 hour (total) on your oral presentation
- Averages out to about 15-20 hours/week preproject and about 20 during the project...
  - This is more than we'd like, but we've chosen to go with state-of-the-art tools, and those generally have a steep learning curve.

#### Labs



- Start next week.
- 7 labs, 8 weeks, groups of 2
  - 1. FPGA + Hardware Tools
  - 2. MCU + Software Tools
  - 3. Memory + Memory-Mapped I/O
  - 4. Interrupts
  - 5. Timers and Counters
  - 6. Serial Bus Interfacing
  - 7. Data Converters (e.g. ADCs/DACs)
- Labs are very time consuming.
  - As noted, students estimated 8-12 hours per lab with one lab (which varied by group) taking longer.

## **Open-Ended Project**



- Goal: <u>learn how to build embedded systems</u>
  - By <u>building</u> an embedded system
  - Work in teams of 2 to 4
  - You design your own project
- The major focus of the last third of the class.
  - Labs will be done and we will cancel some lectures and generally try to keep you focused
- Important to start early.
  - After all the effort in the labs, it's tempting to slack for a bit. The best projects are those that get going right away (or even earlier)
- Some project lead to undergraduate research

## Sample projects from F' 10 and their results



Energy-harvesting sensors →
 Sensys demo, IPSN paper, TI project, Master's thesis











Wireless AC Power Meter →
 SURE, IPSN demo, NSF GRFP, SenSys paper, Grad School













## Letters of recommendation for graduate school



- Grad school apps will require supporting letters
- Faculty write letters and read "coded" letters
- Strong letters give evidence of research ability
- Strong letters can really help your case
- Weak letters are vague and give class standing
- Weak letters are useless (or even worse)
- Want a strong letter?
  - Do well in this class
  - Pull off an impressive project
  - Continue class project as independent research in W'15

#### Homework



- Start TODAY!
- 4-5 assignments
  - A few "mini" assignments
    - Mainly to get you up to speed on lab topics
  - A few "standard" assignments
    - Hit material we can't do in lab.
- Also a small part is for showing up to guest lectures

#### Midterm and Final Exams



- Midterm (Tue, Feb 25, 2015 from 1:30pm-3:00pm)
  - Emphasize problem solving fundamentals
- Final (Tue, Apr 28, 2015 from 4:00-6:00pm)
  - Cumulative topics w/ experience of projects
  - Some small amount of material from presentations

# Looking for me?



- Nominal Office Hours
  - Time TBD in 4773 BBB
  - Sometimes in lab sections

### **Outline**



Technology Trends

**Design Questions** 

Course Overview

Tools Overview/ISA Start

## We are using Actel's SmartFusion Evaluation Kit







#### A2F200M3F-FGG484ES



- 200,000 System FPGA gates, 256 KB flash memory, 64 KB SRAM, and additional distributed SRAM in the FPGA fabric and external memory controller
- Peripherals include Ethernet, DMAs, I<sup>2</sup>Cs, UARTs, timers, ADCs, DACs and additional analog resources
- USB connection for programming and debug from Actel's design tools
- USB to UART connection to UART\_0 for HyperTerminal examples
- 10/100 Ethernet interface with on-chip MAC and external PHY
- Mixed-signal header for daughter card support



### **FPGA** work





## "Smart Design" configurator



## Eclipse-based "Actel SoftConsole IDE"





# Debugger is GDB-based. Includes command line. Works really quite well.





#### ARM ISA: Elements of an Instruction Set Architecture



(registers, memory, word size, endianess, conditions, instructions, addressing modes)



# Assembly example: Work through on your own. We'll discuss next time



```
data:
    .byte 0x12, 20, 0x20, -1
func:
        mov r0, #0
        mov r4, #0
        movw r1, #:lower16:data
        movt r1, #:upper16:data
        ldrb r2, [r1],#1
top:
        add r4, r4, r2
        add r0, r0, #1
        cmp r0, #4
        bne top
```



Questions?

Comments?

Discussion?