The University of Michigan - Department of EECS EECS578 Correct Operation for Processor and Embedded Systems Midterm exam – December 3, 2015



# Name: \_\_\_\_SOLUTIONS\_\_\_\_\_\_

This exam is **CLOSED BOOKS**, **CLOSED NOTES**. You can only have pencil/pen and eraser. No electronic device is allowed. The exam has 10 questions. For questions where a box or a line for your answer is provided, please put your final answer there.

IMPORTANT: always show your work in deriving your answers in questions 8-10. **Correct answers without showing your work will receive no credit.** 

| Question                      | Points |
|-------------------------------|--------|
| 1 – SQ on Design Validation   | /11    |
| 2 – SQ on Reliability         | /6     |
| 3 – SQ on Security            | /6     |
| 4 – SQ on Networks on Chip    | /11    |
| 5 – SQ on Memory Consistency  | /8     |
| 6 – SQ on Developing Research | /5     |
| 7 – Research Papers           | /22    |
| 8 – Circuit to CNF            | /5     |
| 9 – Memory Accesses           | 17     |
| 10 – Logic Simulation         | /10    |
| TOTAL                         | /91    |

The rules of the Honor Code of the University of Michigan - College of Engineering apply for this exam:

# HONOR PLEDGE:

"I have neither given not received aid on this exam, nor have I concealed any violations of the Honor Code."

Signature:

(Exams without a signed pledge will not be graded)

Questions 1-6 below include some True/False, short answer – where we specify the maximum number of words you can use --, sentence completion and some "choose the best word for the sentence".

#### 1. Short questions on Design Validation – 11 points

A. [1pt] In post-silicon validation, designers may use scan-chains to generate targeted tests.

B. [1pt] Shmoo plots are used to validate that a chip operates correctly when operated within the ranges specified for a number of physical parameters (e.g.: temperature, operating voltage, etc.).

C. [1pt] An escaped bug is a bug that cannot be localized and diagnosed.

D. [1pt] A constrained random stimulus generator is a software tool that produces

E. [2pt] RTL software simulators can typically simulate a complex design (e.g. IBM or Intel modern processor) at 10 [ Mhz / Khz / Hz ]. Design emulators are also used in the pre-silicon validation effort, however, they entail limited [ observability / controllability ] of the design's internal nodes.

F. [1pt] The key benefit of accelerators and emulators is speed of simulation.

G. [1pt] 3SAT problems are NP-complete.

randomzed input sequences for a design.

H. [1pt] (enter a number) The clause (a + b +a') contains 3 literals.

I. [1pt] "Constraint learning" is a concept used in bounded model checking to describe the process of developing an input constraint for the design iteratively.

J. [1pt] DPLL is a SAT solver algorithm which explores the search space [ breadth / depth ] first.

TRUE / FALSE

# 2. Short questions on Reliability - 6 points

- A. [1pt] (enter words) TMR stands for \_Triple Modular Redundancy\_\_\_\_\_
- B. [1pt] FITs are measured in hours. (1/hours)

TRUE / FALSE

- C. [2 pts] (circle the correct option) 'Checkpointing and rollback' is equivalent to a
   [ DMR / TMR ] solution in [ space / time ].
- D. [2 pts] Ariadne is a fault-tolerant routing algorithm based on up\*/down\* routing. Among the statements below circle the correct ones:
  - Faulty links are avoided in Ariadne because no router broadcasts its location through a faulty link during the spanning tree construction.
  - Ariadne operates by computing a new spanning tree at regular time intervals
  - Once computed, the whole spanning tree is stored in each routing table, so that each router can determine how to route packets to any destination.
  - To route from S to D, a packet must always go up from S to the root and then down from the root to D

- 3. Short questions on Security 6 points
- A. [1pt] RSA is a very popular symmetric encryption protocol.

TRUE / FALSE

- B. [1pt] TPM stands for <u>Trusted Platform Module</u>
- C. [1pt] List at least two good sources for generating truly random numbers:

(<10 words) \_thermal noise, mouse movement, lava lamp, etc.\_\_\_\_

- D. [3 pts] Which of the following is a side-channel attack? Circle all that applies:
  - Timing attack
  - Stack overflow
  - Fault-based attack
  - Obfuscation
  - TPM
  - Brute-forcing the secret key

# 4. Short questions on Networks on Chip – 11 points

- A. [3pts] A typical network-on-chip router includes the following components:
   \_input/packet/flit\_ buffers, \_virtual channel\_ allocator, \_switch\_ allocator, crossbar and routing unit, which can be implemented either as a \_routing table\_ or as a \_logic circuit\_.
- B. [1pts] Routing functions can be deterministic or <u>adaptive</u>. These latter ones can in turn be minimal or non-minimal.
- C. [3pts] Several things can go wrong when routing packets in a network-on-chip: for instance, deadlock prevents packets from making forward progress towards their destination. Name three other possible routing issues that may arise in a NoC (one word each):
  - a. \_livelock\_\_\_\_\_congestion\_\_\_\_\_
  - b. \_misroute\_\_\_\_\_packet drop\_\_\_\_\_
  - c. \_starvation\_\_\_\_(not corruption, that relates to data, not routing)\_
- D. [4pts] Consider the NoC below. We would like to know what routing direction(s) (North, East, South, West, Local) is/are stored on the routing table of node <1,1> for packets destined to <2,2>, when the routing function is:



- a. deterministic XY routing \_East\_\_\_\_\_
- b. deterministic YX routing <u>North</u>
- c. minimal adaptive XY routing \_East, North\_\_\_\_
- d. minimal up\*/down\* routing with a root in <2,0> <u>East</u> (South leads to non-minimal)

- 5. Short questions on Memory Consistency 8 points
- A. [2pts] TSO [ relaxes / enforces stricter ] [ store→store / store→load / load→store/ load→load ] orderings when compared to SC.
- B. [1pts] The <u>MEMBAR / fence / barrier</u> instruction is used to enforce ordering on a multicore that implements RMO.
- C. [2pts] (circle all that applies) A memory constraint graph may include the following types of edges
  - Timing edges
  - Consistency edges
  - Dependence edges
  - Atomic edges
  - Program edges
  - Cumulative edges
- D. [3pts] Complete the sentences below as per the discussion in DACOTA [Mammo *et al.*, TCAD 2015] and PipeCheck [Lustig *et al.*, MICRO'14]:
  - Given an application, µhb graphs always have [more / fewer ] vertices than memory access graphs.
  - [ Both graphs / Only memory access graphs / Only μhb graphs ] capture the details of the microarchitecture.
  - [Both graphs / Only memory access graphs / Only μhb graphs ] capture the concept of performing location (i.e., pipeline stage).

#### 6. Short questions on Developing Research – 5 points

A. [3 pts] List three things you can do during an oral presentation to boost and promote the engagement of your audience:

| 1. (<10 words) _ | look at the audience                       |
|------------------|--------------------------------------------|
| 2. (<10 words) _ | Ask a simple question (often with a slide) |
| 3. (<10 words) _ | Offer a relevant (often humorous) anecdote |

B. [2 pts] Working in teams requires many skills beyond engineering knowledge and creativity. Describe one such skill that you learned this semester while working with your project team. Feel free to provide either a general skill you developed or describe a specific situation and how you addressed/solved it.

(<31 words) <u>Answers vary</u>

\_\_\_\_Often students reported having learned to have better intra-team communication, with clear expectations. And having learned to work a detailed project schedule\_\_\_\_\_

# 7. Research papers – 22 points

This semester we studied 25 research papers in class:

Papers presented by students:

- [Ais11a] "A systematic methodology to develop resilient cache coherence protocols"
- [Cak15] "Hardware Trojan detection for gate-level ICs using signal correlation based clustering"
- [Par06] "Exploring fault-tolerant network-on-chip architectures"
- [Fou11] "Accelerating microprocessor silicon validation by exposing ISA diversity"
- [Mei07] "Argus: low-cost, comprehensive error detection in simple cores"
- [Pow09] "Architectural core salvaging in a multi-core processor for hard-error tolerance"
- [Mar99] "GRASP: a search algorithm for propositional satisfiability"
- [Fos15] "Trends in functional verification: a 2014 industry study"
- [Lus15] "ArMOR: defending against consistency model mismatches in heterogeneous architectures"
- [Zha14] "PVCoherence: designing flat coherence protocols for scalable verification"
- [Adi11] "Threadmill: a post-silicon exerciser for multi-threaded processors"
- [Hon10] "QED: quick error detection tests for effective post-silicon validation"
- [Con13] "Automatic concolic test generation with virtual prototypes for post-silicon validation"
- [Deh15] "Transaction-based online debug for NoC-based multiprocessor SoCs"
- [Che08] "Runtime validation of memory ordering using constraint graph checking"
- [Kan09] "Decoupling dynamic information flow tracking with a dedicated coprocessor"
- [Ran15] "Raccoon: closing digital side-channels through obfuscated execution"

Papers introduced by the instructors:

- [Abd11] "Functional correctness for CMP interconnects"
- [Par11] "Formally enhanced runtime verification to ensure NoC functional correctness"
- [Lee14] "Brisk and limited-impact NoC routing reconfiguration"
- [Ais11b] "ARIADNE: agnostic reconfiguration in a disconnected network environment"
- [Aus99] "DIVA: a reliable substrate for deep submicron microarchitecture design"
- [Shy06] "Ultra low-cost defect protection for microprocessor pipelines"
- [Mam15] "Post-silicon validation of multiprocessor memory consistency"
- [Lus14] "PipeCheck: specifying and verifying microarchitectural enforcement of memory consistency models"

- A. [4pts] Which of the papers above... (provide acronyms from the list above):
  - 1. Proposes a fast algorithm for formal verification engines (1 paper): <u>Mar99</u>
  - 2. Generates self-checking post-silicon tests (1 paper): \_\_\_\_\_\_Fou11\_\_\_\_ (in Adi11 the system is self-checking, not the tests)
  - 3. Protects processor cores against transient faults (3 papers):

\_Mei07, Aus99, Shy06\_

(protects processor cores, not any uncore – NoC or memory)

- B. [0pts] Think of the paper that you presented in class. Which is it? \_varies\_
- C. [4pts] What is the key idea proposed in the paper you presented? Describe it in 50 words or less. Below is an example for [Lee14]: "Upon each new fault, [Lee14] generates localized routing detours by modifying the underlying segment-based routing function and leveraging pre-computed metadata stored at each node."

Key idea of the paper you presented: **WRITE NEATLY <51 words** 

Answers vary – points are taken off for poor English grammar and/or structure sentence

D. [6 pts] Connect each paper one the left with one idea on the right by drawing a line (this list only includes papers presented by students):

[Ais11a] equivalent instructions in ISAs [Cak15] non-chronological backtracking [Par06] parametric verification of cache coherence protocols [Fou11] design team data for industrial functional verification [Mei07] additional handshaking messages and transient states [Pow09] exploiting cross-core redundancy in multi-core processors [Mar99] redundant executions to reduce error detection latency [Fos15] checkers for control flow, data flow, computation and memory [Lus15] clustering signals based on statistical correlation [Zha14] on-platform test generation for post-silicon [Adi11]flit-based hop-by-hop retransmission [Hon10] precise memory ordering specification tables [Con13] a DIFT coprocessor [Deh15]executing both paths of a branch [Che08] constraint graph checking [Kan09] transaction-based programmable debug units [Ran15]test generation using virtual prototypes

0.5 points removed per incorrect arrow

Choose one paper among [Pow09], [Adi11], [Ran15] or [Che08] for each of the questions below. The paper you choose cannot be the one you presented in class. Be careful in choosing your words, you have a limit budget of words.

E. [4pts] What is the problem that the paper you chose addresses? Here is an example for [Lee14]: "[Lee14] strives to reduce the performance impact of NoC routing reconfiguration, both in the reconfiguration latency and in the number of routers affected."

Paper you chose: \_\_\_\_\_

# Answer: WRITE NEATLY - <31 words

[Pow09] tries to recover from permanent faults (i.e., hard errors) occurring at runtime, and provide high fault coverage compared to microarchitectural core-salvaging solutions.

[Adi11] addresses unique problems in post-silicon validation, such as lack of observability, while improving the utilization of post-silicon validation platforms.

[Ran15] strives to eliminate all digital side-channel attacks that exploit variations in program execution, e.g., two paths of a branch.

[Che08] detects any violation of memory ordering at runtime. Also, it strives to minimize the size (e.g., the number of vertices) of constraint graph.

Students chose: 2 – Che08, 13 – Pow09, 16 – Ran15

F. [4pts] How do the authors evaluate their solution and what is your assessment of the results?

For [Lee14]: "[Lee14] evaluates the cost in network performance caused by their nonoptimal routing, and the speed of recovery from a fault. Overall, their solution is successful in showing that they can recover from faults in much less time than prior solutions."

Paper you chose: (could be different than the one in the prior answer) \_\_\_\_\_\_\_\_\_\_ Answer: WRITE NEATLY - <31 words

[Pow09] evaluates the throughput of the proposed solution, compared to the defectfree configuration. The solution successfully provides high throughput close to the defect-free one. [Adi11] provides their experience during the verification of POWER7 processor. While the paper does not provide significant experimental data, their solution is successful by greatly speeding-up the post-silicon validation.

[Ran15] assesses their solution by measuring the overhead across a set of benchmarks. The solution entails a 16.1X overhead on average, which is a prohibitively significant slow-down.

[Che08] measures (1) the size of the constraint graph, and (2) overheads (e.g., traffic, hardware). The solution successfully keeps the size of graph small. Also the traffic only increases less than 10% overall.

Students chose: 1 - Adi11, 3 - Che08, 14 - Ran15, 13 - Pow09

# 8. Circuit to CNF - 5 points

Express the function the gate computes in clausal normal form. Use the variables as indicated in the diagram, where a, b, c and d are inputs and z is an output. Your solution should only use variables a, b, c, d, z, no additional variables. **Show your work to receive credit – correct answers not showing the work will not be credited**.

[Be careful in your Boolean logic manipulation! This problem should be quick if use the properties of Boolean algebra!]



$$(z = abcd)$$
  

$$\equiv (z \to abcd)(abcd \to z)$$
  

$$\equiv (\bar{z} + abcd)(\overline{abcd} + z)$$
  

$$\equiv (\bar{z} + a)(\bar{z} + b)(\bar{z} + c)(\bar{z} + d)(\bar{a} + \bar{b} + \bar{c} + \bar{d} + z)$$

Final CNF expression for the special gate:

 $(\bar{z}+a)(\bar{z}+b)(\bar{z}+c)(\bar{z}+d)(\bar{a}+\bar{b}+\bar{c}+\bar{d}+z)$ 

#### 9. Memory accesses - 7 points

Consider the multithreaded program snippet below:



where X and Y are memory locations and "ST 1, X" means "write 1 to memory location X", while "LD r1, X" means "load the contents of memory location X to register r1".

A. [2pts] Assuming that this program executes on a *sequentially consistent* multicore, draw edges on the diagram above with a *continuous line* to represent orderings that must be enforced by this consistency model.

Now assume that initially, the content of X and Y is 0 and that the outcome of the execution is  $r_1 = 1$ ,  $r_2 = 1$ ,  $r_3 = 0$ 

r1 = 1, r2 = 1, r3 = 0

- B. [3pts] Draw additional edges on the diagram above with a *dashed line* to represent the execution flow that led to the outcome specified.
- C. [0pts] Does the outcome reveal an incorrect execution?

YES / NO

D. [2pts] Explain why in <10 words

The graph contains a cycle

#### 10. Logic simulation- 10 points

Consider the circuit below:



Levelize the netlist.

A. [1pt] How many levels are in this levelized netlist? \_4\_\_\_\_

B. [3pts] In the table on the right list all the gates in each level, by reporting their letter name:

| Level | Gates   |
|-------|---------|
| 1     | A, E, H |
| 2     | F, D, C |
| 3     | В       |
| 4     | J, G    |
| 5     | -       |
| 6     | -       |

Oblivious vs. event-driven simulation

At time 0 the input values are all set to 1 and the circuit is stable. The clock cycle time of this circuit is 100ns. Assume that each gate has a propagation delay of 4ns.

C. [2pts] What are the stable values at the output of each internal gate? Fill the table on the right:

| Gate | Output value @0ns |
|------|-------------------|
| А    | 0                 |
| В    | 0                 |
| С    | 1                 |
| D    | 1                 |
| E    | 0                 |
| F    | 1                 |
| G    | 0                 |
| Н    | 1                 |
| J    | 1                 |

At time 50ns, inputs  $x_2$  and  $x_3$  change at the same time to value 0 and then nothing changes again until time 150ns.

D. [2pts] If you have an oblivious simulator, how many gates must be simulated to compute the new stable values (this computation will occur at the clock tick)?

\_\_\_\_9\_\_\_\_

E. [2pts] If you have an event-driven simulator, how many gates must be simulated to compute the new stable values?

\_\_\_\_5\_\_\_\_

(J is not simulated because B's output does not change)