International Symposium on Microarchitecture (Micro), Dec 2009.
The ability to replay a program's execution on a multi-processor system
can significantly help parallel programming. To replay a
shared-memory multi-threaded program, existing solutions record the
program input (I/O, DMA, etc.) and the shared-memory
dependencies between threads. Prior processor-based record-and-replay
solutions are efficient, but they require non-trivial modifications to
the coherency protocol and the memory sub-system for recording the
shared-memory dependencies.
In this paper, we propose a processor-based record-and-replay solution
that does not require detecting and logging shared-memory dependencies
to enable multi-processor replay. It is based on our insight that a
load-based checkpointing scheme that records the program input has
sufficient information for deterministically replaying each thread.
We propose an offline symbolic analysis algorithm based on an SMT
solver that determines the shared-memory dependencies using just the
program input logs during replay. In addition to saving log space,
the proposed approach significantly reduces the hardware support
required for enabling replay.
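
As a rough illustration of the idea that logged load values alone can
determine the shared-memory dependencies, the sketch below searches
offline for a global order of memory operations in which every load
returns the value recorded for it. It is not the paper's algorithm: a
plain backtracking search stands in for the SMT solver, sequential
consistency is assumed, and the log format and the names
find_consistent_order and reads_from are hypothetical.

```python
# Illustrative sketch only. Each thread is a list of operations in
# program order, in a hypothetical log format:
#   ("store", addr, value)  value recomputed by replaying the thread alone
#   ("load",  addr, value)  value recorded in the load log
# Assumption: sequentially consistent memory, initial contents `initial`.

def find_consistent_order(threads, initial=0):
    """Return one global order of (thread, index) pairs in which every
    load observes its logged value, or None if no such order exists."""
    n = len(threads)

    def search(pcs, memory, order):
        # All threads finished: the order built so far is consistent.
        if all(pcs[t] == len(threads[t]) for t in range(n)):
            return list(order)
        for t in range(n):
            i = pcs[t]
            if i == len(threads[t]):
                continue
            kind, addr, value = threads[t][i]
            if kind == "load":
                # A load may be scheduled only if memory holds its logged value.
                if memory.get(addr, initial) != value:
                    continue
                next_memory = memory
            else:
                next_memory = {**memory, addr: value}
            order.append((t, i))
            found = search(pcs[:t] + (i + 1,) + pcs[t + 1:], next_memory, order)
            if found is not None:
                return found
            order.pop()  # backtrack and try another thread
        return None

    return search(tuple(0 for _ in threads), {}, [])


def reads_from(threads, order):
    """For each load in `order`, report the store it read from
    (None means it read the initial memory contents)."""
    last_store = {}
    deps = []
    for t, i in order:
        kind, addr, _ = threads[t][i]
        if kind == "store":
            last_store[addr] = (t, i)
        else:
            deps.append(((t, i), last_store.get(addr)))
    return deps


# Two threads: thread 0 writes x then reads y, thread 1 writes y then reads x.
example = [
    [("store", "x", 1), ("load", "y", 0)],
    [("store", "y", 1), ("load", "x", 1)],
]
example_order = find_consistent_order(example)
example_deps = reads_from(example, example_order)
```

In this example the load of x in thread 1 logged the value 1, so any
consistent order must place it after thread 0's store to x; the
dependency is recovered from the load log without having been
recorded. The exhaustive search is exponential in the worst case,
which is one reason a solver-based formulation is attractive.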