debugging by playback

Kragen kragen-discuss@kragen.dnaco.net
Mon, 14 Dec 1998 17:30:31 -0500 (EST)


Often when I'm debugging, I wish I could ask questions like these:

- Where did this memory get freed?
- When did this value in memory change?
- *Did* this value change?
- How did this value get computed?

Trouble is, it's impossible to answer these questions by looking at the
current state of an application.  You have to look into the past -- or,
more likely, you have to change something (set a breakpoint or a
watchpoint, say) so that the next time the event happens, it will be
noticed, and then start over.

Suppose computers had write-once-read-many storage for their RAM.
(This isn't all that far-fetched; two-photon dye storage is one
candidate.  Remember that write-once-read-many storage was the state of
the art from 10,000 BC until about 1930.)  Then you could just
reconstruct the state of the program at any point in the past, since
each new state would have to be written to a fresh part of RAM, leaving
the old ones intact.
You could simply replay some of history, carefully watching for the
event you were interested in.

Then answering any of the above questions would be easy.
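
To make that concrete, here is a minimal Python sketch that models
write-once memory as an append-only log of writes.  The class and
method names (WriteOnceMemory, state_at, changes_to) are made up for
illustration; real write-once hardware would look nothing like this.

```python
class WriteOnceMemory:
    """Hypothetical model: memory as an append-only log of writes."""

    def __init__(self):
        # Each entry is (tick, address, value); entries are never modified.
        self.log = []

    def write(self, address, value):
        # The tick is simply the index of this write in the log.
        self.log.append((len(self.log), address, value))

    def state_at(self, tick):
        """Rebuild memory contents as they were just after write `tick`."""
        state = {}
        for t, address, value in self.log:
            if t > tick:
                break
            state[address] = value
        return state

    def changes_to(self, address):
        """Answer 'when did this value change?' for one address."""
        return [(t, value) for t, a, value in self.log if a == address]


mem = WriteOnceMemory()
mem.write("x", 1)
mem.write("y", 2)
mem.write("x", 3)

past = mem.state_at(1)          # memory before "x" was overwritten
history = mem.changes_to("x")   # every write to "x", with its tick
```

Because nothing is ever overwritten, "did this value change?" is just
a question about how many entries changes_to returns.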

But we don't have to wait for write-once-read-many storage to take over.

A program goes through a sequence of states; each state is uniquely
determined by the inputs to the program at that time and the previous
state.  So given the full state of a program at some point and its
input stream, you can replay all of its states from that point into the
future.
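
As a toy illustration of that claim, here is a sketch in Python.  The
step function is a stand-in I invented (it keeps a running total and a
count); the only property that matters is that it depends on nothing
but the previous state and the current input.

```python
def step(state, event):
    """Hypothetical transition function: next state from the previous
    state and one input.  It reads nothing else, so it is deterministic."""
    total, count = state
    return (total + event, count + 1)


def replay(state, inputs):
    """Yield every state reached from `state` while consuming `inputs`."""
    for event in inputs:
        state = step(state, event)
        yield state


saved_state = (0, 0)      # full state captured at some point
input_log = [5, -2, 7]    # everything the program read after that point

first_run = list(replay(saved_state, input_log))
second_run = list(replay(saved_state, input_log))
```

Both runs pass through the same states in the same order, which is the
whole basis for replaying a real program.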

So suppose you log every system call a program makes, along with its
results, and every signal it receives, as well as how the program is
started.  Then you can always start the program over from the
beginning, feed it the same inputs it had before, and pass through
exactly the same sequence of states as you did last time.
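
Here is a rough sketch of the idea in Python, with ordinary function
calls standing in for system calls.  Recorder, Replayer, and the "env"
argument are names I made up; a real implementation would have to sit
at the system call boundary and deal with signals too, which this
ignores.

```python
import random
import time


class Recorder:
    """Makes each call for real and logs its result, in order."""

    def __init__(self):
        self.log = []

    def call(self, name, func, *args):
        result = func(*args)
        self.log.append((name, result))
        return result


class Replayer:
    """Returns the logged results instead of making the real calls."""

    def __init__(self, log):
        self.pending = iter(log)

    def call(self, name, func, *args):
        logged_name, result = next(self.pending)
        if logged_name != name:
            raise RuntimeError(
                f"replay diverged: expected {logged_name}, got {name}")
        return result


def program(env):
    """Toy program whose behavior depends on the clock and a die roll."""
    started = env.call("time", time.time)
    roll = env.call("randint", random.randint, 1, 6)
    return (started, roll)


recorder = Recorder()
original = program(recorder)                 # real run, logged
replayed = program(Replayer(recorder.log))   # same inputs, no real calls
```

The replayed run never touches the clock or the random number
generator; it sees exactly what the original run saw.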

That would enable you, theoretically, to answer the questions above.
In practice, replaying from the very beginning might take a minute or
two, or two hundred, and that wouldn't necessarily be good enough.

So suppose you periodically -- say, every 30 seconds -- checkpoint all
of the program's memory, as well as its registers and so on: its
complete internal state.  (Some operating systems -- e.g. KeyKOS and
EROS -- do this already.)

Then you restore the latest checkpoint before the point you care about
and replay the log from there, so you never have to replay more than
30 seconds' worth of the original run.
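
A sketch of checkpoint-plus-replay in Python, under the same toy
assumptions as before: the program is a deterministic step function
(here a running sum I made up), and a checkpoint every few inputs
stands in for one every 30 seconds.  All the names are hypothetical.

```python
import bisect

CHECKPOINT_INTERVAL = 3   # inputs between checkpoints


def step(state, event):
    """Hypothetical deterministic transition: a running sum."""
    return state + event


def run_with_checkpoints(initial, inputs):
    """Run once, saving (position, state) every CHECKPOINT_INTERVAL inputs."""
    checkpoints = [(0, initial)]
    state = initial
    for position, event in enumerate(inputs, start=1):
        state = step(state, event)
        if position % CHECKPOINT_INTERVAL == 0:
            checkpoints.append((position, state))
    return checkpoints


def state_at(target, checkpoints, inputs):
    """Rebuild the state after `target` inputs, starting from the
    latest checkpoint at or before `target`."""
    positions = [position for position, _ in checkpoints]
    index = bisect.bisect_right(positions, target) - 1
    position, state = checkpoints[index]
    replayed = 0
    for event in inputs[position:target]:
        state = step(state, event)
        replayed += 1
    return state, replayed


inputs = [1, 2, 3, 4, 5, 6, 7, 8]
checkpoints = run_with_checkpoints(0, inputs)
state, replayed = state_at(5, checkpoints, inputs)
```

However long the input log gets, state_at replays fewer than
CHECKPOINT_INTERVAL steps, at the cost of storing the checkpoints.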

During replay, it might conceivably be hard to see what the program is
doing -- for example, if it has a GUI, all the system calls that let
the program talk to its GUI will have been redirected to "dummy" system
calls that just return the same thing they returned the first time the
program called them.  But this is gravy, really.

-- 
<kragen@pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
Silence may not be golden, but at least it's quiet.  Don't speak unless you
can improve the silence.  I have often regretted my speech, never my silence.
-- Adam Rifkin, <adam@cs.caltech.edu>