OCaml vs. SBCL, and various other interpreters

Dave Long dave.long at bluewin.ch
Mon Mar 12 11:58:38 EDT 2007


> Bytecode interpreters can offer unsurpassed code compactness, ...[This  
> box has] only 2MiB of L2 cache, and presumably something like
> 16-64KiB of L1 cache.  Thrashing the cache is soundly punished.

One problem, as you point out earlier, is that the question is not size of  
native code loop vs. size of bytecoded loop, but size of native loop vs.  
size of (bytecoded loop + its working path through the bytecode  
interpreter).

I haven't used MSVC in ages, but their compiler used to have the option to  
compile to a bytecode -- still, I think this was only ever useful for  
space, not for speed.  (for that, one must try to arrange the  
functionality to ensure only the outermost loops are being interpreted)

> I know of no other reason to implement an interpreter using bytecode.
>
> So I'm surprised it's such a popular thing to do!  I think the reason
> is probably that code space and compilation time used to be quite
> precious resources (not to mention portability), and programmers just
> haven't adjusted to the new realities.

Architecture Neutral Debugging Format.

Generating native code is not that difficult -- at least it doesn't take  
much to generate code that will beat an interpreter.  Debugging native  
code, however, can be a real pain (one either has to understand the  
debugging facilities of both the processor and the OS[0], or one has to  
understand the debugger's favorite symbol format and provide a decent map  
to it -- for bass, I was able to provide some simple gdb macros as long as  
the only register used was a TOS cache, but that broke immediately after  
providing register allocation)

Bytecode allows the debugger to control execution[1], and while bytecode  
constructs still have to be mapped back to source, at least the language  
developer (having done the mapping in the other direction) has an easier,  
single, job than when going all the way down to the iron, where a more  
difficult job must be repeated for each platform.

-Dave

:: :: ::

[0] it is somewhat instructive to look at DEBUG.COM (an artifact of 1980's  
era PCs still shipped with Redmond OS's to this day) for an example of how  
spartan an IDE can be.  Usually one wants an assembler to remember labels,  
and a debugger to remember breakpoints.  DEBUG.COM will turn opcodes into  
machine language, but you're on your own for labels (leading to a style in  
which one tries to minimize entry points, turning code into a collection  
of "loops with tails" and requiring utilities similar to BASIC's RENUM)   
Similarly, DEBUG.COM will handle all the machine level hassle of setting  
breakpoints in code, taking the interrupt, and then restoring the original  
code -- but you type in the list of breakpoints at each step.  Once upon a  
time, people actually developed apps with these bear skins and stone  
knives.

[1] bytecode also makes "eval" almost as trivial as in assembly.  Too much  
dynamism can get one in trouble, though.  Consider the "house of cards"  
cartoon on p.112 of the Smalltalk green book: "here, let me modify Array  
At:"
http://www.iam.unibe.ch/~ducasse/FreeBooks/BitsOfHistory/BitsOfHistory.pdf
In that situation, bytecode can be the difference between hosing a process  
and triple-faulting a box.

> The OCaml version is 24 instructions, 8 of which have immediate
> constants.  I don't know very much about PowerPC assembly, but let's
> suppose that every instruction is 32 bits, including any immediate
> constants; that means the whole function weighs 96 bytes.

Using gas or "objdump -D -b binary" would help if you want more accurate  
numbers.  If I remember properly, you're correct about instructions, but  
not necessarily immediates.

> [the MuP21] executes a stream of 5-bit zero-operand two-stack
> operations packed into 20-bit words.

For more along these lines, see http://www.jwdt.com/~paysan/b16.html

But these sort of games are better played in hardware than in software --  
hardware is great at parallel things (instruction decoding) and pretty  
weak at serial things, while software is the opposite.  (this was a  
traditional reason for bytecode -- to minimize work for dispatch) FSMs  
seem to be where the two meet: to add behavior to hardware (think  
microcode) or to add parallel-pattern-matching to software (think  
parsers), one tends to wind up synthesizing state machines.


More information about the Kragen-discuss mailing list