"tweetable" "symbolic" hex COM loader
Kragen Javier Sitaker
kragen at canonical.org
Mon May 21 01:46:24 EDT 2012
Here's a little analysis of the disassembly.
On Mon, May 21, 2012 at 01:11:18AM -0400, Kragen Javier Sitaker wrote:
> On Wed, May 09, 2012 at 06:39:25PM +0200, Dave Long wrote:
> > Apropos the bootstrapping thread[0], here's another hex loader:
> >
> > 0000000: 31c9 bf00 03ba 8a01 b40a cd21 a18b 013c 1..........!...<
> > 0000010: 047c 17b8 0001 01c8 bb00 0202 1e8c 0189 .|..............
> > 0000020: 07be 8c01 a5a5 9041 ebdb 31c0 a320 0289 .......A..1.. ..
> > 0000030: cd31 c9be 0003 bf00 02bb 0100 31c9 31d2 .1..........1.1.
> > 0000040: b800 42cd 21b8 0040 cd21 ac31 d231 c0ac ..B.!..@.!.1.1..
> > 0000050: 3c20 740f bb01 0101 cb29 da01 f889 c38b < t......)......
> > 0000060: 0701 c231 c0ac 0c20 d410 d503 2c09 c0e0 ...1... ....,...
> > 0000070: 0401 c2ac 0c20 d410 d503 2c09 01c2 9090 ..... ....,.....
> > 0000080: b402 cd21 4139 e975 c1c3 5000 ...!A9.u..P.
>
> Here's the disassembly, for my benefit and for whoever else is reading this.
>
> kragen@VOSTRO9:~/devel$ objdump -m i8086 -b binary --adjust-vma=0x100 -D loader.com
>
> loader.com: file format binary
>
>
> Disassembly of section .data:
>
> 00000100 <.data>:
> 100: 31 c9 xor %cx,%cx
> 102: bf 00 03 mov $0x300,%di
> 105: ba 8a 01 mov $0x18a,%dx
> 108: b4 0a mov $0xa,%ah
> 10a: cd 21 int $0x21
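int 21h function 0ah: buffered input from standard input, with buffer at %dx.
0x18a is the last two bytes of the file (50 00), just past the end of the
code: the 0x50 is the buffer's maximum length (objdump shows it below as
`push %ax`), and DOS stores the count of characters read in the byte at 0x18b.
Not sure what's up with %cx and %di here.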
Note that 105 here is the jump target of the instruction at 128, so the
initialization of %cx and %di is outside of an input loop.
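To double-check the jump targets in this listing, here's a small Python sketch
(mine, not part of Dave's loader) that decodes a two-byte short jump: the
target is the address of the next instruction plus the sign-extended 8-bit
displacement. The name `short_jump_target` is made up for illustration.

```python
def short_jump_target(addr, disp8):
    """Target of a 2-byte short jump (opcode + rel8) located at addr."""
    # The displacement is a signed byte, relative to the next instruction.
    signed = disp8 - 0x100 if disp8 >= 0x80 else disp8
    return (addr + 2 + signed) & 0xFFFF

# (address, displacement byte) pairs taken from the disassembly
jumps = {
    0x111: 0x17,  # jl
    0x128: 0xDB,  # jmp
    0x152: 0x0F,  # je
    0x187: 0xC1,  # jne
}
for addr, disp in jumps.items():
    print(f"{addr:#x} -> {short_jump_target(addr, disp):#x}")
```

The results agree with the targets objdump prints (0x12a, 0x105, 0x163 and
0x14a).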
> 10c: a1 8b 01 mov 0x18b,%ax
"number of chars actually read"
> 10f: 3c 04 cmp $0x4,%al
> 111: 7c 17 jl 0x12a
If less than 4 chars read, exit the loop.
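For reference, here's a hedged Python sketch of the buffer layout that
function 0ah uses, as I understand it: a maximum length supplied by the
caller, then the count of characters read (not counting the terminating CR),
then the characters themselves. `mem` and `buffered_input` are hypothetical
stand-ins for real memory and for DOS; the 0x50 is the byte at 0x18a in the
hex dump.

```python
BUF = 0x18A  # address loaded into %dx before the int 21h call

def buffered_input(mem, line):
    """Mimic what int 21h/0ah leaves in the buffer for one input line."""
    max_len = mem[BUF]                 # byte 0: capacity, set by the caller
    data = line[:max_len - 1]          # assumption: one slot is kept for the CR
    mem[BUF + 1] = len(data)           # byte 1: characters read, CR excluded
    mem[BUF + 2:BUF + 2 + len(data)] = data
    mem[BUF + 2 + len(data)] = 0x0D    # terminating carriage return

mem = bytearray(0x10000)
mem[BUF] = 0x50                        # the 50 of the 50 00 ending the hex dump
buffered_input(mem, b"a b0")

# mov 0x18b,%ax loads a 16-bit word: %al gets the count, %ah the first char
ax = mem[BUF + 1] | (mem[BUF + 2] << 8)
al, ah = ax & 0xFF, ax >> 8
print(al, chr(ah))
```

So the `cmp $0x4,%al` only looks at the count; the first input character
rides along in %ah and is ignored.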
> 113: b8 00 01 mov $0x100,%ax
> 116: 01 c8 add %cx,%ax
> 118: bb 00 02 mov $0x200,%bx
> 11b: 02 1e 8c 01 add 0x18c,%bl
> 11f: 89 07 mov %ax,(%bx)
We're computing a two-byte value here in %ax to store in memory at %bx. %bx is
going to be 0x200 plus whatever was stored at 0x18c, which was the first byte
of input. So we're indexing a table at 0x200 with the first byte of input.
%ax is 0x100 plus %cx. %cx started out as 0 before entering the loop and gets
incremented each time through the loop, and I guess the system call probably
doesn't clobber it, so it's the line number. So this stores the current line
number plus 0x100 (or, equivalently, the output offset as an address in a COM
file loaded at 0x100) in a table entry indexed by the first byte of input.
It seems a little alarming that we're storing a two-byte line number/byte
offset in a single-byte table entry. I suppose that's not a problem as long as
your labels are always at least two letters apart... but isn't there an x86
addressing mode that makes that problem easier? So you could do `mov %ax,
[0x200+2*bx]` or something, with just the input byte in bx? Probably then
you'd want to initialize %di to 0x400 in case somebody wants to use extended
ASCII labels.
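To make the overlap concrete, here's a small Python sketch of the store at
11f, assuming memory is a flat byte array and the table is indexed by the raw
input byte. `define_label` is a made-up helper name, and the line numbers are
arbitrary. Defining 'a' after 'b' writes a's high byte on top of b's low byte.

```python
TABLE = 0x200

def define_label(mem, first_char, line_no):
    """Mimic 113..11f: store the word 0x100+line_no at TABLE+first_char."""
    value = 0x100 + line_no
    addr = TABLE + first_char
    mem[addr] = value & 0xFF       # low byte lands in this label's slot
    mem[addr + 1] = value >> 8     # high byte spills into the next slot

mem = bytearray(0x10000)
define_label(mem, ord("b"), 0)     # 'b' defined on line 0
before = mem[TABLE + ord("b")]
define_label(mem, ord("a"), 5)     # 'a' defined later, on line 5
after = mem[TABLE + ord("b")]
print(hex(before), hex(after))
```

Defining the labels in the other order ('a' first, then 'b') leaves both low
bytes intact, since the later store rewrites the slot that was spilled into.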
> 121: be 8c 01 mov $0x18c,%si
> 124: a5 movsw %ds:(%si),%es:(%di)
> 125: a5 movsw %ds:(%si),%es:(%di)
Now we append the first four bytes of input to the buffer at %di, which was
initialized to 0x300.
> 126: 90 nop
> 127: 41 inc %cx
> 128: eb db jmp 0x105
Okay, so that's the end of the input loop. From here we have straight-line
code until the output loop.
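Putting the pieces so far together, the input loop seems to amount to
something like the following Python sketch. It's my reading of 105..128, not
Dave's source: `first_pass` and its argument are invented names, lines are
passed in as byte strings rather than read through DOS, and a dict stands in
for the table, so the entry-overlap issue isn't modeled.

```python
def first_pass(lines):
    """Sketch of 105..128: record labels and stash 4 bytes per line."""
    table = {}            # stands in for the table at 0x200
    stored = bytearray()  # stands in for the buffer at 0x300 (%di)
    cx = 0                # line counter, zeroed at 100
    for line in lines:
        if len(line) < 4:             # cmp $0x4,%al / jl 0x12a
            break
        table[line[0]] = 0x100 + cx   # mov %ax,(%bx)
        stored += line[:4]            # the two movsw instructions
        cx += 1                       # inc %cx
    return table, bytes(stored), cx

table, stored, count = first_pass([b"a b0", b"  0a", b""])
print(hex(table[ord("a")]), stored, count)
```

The short (here, empty) line is what ends the loop, and the returned count is
what the code at 12f then copies into %bp.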
> 12a: 31 c0 xor %ax,%ax
> 12c: a3 20 02 mov %ax,0x220
Wiping out the definition of the "space" label.
> 12f: 89 cd mov %cx,%bp
Okay, so the total output size goes into %bp.
> 131: 31 c9 xor %cx,%cx
> 133: be 00 03 mov $0x300,%si
We're gonna be copying from the stored input program text?
> 136: bf 00 02 mov $0x200,%di
...into the symbol table?
> 139: bb 01 00 mov $0x1,%bx
> 13c: 31 c9 xor %cx,%cx
That seems a little redundant. %cx is already pretty zeroed.
> 13e: 31 d2 xor %dx,%dx
> 140: b8 00 42 mov $0x4200,%ax
> 143: cd 21 int $0x21
42h is lseek: set current file position. 00h in %al is from the start of the
file. 1 in %bx is fd 1, stdout. %cx:%dx = 0 is the offset from the start of
the file. Not yet sure why this lseek is useful; isn't that where you normally
start writing the output if it's been redirected?
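A tiny Python sketch of how those registers map onto the call's arguments,
since the single 16-bit `mov` into %ax fills both the function number and the
origin code at once. `seek_args` and the dict keys are hypothetical names, not
anything DOS defines.

```python
def seek_args(ax, bx, cx, dx):
    """Decode the register setup for an int 21h lseek-style call."""
    return {
        "function": ax >> 8,         # %ah: DOS function number
        "origin": ax & 0xFF,         # %al: 0 means "from start of file"
        "handle": bx,                # %bx: file handle
        "offset": (cx << 16) | dx,   # %cx:%dx as one 32-bit offset
    }

args = seek_args(ax=0x4200, bx=1, cx=0, dx=0)
print(args)
```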
> 145: b8 00 40 mov $0x4000,%ax
> 148: cd 21 int $0x21
0x40 is write(), which is somewhat unexpected, since we haven't done any
decoding yet. %cx is the number of bytes to write, which is presumably still
0. As far as I know, a DOS write of zero bytes truncates the file at the
current position, so together with the lseek this would empty a redirected
output file before anything is written. Otherwise it's leftover code, or maybe
I screwed up the disassembly. It's the end of the straight-line code; the
output loop starts here, which I still haven't really begun to analyze;
perhaps tomorrow:
> 14a: ac lods %ds:(%si),%al
> 14b: 31 d2 xor %dx,%dx
> 14d: 31 c0 xor %ax,%ax
> 14f: ac lods %ds:(%si),%al
> 150: 3c 20 cmp $0x20,%al
> 152: 74 0f je 0x163
> 154: bb 01 01 mov $0x101,%bx
> 157: 01 cb add %cx,%bx
> 159: 29 da sub %bx,%dx
> 15b: 01 f8 add %di,%ax
> 15d: 89 c3 mov %ax,%bx
> 15f: 8b 07 mov (%bx),%ax
> 161: 01 c2 add %ax,%dx
> 163: 31 c0 xor %ax,%ax
> 165: ac lods %ds:(%si),%al
> 166: 0c 20 or $0x20,%al
> 168: d4 10 aam $0x10
> 16a: d5 03 aad $0x3
> 16c: 2c 09 sub $0x9,%al
> 16e: c0 e0 04 shl $0x4,%al
> 171: 01 c2 add %ax,%dx
> 173: ac lods %ds:(%si),%al
> 174: 0c 20 or $0x20,%al
> 176: d4 10 aam $0x10
> 178: d5 03 aad $0x3
> 17a: 2c 09 sub $0x9,%al
> 17c: 01 c2 add %ax,%dx
> 17e: 90 nop
> 17f: 90 nop
> 180: b4 02 mov $0x2,%ah
> 182: cd 21 int $0x21
2h is "write character (in %dl) to stdout".
> 184: 41 inc %cx
> 185: 39 e9 cmp %bp,%cx
> 187: 75 c1 jne 0x14a
> 189: c3 ret
> 18a: 50 push %ax
> ...
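The sequence at 166..16c (and again at 174..17a) looks like a branch-free hex
digit decoder. Here's a Python sketch that steps through it, assuming the
usual 8086 semantics for the immediate forms of `aam` and `aad`; `hex_digit`
and `hex_byte` are my names, not something in the loader.

```python
def hex_digit(ch):
    """Step through or/aam/aad/sub as at 166..16c for one ASCII character."""
    al = ch | 0x20                 # or $0x20,%al: fold 'A'-'F' to 'a'-'f'
    ah, al = divmod(al, 0x10)      # aam $0x10: %ah = al / 16, %al = al % 16
    al = (al + ah * 3) & 0xFF      # aad $0x3: %al = al + 3*ah, %ah = 0
    return (al - 9) & 0xFF         # sub $0x9,%al

def hex_byte(hi, lo):
    """Combine two digits the way 16e and 17c do: shift the first left 4."""
    return ((hex_digit(hi) << 4) + hex_digit(lo)) & 0xFF

print([hex_digit(c) for c in b"09afAF"])
print(hex(hex_byte(ord("c"), ord("3"))))
```

For digits the high nibble is 3, so `aad` adds 9 and the `sub` takes it back
off; for letters the high nibble is 6, so 18 is added and the net effect is
+9, turning a..f into 10..15.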
Kragen