"tweetable" "symbolic" hex COM loader
Kragen Javier Sitaker
kragen at canonical.org
Thu May 24 00:40:15 EDT 2012
Further bootstrap assembler disassembly analysis.
On Mon, May 21, 2012 at 01:11:18AM -0400, Kragen Javier Sitaker wrote:
> Disassembly of section .data:
> 00000100 <.data>:
> 100: 31 c9 xor %cx,%cx
> 102: bf 00 03 mov $0x300,%di
> 105: ba 8a 01 mov $0x18a,%dx
> 108: b4 0a mov $0xa,%ah
> 10a: cd 21 int $0x21
> 10c: a1 8b 01 mov 0x18b,%ax
> 10f: 3c 04 cmp $0x4,%al
> 111: 7c 17 jl 0x12a
> 113: b8 00 01 mov $0x100,%ax
> 116: 01 c8 add %cx,%ax
> 118: bb 00 02 mov $0x200,%bx
> 11b: 02 1e 8c 01 add 0x18c,%bl
> 11f: 89 07 mov %ax,(%bx)
> 121: be 8c 01 mov $0x18c,%si
> 124: a5 movsw %ds:(%si),%es:(%di)
> 125: a5 movsw %ds:(%si),%es:(%di)
> 126: 90 nop
I didn't previously note this, but this nop suggests that this code was not
written with an assembler. nop-padding allows you to expand and shrink code
sections without recalculating addresses. (If the 8086 had been designed with
this in mind, the designers might have chosen to multiply one-byte relative
jump offsets by 2, necessitating an extra nop half the time, but eliminating
almost all occurrences of foo: jz bar; jmp baz; bar: ...)
> 127: 41 inc %cx
> 128: eb db jmp 0x105
> 12a: 31 c0 xor %ax,%ax
> 12c: a3 20 02 mov %ax,0x220
> 12f: 89 cd mov %cx,%bp
> 131: 31 c9 xor %cx,%cx
> 133: be 00 03 mov $0x300,%si
> 136: bf 00 02 mov $0x200,%di
> 139: bb 01 00 mov $0x1,%bx
> 13c: 31 c9 xor %cx,%cx
> 13e: 31 d2 xor %dx,%dx
> 140: b8 00 42 mov $0x4200,%ax
> 143: cd 21 int $0x21
> 145: b8 00 40 mov $0x4000,%ax
> 148: cd 21 int $0x21
Okay, so here's our output loop. %si starts at 0x300, the buffer where we
had copied four bytes per line. %di starts at 0x200, the base address of the
label table, and doesn't change during the loop. I think %cx and %dx start out
at zero.
> 14a: ac lods %ds:(%si),%al
I think this is just a way to increment %si (clobbering %al). The label
definitions from this column were stored into the table at 0x200 during the
input loop.
> 14b: 31 d2 xor %dx,%dx
Zero %dx for what follows.
> 14d: 31 c0 xor %ax,%ax
> 14f: ac lods %ds:(%si),%al
This is the label reference column.
> 150: 3c 20 cmp $0x20,%al
> 152: 74 0f je 0x163
If it was a space, we skip the following. This suggests that the zeroing of the
' ' (and, inadvertently, '!') label at 12c is unnecessary.
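Here is a small Python model of why that one store touches two labels. It
assumes, as the stores at 11f and the load at 15f suggest, that each label's
slot is a little-endian 16-bit word at 0x200 plus the label's character code,
so neighbouring slots overlap by one byte. The names mem, define, and lookup
are made up for illustration.

```python
import struct

mem = bytearray(0x400)  # stand-in for the low part of the COM segment

def define(label, addr):
    # Like "mov %ax,(%bx)" with bx = 0x200 + label: a 16-bit store.
    struct.pack_into('<H', mem, 0x200 + ord(label), addr & 0xFFFF)

def lookup(label):
    # Like "mov (%bx),%ax" with bx = 0x200 + label: a 16-bit load.
    return struct.unpack_from('<H', mem, 0x200 + ord(label))[0]

define('!', 0x0123)   # pretend '!' was defined during the input loop
define(' ', 0)        # the "mov %ax,0x220" at 12c, with %ax = 0

# The high byte of the ' ' slot is the low byte of the '!' slot,
# so '!' has lost its low byte.
print(hex(lookup('!')))  # 0x100
```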
> 154: bb 01 01 mov $0x101,%bx # bx: v1 = 0x101
> 157: 01 cb add %cx,%bx # bx: v2 = v1 + %cx
> 159: 29 da sub %bx,%dx # dx: v3 = 0 - v2
> 15b: 01 f8 add %di,%ax # ax: v4 = label + 0x200
> 15d: 89 c3 mov %ax,%bx # bx: v4
> 15f: 8b 07 mov (%bx),%ax # ax: v5 = mem[v4]
> 161: 01 c2 add %ax,%dx # dx: v6 = v5 + v3
Whew. That's pretty tricky. At the end, %dx is mem[label + 0x200] - (0x101 +
%cx). The first part is the value of the referenced label. The last part is
one past the address of the byte we're currently assembling (0x100 + %cx),
which is what a one-byte relative jump offset is measured from when the offset
is the last byte of the instruction.
I wonder if there's a simpler way to write the above.
End of ' ' conditional. If we skipped it, %dx's value is still zero.
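To check my reading of that sequence, here is a rough Python sketch that
follows the register steps with 16-bit wraparound and compares them with the
one-line formula. The function names are invented for this note, and
label_value stands for whatever word the table holds for the referenced
label.

```python
MASK16 = 0xFFFF

def displacement_stepwise(label_value, cx):
    bx = (0x101 + cx) & MASK16    # mov $0x101,%bx ; add %cx,%bx
    dx = (0 - bx) & MASK16        # sub %bx,%dx   (dx was zero)
    ax = label_value & MASK16     # mov (%bx),%ax after the table lookup
    dx = (dx + ax) & MASK16       # add %ax,%dx
    return dx

def displacement_direct(label_value, cx):
    # Same value in one step: label minus the address after this byte.
    return (label_value - (0x101 + cx)) & MASK16

# The "jmp 0x105" at 128 has its offset byte at 0x129, i.e. cx = 0x29.
d = displacement_stepwise(0x105, 0x29)
print(hex(d), hex(d & 0xFF))  # 0xffdb 0xdb
```

The low byte, 0xdb, matches the offset byte in the listing at 128, which is
some evidence that the formula is read correctly.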
> 163: 31 c0 xor %ax,%ax
> 165: ac lods %ds:(%si),%al
Okay, so now we have the first of the two hex bytes.
> 166: 0c 20 or $0x20,%al
> 168: d4 10 aam $0x10
> 16a: d5 03 aad $0x3
> 16c: 2c 09 sub $0x9,%al
This is presumably a very clever way to convert hexadecimal digits into a
binary nibble. I've never learned enough about AAM and AAD to make use of
them.
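A sketch in Python of what the four instructions do, assuming the usual
semantics: AAM imm8 sets AH = AL / imm8 and AL = AL mod imm8, and AAD imm8
sets AL = AH * imm8 + AL and clears AH. The helper names are mine, not
anything from the original program.

```python
def aam(al, base):
    # AAM imm8: AH = AL // imm8, AL = AL % imm8
    return al // base, al % base

def aad(ah, al, base):
    # AAD imm8: AL = (AH * imm8 + AL) & 0xFF, AH = 0
    return 0, (ah * base + al) & 0xFF

def hex_digit_to_nibble(ch):
    al = ord(ch) | 0x20           # or $0x20,%al : fold 'A'-'F' to 'a'-'f'
    ah, al = aam(al, 0x10)        # split into high and low nibble
    ah, al = aad(ah, al, 3)       # AL = 3 * high + low
    return (al - 9) & 0xFF        # sub $0x9,%al

print([hex_digit_to_nibble(c) for c in "09afAF"])  # [0, 9, 10, 15, 10, 15]
```

For '0' to '9' the high nibble is 3, so 3 * 3 + digit - 9 is the digit. For
'a' to 'f' the high nibble is 6 and the low nibble is 1 to 6, so
3 * 6 + low - 9 is 10 to 15.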
> 16e: c0 e0 04 shl $0x4,%al
This moves that nibble to the high nibble of %al.
> 171: 01 c2 add %ax,%dx
And here's the "relocation" where we adjust the jump target, if that's what it
is. Also we get that nibble safely into %dx.
> 173: ac lods %ds:(%si),%al
> 174: 0c 20 or $0x20,%al
> 176: d4 10 aam $0x10
> 178: d5 03 aad $0x3
> 17a: 2c 09 sub $0x9,%al
Same conversion for the second nibble, but without moving it to the high
nibble.
> 17c: 01 c2 add %ax,%dx
So now we have our byte in %dl.
> 17e: 90 nop
> 17f: 90 nop
> 180: b4 02 mov $0x2,%ah
> 182: cd 21 int $0x21
Which is where int 21h function 02h writes it to stdout from. The interrupt
list I found warns that %dl=0x09 will get converted to some 0x20s, but that
can't be right, or this program would fail to assemble itself.
> 184: 41 inc %cx
So we're keeping track of the bytes output in %cx.
> 185: 39 e9 cmp %bp,%cx
> 187: 75 c1 jne 0x14a
> 189: c3 ret
So we exit the loop if the number of lines converted is the same as the number
of lines read, and then ret to exit to DOS.
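Putting the two loops together, here is my understanding of the whole
program as a short Python sketch. It is a model under several assumptions,
not a faithful emulation: the label table is a dict (so the overlapping
16-bit slots are ignored), an undefined label reads as 0, int(..., 16) stands
in for the AAM/AAD conversion, and the name assemble is invented here.

```python
def assemble(lines, origin=0x100):
    # Input loop: columns are label definition, label reference, two hex digits.
    labels = {}
    rows = []
    for line in lines:
        if len(line) < 4:             # cmp $0x4,%al ; jl : short line ends input
            break
        labels[line[0]] = origin + len(rows)
        rows.append(line[:4])         # the two movsw copy four bytes
    labels[' '] = 0                   # mov %ax,0x220

    # Output loop: one byte per line.
    out = bytearray()
    for pos, row in enumerate(rows):
        disp = 0
        if row[1] != ' ':
            # Label value minus the address just after this byte.
            disp = labels.get(row[1], 0) - (origin + 1 + pos)
        out.append((int(row[2:4], 16) + disp) & 0xFF)
    return bytes(out)

# A forward jump over one nop, and a backward jump to a nop.
print(assemble(["  eb", " L00", "  90", "L 90", ""]).hex())  # eb019090
print(assemble(["L 90", "  eb", " L00", ""]).hex())          # 90ebfd
```

In both examples the line holding the jump offset has 00 in the hex column
and names the target label in the reference column, so the byte written is
just the computed displacement.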
> 18a: 50 push %ax
That's really the buffer size --- 80 bytes --- not an instruction.
So, in conclusion, very impressive. Thank you for sharing that.
My temptation to write a one-pass stack-based octal version is even greater now
:). It probably can't be under 40 bytes, but maybe it could be under 70. And
it could be used to write the above.