"tweetable" "symbolic" hex COM loader

Kragen Javier Sitaker kragen at canonical.org
Thu May 24 00:40:15 EDT 2012


Further analysis of the bootstrap assembler's disassembly.

On Mon, May 21, 2012 at 01:11:18AM -0400, Kragen Javier Sitaker wrote:
>     Disassembly of section .data:
> 
>     00000100 <.data>:
>      100:   31 c9                   xor    %cx,%cx
>      102:   bf 00 03                mov    $0x300,%di
>      105:   ba 8a 01                mov    $0x18a,%dx
>      108:   b4 0a                   mov    $0xa,%ah
>      10a:   cd 21                   int    $0x21
>      10c:   a1 8b 01                mov    0x18b,%ax
>      10f:   3c 04                   cmp    $0x4,%al
>      111:   7c 17                   jl     0x12a
>      113:   b8 00 01                mov    $0x100,%ax
>      116:   01 c8                   add    %cx,%ax
>      118:   bb 00 02                mov    $0x200,%bx
>      11b:   02 1e 8c 01             add    0x18c,%bl
>      11f:   89 07                   mov    %ax,(%bx)
>      121:   be 8c 01                mov    $0x18c,%si
>      124:   a5                      movsw  %ds:(%si),%es:(%di)
>      125:   a5                      movsw  %ds:(%si),%es:(%di)
>      126:   90                      nop    

I didn't previously note this, but this nop suggests that this code was not
written with an assembler.  nop-padding allows you to expand and shrink code
sections without recalculating addresses.  (If the 8086 had been designed with
this in mind, the designers might have chosen to multiply one-byte relative
jump offsets by 2, necessitating an extra nop half the time, but eliminating
almost all occurrences of foo: jz bar; jmp baz; bar: ...)

>      127:   41                      inc    %cx
>      128:   eb db                   jmp    0x105
>      12a:   31 c0                   xor    %ax,%ax
>      12c:   a3 20 02                mov    %ax,0x220
>      12f:   89 cd                   mov    %cx,%bp
>      131:   31 c9                   xor    %cx,%cx
>      133:   be 00 03                mov    $0x300,%si
>      136:   bf 00 02                mov    $0x200,%di
>      139:   bb 01 00                mov    $0x1,%bx
>      13c:   31 c9                   xor    %cx,%cx
>      13e:   31 d2                   xor    %dx,%dx
>      140:   b8 00 42                mov    $0x4200,%ax
>      143:   cd 21                   int    $0x21
>      145:   b8 00 40                mov    $0x4000,%ax
>      148:   cd 21                   int    $0x21

Okay, so here's our output loop.  %si starts at 0x300, the buffer into which
the input pass copied four bytes per line.  %di starts at 0x200, the base
address of the label table, and doesn't change during the loop.  %cx and %dx
were zeroed at 13c and 13e, so they should start out as 0, assuming the two
int 21h calls leave them alone.

>      14a:   ac                      lods   %ds:(%si),%al

This first lods reads the label definition column and discards it; I think it
is just a way to increment %si (clobbering %al).  The label definitions from
this column were already stored into the table at 0x200 during the input pass.

>      14b:   31 d2                   xor    %dx,%dx

Zero %dx for what follows.

>      14d:   31 c0                   xor    %ax,%ax
>      14f:   ac                      lods   %ds:(%si),%al

This is the label reference column.

>      150:   3c 20                   cmp    $0x20,%al
>      152:   74 0f                   je     0x163

If it was a space, we skip the following.  This suggests that the zeroing of
the ' ' (and, inadvertently, '!') label at 12c is unnecessary.

>      154:   bb 01 01                mov    $0x101,%bx  # bx: v1 = 0x101
>      157:   01 cb                   add    %cx,%bx     # bx: v2 = v1 + %cx
>      159:   29 da                   sub    %bx,%dx     # dx: v3 = 0 - v2
>      15b:   01 f8                   add    %di,%ax     # ax: v4 = label + 0x200
>      15d:   89 c3                   mov    %ax,%bx     # bx: v4
>      15f:   8b 07                   mov    (%bx),%ax   # ax: v5 = mem[v4]
>      161:   01 c2                   add    %ax,%dx     # dx: v6 = v5 + v3

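Whew.  That's pretty tricky.  At the end, %dx is mem[label + 0x200] - (0x101 +
%cx).  The first part is the value of the referenced label.  The last part is
presumably one past the address where we're currently assembling: the current
byte lands at 0x100 + %cx, and an 8086 short jump's displacement is counted
from the byte after it.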

I wonder if there's a simpler way to write the above.

End of ' ' conditional.  If we skipped it, %dx's value is still zero.
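
As a check on that formula, here is a small Python sketch (not part of the
original program; the name `displacement` and the `origin` parameter are made
up for illustration).  It recomputes the displacement bytes of the four short
jumps in the listing itself, assuming their source lines carry the hex digits
00, so that the byte written out is the bare displacement.

```python
def displacement(label_addr, cx, origin=0x100):
    """Value added at output position cx when a line references a label.

    Mirrors 154..161: %dx = mem[label + 0x200] - (0x101 + %cx).  Only the
    low byte (%dl) is eventually written, hence the mask.
    """
    return (label_addr - (origin + 1 + cx)) & 0xFF


# (address of the displacement byte, jump target, byte seen in the listing)
jumps = [
    (0x112, 0x12A, 0x17),  # 111: 7c 17  jl  0x12a
    (0x129, 0x105, 0xDB),  # 128: eb db  jmp 0x105
    (0x153, 0x163, 0x0F),  # 152: 74 0f  je  0x163
    (0x188, 0x14A, 0xC1),  # 187: 75 c1  jne 0x14a
]
for byte_addr, target, expected in jumps:
    cx = byte_addr - 0x100  # %cx counts bytes output so far
    assert displacement(target, cx) == expected

print(hex(displacement(0x105, 0x29)))  # 0xdb
```

All four match, which supports reading 0x101 + %cx as the address of the byte
after the one being assembled.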

>      163:   31 c0                   xor    %ax,%ax
>      165:   ac                      lods   %ds:(%si),%al

Okay, so now we have the first of the two hex bytes.

>      166:   0c 20                   or     $0x20,%al
>      168:   d4 10                   aam    $0x10
>      16a:   d5 03                   aad    $0x3
>      16c:   2c 09                   sub    $0x9,%al

This is presumably a very clever way to convert hexadecimal digits into a
binary nibble.  I've never learned enough about AAM and AAD to make use of
them.
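
To see why the four instructions work, it helps to step through them.  The
following Python sketch simulates them on one character, assuming the usual
semantics of AAM and AAD with an explicit operand (AAM n: %ah = %al / n,
%al = %al mod n; AAD n: %al = %ah * n + %al, %ah = 0).  The function name
`hex_digit_value` is made up.

```python
def hex_digit_value(ch):
    """Simulate 166..16c on one ASCII hex digit and return the final %al."""
    al = ord(ch)
    al |= 0x20                 # or  $0x20,%al: 'A'-'F' become 'a'-'f',
                               #                '0'-'9' are unchanged
    ah, al = divmod(al, 0x10)  # aam $0x10: %ah = 3 (digit) or 6 (letter),
                               #            %al = low nibble
    al = (ah * 3 + al) & 0xFF  # aad $0x3:  9 + d for a digit d,
    ah = 0                     #            18 + k for the k-th letter
    al = (al - 9) & 0xFF       # sub $0x9,%al: d for a digit, 9 + k for a letter
    return al


print([hex_digit_value(c) for c in "09afAF"])  # [0, 9, 10, 15, 10, 15]
```

So the multiplier 3 is what makes the gap between the two high nibbles (3 and
6) come out as exactly 9, the gap between '9' + 1 and 'a' in value terms.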

>      16e:   c0 e0 04                shl    $0x4,%al

This moves that nibble to the high nibble of %al.

>      171:   01 c2                   add    %ax,%dx

And here's the "relocation" where we adjust the jump target, if that's what it
is.  Also we get that nibble safely into %dx.

>      173:   ac                      lods   %ds:(%si),%al
>      174:   0c 20                   or     $0x20,%al
>      176:   d4 10                   aam    $0x10
>      178:   d5 03                   aad    $0x3
>      17a:   2c 09                   sub    $0x9,%al

Same conversion for the second nibble, but without moving it to the high
nibble.

>      17c:   01 c2                   add    %ax,%dx

So now we have our byte in %dl.

>      17e:   90                      nop    
>      17f:   90                      nop    
>      180:   b4 02                   mov    $0x2,%ah
>      182:   cd 21                   int    $0x21

And %dl is where int 21h function 02h takes the character it writes to stdout.
The interrupt list I found warns that %dl=0x09 will get converted to some
0x20s, but that can't be right, or this program, which contains 0x09 bytes at
16d and 17b, would fail to assemble itself.

>      184:   41                      inc    %cx

So we're keeping track of the bytes output in %cx.

>      185:   39 e9                   cmp    %bp,%cx
>      187:   75 c1                   jne    0x14a
>      189:   c3                      ret    

So we exit the loop if the number of lines converted is the same as the number
of lines read, and then ret to exit to DOS.

>      18a:   50                      push   %ax

That byte, 0x50, is really the size (80 bytes) of the input buffer that the
int 21h function 0Ah call at 10a is given at 0x18a, not an instruction.

So, in conclusion, very impressive.  Thank you for sharing that.

My temptation to write a one-pass stack-based octal version is even greater now
:).  It probably can't be under 40 bytes, but maybe it could be under 70.  And
it could be used to write the above.

Kragen

