From address at example.com Tue May 13 13:31:21 2008
From: address at example.com (Kragen Sitaker)
Date: Tue May 13 13:32:56 2008
Subject: testing an antispam measure
Message-ID: <20080513173137.BF01E18346C@panacea.canonical.org>

This message was sent to kragen-tol@canonical.org> and

(There is an HTML version at .)

On IRC, [Aristotle Pagaltzis](http://plasmasturm.org) was pondering
how much performance variable-width encodings such as UTF-8 actually
cost, because it's commonly suggested that fixed-width encodings such
as ISO-8859-1 and UCS-4 are much faster.  He suggested:

> Huh, it just occurs to me that `strlen` is not at all expensive on
> UTF-8-encoded strings. Not exactly as fast, but if you write it
> in asm, it only takes one extra instruction to count characters in
> UTF-8 vs those in an 8-bit encoding, per character. So, if you
> factor in cache misses, it should make no measurable
> difference. All you lose with a variable-width encoding is direct
> random access to arbitrary indices in the string, which is
> basically a non-use case.

It turned out that he was partly wrong, but mostly right.  And along
the way, we discovered that GCC's standard implementation of `strlen`
was quite pessimal.

I'm using Linux on a 700MHz Pentium III laptop with GCC 4.1.2, using
just the `-O` flag unless otherwise specified.

My First Assembly Version
-------------------------

First I thought about how to write `strlen`, and came up with this:

            .global my_strlen_s
    my_strlen_s:
            push %esi
            cld
            mov 8(%esp), %esi
            xor %ecx, %ecx
            ## repnz lodsb doesn't work because lodsb doesn't update ZF
    loop:   lodsb
            test %al, %al
            loopnz loop
            mov %ecx, %eax
            not %eax
            pop %esi
            ret

> For those who aren't well-versed in 80386 assembly language:
>
> - `lodsb` reads a byte from memory at `%esi`, puts it in `%al`, and
>   increments `%esi`;
> - `loopnz` decrements `%ecx` each time through the loop, and jumps
>   back to the label `loop:` if `%ecx` isn't zero and if the zero
>   flag `ZF` isn't set (that's the "nz" part);
> - `test %al, %al` sets the zero flag `ZF` if `%al` is zero (the C
>   string terminator), among other things;
> - after the loop body has run N times, `%ecx` is -N, so `not %eax`
>   converts the negative number -N from `%ecx` into a positive number
>   N-1 (the loop also ran once for the terminator, so N-1 is the
>   string length);
> - you have to push `%esi` because it's a callee-saves register.  (To
>   my surprise.)
>
> But there's a `scasb` instruction which I should have used instead
> of `lodsb; test`.

The inner loop there is three instructions.

GCC's Assembly Version
----------------------

Then I looked at how GCC does `strlen`.  It turns out it inlines it,
even without any extra optimization flags.  Here's a sample call,
albeit with optimization:

     80484d5:       bf 00 99 04 08          mov    $0x8049900,%edi
     80484da:       fc                      cld
     80484db:       b9 ff ff ff ff          mov    $0xffffffff,%ecx
     80484e0:       b0 00                   mov    $0x0,%al
     80484e2:       f2 ae                   repnz scas %es:(%edi),%al
     80484e4:       f7 d1                   not    %ecx
     80484e6:       49                      dec    %ecx
     80484e7:       89 4c 24 04             mov    %ecx,0x4(%esp)

There the inner loop is just the `repnz scas`.  I'd forgotten about
`SCAS`.
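
The `not`/`dec` arithmetic at the end of that inlined sequence is easy
to misread, so here is a minimal C model of it.  This is only a sketch
for checking the arithmetic, not anything GCC emits: the function name
`scas_model_strlen` and the locals named after registers are made up
for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of the inlined repnz-scas strlen shown above. */
size_t scas_model_strlen(const char *s)
{
    const unsigned char *edi = (const unsigned char *)s;
    uint32_t ecx = 0xffffffffu;   /* mov $0xffffffff,%ecx */
    unsigned char al = 0;         /* mov $0x0,%al */

    /* repnz scas: while %ecx is nonzero, compare %al with the byte
       at %edi, advance %edi, decrement %ecx, and stop on a match. */
    while (ecx != 0) {
        unsigned char byte = *edi++;
        ecx--;
        if (byte == al)
            break;
    }

    /* For a string of n bytes the loop scanned n+1 bytes (including
       the terminator), so %ecx is now -(n+2) modulo 2^32. */
    ecx = ~ecx;                   /* not %ecx: gives n+1 */
    ecx--;                        /* dec %ecx: gives n */
    return ecx;
}
```

Starting `%ecx` at -1 rather than 0 is why a `dec` is needed after the
`not`, where my own version above gets away with `not` alone.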

A C Version
-----------

Here's a reasonable `strlen` in C:

    int my_strlen(char *s) {
      int i = 0;
      while (*s++) i++;
      return i;
    }

This compiles to the following:

    080483c4 :
     80483c4:       55                      push   %ebp
     80483c5:       89 e5                   mov    %esp,%ebp
     80483c7:       8b 55 08                mov    0x8(%ebp),%edx
     80483ca:       b8 00 00 00 00          mov    $0x0,%eax
     80483cf:       80 3a 00                cmpb   $0x0,(%edx)
     80483d2:       74 0c                   je     80483e0
     80483d4:       b8 00 00 00 00          mov    $0x0,%eax
     80483d9:       40                      inc    %eax
     80483da:       80 3c 10 00             cmpb   $0x0,(%eax,%edx,1)
     80483de:       75 f9                   jne    80483d9
     80483e0:       5d                      pop    %ebp
     80483e1:       c3                      ret

So here the inner loop is again three instructions: `inc %eax`;
`cmpb $0x0,(%eax,%edx,1)`; `jne`.  It's been optimized down to
`while (s[i]) i++;`.  The loop termination test is duplicated above
the top of the loop for the empty-string case.

My UTF-8 Assembly Version
-------------------------

So then I thought about how to do what Aristotle was suggesting.  In
UTF-8, bytes that start new characters begin either with binary 0 or
binary 11; the second and subsequent bytes of multibyte characters
have binary 10 as their high bits.  So to count the characters, you
just have to count the bytes that don't begin with binary 10.  I tried
this:

            .global my_strlen_utf8_s
    my_strlen_utf8_s:
            push %esi
            cld
            mov 8(%esp), %esi
            xor %ecx, %ecx
    loop2:  lodsb
            test $0x80, %al
            jz ascii                # format 0xxx xxxx
            test $0x40, %al
            jz loop2                # format 10xx xxxx: doesn't start new char
    ascii:  test %al, %al
            loopnz loop2
            mov %ecx, %eax
            not %eax
            pop %esi
            ret

So here we jump back to `loop2`, rather than decrementing `%ecx` with
`loopnz`, in the case where the byte starts with 10.  And we can skip
the `test %al, %al` test, since 0000 0000 doesn't start with 10.

The inner loop of this version has 5 instructions, including two taken
conditional branches, in the usual ASCII case, and 7 instructions for
non-ASCII bytes, rather than 3 instructions per byte.  That's only two
extra instructions in the "usual" case, but if every instruction were
one cycle, that would still be a 67% increase in run-time.
Counting instructions and adding up their cycle count isn't a very
accurate way to measure performance in these superscalar days, though.

My UTF-8 C Version
------------------

A C version looks like this:

    int my_strlen_utf8_c(char *s) {
      int i = 0, j = 0;
      while (s[i]) {
        if ((s[i] & 0xc0) != 0x80) j++;
        i++;
      }
      return j;
    }

GCC compiles it to this:

    080483e2 :
     80483e2:       55                      push   %ebp
     80483e3:       89 e5                   mov    %esp,%ebp
     80483e5:       8b 55 08                mov    0x8(%ebp),%edx
     80483e8:       0f b6 02                movzbl (%edx),%eax
     80483eb:       b9 00 00 00 00          mov    $0x0,%ecx
     80483f0:       84 c0                   test   %al,%al
     80483f2:       74 23                   je     8048417
     80483f4:       b9 00 00 00 00          mov    $0x0,%ecx
     80483f9:       0f be c0                movsbl %al,%eax
     80483fc:       25 c0 00 00 00          and    $0xc0,%eax
     8048401:       3d 80 00 00 00          cmp    $0x80,%eax
     8048406:       0f 95 c0                setne  %al
     8048409:       0f b6 c0                movzbl %al,%eax
     804840c:       01 c1                   add    %eax,%ecx
     804840e:       0f b6 42 01             movzbl 0x1(%edx),%eax
     8048412:       42                      inc    %edx
     8048413:       84 c0                   test   %al,%al
     8048415:       75 e2                   jne    80483f9
     8048417:       89 c8                   mov    %ecx,%eax
     8048419:       5d                      pop    %ebp
     804841a:       c3                      ret

An inner loop of 10 instructions --- but containing only a single
conditional jump, the `jne` at the bottom.  It uses the `and`; `cmp`;
`setne`; `movzbl` sequence to put either a 0 or a 1 into `%eax`,
depending on whether the byte fetched began with 10, and adds the
result into `%ecx` each time through the loop.

Aristotle's UTF-8 Assembly Version
----------------------------------

So after all this, I chatted with Aristotle some more, and it turned
out he had a much cleverer trick up his sleeve than I had thought ---
or, in fact, than he had thought.  He wrote:

> But wow, my code is *much* faster than any of the other variants.
> Unexpectedly.

Here's his version:

            .global ap_strlen_utf8_s
    ap_strlen_utf8_s:
            push %esi
            cld
            mov 8(%esp), %esi
            xor %ecx, %ecx
    loopa:  dec %ecx
    loopb:  lodsb
            shl $1, %al
            js loopa
            jc loopb
            jnz loopa
            mov %ecx, %eax
            not %eax
            pop %esi
            ret

In this case, the inner loop is 6 instructions, but as few as 3 of
them can execute.
I hadn't realized that you could get the top two bits of a byte into
the carry and sign flags with a single `shl` instruction like that!
Aristotle explains:

> `Js` catches all bytes of the form x1xxxxxx. `Jc` catches 1xxxxxxx,
> but because `js` came first, that can only have been 10xxxxxx; and
> `jnz` then catches all 00xxxxxx other than all-0. This runs about
> 3x as fast as your `my_strlen_s` --- most of the time, anyway.

Performance Results
-------------------

So how do these different approaches fare?  I wrote a program that
creates a 32MB string and timed the different functions on it, in
seconds, using wall-clock time.  Here are the results from one run,
sorted with `sort -t: -k1 -k3 -ns`.  The first few lines are various
functions' return values on the given strings.

    "": 0 0 0 0 0 0
    "hello, world": 12 12 12 12 12 12
    "naïve": 6 6 6 5 5 5
    "?????": 15 15 15 5 5 5
    1: all 'a':
    1:         my_strlen(string) = 33554431: 0.227555
    1:  ap_strlen_utf8_s(string) = 33554431: 0.299494
    1:            strlen(string) = 33554431: 0.314887
    1:  my_strlen_utf8_c(string) = 33554431: 0.380355
    1:       my_strlen_s(string) = 33554431: 0.432079
    1:  my_strlen_utf8_s(string) = 33554431: 0.525443
    2: all '\xe3':
    2:         my_strlen(string) = 33554431: 0.224037
    2:  ap_strlen_utf8_s(string) = 33554431: 0.299537
    2:            strlen(string) = 33554431: 0.311552
    2:  my_strlen_utf8_c(string) = 33554431: 0.378162
    2:       my_strlen_s(string) = 33554431: 0.436755
    2:  my_strlen_utf8_s(string) = 33554431: 0.589165
    3: all '\x81':
    3:         my_strlen(string) = 33554431: 0.225011
    3:  ap_strlen_utf8_s(string) = 0: 0.313525
    3:            strlen(string) = 33554431: 0.316182
    3:  my_strlen_utf8_s(string) = 0: 0.322959
    3:  my_strlen_utf8_c(string) = 0: 0.390958
    3:       my_strlen_s(string) = 33554431: 0.432342

The 33554431 and 0 numbers are the return values; this ensures that
GCC doesn't optimize out the `strlen` call.
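
For anyone who wants to repeat this kind of measurement, here is a
minimal sketch of a wall-clock timing harness.  It is not the test
program that produced the numbers above; the names `count_utf8`,
`make_string`, and `time_call` are hypothetical, and it assumes a C11
library that provides `timespec_get`.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Count UTF-8 characters: every byte except 10xxxxxx continuation
   bytes starts a new character. */
int count_utf8(const char *s)
{
    int chars = 0;
    for (; *s; s++)
        if (((unsigned char)*s & 0xc0) != 0x80)
            chars++;
    return chars;
}

/* Build a NUL-terminated string of n copies of `fill`; the caller
   frees it.  Returns NULL if allocation fails. */
char *make_string(size_t n, char fill)
{
    char *s = malloc(n + 1);
    if (s == NULL)
        return NULL;
    memset(s, fill, n);
    s[n] = '\0';
    return s;
}

/* Wall-clock seconds taken by one call of fn(s).  The return value
   of fn is stored through `result` so that the compiler cannot
   discard the call. */
double time_call(int (*fn)(const char *), const char *s, int *result)
{
    struct timespec start, end;
    timespec_get(&start, TIME_UTC);
    *result = fn(s);
    timespec_get(&end, TIME_UTC);
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}
```

A 32MB test like the ones above would use `make_string(33554431, 'a')`
and print both the stored result and the elapsed time.  `TIME_UTC` is
calendar time, so a clock adjustment during the run would skew a
measurement.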
So, on my CPU, the C version of `strlen` took about 28% less time than
the built-in inlined one for this long string; it only uses two
registers instead of the three used by the built-in inlined one (the
one that uses `repnz scasb`); and they both seem to be about 12 bytes.
I don't know why GCC inlines the worse one.  Most likely it used to be
faster than whatever GCC generated at the time and hasn't been
revisited.

It's worth noting that while my C version of `strlen` was always
faster than the built-in version, Aristotle's UTF-8 version was always
in between.

On Aristotle's Core 2 Duo 1.8GHz (also with GCC 4.1.2 and `-O`), the
difference was very much greater.  Here are his results:

    "": 0 0 0 0 0 0
    "hello, world": 12 12 12 12 12 12
    "naïve": 6 6 6 5 5 5
    "?????": 15 15 15 5 5 5
    1: all 'a':
    1:         my_strlen(string) = 33554431: 0.025906
    1:  ap_strlen_utf8_s(string) = 33554431: 0.039629
    1:  my_strlen_utf8_c(string) = 33554431: 0.096041
    1:            strlen(string) = 33554431: 0.114821
    1:       my_strlen_s(string) = 33554431: 0.116529
    1:  my_strlen_utf8_s(string) = 33554431: 0.132648
    2: all '\xe3':
    2:         my_strlen(string) = 33554431: 0.025912
    2:  ap_strlen_utf8_s(string) = 33554431: 0.039583
    2:  my_strlen_utf8_c(string) = 33554431: 0.095699
    2:            strlen(string) = 33554431: 0.114452
    2:       my_strlen_s(string) = 33554431: 0.114622
    2:  my_strlen_utf8_s(string) = 33554431: 0.136109
    3: all '\x81':
    3:         my_strlen(string) = 33554431: 0.026112
    3:  my_strlen_utf8_s(string) = 0: 0.039656
    3:  ap_strlen_utf8_s(string) = 0: 0.039661
    3:  my_strlen_utf8_c(string) = 0: 0.096416
    3:       my_strlen_s(string) = 33554431: 0.115327
    3:            strlen(string) = 33554431: 0.116629

All of this code is online in two files:

* [mystrlen.c](http://pobox.com/~kragen/sw/mystrlen.c)
* [mystrlen.s](http://pobox.com/~kragen/sw/mystrlen.s)

Conclusions
-----------

1. GCC is better at writing x86 assembly than I am.  No surprise
   there.  Even when its inner loop is 10 instructions, it beats my
   three-instruction inner loops for speed.
2. Aristotle is better at writing x86 assembly than GCC is.
3. Aristotle was essentially correct: the penalty for counting UTF-8
   characters, or indexing into or iterating over the characters of a
   UTF-8 string, is very small.
4. However, there is a speed penalty.  Although GCC's built-in
   `strlen` is much slower than Aristotle's function, a
   straightforward byte-counting C `strlen` compiled with optimization
   is faster still.
5. GCC should change to use the straightforward byte-counting C
   `strlen` instead of what it currently inlines.  The version of
   `strlen` that GCC inlines is worse than the one it compiled from C
   in every way: it's more instructions, more bytes of machine code,
   more than four times slower on the Core 2 Duo (and about 1.4 times
   slower on my Pentium III), and uses more registers (one of which
   is a callee-saves register!).
6. People probably shouldn't worry about the efficiency of counting
   and iterating over characters in UTF-8 strings, at least not if
   they were using null-terminated C strings before.

From kragen at pobox.com Thu May 22 03:37:02 2008
From: kragen at pobox.com (Kragen Javier Sitaker)
Date: Thu May 22 03:37:03 2008
Subject: running one's desktop inside QEMU?
Message-ID: <20080520042720.A7362183436@panacea.canonical.org>

So I've been playing with QEMU, which lets you run a virtual computer
inside your normal computer.  At the moment I'm using it to create a
reproducible development environment on a project I'm working on.

Among QEMU's features is the ability to save a virtual machine
snapshot, which includes the entire state of the virtual computer:
memory, CPU, even disk.  This seems similar to KeyKOS's checkpointing
facility, although it seems to be a bit slower, maybe to the point of
being less useful.  (It seems to do all of its I/O before continuing
to run, rather than doing some kind of copy-on-write.  It seems like
fork() might be sufficient to get good copy-on-write performance.)

(In one case, a VM snapshot of a 128MB VM took 53MB of space.)

But suppose you ran your normal GUI session inside of QEMU.
Maybe every few minutes, you could do a backup of your live session to
a server somewhere nearby.

Benefits
--------

If you do this, you can transport your GUI session from one machine to
another --- something the term "VNC", an abbreviation for "Virtual
Network Computing", promised but never delivered.  If your machine
ever crashes or gets stolen, you can restore from the previous
checkpoint; sometimes this might be worth doing even if only a single
application within it has crashed.  And you can have a large number of
GUI sessions for different users in suspended animation on your disk.

This kind of thing could give users in, for example, an internet cafe,
the freedom to really customize their environment.  Rather than
storing all of their state on a web site, they could store much of it
on the servers at the internet cafe itself, as if they owned it.  They
could install software, keep their files, and so on; and whenever they
came into the cafe, their session would be waiting for them, just as
it was when they left it.

Problems
--------

There are a lot of times these days where you'll want to run stuff
that doesn't run very well inside QEMU: MPlayer, Art of Illusion,
anything with SSE, MMX, or 3-D acceleration.  I think that's kind of a
minor problem, since the data involved in that part of the system (the
latest frame of a movie, say) is usually quite transient and easy to
recreate.

Also, it's not uncommon these days for a GUI session to fill up
gigabytes of RAM, and all of that RAM could in theory change its
contents about once a second.  So you still might end up copying all
or the vast majority of the memory pages during a snapshot.

From kragen at canonical.org Mon May 26 03:37:02 2008
From: kragen at canonical.org (Kragen Javier Sitaker)
Date: Mon May 26 03:37:03 2008
Subject: Mounting a whole-disk-encrypted Debian disk on a new computer
Message-ID: <20080523223404.58E5B183495@panacea.canonical.org>

(This is available in HTML at .)

So I installed Debian on a new disk, then stuck my old disk into an
external USB enclosure to get the files I realized I had forgotten to
copy over.  But it didn't just mount automatically the way I expected
it to, because the disk was encrypted using Debian Etch's automatic
LVM-with-LUKS-and-`dm-crypt`-disk-encryption system, installed with
`partman-crypto`.

How I Figured It Out
--------------------

I ran `dmesg` to see where it was:

    kragen@thrifty:~/tmp$ dmesg
    ...
    usb 1-1: new full speed USB device using uhci_hcd and address 2
    usb 1-1: configuration #1 chosen from 1 choice
    SCSI subsystem initialized
    Initializing USB Mass Storage driver...
    scsi0 : SCSI emulation for USB Mass Storage devices
    usbcore: registered new driver usb-storage
    USB Mass Storage support registered.
    usb-storage: device found at 2
    usb-storage: waiting for device to settle before scanning
      Vendor: IC25N040  Model: ATMR04-0          Rev: 0000
      Type:   Direct-Access                      ANSI SCSI revision: 00
    usb-storage: device scan complete
    SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB)
    sda: Write Protect is off
    sda: Mode Sense: 27 00 00 00
    sda: assuming drive cache: write through
    SCSI device sda: 78140160 512-byte hdwr sectors (40008 MB)
    sda: Write Protect is off
    sda: Mode Sense: 27 00 00 00
    sda: assuming drive cache: write through
     sda: sda1 sda2 < sda5 >
    sd 0:0:0:0: Attached scsi disk sda
    ...

So the device was `/dev/sda`, and I recalled that the encrypted LVM
group had been in the first logical partition.  In the man page for
`cryptsetup` (the version of `cryptsetup` that supports LUKS), I found
the command `luksOpen`, so I tried:

    kragen@thrifty:~/tmp$ sudo cryptsetup luksOpen /dev/sda5 externaldisk

And I succeeded in entering the key and unlocking the first key slot,
so `cryptsetup` created `/dev/mapper/externaldisk`.

Unfortunately the contents were a logical volume group, not a
filesystem by itself.
I looked at `man lvm` and the like for a while without seeing anything
relevant to opening up an LVM volume group on an existing device.  I
did manage to find `lvscan`:

    kragen@thrifty:~/tmp$ sudo lvscan
      ACTIVE            '/dev/Debian/root' [110.45 GB] inherit
      ACTIVE            '/dev/Debian/swap_1' [1.10 GB] inherit

That's the filesystem I'm currently running off of.  So LVM is indeed
the way to access this stuff (the old setup was set up with the same
install CD) and it is not yet opened.

I wonder how it happens at boot-time?  I think it comes from some
script in some `initrd` file stored in `/boot`.

    kragen@thrifty:~/tmp$ ls /boot
    config-2.6.18-4-686          lost+found
    config-2.6.18-5-686          System.map-2.6.18-4-686
    config-2.6.18-6-686          System.map-2.6.18-5-686
    grub                         System.map-2.6.18-6-686
    initrd.img-2.6.18-4-686      vmlinuz-2.6.18-4-686
    initrd.img-2.6.18-4-686.bak  vmlinuz-2.6.18-5-686
    initrd.img-2.6.18-5-686      vmlinuz-2.6.18-6-686
    initrd.img-2.6.18-6-686
    kragen@thrifty:~/tmp$ file /boot/initrd.img-2.6.18-6-686
    /boot/initrd.img-2.6.18-6-686: gzip compressed data, from Unix, last modified: Tue Feb 19 02:21:30 2008, max compression
    kragen@thrifty:~/tmp$ gzip -dc /boot/initrd.img-2.6.18-6-686 > initrd.bin
    kragen@thrifty:~/tmp$ file initrd.bin
    initrd.bin: ASCII cpio archive (SVR4 with no CRC)

So I successfully un`gzip`ped the `initrd` --- and it's a `cpio` file?
Being an idiot, I tried extracting it with `cpio -o < initrd.bin`, but
that was unsuccessful.  Still being an idiot, I thought that perhaps
`file` was incorrect in its guess that it was a `cpio` file, and so I
tried mounting it as follows:

    sudo mount -o loop -t cramfs initrd.bin /mnt
    for type in $(ls /lib/modules/2.6.18-6-686/kernel/fs/); do
        mount -o loop -t "$type" initrd.bin /mnt
    done

and that was unsuccessful too.  I gave up on trying to figure out what
the filesystem was and just `grep`ped it:

    kragen@thrifty:~/tmp$ grep -a lvm initrd.bin |less
    alias block-major-58-*  lvm_mod
    alias char-major-109-*  lvm_mod
    ...
    PREREQ="mdadm mdrun lvm2"
            if [ -e /scripts/local-top/lvm2 ]; then
    cryptlvm=""
                    lvm=*)
                            cryptlvm=${x#lvm=}
    ...

Looks like gold!

> (It turns out that what I *should* have done was `cpio -i <
> initrd.bin`; `-o` is for creating `cpio` archives, not extracting
> files from them.  But I should have made a directory to do it in
> first.  The scripts excerpted above are `scripts/local-top/lvm` and
> `scripts/local-top/cryptroot`.)

    kragen@thrifty:~/tmp$ less +/lvm initrd.bin

Navigating around a bit in there eventually finds me this:

    FSTYPE=''
    eval $(fstype < "$NEWROOT")

    # See if we need to setup lvm on the crypto device
    if [ "$FSTYPE" = "lvm" ] || [ "$FSTYPE" = "lvm2" ]; then
    ...
            if [ -z "$cryptlvm" ]; then
    ...
            elif ! activate_vg "/dev/mapper/$cryptlvm"; then
                    echo "cryptsetup: failed to setup lvm device"
                    return 1
            fi

    kragen@thrifty:~/tmp$ locate fstype
    /usr/lib/klibc/bin/fstype
    kragen@thrifty:~/tmp$ sudo sh -c '/usr/lib/klibc/bin/fstype < /dev/mapper/externaldisk '
    FSTYPE=lvm2
    FSSIZE=0

Great!  I now know that I really do have an LVM volume group to deal
with.  What does `activate_vg` do?

    activate_vg()
    {
            local vg
            vg="${1#/dev/mapper/}"
    ...
            vgchange -ay ${vg}
            return $?
    }

Maybe I can do that.

    kragen@thrifty:~/tmp$ sudo vgchange -ay externaldisk
      Volume group "externaldisk" not found
    kragen@thrifty:~/tmp$ sudo vgchange -ay thrifty
      2 logical volume(s) in volume group "thrifty" now active
    kragen@thrifty:~/tmp$ sudo lvscan
      ACTIVE            '/dev/thrifty/root' [35.92 GB] inherit
      ACTIVE            '/dev/thrifty/swap_1' [1.10 GB] inherit
      ACTIVE            '/dev/Debian/root' [110.45 GB] inherit
      ACTIVE            '/dev/Debian/swap_1' [1.10 GB] inherit

Yes!  Apparently `-ay` means `--available y` --- that is, make it
available.

Good thing I remembered that the volume group was called `thrifty`,
the hostname of the system it was built for.  Is there a way I could
have figured that out?

    kragen@thrifty:~/tmp$ ls /sbin/vg*
    /sbin/vgcfgbackup   /sbin/vgcreate   /sbin/vgmerge    /sbin/vgs
    /sbin/vgcfgrestore  /sbin/vgdisplay  /sbin/vgmknodes  /sbin/vgscan
    /sbin/vgchange      /sbin/vgexport   /sbin/vgreduce   /sbin/vgsplit
    /sbin/vgck          /sbin/vgextend   /sbin/vgremove
    /sbin/vgconvert     /sbin/vgimport   /sbin/vgrename

Only `vgdisplay` looks particularly helpful, and it will only display
the volume group *after* we `vgchange -ay` it.  So I don't know.
(`vgscan`?)

    kragen@thrifty:~/tmp$ sudo mount /dev/thrifty/root /mnt
    [in dmesg, not shown on my screen]
    kjournald starting.  Commit interval 5 seconds
    EXT3 FS on dm-4, internal journal
    EXT3-fs: recovery complete.
    EXT3-fs: mounted filesystem with ordered data mode.
    kragen@thrifty:~/tmp$

Summary
-------

    kragen@thrifty:~/tmp$ dmesg
    ...
     sda: sda1 sda2 < sda5 >
    ...
    kragen@thrifty:~/tmp$ sudo cryptsetup luksOpen /dev/sda5 externaldisk
    kragen@thrifty:~/tmp$ sudo sh -c '/usr/lib/klibc/bin/fstype < /dev/mapper/externaldisk '
    FSTYPE=lvm2
    ...

(here you magically guess that the volume group name is `thrifty`)

    kragen@thrifty:~/tmp$ sudo vgchange -ay thrifty
      2 logical volume(s) in volume group "thrifty" now active
    kragen@thrifty:~/tmp$ sudo lvscan
      ACTIVE            '/dev/thrifty/root' [35.92 GB] inherit
      ACTIVE            '/dev/thrifty/swap_1' [1.10 GB] inherit
      ACTIVE            '/dev/Debian/root' [110.45 GB] inherit
      ACTIVE            '/dev/Debian/swap_1' [1.10 GB] inherit
    kragen@thrifty:~/tmp$ sudo mount /dev/thrifty/root /mnt

And then when you're done:

    kragen@thrifty:~$ sudo umount /mnt
    kragen@thrifty:~$ sudo vgchange -an thrifty
      0 logical volume(s) in volume group "thrifty" now active
    kragen@thrifty:~$ sudo cryptsetup luksClose externaldisk

From kragen at canonical.org Thu May 29 03:37:02 2008
From: kragen at canonical.org (Kragen Javier Sitaker)
Date: Thu May 29 03:37:03 2008
Subject: Notes on Raph Levien's "Io" Programming Language
Message-ID: <20080523223404.3B7CB183498@panacea.canonical.org>

(Available in HTML at .)

(This is distinct from and unrelated to Steve Dekorte's "Io"
programming language.)

The original paper, which I don't have a copy of, is:

> Raphael Levien, 1989, "Io: a new programming notation", SIGPLAN
> Notices 24(12) December 1989

There is a little material about Io online, including quotes from the
paper.  From :

> ## Coroutines ##

> Coroutines are an important concept of computing science, but few
> programming notations properly support them. It is surprising how easy
> they are to implement in Io.

> The idea of coroutines is to have two (or more) routines. When one of
> the routines gets to a point where it can no longer proceed (such as,
> when it needs more input), it is suspended, and another routine
> continues until it, in turn, can no longer continue (such as, when it
> has a value to output). Then, it is suspended and another routine is
> resumed.

> This is used, for example, in creating a stream. A stream carries a
> sequence of numbers, without consuming storage. Therefore, it can be
> infinite. Even in the case of a finite stream, though, it has an
> advantage over a linked list, because computation can begin
> immediately after the first number is known.

> The Io implementation of streams is analogous to linked lists. A
> stream takes two arguments. If there is no more data in the stream, it
> performs its first argument. Otherwise, it performs the second
> argument, with a data value and the continuation of the stream.

> Here we define the operator `count-stream`, and bind an infinite
> counting stream to the variable `s`.

>     count-streamO: ~ x out;
>       out x ~ null out;
>       +xl~x;
>       count-streamO x out.
>     count-stream: -..) ret;
>       ret .-9 null full;
>       count-streamO 0 full.
>     count-stream ~ s

> S has exactly the same structure as a linked list. In fact,
> `writelist s` will write `0 1 2 3 4 5...` on the screen.

There seem to be some OCR errors here.
I think `+x1~x` is supposed to be `+ x 1 ~ x`, and I suspect (from
Raphael Finkel's book, see below) that `~` is actually supposed to be
`->`.  So the definition of count-stream0 should be as follows:

    count-stream0: -> x out;
      out x -> null out;
      + x 1 -> x;
      count-stream0 x out.

In Scheme:

    (define count-stream0
      (lambda (x out)
        (out x (lambda (null out)
                 (%+ x 1 (lambda (x)
                           (count-stream0 x out)))))))

with the following definition of %+:

    (define (%+ a b cont) (cont (+ a b)))

I'm more mystified about the `count-stream` definition.  From the
text, perhaps the definition is as follows:

    count-stream: -> ret;
      ret -> null full;
      count-stream0 0 full.

Because then `s` gets `-> null full; count-stream0 0 full`, which
takes two arguments (as the text explains) and hands the second one
off to `count-stream0`, which performs it with a data value and the
continuation of the stream.

Raphael Finkel's 1995/1996 book
["Advanced Programming Language Design"](http://www.nondot.org/sabre/Mirrored/AdvProgLangDesign/),
chapter 2, section 3, contains some more examples.

    write 5;
    write 6;
    terminate

which means, in Scheme:

    (write 5 (lambda () (write 6 (lambda () (terminate)))))

Then

    write-twice: -> number;
      write number;
      write number;
      terminate.

which means

    (define write-twice
      (lambda (number)
        (write number (lambda ()
          (write number (lambda ()
            (terminate)))))))

Then

    write-twice: -> number return;
      write number;
      write number;
      return.
    write-twice 7;
    write 9;
    terminate

Which means

    (define write-twice
      (lambda (number return)
        (write number (lambda ()
          (write number (lambda ()
            (return)))))))
    (write-twice 7 (lambda ()
      (write 9 (lambda ()
        (terminate)))))

Then

    + 2 3 -> number;
    write number;
    terminate

which means

    (%+ 2 3 (lambda (number)
      (write number (lambda ()
        (terminate)))))

Then

    count: -> start end return;
      write start;
      = start end (return);
      + start 1 -> new-start;
      count new-start end return.
    count 1 10;
    terminate

which means

    (define count
      (lambda (start end return)
        (write start (lambda ()
          (%= start end return (lambda ()
            (%+ start 1 (lambda (new-start)
              (count new-start end return)))))))))

with the new definition of %=:

    (define (%= a b consequent alternate)
      (if (= a b) (consequent) (alternate)))

This is the CPS expansion of this:

    (define (count start end)
      (write start)
      (if (not (= start end)) (count (+ start 1) end)))

I don't know why there are parentheses in "= start end (return)" in
the Io example.  Perhaps it's an error introduced by Finkel.

One final example, showing the use of parentheses:

    make-pair: -> x y return;
      user (-> client; client x y);
      return.

which means

    (define make-pair
      (lambda (x y return)
        (user (lambda (client) (client x y))
              (lambda () (return)))))

Here's the definition of writelist mentioned above:

    writelist: -> list return;
      list return -> first rest;
      write first;
      writelist rest;
      return.
    emptylist: -> null notnull;
      null.
    cons: -> number list econtinuation;
      econtinuation -> null notnull;
      notnull number list.

Usefulness
----------

I wouldn't want to program in Io in the raw way described above; it's
pretty verbose and confusing.  But it's *much* clearer than Scheme for
expressing code in explicit CPS, for three simple reasons.

First, a series of nested lambdas is a flat structure rather than a
nested structure as in Scheme.

Second, the syntactic overhead of the lambda is a single punctuation
character, or possibly three, rather than ten characters including
some letters: `(lambda())`.

Third, as a result, in the usual case, the distance between the names
of arguments and the place they come from (that is, the procedure that
will eventually invoke the lambda that the arguments belong to) is
much less, and they appear as a unit rather than as things far apart.
`+ x 1 -> x;` is quite clear.

(Unfortunately, this closeness of association is misleading sometimes;
consider `out x -> null out;` in the definition of `count-stream0`,
where the `-> null out; ...` part of the routine is suspended for some
arbitrary period of time while the rest of the program runs, and may
in fact never resume.)

More Syntactic Sugar
--------------------

If you actually wanted to write programs in the language, you could
benefit from changing it to have a little bit more syntactic sugar.

### Nested expressions ###

For example, you could define

    count [+ start 1] end return

as an abbreviation for

    + start 1 -> new-start; count new-start end return

and for procedures that have only a single exit point, you could
imagine writing

    {-> number; write number; write number}

as an abbreviation for

    -> number return; write number; write number return

In cases where a "statement" contains more than a single set of square
brackets, the order of evaluation could be undefined, so that e.g.

    string-scan src [+ srcidx 1] [- len 1] c

could rewrite either to

    + srcidx 1 -> v1; - len 1 -> v2; string-scan src v1 v2 c

or to

    - len 1 -> v1; + srcidx 1 -> v2; string-scan src v2 v1 c

Or the order of evaluation could be defined; who cares?

However, it's important for our sanity that this:

    string-scan src [+ srcidx 1]; foobar [- len 1]

rewrite to this:

    + srcidx 1 -> v1; string-scan src v1; - len 1 -> v2; foobar v2

and not this:

    + srcidx 1 -> v1; - len 1 -> v2; string-scan src v1; foobar v2

Note that the above transformation is just the CPS transformation in
Scheme for normal nested application expressions.  It's just a
thousand times more readable than usual because of the Io lambda
notation.

### One-argument lambda sugar ###

It might also be helpful to be able to write one-argument lambdas more
concisely, with an automatic name for "the last result".  In Python's
REPL and in Arc, this variable is called "_".
With this, for example, you could write each of the following:

    count-stream: ; _ -> null full; count-stream0 0 full.
    + 2 3; write _; terminate
    make-pair: -> x y ret; user (; _ x y) ret.

Mostly this is duplicative with the []-nesting idea, though.  I'm not
sure which is better in the cases where both are applicable.  Consider
this example:

    def render(text):
        body = str(markdown.Markdown(text))
        soup = BeautifulSoup.BeautifulSoup(body)
        headers = soup('h1')

In Io, that looks like this:

    render: -> text;
        markdown.Markdown text -> foo;
        str foo -> body;
        BeautifulSoup.BeautifulSoup body -> soup;
        soup "h1" -> headers;
        ...

With implicit single arguments:

    render: ;
        markdown.Markdown _;
        str _;
        BeautifulSoup.BeautifulSoup _;
        _ "h1" -> headers;
        ...

With nesting:

    render: -> text;
        [BeautifulSoup.BeautifulSoup [str [markdown.Markdown text]]] "h1"
            -> headers;
        ...

The nested expressions are more compact, but in this case, I think the
implicit arguments are clearer.

### Conditionals ###

It would be nice if there were a way to conveniently rejoin streams of
control after a conditional.  For example, it would be nice to be able
to write

    if (= x y) (write "x y equal") (write "x y not equal");
    if (= x z) (write "x z equal") (write "x z not equal");
    if (= y z) (write "y z equal") (write "y z not equal");
    whatever

If the language had automatic currying, you could define this `if`
quite easily:

    if: -> cond result alt cont; cond (result cont) (alt cont).

You can use the above `if` definition without automatic currying if
you write out the arguments explicitly:

    if (-> a b; = x y a b)
       (-> c; write "x y equal" c)
       (-> c; write "x y not equal" c)

You could, however, imagine syntactic sugar for this as well.  For
example, this expression could expand into the above call to "if":

    = x y ? write "x y equal" : write "x y not equal"

As with the nested expressions, note that this is just the CPS
transformation for `if`.
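
To make the "rejoining" concrete, here is a rough sketch of the same
idea in C, where continuations have to be spelled out as a function
pointer plus the data it closes over.  Everything here (`struct cont`,
`write_k`, `eq_k`, `if_eq_k`, `compare_demo`, and the `output` buffer
standing in for the screen) is a hypothetical name for illustration,
and `if_eq_k` is specialized to an equality test with two `write`
branches rather than being the general `if` defined above.

```c
#include <string.h>

/* A continuation: code plus the data it closes over. */
struct cont {
    void (*run)(const struct cont *self);
    const char *text;          /* what write_k should write */
    const struct cont *next;   /* what to do afterwards; NULL stops */
};

static char output[128];       /* stands in for the screen */

/* Io: write text; next */
static void write_k(const struct cont *self)
{
    strcat(output, self->text);
    if (self->next != NULL)
        self->next->run(self->next);
}

/* Io: = a b consequent alternate */
static void eq_k(int a, int b, const struct cont *consequent,
                 const struct cont *alternate)
{
    const struct cont *chosen = (a == b) ? consequent : alternate;
    chosen->run(chosen);
}

/* Io: if (= a b) (write yes) (write no); cont
   Both branches are built around the same `cont`, which is how the
   two streams of control rejoin. */
static void if_eq_k(int a, int b, const char *yes, const char *no,
                    const struct cont *cont)
{
    struct cont result = { write_k, yes, cont };
    struct cont alt    = { write_k, no,  cont };
    eq_k(a, b, &result, &alt);
}

/* Run one conditional followed by a shared "done" step and return
   everything that was written. */
const char *compare_demo(int x, int y)
{
    struct cont done = { write_k, "done\n", NULL };
    output[0] = '\0';
    if_eq_k(x, y, "x y equal\n", "x y not equal\n", &done);
    return output;
}
```

Whichever branch runs, `"done\n"` is written exactly once afterwards.
The stack-allocated continuations are safe here only because each one
is invoked before the function that built it returns.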