From kragen at pobox.com Thu Mar 1 03:37:01 2007
From: kragen at pobox.com (Kragen Javier Sitaker)
Date: Thu Mar 1 03:37:03 2007
Subject: my evolution as a programmer
Message-ID: <20070209225844.94F6AE340E8@panacea.canonical.org>

I was reading an article on "Lambda the Ultimate" about Bruce Mills's book "A Theoretical Introduction to Programming," and in particular about the difference between "menu-lookup" writing of glue code, and "real programming", which the author defines as "to increase the computational capacity, to begin with a set of operations, and develop them into new operations that were not obviously implicit in the original set."

This brought me to thinking about my evolution so far as a programmer, which I can divide into the following phases.

- BASIC, Logo, arrogance, and incompetence: 1980
- Pascal and C: beginning to learn the depth of my ignorance: 1988
- Grokking the Basics of Functional Programming: 1993
- Learning How Magic and Beauty are Possible: 1993
- Getting Practical and Eclectic: 1995
- Getting a Job: 1996
- Joining a Startup: 2000
- Learning to Read: 2002
- Today: 2007

I'm not going to claim that everyone goes through the same phases, or goes through them in the same order, but no doubt any programmer will recognize some of their own history in what lies below.

In each phase, I failed to understand that it was merely a phase, and that my knowledge was still limited --- that a few years later, I would be able to do things I could only dream of at the moment.

BASIC, Logo, arrogance, and incompetence: about 8 years
-------------------------------------------------------

This stage lasted from the time I got access to my first computer (about 1980) until Arthur Sittler explained recursion to me, around 1988. I learned Logo during this time, which was interesting, but I never really made the connection with "real programming" in BASIC; I figured that since my Logo procedures didn't have line numbers, they weren't programs.
I used recursion, but only tail-recursion; I remember one of my early Logo programs looked like this:

    TO EAT:
    PRINT [OM]
    EAT2

    TO EAT2:
    PRINT [NOM]
    EAT2

I thought it was cool that I could define new commands in the language, but I never made the connection that I could use GOSUB in BASIC to do something similar, probably because the equivalent of Logo's SQUARE 200 (calling a user-defined routine to draw a 200-pixel square) would look like this:

    1180 L = 200
    1190 GOSUB 3200

and not like this:

    1180 SQUARE 200

So I never made the connection, and I never wrote any interesting BASIC programs during this time. I learned how to use the graphics functions in TRS-80 Color Computer BASIC, so I wrote programs that did all kinds of cool random graphics, but basically their control structure was just loops within loops, so they couldn't do anything interesting.

* "Is that a real program or is that something somebody wrote?"

During this period, I was very interested in the outward appearance of things. A hacker friend told me this story:

    I had written a starfield screensaver, much like many other screensavers of the time, and it was running on my Mac. A co-worker walked by and saw the screensaver, and he asked me, "Is that a real program, or is that something somebody wrote?"

This imaginary difference was something I was very concerned with --- between a "real program" that I could invoke by typing "INVADERS" on the command line, and "something somebody wrote" in BASIC, that I had to invoke indirectly through MBASIC, which ran slowly.

I had the idea that "knowing how to program" was knowing the details about all the programming interfaces I could use --- how to use the PLAY statement to make music, how to use the RND function to generate random numbers, and so on.
I didn't understand that there was a kind of programming knowledge that wasn't specific to a particular language, although for a period of time, all the computers I came in contact with ran BASIC, so I could usually figure out how to use them.

Looking back, I was pretty self-assured about knowing "a lot" about computer programming, but also pretty insecure. Part of this was that certain family members thought that I knew "a lot", and this introduced an external-approval complication into the mix. This was to the point that I failed to seek out people who knew more than I did, and failed to understand why I hadn't, say, written any interesting games.

I did spend endless time reading books about various microcomputer dialects of BASIC, and writing the same programs over and over again. I didn't understand that the PLAY statement was really implemented by other code, not so different from the BASIC code I could read myself, rather than some kind of magic --- I didn't know how to take it apart and see what was inside, I didn't know the language it was written in, and it ran a lot faster than code I could write myself.

I still meet a lot of people who think that knowing how to program is about knowing programming languages or the APIs of libraries or "frameworks", and who are much more impressed with the quality of the gradients in a program's UI than with its underlying functionality. In effect, they think programming is just writing "glue code". These beliefs amount to cognitive handicaps that prevent them from having much success as "programmers", finding "programming" interesting, or (so I hear) having successful work interactions with actual programmers. Once, I thought that I had believed these things myself because I was 6-12 years old, but apparently many adults have these cognitive limitations too.
It isn't completely wrong, since of course learning APIs and languages is a necessity for getting certain things done, and UI chrome affects your mood every second you're interacting with the program. But knowledge of APIs and language details is neither necessary nor sufficient for "real programming"; nor is writing in a "fast language". And it only takes a few hours to weeks (of constant work) to become productive with a new language or a new API, while learning to program takes decades.

The other thing that characterized this phase, for me, was a fascination with things like graphics and sound. I observe that generally these, together with continuous low-latency feedback from gradually enhancing a working program, seem to be good "hooks" to get people to start to learn to program.

This phase began to come to a close as Arthur Sittler introduced me to Pascal, and showed me a Towers of Hanoi program with no line numbers that fit in half a page and didn't need an interpreter to run --- but didn't move any discs on the screen.

Pascal and C: beginning to learn the depth of my ignorance: about 4 years
-------------------------------------------------------------------------

This phase lasted from the time Arthur introduced me to the concept of recursion, around 1988, until I read Robert Wilensky's "Lispcraft", probably around 1993 or so.

I had earlier seen the Towers of Hanoi problem as a game, written in BASIC, for an H89. You could play it interactively (trying to move the stack of discs from one peg to another) or you could make the computer solve it for you.

Arthur showed me recursive programs to solve the Towers of Hanoi problem in Pascal and some other languages. They fit on one 80x24 screen. I had a hard time accepting them as programs at first, partly due to the lack of line numbers, but he was persuasive. ("We don't need line numbers any more.
Line numbers were for when we dropped our stack of stones on the floor on our way from the stone punch to the stone reader, but we don't have that problem any more.") Also, the resulting program could run without an interpreter.

I wonder what I would have thought of this OCaml version?

    (* move n discs from a to z (using third peg named x) *)
    let rec hanoi move n a z x =
      if n = 0 then () else
        (hanoi move (n-1) a x z ; move n a z ; hanoi move (n-1) x z a)

    let printmove disc a b =
      print_endline ("Move disc " ^ (string_of_int disc) ^ " from " ^ a
                     ^ " to " ^ b) ;;

    hanoi printmove 4 "peg A" "peg C" "peg B"

This was my first encounter with recursion as anything other than a way to express iteration through tail-recursion, and also my first encounter with compilers, with programs being stored in text files and edited with a separate program, and with "real programming", in the sense above.

This began to show me that there was a big world of programming out there that I didn't know about, and that it was accessible to me. Unfortunately, I didn't have a copy of the Turbo Pascal Arthur was using on my Z-100, and I didn't know how I would go about learning Pascal anyway, so this was still a fairly theoretical realization.

A couple of years later, I took an independent-study computer programming class at the Career Enrichment Center, where David Mains and Greg Gurule began to confront the depths of my ignorance and intransigence, and challenge me to do more --- to learn "real programming". I struggled with Pascal, and with my ignorance and insecurity, and with the level of self-direction needed in the independent-study environment. (Perhaps many twelve-year-olds would have the same problem.)

Still, I made substantial progress because of that class.
I learned Pascal, DEC VMS DCL, and some C, and I wrote my first interesting programs --- a Pascal's Triangle generator and a one-dimensional cellular automaton that generated a Sierpinski triangle --- and my first useful programs, command-line utility programs written in DCL. In retrospect, I didn't need teachers to teach me things; I needed teachers to force me to get out of my comfort zone, and partners to get me unstuck. I still had far too much respect for my own skill at programming, and it still impeded my learning. Although I could solve "fizzbuzz" level problems, even toward the end of this time, I hardly ever wrote an interesting program, and I would struggle for days with problems I can solve now in an hour. During this time, I also learned x86 assembler, although the only program I wrote in it was a tiny Towers of Hanoi program. Grokking the Basics of Functional Programming --------------------------------------------- Sometime around 1993, Arthur Sittler lent me "Lispcraft", which taught me a completely different way of looking at programming problems --- I began to understand recursion at last. But I didn't have a Lisp system available to me, so I couldn't actually use the insights I got from the book. Today, I don't need one; even if I were programming in 1980s MBASIC, I could use functional programming techniques to structure the program. At the time, this was still beyond my capacity. I still had the idea that skill at programming consisted primarily of competence with a particular set of tools (languages and libraries), rather than anything intrinsic. Learning How Magic and Beauty are Possible ------------------------------------------ Also, sometime around 1993, I read Robert Sedgewick's textbook, "Algorithms in C", which I borrowed from my father, Greg Sittler. It opened a whole new world to me. 
The programs in the book are all concise crystals of beauty, showing how a few lines of code can transform a pile of structs of integers and pointers into a binary search tree, or a hash table, or the minimal spanning tree of a set of points. It used only the basics of C, and did not rely on any libraries.

This was a revelation to me; there was beauty and magic in these programs, and it wasn't because they were calling on powerful libraries hidden, Wizard-of-Oz-style, behind curtains. They were just plain C code, and not very much of it, that "developed a set of operations that were not obviously implicit in the original set," to borrow Bruce Mills's phrase.

So I began learning about algorithms and data structures and big-O notation and so on. It was a mathematical side to programming that I hadn't even suspected of existing.

"Algorithms in C" taught me by example about the miracle of Turing, which is sort of a computational analogue of Archimedes' boast, "Give me a fixed point on which to stand, and I will move the Earth." Turing's miracle is that given a computational substrate, even one without much in the way of built-in features, you can still build a program that does anything. (If you have enough space.)

I didn't write a lot of useful software during this time period, because I became obsessed with writing code that was as beautiful and spare as Sedgewick's example code in the book. Sadly, code that touches the real world of existing libraries and hardware is never that beautiful, so I didn't get a lot written.

I can think of a couple of things I pondered for days during this time, trying to get the object models right, that I hacked out in a few minutes to a few hours more recently --- a program for plotting curves of Lamé and a program for 3D modeling of toruses.

Getting Practical and Eclectic
------------------------------

Around 1995, two revelations befell me, both from Matt Hudson.
First, he wrote a Perl program called "eyefun.pl" (unless that was Brendan Conoboy, which is possible), which opened a bunch of "xeyes" windows in random places on your screen. Reading this program (and then the perl4 man page) was how I learned Perl, and thereafter I wrote a lot of little programs in Perl --- much quicker instant gratification than in C, and I didn't feel the same urge to equal Sedgewick's C in beauty --- obviously impossible in Perl4.

Second, he recommended that I read Steve McConnell's "Code Complete", which he said changed the way he thought about programming. It changed the way I thought about programming too --- it's a book two inches thick of explanations about the practical aspects of programming, from the point of view of someone who spent a lot of time programming at Microsoft. Previously, all the books I'd read and all the (few) classes I'd taken took a fairly doctrinaire attitude about what you should and shouldn't do, essentially because they were aimed at complete novices, and in many cases were written by novices as well; "Code Complete" talks about each of the various doctrinaire points of view with a fair summary of the pluses and minuses of each.

Essentially, "Code Complete" gave me a dose of practicality and just-get-down-to-it-ism that helped to counteract the worse effects of Sedgewick and Wilensky's books --- without leading me to dismiss their value.

I also talked with Arthur some more, and he pointed out that I still needed some more theoretical grounding. I asked him for recommendations, and as a result, I ended up reading maybe five thousand pages of computer-science textbooks of various kinds during late 1995 --- the Dragon Book being the one I remember best, although various books about APL were pretty interesting too. I learned a lot, but none of them really changed my thinking as much as "Code Complete."
During this later part of the time, I didn't actually have a computer at all; I would telnet from the local university computer center to my ISP's Linux box in another state to write and compile my C++ programs. Getting a Job ------------- In 1996, I moved to Silicon Valley to get a job, at a company making telecom network management software. I was still pretty incompetent; while I had written a fair number of toy scripts, I was terrible at debugging. There I met Jay Lark. Jay didn't have a lot of time (he was an executive at the company) but he explained a little bit about Lisp to me --- told me what a closure was, which Lispcraft had omitted since it was about dynamically-scoped Franz Lisp --- and lent me Abelson and Sussman's "Structure and Interpretation of Computer Programs." This book taught me more about programming than all the books I had read up to this point --- and I didn't even read the whole thing! What Sedgewick had started, Abelson and Sussman completed. SICP demonstrated not just the construction of algorithms and data structures on a substrate that lacked them; they demonstrated the construction of entire computational paradigms from almost nothing: arithmetic, object orientation, nondeterministic backtracking evaluation, and functions. The job also put me in contact with many other programmers more adept than I, and gave me the opportunity to learn a lot from them. And it forced me out of my comfort zone again. But programming with them wasn't my job there; I just ran the build system, the source-control system, and the bug tracker, all of which probably needed complete replacement rather than merely nursing along as I was doing. But I wasn't yet up to the task. After a year, I left for Ohio to start a marriage. But I began working on programming projects with other programmers, which could have given me many more opportunities to enhance my programming skills. 
This part is pretty depressing to think about; for several years, I didn't really make a lot of progress, partly because there wasn't a lot to force me outside of my comfort zone.

Toward the end of this time, I became active on comp.lang.perl.misc, which was extremely helpful at improving my Perl and my programming-in-the-small skills. It was extremely helpful having other people look at my code (which, sadly, was lacking in my day job) and comment on how it could be improved --- and doing the same for others. This was extremely valuable for deepening my knowledge of the Perl language, but as I've said before, knowledge of languages is much less important than you might suspect at first.

Sadly, I can't recommend comp.lang.perl.misc today; it was already somewhat dysfunctional at the time, and it has gotten much worse.

During this time, I read "The Practice of Programming", which is a lot like "Code Complete", but shorter and much higher in quality. I had read the same authors' "The Elements of Programming Style" back in 1995, on much the same subjects, but that book is nearly unreadable today --- it's written in PL/1 and FORTRAN IV. TPoP, aside from being written with modern programming languages, also contains insights from several decades more of the authors' experience.

Joining a Startup
-----------------

In 2000, I moved across the country to join Rohit Khare and Adam Rifkin at KnowNow, a startup they had just begun. For the next year, I was constantly forced out of my comfort zone, forced to learn new programming languages and APIs (including a bunch that didn't work very well), and given the opportunity to work with several highly competent technical people, including Rohit himself, Ben Sittler, Strata Chalup, Mike Dierken, Scott Andrew, Lane Becker, Derek Robinson, Adam Zell, Greg Burd, Matt Haughey, and Meg Hourihan. This was my first time working with people who had such a high level of competence, and it taught me a lot about programming very quickly.
Unfortunately, KnowNow made some bad business decisions (such as taking funding from Kleiner Perkins and hiring some really third-rate management), so the company kind of got stuck, and I got myself fired; very few of the highly competent technical people I listed above remained long after me. Learning to Read ---------------- In 2002, I joined a startup company named AirWave, which practiced Extreme Programming with a team of six; I stayed there for three years. The practical consequences of this included constant social interaction, high-bandwidth skill sharing, and constantly maintaining code I didn't write. Two of the XP core practices are "collective ownership" and "pair programming". The XP reasons for these are that they "produce business value," meaning they are valuable to the people who are using the software being developed, but they were also very beneficial to me as a member of the team. Collective ownership means that nobody owns any part of the code, so when you're making some modification, you're not any more likely to be modifying code you wrote than code any other team member wrote. In a team of six, that means that five sixths of the time you're programming, you're editing code somebody else wrote. This alone gives you a lot of practice reading code. "Pair programming" means that half the time you're working on a task, you're not even at the keyboard --- you're watching somebody else type. It's expected that you're paying attention, too, so that you can spot bugs, departures from the standard coding conventions, and excessive complexity in the code that's being written. Due to these two practices, I spent a heck of a lot of time reading code. This improved my skills in three unanticipated areas: reading, maintainability, and debugging. Unsurprisingly (in retrospect), reading code made me better at reading code. This is handy when I start any new programming project, especially one that someone else has been working on. 
I thought I could read code before this experience, but I got much, much quicker at it. Previously, I hadn't understood the psychological aspect of code-reading --- you have to understand not just what the code does, but what the previous programmer or programmers were thinking when they wrote it. Reading code also made me better at writing maintainable code. "Maintaining" code means changing it, usually to add features, and most of the time spent in a typical "maintenance" task is spent understanding the existing code. Consequently, the most important aspect of maintainability is readability --- avoiding unnecessarily complex constructs, clear naming, helpful commenting. Of course, "Code Complete" and "The Practice of Programming" have advice about how to do these things, but spending three years reading other people's code in order to figure out how to modify it is much more educational than reading a book or two. The biggest surprise was how much it helped in debugging. Debugging is what you do when the original programmer thought the program would do one thing, and it actually does something different. You can perform experiments of various kinds to figure out what it's actually doing, and you often must; but in the end, it comes down to finding the piece of wrong code, reading what it is actually doing, figuring out what its author wanted it to do, and understanding how to make it do that. It turns out this is the same process as reading code in general. I think I learned more about programming in three years from the AirWave team (Matt Albright, Joe Arnold, Darrell Bishop, Dan DiPasquo, Jason Luther, Blake Mills, Dave W. Smith, and Sujatha Mandava) than I had in my previous 22 years of programming. Unlike most of the previous teachers I listed, this was a relationship of partnership, where I think I taught as much as I learned. Today ----- Of course, I don't know what my next phase will look like. 
Presumably certain things I currently struggle with --- to the point of not knowing how they're even possible --- will get easy, and it will seem strange that they were ever difficult. But I don't know what those things will be.

But I'm pretty sure I'm not done learning yet, and surely there are folks out there who look at the stuff I have trouble with now and scoff. And I'm pretty sure that having good partners or mentors will be crucial in getting to that next stage; I'm wandering around South America, in part, to find them.

From kragen at pobox.com Mon Mar 5 03:37:02 2007
From: kragen at pobox.com (Kragen Javier Sitaker)
Date: Mon Mar 5 03:37:02 2007
Subject: Smalltalk performance and Moore's Law
Message-ID: <20070219073853.GA19959@canonical.org>

Previous version posted at http://lambda-the-ultimate.org/node/531#comment-23457 on 2006-12-25.

This is a partial rebuttal to Alan Kay's occasional assertion that computers aren't nearly as much faster at executing late-bound things like Smalltalk as you would expect from Moore's Law. In an interview with ACM Queue, Kay writes [7]:

    Just as an aside, to give you an interesting benchmark --- on roughly the same system, roughly optimized the same way, a benchmark from 1979 at Xerox PARC runs only 50 times faster today. Moore's law has given us somewhere between 40,000 and 60,000 times improvement in that time. So there's approximately a factor of 1,000 in efficiency that has been lost by bad CPU architectures.

But Moore's Law is about price-performance, not absolute performance; here I estimate that the actual loss of price-performance attributable to bad CPU architectures is perhaps a factor of 10 to 50, and it is plausible that better compilers can remedy this.

Guesswork
=========

"Resuna" writes [6]:

    The [VAX] 11/780 was 3.6 MHz, 32-bit words. I don't know how fast the Alto or Dorado were, but with the Dorado being the archetypical "3M" machine I assume its performance was comparable to a nominally 1-MIPS 11/780.
According to Wikipedia [0], the Dorado was an all-ECL machine. The abstract to Lampson and Pier's paper on the Dorado [1], which I haven't read, says it ran at 20MHz, had 16 hardware threads to provide zero-context task switching, and was built out of "approximately 3000 MSI [ECL] components". So it was considerably faster than a VAX. Maybe one of the older D-machines is "the archetypal 3M-machine". Apparently it could run 200k-400k Smalltalk bytecodes per second [2]. I'm guessing that the Dorado is the particular machine Kay was alluding to benchmarking, since it was introduced in 1979, and the context of the conversation is how machines designed to be efficient at high-level language execution were worthwhile. I don't think it was ever sold commercially (or even mass-produced in-house), which makes per-unit costs difficult to calculate. However, if we assume that each of the 3000 chips in the thing cost $20 each (unfortunately I have no real idea how much ECL chips cost in 1980), that's a $60 000 bill-of-materials cost. So it might have cost $100 000 per machine if it had been mass-produced, and since it was ECL, the electrical power cost of running it would likely be higher per chip as well. According to the squeak-dev thread on the subject [3], modern 600MHz uniprocessors are about 20x the speed of the Dorado when running Squeak, or 35 million bytecodes per second (which sounds more like 100x the speed of the Dorado, actually). However, the uniprocessors in question cost US$150 or so, which is inflation-equivalent to maybe US$75 in 1980 dollars. (They also include hundreds of megabytes of RAM, instead of the 8MB on the Dorado.) If you were going to spend $100 000 today (or when Kay gave this interview) on a computer to run Smalltalk on, you would probably get a Beowulf of 50 nodes, each node of which could run bytecodes at 50 to 200 times the speed of a Dorado, and that's running Squeak, which is not designed to be a particularly high-performance Smalltalk. 
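The cluster estimate above is simple enough to check with a few lines of arithmetic. This is only a sketch in Python, using the guesses already stated (about 50 nodes for the price of one hypothetical mass-produced Dorado, each node running bytecodes at 50 to 200 times Dorado speed); the variable and function names are mine, and perfect scaling across nodes is an added assumption.

```python
# Rough price/performance comparison at a fixed budget of about $100 000.
# All inputs are the guesses from the text, not measured values.
NODES_PER_BUDGET = 50            # cluster nodes bought for one Dorado's assumed price
NODE_SPEEDUP_RANGE = (50, 200)   # per-node bytecode speed relative to one Dorado


def cluster_speedup(nodes, per_node_speedup):
    """Aggregate throughput relative to one Dorado, assuming perfect scaling."""
    return nodes * per_node_speedup


low, high = (cluster_speedup(NODES_PER_BUDGET, s) for s in NODE_SPEEDUP_RANGE)
print(f"price/performance gain: {low}x to {high}x")
# prints "price/performance gain: 2500x to 10000x"
```

Changing either guess changes the result proportionally, so the factor is only as good as the assumed Dorado price and the assumed per-node speedup.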
But Moore's Law has still given us, by my rough estimates, a factor of 2500 to 10 000 in price/performance in this case. (That's not counting the difference between 8 megs of RAM and 50 000 megs of RAM, or the advantage of having 10TB of disk, etc.) A factor of 2500 is still noticeably less than the 131072x improvement that you might predict from a naive application of Moore's law, but the remaining factor of 10-50 is probably explicable in terms of Kay's explanation: the architecture is not optimized for Smalltalk bytecode execution, so you get a 10-50x slowdown when you use it as if it were a Dorado. (You might be able to get a Beowulf of 300 nodes at that price, depending on other circumstances.) How much faster are other Smalltalk implementations than Squeak? Various microbenchmarks seem to peg Strongtalk at 3x-10x faster than Squeak (Avi Bryant's [4], David Griswold/Klaus Witzel's [5]), which would nicely compensate for the remainder of Kay's complaint. References ========== [0] Wikipedia article "Xerox Alto", section "Diffusion and Evolution", as of 2006-12-25 > http://en.wikipedia.org/wiki/Xerox_Alto#Diffusion_and_evolution [1] "A Processor for a High-Performance Personal Computer", from Butler W. Lampson and Kenneth A. Pier, Xerox PARC, 1980, IEEE "CH1494-4/80/0000-0146" (whatever that means), 15 pp.; mentions, among other things, that the first machine "came up in the spring of 1979". > http://research.microsoft.com/Lampson/24-DoradoProcessor/Acrobat.pdf [2] Squeak-dev post "Dorado bytecodes per second", from Bruce ONeel (edoneel at sdf.lonestar.org), 2005-05-28T16:41:49 CEST, quoting previous post from Jecel Assumpcao Jr (jecel at merlintec.com): By running the benchmarks for the "green book" and doing a lot of rough extrapolations, my guess is that the Dorado would get between 200K and 400K bytecodes/sec. And followup from Tim Rowledge (tim at rowledge.org): That is pretty much what I remember as the claim for Dorados. 
> http://lists.squeakfoundation.org/pipermail/squeak-dev/2005-April/091211.html [3] Squeak-dev post "Dorado bytecodes per second", from Jecel Assumpcao Jr (jecel at merlintec.com), 2005-05-28T22:38:19 CEST --- he's talking about 600MHz ARMs. > http://lists.squeakfoundation.org/pipermail/squeak-dev/2005-April/091215.html [4] Blog post "Ruby and Strongtalk II", by Avi Bryant, on his blog "HREF Considered Harmful"; the microbenchmark in question did a billion accesses of a thousand-element array of small integers, took 0.7 seconds in Java, 7 seconds in Strongtalk, 70 seconds in Squeak, or 16 if you use Array instead of ByteArray. > http://smallthought.com/avi/?p=17 [5] Squeak-dev thread "Thue-Morse and performance: Squeak v.s. Strongtalk v.s. VisualWorks", started by Klaus D. Witzel 2006-12-17; several people, including David Griswold, point out flaws in Witzel's initial benchmark, and the results are interesting. > http://www.nabble.com/Thue-Morse-and-performance:-Squeak-v.s.-Strongtalk-v.s.-VisualWorks-t2834773.html [6] Comment "I still want to see Kay's benchmark...", from "Resuna", 2005-07-22 > http://lambda-the-ultimate.org/node/531#comment-7895 [7] ACM Queue article "A Conversation with Alan Kay: Big Talk with the creator of Smalltalk --- and much more.", by Stuart Feldman and Alan Kay, vol. 2, no. 9, Dec/Jan 2004-2005, is the origin of this quote. 
> http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=273&page=3

From kragen at pobox.com Thu Mar 8 03:37:01 2007
From: kragen at pobox.com (Kragen Javier Sitaker)
Date: Thu Mar 8 03:37:03 2007
Subject: various approaches to primitive functions
Message-ID: <20070304211133.32FECE34110@panacea.canonical.org>

So now I am at the point where Bicicleta needs only native functions in order to run any interesting programs --- things like fac:

    {env: fac = {fac: x = 3,
        '()' = fac.x.'<'{lt: arg1=2}.'()'.if_true{i: then=1,
            else=fac.x.'*'{mu: arg1=env.fac{f: x=fac.x.'-'{m: arg1=1}.'()'}.'()'}.'()'}.'()'}
    }.fac{f: x = 4}.'()'

or fib:

    {env: fib = {fib: x = 3,
        '()' = fib.x.'<'{lt: arg1=2}.'()'.if_true{i: then=1,
            else = env.fib{f: x=fib.x.'-'{m: arg1=1}.'()'}.'()'.'+'{p:
                arg1 = env.fib{f: x=fib.x.'-'{m: arg1=2}.'()'}.'()' }
        }.'()'
    }}.fib{f: x = 5}.'()'

These parse, but they return errors, because numbers don't yet have any methods. Fac depends on integers having '*' and '-' methods that return integers, and a '<' method that returns a boolean (i.e. something with an if_true method that returns its then or else). Fib is similar, but wants '+' instead of '*'.

So what's the best way to implement this? Here are some approaches I've seen before.

The C++/Perl/CLOS/OCaml Approach: Primitive Objects Aren't Objects
------------------------------------------------------------------

In C++, Perl, CLOS, and OCaml, primitive objects don't have any methods, and you can't inherit from them. You access them in other non-method non-inheritance ways that exist in each language. In CLOS, you can at least define methods on them, but they don't come with any to start with.

In all four cases, I think this is the result of retrofitting an object system to an existing non-object-oriented language.

This is not an approach I am considering.
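Whichever approach wins, the interface that fac needs from numbers is small. As a rough illustration of the requirement stated earlier (integers with '*', '-' and '<', and a boolean whose if_true picks its then or else), here is a sketch in Python rather than Bicicleta. The class names Int and Bool and the method names mul, sub and lt are hypothetical stand-ins, and passing the two branches as zero-argument functions is my assumption about how to keep the untaken branch from being evaluated.

```python
class Bool:
    """Hypothetical boolean: if_true selects one of two branches."""

    def __init__(self, value):
        self.value = value

    def if_true(self, then, else_):
        # Branches are zero-argument callables, so only the chosen one runs.
        return then() if self.value else else_()


class Int:
    """Hypothetical integer wrapper exposing the methods fac relies on."""

    def __init__(self, n):
        self.n = n

    def mul(self, other):    # stands in for '*'
        return Int(self.n * other.n)

    def sub(self, other):    # stands in for '-'
        return Int(self.n - other.n)

    def lt(self, other):     # stands in for '<'
        return Bool(self.n < other.n)


def fac(x):
    # Same shape as the Bicicleta fac: if x < 2 then 1 else x * fac(x - 1).
    return x.lt(Int(2)).if_true(
        then=lambda: Int(1),
        else_=lambda: x.mul(fac(x.sub(Int(1)))),
    )


print(fac(Int(4)).n)  # prints 24
```

Without the deferred branches, the recursive call would be evaluated even when x is less than 2, and the recursion would never terminate.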
The Python Approach: Native Objects And Native Functions -------------------------------------------------------- In Python, primitive objects like strings and numbers are different kinds of objects from user-defined objects. (This is less true now than before the "class-type unification".) They use different mechanisms for looking up their properties (such as methods), and you cannot add properties to them the way you can add properties to normal objects. Before the class-type unification, you couldn't inherit from them, either. Now you can, but the process has some pitfalls. This approach discourages the introduction of methods like Squeak's asWords method: 3252523 asWords 'three million, two hundred fifty-two thousand, five hundred twenty-three' Because it really wouldn't be worth it to write that in C. Another kind of primitive object is the "built-in function", meaning a function written in C. The Wheat Approach: Invisible Native Methods -------------------------------------------- In Wheat, primitive objects like strings belong to some class whose path is hardcoded into the language interpreter. If you put ordinary user-defined methods in that class, they start working on all objects of that primitive type. However, there are also non-user-defined methods in these classes, stuck there by the interpreter at startup. This means that there are two places to look to see whether, say, strings override some particular method, or inherit the version in the standard object. The Squeak Approach: Objects With Some Hidden State, And Primitive Methods -------------------------------------------------------------------------- Squeak has primitive methods, which are executed directly by the interpreter, and some objects like SmallInteger that don't have any instance variables and thus whose contents can only be accessed through primitives. If you browse ByteString>>at:, you will see a thing at the beginning that says . 
The comment above it says, "See Object documentation whatIsAPrimitive," and it turns out Object has a class method called "whatIsAPrimitive", consisting mostly of a long comment, part of which reads as follows: When the Smalltalk interpreter begins to execute a method which specifies a primitive response, it tries to perform the primitive action and to return a result. If the routine in the interpreter for this primitive is successful, it will return a value and the expressions in the method will not be evaluated. If the primitive routine is not successful, the primitive 'fails', and the Smalltalk expressions in the method are executed instead. These expressions are evaluated as though the primitive routine had not been called. There are also certain class-selector pairs for native methods that are not looked up through the normal method lookup mechanism, which the comments claim is for efficiency; but I suspect that this is also necessary to provide a base case for the recursion involved in things like method lookup. There are about 149 methods in Squeak3.8-6665full.image that specify primitive responses. The comment about "if the primitive routine is not successful" above is interesting. The most apparent reason the primitive routine would not be successful is that it may not be implemented in the Squeak virtual machine running your image at the moment, but there are other possibilities as well. For example, SmallInteger>>* has a primitive implementation that fails on overflow, allowing the Smalltalk version to fall back to arbitrary-precision, and BlockContext>>valueWithArguments: fails, for example, if the number of arguments is wrong, and the error reporting is all handled by this fallback. The Lua Approach: I Don't Know What It Is ----------------------------------------- Apparently Lua has a "clientdata" type which is basically an opaque pointer into C-land. It also has, I think, native functions which call into C when you call them, just as in Python. 
But you can define a "metatable" on a piece of "clientdata" to arrange for table lookup on the clientdata to return things, such as functions, in Lua or C, to be invoked as methods. This seems very similar to the Wheat approach. But I should finish reading the Lua manual before passing judgment. My Approach For the First Bicicleta Prototype --------------------------------------------- On one hand, at the bottom level, I am taking a Python-like approach: I have primitive objects, with a fixed set of primitive methods, from which you cannot inherit. However, integer literals and string literals do not evaluate to these objects; integers, for example, evaluate the expression "prog.sys.machine_integer" and derive from its result by overriding the method 'clientdata' to return the primitive object representing that particular integer. This level of indirection allows the program to wrap more methods around the primitive object, and even allows integer literals in certain parts of the program to evaluate to something different. From kragen at pobox.com Mon Mar 12 03:37:01 2007 From: kragen at pobox.com (Kragen Javier Sitaker) Date: Mon Mar 12 03:37:03 2007 Subject: OCaml vs. SBCL, and various other interpreters Message-ID: <20070304211133.21653E3410F@panacea.canonical.org> So I've been prototyping a Bicicleta interpreter in OCaml, and generally the experience has been great. I'd never used lex or yacc, but I know most of the theory, so given the reference-manual documentation for ocamllex and ocamlyacc, I was able to put together useful parsers pretty quickly. And OCaml is nice and fast. For most things, fast doesn't matter all that much any more; but a language interpreter, by its nature, imposes some significant slowdown on the language it's interpreting, usually from one to three orders of magnitude. 
So implementing your language interpreter in, say, Python, may not be a good idea --- the two orders of magnitude slowdown imposed by Python are two orders of magnitude that get imposed on every program running in your interpreter, on top of the 1-3 your interpreter imposes. Multiply it all out, and you can end up iterating through loops thousands of times per second, instead of hundreds of millions. OCaml really seems to be optimized for language implementations: pattern-matching, the facility it calls "variants", ocamllex and ocamlyacc, garbage collection, and static type-checking all make a lot of sense for compilers. And you can compile and deliver a binary. But I'm being tempted by SBCL! The Microbenchmark (OCaml, Python, Perl, Ruby, Elisp, Tcl) ---------------------------------------------------------- There's a microbenchmark I like to run on language implementations; here's the OCaml version: let rec fib n = if n < 2 then 1 else fib (n-1) + fib (n-2) in print_int (fib 32) ; print_newline() ;; This computes a Fibonacci number in a pretty slow way, recursing into the base case N times in order to compute the number N. On Bea's laptop, one of the new Intel MacBooks, it runs at about 3.1 million base-case recursions per second in the OCaml interpreter, or 5.2 million per second if I ocamlc it first. This may be a little unfair to OCaml, since I'm running a PowerPC version on this machine, through transparent binary translation. But the Python version runs at 0.73 million base-case recursions per second: def fib(n): if n < 2: return 1 return fib(n-1) + fib(n-2) print fib(32) That's Python 2.3, compiled for Intel. The Perl version runs at the same speed: #!/usr/bin/perl -w use strict; sub fib { $_[0] < 2 ? 
1 : fib($_[0] - 1) + fib($_[0] - 2) } print fib(32), "\n";

So does Ruby (well, actually, 0.75 million):

    #!/usr/bin/ruby
    def fib(n)
      if n < 2 then 1 else fib(n-1) + fib(n-2) end
    end
    print fib(32), "\n"

Byte-compiled elisp is a bit slower, at 0.46 million per second, timed with this:

    (let ((start (current-time)))
      (setq fib30 (fib 30))
      (time-since start))

(see below in the SBCL section for the Lisp code for fib). Tcl is even worse, at 0.083 million per second:

    proc fib {x} {
        if {$x < 2} { return 1 } { return [expr [fib [expr $x - 1]] + [fib [expr $x - 2]]] }
    }
    puts [fib 28]

So OCaml really isn't doing that badly! Especially considering it's running on an emulated CPU.

I probably should have included JavaScript here (since it is, after all, The Next Mainstream Programming Language) but I didn't. My experience is that, in both Firefox 1.5 and Safari, it's usually in about the same speed range as Python, Perl, Ruby, and Elisp.

SBCL
----

But then I tried it in (Intel) SBCL:

    (defun fib (n) (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2)))))
    (print (fib 43))

And it ran, in the interactive interpreter, at 19 million per second, four times as fast as OCaml. SBCL, it turns out, immediately compiles everything to native code, even in the interactive interpreter. Compiling the above function takes 12 milliseconds.

SBCL also lets you do (disassemble 'fib), which I did, and I saw that this version was still calling out to do generic arithmetic and comparison operations, even though SBCL's compiler does some type inference. So I whacked at it a bit until it didn't do that any more:

    (defun fib (n)
      (if (< (the fixnum n) 2) 1
          (+ (the fixnum (fib (- n 1)))
             (the fixnum (fib (- n 2))))))
    (print (fib 43))

This doesn't omit the type-checking entirely, but it omits the dynamic dispatch of the numeric operations, and consequently it ran more than twice as fast, at 47 million iterations per second, 9 times as fast as OCaml.
I haven't figured out how to turn off the run-time type-checking on each call entirely, which I think might trim it from 91 instructions to 20 or so. The main body of those instructions, with added comments, is as follows: 115B0F9A: 8B55F0 MOV EDX, [EBP-16] ; no-arg-parsing entry point 0F9D: F6C203 TEST DL, 3 ; examine low two bits (tag) 0FA0: 0F858E000000 JNE L4 ; should be 00 for fixnum 0FA6: 83FA08 CMP EDX, 8 ; 8 is fixnum 2 (<<2) 0FA9: 7C70 JL L3 ; so if it's less, we go to L3 ; This next instruction is unnecessary because none of the previous four ; instructions modify the registers! 0FAB: 8B55F0 MOV EDX, [EBP-16] ; fetch the argument again 0FAE: 83EA04 SUB EDX, 4 ; subtract fixnum 1 ; Here's the call sequence for FIB: 0FB1: 8BDC MOV EBX, ESP ; save old stack pointer 0FB3: 83EC0C SUB ESP, 12 ; allocate three words ; Load function pointer from # 0FB6: 8B05340F5B11 MOV EAX, [#x115B0F34] 0FBC: B904000000 MOV ECX, 4 ; load fixnum 1 0FC1: 896BFC MOV [EBX-4], EBP ; save old base pointer 0FC4: 8BEB MOV EBP, EBX ; stick old stack pointer in EBP 0FC6: FF5005 CALL DWORD PTR [EAX+5] ; recursive call 0FC9: 7302 JNB L0 ; skip next instruction ...? 0FCB: 8BE3 MOV ESP, EBX ; restore old stack pointer 0FCD: L0: F6C203 TEST DL, 3 0FD0: 7568 JNE L5 ; type-tag test on returned value 0FD2: 8955F4 MOV [EBP-12], EDX ; save returned value 0FD5: 8B55F0 MOV EDX, [EBP-16] ; fetch the argument again 0FD8: 83EA08 SUB EDX, 8 ; subtract fixnum 2 ; same call/return sequence again: 0FDB: 8BDC MOV EBX, ESP 0FDD: 83EC0C SUB ESP, 12 0FE0: 8B05340F5B11 MOV EAX, [#x115B0F34] ; # 0FE6: B904000000 MOV ECX, 4 0FEB: 896BFC MOV [EBX-4], EBP 0FEE: 8BEB MOV EBP, EBX 0FF0: FF5005 CALL DWORD PTR [EAX+5] 0FF3: 7302 JNB L1 0FF5: 8BE3 MOV ESP, EBX ; End of call-return sequence. Whew! 
0FF7: L1: F6C203 TEST DL, 3 ; type-tag test, again 0FFA: 7544 JNE L6 0FFC: 8B45F4 MOV EAX, [EBP-12] ; load other returned value ; This next sequence does the fixnum arithmetic in full 32-bit ; registers instead of the 30-bit fixnum form for some reason. 0FFF: C1F802 SAR EAX, 2 ; shift arithmetic right 2 bits 1002: C1FA02 SAR EDX, 2 ; on both 1005: 01D0 ADD EAX, EDX ; DO THE ADDITION! 1007: 8BD0 MOV EDX, EAX ; make useless copy of result 1009: D1E2 SHL EDX, 1 ; shift left 1 bit 100B: 7039 JO L7 ; jump if overflow 100D: D1E2 SHL EDX, 1 100F: 7035 JO L7 ; End of addition sequence. For positive fixnums, this would have ; been better and exactly equivalent (except that it doesn't ; clobber EAX): ; ADD EDX, EAX ; JC L7 ; but I think that won't work for negative fixnums. ; At this point, we have the return value in EDX, and we're ready to return. 1011: L2: 8D65F8 LEA ESP, [EBP-8] ; pop stack frame 1014: F8 CLC ; clear carry flag (for JNB?) 1015: 8B6DFC MOV EBP, [EBP-4] ; restore old frame pointer? 1018: C20400 RET 4 101B: L3: BA04000000 MOV EDX, 4 ; fixnum 1 if n<2 1020: EBEF JMP L2 ; ret val fixnum 1 is in EDX... There's another 44 instructions having to do mostly with exception and error handling: if the argument n wasn't fixnum (L4), if the first recursion returned a non-fixnum (L5), if the second recursion returned a non-fixnum (L6), if the addition overflowed to a bignum (L7); and there are some NOPs and dead code for handling invalid arg counts. Presumably OCaml programs would run at that speed too, or faster, if I compiled them to native code, but the native-code compiler needs an assembler, and installing an assembler under MacOS X apparently requires 3+ gigabytes of disk space. ocamlopt, for its part, compiles the fib routine in this microbenchmark to 24 PowerPC instructions, which, if I had an assembler, I could benchmark. (It might be more to the point to test the Intel OCaml.) 
Here's the ocamlopt PowerPC assembly; notice that it is much simpler: _camlFib__fib_57: mflr r0 addi r1, r1, -16 stw r0, 12(r1) L101: cmpwi r3, 5 bge L100 lwz r11, 12(r1) mtlr r11 li r3, 3 addi r1, r1, 16 blr L100: stw r3, 0(r1) addi r3, r3, -4 L102: bl _camlFib__fib_57 lwz r4, 0(r1) stw r3, 4(r1) addi r3, r4, -2 L103: bl _camlFib__fib_57 lwz r11, 12(r1) mtlr r11 lwz r4, 4(r1) add r7, r3, r4 addi r3, r7, -1 addi r1, r1, 16 blr And I got to thinking. Dynamic Code Generation ----------------------- If I write an implementation in OCaml, I get all of OCaml's shiny machinery for interpreting or compiling, but as far as I can tell, there is no "eval". There's the Dynlink module, which lets bytecode programs dynamically load bytecode files, so if I compiled stuff into OCaml, I could dynamically load the result (if I forgo native-code compilation); but the OCaml compiler isn't actually all that fast, so it might cause long pauses. But if I write an implementation in SBCL, I can dynamically compile code into a running SBCL process very easily, like with two lines of code: This is SBCL 0.9.15, an implementation of ANSI Common Lisp. ... * (defun return-const-fun (name const) `(defun ,name () ,const)) RETURN-CONST-FUN * (eval (return-const-fun 'foobar 40)) FOOBAR * (disassemble 'foobar) ; 11608A1A: BAA0000000 MOV EDX, 160 ; no-arg-parsing entry point ; 1F: 8D65F8 LEA ESP, [EBP-8] ; 22: F8 CLC ; 23: 8B6DFC MOV EBP, [EBP-4] ; 26: C20400 RET 4 ... a bunch of nops and boilerplate omitted ... The first line defines a function that generates code; the second one invokes it and compiles the result, into, as it turns out, five machine instructions. (160 is 40, shifted left by two bits.) And I have the impression that generating code in Common Lisp will be easier than generating code in OCaml, C, or assembler. 
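For comparison, the same generate-then-compile idea can be sketched in Python. This is only an analogy to the SBCL session above, not a claim about how Bicicleta does it: Python compiles to its own bytecode rather than to machine code, and the helper names return_const_fun_source and define_from_source are made up for this illustration.

```python
def return_const_fun_source(name, const):
    """Generate source text for a zero-argument function that returns const.

    Hypothetical helper, loosely mirroring the Lisp RETURN-CONST-FUN above.
    """
    return "def {0}():\n    return {1!r}\n".format(name, const)


def define_from_source(source, namespace):
    """Compile generated source at run time and define it in namespace."""
    code = compile(source, "<generated>", "exec")
    exec(code, namespace)


namespace = {}
define_from_source(return_const_fun_source("foobar", 40), namespace)
foobar = namespace["foobar"]
print(foobar())
```

As in the Lisp version, one function builds the code and a second step compiles it into the running process; the difference is that here the code is built as a string rather than as a data structure, which is part of why generating code in Lisp tends to be more pleasant.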
Java
----

Oh, yeah, Java is even faster:

    class Fib {
        public static int fib(int n) {
            if (n < 2) return 1;
            return fib(n-1) + fib(n-2);
        }
        public static void main(String[] argv) {
            System.out.println(new Integer(fib(40)).toString());
        }
    }

That returns 165580141 in 2.4 seconds, about 69 million base-case recursions per second, or 14 nanoseconds each.

But I'm not feeling tempted by Java. Sorry.

Squeak
------

I'm also a little tempted by Smalltalk. The Smalltalk version is

    fib: n
        n < 2 ifTrue: [^1]
              ifFalse: [^(self fib: n - 1) + (self fib: n - 2)]

In Squeak 3.8-6665full on an Intel VM, that takes 4410 ms to run fib: 35, which is 14930352; that's 0.295 µs per base-case recursion, or 3.4 million per second, slightly slower than OCaml. Which is fairly impressive, considering that it's doing the same level of dynamic dispatch as SBCL, but in a bytecode engine like OCaml's. The bytecode looks like this:

    9 <10> pushTemp: 0
    10 <77> pushConstant: 2
    11 send: <
    12 <99> jumpFalse: 15
    13 <76> pushConstant: 1
    14 <7C> returnTop
    15 <70> self
    16 <10> pushTemp: 0
    17 <76> pushConstant: 1
    18 send: -
    19 send: fib:
    20 <70> self
    21 <10> pushTemp: 0
    22 <77> pushConstant: 2
    23 send: -
    24 send: fib:
    25 send: +
    26 <7C> returnTop

Or, as I read it in pseudo-FORTH, "n 2 < if 1 return then self n 1 - recurse self n 2 - recurse + return".

The bytecode that calls #fib: seems to be implemented by a method called sendLiteralSelectorBytecode, which occupies all of the bytecodes through , with the last four bits indexing into a "literal table" associated with the method.

It's impressive to me that Squeak got this into 18 bytes of bytecode (plus 8 bytes of other stuff), considering that the original is 27 tokens. (Subtracting all the delimiters and considering #ifTrue:ifFalse: as one "token", we get down to 18 source tokens.)
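The rates quoted for Java and Squeak follow directly from the return values, because with fib(0) = fib(1) = 1 the result equals the number of base-case recursions. Here is a small Python sketch of that arithmetic; the helper names are invented for this note, and the timings are simply the ones reported above.

```python
def fib_value(n):
    """Iteratively compute fib(n) with fib(0) = fib(1) = 1, as in the benchmark."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def millions_per_second(base_cases, seconds):
    """Base-case recursions per second, in millions."""
    return base_cases / seconds / 1e6


def nanoseconds_each(base_cases, seconds):
    """Average time per base-case recursion, in nanoseconds."""
    return seconds / base_cases * 1e9


java_base_cases = fib_value(40)     # run time reported above: 2.4 s
squeak_base_cases = fib_value(35)   # run time reported above: 4410 ms

print(java_base_cases, millions_per_second(java_base_cases, 2.4),
      nanoseconds_each(java_base_cases, 2.4))
print(squeak_base_cases, millions_per_second(squeak_base_cases, 4.410),
      nanoseconds_each(squeak_base_cases, 4.410))
```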
Python's bytecode
-----------------

Compare the Python bytecode, from dis.dis(fib.fib):

      3           0 LOAD_FAST                0 (n)
                  3 LOAD_CONST               1 (2)
                  6 COMPARE_OP               0 (<)
                  9 JUMP_IF_FALSE            8 (to 20)
                 12 POP_TOP
                 13 LOAD_CONST               2 (1)
                 16 RETURN_VALUE
                 17 JUMP_FORWARD             1 (to 21)
            >>   20 POP_TOP

      4     >>   21 LOAD_GLOBAL              1 (fib)
                 24 LOAD_FAST                0 (n)
                 27 LOAD_CONST               2 (1)
                 30 BINARY_SUBTRACT
                 31 CALL_FUNCTION            1
                 34 LOAD_GLOBAL              1 (fib)
                 37 LOAD_FAST                0 (n)
                 40 LOAD_CONST               1 (2)
                 43 BINARY_SUBTRACT
                 44 CALL_FUNCTION            1
                 47 BINARY_ADD
                 48 RETURN_VALUE
                 49 LOAD_CONST               0 (None)
                 52 RETURN_VALUE

Python's bytecode, very similar in design, uses 53 bytes for the same method! Much of the difference comes from Python using three bytes for anything that takes a parameter, so it only uses a single bytecode (called LOAD_FAST) for pushing parameters and the like, where Squeak uses all the bytes from <10> to <1F> for pushing the first 16 local variables; then there's <80> for up to 32 local variables. I don't know what it does when you have more than 32 local variables.

Similarly, CALL_FUNCTION takes two bytes to tell it how many arguments the function has, I guess in case you have a function with 16383 arguments, and additionally it's separated from the LOAD_GLOBAL bytecode.

Then Python's JUMP_IF_FALSE doesn't pop the boolean value it uses, so we need a separate bytecode for that, and then there are dead bytecodes at 17, 49, and 52, each immediately after a RETURN_VALUE bytecode. (The POP_TOP at 20 also follows an unconditional jump, but it is the target of the JUMP_IF_FALSE at 9, so it is live.)

The consequence of all of this is that Python is only currently using 109 of the 256 possible bytecodes, that Python bytecode is bloated and slow, and that the Python compiler and bytecode interpreter are fairly simple.
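If you want to repeat this measurement on whatever Python you have at hand, the standard dis module and the function's code object are enough. The sketch below makes no claim about the numbers you will see: the bytecode format has changed several times since Python 2.3, so both the byte count and the opcode count will differ from the figures above.

```python
import dis


def fib(n):
    if n < 2:
        return 1
    return fib(n - 1) + fib(n - 2)


# Size in bytes of the compiled bytecode for fib on this interpreter.
bytecode_size = len(fib.__code__.co_code)

# Number of opcode names this interpreter's dis module knows about.
opcodes_defined = len(dis.opmap)

dis.dis(fib)  # human-readable listing, comparable to the one above
print(bytecode_size, opcodes_defined)
```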
Other Bytecodes
---------------

As far as I can tell, OCaml doesn't come with anything analogous to SBCL's (disassemble 'fib), Python's dis module, or Squeak's browser option "What to show: bytecodes", so all I know about the OCaml bytecode is that the whole fib.cmo bytecode file produced by ocamlc was 320 bytes, and that .cmo files seem to largely consist of 32-bit big-endian binary two's-complement integers, with a big blob of stuff at the end containing strings and other data.

I don't know anything about the implementation of the Ruby interpreter.

perl -MO=Concise,fib fib.pl (using the B::Concise module) tells me the following:

    main::fib:
    7  <1> leavesub[1 ref] K/REFC,1 ->(end)
    -     <@> lineseq KP ->7
    1        <;> nextstate(main 2 fib.pl:3) v/2 ->2
    -        <1> null K/1 ->-
    5           <|> cond_expr(other->6) K/1 ->8
    4              <2> lt sK/2 ->5
    -                 <1> ex-aelem sK/2 ->3
    -                    <1> ex-rv2av sKR/3 ->-
    2                       <#> aelemfast[*_] s ->3
    -                    <0> ex-const s ->-
    3                 <$> const[IV 2] s ->4
    6              <$> const[IV 1] s ->7
    k              <2> add[t10] sK/2 ->7
    d                 <1> entersub[t5] sKS/TARG,3 ->e
    -                    <1> ex-list sK ->d
    8                       <0> pushmark s ->9
    b                       <2> subtract[t4] sKM/2 ->c
    -                          <1> ex-aelem sK/2 ->a
    -                             <1> ex-rv2av sKR/3 ->-
    9                                <#> aelemfast[*_] s ->a
    -                             <0> ex-const s ->-
    a                          <$> const[IV 1] s ->b
    -                       <1> ex-rv2cv sK/3 ->-
    c                          <#> gv[*fib] s/EARLYCV ->d
    j                 <1> entersub[t9] sKS/TARG,3 ->k
    -                    <1> ex-list sK ->j
    e                       <0> pushmark s ->f
    h                       <2> subtract[t8] sKM/2 ->i
    -                          <1> ex-aelem sK/2 ->g
    -                             <1> ex-rv2av sKR/3 ->-
    f                                <#> aelemfast[*_] s ->g
    -                             <0> ex-const s ->-
    g                          <$> const[IV 2] s ->h
    -                       <1> ex-rv2cv sK/3 ->-
    i                          <#> gv[*fib] s/EARLYCV ->j

This is pretty confusing; it helps to know the following: operations occur before and to the left of their operands; the two "entersub" nodes are the recursive calls, and the const[IV 2] nodes are integers such as 2; Perl's treecode runs with a stack as its working memory; it builds lists on the stack PostScript-style, starting with a "pushmark"; and every procedure call involves constructing a list of parameters.
There's a '-exec' option to B::Concise that puts things in execution order, which makes it look a lot like Python, but leaves out all the ops labeled above with a "-". Java is yet another stack-based bytecode virtual machine. javap -c Fib says: public static int fib(int); Code: 0: iload_0 1: iconst_2 2: if_icmpge 7 5: iconst_1 6: ireturn 7: iload_0 8: iconst_1 9: isub 10: invokestatic #2; //Method fib:(I)I 13: iload_0 14: iconst_2 15: isub 16: invokestatic #2; //Method fib:(I)I 19: iadd 20: ireturn I suspect that the numbers in the left column are byte offsets, which would make this method 21 bytes, but I don't know enough about Java bytecode to be sure. Elisp compiles the same Lisp I fed to SBCL into the following stack-based bytecode: byte code for fib: args: (n) 0 varref n 1 constant 2 2 lss 3 goto-if-nil 1 6 constant 1 7 return 8:1 constant fib 9 varref n 10 sub1 11 call 1 12 constant fib 13 varref n 14 constant 2 15 diff 16 call 1 17 plus 18 return Or, more briefly, asciified with cat -vt: (defalias 'fib #[(n) "^H\301W\203^H^@\302\207\303^HS!\303^H\301Z!\\\207" [n 2 1 fib] 4]) That is 19 bytes of bytecode, although much of it is represented with octal backslash-escapes. The bytecode is not documented in section 16.8 of the Elisp Reference Manual, "Disassembled Byte-Code". Bytecode as an Implementation Strategy -------------------------------------- Bytecode has several claimed advantages over dynamic compilation as a strategy for implementing high-level languages: it's smaller than machine code, interactive compilation is faster because the compiler does less work, it's an architecture-neutral distribution format, and porting to new platforms is easier. There are, however, some disadvantages. * It's Smaller Than Machine Code The compiled optimized SBCL version totaled 247 bytes of machine code. The unoptimized version, with calls to GENERIC-< and the like, was 162 bytes. 
And this is on the x86, which has historically been pretty good for code density, although as you can see in the above, the 32-bit immediate constants diminish that advantage significantly; 60 of the 247 bytes are in 32-bit constants. The OCaml version is 24 instructions, 8 of which have immediate constants. I don't know very much about PowerPC assembly, but let's suppose that every instruction is 32 bits, including any immediate constants; that means the whole function weighs 96 bytes. Compare this to the 26 bytes of the Squeak version, and you can see that bytecode systems have a compelling advantage in situations where code size is critical. Even indirect-threaded code would probably have been at least 32 bytes, assuming 16-bit addresses. So at least this claimed advantage is well-founded. If you write the same program in the same way in SBCL and in Squeak, it is likely to take considerably more code space in SBCL. * Smaller Than How Much Machine Code? Speaking of code compactness, though, for small programs, the machine-code version of your program may be smaller than the bytecode interpreter. I tried to build the C version of Squeak with "VMMaker new generateEntire" to see how big it was, but apparently there are some .h files and things that live outside of the Squeak image; the best I can say is that Interpreter.st is 337404 bytes and ObjectMemory.st is 96646 bytes, for a total of 434050 bytes, about 7000-10000 lines of Smalltalk, gzipping to 98274 bytes. I am guessing that the machine-code version would be around 100KiB. Presumably you can build an almost arbitrarily-small bytecode interpreter (say, one that implements just the S and K combinators) but the question is how big the bytecode interpreter has to be to make the bytecode version of your program smaller than the corresponding machine-code version, while still running at an acceptable speed. 
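One rough way to put numbers on that question is to ask how many fib-sized functions a program needs before the bytes saved by bytecode pay for the interpreter. The sketch below uses the sizes measured above plus my guess of about 100KiB for the interpreter; treating every function as fib-sized is a loud simplifying assumption, and break_even_functions is a name invented for this note.

```python
import math

SBCL_OPTIMIZED_BYTES = 247      # measured above
OCAML_POWERPC_BYTES = 24 * 4    # 24 instructions, assumed 4 bytes each
SQUEAK_BYTES = 26               # 18 bytes of bytecode plus 8 of other stuff
INTERPRETER_BYTES = 100 * 1024  # the ~100KiB guess above, not a measurement


def size_ratio(native_bytes, bytecode_bytes):
    """How many times larger the native code is than the bytecode."""
    return native_bytes / bytecode_bytes


def break_even_functions(interpreter_bytes, native_bytes, bytecode_bytes):
    """Functions needed before bytecode plus interpreter beats native code."""
    saving_per_function = native_bytes - bytecode_bytes
    return math.ceil(interpreter_bytes / saving_per_function)


print(size_ratio(SBCL_OPTIMIZED_BYTES, SQUEAK_BYTES))
print(size_ratio(OCAML_POWERPC_BYTES, SQUEAK_BYTES))
print(break_even_functions(INTERPRETER_BYTES, SBCL_OPTIMIZED_BYTES, SQUEAK_BYTES))
print(break_even_functions(INTERPRETER_BYTES, OCAML_POWERPC_BYTES, SQUEAK_BYTES))
```

Under those assumptions the break-even point is a few hundred to a couple of thousand small functions, so a smaller interpreter moves it down proportionally.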
Chuck Moore's Novix NC-4000 and MuP21 work suggests that a useful bytecode interpreter could be very small indeed; the MuP21 is a 6000-transistor chip in which the CPU (there's also some other stuff on the chip) executes a stream of 5-bit zero-operand two-stack operations packed into 20-bit words. (I forget how LITERAL, aka load-immediate, is implemented.) The code for the machine is claimed to be quite compact. Unfortunately, the follow-on chips (the F21, the iTV i21, and the 25x) have not been successfully fabricated.

* Interactive Compilation is Faster

This advantage is probably irrelevant now, except for very small machines, say a 70MHz LPC2103 microcontroller. The SBCL compiler compiled the unoptimized Lisp version in 12ms on a 1.8GHz dual-core CPU; a cleverer implementation, like Apple ][ Integer Basic, could parse and compile as you typed, keeping pauses to a minimum.

* Architecture-Neutral Distribution Format

That is to say, you don't lose your software when you change CPU architectures.

Source code is also an architecture-neutral distribution format; I am going to ignore the political reasons for not distributing source code and focus on the technical reason, which is that bytecode is smaller.

In effect, the claimed advantage is that bytecode is a good compression algorithm for source code. If this is the case, we should perhaps compare it against generic compression algorithms like gzip. Here are some results from gzip -9:

    file                  size   gzipped  ratio  language
    bicicleta_lexer.mll   2380       965    2.5  ocamllex
    readbmp.py            5022      1910    2.6  Python
    js-calc.js           15520      5767    2.7  JavaScript
    webrick/cgi.rb        6779      2251    3.0  Ruby
    cursmail.py          11627      3938    3.0  Python
    bicicleta.ml         13426      3461    3.9  OCaml

I think these are fairly typical compression ratios for gzip on source code.
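The ratio column is just size divided by gzipped size, and the same measurement is easy to repeat on other files with Python's standard gzip module. In this sketch the table rows are the figures above; gzip_ratio and the repeated sample text are illustration only, and in-memory gzip output can differ by a few header bytes from what the gzip command writes.

```python
import gzip

# (file, size in bytes, gzipped size in bytes), copied from the table above
TABLE = [
    ("bicicleta_lexer.mll", 2380, 965),
    ("readbmp.py", 5022, 1910),
    ("js-calc.js", 15520, 5767),
    ("webrick/cgi.rb", 6779, 2251),
    ("cursmail.py", 11627, 3938),
    ("bicicleta.ml", 13426, 3461),
]

table_ratios = {name: round(size / gzipped, 1) for name, size, gzipped in TABLE}


def gzip_ratio(data):
    """Compression ratio of gzip at its highest level on a bytes object."""
    compressed = gzip.compress(data, compresslevel=9)
    return len(data) / len(compressed)


# Highly repetitive sample text; real source files compress far less well.
sample = b"def fib(n):\n    if n < 2:\n        return 1\n    return fib(n-1) + fib(n-2)\n" * 40
sample_ratio = gzip_ratio(sample)

print(table_ratios)
print(sample_ratio)
```

To measure a real file, pass gzip_ratio the result of reading it in binary mode.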
The Perl source code version of the fib function is 60 bytes; OCaml is 61, Ruby is 62, unoptimized Lisp is 63, Python is 66, Squeak is 75 if you put it all on one line, Java is 87 if you put it all on one line, and Tcl is 108. So if 2.7 is a typical compression ratio and 63 bytes is a typical size for implementations of this function, we'd expect gzip to produce a result of about 23.3 bytes; if you apply the 2.7 ratio to the Smalltalk function, you get 27.8 bytes. These compare favorably with the Squeak bytecode version at 26 bytes.

For larger functions, bytecode seems to work better as compression. Here's SkipList>>add:ifPresent:, which was the first largish method I found while looking around Squeak at random:

    add: element ifPresent: aBlock
        | node lvl s |
        node := self search: element updating: splice.
        node ifNotNil: [aBlock ifNotNil: [^ aBlock value: node]].
        lvl := self randomLevel.
        node := SkipListNode on: element level: lvl.
        level + 1 to: lvl do: [:i | splice at: i put: self].
        1 to: lvl do: [:i |
            s := splice at: i.
            node atForward: i put: (s forward: i).
            s atForward: i put: node].
        numElements := numElements + 1.
        splice atAllPut: nil.
        ^ element

That's 619 bytes of source if tabified, or 465 bytes if untabified. The bytecode representation is 118 bytes, including 32 bytes of non-bytecode (literal and constant tables, I guess?) and 86 bytes of bytecode. That's a compression ratio of 3.9 to 5.2, which is substantially better than gzip would do. (Maybe I should try 7-zip, but it's not installed on this Mac. bzip2 does worse than gzip on such small files.)

The names of the method selectors and classes referenced herein are not included in those 32 bytes --- rather, there are seven four-byte pointers: #search:updating:, #randomLevel, #on:level:, SkipListNode, #atForward:put:, #forward:, and #atAllPut:. (The other method selectors, like #+, #at:, #value: and #at:put:, have special bytecodes for them.)
These 69 bytes have to be included somewhere in a bytecode file intended for use in an architecture-neutral distribution format that supports dynamic linking, but they only have to be included once, and if you don't need to link dynamically or modify the software after delivery, they don't need to be included at all.

So it looks like bytecode is a somewhat more compact architecture-neutral distribution format than gzipped source code; the latter has some additional technical advantages, such as being easier to modify, being easier to optimize, containing comments, and not requiring a special-purpose compressor.

However, this advantage is less compelling than the code-size advantage; while Squeak needed a factor of 3 to 10 less code space than native-code compilation systems, it looks like bytecode uses more like a factor of 1 or 2 less space than gzipped source code --- in a couple of trivial examples, which might not be completely realistic.

* Porting to new platforms is easier

I'm not sure if this is true, but it seems plausible: a native-code generator is more tightly coupled to the CPU architecture than a bytecode interpreter, because if the bytecode interpreter is written in a language available on the new target platform, you may be able to port it with a simple recompile.
This may seem irrelevant today when all "personal computers" and most servers use the Intel 80386 instruction set, but there is a lot of uncertainty on the horizon:

- ARM is still very popular on mobile phones, which are computers, are personal, and outnumber "PCs" perhaps ten to one;
- AVR and PIC are very popular in the deep-embedded space;
- the smoothness of Apple's recent move to Intel processors demonstrates that the barrier to entry for new CPU architectures is at an all-time low;
- most networked appliances (from companies like Cisco, D-Link, Axis, and Broadcom) use non-386 CPUs that run Linux, and there are a huge number of them;
- the Cell chip in the Playstation 3 has two different kinds of non-386 CPUs, and like the above, game consoles outnumber "PCs", but unlike the above, have enormous computational power;
- after the coming transition to highly concurrent software takes place, the metric by which CPUs are valued may change from serial processing throughput to aggregate processing throughput, which may create an opportunity for disruptive CPU innovation.

But, for the moment, portability matters much less than it used to.

* Other advantages?

It could be argued that bytecode compiler/interpreters are simpler than dynamic native-code compilers. I don't have enough experience to know whether this is true in general, but I'm pretty sure that if you're running on top of an environment like SBCL, it isn't. Even spawning an instance of gcc and dlopening the resulting object is likely to be an acceptable speed, and then you only have to generate C code. (You have to be careful not to #include any large .h files, though.)

Parse-tree walkers, such as the Perl interpreter and the kind of Lisp interpreter found in books like SICP and EOPL, are simpler than either one, but they tend to be slower too.
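To give a feel for how little machinery a parse-tree walker needs, here is a minimal sketch in Python that evaluates the fib benchmark directly from a tree of nested tuples. The tree encoding, the node kinds, and the names PRIMITIVES, FUNCTIONS, and evaluate are all invented for this illustration; it is not how Perl or the SICP interpreters are actually structured.

```python
import operator

# Primitive operations the walker hands off to the host language.
PRIMITIVES = {"<": operator.lt, "+": operator.add, "-": operator.sub}

# fib as a parse tree: if n < 2 then 1 else fib(n-1) + fib(n-2)
FIB_BODY = (
    "if",
    ("call", "<", ("var", "n"), ("const", 2)),
    ("const", 1),
    ("call", "+",
     ("call", "fib", ("call", "-", ("var", "n"), ("const", 1))),
     ("call", "fib", ("call", "-", ("var", "n"), ("const", 2)))),
)

# User-defined functions: name -> (parameter names, body tree)
FUNCTIONS = {"fib": (("n",), FIB_BODY)}


def evaluate(node, env):
    """Evaluate a tree node by walking it recursively; env maps names to values."""
    kind = node[0]
    if kind == "const":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "if":
        _, test, then_branch, else_branch = node
        chosen = then_branch if evaluate(test, env) else else_branch
        return evaluate(chosen, env)
    if kind == "call":
        name = node[1]
        args = [evaluate(arg, env) for arg in node[2:]]
        if name in PRIMITIVES:
            return PRIMITIVES[name](*args)
        params, body = FUNCTIONS[name]
        return evaluate(body, dict(zip(params, args)))
    raise ValueError("unknown node kind: %r" % (kind,))


print(evaluate(("call", "fib", ("const", 10)), {}))
```

The whole evaluator is one function with no compile step, which is the simplicity argument; the cost is that every node visit re-dispatches on the node kind, which is one reason such interpreters tend to be slow.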
* Disadvantages Bytecode interpreters are usually pretty slow; the bytecode interpreter used by Squeak probably accounts for most of the difference between it (295ns) and unoptimized SBCL (52ns). JIT bytecode interpreters, such as recent Java virtual machines, can be less slow, but they clearly lose the code-size advantage normally enjoyed by bytecode interpreters, the postulated simplicity advantage, and probably the compilation-time advantage. * Conclusion Bytecode interpreters can offer unsurpassed code compactness, although Python's and OCaml's bytecode formats completely fail to do so. This matters a lot on microcontrollers, but I don't know whether it matters a lot or a little on modern personal computer CPUs --- this Mac has 2GiB of RAM, but only 2MiB of L2 cache, and presumably something like 16-64KiB of L1 cache. Thrashing the cache is soundly punished. They also offer a size advantage over gzipped source as a software distribution format, but this advantage is less compelling. This suggests that they ought to be popular in the demoscene, where there's apparently a lot of action in the "coolest audiovisual demo under N bytes" categories these days, for values of N such as 128, 256, 4096, and 65536; if they aren't, it casts doubt on this proposed advantage. I know of no other reason to implement an interpreter using bytecode. So I'm surprised it's such a popular thing to do! I think the reason is probably that code space and compilation time used to be quite precious resources (not to mention portability), and programmers just haven't adjusted to the new realities. From kragen at pobox.com Thu Mar 15 03:37:01 2007 From: kragen at pobox.com (Kragen Javier Sitaker) Date: Thu Mar 15 03:37:03 2007 Subject: first Bicicleta interpreter speed measurements: very slow! 
Message-ID: <20070314193211.B7545E34112@panacea.canonical.org>

Summary
-------

My OCaml-hosted Bicicleta interpreter is now to the point that it can run programs with integers and booleans, but it's about 8000 times slower than the OCaml interpreter at doing simple arithmetic, and it leaks huge quantities of space, apparently about a byte per method call in my microbenchmark. I need to come up with a solution to the excessive space usage if the system is going to be usable, even for experimentation, and I may need to make it faster.

The interpreter is under 400 lines of OCaml, not counting unit tests and some 250 lines of code in the Bicicleta language itself.

Without Attribute Caching
-------------------------

So I thought I'd try the dumb fib microbenchmark to get a vague idea of how slow the current, tree-reduction-based OCaml implementation of the Bicicleta language is. Here's the OCaml version:

    let rec fib n = if n < 2 then 1 else fib (n-1) + fib (n-2) in
    print_int (fib 32) ; print_newline() ;;

Byte-compiled, in PowerPC emulation, that runs at about 5 million base-case recursions per second. (Since it adds together all the values produced by the base-case recursions, and all those values are 1, the return value is the number of base-case recursions. The total number of fib calls is one less than twice the number of base-case recursions.)

Here's a version in Bicicleta's language:

    {env: fib = {fib: arg1 = 3
                 '()' = (fib.arg1 < 2).if_true(then = 1,
                     else = env.fib(fib.arg1-1) + env.fib(fib.arg1-2))}
    }.fib(16)

That runs at about 120-140 base-case recursions per second, running on top of the OCaml implementation mentioned earlier, and it seems to take time roughly linear in the number of base-case recursions. That's slower by about a factor of 44000 than its host interpreter, which is a lot. It's still fast enough that you could use it for small experiments.
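The parenthetical claim about call counts is easy to check by instrumenting the benchmark. The Python sketch below counts calls and base cases for fib(16), the argument used in the Bicicleta run, and then divides the host rate quoted above by the observed 120-140 range; counted_fib is a name invented for this note.

```python
def counted_fib(n, counts):
    """The benchmark fib, tallying total calls and base-case calls in counts."""
    counts["calls"] += 1
    if n < 2:
        counts["base_cases"] += 1
        return 1
    return counted_fib(n - 1, counts) + counted_fib(n - 2, counts)


counts = {"calls": 0, "base_cases": 0}
result = counted_fib(16, counts)

# The return value equals the number of base cases, and the total number
# of calls is one less than twice that.
print(result, counts["base_cases"], counts["calls"])

# Rough slowdown range, using the rates quoted in the text.
HOST_RATE = 5e6              # base-case recursions per second, byte-compiled OCaml
BICICLETA_RATES = (120, 140) # observed range for the Bicicleta interpreter
slowdowns = [HOST_RATE / rate for rate in BICICLETA_RATES]
print(slowdowns)
```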
There's some headroom (about a factor of 10 or 20) above the OCaml implementation that I can probably take advantage of by using ocamlopt and not doing CPU emulation. The basic plan was as follows: - spend as little effort as possible to write a very minimal implementation in some language with an implementation that already runs, providing as few primitives as possible. The current implementation is 372 non-blank non-comment non-unit-test lines of OCaml, ocamllex, and ocamlyacc, and it can now run "hello world" style programs like the one above. It's still missing some features, especially introspective and imperative ones. This has taken me two weeks, so I'm glad I didn't start out with a bigger piece of the project. - write an IDE for the language in itself. This doesn't have to be anything fancy, but it needs to be enough that I start to experience the benefits supposedly accruing to the spreadsheet-style UI enabled by the language semantics. - write a compiler in Bicicleta for itself, so that we can generate C or JavaScript from some version of the Bicicleta metacircular interpreter. - compile a native-code version of the IDE and compiler using the metacompiler. Ideally this version will support dynamic compilation to machine code without restarts, reducing the performance problems inherent in the language's difficult semantics to a tolerable level, allowing focus on the next task: - polishing the IDE and tailoring it to particular application areas. But it looks like maybe things are slow enough that I'd better put in a little more work on the first step before proceeding any further. 
I tossed in a few lines of code to count method calls by name, and here's what I found, on exactly the code above, which returns 1597: sys: 326595 arg1: 313819 (): 252596 userdata: 140008 object: 110994 normal_commutative_number: 106203 native_integer: 106203 intrinsics: 106203 new: 66013 result: 36997 other: 36997 op: 36997 coerce: 36997 binop: 36997 as_integer: 36997 arg2: 36997 add: 36997 +: 36997 !!: 36997 integer_add: 33804 subtract: 32208 negated: 32208 integer_negated: 32208 -: 32208 true: 6386 if_true: 4789 less_than: 3193 integer_less_than: 3193 fib: 3193 bool: 3193 <: 3193 false: 3192 then: 1597 else: 1596 native_string: 2 show: 1 integer_show: 1 total: 2094769 So we're only calling fib 3193 times, which seems about right. But we end up calling prog.sys 326595 times and various arg1 things 313819 times, and so on, and doing about 2.1 million calls in all. It's a little strange that every call to 'fib' involves almost 12 calls to '+'. With Attribute Caching ---------------------- So this suggests that we can get a substantial speedup already by caching object attributes, converting tree reduction into graph reduction: maybe a factor of 100? Well, I added the code to cache object attributes, and here were the results: arg1: 52675 (): 49484 userdata: 23944 sys: 19162 intrinsics: 19155 native_integer: 15963 result: 7981 other: 7981 op: 7981 new: 7981 coerce: 7981 as_integer: 7981 arg2: 7981 add: 7981 !!: 7981 +: 6385 binop: 4789 integer_add: 4788 true: 3196 if_true: 3194 less_than: 3193 integer_less_than: 3193 fib: 3193 bool: 3193 <: 3193 subtract: 3192 negated: 3192 integer_negated: 3192 false: 3192 -: 3192 then: 1597 else: 1596 object: 3 native_string: 2 show: 1 normal_commutative_number: 1 integer_show: 1 total: 309690 That's about a factor of 6.8 improvement on this microbenchmark, which is noticeable but still not that great. 
This adds up to:

- 16.5 accesses to arg1;
- 15.5 calls to '()';
- 2.5 calls to '!!', add, as_integer, coerce, new, op, other, and result;
- 2 calls to '+';
- and 1 call to '-'

per call to fib.

And it runs considerably faster, at 650 base-case recursions per second, about 5 times faster. A lot less than the factor of 100 I hoped for! But not bad for adding the 12 lines of code to cache the results. Some form of this optimization is absolutely necessary for any larger program.

Memory Usage With Attribute Caching
-----------------------------------

However, it also took 100 MB of RAM to calculate fib(20). Another six lines of code to allocate caches only for objects that use them cuts it down to 70MB (and speeds it up very slightly), but 70MB is still way too much --- that's for about 2.1 million calls.

It runs in more or less constant space (hovering around 3.3MB for four minutes) if I clear the cache immediately after putting each item into it, which of course makes it run very slowly again, but it makes it clear that it's the caches that are the problem and not, say, a lack of tail-recursion.

The size of the problem suggests that it's hanging on to some amount of stuff from every one of those 2.1 million calls --- not just the 20 that might be on the call stack at any one time. This surprised me, because I would expect expressions like the env.fib{fib.arg1-2} in env.fib{fib.arg1-2}.'()' to become garbage fairly quickly --- it's an anonymous class produced by inheriting from env.fib, and the only thing we hold on to from it is its '()'.

Since this is just a cache-management problem, coming up with a version that doesn't break stuff is easy, but performance characteristics will be potentially very tricky.

Slowness Is Partly Due To Indirection
-------------------------------------

It's still about 8000 times slower than OCaml.
Looking at it a slightly different way, it's executing about 92000 method calls per second in order to perform the 650 base-case recursions or 1300 calls to "fib" per second, while the OCaml and other versions need only execute one function-call and return per call to "fib". Considered that way, it's only about 110 times slower than its host OCaml, which is pretty much what you'd expect for an interpreter. It's just that building each call to "fib" out of 70 lower-level method calls ends up costing an additional factor of 70 in performance. So if, for example, you took out some of the library magic to support multimode arithmetic and double dispatch and the very small primitive set, it could get a lot faster. Compare --- fib(20) on the normal version: arg1: 361192 (): 339303 userdata: 164179 sys: 131350 intrinsics: 131343 native_integer: 109453 result: 54726 other: 54726 op: 54726 new: 54726 coerce: 54726 as_integer: 54726 arg2: 54726 add: 54726 !!: 54726 +: 43781 binop: 32836 integer_add: 32835 true: 21894 if_true: 21892 less_than: 21891 integer_less_than: 21891 fib: 21891 bool: 21891 <: 21891 subtract: 21890 negated: 21890 integer_negated: 21890 false: 21890 -: 21890 then: 10946 else: 10945 object: 3 native_string: 2 show: 1 normal_commutative_number: 1 integer_show: 1 total: 2123396 real 0m23.360s user 0m23.084s sys 0m0.252s And a version where '+', '-', and '<' just go straight to a primitive without any further layers of indirection, and we have a '-' primitive: '()' = arg1: 186070 (): 131345 sys: 109460 userdata: 109454 native_integer: 87563 arg2: 54726 true: 32840 new: 32836 false: 32835 if_true: 21892 integer_less_than: 21891 fib: 21891 bool: 21891 <: 21891 integer_subtract: 21890 -: 21890 then: 10946 integer_add: 10945 else: 10945 +: 10945 object: 3 native_string: 2 show: 1 normal_commutative_number: 1 intrinsics: 1 integer_show: 1 total: 974155 real 0m8.908s user 0m8.764s sys 0m0.131s The normal version has 2.18 times as many calls and takes 2.62 times as 
much time; that makes this version only about 3000 times as slow as OCaml.

So this suggests that more symbolic work should suffer somewhat less of a speed penalty than the fib microbenchmark suggests.

Also, although I feel a little silly suggesting it at this point given the gross inefficiency of the interpreter in general, that's the kind of cost that polymorphic inline caches and the like can help a lot with.

Could It Be Fast Enough Already?
--------------------------------

I'm kind of thinking that 1300 calls per second, plus a hypothetical 10x speedup from compiling OCaml to native code, makes it about the same speed as the Bourne shell, far slower than Tcl. I wouldn't really want to develop a compiler written in the Bourne shell, but that has more to do with the semantics of the language than with its performance.

Here's a dumb fib in bash, coded to avoid forking:

    #!/bin/bash
    # The result comes back in the global $rv instead of via $(...),
    # so no subshell is forked.
    fib () {
        if [[ $1 -lt 2 ]] ; then rv=1 ; else
            fib $(( $1 - 1 )); set $1 $rv    # stash fib(n-1) in $2
            fib $(( $1 - 2 )); rv=$(( $rv + $2 ))
        fi
    }
    fib 25
    echo $rv

The bash version runs at about 7800 base cases per second; we can expect the Bicicleta one to run at 6500 if the speedup is exactly 10x.

Vector Operations?
------------------

If I implement some APL-style vector operations as primitives, which would probably be another 50-100 lines of OCaml, then Bicicleta programs will get essentially native speed on vectorizable operations. That would be great for interactive image processing and 3-D graphics, but I'm not sure it would help with writing a Bicicleta metacompiler.

It also has the drawback that it requires that many more primitives from future implementations of the language. Right now, I'm not even using an integer subtraction primitive --- I'm using the negation and addition primitives instead. In theory, I should be able to get all of integer comparison and arithmetic in exchange for just add, negated, multiply, divmod, power, and less-than.
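To make that last point concrete, here is a minimal Python sketch (not Bicicleta, and not taken from the interpreter) of how subtraction and the remaining comparisons could be layered on a small primitive set. The primitive names add, negated, and less_than follow the list above; the derived names subtract, greater_than, less_or_equal, and equal are hypothetical, chosen only for this illustration.

```python
# Assumed primitives, standing in for the interpreter's integer intrinsics.
def add(a, b):
    return a + b

def negated(a):
    return -a

def less_than(a, b):
    return a < b

# Derived operations: each one is built only from the primitives above.
def subtract(a, b):
    return add(a, negated(b))

def greater_than(a, b):
    return less_than(b, a)

def less_or_equal(a, b):
    return not less_than(b, a)

def equal(a, b):
    return not less_than(a, b) and not less_than(b, a)

print(subtract(7, 3))        # 4
print(greater_than(7, 3))    # True
print(equal(5, 5))           # True
```

Each derived operation costs one or two extra calls per use, which is the same kind of indirection the call counts above are measuring.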
From kragen at pobox.com Mon Mar 19 03:37:01 2007 From: kragen at pobox.com (Kragen Javier Sitaker) Date: Mon Mar 19 03:37:03 2007 Subject: Comparison between Aardappel and Bicicleta Message-ID: <20070304211132.EC1D3E3410C@panacea.canonical.org> Aardappel is a very inspiring language that has some important things in common with Bicicleta: - Both languages use a simple tree structure. - There is no fundamental distinction between functions and data structures. - Function arguments have example values. - Consequently, the program is normally never unrunnable while editing, so you can evaluate any subexpression of your program in-place. - Editing normally takes place by dragging bits of the program around with your mouse to make references to or copies of them. But there are some differences: - I'm not entirely satisfied that I know what Bicicleta's imperative side is going to look like. Aardappel's approach to state (and concurrency!) preceded the rest of the language design, and is the most interesting part of the language. - Subexpressions of an Aardappel program are not evaluated automatically while it is being edited; you have to ask for an expression to be evaluated explicitly. - Aardappel uses positional parameters for many things; Bicicleta generally uses named parameters instead. - You can't walk through the evaluation of an Aardappel expression step by step. - Aardappel entirely rejects names for variables, while Bicicleta uses them extensively. - Aardappel uses unification in place of conditionals and consequently can use eager evaluation; Bicicleta uses lazy evaluation and polymorphism in place of conditionals, and does not support unification or even pattern-matching. It's possible that this is a mistake on Bicicleta's part. - In Aardappel, while there isn't a fundamental distinction between functions and data structures, it generally isn't possible for a single object to be both. In Bicicleta, it is. 
This is less of a difference than it sounds like --- if you have an object X that's a data structure and you want to use it as a function, in Bicicleta you call a method on it, such as X() or X.width or something, and in Aardappel you would say value X or width X. The difference is that Bicicleta lets you do this to any object, even one originally intended for use as a function.

- Aardappel doesn't have much in the way of separate namespaces. Bicicleta has so many namespaces you wouldn't believe it; a two-line dumb recursive factorial program creates five namespaces.

- Aardappel has an existing implementation that produces good code.

From kragen at pobox.com Thu Mar 22 03:37:01 2007
From: kragen at pobox.com (Kragen Javier Sitaker)
Date: Thu Mar 22 03:37:03 2007
Subject: object-oriented equational rewrite rules
Message-ID: <20070304211132.D8302E3410A@panacea.canonical.org>

Contents
--------

Introduction
Extensions
Scoping
    * Syntactic sugar for list comprehensions, using "higher-order" patterns
    * Explicit declarations
    * Implicit pattern augmentation
Global Scope
SIMD
Prefix syntax: An Enlightening Syntactic Digression
A More Extended Example: A Recipes File
Efficiency
Connection To SnikiSniki

Introduction
------------

I was reading Wouter van Oortmerssen's brilliant thesis again, and I had an idea. His Aardappel language is a linear eager (innermost-first) tree-rewriting system, and he points out that although it doesn't support any kind of inheritance, you can substitute new kinds of objects for old ones, simply by adding new rewrite rules for the functions that need to be successfully applicable to the new objects.

So I got to thinking. What if we take that approach to the extreme? Suppose you could define properties on records in terms of rewrite rules on those records' properties?

    { x: X, y: Y }.r = (X*X + Y*Y).sqrt

That would define an "r" method, or property, on any record that had X and Y properties.
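As a rough illustration in Python rather than in the notation above, the rule can be read as a function that applies to any record carrying both an x and a y, whatever else the record contains. Representing records as dicts and returning None for a failed match are assumptions of this sketch, not part of the proposal.

```python
import math

def r(record):
    # Rule: { x: X, y: Y }.r = (X*X + Y*Y).sqrt
    # The pattern only requires x and y to be present; any extra
    # properties on the record are ignored.
    if "x" in record and "y" in record:
        X, Y = record["x"], record["y"]
        return math.sqrt(X * X + Y * Y)
    return None  # pattern did not match: this record has no "r"

point = {"x": 3, "y": 4, "color": "red"}
print(r(point))       # 5.0
print(r({"x": 3}))    # None
```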
But you could nest the pattern more deeply, use constants, and just use the property names to name the property values in cases where it wasn't ambiguous; here's an example taken from my toy APL in OCaml (recently posted to kragen-hacks, I think) that shows how to render APL expressions: { unop, value }.show = Unop + " " + Value.show { atom_value }.show = Atom_value.show_atom { parenthesized_value }.show = "(" + Parenthesized_value + ")" { left_op: { atom_value } as Left_op, bin_op, right_op }.show = (Left_op, Bin_op, Right_op).show_binop { left_op: { parenthesized_value } as Left_op, bin_op, right_op}.show = (Left_op, Bin_op, Right_op).show_binop (L, O, R).show_binop = L + " " + O + " " + R { left_op, bin_op, right_op}.show = { parenthesized_value: Left_op }.show + " " + Bin_op + Right_op.show [].show_atom = "()" [N, M, ...Lst].show_atom = N.show_num + " " + [M, ...Lst].show_atom [N].show_atom = N.show_num Here I'm using [N, M, ...Lst] to mean a list whose first two items are N and M and whose remainder is Lst, a la Lisp (N . (M . Lst)), Prolog [N|[M|Lst]], or OCaml N :: M :: Lst. Without any syntactic sugar, you could still write it as { car: N, cdr: { car: M, cdr: Lst } }. Here's a fragment from my prototype Bicicleta implementation showing the use of a constant ("nodefs") in a pattern: { method_name, method_body, rest: nodefs }.definition = (Method_name, Method_body).show_method { method_name, method_body, rest }.definition = (Method_name, Method_body).show_method + ", " + Rest.show_methods (The real code is in OCaml; that's just a translation.) Presumably you'd want to namespace the property names by some other mechanism to reduce spurious pattern matches. This pattern-matching syntax is less concise than the Caml or Aardappel equivalent, but it seems that it should make it easier to define new kinds of data that implement multiple protocols. It is probably harder to implement efficiently, though. 
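The three show_atom rules above can be sketched in Python as follows. Python lists stand in for the car/cdr records, and show_num is assumed to be plain decimal formatting; the order of the tests plays the role of choosing the most specific pattern that matches.

```python
def show_num(n):
    # Assumed stand-in for N.show_num.
    return str(n)

def show_atom(lst):
    if len(lst) == 0:      # [].show_atom = "()"
        return "()"
    if len(lst) == 1:      # [N].show_atom = N.show_num
        return show_num(lst[0])
    # [N, M, ...Lst].show_atom = N.show_num + " " + [M, ...Lst].show_atom
    return show_num(lst[0]) + " " + show_atom(lst[1:])

print(show_atom([1, 2, 3]))   # 1 2 3
print(show_atom([]))          # ()
```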
However, you could imagine that many pattern matches will be for the simplest cases, where we're only interested in the properties of a single record and have no particular requirements for them. In this case, you could even omit the entire record! So, for some of the previous examples, you could write: r = (X*X + Y*Y).sqrt show = Unop + " " + Value.show show = Atom_value.show_atom show = "(" + Parenthesized_value + ")" { left_op: { atom_value } as Left_op, bin_op, right_op }.show = (Left_op, Bin_op, Right_op).show_binop { left_op: { parenthesized_value } as Left_op, bin_op, right_op}.show = (Left_op, Bin_op, Right_op).show_binop (L, O, R).show_binop = L + " " + O + " " + R show = { parenthesized_value: Left_op }.show + " " + Bin_op + Right_op.show Perhaps you could shorten it further by inferring extra required property names even in cases where the pattern record is not entirely omitted: { left_op: { atom_value } as Left_op }.show = (Left_op, Bin_op, Right_op).show_binop { left_op: { parenthesized_value } as Left_op }.show = (Left_op, Bin_op, Right_op).show_binop By itself, the ability to define these rewrite rules, create records with properties whose names are known at compile time, and read properties whose names are known at compile time, suffices for a Turing-complete higher-order functional programming language; the rest of the above (with infix operators, tuples, lists, and so on) can be viewed as syntactic sugar. (X + Y might rewrite as {left: X, right: Y}.sum, for example.) (The "as Left_op" requires some explanation; it means that the pattern on the left side of the "as" should be bound to the name on the right side, in the above cases the objects that matched {atom_value} and {parenthesized_value}, which might have arbitrary other data in them. I'm not sure if this feature is more than just syntactic sugar, but I suspect so.) 
In a way, it has a very APLish feel to it --- r = (X*X + Y*Y).sqrt, thought of as a statement, simultaneously creates a new "column" called "r" for all the values known as "x" and "y" (when they are on the same records). In Bicicleta, you can occasionally reach that same level of brevity, but only inside the context of the object. Extensions ---------- There are other obvious extensions --- unification, properties with property names not known at compile-time (in patterns, property accesses, and even property definitions) and inheritance, but these are not necessary --- not even inheritance! Defining a new property will automatically create that property on any record having the necessary attributes, which gives you a sort of multiple inheritance already. I was thinking that this might be an interesting basis for an end-user-oriented database. Doing the equivalent of SQL's SELECT ... WHERE would require a way to name attributes, preferably anonymous ones, so that you could say e.g. "(R < 4).where" or (with an infix operation "where") "Mypoints where (R < 4)", with "(R < 4)" evaluating to an anonymous function or anonymous property that is True for records whose R is less than 4, and False otherwise. Scoping ------- But the database idea in the "extensions" section introduces multiple name scopes in the same expression: R presumably comes from each point, and Mypoints presumably comes from some other context --- it clearly can't come from each point. And we probably don't want to restrict where-expressions to contain only constants and properties of the queried items --- and if we're using the set of properties mentioned in each subexpression to determine when they are even applicable, we have an untenable situation. Here are the options I have come up with. 
* Syntactic sugar for list comprehensions, using "higher-order" patterns

The language as described earlier is already powerful enough to handle this sort of query without the ability to do things like apply anonymous functions. Here the first four lines define a "filter" function, the fifth line defines an rlt4 "function", and the sixth line is the translation of "Mypoints where (R < 4)".

    ([], _).filter = []
    ([A, ...As], F).filter = ((F, A).apply, A, As, F).filternext
    (true, A, As, F).filternext = [A, ...(As, F).filter]
    (false, A, As, F).filternext = (As, F).filter
    (rlt4, A).apply = A.r < 4
    (Mypoints, rlt4).filter

(A is passed along to filternext explicitly, since its "true" rule needs it on the right-hand side.)

You could add syntactic sugar so that you could write

    (Mypoints, (A -> A.r < 4)).filter

or

    [A in Mypoints if A.r < 4]

that translated into the above. A more general list-comprehension would also support [A for A in Mypoints if A.r < 4], and perhaps also multiple sequences to loop over.

Still, it would probably be better to be able to write that with get-property-by-name, as follows:

    ([], _).filter = []
    ([A, ...As], F).filter = (A.F, A, As, F).filternext
    (true, A, As, F).filternext = [A, ...(As, F).filter]
    (false, A, As, F).filternext = (As, F).filter
    A.rlt4 = A.r < 4
    (Mypoints, rlt4).filter

Because then you could say (Mypoints, is_visible).filter. Maybe that doesn't matter if you can say [A in Mypoints if A.is_visible].

In the language as I've discussed it so far, (rlt4, A).apply (or A.rlt4) is still defined for records that don't have an r --- the left-hand-side pattern doesn't mention r, and the right-hand side doesn't mention R (which would cause an inferred r on the left-hand-side). But presumably "A.r < 4" would evaluate as "error < 4" or "nil < 4" or something, and that should presumably not get ignored by default. But maybe you could write a different kind of filter that did ignore it.
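Here is a small Python sketch of the two behaviours just discussed, with records as dicts. The names filter_strict and filter_lenient are made up for this illustration, and using KeyError to stand in for the "error < 4" case is an assumption about how a missing property would surface.

```python
def rlt4(a):
    # A.rlt4 = A.r < 4; raises KeyError when the record has no "r".
    return a["r"] < 4

def filter_strict(records, pred):
    # Default behaviour: an error inside the predicate is not ignored.
    return [a for a in records if pred(a)]

def filter_lenient(records, pred):
    # "A different kind of filter": skip records the predicate
    # cannot be applied to.
    kept = []
    for a in records:
        try:
            if pred(a):
                kept.append(a)
        except KeyError:
            continue
    return kept

mypoints = [{"r": 1}, {"r": 7}, {"name": "no radius"}]
print(filter_lenient(mypoints, rlt4))   # [{'r': 1}]
```

Calling filter_strict(mypoints, rlt4) on the same data raises KeyError at the third record, which corresponds to the error not being ignored by default.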
Maybe you could also write ({r} -> R < 4), in which case (rlt4, {}).apply would fail to match (rlt4, {r}).apply, and your evaluation would get stuck in the middle when ((rlt4, {}).apply, As, F).filternext failed to reach normal form. Probably that should also cause it to return an error. Unlike Aardappel, this language family does distinguish between data structures and functions at a basic level, which would give it an excuse to return an error. The anonymous-property syntactic sugar would probably benefit from multiple pattern-match cases, so you could write ({r} -> R < 4 | {nonpolar} -> false) or some such. Anyway, with this kind of filtering, you could add attributes to master records from joins --- using Avi Bryant's example from DabbleDB, where you have a table of talks with presenter names, and a table of presenters with presenter names and biographies: presenter_rec = [P in Presenters if P.name == Presenter].first presenter_bio = Presenter_rec.bio * Explicit declarations Here's a second way out. In Python and Tcl, you can only assign to variables in the innermost syntactic scope. You could take an analogous approach and only implicitly introduce new pattern elements when a previously undeclared variable was mentioned, and then introduce it automatically in the innermost scope, in which case you could write "Mypoints where (R < 4)" but you wouldn't get the right effect from "Presenters where (Name == Presenter)" --- that would return records that had two properties, name and presenter, with the same value. 
But you could write {Presenter}.presenter_rec = (Presenters where (Name == Presenter)).first presenter_bio = Presenter_rec.bio * Implicit pattern augmentation You could, also, just declare that property accesses implicitly augment the pattern match, and then you could write Talk.presenter_rec = (Presenters where (Name == Talk.presenter)).first It would be somewhat undesirable to ignore all properties whose computation asked for an undefined property, which I think would be a result of this approach. Global scope ------------ I didn't mention where the variable Presenters comes from in the above. I assumed it came from some global scope, not from every record. Maybe it should have some Ruby-style sigil to indicate that. SIMD ---- Having properties that might have values of vectors of other records could simplify the process, by making query screens, tables of detail records, and the like, just ordinary records. The ().first thing in the example above is ugly; it would be nicer to say this instead: presenter_rec = [P in Presenters if P.name == Presenter] presenter_bio = Presenter_rec.bio For that to work, though, the .bio property access has to automatically map over whatever "where" returns --- which probably means that all properties should be potentially multivalued, as in Prolog, Icon, Pick, Lotus Agenda, or perhaps APL. This suggests the need for either a small and well-defined set of properties that apply to the entire collection rather than each item, or some special syntax for applying any property to the entire collection rather than each item. I'm going to ignore that problem for now and just let some methods apply to the whole thing, while others apply only to part of it, the same mess most of those languages have. 
This could also provide a neat solution to clashing rewrite rules, although only time will tell if the neat solution is also a useful solution --- it might be preferable to be able to do the equivalent of overriding a method in a subclass so that the base class method is ignored, by providing a more specific rewrite rule.

Prefix syntax: An Enlightening Syntactic Digression
---------------------------------------------------

Suppose I rewrite some of my previous examples with prefix syntax.

    r { x: X, y: Y } = sqrt(X*X + Y*Y)
    r = sqrt(X*X + Y*Y)
    show { left_op: { atom_value } } = show_binop (Left_op, Bin_op, Right_op)
    show_binop (L, O, R) = L + " " + O + " " + R
    show = show { parenthesized_value: Left_op } + " " + Bin_op + show Right_op
    show_atom [] = "()"
    show_atom [N, M, ...Lst] = show_num N + " " + show_atom [M, ...Lst]
    show_atom [N] = show_num N
    definition { rest: nodefs } = show_method (Method_name, Method_body)
    definition = show_method (Method_name, Method_body) + ", " + show_methods Rest
    filter (_, []) = []
    filter (F, [A, ...As]) = filternext (apply (F, A), A, F, As)
    filternext (true, A, F, As) = [A, ...filter (F, As)]
    filternext (false, A, F, As) = filter (F, As)
    apply (rlt4, A) = A.r < 4
    filter (rlt4, Mypoints)
    presenter_rec = [P in Presenters if name P == Presenter].first
    presenter_bio = Presenter_rec.bio

This doesn't change the semantics at all, but it ought to look familiar to users of Haskell or OCaml; now the "methods" look like functions. There are only a few important differences:

1. The patterns are defined, not on the structural representation of the objects, but on the set of functions applicable to those objects. Remember, the list, tuple, and infix notation is just syntactic sugar --- in the list case, [N, M, ...Lst] means { car: N, cdr: { car: M, cdr: Lst } }; in the tuple case, (true, F, As) means { n: 3, arg1: true, arg2: F, arg3: As }; and in the infix case, (X*X + Y*Y) means ((X, X).'*', (Y, Y).'*').'+'.
So you can always define new objects that implement whatever protocol is desired for some existing operation, unlike in OCaml. This probably implies that the process of figuring out how and whether a function can be applied will resemble some kind of deduction system. The simplest solution is probably to maintain the set of functions applicable to any particular object. (Fortunately, in the language so far presented, this doesn't require actually running any of the functions --- just examining their dependencies. See "Efficiency".) 2. The objects' "contents" are really point-wise overrides of functions perhaps not otherwise applicable to those objects. You can view a definition like f = { x: 1, y: 2 } as syntactic sugar for the following: x g23132 = 1 y g23132 = 2 f = g23132 Except that { x: 1, y: 2 } is potentially garbage-collectable, while g23132, taken literally, would not be. 3. You can define a new pattern-action rule for an existing function anywhere. This probably implies some kind of specificity ordering, as in Aardappel, and some kind of feedback about when it's applicable. A More Extended Example: A Recipes File --------------------------------------- Suppose you have a recipe file of the following form: {recipes: [ { instructions: "This is how we do it..." ingredients: [ {ingredient: "celery", quantity: 3, unit: "stalk"} ...], servings: 4, } ...] } You can imagine a bunch of queries that might not be too hard to write: is_celery = Ingredient == "celery" celery_ingredients = [I in Ingredients if I.is_celery] {celery_ingredients: [N, ...Lst]}.has_celery = true celery_quantity = Celery_ingredients.quantity.sum # assuming SIMD auto-map celery_per_serving = Celery_quantity / Servings # It would be cool to be able to define quantity_per_serving as a # property of each ingredient, but that requires access to the # surrounding context. 
    ingredients_per_serving = (Ingredients, Servings).ing_divide
    (List, Divisor).ing_divide = [{ ingredient: Item.ingredient,
                                    quantity: Item.quantity / Divisor,
                                    unit: Item.unit }
                                  for Item in List]   # assuming list comprehensions
    # This next item only applies when you add a Desired_servings field to a
    # particular recipe.
    ingredients_for_desired_servings =
        (Ingredients, Servings / Desired_servings).ing_divide
    calories_per_unit = [Nut.calories for Nut in Nutrition_database
                         if Nut.ingredient_name == Ingredient
                         && Nut.ingredient_unit == Unit]
    calories = Calories_per_unit * Quantity
    calories = Ingredients.calories.sum
    calories_per_serving = Calories / Servings
    cost_per_gram = [Item.cost for Item in Latest_shopping_price
                     if Item.ingredient_name == Ingredient]
    cost = Cost_per_gram * Grams
    grams = [[Conversion.factor * Quantity for Conversion in Conversions
              if Conversion.from == Unit && Conversion.to == "grams"],
             [Density.factor * Quantity for Density in Food_densities
              if Density.per_what == Unit && Density.units == "grams"]].coalesce
    cost = Ingredients.cost.sum
    cost_per_serving = Cost / Servings
    # Search for a recipe to use up the leftovers in the fridge:
    ({ingredients}, ingredient).contains_ingredient =
        [I in Ingredients if I.ingredient == Ingredient].any
    [Term, ...Terms].search_results =
        [Recipe for Recipe in Recipes
         if [(Recipe, Ingredient).contains_ingredient
             for Ingredient in [Term,...Terms]].all]

Most of these should be no harder to write in Bicicleta, but being able to define methods "out-of-line" like this might have its advantages.

Efficiency
----------

For a given set of method definitions, the set of methods that apply to a particular object can be mostly statically computed from the set of methods that are assigned values for that object; we can call that set the "base object shape".
A finite program contains a finite set of base object shapes, at most one for each object literal in the program, and probably usually a lot less, so you can precompute most of the applicable method definitions for each object shape. It is possible to define methods that exist on some objects of a particular shape, but not others. For example, if we have only this definition: [N, M, ...Lst].show_atom = N.show_num + " " + [M, ...Lst].show_atom then some objects of shape {car, cdr} will have show_atom defined, while others will not, depending on what their cdr is; and some of those that have it defined will have an error when they try to call show_atom on their cdr. (If we take the suggestion from earlier that we augment the pattern on the left-hand side so that the call on the right-hand-side cannot fail, then we end up with a pattern that matches only infinite lists.) Adding this definition helps matters a bit: [N].show_atom = N.show_num Now any object of {car, cdr} whose cdr is either of shape [] or {car, cdr} has a match. It seems plausible that some kind of type inference might be able to move this kind of pattern-matching to compile-time, leaving only a single conditional branch to run-time. Effectively, this proposal suggests inferring a set of classes in a system from a set of objects and method inference rules, in order to be able to use efficient lookup methods. Some kind of "cut", as in Prolog, is probably also necessary. In much of the above, I've assumed that only the most specific method definition ever applies, which is a kind of "cut". Connection To SnikiSniki ------------------------ In Darius Bacon's SnikiSniki, you can make little tables out of little conjunctions of clauses, which Prolog-style look for solutions in the triple database; if you say [[Person parent Parent, Parent parent Grandparent]] then you get a table of people with their parents and grandparents. 
In SnikiSniki, this doesn't create a new "grandparent" relationship, but you could imagine that it could.

You could express something like the above in this approach as follows:

    {parent: {parent: Grandparent}}.grandparent = Grandparent

Most pattern-matching languages (OCaml, Haskell, etc.) have some kind of "as" feature that lets you give a name to an intermediate part of the pattern-matching tree, so that you don't have to write

    [N, M, ...Lst].show_atom = N.show_num + " " + [M, ...Lst].show_atom

and can instead write

    [N, ...[M, ...Lst] as Rest].show_atom = N.show_num + " " + Rest.show_atom

(the point of the M here is to ensure that there are at least two elements in the list --- we don't want a space at the end of the list's show_atom.)

With this feature, you could write

    {parent: {parent: Grandparent} as Parent}.grandparent = (Parent, Grandparent)

and get a closer match to the SnikiSniki semantics.

I have often thought that SnikiSniki would be dramatically more powerful if you could define object property inference rules like this, and more convenient if you could leave some columns out.

However, SnikiSniki's expressions are strictly more powerful, because they allow pattern-matching of arbitrary graphs, not just DAGs:

    [[Selfdealer owns Foundation, Foundation givesGrantsTo Selfdealer]]
    [[Doctor caresFor Patient, Nurse caresFor Patient, Doctor is doctor, Nurse is nurse]]
    [[Person parent Mother, female isSexOf Mother]]

I suspect that for cases where the QBE-like equational rewriting approach applies, it is generally terser and easier to understand.
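The two grandparent rules above can be sketched in Python, again with nested dicts standing in for records. The function names are hypothetical; the second one shows what the "as Parent" binding buys, namely access to the intermediate record as well as the innermost value.

```python
def grandparent(person):
    # {parent: {parent: Grandparent}}.grandparent = Grandparent
    parent = person.get("parent")
    if isinstance(parent, dict) and "parent" in parent:
        return parent["parent"]
    return None  # pattern did not match

def grandparent_with_parent(person):
    # {parent: {parent: Grandparent} as Parent}.grandparent
    #     = (Parent, Grandparent)
    parent = person.get("parent")
    if isinstance(parent, dict) and "parent" in parent:
        return (parent, parent["parent"])
    return None  # pattern did not match

alice = {"name": "Alice"}
bob = {"name": "Bob", "parent": alice}
carol = {"name": "Carol", "parent": bob}

print(grandparent(carol)["name"])   # Alice
print(grandparent(bob))             # None
```

Like the tree-shaped patterns it imitates, this sketch only walks downward from one record; it cannot express the cyclic SnikiSniki queries shown above.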
You could gain back the expressive power of SnikiSniki where it's needed, at some additional cost to terseness and readability in those cases, by defining the patterns with unification (as SnikiSniki does) and allowing multiple object-tree fragments on the left side:

    {owns: Foundation} as Selfdealer, ({givesGrantsTo: Selfdealer} as Foundation).self_dealer = Selfdealer
    {caresFor: Patient, is: doctor} as Doctor, {caresFor: Patient, is: nurse}.works_with_doctor = Doctor
    female.isSexOf = Mother, {parent: Mother}.mother = Mother

These last examples rather imply that the attributes are all multivalued, which is clearly part of Sniki but only potentially part of an equational rewrite system on properties.

From kragen at pobox.com Mon Mar 26 03:37:02 2007
From: kragen at pobox.com (Kragen Javier Sitaker)
Date: Mon Mar 26 03:37:03 2007
Subject: minuses of the Bicicleta design
Message-ID: <20070304211133.090ADE3410D@panacea.canonical.org>

Considered as a programming language, Bicicleta has some drawbacks.

1. It tends to err silently. If you use it in the way I expect people normally will, with every argument to every function having a sensible default, the consequence is that forgetting or misspelling an argument in a function call will produce a wrong result rather than an error message. And, by design, there's no way to detect and complain about extra arguments, such as the one you get when you misspell an argument name. You can reduce the first part of this problem by overriding all the arguments with errors in the public version of a function, or deleting them entirely from the definition, but I expect this to be uncommon.

2. There's no way to override the behavior on derivation. "No way to detect extra arguments" is a special case of this.

3. There's no multiple inheritance.

4. There's no static typing.

5. Class hierarchies are likely to be relatively deep.
This has been an obstacle to understanding in other languages in the past, such as Smalltalk, but I am hoping that Bicicleta's user interface improvements will compensate.

6. You can define new methods on objects inherited from existing classes, and you can evaluate code in a context where it will use your modified objects instead of the basic ones, but it might be difficult to add new methods to standard library objects, just as in Java or Python.

I expect that these drawbacks will be less important than its advantages.
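
To make item 1 concrete, here is a rough Python analogy. It is not Bicicleta code and doesn't claim to model Bicicleta's semantics; it only assumes that a call works by merging the caller's overrides over a set of defaults. The names Missing, call, and area are hypothetical, and the check for missing arguments is done eagerly for simplicity.

```python
class Missing:
    """Marker default meaning "using this argument is an error" (hypothetical)."""
    def __init__(self, name):
        self.name = name

def call(defaults, **overrides):
    # Stand-in for a call by derivation: overrides are merged over defaults,
    # and unknown names are accepted without complaint.
    args = {**defaults, **overrides}
    for name in defaults:
        if isinstance(args[name], Missing):
            raise TypeError(f"argument {name!r} was not supplied")
    return args

def area(args):
    return args["width"] * args["height"]

# Version with sensible defaults, and a "public" version whose
# arguments have all been overridden with errors.
friendly = {"width": 1, "height": 1}
strict = {name: Missing(name) for name in friendly}

right = area(call(friendly, width=3, height=4))
wrong = area(call(friendly, width=3, hieght=4))  # typo: height silently stays 1

try:
    area(call(strict, width=3, hieght=4))
    caught = False
except TypeError:
    caught = True  # the typo now surfaces, as a missing "height"

# An extra argument is still accepted silently, even in the strict version.
extra = area(call(strict, width=3, height=4, colour="red"))
```

With the sensible defaults, the misspelled call quietly computes the wrong area; with the error defaults, the same typo is reported as a missing argument, but the stray "colour" argument still goes unnoticed, which is the part item 1 says can't be fixed by design.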