Linda and name-value sets

Kragen Sitaker kragen@pobox.com
Wed, 17 Jul 2002 16:17:58 -0400 (EDT)


This post is probably more half-baked than the usual kragen-tol post,
but I might revisit it in a few years and cook it a bit more
thoroughly.  However, at the moment, it lacks motivation, it is likely
to duplicate earlier work without acknowledging it, and it is probably
mostly wrong, because I don't know anything about Linda.

Introduction to Linda
=====================

Linda was originally constructed to handle the problem of writing
parallel programs for massively-parallel supercomputers.

Linda is a nice, simple interface that unifies publish-and-subscribe
event notification, data storage, and general interprocess
communication.  It's based on tuples; each Linda tuple contains some
fields that contain data ("actuals") and others that don't
("formals").  At least the first field must be actual.

There is a shared, flat space of anonymous tuples.  In typical Linda
implementations, tuples are actually stored on one or another
particular node, but this is not visible to the client program.

There are three basic operations in Linda, "in", "out", and "rd", each
of which take a tuple argument; "in" treats its tuple argument as a
pattern with variables where the tuple has "formals", pattern-matches
it against the set of tuples in the tuple space, and deletes a tuple
that matches, returning its contents to the calling program; "rd" is
exactly the same except that it doesn't delete the tuple from the
tuple space; and "out" adds a tuple to the tuple space.  "rd" and "in"
block until a matching tuple is available, and typical Linda
implementations also include non-blocking versions.

There's also a modified version of "out" called "eval", which I don't
understand; it appears that the tuple passed to "eval" may have fields
containing expressions in the client programming language, which Linda
will then cause to be evaluated in parallel in other tasks before
adding the tuple to the tuple space.

Linda tuple spaces are surprisingly powerful.

For example, a tuple space is at least as powerful as a standard
filesystem, modulo permission checking; simply insert <"filecontents",
"/home/kragen/.login", "#!/bin/sh\n..."> tuples into the tuple space,
extending as necessary to support partial file reads, directory
contents, etc.  It supports mutual exclusion straightforwardly; a
particular tuple is inserted into the tuple space at startup, and any
process needing it removes it temporarily.  It supports queues with
<streamname, indexnum, value> tuples, plus special tuples to indicate
the head and tail of the stream; and the queues can support multiple
readers and writers.  It supports dataflow programming: parts of your
program that need data that hasn't yet been computed will block until
it's been computed.  It supports shared-memory programming: <"shm",
address, value>, at whatever granularity of addressing performs best.

Directions to extend Linda
==========================

There are a variety of directions it might be interesting to extend
Linda, most inspired by the creation of more sophisticated
less-centralized systems in the nearly 20 years since Linda's genesis.

Name-value pairs
----------------

It appears to me that using sets of name-value pairs might allow for
the extension of tuples without rewriting all the other code that uses
those tuples.  This is probably more desirable when there are "rd"
clients out there that will match it; if the tuple is going to be
immediately deleted by an "in" operation, it doesn't make a lot of
sense to add extra information to it that won't be used.

This is probably essential for Linda programs to be developed in a
decentralized fashion and have their components evolve independently.

Python lets you write function declarations like this:

def foo(x, y, z, bar="baz", znork=3): pass

Normally, this means that the function has five parameters, named x,
y, z, bar, and znork, of which the last two have default values.
Suppose that this were interpreted to mean "this function should be
called for any tuples with fields x, y, and z, with value "baz" in
field bar and value 3 in field znork"?  Such an implicit "rd" could be
useful for some applications.

Python also offers good syntactic support for a name-value-pair
version of "out": out(x="boo", y="foo", z="goo", bar="baz", znork=3)
is valid Python with no extra preprocessing.

URLs as names
-------------

The first tuple field, which must be actual, is generally used as a
sort of type indicator or name; it's used essentially to prevent
interference between unrelated parts of the program using the same
global tuple store.  Systems in which the coordination costs of
assigning unique names become high could benefit from using URLs or
some other hierarchical scheme for names, ideally with xmlns-style
abbreviation support at the edges.

Needless to say, tuples that are themselves sets of name-value pairs
could probably benefit from a similar approach for their field names.

This is related to Hayek's justification for ownership by preventing
plan interference.  URLs assign ownership of parts of the namespace,
preventing conflicting plans for the use of the same name.

A Web-native tuple-space implementation, with one tuple space for the
whole Web, is conceivable.

Tuple spaces as tuple elements
------------------------------

An alternative approach to preventing plan interference is to provide
private tuple spaces, which can themselves be tuple elements.

I should look up this paper:

   D. Gelernter. Multiple Tuple Spaces in Linda. In J. G. Goos, editor,
   Proceedings, PARLE '89, volume 365 of LNCS, pages 20-27, 1989.

Alternatively, tuples themselves could be tuple spaces, which would
require either that tuple spaces be able to contain something else in
addition to tuples (such as primitive data types, numbers and strings
and so forth) or that some tuples be viewable either as tuples or as
primitive data values.  (For example, a tuple <"value", 3> might be
viewable simply as 3.)

Leases, Transactions
--------------------

In a system involving unreliable parts, you have to be careful about
an agent gaining exclusive access to a piece of data and then never
releasing it because it failed.  Often, such an agent will delete a
tuple from the tuple space, then insert another tuple that matches
many of the same outstanding patterns the old one would have.

If this is all bundled into a transaction, and there is a "lease"
timeout after which the transaction will be aborted (assuming the
process has failed), it is possible to ensure that crashing processes
or network partitions do not break system invariants.

I think JavaSpaces does this.  (That Bill Joy, he's a pretty sharp
guy, or at least he works with some pretty sharp guys.)

Agency
------

Much of the information we handle asserts some proposition or other:
this message is spam, this software is interesting, this paper is
interesting, this software update is safe, this proposed law is
terrible.  Usually the proposition is of uncertain truth --- that is,
reasonable people in possession of all of the relevant information
might disagree as to whether they are true or false --- and even when
they are not, their truth or falsity is often difficult to establish.

As people, we have a well-developed system for guessing about the
truth or falsity of these propositions without thinking about them: we
trust other people to different degrees, and we believe things that
people we trust assert, even without proof.  We tend to trust people
more or less depending on how true their past assertions have been.

Since computer programs are relatively poor at thinking, it should be
useful for them to employ similar techniques.

The Web supports this; global mutable data stores like a tuple space
do not.  Including provenance in tuple-space items might help this.

A counterpart to this is that removing tuples from tuple-space is an
effective way to silence their publisher, so it should perhaps be a
privilege less widely available than the privilege to read them.

Iteration
---------

For the applications Linda was originally developed for, being able to
iterate over the tuples matching a particular pattern was pointless
--- you could just as well use "in" to remove them from the tuple
space one by one.  But I think there are probably applications where
it would be desirable for users to be able to examine and
interactively explor the tuple space, and being able to enumerate the
tuples matching a certain pattern probably requires this.

Objects
-------

The tuples in the Linda tuple space contain simple, passive data, like
the entities transferred over HTTP.  What if they contained mobile
objects with methods instead?

Security and fault isolation
----------------------------

Generally, having a globally-accessible mutable flat data store is
theoretically problematic for preventing faults (accidental or
deliberate) in one part of a system from affecting other parts.
Multiple tuple spaces solve the problem somewhat, especially if the
authority to out() to them is separate from the authority to in() or
rd() them.  But how do you control access to the multiple tuple
spaces?  You can't just put that authority in a tuple in another tuple
space --- you might as well just merge the two.

Would it suffice to require that some field of the tuple (the first,
say) be an actual field whose value is a Lisp-style symbol rather than
a string?

More generally, it isn't apparent to me how to integrate transactions
and capability discipline.

General-purpose computing
-------------------------

Linda is a pretty general-purpose high-level substrate for coupling
pieces of code into a larger system.  What would it take for all the
interprocess communication in my computer to happen through a
Linda-like mechanism with transactions?  Perhaps I could then go from
having a personal computer to having a personal cluster with vastly
more computational power.  Note that in this new world where power
consumption, rather than hardware complexity, is the limit to computer
speed, and power consumption is proportional to the cube of clock
speed, desktop and portable parallelism could be worth a lot more.

-- 
/* By Kragen Sitaker, http://pobox.com/~kragen/puzzle4.html */
char b[2][10000],*s,*t=b,*d,*e=b+1,**p;main(int c,char**v){int n=atoi(v[1]);
strcpy(b,v[2]);while(n--){for(s=t,d=e;*s;s++){for(p=v+3;*p;p++)if(**p==*s){
strcpy(d,*p+2);d+=strlen(d);goto x;}*d++=*s;x:}s=t;t=e;e=s;*d++=0;}puts(t);}