scalable flooding information distribution
Kragen Javier Sitaker
kragen at canonical.org
Sat Jun 11 13:48:25 EDT 2011
On Sat, Jun 11, 2011 at 07:59:43AM -0600, Zooko O'Whielacronx wrote:
> Possibly related:
> The description refers to other Tahoe-LAFS terminology, but basically
> the proposal amounts to using the Chord structure for broadcast
> instead of routing.
I see. If I understand correctly, the algorithm you've described is the
standard hypercube broadcast algorithm; if you have 2^N nodes, one for each
node identifier (which I know is not the usual case for Chord!) then the
fingers are exactly the links in the hypercube, and the algorithm is simply
that at timestep i, all nodes that already have the message transmit a copy to
the node whose address differs from their own by 2^i.
You've generalized this implicitly by using Chord as the basis, so that the 2^N
node identifiers are distributed among a smaller number M of physical nodes,
and you can skip the first N-lg(M) timesteps.
Unfortunately, this is just a particularly low-latency way to implement
flooding, (in the simple hypercube case, at least, it's provably optimal) and
still suffers from flooding's limit on scalability. While Moore's Law has made
that limit relevant for a progressively diminishing number of applications,
there will always be applications for which it is relevant.
In the case where each node only cares about a subset of possible
announcements, and the subset is easily identifiable by a key, you can
distribute asynchronous announcements in a manner more efficient than flooding
by using the Chord topology, and a cache-invalidation protocol: when you fetch
a key-value pair from a node, the node sends the current value immediately, and
later, if possible, an asynchronous notification that the previous value is no
longer valid, which prompts you to refetch it. You also poll periodically in
case the asynchronous notification failed to get sent --- perhaps the node ran
out of space for addresses to notify of invalidations, or crashed, or went
offline, or couldn't reach you; or perhaps your IP address changed.
If the value is something like "the last day's worth of announcements on
channel 328f" or "the last 128 announcements on..." then you have a workable,
though clearly not optimal, system for distributing events.
More information about the Kragen-discuss