memories and Bloom filters
Kragen Sitaker
kragen@pobox.com
Mon, 13 Mar 2000 15:43:41 -0500 (EST)
Keys stored in Bloom filters have a couple of suggestive similarities
to memories in the human brain:
- they suffer from false positives;
- any part of the filter can be destroyed without destroying any of the
information stored in the whole filter, but only making it more
difficult to retrieve (and, of course, increasing the false positive
rate).
I have explained previously on this list how to use Bloom filters to
associate arbitrarily-large amounts of information with a given key,
albeit with slow bit-serial retrieval (another suggestive similarity?)
Here's a Bloom-filter variant that's even more like human brains. It
is much less efficient on standard hardware, but purpose-built hardware
(which would be fairly cheap and simple) could run it just as quickly
as standard hardware can run normal Bloom filters.
First, randomly turn off bits. Start with a bitvector that is half
full; every time the storage of a key turns a bit on, pick a bit at
random to turn off, so the total number of 'on' bits stays constant.
Retrieval must obviously change. So, second, when searching for a
key, instead of simply ANDing together the appropriate component bits,
you should count them instead. If the count is over some threshold,
you have found a previously-stored key; applying the standard
modification to allow 4-7 bits of 'value' data to be stored requires a
great decrease in performance, or else a machine with 16-128 counters
running in parallel.
Now, this modified system is already more similar to human memory than
the standard Bloom filter is: it "forgets" and gets false negatives.
The older a memory is, the more likely it is to get "forgotten"; it
gradually decreases in recallability until it is no longer
distinguishable from the noise.
If we re-store a key after each time we successfully retrieve it, then
memories that are frequently remembered will be less likely to be
forgotten. This corresponds to observed facts about human memories.
If, instead of storing a key by setting all the bits needed to
recognize it, we only set some random subset of them, then keys that
are stored more frequently will be remembered better. As a handy side
effect, the memory will forget more slowly.
If that random subset of bits is somehow determined by unrelated
circumstances, then you will see the effect demonstrated by
psychologists wherein subjects remember things better if they learn
them in many different contexts than if they learn them in only one
context. Weighting the bits as they are summed for recall according to
the same function --- such that the bits that are weighted most heavily
in particular unrelated circumstances are the ones that would have been
most likely to be set if the memory had been stored in those unrelated
circumstances --- gives context-dependent recall, where you remember
things you learned in the classroom better when you are in the
classroom than when you are in an exam hall.
But that's not the only way to get context-dependent recall, I think.
Note that searches can be carried out on a subset of the whole
bitvector --- the first half, for instance. So you can carry out many
searches in parallel. This is very handy if you don't know which
pieces of your key you are looking for, as human beings generally
don't. (This idea is unfinished.) This is also handy if you need to
search quickly; obviously, your false-positive rate will go up using
only a fraction of the bitvector, so you'll have to raise your
threshold to compensate. (This is suggestive; human long-term memories
recall things faster when the memories are stronger.)
A set of memories that forget (and thus memorize) at different rates
might help to explain other aspects of human memory, but this might be
unnecessary and better-explained by other dei ex machinae.
I don't see any obvious ways to get Bloom filters to act like human
memories in some other ways, though. Human memories are good at
remembering things from partial information; I don't see any way to get
Bloom filters to be good at that. Human memories are also good at
remembering things incorrectly; I don't see how Bloom filters can be
good at that. Human memories are tightly interwoven with human
thought; Bloom filters can't think.
Also, human memories aren't Bloom filters; there's no centralized
hash-function generator to be a SPOF, and they don't store strings.
But these similarities are suggestive; I suspect human memories have a
lot more in common with Bloom filters, especially of this modified
type, than with RAM or standard content-addressable memories.
The original Bloom filter paper, by the way, seems to be:
Burton H. Bloom.
Space/time trade-offs in hash coding with allowable errors.
Communications of ACM, pages 13(7):422-426, July 1970.
A Burton Bloom wrote AI Memo 47 in February 1963, on chess-playing via
heuristic search. These are the only two publications I've been able
to find that list him as an author. (Although AltaVista does offer a
link: "Comparison shop for Burton Bloom.") Knuth is evidently looking
for him too; the only non-CS possibility I've been able to find is a
Burton Bloom born in 1908 who died in 1975 at 66 years of age, who
married a Marjorie Murray:
http://mhv.net/~treetop/bloom/i0002616.htm#i2618
Searching, though, I found a good page on how joins work:
http://redbook.cs.berkeley.edu/lec5.html
--
<kragen@pobox.com> Kragen Sitaker <http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08. Hurrah!
<URL:http://www.pobox.com/~kragen/bubble.html>
The power didn't go out on 2000-01-01 either. :)