patricia indices for directed graphs
Kragen Sitaker
kragen@pobox.com
Fri, 8 Feb 2002 17:19:29 -0500 (EST)
Dave Long writes:
> > Unfortunately, I think that the resulting Patricia trees will need to
> > represent the DAG paths internally, as they can't rely on the DAG to
> > hold them (the way they can with linear sequences of text.)
>
> Why not?
Well, in a string, I can identify a suffix of the string uniquely by
its start point; so a Patricia tree can store only the start point.
But in a DAG, in the general case, there are many possible paths from
each start point; there is no unique point in the tree to identify a
particular path. In a tree, you can uniquely identify a path by its
start point and its end point (and you can follow a path so
identified, albeit backwards, if you have pointers back up the tree),
but in a DAG, there may even be many paths between a start/end pair.
Side thought: the number of paths in a tree or DAG also grows a bit
more quickly than the number of nodes (but not unduly so), while the
number of suffixes of a string is simply one more than the number of
characters in it.
In a tree, the number of paths ending at a leaf node is the number of
nodes on the path from the root node to the leaf. So the number of
paths ending at leaf nodes is the sum of the depths of the leaves of
the tree, which grows fairly slowly with the size of the tree.
A DAG can be converted to a tree by deep-copying subtrees with more
than one ancestor; the number of paths to leaf nodes in the DAG will
be no more than the number of paths to leaf nodes in this tree. (It
will be less by the number of paths inside the copied subtrees.) But,
of course, general DAGs can produce exponentially-growing
corresponding trees.
Patricia indices on trees could be useful when, for example,
performing XQL queries on large documents.