rapid genetic evolution of regular expressions

Dave Long dl at silcom.com
Mon Apr 26 11:31:19 EDT 2004



>                                      ... we can evaluate the
> probability that a particular regular expression matches messages in a
> "spam" or a "ham" corpus ...

Intuitively, I'd think one would
need more semantics to winnow the
spam from the ham, but there's a
counterexample (contra Chomsky!)
reported on in:

Liberman, Language Log: "Colorless green probability estimates", 4 Oct 2003
<http://itre.cis.upenn.edu/~myl/languagelog/archives/000025.html>

and I must admit I've been having
a bit of luck with only a handful
of simple rules.

> x86 machine code is a pain, what with the short and long jump
> displacement formats, but it's not intractable.

That's what I'd been trying to do
with bass: provide something just
high enough from the iron to be
easy to generate code for, but at
a low enough level that it'd beat
generating C code, with the idea
that one could use python for the
static-code dynamic-heap parts of
a problem, and bass for crunching
with dynamic code and fixed heap.
(bass' lack of activation records
would work perfectly for a DFA)

I've found it tough to beat C by
more than a factor of 2 (which is
consistent with the fate of most
low-level languages, it appears)
and I've been sidetracked by the
attempt to come up with a duality
between code and data at that low
a level, but it might not be too
difficult to get what's there to
run dynamically generated DFA's
under the control of (and on the
objects from) a python program.

-Dave


More information about the Kragen-discuss mailing list