rapid genetic evolution of regular expressions
Dave Long
dl at silcom.com
Mon Apr 26 11:31:19 EDT 2004
> ... we can evaluate the
> probability that a particular regular expression matches messages in a
> "spam" or a "ham" corpus ...
Intuitively, I'd think one would
need more semantics to winnow the
spam from the ham, but there's a
counterexample (contra Chomsky!)
reported on in:
Liberman, Language Log: "Colorless green probability estimates", 4 Oct 2003
<http://itre.cis.upenn.edu/~myl/languagelog/archives/000025.html>
and I must admit I've been having
a bit of luck with only a handful
of simple rules.
> x86 machine code is a pain, what with the short and long jump
> displacement formats, but it's not intractable.
That's what I'd been trying to do
with bass: provide something just
high enough from the iron to be
easy to generate code for, but at
a low enough level that it'd beat
generating C code, with the idea
that one could use python for the
static-code dynamic-heap parts of
a problem, and bass for crunching
with dynamic code and fixed heap.
(bass' lack of activation records
would work perfectly for a DFA)
I've found it tough to beat C by
more than a factor of 2 (which is
consistent with the fate of most
low-level languages, it appears)
and I've been sidetracked by the
attempt to come up with a duality
between code and data at that low
a level, but it might not be too
difficult to get what's there to
run dynamically generated DFA's
under the control of (and on the
objects from) a python program.
-Dave
More information about the Kragen-discuss
mailing list