[kragen@canonical.org: Re: reducing charset size for compressibility with case-shift characters (in Python)]
Kragen Javier Sitaker
kragen at canonical.org
Mon Apr 18 16:33:40 EDT 2011
----- Forwarded message from Kragen Javier Sitaker <kragen at canonical.org> -----
From: Kragen Javier Sitaker <kragen at canonical.org>
To: Joe Blaylock <jrbl at jrbl.org>
Subject: Re: reducing charset size for compressibility with case-shift characters (in Python)
On Sat, Apr 16, 2011 at 10:50:34AM -0700, Joe Blaylock wrote:
> On Sat, 2011-04-16 at 03:37 -0400, Kragen Javier Sitaker wrote:
> > lowercase = 'abcdefghijklmnopqrstuvwxyz'
> > numbers = '0123456789'
> >
> >     else:
> >         yield current_state[lowercase.index(char)]
> >     elif char == DC3:
> >         current_state = numbers
>
> Couldn't you achieve a modest increase in compressibility at the expense of
> calculation time by representing all numerical sequences as base-26 encoded
> strings?
Quite possibly. In the Project Gutenberg Bible, that would make a
substantial fraction of the numbers one digit instead of two, or two
digits instead of three.
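Here are the digit counts in concrete terms. In base 26, one digit covers 0 through 25 and two digits cover everything up to 675 (26*26 - 1). So 10 through 25 shrink from two digits to one, and 100 through 675 shrink from three digits to two. The following is a minimal sketch, not code from either message. It assumes that the 26 lowercase letters double as base-26 digits ('a' = 0 through 'z' = 25), and the names `to_base26` and `from_base26` are hypothetical:

```python
import string

# Hypothetical digit alphabet: reuse the lowercase letters as base-26
# digits, so 'a' = 0, 'b' = 1, ..., 'z' = 25.
DIGITS = string.ascii_lowercase


def to_base26(n: int) -> str:
    """Encode a non-negative integer as a base-26 string of lowercase letters."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return DIGITS[0]
    out = []
    while n:
        n, r = divmod(n, 26)
        out.append(DIGITS[r])
    return "".join(reversed(out))


def from_base26(s: str) -> int:
    """Decode a string produced by to_base26 back into an integer."""
    n = 0
    for ch in s:
        n = n * 26 + DIGITS.index(ch)
    return n


if __name__ == "__main__":
    # Compare decimal and base-26 lengths around the 1-, 2- and 3-digit boundaries.
    for n in (9, 25, 26, 99, 150, 675, 676):
        print(n, to_base26(n), len(str(n)), len(to_base26(n)))
```

When run as a script, it prints each number, its base-26 form, and both lengths. For example, 150 becomes `fu`, which is two characters instead of three.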
> You'd have to run a buffer large enough for any numeric runs you
> process, but the transformation itself is easy. You couldn't do that
> nice direct-indexing thing any more though. Well, not without
> creating more abstraction.
Indeed. May I forward this to kragen-discuss?
Kragen
----- End forwarded message -----
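The following sketch makes Joe's buffering point concrete. The quoted decoder can turn each letter straight into a digit with `lowercase.index(char)`. A base-26 run, by contrast, cannot be turned back into decimal digits until the whole run has been read. This is a minimal sketch under several assumptions, not the code from the original thread:

- DC3 (0x13) shifts into numbers, as in the quoted fragment.
- DC1 (0x11) shifts back out. This choice is hypothetical.
- The input contains neither control character.
- Runs with a leading zero are left in decimal, because base 26 would drop the zeros.

The names `encode`, `decode`, and `to_base26` are illustrative. The sketch also makes no attempt to minimize the shift characters themselves.

```python
import re
import string

# DC3 (0x13) is the shift into the numbers state in the quoted code.
# Using DC1 (0x11) to shift back out is a hypothetical choice for this
# sketch; the input text is assumed to contain neither character.
DC1, DC3 = "\x11", "\x13"
DIGITS = string.ascii_lowercase  # base-26 digits: 'a' = 0 ... 'z' = 25


def to_base26(n):
    """Encode a non-negative int as lowercase base-26 digits."""
    out = ""
    while True:
        n, r = divmod(n, 26)
        out = DIGITS[r] + out
        if n == 0:
            return out


def encode(text):
    """Replace each decimal run with DC3 + base-26 digits + DC1."""
    def repl(match):
        run = match.group()
        if len(run) > 1 and run.startswith("0"):
            return run  # leading zeros would be lost, so leave the run as-is
        return DC3 + to_base26(int(run)) + DC1
    return re.sub(r"[0-9]+", repl, text)


def decode(chars):
    """Yield the decoded characters of an encode()d stream, one at a time.

    A base-26 run can't be converted to decimal digits until the whole run
    has arrived, so its letters are buffered until the closing DC1.
    """
    buf = None  # None: not inside a numeric run
    for ch in chars:
        if ch == DC3:
            buf = []
        elif ch == DC1:
            n = 0
            for d in buf:
                n = n * 26 + DIGITS.index(d)
            yield from str(n)
            buf = None
        elif buf is not None:
            buf.append(ch)
        else:
            yield ch


if __name__ == "__main__":
    s = "Genesis 1:31 and item 007"
    packed = encode(s)
    print(repr(packed))
    print("".join(decode(packed)))
```

The buffer `buf` only needs to hold one numeric run at a time, which is the "buffer large enough for any numeric runs" from the message above. The encoder is written over a whole string for brevity. A streaming encoder would need the same kind of buffer for the decimal digits.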