[kragen@canonical.org: Re: reducing charset size for compressibility with case-shift characters (in Python)]

Kragen Javier Sitaker kragen at canonical.org
Mon Apr 18 16:33:40 EDT 2011


----- Forwarded message from Kragen Javier Sitaker <kragen at canonical.org> -----

From: Kragen Javier Sitaker <kragen at canonical.org>
To: Joe Blaylock <jrbl at jrbl.org>
Subject: Re: reducing charset size for compressibility with case-shift characters (in Python)

On Sat, Apr 16, 2011 at 10:50:34AM -0700, Joe Blaylock wrote:
> On Sat, 2011-04-16 at 03:37 -0400, Kragen Javier Sitaker wrote:
> > lowercase = 'abcdefghijklmnopqrstuvwxyz'
> > numbers = '0123456789'
> > 
> >             else:
> >                 yield current_state[lowercase.index(char)]
> >         elif char == DC3:
> >             current_state = numbers
> 
> Couldn't you achieve a modest increase in compressibility at the expense of
> calculation time by representing all numerical sequences as base-26 encoded
> strings?

Quite possibly. In the Project Gutenberg Bible, that would make a
substantial fraction of the numbers one digit instead of two, or two
digits instead of three.
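
As a rough illustration of that claim, here is a minimal sketch of the
base-26 idea, assuming 'a' stands for 0 and 'z' for 25. The names
to_base26 and from_base26 are hypothetical and do not come from the
original program.

```python
import string

LOWERCASE = string.ascii_lowercase  # assumption: 'a' = 0 ... 'z' = 25


def to_base26(n):
    """Encode a non-negative integer as a string of base-26 letter digits."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []
    while True:
        n, rem = divmod(n, 26)
        digits.append(LOWERCASE[rem])
        if n == 0:
            break
    # Digits were produced least-significant first, so reverse them.
    return ''.join(reversed(digits))


def from_base26(s):
    """Decode a string produced by to_base26 back into an integer."""
    n = 0
    for ch in s:
        n = n * 26 + LOWERCASE.index(ch)
    return n


# 10..25 fit in one letter, and 100..675 fit in two.
assert to_base26(12) == 'm'
assert to_base26(150) == 'fu'
assert from_base26('fu') == 150
```

Under this mapping, 0 through 9 and 26 through 99 come out the same
length as in decimal, so the saving is confined to the ranges 10-25 and
100-675 (for numbers below 676).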

> You'd have to run a buffer large enough for any numeric runs you
> process, but the transformation itself is easy.  You couldn't do that
> nice direct-indexing thing any more though.  Well, not without
> creating more abstraction.
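
The buffering described here might look roughly like the following
sketch. It is only an assumption about how such a pass could be
written: rewrite_numeric_runs and to_base26 are hypothetical names, the
shift characters of the original scheme are left out entirely, and
leading zeros in a run (such as "007") are not preserved.

```python
import string

LOWERCASE = string.ascii_lowercase  # assumption: 'a' = 0 ... 'z' = 25
DIGITS = string.digits


def to_base26(n):
    """Encode a non-negative integer as a string of base-26 letter digits."""
    digits = []
    while True:
        n, rem = divmod(n, 26)
        digits.append(LOWERCASE[rem])
        if n == 0:
            break
    return ''.join(reversed(digits))


def rewrite_numeric_runs(chars):
    """Pass characters through, re-emitting each run of decimal digits in base 26.

    The whole run has to be buffered, because its base-26 form is not
    known until the last decimal digit has been seen.
    """
    run = []
    for char in chars:
        if char in DIGITS:
            run.append(char)
            continue
        if run:
            # A non-digit ends the run: convert and flush the buffer.
            for out in to_base26(int(''.join(run))):
                yield out
            run = []
        yield char
    if run:
        # Flush a run that reaches the end of the input.
        for out in to_base26(int(''.join(run))):
            yield out


assert ''.join(rewrite_numeric_runs('gen 12:150 ')) == 'gen m:fu '
```

A real encoder would still need to mark where each numeric run starts
and stops, since the letters it emits are otherwise indistinguishable
from ordinary text.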

Indeed. May I forward this to kragen-discuss?

Kragen

----- End forwarded message -----

