downloading mail batches (synchronizing large files efficiently)

Dave Long dave.long at bluewin.ch
Sat Oct 21 07:22:43 EDT 2006


> My usual mail-management scheme is to rsync my inbox, or at least the
> tail end of it, onto my laptop.  But rsync doesn't run over
> store-and-forward.
> ...
> There's an additional problem that both the sender and the receiver
> have to read the entire file before any of it gets transferred, which
> is a problem when I'm being charged by the minute in a cafenet.

Rsync sounds heavy: aren't mailboxes append-only?  When does the head 
of a mailbox change?  Why not sequence your inbox as a TCP-style 
stream?

To request new mail, send the length of your laptop's mailbox.  The 
mail cache then replies with the chunk that lies between your request 
and (offset(mailboxtail) + len(mailboxtail)), i.e. the current end of 
the mailbox.  Same logic for your mirror, except that because the 
mirror has a full copy, its offset is always 0.  (So if your request is 
too stale for the cache, i.e. it falls before the cache's offset, the 
cache can tell you to retry with the full mirror.)
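
A minimal sketch of that exchange in Python, assuming the mailbox is
just a byte string and that the cache and the mirror differ only in
their offset.  The names MailStore, reply and the status strings are
made up for illustration; nothing here is an existing protocol.

```python
from dataclasses import dataclass


@dataclass
class MailStore:
    """Holds the mailbox bytes from `offset` onward (hypothetical name)."""
    offset: int   # absolute position of tail[0] within the full mailbox
    tail: bytes   # the bytes this store still has; a mirror has offset 0

    def end(self) -> int:
        # Absolute position just past the last byte we hold.
        return self.offset + len(self.tail)

    def reply(self, have: int):
        """Answer a client that already holds the first `have` bytes."""
        if have < self.offset:
            # Request is too stale for this store: send the client elsewhere.
            return ("retry-with-mirror", b"")
        if have > self.end():
            # Client claims more data than exists; something is wrong.
            return ("error", b"")
        # Everything between the client's length and our current end.
        return ("ok", self.tail[have - self.offset:])
```

With a 100-byte mailbox and a cache holding bytes 60 onward, a laptop
that sends 80 gets back bytes 80..99, while a laptop that sends 30 is
told to retry against the mirror.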

That, like TCP, runs just fine over store-and-forward.
(How easy is it to generate HTTP Content-Encoded Range requests from 
Python?)
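
On the Range part of that question, a rough sketch with the standard
library's urllib, assuming the mailbox is served as a plain file over
HTTP.  The function names and any URL passed in are hypothetical, the
sketch ignores Content-Encoding entirely, and treating 416 as "nothing
new" is an assumption about how the server reports an empty range.

```python
import urllib.error
import urllib.request


def build_range_request(url: str, have: int) -> urllib.request.Request:
    """Ask for everything from byte `have` to the end of the resource."""
    return urllib.request.Request(url, headers={"Range": f"bytes={have}-"})


def fetch_new_mail(url: str, have: int) -> bytes:
    """Return the bytes past `have`, or b"" if there is nothing new."""
    try:
        with urllib.request.urlopen(build_range_request(url, have)) as resp:
            body = resp.read()
            if resp.status == 206:
                # Partial Content: the server honoured the Range header.
                return body
            # Otherwise (typically 200) the server ignored the Range
            # header and sent the whole file; keep only the new part.
            return body[have:]
    except urllib.error.HTTPError as err:
        if err.code == 416:
            # Range Not Satisfiable: assumed to mean no bytes past `have`.
            return b""
        raise
```

The request itself is one header; most of the code deals with servers
that do not honour it.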

-Dave

> control over the data I'm rsyncing; with a naive implementation of the
> above algorithm, they could send me a chunk of data with the same CRC
> as some later email they expect I will get, in order to prevent me
> from getting the later email.

This scheme has a similar problem, in that Mallet could send a bogus 
update.  A slight modification would be to request a sequence starting 
a few (from a suitable unbounded random distribution?) pages before the 
end of your current mailbox, and to signal an error if the overlap 
ever fails to match what you already have.
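
A sketch of that overlap check in Python.  The page size, the function
names and the geometric distribution (one arbitrary choice of an
unbounded distribution) are assumptions made for illustration.

```python
import random

PAGE = 4096  # assumed page size in bytes


def pick_request_offset(have: int, rng=None) -> int:
    """Pick a start offset a random number of pages before `have`."""
    rng = rng or random.SystemRandom()  # OS randomness, hard to predict
    pages = 1
    while rng.random() < 0.5:  # geometric: no fixed upper bound
        pages += 1
    return max(0, have - pages * PAGE)


def apply_update(local: bytes, start: int, chunk: bytes) -> bytes:
    """Append `chunk` (which begins at `start`) after checking the overlap."""
    overlap = len(local) - start
    if start < 0 or overlap < 0:
        raise ValueError("update does not touch the local mailbox")
    if len(chunk) < overlap or chunk[:overlap] != local[start:]:
        # The re-sent bytes differ from what we already hold.
        raise ValueError("overlap mismatch: possible bogus update")
    return local + chunk[overlap:]
```

Since Mallet cannot tell in advance how far back the overlap reaches,
a forged update has to reproduce the existing tail exactly to be
accepted.  This catches updates that disagree with data you already
hold; it does not authenticate the new bytes themselves.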


