downloading mail batches (synchronizing large files efficiently)
Dave Long
dave.long at bluewin.ch
Sat Oct 21 07:22:43 EDT 2006
> My usual mail-management scheme is to rsync my inbox, or at least the
> tail end of it, onto my laptop. But rsync doesn't run over
> store-and-forward.
> ...
> There's an additional problem that both the sender and the receiver
> have to read the entire file before any of it gets transferred, which
> is a problem when I'm being charged by the minute in a cafenet.
Rsync sounds heavy: aren't mailboxes append-only? When does the head
of a mailbox change? Why not sequence your inbox as a TCP-style
stream?
To request new mail, send the length of your laptop's mailbox. The
mail cache, which holds only the tail of the mailbox, then replies
with the chunk that lies between your request and its own end,
(offset(mailboxtail) + len(mailboxtail)). The mirror uses the same
logic, except that because it has a full copy, its offset is always 0.
(So if your request is too stale for the cache, that is, it falls
before offset(mailboxtail), the cache can tell you to retry with the
full mirror.)
That, like TCP, runs just fine over store-and-forward.
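A minimal sketch of that exchange, assuming the mailbox is treated as
a plain byte string and the request is just an integer length. The
names TailCache and fetch_new_mail are made up for illustration, and
the transport (mail, HTTP, whatever) is left out entirely:

```python
class TailCache:
    """Holds the bytes of a mailbox from `offset` onward.

    A full mirror is simply a TailCache whose offset is 0.
    (Hypothetical class, for illustration only.)
    """

    def __init__(self, offset, tail):
        self.offset = offset  # position of tail[0] in the full mailbox
        self.tail = tail      # bytes from `offset` to the current end

    def end(self):
        # Absolute length of the mailbox as this server sees it.
        return self.offset + len(self.tail)

    def reply(self, client_len):
        """Return the bytes between client_len and end().

        Returns None when the request is too stale, i.e. the client
        needs bytes from before the start of our tail.
        """
        if client_len < self.offset:
            return None
        return self.tail[client_len - self.offset:]


def fetch_new_mail(client_len, cache, mirror):
    """Ask the cache first; fall back to the full mirror if too stale."""
    chunk = cache.reply(client_len)
    if chunk is None:
        chunk = mirror.reply(client_len)
    return chunk
```

The client appends whatever comes back to its local mailbox. An empty
reply means there is nothing new.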
(How easy is it to generate HTTP Content-Encoded Range Requests from
Python?)
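For the plain (uncompressed) case it looks fairly easy with the
standard library. A sketch, assuming the mailbox is served as a static
file by a server that honours the Range header; the URL in the usage
note is a placeholder, the function names are made up, and this does
not deal with content encoding at all:

```python
import urllib.request


def build_range_request(url, local_len):
    """Build a GET asking for everything from byte `local_len` onward."""
    return urllib.request.Request(
        url, headers={"Range": "bytes=%d-" % local_len}
    )


def fetch_tail(url, local_len):
    """Fetch the bytes past `local_len`.

    Expects 206 Partial Content. A 200 would mean the server ignored
    the Range header and is sending the whole file. A range starting
    past the end of the file gets 416, which urlopen raises as an
    HTTPError.
    """
    req = build_range_request(url, local_len)
    with urllib.request.urlopen(req) as resp:
        if resp.status != 206:
            raise RuntimeError("server did not honour the Range header")
        return resp.read()
```

Usage would be something like
fetch_tail("http://example.invalid/inbox.mbox", len(local_mailbox)).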
-Dave
> control over the data I'm rsyncing; with a naive implementation of the
> above algorithm, they could send me a chunk of data with the same CRC
> as some later email they expect I will get, in order to prevent me
> from getting the later email.
This scheme has a similar problem, in that Mallet could send a bogus
update. A slight modification would be to request a sequence starting
a few (from a suitable unbounded random distribution?) pages before the
end of your current mailbox, and signalling an error if the overlaps
ever failed to match.
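A sketch of that modification, again treating the mailbox as a byte
string. The page size, the exponential distribution, and the names
request_offset and apply_update are all assumptions picked for
illustration, not a recommendation:

```python
import random


def request_offset(local_len, page_size=4096, mean_pages=2.0, rng=random):
    """Pick a start point a random number of pages before the local end.

    The page count is 1 + floor(exponential), which is unbounded above,
    so a sender cannot know how much overlap will be checked. The
    result is clamped at 0 for short mailboxes.
    """
    pages = 1 + int(rng.expovariate(1.0 / mean_pages))
    return max(0, local_len - pages * page_size)


def apply_update(local, start, chunk):
    """Append `chunk` (which begins at byte `start`) to `local`.

    The first len(local) - start bytes of the chunk overlap data we
    already hold and must match it exactly; otherwise signal an error.
    A chunk shorter than the overlap is also treated as a mismatch.
    """
    overlap = len(local) - start
    if overlap < 0:
        raise ValueError("chunk starts past the end of the local mailbox")
    if chunk[:overlap] != local[start:]:
        raise ValueError("overlap does not match local mailbox")
    return local + chunk[overlap:]
```

The client sends request_offset(len(local)) instead of len(local) and
passes the same value as `start` when the reply arrives.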