I've done some work on this subject.
See http://rsync.samba.org, which explains the rsync technology, and
http://rproxy.sourceforge.net/doc/protocol/protocol.html, which explains
how to use rsync over HTTP in a clever way that reduces download
bandwidth.
Rsync is, most importantly, a way of defining a "signature" for a file.
The file is cut into fixed-size blocks, and the signature records two
checksums per block: a weak rolling checksum, which is cheap to slide
along the other file byte by byte to find where matching blocks may
begin, and a strong checksum, which confirms whether a candidate block
really matches. Together they identify which blocks differ between two
slightly different versions of a file. However, rsync was not designed
for HTTP/WebDAV and the high latency of the Web; hence RProxy, the
adaptation of rsync for the Web.
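For illustration, here's a minimal Python sketch of such a signature,
assuming the weak checksum from the rsync technical report (two 16-bit
sums a and b, rolled forward one byte at a time). MD5 stands in for the
strong checksum here (rsync of this era used MD4), and the function
names are my own, not rsync's.

```python
import hashlib

M = 1 << 16  # modulus for the two 16-bit halves of the weak checksum


def weak_checksum(block):
    """Return (a, b) for one block, as in the rsync technical report."""
    a = sum(block) % M
    # b weights each byte by its distance from the end of the block.
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte on the left, add in_byte on the right."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def signature(data, block_size):
    """One (weak, strong) pair per fixed-size block of `data`."""
    sig = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        a, b = weak_checksum(block)
        weak = a + (b << 16)                     # pack both halves into 32 bits
        strong = hashlib.md5(block).hexdigest()  # stand-in for rsync's MD4
        sig.append((weak, strong))
    return sig


# Rolling the checksum gives the same answer as recomputing it at every offset,
# which is what makes scanning the other file byte by byte affordable.
data = b"the quick brown fox jumps over the lazy dog"
n = 8
a, b = weak_checksum(data[:n])
for k in range(1, len(data) - n + 1):
    a, b = roll(a, b, data[k - 1], data[k + n - 1], n)
    assert (a, b) == weak_checksum(data[k:k + n])
```

The rolling update costs O(1) per byte, versus O(block size) for a full
recompute; the strong hash is only needed when the weak one hits.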
Xdelta
(http://boss.cae.wisc.edu/ftp/hpux/Users/xdelta-1.1.1/xdelta-1.1.1.README.html)
is a binary diff format derived from the rsync work: once rsync
signatures have been compared, an xdelta diff can be generated from only
one copy of the file. Most diff algorithms, by contrast, need two copies
of a file to generate a diff, so they require a near-complete version
history. The RProxy specifications at sourceforge.net use a customized
'gdiff' format; I've asked why but haven't received an answer.
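To make the "only one copy" point concrete, here's a minimal sketch of
rsync-style matching, assuming the side building the diff holds only
the new file plus a signature of the old one. It is not xdelta's actual
algorithm or the gdiff encoding: the ("copy", index) / ("lit", bytes)
ops are invented for illustration, and MD5 stands in for the strong
checksum.

```python
import hashlib

M = 1 << 16


def weak(block):
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def strong(block):
    return hashlib.md5(block).digest()  # stand-in strong checksum


def signature(old, bs):
    """Computed by whoever holds OLD; only this list goes to the other side."""
    return [(weak(old[o:o + bs]), strong(old[o:o + bs]))
            for o in range(0, len(old), bs)]


def make_delta(new, sig, bs):
    """Encode NEW using NEW plus the signature of OLD (no copy of OLD needed)."""
    table = {}
    for idx, (w, s) in enumerate(sig):
        table.setdefault(w, []).append((s, idx))
    ops, lit = [], bytearray()
    i, ab = 0, None
    while i + bs <= len(new):
        if ab is None:
            ab = weak(new[i:i + bs])       # full recompute after a match
        match = None
        candidates = table.get(ab, [])
        if candidates:
            h = strong(new[i:i + bs])      # strong hash only on a weak hit
            match = next((idx for s, idx in candidates if s == h), None)
        if match is not None:
            if lit:
                ops.append(("lit", bytes(lit)))
                lit = bytearray()
            ops.append(("copy", match))
            i += bs
            ab = None
        else:
            lit.append(new[i])
            if i + bs < len(new):          # roll the window forward one byte
                a, b = ab
                out_b, in_b = new[i], new[i + bs]
                a = (a - out_b + in_b) % M
                b = (b - bs * out_b + a) % M
                ab = (a, b)
            i += 1
    lit.extend(new[i:])                    # trailing bytes shorter than a block
    if lit:
        ops.append(("lit", bytes(lit)))
    return ops


def apply_delta(old, ops, bs):
    """Run by the holder of OLD to rebuild NEW."""
    out = bytearray()
    for kind, val in ops:
        out += old[val * bs:(val + 1) * bs] if kind == "copy" else val
    return bytes(out)


old = b"The quick brown fox jumps over the lazy dog. " * 4
new = old.replace(b"lazy", b"sleepy", 1)
ops = make_delta(new, signature(old, 16), 16)
assert apply_delta(old, ops, 16) == new
```

Note that make_delta never sees OLD itself, only its signature, which is
what lets a server diff against a version it no longer keeps.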
Both rsync and early xdelta releases are under the GPL, although more
recent releases are now under a BSD-style license.
The basic idea underlying RProxy is useful even if you don't have a
proxy server. You can set up an origin HTTP server that supports a
single new request header; clients that support the feature send that
header to indicate their support, and the server can often respond with
far less data than would otherwise be necessary. Adding proxies that
support the feature makes the system better still, but they are not
necessary. Conversely, having origin HTTP servers support the feature
is not required either: as long as there are proxies along the way that
support it, the client still benefits.
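The deployment argument above can be sketched in a few lines of Python.
Everything here is hypothetical: the header names, the 1 KB block size
and the JSON delta format are made up, and the "signature" is
simplified to aligned block hashes without rsync's rolling checksum, so
it only helps when changes don't shift block boundaries. This is not
the RProxy wire protocol; it only shows why any mix of capable and
incapable servers still works.

```python
import base64
import hashlib
import json

BS = 1024                                 # block size (assumed)
SIG_HEADER = "X-Example-Block-Signature"  # hypothetical request header
DELTA_HEADER = "X-Example-Delta"          # hypothetical response header


def block_hashes(data, bs=BS):
    # Simplified signature: a strong hash of each aligned block only.
    return [hashlib.md5(data[o:o + bs]).hexdigest()
            for o in range(0, len(data), bs)]


def client_request_headers(cached):
    """A capable client advertises what it already has cached."""
    if cached is None:
        return {}
    return {SIG_HEADER: ",".join(block_hashes(cached))}


def server_response(headers, body, supports_feature):
    """Origin server or proxy: send a delta only if it understands the header."""
    sig = headers.get(SIG_HEADER)
    if not supports_feature or sig is None:
        return {}, body                   # ordinary full response
    known = {h: i for i, h in enumerate(sig.split(","))}
    ops = []
    for o in range(0, len(body), BS):
        block = body[o:o + BS]
        idx = known.get(hashlib.md5(block).hexdigest())
        if idx is not None:
            ops.append(["copy", idx])     # client already has this block
        else:
            ops.append(["lit", base64.b64encode(block).decode("ascii")])
    return {DELTA_HEADER: "1"}, json.dumps(ops).encode("utf-8")


def client_decode(resp_headers, payload, cached):
    """Rebuild the full body, whichever kind of response came back."""
    if DELTA_HEADER not in resp_headers:
        return payload
    out = bytearray()
    for kind, val in json.loads(payload):
        if kind == "copy":
            out += cached[val * BS:(val + 1) * BS]
        else:
            out += base64.b64decode(val)
    return bytes(out)


old = b"A" * 3000 + b"B" * 3000
new = b"A" * 3000 + b"C" * 100 + b"B" * 2900
hdrs = client_request_headers(old)
for capable in (True, False):
    rh, payload = server_response(hdrs, new, capable)
    assert client_decode(rh, payload, old) == new
```

A server that doesn't know the header just ignores it and sends the full
body, so the client is never worse off. For large files a header this
size would be impractical, so treat the encoding as illustrative only.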
The one thing that is really, really required for it to work is client
support. If you know of any client support for RProxy, or are
interested in implementing it, I'd love to hear about it.
lisa
> -----Original Message-----
> From: Rahul Dave [mailto:rahul@reno.cis.upenn.edu]
> Sent: Friday, January 12, 2001 12:04 PM
> To: Tony Finch
> Cc: Justin Mason; Dave Winer; bryan boyer; Adam Rifkin; FoRK@XeNT.CoM
> Subject: Re: n-way sync
>
>
> My original point was, syncing over HTTP, and included on the users
> clerver..
> CVS is great, use it all the time, needs to be adapted for transparent
> use I guess.
> All these are filesystem based, but there is no reason not to extend to
> arbitrary blobs..
> Rahul
> I got this from you:
> >
> > Justin Mason <jm@jmason.org> wrote:
> > >
> > >I'd like to see a generalised multi-site synchronisation algorithm
> > >implementation for use in a range of stuff, e.g. mail syncers, etc. It's
> > >a big problem for decentralised working, and crops up with all kinds of
> > >systems...
> >
> > Coda? rsync?
> >
> > Tony.
> > --
> > f.a.n.finch fanf@covalent.net dot@dotat.at
> > "Because all you of Earth are idiots!"
> >
This archive was generated by hypermail 2b29 : Fri Apr 27 2001 - 23:18:30 PDT