RE: n-way sync

From: Meltsner, Kenneth (Kenneth.Meltsner@ca.com)
Date: Fri Jan 12 2001 - 14:01:37 PST


If you can't get client support, it might make sense for the "edge servers" and "internet acceleration appliances" that are starting to show up. Hash-driven document equality checks were part of Marimba's content distribution protocol which was submitted to the W3C without any further comment from anyone. It's also behind FilePool's idea of allowing users to register documents for storage at a central service, which would then issue them a token (hash + a signature, I'd guess) that could be used to retrieve the document for the foreseeable future.
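
As a rough illustration of the hash-driven equality check and the "hash + signature" token guessed at above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the names SERVICE_KEY, issue_token and verify_token are hypothetical, SHA-256 stands in for whatever hash a real service would use, and an HMAC stands in for a real signature. It is not Marimba's or FilePool's actual scheme.

```python
import hashlib
import hmac

# Assumption: a secret held only by the (hypothetical) storage service.
SERVICE_KEY = b"hypothetical-service-secret"


def content_hash(data: bytes) -> str:
    """Hex digest that identifies a document by its content."""
    return hashlib.sha256(data).hexdigest()


def same_document(a: bytes, b: bytes) -> bool:
    """Equality check that only compares the two digests."""
    return hmac.compare_digest(content_hash(a), content_hash(b))


def issue_token(data: bytes) -> str:
    """Token = content hash plus a keyed MAC over that hash."""
    digest = content_hash(data)
    sig = hmac.new(SERVICE_KEY, digest.encode(), hashlib.sha256).hexdigest()
    return f"{digest}.{sig}"


def verify_token(token: str, data: bytes) -> bool:
    """Check that the token was issued by the service and matches the data."""
    digest, _, sig = token.partition(".")
    expected = hmac.new(SERVICE_KEY, digest.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(sig, expected) and digest == content_hash(data)
```

An edge server could use same_document-style comparisons to decide whether its cached copy is still current without transferring the body.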

If I were going to make the perfect edge server, it would definitely have a proxy component for caching content -- in the morning, I saw 60-80% cache hit rates at my former employer because everyone came in and checked the same stocks at the same portals. The RProxy feature would be a nice addition, as would a GZIP decoder so content could come over the WAN connection as compact as possible.

The U Wisc researchers on proxy services had other good ideas as well, including active proxy content (essentially applets) which could perform simple actions to customize returned content, send retrieval notices to a central server (good for ad banners), etc. Pei Cao, the researcher behind their proxy research, went off to California, started a proxy server company and disappeared from sight (as far as I can tell) when Intel liked their proxy server so much they bought the company.

http://www.cs.wisc.edu/%7Ecao/wisweb.html

-----Original Message-----
From: Lisa Dusseault [mailto:lisa@xythos.com]
Sent: Friday, January 12, 2001 3:03 PM
To: Rahul Dave; Tony Finch
Cc: Justin Mason; Dave Winer; bryan boyer; Adam Rifkin; FoRK@XeNT.CoM
Subject: RE: n-way sync

I've done some work on this subject.

See http://rsync.samba.org, which explains the rsync technology, and
http://rproxy.sourceforge.net/doc/protocol/protocol.html, which explains
how to use rsync over HTTP in a clever way that reduces download
bandwidth.

Rsync is most importantly a way of defining a "signature" for a file. A
signature consists of a weak rolling checksum, to identify where blocks
may begin, and a strong checksum over blocks of fixed size, to identify
which blocks are different between two files with slightly different
signatures. However, rsync itself is not intended for use with
HTTP/WebDAV and the high latency of the Web, hence the RProxy adaptation
of rsync for the Web.
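
To make the two-checksum idea concrete, here is a minimal Python sketch of a signature built from a weak rolling checksum plus a strong per-block hash, and of a scan that finds old blocks inside a new file. It is a simplified illustration under stated assumptions, not rsync's actual code: the function names are hypothetical, the weak checksum follows the commonly described two-sum form, and SHA-256 stands in for rsync's strong checksum.

```python
import hashlib

M = 1 << 16  # both halves of the weak checksum are kept modulo 2^16


def weak_checksum(block: bytes):
    """Weak checksum as a pair (a, b); cheap to update when the window slides."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte on the left, add in_byte on the right."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def signature(data: bytes, block_size: int):
    """Map weak checksum -> list of (strong hash, block index) for fixed-size blocks."""
    sig = {}
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        entry = (hashlib.sha256(block).digest(), start // block_size)
        sig.setdefault(weak_checksum(block), []).append(entry)
    return sig


def find_matches(sig, new_data: bytes, block_size: int):
    """Return (offset in new_data, old block index) for every confirmed match."""
    matches = []
    if len(new_data) < block_size:
        return matches
    a, b = weak_checksum(new_data[:block_size])
    pos = 0
    while True:
        # The weak checksum nominates candidates; the strong hash confirms them.
        for strong, idx in sig.get((a, b), []):
            window = new_data[pos:pos + block_size]
            if hashlib.sha256(window).digest() == strong:
                matches.append((pos, idx))
                break
        if pos + block_size >= len(new_data):
            break
        a, b = roll(a, b, new_data[pos], new_data[pos + block_size], block_size)
        pos += 1
    return matches
```

Only the signature of the old file is needed to scan the new one, which is why the sender never has to hold both copies.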

Xdelta
(http://boss.cae.wisc.edu/ftp/hpux/Users/xdelta-1.1.1/xdelta-1.1.1.README.html)
is a binary diff format derived from the Rsync work: when two rsync
signatures have been compared, an xdelta diff can be generated from only
one copy of the file (unlike most diff algorithms, which require two
copies of a file to generate a diff and therefore require near-complete
version history). The RProxy specifications at sourceforge.net use a
customized 'gdiff' format; I've asked why but have not received an
answer.

Both rsync and early xdelta releases are under GPL, although more recent
releases are now under a BSD-style license.

The basic idea underlying RProxy is useful even if you don't have a
proxy server. You can set up an original-content HTTP server with
support for a single new request header, and clients that support the
feature send the request header & indicate their support. The server
can often respond with far less data than otherwise necessary. Adding
proxies that support the feature makes the system better, but these are
not necessary. Conversely, having source HTTP servers support the
feature is also not required; as long as there are proxies around that
support the feature, the client still benefits.
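
A toy Python sketch of that negotiation follows: the client advertises what it already has in one extra request header, and the server answers with either the full body or a much smaller delta. This is not the RProxy wire format. The header name X-Block-Hashes, the block size, the ("copy", n) / ("data", bytes) operations and all function names are hypothetical, and blocks are compared at fixed alignment only (no rolling checksum) to keep it short.

```python
import hashlib

BLOCK = 8                  # assumption: tiny block size, for illustration only
HEADER = "X-Block-Hashes"  # hypothetical header name, not taken from the RProxy spec


def block_hashes(data: bytes):
    """Strong hash of each fixed-size block of a body."""
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]


def build_request_headers(cached: bytes):
    """Client side: advertise the cached copy, or send nothing extra."""
    return {HEADER: ",".join(block_hashes(cached))} if cached else {}


def respond(headers: dict, current: bytes):
    """Server side: full body for plain clients, a delta for clients that sent the header."""
    if HEADER not in headers:
        return [("data", current)]
    known = {h: i for i, h in enumerate(headers[HEADER].split(","))}
    ops = []
    for i in range(0, len(current), BLOCK):
        block = current[i:i + BLOCK]
        h = hashlib.sha256(block).hexdigest()
        # Reference a block the client already holds, else ship the literal bytes.
        ops.append(("copy", known[h]) if h in known else ("data", block))
    return ops


def apply_delta(cached: bytes, ops):
    """Client side: rebuild the current body from cached blocks plus literals."""
    out = bytearray()
    for kind, arg in ops:
        out += cached[arg * BLOCK:(arg + 1) * BLOCK] if kind == "copy" else arg
    return bytes(out)
```

A client that does not send the header gets an ordinary full response, so the feature can be deployed on the server or on a proxy without breaking anyone.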

The thing that is really really required for it to work is client
support. If you're aware of client support for RProxy or interested in
implementing it, I'd love to hear about it.

lisa

> -----Original Message-----
> From: Rahul Dave [mailto:rahul@reno.cis.upenn.edu]
> Sent: Friday, January 12, 2001 12:04 PM
> To: Tony Finch
> Cc: Justin Mason; Dave Winer; bryan boyer; Adam Rifkin; FoRK@XeNT.CoM
> Subject: Re: n-way sync
>
>
> My original point was, syncing over HTTP, and included on the users
> clerver..
> CVS is great, use it all the time, needs to be adapted for transparent
> use I guess.
> All these are filesystem based, but there is no reason not to extend to
> arbitrary blobs..
> Rahul
> I got this from you:
> >
> > Justin Mason <jm@jmason.org> wrote:
> > >
> > >I'd like to see a generalised multi-site synchronisation algorithm
> > >implementation for use in a range of stuff, e.g. mail syncers, etc.
> > >It's a big problem for decentralised working, and crops up with all
> > >kinds of systems...
> >
> > Coda? rsync?
> >
> > Tony.
> > --
> > f.a.n.finch fanf@covalent.net dot@dotat.at
> > "Because all you of Earth are idiots!"
> >


