[FoRK] Streaming databases

J.Andrew Rogers andrew at ceruleansystems.com
Sat Nov 20 10:40:37 PST 2004

On Oct 30, 2004, at 12:51 PM, Meltsner, Kenneth wrote:
> Possible concrete example supporting my near-total inability to pick 
> technologies: Stream processing databases.

There is something to this, but it is not yet clear which precise 
technology is needed.  The candidate features and capabilities compete 
with one another, and just about every conceivable combination of them 
can be found in the wild on an experimental basis.  There is a lot of 
interest in decentralized stream models as a concept, but no single 
model has been anointed yet.

While we are not a "stream processing" company, we (err, "I") actually 
developed a software framework very similar to it internally to deal 
with our geographically distributed applications.  That system has 
drawn enough interest that it is now being converted into a commercial 
product -- some very large companies and VCs have basically told us 
"the application is mildly interesting, but we are very interested in 
the technology that made it possible" (our actual business is 
operational support, not the application or the framework).  So there 
is some money and heat there.  I'm 
working on a redesign of the system right now, fixing some nuisances 
and design issues that showed up in production with the original model.

The particular model we've developed and settled on is essentially a 
transparently and pervasively decentralized event-driven Message 
Queuing model that adaptively optimizes its internal data flow and 
allows some basic stream shaping -- we've had no need to add 
sophisticated querying, though that would be a trivial extension of the 
underlying architecture.  There are quite a few things to work out: 
ultra-reliable delivery, delivery order guarantees (and compromises), 
extreme fault tolerance and recoverability, scalable traffic patterns, 
the usual distributed identity/transaction problems, etc.  I've 
noticed that most streaming database models tend to assume more 
centralization, which simplifies some of the problem space, whereas 
genuine decentralization was a hard requirement for us, and it is what 
drove us to do the original work on this in the first place.  In fact, the 
reason we originally built our own system was that there really was no 
existing system that we were aware of that had the feature set we 
needed.  The event-driven MQ model is very nice programmatically, but 
making the model work transparently on what is essentially an 
unreliable P2P fabric is a bit of a pain.  Fortunately, there has been 
a lot of work on P2P from which to borrow starting points.
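
To make the shape of the problem concrete, here is a minimal sketch of 
one receiving node in an event-driven MQ model.  It is not the system 
described above: the Node class, the per-sender sequence numbers and 
the predicate used for stream shaping are all assumptions chosen for 
illustration.  It shows only handler dispatch, a basic shaping filter, 
and in-order, duplicate-free delivery when the link underneath reorders 
and retransmits messages.

```python
from collections import defaultdict


class Node:
    """Hypothetical receiving node (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.handlers = defaultdict(list)  # topic -> [(predicate, callback)]
        self.next_seq = defaultdict(int)   # sender -> next expected sequence number
        self.pending = defaultdict(dict)   # sender -> {seq: (topic, payload)}

    def subscribe(self, topic, callback, predicate=lambda payload: True):
        # The predicate is the stream-shaping hook: events it rejects
        # never reach the callback.
        self.handlers[topic].append((predicate, callback))

    def receive(self, sender, seq, topic, payload):
        """Accept one message from an unreliable link.

        Returns the next sequence number expected from this sender, which
        the sender could treat as a cumulative acknowledgement.
        """
        if seq >= self.next_seq[sender]:
            # Hold the message; a retransmitted copy just overwrites itself.
            self.pending[sender][seq] = (topic, payload)
        # Release every held message that is now contiguous with what has
        # already been delivered, firing handlers in sequence order.
        while self.next_seq[sender] in self.pending[sender]:
            t, p = self.pending[sender].pop(self.next_seq[sender])
            self.next_seq[sender] += 1
            for predicate, callback in self.handlers[t]:
                if predicate(p):
                    callback(sender, p)
        return self.next_seq[sender]


# Usage: messages from sender "a" arrive out of order and duplicated.
seen = []
node = Node("b")
node.subscribe("temp", lambda sender, p: seen.append(p),
               predicate=lambda p: p > 20)
ack1 = node.receive("a", 1, "temp", 25)  # early arrival, held back
ack2 = node.receive("a", 0, "temp", 18)  # fills the gap; 18 is filtered out
ack3 = node.receive("a", 1, "temp", 25)  # late retransmit, ignored
```

A sender that retransmits until the acknowledgement passes a message's 
sequence number gets at-least-once delivery on the wire and exactly-once 
dispatch at the handler.  The hard parts listed above (recovery after a 
crash, identity, ordering across multiple senders) are deliberately left 
out of this sketch.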


j. andrew rogers
