[FoRK] Amazon S3 storage service

Stephen D. Williams <sdw at lig.net>, Tue Mar 14 18:29:33 PST 2006

I firmly believe that both full filesystem semantics and ACID integrity 
constraints are red herrings, and I have seen a number of projects reach 
the same conclusion.  If you relax those semantics, using something like 
immutable distributed RAIN with opportunistic caching, and manage 
metadata separately, the system can be more efficient and far simpler to 
implement.  It's cool to have full distributed filesystems like DFS and 
IBM's newly announced system, but not everything should be mapped onto a 
filesystem API.
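To make the "immutable blobs plus separate metadata" idea concrete, here is a minimal Python sketch under stated assumptions. Blobs are write-once and named by their SHA-256 hash; plain dicts stand in for storage nodes (a hypothetical simplification of RAIN that uses plain replication rather than erasure coding); a local dict serves as the opportunistic cache; and a separate, mutable index maps names to blob keys. The class names `ImmutableBlobStore` and `MetadataIndex` are illustrative only.

```python
import hashlib


class ImmutableBlobStore:
    """Toy content-addressed store: blobs are write-once and named by hash.

    Each node is a plain dict standing in for an independent storage node
    (hypothetical simplification); the cache is a local dict.
    """

    def __init__(self, num_nodes=3, replicas=2):
        self.nodes = [dict() for _ in range(num_nodes)]
        self.replicas = replicas
        self.cache = {}

    def _placement(self, key):
        # Deterministic placement: consecutive nodes starting from a
        # position derived from the hash, so any client can find replicas.
        start = int(key[:8], 16) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        for n in self._placement(key):
            # Same key always means same bytes, so repeated writes are harmless.
            self.nodes[n].setdefault(key, data)
        return key

    def get(self, key: str) -> bytes:
        if key in self.cache:
            return self.cache[key]
        for n in self._placement(key):
            data = self.nodes[n].get(key)
            # Verify content against its name; skip missing or corrupt replicas.
            if data is not None and hashlib.sha256(data).hexdigest() == key:
                # Immutable blobs never go stale, so caching needs no invalidation.
                self.cache[key] = data
                return data
        raise KeyError(key)


class MetadataIndex:
    """Mutable name -> blob key mapping, kept separately from blob storage."""

    def __init__(self):
        self.names = {}

    def bind(self, name: str, key: str) -> None:
        self.names[name] = key

    def resolve(self, name: str) -> str:
        return self.names[name]
```

Rewriting a file becomes putting a new blob and rebinding its name in the index. Because a key always refers to the same bytes, caches never need invalidation and replicas never have to agree on update order; only the small name index needs coordination.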

Amazon's service is perfect for a range of needs, as are the Google File 
System and other simplified, yet more powerful, systems.

In database applications, even when you have ACID capabilities, there 
are a number of reasons to avoid updates, avoid "accumulators" (values 
such as running totals that are updated in place), and otherwise avoid 
many of the situations that required transactions in the first place.
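One common way to avoid an in-place accumulator (e.g. `balance = balance + x`) is to record each change as an immutable event and derive totals when they are needed. The Python sketch below is hypothetical: `Event` and `EventLog` are illustrative names, and in a real database the dict would be an append-only table with a unique constraint on `event_id`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: str   # hypothetical client-generated unique id
    account: str
    amount: int     # e.g. cents; integers avoid float rounding


class EventLog:
    """Append-only log; balances are derived, never stored as accumulators."""

    def __init__(self):
        self._events = {}  # event_id -> Event

    def append(self, event: Event) -> bool:
        # Appending the same event twice (e.g. a retried request) is a no-op,
        # so a lost acknowledgement cannot cause double-counting.
        if event.event_id in self._events:
            return False
        self._events[event.event_id] = event
        return True

    def balance(self, account: str) -> int:
        # The total is computed from the history rather than updated in place.
        return sum(e.amount for e in self._events.values() if e.account == account)
```

Since nothing is ever overwritten, concurrent writers do not contend on a single hot row, and the derived totals can be cached or recomputed freely.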


J. Andrew Rogers wrote:
> On Mar 14, 2006, at 10:08 AM, Stephen D. Williams wrote:
>> This kind of capability stands in stark contrast to the high premium 
>> for storage paid for infrastructure based on EMC and other 
>> fibre-channel SAN systems.
> What people want is a mostly indestructible networked multi-user file 
> system that just works.  Ironically, some of the most catastrophic 
> file system failures I have seen in the last few years *were* the 
> high-dollar EMC storage systems.  The problem in doing this with 
> geographically distributed clusters of commodity hardware is 
> synchronizing metadata without that becoming the bottleneck 
> somewhere.  If one can specify the capabilities required carefully, 
> this is usually not a huge issue but tends to require architectural 
> matching to the specification.
> The big money market that exists for this type of thing right now is 
> extremely robust continuity-of-business for geographically distributed 
> companies with virtually zero data loss or synchronization loss.  
> Standard transaction theory makes this a PITA, but some networks are 
> becoming fast enough that it is becoming plausible if the file store 
> architecture is tuned for the network parameters/topology.  In fact, 
> one can, in theory, reliably buffer a huge amount of virtual disk I/O 
> "in flight" on the network in a way that could be very competitive 
> with massive onsite storage arrays for performance, but I do not know 
> of anyone actually exploiting this, possibly because it requires more 
> knowledge of the network than one would normally expose to software.  
> This is the particular problem space that I have been primarily 
> interested in and have spent time researching, in large part because 
> the networks I work on are about as well-suited for this application 
> as they could be.
>> I can think of a couple applications right away.  It is threatening 
>> to distract me from current projects, but I'm resisting so far. ;-)
> Yeah, me too.  It is low-hanging fruit and the market demand is 
> under-served by existing solutions.  There is plenty of money to be 
> made there.
> I have been playing with FUSE for a while, which while cool has a few 
> limitations and quirks.  It has a lot of potential as the glue for 
> this type of thing.
> J. Andrew Rogers
> _______________________________________________
> FoRK mailing list
> http://xent.com/mailman/listinfo/fork
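The "in-flight" capacity mentioned in the quote is essentially the bandwidth-delay product: link bandwidth multiplied by round-trip time gives how much data can be outstanding on the path at once. A quick Python calculation, with the link numbers chosen purely as an illustration:

```python
def bytes_in_flight(bandwidth_bits_per_s: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: data that can be on the path at once."""
    return bandwidth_bits_per_s * rtt_s / 8


# Hypothetical link: 10 Gbit/s with a 50 ms round trip.
buffered = bytes_in_flight(10e9, 0.050)
print(f"{buffered / 1e6:.1f} MB in flight")  # prints "62.5 MB in flight"
```

Faster links and longer paths both grow this figure, which is why exploiting it depends so heavily on knowing the network's parameters and topology.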
