Re: eBay's hot potato toss

Kragen Sitaker (kragen@pobox.com)
Fri, 18 Jun 1999 13:58:16 -0400 (EDT)


Someone wrote:
> Last I checked, Oracle has (very expensive) redundancy software, so when
> one server goes kaput the other takes its place. They can do this with n
> servers, in a lineup (where n-1 are backups waiting to take over if the
> guy in front fails) or in combinations of active and backup servers, with
> either full redundancy or load spread across db's (kinda like raid).
>
> Yes, it's expensive. Yes, they'd need a whole extra dba or two. No
> excuse for a publicly traded company not to invest the $$ protecting their
> bread and butter...

Redundancy alone generally does not increase resilience in the face of
software failure. As noted in the message to which you were replying,
replicating a corrupt database gives you multiple copies of a corrupt
database.

I would not be willing to publicly speculate, as you just did, that
eBay did not replicate their database on multiple servers. My guess is
that they did, and it didn't help.

One practice that *does* increase resilience in the face of software
failure is diversity -- maintaining copies of all your data on
different versions of the same database server, or even different
brands of database server. The problem with diversity is that it is
*extremely* expensive, and it does no good unless you have the kind
of database failure eBay just had -- something that probably happens,
on average, once every 50-100 years per Oracle installation.
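
To make the redundancy-versus-diversity distinction concrete, here is a
toy sketch in Python. Everything in it is hypothetical: engine_a stands
in for a database server with a latent software bug, engine_b for an
independently written server without that bug, and the "tables" are
plain dictionaries. It models no real product and is only meant to
show why identical replicas cannot notice a bug they all share.

```python
from typing import Callable, Dict, List, Tuple

Table = Dict[str, int]
Engine = Callable[[Table, str, int], None]


def engine_a(table: Table, key: str, amount: int) -> None:
    """Hypothetical engine with a latent bug: a negative amount corrupts the row."""
    if amount < 0:
        table[key] = -1  # simulated corruption caused by the software bug
    else:
        table[key] = table.get(key, 0) + amount


def engine_b(table: Table, key: str, amount: int) -> None:
    """Hypothetical independent implementation that lacks that particular bug."""
    table[key] = table.get(key, 0) + amount


def run(engines: List[Engine], writes: List[Tuple[str, int]]) -> List[Table]:
    """Apply every write to every copy, each copy managed by its own engine."""
    tables: List[Table] = [{} for _ in engines]
    for key, amount in writes:
        for engine, table in zip(engines, tables):
            engine(table, key, amount)
    return tables


def copies_agree(tables: List[Table]) -> bool:
    """True when all copies hold identical data."""
    return all(t == tables[0] for t in tables[1:])


writes = [("bid", 10), ("bid", -3)]

# Redundancy: two copies on the same software. Both hit the same bug,
# so the copies agree with each other and the corruption goes unnoticed.
redundant = run([engine_a, engine_a], writes)

# Diversity: two copies on different software. Only one hits the bug,
# so the copies disagree and the corruption is at least detectable.
diverse = run([engine_a, engine_b], writes)

print("redundant copies agree:", copies_agree(redundant))
print("diverse copies agree:  ", copies_agree(diverse))
```

Note that the diverse pair only tells you that something is wrong. It
does not tell you which copy to trust; deciding that would take a third
independent copy or some outside check, which is part of why diversity
costs so much.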

[DISCLAIMER: I am not a DBA. I haven't been a DBA since 1996, I wasn't
a mission-critical DBA then, and I have never been an Oracle DBA. I
have never set up systems using diversity to guard against software
failures.]

-- 
<kragen@pobox.com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
According to my medieval text in the seventh century a finalizer raised a
dead object named Gorth who infected every computer in Cappidocia ending
Roman rule in the region.  -- Charles Fiterman on gclist@iecc.com