From: Eugene Leitl (eugene.leitl@lrz.uni-muenchen.de)
Date: Fri Sep 08 2000 - 00:45:59 PDT
Joseph S. Barrera III writes:
> Some questions to think about:
>
> How do you back up a 10 TB disk?
Why, on yet another 10 TB disk. Or several of them. I don't use
tape. I back up HDs to HDs at home.
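A minimal sketch of that kind of HD-to-HD mirroring (assuming rsync is
installed; the mount points are hypothetical):

    import subprocess

    # Mirror the primary disk onto the backup disk; -a preserves
    # permissions/times, --delete drops files removed from the source.
    subprocess.run(
        ["rsync", "-a", "--delete", "/mnt/primary/", "/mnt/backup/"],
        check=True,
    )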
> If you forgot where you put something
> on a 10 TB disk, how long does it take
> to do a grep on the whole thing?
If locate is insufficient (I use it a lot), I will have to use a
full-text index. It can occupy 15% of the total space; I don't care.
A global grep of a mere 60 GBytes is nonviable even now.
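A toy sketch of what such an index could look like (the mount point is
hypothetical; a real indexer would also store positions and compress
the postings):

    import os, re, collections

    def build_index(root):
        # Map each word to the set of files containing it -- a toy
        # inverted index, enough to answer "which files mention X"
        # without a linear scan.
        index = collections.defaultdict(set)
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                try:
                    with open(path, errors="ignore") as f:
                        for word in re.findall(r"\w+", f.read().lower()):
                            index[word].add(path)
                except OSError:
                    pass
        return index

    index = build_index("/mnt/archive")   # hypothetical mount point
    print(index.get("needle", set()))     # files containing "needle"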
> Let's say we're really optimistic
> and you can read your 10 TB disk
> at 100 MB/s. Then to read the whole
> thing, you need
>
> 10^7 MB / 10^2 MB/sec = 10^5 sec
>
> 10^5 sec / 3600 sec/hr = 27.8 hours
Great, a 100 MByte/s stream for >24 h sounds very good for data
acquisition.
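The arithmetic checks out; as a quick sanity check:

    disk_mb   = 10 * 10**6           # 10 TB expressed in MB
    rate_mbps = 100                  # optimistic sustained read, MB/s
    seconds   = disk_mb / rate_mbps  # 1e5 seconds
    print(seconds / 3600)            # ~27.8 hours for one linear pass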
> ... so it takes more than a day
> to do a linear scan of the disk.
So we have to index incrementally, so that no total reindex is ever
necessary, only incorporation of diffs. And clearly, fsck is a no-go,
so a journalling file system must be used.
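A minimal sketch of an incremental pass, assuming we key off mtimes
(the mount point is hypothetical):

    import os, time

    def changed_since(root, last_run):
        # Walk the tree and yield only files modified after the last
        # indexing pass, so the index is updated incrementally
        # instead of rebuilt from scratch.
        for dirpath, _, names in os.walk(root):
            for name in names:
                path = os.path.join(dirpath, name)
                try:
                    if os.path.getmtime(path) > last_run:
                        yield path
                except OSError:
                    pass

    last_run = time.time() - 24 * 3600   # e.g. the previous nightly pass
    for path in changed_since("/mnt/archive", last_run):
        pass  # feed only these files to the indexer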
> If you're grepping, or actually
> reading files from a file system,
> then it's going to take a lot longer.
Of course the real solution for this is parallelism. You can't expect
to process huge masses of data in a purely sequential fashion. Rotating
bits will have to go.
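A rough sketch of one worker per spindle, assuming the data is spread
over several mounts (the mount points are hypothetical):

    from multiprocessing import Pool
    import subprocess

    DISKS = ["/mnt/d0", "/mnt/d1", "/mnt/d2", "/mnt/d3"]  # hypothetical

    def scan(mount):
        # grep one spindle; running the disks in parallel keeps each
        # of them streaming instead of serializing the whole scan.
        out = subprocess.run(
            ["grep", "-rl", "needle", mount],
            capture_output=True, text=True,
        )
        return out.stdout.splitlines()

    if __name__ == "__main__":
        with Pool(len(DISKS)) as pool:
            for hits in pool.map(scan, DISKS):
                print("\n".join(hits))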
> Jim Gray has a talk on his website
> that goes into more detail about
> the challenges of huge disks.
> If I have time, I'll track it down.
>
> - Joe