
From: Dan Kohn
Date: Mon Mar 13 2000 - 01:07:08 PST

The technology battle I'm fighting is to find outsourceable tasks that are
simple enough that English-speaking high school graduates with good tools
and a little training can perform them well, but not so simple that they can
be more cost-effectively performed by computer. Thankfully, computers show
no sign of approaching human capabilities in image and voice recognition,
and the general use of semantic context to interpret meaning. Eventually, I
believe computers will get there, but probably not in my lifetime.

But for a different view, check out Ray Kurzweil's entertaining book on the
coming artificial intelligence revolution, The Age of Spiritual Machines

As for XML vs. ASCII, I think Michael Hart has done a disservice to the open
source ebook community by insisting solely on ASCII. Specifically, I'm
concerned that his use of ALL CAPS (and lack of any authoring guidelines
regarding encoding structure) eliminates the simplest HTML distinction
between Emphasis and Strong, let alone the reasonable application of

Yes, it was hardly clear in 1971 that XML would become the One True Way(tm),
but INTIME and GML were available and (with a simple DTD) would have been
perfect. Requiring the latest beta of a web browser is a complete red
herring, because all files could have been (and now will be) made available
as ASCII as well. It's just that the source would have included structure, for
those who wanted it. I think it's unreasonable to put a generic markup
system in the same category as an OS or program, given that the former can
be programmatically reduced to plain vanilla ASCII in seconds.

Kragen could probably write a Perl program to do it in under 10 minutes (and
yes, Perl wasn't around in 1971 either, but COBOL was). It's always easy to
remove information (editing out tags), but (until we have strong AI, see
above) it's impossible to programmatically (and reliably!) add structure
into documents that are lacking it.
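
To make the "easy direction" concrete, here is a minimal sketch in Python
(rather than Perl) of reducing marked-up text to plain ASCII. The names
TagStripper and markup_to_ascii, the sample markup, and the choice to render
emphasized text as ALL CAPS are assumptions for illustration only, not
anything Project Gutenberg actually uses.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collect text content and discard all tags (hypothetical helper)."""

    def __init__(self, upper_tags=("em", "strong")):
        super().__init__(convert_charrefs=True)
        # Assumption: text inside these tags is rendered as ALL CAPS.
        self.upper_tags = set(upper_tags)
        self.depth = 0   # nesting depth inside emphasized tags
        self.parts = []  # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.upper_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.upper_tags and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        self.parts.append(data.upper() if self.depth else data)


def markup_to_ascii(source: str) -> str:
    """Strip tags and replace any non-ASCII character with '?'."""
    stripper = TagStripper()
    stripper.feed(source)
    stripper.close()
    text = "".join(stripper.parts)
    return text.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    sample = "<p>It was <em>not</em> a <strong>dark</strong> night.</p>"
    print(markup_to_ascii(sample))
```

Note that both the <em> and <strong> spans come out as indistinguishable ALL
CAPS, so nothing in the ASCII output lets a program recover which was which.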

Thus, I presume we would do the data entry as XML and submit both XML and
ASCII versions simultaneously.

                - dan

Daniel Kohn
tel:+1-425-602-6222  fax:+1-425-602-6223 

-----Original Message-----
From: Adam L. Beberg
Sent: Thursday, 2000-03-09 21:36
To: Dan Kohn
Cc: Fork (E-mail)
Subject: Re:

On Thu, 9 Mar 2000, Dan Kohn wrote:

> I'd appreciate getting your comments on
>

I wonder how long your window of opportunity is. OCR gets better every day, and voice is doing the same. Better-Than-Human(tm) OCR is not far off with the dictionary and grammar checking. IBM has some smoking voice recognition stuff (in their labs) that makes my hearing seem bad - which it is...

Unfortunately I see a very similar situation with many things. Can the undeveloped world catch up at all before they are obsolete, and can the human race?

Since I'm never gonna finish my bloody essay, here is what I have so far, definitely worth the ponder even in its unfinished form. It's pre .../out-there.html - I'm far less optimistic now :)

> "Also, all spare "cycles" would be applied to entering Project
> Gutenberg texts, which I think is a great way to demonstrate the
> power of the approach."

As the one who made sure PG got that 8K$ from rc5-56 [insert long story here] all I can say is "Whoohoo" :) But please stick to ASCII until _all_ books are entered - the XML project is a "second pass" effort. The value to humanity from paper->ASCII is far, far greater than ASCII->XML. ASCII is still more universal, and doesn't look stupid without the latest beta of a web browser.

- Adam L. Beberg
  The Cosm Project


This archive was generated by hypermail 2b29 : Mon Mar 13 2000 - 01:20:04 PST