EPS weighs in on encoding standards

Rohit Khare (khare)
Sat, 28 Jan 1995 02:32:58 -0800


I think Eric's on the right track; the only solution to intra-NeXTSTEP HTML
exchange is to extend the pasteboard type into several variants, and write a
standard filter service to convert between them.

Rohit

Begin forwarded message:

Date: Sat, 28 Jan 95 01:18:03 PST
From: eps@toaster.SFSU.EDU (Eric P. Scott)
To: khare@xent.caltech.edu (Rohit Khare)
Subject: Re: MAILING LIST: WebStep - a standards effort for W3-aware document
management
In-Reply-To: <3g20ke$jqt@digifix.digifix.com>
Newsgroups: comp.sys.next.announce
Organization: San Francisco State University
Reply-To: eps@cs.SFSU.EDU

Sigh, whimper.

In article <3g20ke$jqt@digifix.digifix.com> you write:
> * Define interchangeable file & pasteboard formats for
> - W3 URLs, URIs, URNs

pasteboard formats are trivial. These are ASCII text, right?

(file formats? huh?)

> - HTML Pasteboard type

messy. I can envision at least three text representations:

"Canonical HyperText Markup Language v2.0 pasteboard type"
All text is ISO Latin 1; CRLF separates lines.

"Portable HyperText Markup Language v2.0 pasteboard type"
All text is ISO Latin 1; \n separates lines.

"NeXT HyperText Markup Language v2.0 pasteboard type"
Literal 8-bit characters use NextStepEncoding;
&#nnn; still interpreted as ISO Latin 1; \n separates lines.
[This is most similar to "NeXT plain ascii pasteboard type"]
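The "Canonical" and "Portable" types differ only in the line separator, so a filter service between them only needs to rewrite line endings. Here is a minimal sketch, assuming the pasteboard data arrives as raw bytes. The function names are hypothetical and not part of any NeXTSTEP API. The NeXT variant is left out because it would also need a full NextStepEncoding-to-Latin-1 table, which is not reproduced here.

```python
def canonical_to_portable(data: bytes) -> bytes:
    """Hypothetical filter: "Canonical" HTML (CRLF lines) -> "Portable" HTML (\\n lines).

    Both types are ISO Latin 1, so no character conversion is needed;
    only the line separator changes.
    """
    return data.replace(b"\r\n", b"\n")


def portable_to_canonical(data: bytes) -> bytes:
    """Hypothetical filter: "Portable" HTML (\\n lines) -> "Canonical" HTML (CRLF lines)."""
    # Collapse any existing CRLF first so we never emit CR CR LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


if __name__ == "__main__":
    portable = b"<p>Hello\n<p>World\n"
    canonical = portable_to_canonical(portable)
    print(canonical)
    print(canonical_to_portable(canonical) == portable)
```

Normalizing to bare \n before expanding makes `portable_to_canonical` safe to run on data that is already canonical.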

Of course, if you want to turn HTML into typed streams, things
get a bit more complicated!

> * Specify the .htmd/.htmld document types

bogus. :-)

> * References to selections within NS documents

As in NXSelection? Good luck.

> * Opening URLs in compatible applications

yawn.

> * Explore encodings from NeXTSTEP & Symbol to HTML

obvious. HTML truly believes in ISO Latin 1. This means
that most of NextStepEncoding maps over, and very little of
Symbol does. However, many of the "missing" characters have
agreed-upon named entities in the SGML world, and it would seem
logical to use those. For the remainder, I'd simply extend
&#nnn; to &#nnnnn; where nnnnn is a Unicode code point. This
should take care of NEXTSTEP, Symbol, Zapf Dingbats, etc.
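The scheme above can be sketched as follows. Characters that fit in ISO Latin 1 pass through unchanged. Other characters use a named entity when one exists and fall back to `&#nnnnn;` with the decimal Unicode code point. This is an illustration under stated assumptions, not a settled standard. The input is assumed to be Unicode text already converted from NeXTSTEP or Symbol encoding (that table is not shown), with markup characters such as `&` and `<` already escaped. Python's `html.entities.codepoint2name` (the HTML 4 entity set) stands in for "agreed-upon named entities in the SGML world".

```python
from html.entities import codepoint2name  # HTML 4 entity names (stand-in assumption)


def to_latin1_html(text: str) -> str:
    """Hypothetical encoder: make text representable in ISO Latin 1 HTML.

    Assumes `text` is already Unicode and its markup characters are escaped.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 256:
            # Code points 0-255 are ISO Latin 1, so emit the character as-is.
            out.append(ch)
        elif cp in codepoint2name:
            # A named entity exists, e.g. U+03B1 -> &alpha;
            out.append(f"&{codepoint2name[cp]};")
        else:
            # Extended numeric reference: decimal Unicode code point.
            out.append(f"&#{cp};")
    return "".join(out)


if __name__ == "__main__":
    # U+FB01 (fi ligature) and U+2713 (check mark) have no HTML 4 entity.
    html = to_latin1_html("caf\u00e9 \u03b1 \ufb01 \u2713")
    print(html)
    print(html.encode("latin-1"))  # every character now fits in Latin 1
```

Because Unicode's first 256 code points are identical to ISO Latin 1, existing `&#nnn;` references keep their meaning under this extension.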

-=EPS=-
"don't thank me, I'll bill you later"