SOX & SOAP: OO apis to XML & XML for OO apis (part 1)

Rohit Khare
Tue, 06 Oct 1998 00:49:13 -0700


Latest flea over the transom: veoSystems finally surfaces with their XML
superglue application integration formula, in six parts:

1. Schema for Object-oriented XML (SOX) Specification
2. Core XML DTD for SOX
3. HTML Text DTD
4. Core schema for SOX
5. HTML Text schema module
6. Typedefs schema module

Connolly's official comment is to the point: "W3C is pleased."

Not goaded, but pleased. It's been lumped in with DCD and RDF for the XML
Schema wonks to pore over. At least there's something to read through now,
unlike still-secretive efforts to hammer out an XML-RPC between DataChannel,
Dave Winer, Microsoft, and assorted human shields. MS has been, ahem,
driving the Simple Object Access Protocol (SOAP) effort to catch back up
with that meme. Damn, I miss getting the good dirt firsthand. Grad school is
such the boonies some mornings...

Ponder the symmetry noted in the Subject line: We have efforts to
disassemble XML into object fragments to process structured data natively
within fine-grained application module calls. We have efforts to marshal
fine-grained intra-application calls into XML over the wire. Why not
short-circuit XML and Web protocols and just do the damn RPCs?! I sense more
than a little of that in the HTTP-NG spirit... and yet, there is value in
the 'bloated', human-editable intermediate stage: Decoupling. As usual, the
intermediate becomes a fulcrum for people to mix and match technologies from
column A and column B:

(decreasing power)
==================    =====================
SOX                   webMethods
DCD                   WebBroker
RDF                   (HTTP-NG type system)
MS Forms              by-hand
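
To make the second direction concrete (marshaling a fine-grained call into
XML over the wire), here is a minimal Python sketch. The element names
"call" and "arg" and both function names are my own assumptions for
illustration; they are not taken from SOAP, XML-RPC, or any of the products
in the table.

```python
import xml.etree.ElementTree as ET

def marshal_call(method, **args):
    """Serialize a method name and its keyword arguments to XML text.

    Hypothetical wire format: <call method="..."><arg name="...">value</arg>...</call>
    """
    call = ET.Element("call", {"method": method})
    for name, value in sorted(args.items()):
        arg = ET.SubElement(call, "arg", {"name": name})
        arg.text = str(value)
    return ET.tostring(call, encoding="unicode")

def unmarshal_call(xml_text):
    """Parse the XML text back into (method, {name: value}); values stay strings."""
    call = ET.fromstring(xml_text)
    args = {arg.get("name"): arg.text for arg in call.findall("arg")}
    return call.get("method"), args

wire = marshal_call("getQuote", symbol="MSFT", currency="USD")
method, args = unmarshal_call(wire)
```

The intermediate string in `wire` is the human-editable stage: anything that
can read or write that text can sit on either side of the call.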

[Anyway, on to SOX...]

"SOX provides an alternative to XML DTDs for modeling markup relationships
to enable more efficient software development processes for distributed
applications. SOX also provides basic intrinsic datatypes, an extensible
datatyping mechanism, content model and attribute interface inheritance, a
powerful namespace mechanism, and embedded documentation. As compared to XML
DTDs, SOX dramatically decreases the complexity of supporting interoperation
among heterogenous applications by facilitating software mapping of XML data
structures, expressing domain abstractions and common relationships directly
and explicitly, enabling reuse at the document design and the application
programming levels, and supporting the generation of common application
components."

[Naturally, like all good little soldiers, SOX documents are themselves
valid XML. SGML's separate DTD grammar has been thoroughly repudiated
in the market. And the market, snide comments aside, is the final arbiter.
Lispers can holler that XML breaks no new ground in systems philosophy, but
angle brackets are kicking parentheses' flabby buttocks.]

"a modelling language for information modeling itself"

[Wow, good meta-meta slam in the opener! First, the suck-up quote curtsying
to Dan and Tim (hey, I'd do it myself!), and then a head-spinning
circularity that knocks the reader down and makes 'em listen... for a graf
at least.]

[Wave a SOX processor over a SOX file, and ye shall grow fruitful and
multiply:]

Transformation of SOX documents will yield XML DTDs and object-oriented
language classes... documentation derived from the documentation-based
elements in SOX itself, and user interface components.
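
A toy Python sketch of that one-source, many-outputs idea follows. It is
not SOX syntax and not Veo's processor: the schema is just an element name
plus a list of child names, and `to_dtd` and `to_class` are hypothetical
helpers that derive a DTD declaration and a class from the same input.

```python
def to_dtd(name, children):
    """Emit a DTD element declaration with a fixed-sequence content model."""
    content = "(" + ", ".join(children) + ")" if children else "EMPTY"
    return f"<!ELEMENT {name} {content}>"

def to_class(name, children):
    """Build a plain class whose constructor accepts one field per child element."""
    def __init__(self, **fields):
        for child in children:
            # Fields that are not supplied default to None.
            setattr(self, child, fields.get(child))
    return type(name.capitalize(), (object,),
                {"__init__": __init__, "fields": tuple(children)})

# One (hypothetical) schema definition, two derived artifacts.
memo_children = ["to", "sender", "body"]
memo_dtd = to_dtd("memo", memo_children)
Memo = to_class("memo", memo_children)
note = Memo(to="Marty", body="hello")
```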

[Murray Maloney wrote the earliest draft proposal in Dec 1997. In January,
it was a state secret when I sat down with Marty. In May, it was
stonewalling. Still, nine months is *not bad*, especially for a company
whose strategy considered, and then demoted, standardization as a priority.
Veo is building stuff, but it still radiated a spec once the XML Schemas
frying pan got sizzling.]

The development of this schema language was begun by Murray Maloney in
December, 1997 to satisfy the need for a single XML-based language capable
of expressing sufficient information to define simple or complex data
definitions, structures and formats, universally usable names or
identifiers, and documentation. In early 1998, Matt Fuchs began implementing
a processor to derive DTDs, documentation, and programming language
interfaces from Common Business Library schemas defined by Terry Allen.
Based on experience gained building a processor that generates Java beans
from SOX documents, and also his earlier work at Disney and New York
University, Matt Fuchs invented many object-oriented extensions that make
the SOX inheritance features possible. Alex Milowski suggested the concept
of parameterized element types and a syntax for encoding this concept that
led to a further refinement of the object-oriented extensions. Terry Allen's
practical experience creating schemas fed back into an ongoing refinement.
The software development team at Veo, inspired by CTO

[Here's a roadmap -- a very well rationalized document...]


1. Schema language declaration constructs should be useful for the purpose
of modeling markup relationships.

2. SOX documents, as compared with XML DTDs, should enable more efficient
software development processes for distributed applications and dramatically
decrease the complexity of supporting interoperation among heterogenous
applications.

* SOX should enable software mapping from SOX documents into data
structures in relational databases, common programming languages, and
interface definition languages (such as Java, IDL, COM, C and C++),
resulting in usable code.
* SOX should enable reuse at the document design and the application
programming levels
* SOX should be able to express domain abstractions and common
relationships among them directly and explicitly. (e.g., subtype/supertype,
* SOX should support the generation of common application components
(marshal/unmarshal, programming data structures) directly from SOX


1. SOX shall use XML syntax and be expressed in valid instances according to
a valid XML DTD.

2. SOX and SOX documents shall be interoperable with XML software and

3. SOX shall enable a software mapping from SOX documents into an XML DTD,
and from an XML DTD into a SOX document without losing the grammatical
structure of the original DTD.

4. SOX shall provide an extensible datatyping mechanism.

5. SOX shall comply with and be compatible with applicable W3C
recommendations, IETF RFCs and ISO Standards, and Proposed Standards.

6. SOX documents shall provide support for embedded documentation.

7. SOX documents shall be human-readable.

[These are my summaries of this section]


Base element types -- based on regexes, provides tuples and columns,
parametrized types

Base data types -- {scalar, enumerated} x formats. (bin, bool, char, time,
num, str)

Documentation -- literate programming

Entity simplification -- obviated in some cases by attr/element inheritance;
desire for a general way to defer parsing of an entity for the moment (thus
restricted to atoms)

Enumeration -- on any attr or type, not just XML attributes of type NMTOKEN

Inheritance -- of content models, attr defs, attr lists, and even attr

Namespaces -- defaulting and inheritance intent seems to converge with
latest W3C draft

XML compliant -- SOX dox are XML, though SOX has even stricter syntax rules
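
To show what inheritance and enumeration buy you at the programming level,
here is a small Python sketch. `ElementType`, `extend`, and `check_attr` are
names I made up for illustration; the spec defines its own syntax and
semantics, which this does not attempt to reproduce.

```python
from dataclasses import dataclass, field

@dataclass
class ElementType:
    name: str
    content: tuple = ()                        # ordered child element names
    attrs: dict = field(default_factory=dict)  # attr name -> set of allowed values, or None for any

    def extend(self, name, extra_content=(), extra_attrs=None):
        """Derive a subtype that inherits the content model and attribute definitions."""
        merged = dict(self.attrs)
        merged.update(extra_attrs or {})
        return ElementType(name, self.content + tuple(extra_content), merged)

    def check_attr(self, attr, value):
        """True if attr is declared and value is allowed by its enumeration, if any."""
        if attr not in self.attrs:
            return False
        allowed = self.attrs[attr]
        return allowed is None or value in allowed

address = ElementType("address", ("street", "city"), {"kind": {"home", "work"}})
us_address = address.extend("us_address", ("state", "zip"),
                            {"verified": {"yes", "no"}})
```

The subtype picks up `street`, `city`, and the enumerated `kind` attribute
without restating them, which is the reuse that DTD parameter entities only
approximate.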

[Jargon watch: "schemographer," one who designs schemas. Funny, I thought it
was a typing pool with an agenda: "Nixon had a dedicated pool of
schemographers redacting the tapes day and night, 18 1/2 minutes at a
time."]

Future Work

The & connector -- in-core, few data structures *have* to be ordered in one
fixed way, for performance reasons or otherwise. Serializing to a publishing
format, though, does have its biases towards fixed sequences. The SGML hack
for reorderable data was not included here. But they're mentioning it, which
says something...
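
For anyone who has not met the SGML & connector: it requires every listed
child to appear, in any order, where the comma connector fixes the order.
A minimal Python sketch of the difference, with function names of my own
choosing and content models reduced to flat lists of child names:

```python
def matches_sequence(children, model):
    """The ',' connector: children must appear exactly in the declared order."""
    return list(children) == list(model)

def matches_and(children, model):
    """The '&' connector: each declared child appears exactly once, in any order."""
    return sorted(children) == sorted(model)

model = ["make", "model", "year"]
reordered = ["year", "make", "model"]
```

Here `matches_and(reordered, model)` holds while
`matches_sequence(reordered, model)` does not.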

Valid vs. well-formed -- they'd like some values to be loosed as merely
well-formed XML (say, a used car Description). They were tempted to propose
a wfxml content specification on a par with empty or any.
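
A rough Python sketch of what "merely well-formed" would mean for such a
field: the content has to parse as XML, but nothing is checked against a
schema. The function name is hypothetical and this only uses the standard
library parser; it says nothing about how a wfxml content specification
would actually be declared.

```python
import xml.etree.ElementTree as ET

def is_well_formed(fragment):
    """Return True if the fragment parses as a single well-formed XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False

good = "<Description>Runs <b>great</b>, one owner</Description>"
bad = "<Description>Runs <b>great, one owner</Description>"
```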

XLink and XPointer -- premature to build into SOX.

[the spec then begins the meat of its teaching. *That* I'm not going to
replicate here: just read the section on the Memo DTD-alike they build.
@@occurs="1,*" or "3,9" is quite different from RDF's occurs. @@Actually,
I'm going to have to stop here. I don't have the time tonight to do this
analysis in detail. Short version: read the source to and you'll have it made :-]
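
For the occurs values mentioned above, here is a small Python sketch of how
a "min,max" range could be read. The interpretation (lower bound, upper
bound, with * meaning unbounded) is my assumption from the two examples
"1,*" and "3,9"; check the spec before relying on it. Both function names
are hypothetical.

```python
def parse_occurs(spec):
    """Parse 'min,max' into (min, max); '*' as max is returned as None (unbounded)."""
    low, high = (part.strip() for part in spec.split(","))
    return int(low), (None if high == "*" else int(high))

def count_ok(count, spec):
    """True if an element appearing `count` times satisfies the occurs range."""
    low, high = parse_occurs(spec)
    return count >= low and (high is None or count <= high)
```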