Separating Heads and Tails, or Code and Data

Rohit Khare (rohit@bordeaux.ICS.uci.edu)
Wed, 18 Feb 1998 03:29:51 -0800


[Rohit's section]

Glen Ropel quoth:

> Both data and code are fossilized behavior. One cannot
> specify behavior in a static medium any more than one can
> identify data in a dynamic medium.
>
> Whether one is distinguishing between data and code by saying
> that "Code is expensive and data is ephemeral" or "... as platform
> half-lives collapse, externalized data lasts longer and longer by
> comparison", it's still a distinction between code and data.
> And this distinction is false.

and Patrick Logan, in chorus:

> Ideally, I agree. The lack of a distinction is the asymptote.

I think we'd all agree that today, in practice, there is a vast gulf between
data and behavior representations. The difference in philosophy is that we
think we think the orthogonalization is essential, not accidental (reread that
sentence: that's not a duplicate imperative).

Many algorithms can trade off space for time, but we'll put aside the issues
of working store as 'data' and concentrate on the 'externalizable' state of an
application object. An auto part record has many negotiable particulars in it,
but the choice of what data to include or exclude in its state description is
more restricted than the choice of how many factors to leave on the heap in
differential cryptanalysis.

Application-domain modeling inherently has to separate the state of an object
and the actions upon it. The actions might be purely functional, allowing the
behavior to be enumerated consequent to the data; the data might limit the
range of actions -- there are many interactions between the two that vitalize
the object in the context of a given application. But there is something
essential, historic, about the continuous record of the "state" of said
object.

The crux of the matter is that we think that those descriptions are often more
stable than the methods impinging on that state. The auto part record changes
more slowly than the inventory, accounting, security, tax, and hazardous-waste
tracking subsystems do. The more methods, the lower the mean time to failure
(new version) for the 'code'; and yet, if we can easily extend the
externalizable state (a new haz-mat bit, say), then the state representation
does *not* fail (get re-versioned) as often. The auto-part's web-page gets
richer and richer as more meanings are aggregated together (especially easy in
XML; especially hard with CORBA/DCOM serialization).
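To make the haz-mat example concrete, here is a minimal sketch in Python using the standard library's xml.etree.ElementTree. The element names (PART, NUMBER, DESCRIPTION, HAZMAT) and the part number are hypothetical. They are made up for illustration and do not come from any real parts schema. The point is only that a consumer written against the first version of the record keeps working, unchanged, after a later subsystem appends an element it has never heard of.

```python
import xml.etree.ElementTree as ET

# Hypothetical auto-part record, version 1: only what the inventory
# subsystem needed when the schema was first written.
PART_V1 = """<PART>
  <NUMBER>AX-1138</NUMBER>
  <DESCRIPTION>Brake caliper</DESCRIPTION>
</PART>"""

# Hypothetical version 2: a hazardous-waste tracking subsystem appends
# its own element. Nothing that was already there changes.
PART_V2 = """<PART>
  <NUMBER>AX-1138</NUMBER>
  <DESCRIPTION>Brake caliper</DESCRIPTION>
  <HAZMAT>true</HAZMAT>
</PART>"""


def inventory_view(xml_text):
    """Old consumer: reads only the fields it knows and ignores the rest."""
    part = ET.fromstring(xml_text)
    return {
        "number": part.findtext("NUMBER"),
        "description": part.findtext("DESCRIPTION"),
    }


def hazmat_view(xml_text):
    """New consumer: treats a missing HAZMAT element as 'not hazardous'."""
    part = ET.fromstring(xml_text)
    return part.findtext("HAZMAT", default="false") == "true"


# The inventory code is unchanged, yet it reads both versions identically.
print(inventory_view(PART_V1) == inventory_view(PART_V2))  # True
print(hazmat_view(PART_V1), hazmat_view(PART_V2))  # False True
```

What does the work here is that each consumer looks up the elements it needs by name and skips everything else. That is the property the paragraph above finds easy in XML and hard with CORBA/DCOM serialization.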

The only essential (in the Brooks sense) difference we can cleave to is the
interorganizational intent. The state description of an artifact is intended
to be understandable by any observer. The behavior is intra-organizational by
default: behavior is only standardized by exceptional law.

Attack it as circular logic, if you please, but we see Data as precisely the
description which any two observers can agree on; that's why you're
externalizing it in the first place.

===============================================================================

That said, we can return to our advocacy of XML already in progress...

[Adam's section]

Doug Lea quoth:
> > We would like to see more use of XML to capture the thing itself,
> > not just the interface to the program which manipulates the thing.
> In the general case, `capturing the thing itself' requires behavior
> description. Right?

Our philosophy is that behavior cannot be totally known to the outside
observer -- that it is too difficult to completely describe a complex
object's behavior outside of some constraints on inputs and outputs.

What we think will happen is that developers will take a scientific
approach to "learning" behavior: since all you can do is observe the
results of the behavior -- the artifacts it leaves behind as snapshots
-- then you'll have to draw conclusions as to what kind of behavior
would generate such results based on those snapshots.

The only way to completely know an object's behavior is either to prove it
conforms to a rigorous specification [very hard for complex objects] or to
look at the source code and step through it operationally for all
possible sets of inputs and outputs.

If you don't have a rigorous proof, and you cannot look at the code
behind an object's interface curtain, then you have to look at the
outputs produced by that object, and draw conclusions from there.

Doug Lea, further:
> I remain clueless about how this is supposed to
> work. Purely declarative approaches to behavior description are
> challenging at best. (As far as I can tell, the remarks that Dennis
> deChampeaux (mostly) and I wrote about this 6+ years ago --
> http://gee.cs.oswego.edu/dl/oosdw3/ch5.html -- still basically hold.)
> More concretely, suppose I want a description of:
> A Water tank
> A Car alarm system
> A Web server
> A Bank
> A Telecom switch
> ...
> How would I go about it in an XML/.../... -based object system?

You would describe the artifacts left by them -- in the form of data
stored or exchanged (usually attribute-value pairs):
A Water tank -- temperature, volume, contents of tank
A Car alarm system -- events dealt with and responses
A Web server -- log of requests and server responses
A Bank -- debits and credits for transactions on accounts
A Telecom switch -- list of switching decisions made and bits moved
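As one hedged illustration of the Bank line, here is a sketch of a ledger recorded as attribute-value pairs, plus a few lines of Python that replay it into account balances. The LEDGER/TXN vocabulary, the account names and the amounts are all hypothetical. Notice what the sketch does and does not capture. Every account's state can be recovered from the artifacts, but nothing in the document says what the bank would do with an overdraft.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical ledger: the artifacts a bank leaves behind, as attribute-value
# pairs. Amounts are integer cents, which avoids floating-point rounding.
LEDGER = """<LEDGER>
  <TXN ACCOUNT="alice" TYPE="credit" AMOUNT="10000"/>
  <TXN ACCOUNT="bob"   TYPE="credit" AMOUNT="5000"/>
  <TXN ACCOUNT="alice" TYPE="debit"  AMOUNT="2500"/>
  <TXN ACCOUNT="bob"   TYPE="debit"  AMOUNT="1200"/>
</LEDGER>"""


def balances(ledger_xml):
    """Replay the recorded transactions to recover each account's state."""
    totals = defaultdict(int)
    for txn in ET.fromstring(ledger_xml).iter("TXN"):
        amount = int(txn.get("AMOUNT"))
        kind = txn.get("TYPE")
        if kind == "credit":
            totals[txn.get("ACCOUNT")] += amount
        elif kind == "debit":
            totals[txn.get("ACCOUNT")] -= amount
        else:
            raise ValueError(f"unknown transaction type: {kind!r}")
    return dict(totals)


print(balances(LEDGER))  # {'alice': 7500, 'bob': 3800}
```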

Of course, this doesn't address the conveying of the behavior of these
systems. If we take a more data-centric approach, we can instead try to
ascertain behavior from our observations of the system -- the way
physicists, chemists, etc., do. This seems a much more promising approach.
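Here is a toy version of that observational approach, with its assumptions stated up front. The snapshots below are invented readings of a water tank's volume. The "theory" tested against them is simply that the tank fills at a constant rate. We fit that hypothesis by ordinary least squares and then check how far the observations stray from it, roughly what an experimentalist would do before trusting a model. Nothing here recovers the tank's actual control logic. It only proposes a behavior consistent with the data.

```python
# Hypothetical snapshots of a water tank: (minutes elapsed, litres in tank).
# The inflow/outflow logic that produced them is treated as unknown.
SNAPSHOTS = [(0, 100.0), (5, 124.0), (10, 151.0), (15, 175.0), (20, 199.0)]


def fit_line(points):
    """Ordinary least-squares fit of the hypothesis y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_residual(points, slope, intercept):
    """Largest gap between an observation and the fitted hypothesis."""
    return max(abs(y - (slope * x + intercept)) for x, y in points)


slope, intercept = fit_line(SNAPSHOTS)
print(f"fill rate ~ {slope:.2f} L/min, starting volume ~ {intercept:.1f} L")
# fill rate ~ 4.98 L/min, starting volume ~ 100.0 L
print(f"worst misfit: {max_residual(SNAPSHOTS, slope, intercept):.1f} L")
# worst misfit: 1.2 L
```

A small worst misfit supports the constant-rate hypothesis. A large one would send us back to propose a different model, such as a valve that switches on and off.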

[Rohit speaking]

I have to offer my kudos to Doug for introducing a well-written resource to
ground this discussion by citing the book chapter. The truth is that in many
ways, there *isn't* anything whiz-bang about an XML/.../... object system. I
wouldn't quibble over the book's description of a door's state. I'm just
saying I'd prefer to see <FRAME><DOOR ANGLE="37"/></FRAME> because that snippet
shows:

* building a data schema, a task comparable to designing a class hierarchy.
The data architect has to invest additional time and thought to say that his
House DTD needs to have FRAME with possible subelements DOOR and WINDOW, much
as a CAD designer might have to for the behaviors thereof.
* allowing well-formedness to add value to a common artifact. Later, a house
visualization on top of the door-actuator can add <COLOR> elements to any part
of the house for its own purpose -- without interfering with the
door-actuator's ability to parse the door angle.
* using a human-editable format. If my home automation system were described
in such files, it's more conceivable that I could debug it in EMACS xml-mode,
write scripts to shut all the doors, and so on.
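Here is a rough Python sketch of those three points working together, using xml.etree.ElementTree from the standard library. The vocabulary below (HOUSE, FRAME, DOOR, WINDOW, COLOR and the ANGLE attribute) is hypothetical and stands in for whatever a real House DTD would define. The door-actuator's reader only looks at DOOR and ANGLE. So the COLOR annotation added by another tool neither breaks that reader nor gets clobbered when a script shuts every door.

```python
import xml.etree.ElementTree as ET

# Hypothetical house description. The door-actuator wrote FRAME/DOOR;
# a visualization tool later added a COLOR element for its own purposes.
HOUSE = """<HOUSE>
  <FRAME>
    <DOOR ANGLE="37"><COLOR>red</COLOR></DOOR>
    <WINDOW/>
  </FRAME>
  <FRAME>
    <DOOR ANGLE="90"/>
  </FRAME>
</HOUSE>"""


def door_angles(root):
    """What the door-actuator cares about; COLOR elements are never consulted."""
    return [int(door.get("ANGLE")) for door in root.iter("DOOR")]


def shut_all_doors(root):
    """A 'script to shut all the doors': set every DOOR's ANGLE to 0."""
    for door in root.iter("DOOR"):
        door.set("ANGLE", "0")


root = ET.fromstring(HOUSE)
print(door_angles(root))  # [37, 90]
shut_all_doors(root)
print(door_angles(root))  # [0, 0]
# The visualization's annotation survives untouched.
print(root.find("FRAME/DOOR/COLOR").text)  # red
```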

These are engineering tradeoffs of marshalling formats. That's why we argue
for XML over XDR or Q in our paper:

http://www.cs.caltech.edu/~adam/papers/xml/xml-for-archiving.html

Our opinion is that data can be observable without the code, left as a
document representing a checkpoint in the computation.

Think of a mailing list. Each post is an observable checkpoint in that
mailing list, which is both stored and transferred as a document. It
represents a statically viewable slice of an otherwise fluid, dynamic
discussion -- the mailing list "object."

> > Finally, a word or two on the hype behind XML. There's a lot of it. We
> > contribute to it as often as we can. We think it's important to get the
> > word out to the 97%.
> Whoa. Let's look at that statement a little; it's a wee bit pretentious.
> "Get the word out to the 97%". I'm sorry, but that is INSULTING.

Why is it insulting to get the XML message out to the 97% who have not
heard of XML?

> I'm finishing a Masters degree in this stuff and I've been working
> professionally as a programmer for almost 7 years now and I'm lumped in
> the unwashed 97%. My opinion is just as relevant as the guy who just
> got his first AOL disk just because I'm not in the object church.

When did I say your opinion wasn't relevant? All I meant was, XML can
be useful to some people, so we should get the word out about it.
Remember, I believe in the right tool for the job. The more tools you
know about, the more tools you have available when it comes time to pick
the right tool for the job.

> I have news for you, you have got to work on your evangelizing skills.
> You aren't informing. You are ANNOYING. Professionals in the field
> are not going to LISTEN to you with this kind of attitude.

What attitude? We like XML, we want other people to like XML.
What's wrong with that?

Remember, too, Dave, that this is a mailing list debate, not a refereed paper.
And if you understand the Zen of the 97%, you'll recall that no one "is" or
"isn't" in the 3%. It's a point of view on an issue, not a call for the Elect.
The truth is 97% of the world hasn't even heard of SGML or XML; 97% of those
who have heard haven't studied it; 97% of those who have studied it haven't
worried about how to apply it to systems design (not just traditional
"documents"); and 97% of those who have considered its role in distributed
computing state capture haven't worried about the fat of its concrete syntax;
and 97% of the proposals I've seen to slim down the syntax and compress XML
streams don't worry about temporal layout of streamed XML parse trees
(breadth-first for page layout; depth-first for dictionary delivery...)
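That last distinction is easier to see than to describe. The sketch below uses a made-up DICT/ENTRY/WORD/DEF document and lists the order in which elements would arrive under a depth-first versus a breadth-first serialization of the same parse tree. It shows ordering only. It is not a streaming or compression scheme, and the vocabulary is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical tiny dictionary document.
DOC = """<DICT>
  <ENTRY><WORD>code</WORD><DEF>fossilized behavior</DEF></ENTRY>
  <ENTRY><WORD>data</WORD><DEF>what observers agree on</DEF></ENTRY>
</DICT>"""


def depth_first(root):
    """Document (pre-)order: each ENTRY arrives complete before the next."""
    return [el.tag for el in root.iter()]


def breadth_first(root):
    """Level by level: the whole skeleton arrives before any leaf content."""
    order, queue = [], deque([root])
    while queue:
        el = queue.popleft()
        order.append(el.tag)
        queue.extend(el)  # iterating an Element yields its direct children
    return order


root = ET.fromstring(DOC)
print(depth_first(root))
# ['DICT', 'ENTRY', 'WORD', 'DEF', 'ENTRY', 'WORD', 'DEF']
print(breadth_first(root))
# ['DICT', 'ENTRY', 'ENTRY', 'WORD', 'DEF', 'WORD', 'DEF']
```

Breadth-first order delivers the overall structure first, which suits laying out a page before its content arrives. Depth-first order completes one entry before starting the next, which suits a reader who wants one definition at a time.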

And I have a hint: there's no Swami Connolly or Guru Bosak or Maharishi
Berners-Lee sitting up on the peak offering enlightenment. Some of us are
gonna have to knock stones together until something catches fire. Or gets
flamed to death :-)

Rohit
(Adam)