In the beginning, Tim spake <ISINDEX>, and it was good. He brought forth unto the world (or at least that patch bounded by the 27-kilometer circumference of the CERN particle accelerator) a new and wonderful scheme for storing all the words and deeds of the multitudes, forever stealing man's innocent pride in claiming idly "Surely there's nobody out there who cares about PEZ dispensers as much as I do." Yea, in that moment we saw the flash from the future of all the possible documents great and small that would one day roam the World Wide Web.
But in the meantime, it made a darned convenient phonebook...
Indeed, the first killer app on the Web was an intranet telephone directory for CERN -- not quite as profound as, say, physics paper preprint distribution. Tim Berners-Lee and Bernd Pollermann had just hooked up a web server to a database query engine to extract employee entries in 1991 as the first "intelligent hypertext" on the Web. By this point, the Web already had its URL syntax, while Gopher had already pioneered the searchable index file. Johnny-come-lately Web, too, needed a prompt to say "This page is a searchable index. Enter a search term: __." And thus it was that an unused question mark rattling around in the URL syntax toolbox was patched on the end to demarcate a search specifier: hence http://host/phonebook?Caillau
That lone empty element <ISINDEX>, then, was the primordial step towards today's interactive Web. But the browser we see today -- "a mere multimedia 3270 terminal" -- was not dealt out of the original deck of Web technology. The modern <FORM> element, bristling with drop-downs and radio buttons and file uploads and text boxes, only dates back to Dave Raggett's HTML+ proposals from July-November 1993. But, of all the battle fronts in the Browser Wars, this remained the quietest.
There's been very little innovation here since the initial Mosaic release. Instead, the long Interregnum we're passing through has delivered a slew of media- and device-specific interaction markup languages for handhelds, voice, paper forms, and so on. This column, then, traces the haphazard evolution of User Interface Description Languages (UIDLs) for the Web to this point; in the next issue we'll detail the challenges facing W3C's XForms effort to reengineer Web forms.
Putting aside the torrent of names and historical claims, let's consider the philosophical limits of a Turing-complete hypertext. Suppose you begin at a page representing a chessboard in its opening state. Clicking on any chess piece traverses a hyperlink to another document, namely another chessboard depicting the result of that move. Is it possible, then, to claim that "You Are Here" at a single point within the near-infinite hypertext of all possible chess games?
What if the computer takes the next move, say, as a CGI script choosing the next link to be traversed? If you lose every time you start browsing this Borgesian "Never-Ending Library" of chess games, are you interacting with a book or a grand master? Is the question easier to answer for the case of Amazon.com?
While the hypertext research community has long recognized the potential for computation over a hyperweb ("What Problem documents are not linked to Solution nodes?") and within a hyperweb ("Select the link target closest to your hometown"), there was less appreciation in the '80s for computation as a hyperweb. Interactive components within compound document architectures were barely becoming practical on desktop PCs at the genesis of the Web. Remember Dynamic Data Exchange (DDE) and, later, Object Linking and Embedding (OLE) for Windows? Or Apple's publish-and-subscribe? Or even more exotic promises from Taligent and NeXTstep?
Instead of interactivity at the client -- a dream that Java applets attempted to resuscitate in 1995 -- Web tools merely submitted input strings back to server-side applications to "compute the next document" in the hypertext. Soon, the portion of the URL after the ? adopted an internal syntax for lists of argument name/value pairs, as in http://host/phonebook?LastName=Caillau&FirstName=R. Clickable imagemaps even went so far as to adopt a convention for naming the x and y coordinates of the click (though client-side imagemaps are the only significant case of server-side processing migrating back in Web history). Over time, the argument vectors grew so long they were split off as the HTTP payload of a new request method, namely POST. Though POST is nominally capable of submitting any MIME object (as for file-upload forms), the most common payload by far has remained application/x-www-form-urlencoded. This forces even the simplest Web server application to understand multiple native character sets, SGML entities and quoting, and URL character escapes to extract a single form value.
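To make the name/value convention concrete, here is a minimal sketch in Python using only the standard library's urllib.parse. The phonebook URL is the one quoted above; the "Dept" field and its value are hypothetical, chosen only to show the character escaping a server must undo.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The example URL from the text; "host" is a placeholder, not a real server.
url = "http://host/phonebook?LastName=Caillau&FirstName=R"

# Everything after the "?" is the query component.
query = urlsplit(url).query

# parse_qs undoes %XX escapes and "+" (space) and returns a list per name,
# because the same name may appear more than once in a submission.
fields = parse_qs(query)
print(fields)

# Going the other way: encode name/value pairs the way a form submission does.
# "Dept" is a hypothetical field used only to show reserved characters.
encoded = urlencode({"LastName": "Caillau", "Dept": "R&D / ECP"})
print(encoded)
```

The same encoding serves both a GET query string and the body of a POST with the application/x-www-form-urlencoded type, which is why one decoder can handle either.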
However, the power and elegance of the Web's remote invocation model (ahem) wasn't the reason it took off. HTML's new <FORM> tag became the first successful cross-platform UIDL, allowing Web developers to code once and see it rendered automatically under a litany of GUI windowing systems and widget sets -- even text-only line-mode browsers. Of course, the more common the denominator, the lower it must be. Thus, we still have pop-up menus, but not sliders or drag-and-drop, to say nothing of a dynamic purchase-order template that could 'grow' new line-items on the fly.
Before we go off designing übersolutions, though, let's trace how we got here. As the emerging HTML community coalesced on the www-talk mailing list and hammered out HTML+ (which became HTML 2.0), NCSA Mosaic ran away with the show. After all, running code is at least as important as rough consensus, and the NCSA team built off a rich-text view object for X-Windows that made it possible to roll out new <INPUT> types easily. By the January 1994 Mosaic 2.0 release, it defined the modern web "widget set" as text areas, fill-in-fields, checkboxes, radio buttons, pick-lists, and pop-up lists, not to mention a now-lost input type called 'scribble' for pen-drawn images in the Jot media type (remember Go, Eo, and Windows for Pen Computing?).
In lieu of new INPUT TYPEs, the baton of innovation passed on to scripting languages and cookies. The demand for even minimal client-side input verification and assistance inspired what became JavaScript (however minimal its connection to Java proper). Rather than looking to include spreadsheet-like declarative formulae, the trend was to Turing-complete programming languages running within the context of the web page & browser. That, in turn, highlighted the need for state management mechanisms to refer to earlier transactions. In the earliest days, HIDDEN input values were used to convey the entire state back and forth within a continuous exchange of forms, but cookies identified browsers over very long periods of time since cookies are sent with each access to a given domain. This was especially important as dialup consumers with dynamic IP address allocation obsoleted customer tracking by IP.
The net effect was the Web's reshuffling of the long-standing Model-View-Controller paradigm for UI development. Consider W3Kit, developed at the late Geometry Center of the University of Minnesota back in 1994. Paul Burchard's system migrated a desktop GUI application that tiled users' sketches in 2D space onto the Web (Kali-Jot) by using an inline GIF for the View, the browser's scribble input as Control, and hidden fields to pickle the Model. As users changed the algorithm's parameters, each action submitted the form back to the server, which repopulated a fresh "real" Model object -- the one with the mathematical algorithms -- and re-rendered the new application view to ship back to the user.
W3Kit defined the genre of split-interface Web applications that persists to this day. Web browsers became a very basic virtual machine for rendering Views (as HTML documents), with only a few Controls. The Web server maintained the Model, working around HTTP's stateless submission of form input elements by passing along the model's context by value (in hidden fields) or by reference (via cookies or HTTP login).
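The two ways of carrying the Model across stateless requests can be sketched as follows. This is an illustration in Python, not W3Kit's actual code: the model contents, the field name, the cookie name, and every function here are hypothetical.

```python
import base64
import html
import json
import secrets

# Hypothetical model for illustration: parameters of a tiling sketch.
model = {"symmetry": "p4m", "strokes": [[0, 0], [3, 4]]}


def to_hidden_field(name, state):
    """By value: serialize the whole model into a hidden INPUT element."""
    raw = json.dumps(state).encode("utf-8")
    blob = base64.urlsafe_b64encode(raw).decode("ascii")
    return '<INPUT TYPE="hidden" NAME="%s" VALUE="%s">' % (
        html.escape(name), html.escape(blob))


def from_hidden_value(blob):
    """Rebuild the model from the value the browser submits back."""
    raw = base64.urlsafe_b64decode(blob.encode("ascii"))
    return json.loads(raw.decode("utf-8"))


# By reference: the server keeps the model and hands out an opaque token.
sessions = {}


def to_cookie(state):
    """Store the model server-side and return a Set-Cookie header line."""
    token = secrets.token_hex(16)
    sessions[token] = state
    return "Set-Cookie: session=%s" % token


def from_cookie(token):
    """Look the model up again when the browser returns the token."""
    return sessions[token]


field = to_hidden_field("model", model)
header = to_cookie(model)
print(field)
print(header)
```

Passing by value keeps the server stateless but hands the entire model to the client, which can edit it before resubmitting; passing by reference keeps the model private at the cost of server-side storage.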
Since then, innovation has been limited to layout, usability, and a very few new INPUT TYPEs. Visually, forms have evolved alongside the rest of HTML with new fonts, colors, spacing, and alignment capabilities from Cascading Style Sheets (CSS). Structurally, HTML 4.0 introduced a few additional decorations to aid usability and accessibility for alternate platforms. Tags like LABEL and LEGEND help screen-readers associate controls with explanations, while OPTGROUP dividers distinguish presentation of menu segments. Furthermore, TABINDEX allowed designers to specify a default navigation order for filling in fields, and Focus and Blur event handlers fire when the user enters or exits the context of an INPUT element.
But there has been only one successful new input widget, the file-upload extension. RFC 1867 defines a new MIME type for POSTing forms, multipart/form-data. Each form field, including each uploaded file, is sent as its own MIME body part rather than being URL-encoded. Of course, this runs into the same problems that made file-PUT problematic enough to warrant the Web Distributed Authoring and Versioning (WebDAV) extensions to HTTP, including forcing the server to silently shut off clients posting larger files than it is prepared to receive. More typical is the case of <KEYGEN>, a deservedly-obscure Netscape-specific input element instructing the browser to generate keying material and public-key pairs for certificate submission.
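The multipart/form-data layout of RFC 1867, in which every form field travels as its own body part between boundary lines, can be sketched in Python as below. The encode_multipart helper is hypothetical, and the boundary, field names, and file contents are placeholder values; a real client must choose a boundary that does not occur in the data and must quote names safely, which this sketch does not attempt.

```python
def encode_multipart(fields, files, boundary):
    """Return (content_type, body) for a multipart/form-data POST.

    fields: dict of name -> text value
    files:  dict of name -> (filename, content_type, bytes)
    """
    delimiter = b"--" + boundary.encode("ascii")
    lines = []
    # Ordinary fields: one body part each, with no URL-encoding.
    for name, value in fields.items():
        lines += [
            delimiter,
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    # Uploaded files: a body part carrying a filename and its own media type.
    for name, (filename, content_type, data) in files.items():
        disposition = 'Content-Disposition: form-data; name="%s"; filename="%s"' % (
            name, filename)
        lines += [
            delimiter,
            disposition.encode("utf-8"),
            ("Content-Type: %s" % content_type).encode("ascii"),
            b"",
            data,
        ]
    # The closing delimiter carries two extra trailing hyphens.
    lines += [delimiter + b"--", b""]
    body = b"\r\n".join(lines)
    return "multipart/form-data; boundary=%s" % boundary, body


# Placeholder values for illustration only.
content_type, body = encode_multipart(
    {"submitter": "Joe Blow"},
    {"pics": ("file1.txt", "text/plain", b"hello")},
    "AaB03x",
)
print(content_type)
print(body.decode("utf-8"))
```

Because the body is built in full before it is sent, the client can state its length up front, but the server still learns how large an upload is only once the request has begun.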
The most confounding case, though, is the inability to accommodate forward-incompatible input types. Consider moving the HTTP username & password prompt from a dialog box put up by the browser into an HTML form, thus allowing the Webmaster to control its presentation and explanation. A September 1998 submission to W3C [authform] called for five new parameters: "AUTHUSER, AUTHSECRET, AUTHLOAD, and AUTHUNLOAD, and a SELECT element with the special type AUTHREALM." But:
"[Using FORM] left open the possibility that such a form might be sent to the server as a GET or POST, exposing the credentials. Since AUTHFORM will not be understood by existing software, the various INPUT elements should not be rendered as a form, and this problem does not occur. A similar case might be made for using new elements where this proposal uses types on INPUT elements."
In fact, "innovation" in web interactivity (if it can be called that) has proceeded outside of HTML Forms entirely: VoxML for voice browsing, Wireless Markup Language for cellphones, Java applets, and even custom UI markup languages based on traditional paper forms automation formats.
In other words, FORMs recapitulate the central dilemma XML began with: How can we introduce new ontologies (tagnames, input types) and know their grammar rules (document type definitions; form input validation) while easing interoperability and reuse of domain-specific ontologies, tailored to particular applications, media, or devices?
So in due course, the World Wide Web Consortium has applied the same solution pattern. Now that basic HTML 4.0 capabilities have been translated into an XML-validatable series of modules as XHTML, the MarkUp Working Group has established an XForms subgroup to propose further innovations. Their initial goals include:
- Support for handheld, television, and desktop browsers, plus printers and scanners
- Richer user interfaces to meet the needs of business, consumer and device control applications
- Decoupling data, logic and presentation
- Improved internationalization
- Support for structured form data
- Advanced forms logic
- Multiple forms per page, and pages per form
- Support for Suspend and Resume
- Seamless integration with other XML tag sets
Naturally, there are already several contending entrants for the title of 'next-generation forms markup language', so the group's first product was a requirements statement (see Table 1). As their public overview page at http://www.w3.org/MarkUp/Forms/ suggests, these requirements separate into three layers of concerns (although what follows is my own opinion and analysis, not the group's). First, the Presentation layer addresses rendering of interactors, whether as GUI widgets, voice prompts, or fax-back paper forms. Second, there is a Logical layer governing the order of form field fill-in, multipage and sequenced forms, and scripting for input validation. Finally, the Data layer adds more structure and coherency to existing text-string-only values by reusing other schemas.
| Requirement area | Excerpt |
| --- | --- |
| Interoperability and Accessibility | Separation between purpose and presentation |
| Presentation | Alignment with existing and emerging presentation mechanisms |
| Forms Logic | Field calculations |
| Interaction | Richer client/server interaction mechanisms |
| Internationalization | Support for various languages and character sets |
| Data types | Input validation |

Table 1. Excerpts from the W3C XForms group's XHTML™ Extended Forms Requirements Working Draft
It's hard to believe it's been over six years since HTML Forms were invented. Six calendar years, not Web years! But the dream hasn't died: the effort to smarten up forms is just the latest incarnation of the Software Engineering vision of automated interface design. The kinds of issues XForms faces are grounded in past SE research at each layer -- and the same kinds of traps and ratholes. Furthermore, the challenge is not just to automate human interface construction for the universe of Web-accessible devices, but to create automatable interfaces suitable for programmatic reuse on the "Semantic Web", as W3C terms it.
In the next issue, we'll recap the theoretical roots of the problem and present a detailed analysis of several entrants' divergent strategies: Formsheets as declarative extensions to interact with any tag; Forms Markup Language to generate procedural forms; and XML Forms Description Language to replicate the role of paper precisely. And of course, we'll also have to ask what makes us think radical innovation will actually be adopted by the Web community -- or Is Worse Really Better?