There's a new space race of late: a quest to build the world's smallest Web server. The current record holder is the size of a match-stick head. iPic is a mere quarter cubic centimeter, yet includes a full TCP/IP stack and HTTP server!
But what about the world's thinnest Web client? Would you believe less than .01 millimeter thick? Xerox PARC has turned an ordinary sheet of paper into a functional Web browser. They recently demonstrated Web access through a fax machine. In their demo, the client takes a regular HTML form, prints it out with gridlines and checkboxes for its input fields, faxes it to a field worker, applies Optical Character Recognition (OCR) to the filled-in form, submits the resulting HTTP transaction to the original website, and faxes back the printed results.
This little hack has its limits, of course. Consider using it at a travel website that 'helpfully' offers hundreds of airport locations in a pop-up list for the origin. And another copy of the same list for choosing the destination. Putting aside the wasted bandwidth of transmitting the world airport database (twice!), there's no way for this poor fax-back translator to recognize it should just give up and replace this pages-long pick list with a three-letter airport code input.
As I laid out last issue, today's Web FORMs are hopelessly tied to the original GUI of NCSA Mosaic for X Windows, circa 1994. That Xerox's "thinnest client" works at all is due only to the 2D graphic abstraction it shares with current GUI browsers. Stray much further from the Windows, Icons, Menus, and Pointer (WIMP) paradigm, and HTML FORMs fall over and can't get up.
For example, one of the standard canards of our Wonderful Twenty-First Century™ is that more people will soon access the Web from a cellphone than from a PC. That certainly could be true, but not by dint of compressing a WIMP interface into a four-line display!
Even more people could access it from an ordinary phone by Interactive Voice Response (IVR). But how would our robot concierge know what order to inquire for the origin and destination airport? Even more pointedly, how will it realize they cannot be the same airport?
Designing completely abstract user interfaces for the Web requires addressing three separable aspects: Presentation, Logic, and Data. Our virtual assistant needs to know how to 1) prompt the user, 2) do so in a specific order, and 3) recognize spoken or typed entries as valid airports. The first layer, Presentation, addresses rendition of interactors, whether as GUI widgets, voice prompts, or paper blanks. Second, the Logical layer governs the order of form field fill-in, multipage and sequenced forms, and scripting for input validation. Finally, the Data layer adds more structure and coherency to existing text-string-only values by applying richer schemas (types).
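To make the three-way split concrete, here is a minimal sketch in Python. It is not drawn from any XForms proposal: the field names, the tiny airport table, and both renderers are assumptions invented for illustration. The point is only that one form definition can feed a voice prompt and a paper blank without either renderer knowing about the other.

```python
from dataclasses import dataclass
from typing import Callable, List

# Data layer: a hypothetical stand-in for a real airport database.
KNOWN_AIRPORTS = {"SFO", "JFK", "LAX", "ORD"}

def is_airport(value: str) -> bool:
    """Accept only codes found in the sample table above."""
    return value.upper() in KNOWN_AIRPORTS

@dataclass
class Field:
    name: str
    label: str
    is_valid: Callable[[str], bool]

# Logic layer: the list order is the order in which fields are requested.
FORM = [
    Field("origin", "origin airport", is_airport),
    Field("destination", "destination airport", is_airport),
]

# Presentation layer: one renderer per modality.
def render_voice(field: Field) -> str:
    return f"Please say your {field.label}."

def render_paper(field: Field) -> str:
    return f"{field.label.capitalize()}: [_][_][_]"

def prompts(render: Callable[[Field], str]) -> List[str]:
    """Walk the form in its logical order using the chosen renderer."""
    return [render(field) for field in FORM]
```

Calling prompts(render_voice) and prompts(render_paper) walks the same two fields in the same order; only the surface changes.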
This kind of coordinated evolution is precisely the mission of the World Wide Web Consortium, whose XForms Working Group (WG) is tackling these interdependent issues. While XHTML™ brought existing HTML 4.0 usage into XML compliance, XForms was specifically chartered to innovate solutions to support handheld, television, and desktop browsers; deploy richer user interfaces to meet the needs of business, consumer and device control applications; improve internationalization; and decouple presentation, logic, and data. It also has more concrete engineering goals: supporting more structured data formats and multi-page forms; integrating well with other XML tag sets; and supporting suspend-and-resume of partially-filled-in forms.
Broadly construed, the XForms subgroup is tackling a long-cherished dream of Software Engineering: automatic user interface construction. Compiling an abstract functional interface into a working UI has been tackled in many ways; stepping back to understand that context will help us better evaluate specific XForms contenders. Specifically, we'll look at proposals for Formsheets, which add interactivity to any existing tag just as stylesheets add presentation hints; Forms Markup Language (FML), which generates procedural forms; and XML Forms Description Language (XFDL), which replicates the role of paper forms precisely. Whether the whole Web will be upgraded to any of these approaches is another question entirely.
The same write-once-run-anywhere rhetoric championed for the Java Virtual Machine (VM) applies to entire Web browsers as well. While the domain of discourse is pixels in the former and HTML INPUT elements in the latter, both are late entries in a long timeline of user interface VMs. X widgets, the Motif toolkit, the NeXTstep AppKit, the Macintosh Toolbox -- these are only a few examples of User Interface Management Systems (UIMS) offering an abstract interactor set to software developers. By the early '90s, UIMS research abstracted one more step above them to offer multi-toolkit interoperability. Tools like OPENSTEP or UC Irvine's Chiron-2 system bound virtual interactors to toolkit-specific peer objects on the fly. Allocate a scrolling text pane, and such meta-toolkits would bind to whatever the local window system's conventions were (left or right? Proportional or fixed? Pixel-at-a-time or line-at-a-time?).
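The binding step can be sketched in a few lines of Python. This is an illustration of the idea only, not the API of OPENSTEP or Chiron-2; the toolkit names and their conventions below are made up for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScrollPeer:
    """Toolkit-specific conventions for a scrolling text pane."""
    scrollbar_side: str  # "left" or "right"
    font_pitch: str      # "proportional" or "fixed"
    scroll_unit: str     # "pixel" or "line"

# Hypothetical toolkits and conventions, invented for this sketch.
PEERS = {
    "toolkit_a": ScrollPeer("right", "proportional", "pixel"),
    "toolkit_b": ScrollPeer("left", "fixed", "line"),
}

class ScrollingTextPane:
    """A virtual interactor: the application never names a toolkit widget."""

    def __init__(self, text: str, toolkit: str):
        self.text = text
        self.peer = PEERS[toolkit]  # bound when the pane is allocated

    def describe(self) -> str:
        p = self.peer
        return (f"{p.font_pitch} text, {p.scrollbar_side} scrollbar, "
                f"scrolls by {p.scroll_unit}")
```

The application code allocates a ScrollingTextPane and nothing more; which side the scrollbar lands on is decided by whichever peer the local environment supplies.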
Accessibility concerns drove complementary research that inferred presentation rules from actual renderings. William Gaver's SonicFinder (1989) added auditory feedback to mouse gestures in the Macintosh Finder interface. Even more ambitious, Georgia Tech's Mercator (1991-94) system automatically transformed X event streams into interactive auditory interfaces for the blind. Today, Everypath.com could also be cast in the same light, by applying an intelligent external model to interpret a stream of web pages for phones, pagers, palmtops, and television. As theoretical grounding for such inferences, CMU professor Brad Myers famously proposed seven fundamental affordances of mouse-and-keyboard direct manipulation GUIs in his Interactor Model (1990):
Web browser FORMs today provide only two: text and menus. To this day, HTML doesn't offer sliders or other continuous range pickers. At the other extreme, it already hard-codes distinctions between pop-up and pick lists -- distinctions that can't be made in voice or paper renderings! As we discussed last issue, HTML 4.0 and the latest User Interface extensions to Cascading Style Sheets (CSS3-UI) do patch up around the edges of this model. For example, form authors can now explicitly articulate the order to tab between fields; indicate LABEL text associated with a particular input control; and change a control's appearance on gaining or losing user focus as the "active field".
XForms has an opportunity to raise the level of discourse for Web UIs to reason not only about the affordances of GUI interactors, but also, in conjunction with the Web Accessibility Initiative (WAI), to accommodate many other limited-interface situations. That means designing XForms for a UI virtual machine running on everything from cellphones to TV screens -- and, crucially, invisible systems, without humans in the loop at all. Forms, after all, are becoming the default Application Programmer's Interface (API) to Internet information. Tools like webMethods' Web Interface Definition Language (WIDL) allow new applications to reuse, say, FedEx's package tracking form. Providing richer interactor specifications is like annotating a header file to aid program reuse. Such specifications can help infer the range of legal inputs and expected outputs or exceptions that could be raised.
Abstracting up one more layer brings us to a discussion of input sequence, validation, and state management. Most Web forms are embedded in a larger process: selecting the city pair is only the first step in a series in order to buy an airline ticket. Furthermore, the Web model splits some of the processing for input validation with the client, using scripting languages and the Document Object Model (DOM) APIs. That at least allows some fields -- for example, sales tax -- to be calculated on the fly.
Validating that a three letter combination is indeed an airport code, on the other hand, can only be done by constraining the choice through a massive popup list, or by sending it back to the server for verification in a multi-step Web transaction. To date, the only way to manage the state of such a partially-complete form (if we cast the entire multipage airline reservation as a single XForm) is to send the entire state of every input field back to the server every time.
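The contrast between the two kinds of validation can be sketched as follows. Everything here is an assumption for illustration: the tax rate, the field names, and the server_validate function, which merely stands in for a real HTTP round trip.

```python
from decimal import Decimal, ROUND_HALF_UP

TAX_RATE = Decimal("0.08")        # hypothetical rate
SERVER_AIRPORTS = {"SFO", "JFK"}  # hypothetical table held only by the server

def recalculate_locally(state: dict) -> dict:
    """Client-side step: derive sales tax without a round trip."""
    fare = Decimal(state["fare"])
    tax = (fare * TAX_RATE).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {**state, "sales_tax": str(tax)}

def server_validate(state: dict) -> dict:
    """Stand-in for a round trip: the whole state goes up, errors come back."""
    errors = [name for name in ("origin", "destination")
              if state.get(name) not in SERVER_AIRPORTS]
    return {**state, "errors": errors}
```

Note that server_validate receives every field, including ones it never inspects. That is the cost described above: without a shared notion of partial form state, the client has little choice but to ship everything each time.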
Beyond tracking the logical dependence between individual data elements, though, the XForms WG aims to mark up the presentation dependence of groups of data elements. This will allow browsers to present multipane, tabbed dialog boxes, or multipage forms from a single XHTML transfer. Voice browsers could use this information to disambiguate "barge-in" speech recognition when the user starts "filling in" a field before the voice prompt or menu is completed. Knowing about field subgroups could also allow interactive validation, such as sending a completed Zip Code field back to the Web server to fill out the companion City name field.
Suppose I'm ready to submit my airline reservation. I've used the XForm to construct an XML document containing groups of fields I've filled in; perhaps even a few inputs in the airline's own specific XML namespace. Can I expect to send the subpart representing my itinerary to my friend without also including the credit card portion? At the same time, the airline may expect this entire form submission to be digitally signed to ensure that we agree on the exact specifics of the ticket I'm about to buy.
These are questions that require inference of the actual data types in use. Today's HTML FORMs reduce every kind of input type to a text string. Dates, prices, addresses, names -- all illusions created by the page's author with natural (human) language. XForms will need to interoperate with other mechanisms to teach computers what various piles of XML might actually "mean." The XML Schemas effort is pinning down some concrete forms for encoding basic data types (integer, float, time, etc.) and basic grammatical rules ("Every <ADDRESS> must contain a <POSTALCODE>"). Completing abstractions such as "Reservation," though, calls upon even more sophisticated metadata management. Resource Description Framework (RDF) is the technology W3C looks to for encoding semantics such as "origin and destination airport cannot be the same."
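The three kinds of rule named above (basic types, structural grammar, and cross-field semantics) can be told apart in a small Python sketch. This is neither XML Schema nor RDF, just plain code standing in for what those technologies would declare; the field names and error messages are assumptions.

```python
from datetime import date
from decimal import Decimal, InvalidOperation
from typing import List

def check_reservation(form: dict) -> List[str]:
    """Return a list of problems; an empty list means the form passed."""
    errors = []

    # Basic types: today's forms would carry these as plain strings.
    try:
        date.fromisoformat(form["departure_date"])
    except ValueError:
        errors.append("departure_date is not a valid date")
    try:
        if Decimal(form["price"]) < 0:
            errors.append("price is negative")
    except InvalidOperation:
        errors.append("price is not a number")

    # Structural rule: every address must contain a postal code.
    if not form["address"].get("postal_code"):
        errors.append("address lacks a postal_code")

    # Semantic rule: origin and destination cannot be the same airport.
    if form["origin"] == form["destination"]:
        errors.append("origin and destination are the same")

    return errors
```

The first two checks need only the value itself, the third needs the shape of the document, and the last needs to know what a reservation is. Each step up demands more metadata than a text string can carry.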
When a form designer can use this data layer to clearly indicate the type of input required (beyond just naming the field something heuristic like "expiryDate"), then it's also clearer where to annotate various inputs as 'secure.' Just as we classify cookies into two security classes today, we can then ensure parts of forms only flow over secure or public network connections.
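As a rough sketch of what such an annotation could buy, suppose a form submission marks sensitive groups with a secure="true" attribute. That attribute, the element names, and the placeholder card number are all hypothetical, not part of any XForms draft. A few lines of Python using the standard xml.etree.ElementTree module can then produce a copy that is safe to forward to a friend or send over a public connection.

```python
import xml.etree.ElementTree as ET

# Hypothetical submission; the secure attribute is an assumption of this sketch.
RESERVATION = (
    "<reservation>"
    "<itinerary><origin>SFO</origin><destination>JFK</destination></itinerary>"
    '<payment secure="true"><cardNumber>0000000000000000</cardNumber></payment>'
    "</reservation>"
)

def public_view(xml_text: str) -> str:
    """Return the document with every top-level secure group removed."""
    root = ET.fromstring(xml_text)
    for child in list(root):  # copy the list so removal is safe while looping
        if child.get("secure") == "true":
            root.remove(child)
    return ET.tostring(root, encoding="unicode")
```

This version inspects only the direct children of the root; a real design would also have to decide how nested groups inherit the annotation, and how a digital signature over the whole form survives the removal of a part.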
This three-layered vision fulfills the Software Engineering dream of automatic user interface management. The literature relating to this dream dates back to the days of automating screen layout for text terminal access to mainframe databases and up through gesture recognition by demonstration for virtual reality environments. However clearly programmers can "see" the logical structure of the application and the role of user-supplied inputs at each stage, reducing that lattice to a clear sequence of commands and a considerably simpler end-user model of the process remains a painstaking trial-and-error proposition.
Not for lack of trying, though. The rise of WIMP GUIs in the 1980s arguably drove the commercial adoption of event-based, object-oriented programming as well as frameworks embodying both declarative and model-based UI development methodologies. First, the Mac popularized the event loop, putting the user truly in control of the program. Once rewritten as a series of event handlers -- onMouseDown, onKeyDown, and so on -- it was a short hop to the object-oriented lessons of Smalltalk-80 and thence to C++, Objective-C, Common Lisp, and the rest.
Developers using the Model-View-Controller (MVC) pattern leveraged platform-specific Control and View widgets, as best embodied by NeXT's AppKit. Using its InterfaceBuilder, developers could visually wire a program's Model methods to controls such as sliders and buttons. The act of drawing a link to a target object and the action to be performed upon it declared a relationship that was stored along with layout geometry into a UI layout file. Separating the "program" and its UI thusly, even end-users could go back and edit the GUI of published applications (to localize it, for instance, or add keyboard shortcuts). Advanced research tools of this ilk could even apply externalized UI style guidelines and constraint-based layout engines to automatically synthesize, evaluate, and select dialog designs.
The Common Object Request Broker Architecture (CORBA) was supposed to be the revolution after OOP languages. Its Interface Definition Language (IDL) abstracted away the details specific to particular OO languages, operating systems, processors, and network topologies. The new dream was to cleave the programmers' and UI designers' lives at that interface. Suitably annotated IDLs would not only indicate how to setOriginAirport(), but also that it was to be invoked before setDestinationAirport() and that the parameter itself was a typed IATACode string three characters long.
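What such annotations would have to express can be approximated in ordinary code. The Python sketch below is not IDL and not CORBA; the class layout and the three-uppercase-letter format check are assumptions, and the method names are snake_case renderings of the setOriginAirport() and setDestinationAirport() calls mentioned above.

```python
import re

class IATACode(str):
    """A typed airport code: exactly three uppercase letters (assumed format)."""

    def __new__(cls, value: str):
        if not re.fullmatch(r"[A-Z]{3}", value):
            raise ValueError(f"not a three-letter airport code: {value!r}")
        return super().__new__(cls, value)

class Reservation:
    """Enforces the call order an annotated interface would only declare."""

    def __init__(self):
        self.origin = None
        self.destination = None

    def set_origin_airport(self, code: str) -> None:
        self.origin = IATACode(code)

    def set_destination_airport(self, code: str) -> None:
        # Ordering constraint: the origin must already be set.
        if self.origin is None:
            raise RuntimeError("set_origin_airport() must be called first")
        self.destination = IATACode(code)
```

Here the ordering and typing rules are buried in procedural code, discoverable only by running it. The promise of an annotated interface definition is that a UI generator, or a voice browser, could read the same rules without executing anything.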
In the early '90s, Pedro Szekely's group at USC's Information Sciences Institute built MASTERMIND along these lines. It combined the utility of prior dialog design tools with annotated interface definitions to automatically synthesize graphical input and presentation for a given application. As they described it:
In the model-based paradigm, developers create a declarative model that describes the tasks that users are expected to accomplish with a system, the functional capabilities of a system, the style and requirements of the interface, the characteristics and preferences of the users, and the I/O techniques supported by the delivery platform. Based on the model, a much smaller procedural program then determines the behavior of the system.
There are several advantages to this approach. The declarative model is a common representation that tools can reason about, enabling the construction of tools that automate various aspects of interface design, that assist system builders in the creation of the model, that automatically provide context sensitive help and other run-time assistance to users.
If the NeXTstep AppKit used by Tim Berners-Lee to develop the first Web browser could be said to underlie today's HTML FORM tag, Mastermind's complaints also ought to ring true to today's Web authors:
Most applications have interface requirements that go far beyond the menus and dialogue boxes that can be constructed using interface builders:
- Data with complex structure
- Heterogeneous data
- Variable amounts of data
- Time varying data
A musical notation editor is a fine example of all four objections: the complex visual form of a staff and its unique fonts; the different kinds of notes and their interrelationships (e.g. chords); the several kinds of data formats involved, along with the need to incrementally view a few bars out of a whole database; and the synchronization of the melody as symbols, commands to the synthesizer, and the output waveform. It's all quite beyond the range of even a fifth-generation Web browser, to say nothing of the additional assistance model-based UI tools offer in automating Undo, Help, and Internationalization facilities.
Not to say that XForms are intended to compose symphonies inside a Web browser! There are several candidate technologies for the WG to choose amongst, none of which have the expressive power to tackle that musical UI problem. We can still use it as a guide to understanding the various approaches on offer.
With a custom XML tagset for musical scores, a separate Extensible Stylesheet Language Transformations (XSLT) stylesheet could render a graphical interface, while Formsheets would indicate which elements of the score were editable and would submit collected score changes back to the server.
If that seems too abstract, both XML Forms Description Language (XFDL, by PureEdge.com) and XML Forms Architecture (XFA, by JetForm) start with a detailed visual representation mirroring paper forms and add sophisticated formulas, logic, and digital signature security.
Isn't it convenient that the World Wide Web Consortium is doing all this heavy thinking for us? Perhaps -- if the XForms WG has goals clear enough to ever converge on a solution. True, they're not going down the rathole of "representing GUIs in XML," as XML User Interface Language (XUL) does for Mozilla's own look & feel. But pursuing the dream of cleanly separating Presentation, Logic, and Data across the wide, barren plateau Software Engineering research has already mapped out could be equally futile.
One of the only lessons a degree in Economics is good for is that there are no $20 bills lying on the sidewalk. If model-based user interfaces were such a great idea, we'd already be using them. The XForms WG is struggling for clarity because it is trying to standardize and innovate simultaneously, a difficult balance indeed for an organization chartered to "Lead the Evolution of the Web."
And evolution proceeds by fits and starts -- the sheer list of yet other W3C technologies XForms must account for! XHTML Modularization, XML Schemas, Web Accessibility Initiative, Internationalization, Style Sheets, Synchronized Multimedia, Scalable Vector Graphics, Document Object Model, Common scripting languages (ECMAScript) -- it's hard enough to keep score on the home game even before the committee tackles newer mandates, such as synchronizing form data among multiple devices or digital signature requirements.
Ultimately, the power to migrate to a new forms language is in Web authors' hands -- and if hand-coding a new-fangled XForm requires learning even a fraction of all these technologies simultaneously, it can't get anywhere. All the browser support in the world isn't going to make some of these approaches any more legible to an HTML hacker.
It's hard to believe that technology so central to the Web's success could be so static. Jim Whitehead recently presented an analysis of how the Web outstripped other hypertext tools in the early '90s. Its success was governed by the Network Effect: the increase in utility of the whole system with every new reader and publisher who chose to use HTTP, HTML, and URLs. Open publishing, decentralized control, anonymous surfing; it would appear that freedom (bordering on anarchy) was the Web's fundamental difference compared to HyperCard or Xanadu. Instead, Jim argued "Once Gopher and the Web came into direct contact, the richer content of the Web was far more capable of generating network effects than the more strictly controlled, yet more simple Gopher user interface."
That is to say, the Web won because the dominant GUI browsing idiom controlled the user experience so thoroughly that authors could expect to use the same fonts, layout, color, and input widgets across every platform from workstation to wristwatch. Mosaic was surely richer than Gopher, but it has proven just as tight a straitjacket around user conceptions of how to interact with this medium.
Sounds to me like an opening for the Next Big Thing.