An extensible model for real-time XML processing
Dan Rosen
Find


Abstract
One of the uniquely advantageous aspects of XML is the separation of XML syntax from the semantics of XML documents. Standards such as XHTML1, MathML2, XSLT3, XLink4 and others all rely on this separation. The XML specification intentionally leaves this open, so that any language described in XML (thus inheriting the common XML syntax) may have its own unique semantics, behavior, or functionality. Most XML applications, such as those mentioned, require special processing: for example, XHTML elements are rendered, MathML expressions are rendered or evaluated, XSLT transformations are executed, and XLinks are reified or embedded. This processing is performed by software programs conformant to each particular XML application.
With the recent growth in the number of XML applications and conformant software, many common problems have begun to present themselves. Most software applications (web browsers, for example, which are capable of processing XHTML) use home-grown software engineering techniques to translate from XML content to program behavior. The lack of uniformity between these various techniques is only a minor problem, however. The major problem is that, since all XML applications share the same syntax by definition, there is no fundamental difference between XML applications visible to a processor without prior knowledge of each particular application. It's safe to say that a processor of one XML application will have knowledge about that application, and it's relatively safe to assume that a processor of two (possibly related) XML applications will have knowledge about both applications, as well as well-defined interactions between them (for example, web browsers will soon be able to handle XHTML with embedded SVG5). But if a processor of an arbitrary number of XML applications must assume prior knowledge of all the interactions between each (for example, XHTML with SVG, embedded through XLinks, transformed through XSLT), the programmers of such a processor would have a nightmarish job. Conformance to one application has proven a difficult enough task for programmers in the past; conformance to several in the same software would prove impossible given current techniques.
This paper will detail current work on XML processing models, a new approach to this problem, combining conventional wisdom from both the markup and the software engineering communities. Variations on abstract processing models based on principles of semistructured data will be discussed, as well as various techniques to implement these models in software. The concept of an extensible, real-time processing model will be introduced, and finally, as this is a very young field of research combining two historically distinct disciplines, many open questions will be posed.

Keywords

Contents
  1. Introduction
    1. Vocabulary
    2. Outline
  2. Abstract processing models
    1. Motives for developing abstract models
    2. XSLT transformation reversal
      1. Preserving transformation traces
    3. First attempts at processing models
    4. The general abstract processing model
      1. Problems and open questions
  3. Processing model implementations
    1. UMLUnified Modeling Language conventions
    2. Object model
    3. Basic implementations
    4. Advanced implementations
    5. Notes on implementations
  4. Future work, and the mythical, extensible real-time model
  5. Acknowledgements

Introduction
When two different communities or fields merge, as the markup and software engineering communities have, there is always an extended enlightenment period prior to the solidification of the new field, as each learn each other's principles and begin to apply them. This enlightenment period is generally accompanied by a significant amount of frustration, as the vocabularies, concepts, and axioms of each group collide. In recognition of this (and for this paper to make any sense to its audience), some standard vocabulary and other preliminaries must be assumed.
Vocabulary
Application"Application" refers specifically to an XML document type. For example, in the MathML specification: "MathML is an XML application for describing mathematical notation."
Software Application"Software Application" will represent our use of the word "application" in the XML specification: "It is assumed that an XML processor is doing its work on behalf of a module, called the application." This distinction is necessary to disambiguate between XML document types (or, more abstractly, modes of employment), and specific programs which handle these document types.
Semantics"Semantics" is a well-defined term already, but for clarity, it is the set of intended meanings and behaviors of a type of document and its constituent pieces. For example, the semantics of an XHTML <p> element include the basic ramifications of just being an XML element, the rendering behavior in the context of different browsers and stylesheets, the significance of being a paragraph, and so forth. The semantics of XHTML as a whole are defined in these terms.
InterpretationLikewise, "interpretation" of semantics simply refers to the correct processing of behaviors or recognition of meanings in the semantics of an application. In other words, a conformant software application is one which correctly interprets the semantics (correctly rendering XHTML, for example).
Object ModelAn "object model" is a representation of data, which may be either conceptual or in software. An example of a conceptual object model for XML is the Infoset6, and an example of a software object model is the DOM7.
Processing ModelFinally, a "processing model" is a scheme, either conceptual or in software, for interpreting the intended semantics of one or more XML applications. A processing model uses an object model to interpret XML data.
This is, of course, a broad definition in many ways, and leaves a lot up in the air. A lot of definitions need to be filled in. When multiple applications are present in a single document (XHTML with embedded SVG, for example), is that considered a unique application? When two applications could be processed together in several different ways, i.e. when there is ambiguity, how is that ambiguity resolvable? These are the type of questions which will be approached by this paper.
Outline
This paper will make a stark distinction between abstract (conceptual or theoretical) processing models, and models implemented in software. In reality, the implementations are tied directly to the theoretical models, one being of little use without the other; but until more vocabulary and experience become common to both the markup and software disciplines, both will be easier to comprehend if a certain "safe distance" is maintained between each.
Previous Previous Table of Contents
Abstract processing models
In recent months, a significant amount of thought has been put into the development of abstract processing models. It is useful to consider abstractions in general when we don't want to let ourselves be bogged down in the details and quirks of concrete implementations, when we want to force ourselves to the rigor which theory and mathematics demands, or when we're bored. All three are valid reasons (although the latter is more commonly solved in front of the television), however, the development of abstract processing models for XML is out of necessity. The range of real-world applications of XML is becoming so broad, and the complexity so high, that the theoretical limits of XML processing are being pressure-tested.
Motives for developing abstract models
The complexity of XSLT is the main motive for developing abstract processing models. It is a Turing-complete language, meaning it is of the highest computational complexity we know of. This has many implications, but there are two main, concrete concerns:
XSLT transformation reversal
The concept of reversing, or tracing back XSLT transformations turns out to be the solution to both concerns. The problem of comparing the same document with different transformations yields this solution, but the problem of identifying semantic and syntactic breakage can also be solved in this way, by tracing transformations back to the source, to compare and see whether anything in the original source became broken in the result. And although the computational complexity of XSLT does not permit us to perform this operation given only the result and transformation documents, we can be successful by keeping a bit of extra information around.
At every step of XSLT's processing, there is a context node in the source document, and a template rule which is applied to that context node to produce a result tree fragment containing one or more nodes. Where the data appearing in the result tree fragment (whether it be directly from the context node, from a combination of nodes, or entirely generated) is entirely irrelevant; the point is that given the combination of context node, template rule, and node in a result tree fragment, the three are unambiguously related. This relation may be recorded in a variety of ways, one of which will be described shortly, but the key point is that with this mapping, a node in a source tree may be re-located given only the node in the result tree.
Preserving transformation traces
Steven DeRose, of the Brown University STG, and the author, propose a method of using XLinks to preserve the transformation relationships. As XSLT is processed, the object model stores the relationships and then serializes them in this form:
<!ELEMENT transform-triple (source, template, result)>
<!ATTLIST transform-triple
xlink:type (extended) #FIXED "extended"
>
<!ELEMENT source EMPTY>
<!ATTLIST source
xlink:type (arc) #FIXED "arc"
xlink:role NMTOKEN #FIXED "source"
xlink:to CDATA #REQUIRED <!-- URI reference to node, -->
<!-- possibly including XPointer -->
>
<!-- (template and result elements defined similar to source element) -->
For example:
<transform-triple>
<source xlink:to="source.xml#sourcenode"/>
<transform xlink:to="transform.xsl#templatenode"/>
<result xlink:to="result.xml#resultnode"/>
</transform-triple>
This is a simple, lossless serialization of the relevant information, but is a bit naive. Although it inherits the robustness of the XLink model, it requires a lot of processing to be of any later use. Also, template parameters must be associated with the template role (they may be traceable from the context node in the source, but could also be specified explicitly using something like <!ELEMENT template (param*)> in the definition), and transformations may result in zero or more result tree nodes (which may be represented by "null" or by the root of the fragment, but this represents additional complexity). This is, however, useful as a proof of concept. Also, extensions could be envisioned, whereby any number of template-result pairs beyond the first two could be added to signify transformation chains.
More complex, and potentially more efficient solutions exist to preserve information about transformation processing. These solutions are typically geared towards specific applications, but some scale surprisingly well. One notable such solution, proposed by Chris Maden and Deborah Hooker of Yomu, is the creation of a new XPath extension function:
result-node(URI-reference-to-stylesheet-node[, param-name, param-value]*)
This leverages pre-existing implementations without significant overhead, without the complexities of defining a new XML application, and so forth. This solution is described in further detail at http://www.yomu.com/xtech2000 (URL subject to change).
Detailed treatment of this and related methods for preserving transformation trace information will be omitted, but it is important to note the role of transformation tracing in the development of abstract processing models.
First attempts at processing models
Given the ability to trace documents across transforms (however we choose to implement it), thus solving the major problems mentioned, we have the basis for a simple processing model:
Although only a first attempt, this processing model has some obvious advantages. The most clear advantage is that the problem of semantic breakage has been entirely solved; we need only specify the semantics to be aware of during the checking phase, and each application's semantics may be specified independently of one another with no worries of unforeseen interactions.
Taking this basic model further, we notice that XSLT isn't the only transformer of documents. XLinks are also capable of transforming documents to a degree. XLinks of type show="embed" may transform documents during processing as well. Whether this should truly be considered a transformation in a sense similar to XSLT is a question of much contention, but if desired, we can accommodate this:
This manages to close up some loopholes in our basic model, but opens up some further ones. Most importantly, in which order should processing take place? Should content embedding occur before XSLT transformations, or after? Should embedded content undergo processing in the context of the original document after it is embedded, or in its own context before embedding? The most sophisticated answer we currently have is:
This is an incredibly robust solution for the combination of XSLT, XLinks, and any other XML applications not involving transformations; since it builds on the basic advantages of even our most naive abstract processing model with some useful knowledge about the specifics of XLink. Paradoxically, it turns out to be a somewhat weak solution for combinations not falling into this convenient category. In the cases where other mechanisms are transforming documents, this model is still too ambiguous and does not suffice.
But what other mechanisms do we have presently to transform XML, besides XSLT and XLink? None, as far as XML applications go. But there are a growing number of software applications which let users directly or indirectly manipulate (effectively, transform) XML documents in real-time. And more XML applications could certainly be envisioned to do arbitrary transformations as well. Can a theoretical model be applied to this, the most general case?
The general abstract processing model
We can modify the wording of our previous processing models to use more general terminology. Neither are XSLT transformations particularly special, nor those associated with XLink embedding, nor any other transformation. They all fall under the category of general transformations of semistructured data, and there is a limited number of basic transformations (create, move, delete) which may be composed to arbitrary complexity.
Since general transformations can be expressed in terms of rules, in the same way as XSLT transformations are expressed as template nodes, the same source-rule-result triplet may be maintained in the general case, and the same back-tracing may occur.
The general abstract model, then, is as follows:
This model, with perhaps a few tweaks, is theoretically the strongest processing model possible given the semistructured data model. The trouble is, though, that it is now too general. There is ambiguity in processing order: for example, if both XLink embedding and XSLT transformations are regarded as transformation functions t(x), which of the two is supposed to be processed first? The prior processing models assumed XLinks would be processed after XSLT, but is that a safe assumption?
Problems and open questions
The answer is, of course, no assumptions are safe. The model is provably sound, and it accomplishes the goals intended of an abstract processing model; but to assume anything about the workings of the model in order to make it work for one case, is to break it for ten other cases. It will likely be determined necessary to define a specification language for the processing of XML applications, which will fill in the blanks in the processing model on a case-by-case basis. The need for this sort of extensibility should be of no surprise to those who understand the need for extensibility in XML, but the specifics must be left to further research.
To summarize the open questions about abstract models:
Previous Previous Table of Contents
Processing model implementations
One of the key principles to modern software engineering is the zen of object-oriented design. Object-oriented design stresses breaking programs into discrete objects, each of which is responsible for a set of operations and a set of data. These objects are additionally responsible for protecting (even hiding) their data, so that neither malicious users nor careless programmers may misuse it. Objects may represent physical objects, conceptual objects, encapsulate algorithms, aggregate other objects, act as intermediaries between other objects, and so forth.
We will discuss implementations of xml and its models strictly using the object-oriented paradigm, since practically all implementations are expected to be either in object-oriented programming languages, or in other languages using an object-oriented discipline.
UML conventions
For those unfamiliar with UML, a visual language for diagramming object-oriented systems, the following conventions are used:
named boxesobject classes (italics imply an abstract class)
linesrelations or associations between classes
trianglesinheritance relations (class with triangle pointing to it is the parent)
diamondscomposition relations (class with the diamond pointing to it is required for the composition of the class it relates to)
arrowsrepresent traversability (in association or composition relationships)
numbersrepresent cardinality of objects (instances in association or composition relationships)
Object model
For the sake of simplicity, we will assume a very straightforward object model, based on the 20 December 1999 working draft of the Infoset6. The only slight complexity in the model is the Position abstraction, which expresses the commonality between all information bearing the notion of position within a text string.
Figure 1 . Infoset-based object model
Basic implementations
Given a simple object model for XML, we can now begin to create an object-oriented framework for processing that XML. Note that this framework is independent of the steps or ordering in the abstract models posed, but is based on the concepts in those models. The challenge in designing this framework, as with the challenge in all object-oriented design, is to pick objects which will represent the concepts appropriately, while still allowing for flexibility.
The simplest framework we can imagine is one where the document is represented in one object, and the semantics and behavior in another:
Figure 2 . Basic implementation design
Although the concepts have all been covered by the design, we can pick out some immediate weaknesses due to its simplicity:
We can combine these two issues into a cleaner, safer, second version of our processing model framework:
Figure 3 . Basic implementation design, revisited
We have now split the semantics of an XML application into two classes, which we call XMLProcess (to signify the operations on a document when processing a particular application type) and XMLBehavior (which codifies the interactions between the XML application and the rest of the software). The XMLProcess class also serves to protect the Document from malice or accident. In fact, this basic model is sufficiently robust to handle most of the complexities present in the general abstract model described previously.
Advanced implementations
The basic implementation we now have serves as an excellent abstraction for the concepts in the general processing model. We can attach one or more "processes" to a document (representing, for example, XHTML with embedded SVG), and may attach zero or more "behaviors" to each process (representing the semantics of an application, as defined in "Vocabulary" ). There are, however, inefficiencies in the model in some contexts.
Whenever the XMLProcess class attempts to extract information from the document tree, it must either do exhaustive searching through portions of the tree, or must store references to salient features of the tree. Either way, the space-time tradeoffs just amount to gross inefficiencies: either a real-time application will be slowed down enormously by the tree traversals, or the target system will be required to have large amounts of memory.
Taking the concepts from our basic models and applying some simple design patterns yields better results. We will go through a three-step process to convert the framework from a document basis to a node basis:
First, we would like all elements in our object model to inherit some core properties from a generic XMLObject:
Figure 4 . Inheritance of nodes from XMLObject class
We now re-create the abstraction of our basic implementation, with our new XMLObject as the core:
Figure 5 . Replacing the Document with XMLObject
Finally, since it is inefficient to store an overabundance of XMLProcess and XMLBehavior objects in memory, we split these objects into flyweights (separating the algorithms from the context):
Figure 6 . Final efficient implementation
A variety of techniques may now be used to provide safety and organization to the resultant structure; the simplest would be to provide a subclass of XMLProcess to attach to the Document node, as before, which would marshall the entire process.
Notes on implementations
Although the final implementation we developed for real-time processing is efficient, and conceptually relatively simple (in that it contains very few object classes, all of which are predictable and flexible); it is obviously a challenge to properly implement, especially in the presence of a chosen abstract model.
Fortunately, in some cases, where real-time processing is not a concern, or efficiency is decided not to be a concern, we can gracefully choose the basic model (see Figure 3).
The most important factor in designing the implementation is that it remains independent, and does not restrict the algorithm specified in the abstract processing model. The algorithm itself should be specified in one class (or an encapsulated group of classes) and should theoretically be swappable; this is a basic concept in object-oriented design.
Lastly, we have one question left open from this section:
Previous Previous Table of Contents
Future work, and the mythical, extensible real-time model
Up to this point we have discussed, in great length, all sorts of models surrounding XML processing. We introduced abstract models for processing of certain types of XML, an abstract model for processing any type of XML, a simple model for implementing these algorithms, as well as a complex implementation. We have discussed the challenges associated with generality, and those associated with real-time processing. But we haven't found the magic bullet. We still have holes in our models, and questions unanswered.
Fortunately, we can learn from our own questions. The key question we may be able to draw insight from is: given the general model we posed in section "The general abstract processing model" , how do we solve its ambiguities? It is the author's belief that an XML application (or other language, depending on mood) should be developed to specify the parameters or "bindings" for the general model. This constitutes an extensible solution; and combined with our software implementations, the mythical "Extensible Model for Real-Time XML Processing" can be realized.
This has interesting ramifications for the future of XML application and software development, as well as for the future of computing and communication. When such a model is finally implemented, it could stand as the core for all XML-based applications. The W3C, the Mozilla project (http://www.mozilla.org), Yomu, and others have all recognized the intrigue in this idea: namely, with the development of this model, software and markup will be inextricably linked. And the new world of applications born of this convergence will change the computing field profoundly.
Previous Previous Table of Contents
Acknowledgements
I would like to deeply thank Ben Trafford for laying the groundwork for this research, enlightening me of it over Christmas dinner, and giving me the opportunity to write and present it; and Chris Maden and Deborah Hooker for being so painfully smart.
I would also like to thank professors Steven J. DeRose and Steven P. Reiss for their profound answers to my shallow questions.
I would finally like to acknowledge Jimi Hendrix, without whom this paper would never have been completed.
Previous Previous Table of Contents