|
An extensible model for real-time XML processing
|
 |
One of the uniquely
advantageous aspects of XML is the separation of XML syntax from the semantics
of XML documents. Standards such as
XHTML1,
MathML2,
XSLT3,
XLink4 and others all rely on this separation. The XML specification intentionally
leaves this open, so that any language described in XML (thus inheriting the
common XML syntax) may have its own unique semantics, behavior, or functionality.
Most XML applications, such as those mentioned, require special processing:
for example,
XHTML elements are
rendered,
MathML expressions are
rendered or evaluated,
XSLT transformations
are executed, and
XLinks are reified
or embedded. This processing is performed by software programs conformant
to each particular XML application.
With
the recent growth in the number of XML applications and conformant software,
many common problems have begun to present themselves. Most software applications
(web browsers, for example, which are capable of processing
XHTML)
use home-grown software engineering techniques to translate from XML content
to program behavior. The lack of uniformity between these various techniques
is only a minor problem, however. The major problem is that, since all XML
applications share the same syntax by definition, there is no fundamental
difference between XML applications visible to a processor without
prior knowledge of each particular application. It's
safe to say that a processor of one XML application will have knowledge about
that application, and it's relatively safe to assume that a processor of two
(possibly related) XML applications will have knowledge about both applications,
as well as well-defined interactions between them (for example, web browsers
will soon be able to handle
XHTML
with embedded
SVG5). But if a processor of an arbitrary number of XML applications
must assume prior knowledge of
all the
interactions between each (for example,
XHTML
with
SVG, embedded through
XLinks, transformed through
XSLT),
the programmers of such a processor would have a nightmarish job. Conformance
to one application has proven a difficult enough task for programmers in the
past; conformance to several in the same software would prove impossible given
current techniques.
This paper will detail current
work on XML processing models, a new approach
to this problem, combining conventional wisdom from both the markup and the
software engineering communities. Variations on abstract processing models
based on principles of semistructured data will be discussed, as well as various
techniques to implement these models in software. The concept of an extensible, real-time processing model will be introduced,
and finally, as this is a very young field of research combining two historically
distinct disciplines, many open questions will be posed.
Introduction
When
two different communities or fields merge, as the markup and software engineering
communities have, there is always an extended enlightenment period prior to
the solidification of the new field, as each learn each other's principles
and begin to apply them. This enlightenment period is generally accompanied
by a significant amount of frustration, as the vocabularies, concepts, and
axioms of each group collide. In recognition of this (and for this paper to
make any sense to its audience), some standard vocabulary and other preliminaries
must be assumed.
Vocabulary
|
|
|
| Application | "Application" refers specifically
to an XML document type. For example, in the MathML
specification: "MathML is an XML application
for describing mathematical notation."
|
| Software Application | "Software Application"
will represent our use of the word "application" in the XML specification:
"It is assumed that an XML processor is doing its work on behalf of a module,
called the application." This distinction
is necessary to disambiguate between XML document types (or, more abstractly,
modes of employment), and specific programs which handle these document types.
|
| Semantics | "Semantics" is a well-defined term
already, but for clarity, it is the set of intended meanings
and behaviors of a type of document and
its constituent pieces. For example, the semantics of an XHTML <p>
element include the basic ramifications of just being an XML element, the
rendering behavior in the context of different browsers and stylesheets, the
significance of being a paragraph, and so forth. The semantics of XHTML as
a whole are defined in these terms.
|
| Interpretation | Likewise, "interpretation" of
semantics simply refers to the correct processing of behaviors or recognition
of meanings in the semantics of an application. In other words, a conformant
software application is one which correctly interprets
the semantics (correctly rendering XHTML, for example).
|
| Object Model | An "object model" is a representation
of data, which may be either conceptual or in software. An example of a conceptual
object model for XML is the Infoset6, and an example of a software object model is the DOM7.
|
| Processing Model | Finally, a "processing model"
is a scheme, either conceptual or in software, for interpreting the intended
semantics of one or more XML applications. A processing model uses an object
model to interpret XML data. This is, of course, a broad definition
in many ways, and leaves a lot up in the air. A lot of definitions need to
be filled in. When multiple applications are present in a single document
(XHTML with embedded SVG,
for example), is that considered a unique application? When two applications
could be processed together in several different ways, i.e. when there is
ambiguity, how is that ambiguity resolvable? These are the type of questions
which will be approached by this paper.
|
Outline
This paper will make a stark distinction between abstract (conceptual
or theoretical) processing models, and models implemented in software. In
reality, the implementations are tied directly to the theoretical models,
one being of little use without the other; but until more vocabulary and experience
become common to both the markup and software disciplines, both will be easier
to comprehend if a certain "safe distance" is maintained between each.
- 1. Abstract processing models will be discussed first, covering the
circumstances in which they came about, the evolution of basic models, and
the difficulties in creating truly general models for XML processing.
- 2. Implementations of processing models will then be discussed, covering
the object-oriented design techniques used to represent the concepts presented
in abstract models in software.
- 3. Finally, the "Extensible Real-Time Processing Model" will be introduced,
and future directions of markup technology and software engineering will be
discussed.
Abstract processing models
In recent months, a significant amount of thought has been put into
the development of abstract processing models. It is useful to consider abstractions
in general when we don't want to let ourselves be bogged down in the details
and quirks of concrete implementations, when we want to force ourselves to
the rigor which theory and mathematics demands, or when we're bored. All three
are valid reasons (although the latter is more commonly solved in front of
the television), however, the development of abstract processing models for
XML is out of necessity. The range of real-world applications of XML is becoming
so broad, and the complexity so high, that the theoretical limits of XML processing
are being pressure-tested.
Motives for developing abstract models
The complexity of
XSLT is the
main motive for developing abstract processing models. It is a
Turing-complete language, meaning it is of the highest
computational complexity we know of. This has many implications, but there
are two main, concrete concerns:
-
XSLT is able to break or mangle the semantics and syntax of any XML.
This is the W3C's main concern as
it develops applications such as XLink
(XLinks should, by definition, never
be broken, as they represent important semantic information about document/text
relations).
-
XSLT is often used as as
stylesheet language for documents, but in this process, documents may be rearranged
and text may be generated or disappear. Given an XML application X, two stylesheets a
and b may exist to transform document x of type X into XHTML and XSL-FO8, respectively. A user of the transformed document a(x)
may have a very different view of x from
a user of the transformed document b(x).
For the two users to share information about x,
they must be able to reverse the effects
of their transformations to extract information about the original document.
Since XSLT is Turing-complete, this is an impossible problem in the absence
of an intelligent work-around.
XSLT transformation reversal
The concept of reversing,
or tracing back
XSLT transformations
turns out to be the solution to
both concerns.
The problem of comparing the same document with different transformations
yields this solution, but the problem of identifying semantic and syntactic
breakage can also be solved in this way, by tracing transformations back to
the source, to compare and see whether anything in the original source became
broken in the result. And although the computational complexity of
XSLT does not permit us to perform this operation
given only the result and transformation documents, we can be successful by
keeping a bit of extra information around.
At every step of
XSLT's processing,
there is a
context node in the source
document, and a
template rule which is
applied to that context node to produce a
result tree
fragment containing one or more nodes. Where the data appearing
in the result tree fragment (whether it be directly from the context node,
from a combination of nodes, or entirely generated) is entirely irrelevant;
the point is that given the combination of context node, template rule, and
node in a result tree fragment, the three are unambiguously related. This
relation may be recorded in a variety of ways, one of which will be described
shortly, but the key point is that with this mapping, a node in a source tree
may be re-located given only the node in the result tree.
Preserving transformation traces
Steven DeRose, of the Brown University
STG,
and the author, propose a method of using
XLinks
to preserve the transformation relationships. As
XSLT
is processed, the object model stores the relationships and then serializes
them in this form:
<!ELEMENT transform-triple (source, template, result)>
<!ATTLIST transform-triple
xlink:type (extended) #FIXED "extended"
>
<!ELEMENT source EMPTY>
<!ATTLIST source
xlink:type (arc) #FIXED "arc"
xlink:role NMTOKEN #FIXED "source"
xlink:to CDATA #REQUIRED <!-- URI reference to node, -->
<!-- possibly including XPointer -->
>
<!-- (template and result elements defined similar to source element) -->
For example:
<transform-triple>
<source xlink:to="source.xml#sourcenode"/>
<transform xlink:to="transform.xsl#templatenode"/>
<result xlink:to="result.xml#resultnode"/>
</transform-triple>
This is a simple, lossless serialization of the relevant information,
but is a bit naive. Although it inherits the robustness of the
XLink
model, it requires a lot of processing to be of any later use. Also, template
parameters must be associated with the template role (they may be traceable
from the context node in the source, but could also be specified explicitly
using something like
<!ELEMENT template (param*)> in the definition),
and transformations may result in zero or more result tree nodes (which may
be represented by "null" or by the root of the fragment, but this represents
additional complexity). This is, however, useful as a proof of concept. Also,
extensions could be envisioned, whereby any number of template-result pairs
beyond the first two could be added to signify transformation chains.
More complex, and potentially more efficient solutions exist to preserve
information about transformation processing. These solutions are typically
geared towards specific applications, but some scale surprisingly well. One
notable such solution, proposed by Chris Maden and Deborah Hooker of Yomu,
is the creation of a new XPath extension function:
result-node(URI-reference-to-stylesheet-node[, param-name, param-value]*)
This leverages pre-existing implementations without significant overhead,
without the complexities of defining a new XML application, and so forth.
This solution is described in further detail at
http://www.yomu.com/xtech2000
(URL subject to change).
Detailed treatment of this and related methods for preserving transformation
trace information will be omitted, but it is important to note the role of
transformation tracing in the development of abstract processing models.
First attempts at processing models
Given the ability to trace documents across transforms (however we choose
to implement it), thus solving the major problems mentioned, we have the basis
for a simple processing model:
- 1. Start with document x0
which is known to be syntactically and semantically valid, with respect to
the semantics of the application.
- 2. Given document xn,
apply transformation tn to
yield xn + 1.
- 3. Verify that no syntax or semantics have broken, using the XSLT back-tracing described in section
"XSLT transformation reversal"
. Repeat transformation and checking
for all transformations t0 ... k.
- 4. Interpret the semantics present in the result document.
Although only a first attempt, this processing model has some obvious
advantages. The most clear advantage is that the problem of semantic breakage
has been entirely solved; we need only specify the semantics to be aware of
during the checking phase, and each application's semantics may be specified
independently of one another with no worries of unforeseen interactions.
Taking this basic model further, we notice that
XSLT
isn't the only transformer of documents. XLinks are also capable of transforming
documents to a degree. XLinks of type
show="embed" may transform
documents during processing as well. Whether this should truly be considered
a transformation in a sense similar to
XSLT
is a question of much contention, but if desired, we can accommodate this:
- 1. Start with valid document x0.
- 2. Given document xn,
apply transformation or content embedding tn
to yield xn + 1.
- 3. Verify that no syntax or semantics have broken, using XSLT
back-tracing, and check embedded content as well. Repeat transformation and
checking for all transformations or embeddings t0
... k.
- 4. Interpret the semantics present in the result document.
This manages to close up some loopholes in our basic model, but opens
up some further ones. Most importantly, in which order should processing take
place? Should content embedding occur before
XSLT
transformations, or after? Should embedded content undergo processing in the
context of the original document after it is embedded, or in its own context
before embedding? The most sophisticated answer we currently have is:
- 1. Start with valid document x0.
- 2. Given xn, identify
all XLinks, and process XSLT transformation tn
to yield xn + 1, maintaining
the identity of the links across the transformation using back-tracing. Repeat
for all t0 ... k.
- 3. Identify surviving XLinks
requesting embed or other processing of targets, and process each target in
its own context.
- 4. Interpret all semantics in all documents.
This is an incredibly robust solution for the combination of XSLT, XLinks,
and any other XML applications not involving transformations; since it builds
on the basic advantages of even our most naive abstract processing model with
some useful knowledge about the specifics of
XLink.
Paradoxically, it turns out to be a somewhat weak solution for combinations
not falling into this convenient category. In the cases where other mechanisms
are transforming documents, this model is still too ambiguous and does not
suffice.
But what other mechanisms do we have presently to transform XML, besides
XSLT and
XLink?
None, as far as XML applications go. But there are a growing number of software
applications which let users directly or indirectly manipulate (effectively,
transform) XML documents in real-time. And more XML applications could certainly
be envisioned to do arbitrary transformations as well. Can a theoretical model
be applied to this, the most general case?
The general abstract processing model
We can modify the wording of our previous processing models to use more
general terminology. Neither are
XSLT
transformations particularly special, nor those associated with
XLink
embedding, nor any other transformation. They all fall under the category
of
general transformations of semistructured data,
and there is a limited number of basic transformations (create, move, delete)
which may be composed to arbitrary complexity.
Since general transformations can be expressed in terms of rules, in
the same way as XSLT transformations are expressed as template nodes, the
same source-rule-result triplet may be maintained in the general case, and
the same back-tracing may occur.
The general abstract model, then, is as follows:
- 1. Start with valid document x0.
- 2. Given xn, execute
transformation function tn(x)
to yield xn + 1.
- 3. Verify that nothing has broken, using transformation back-tracing.
Repeat for all transformation functions t0 ...
k(x).
- 4. For any transformation functions which pull content from other documents
besides x, transform and verify that document
until finished, and then embed that content in the main document as necessary.
- 5. Interpret all semantics in all documents.
This model, with perhaps a few tweaks, is theoretically the strongest
processing model possible given the semistructured data model. The trouble
is, though, that it is now too general. There is ambiguity in processing order:
for example, if both
XLink embedding
and
XSLT transformations are regarded
as transformation functions
t(x), which
of the two is supposed to be processed first? The prior processing models
assumed
XLinks would be processed
after
XSLT, but is that a safe assumption?
Problems and open questions
The answer is, of course, no assumptions are safe. The model is provably
sound, and it accomplishes the goals intended of an abstract processing model;
but to assume anything about the workings of the model in order to make it
work for one case, is to break it for ten other cases. It will likely be determined
necessary to define a specification language for the processing of XML applications,
which will fill in the blanks in the processing model on a case-by-case basis.
The need for this sort of extensibility should be of no surprise to those
who understand the need for extensibility in XML, but the specifics must be
left to further research.
To summarize the open questions about abstract models:
- What is the best, or most general way to represent and process transformation
reversal and back-tracing? (section
"XSLT transformation reversal"
)
- What other types of common transformations would we like to perform
on XML data? (section
"The general abstract processing model"
)
- How do we clarify the ambiguities which arise from maximal generalization
of our models?
Processing model implementations
One of the key principles to modern software engineering is the zen
of object-oriented design. Object-oriented
design stresses breaking programs into discrete objects, each of which is
responsible for a set of operations and a set of data. These objects are additionally
responsible for protecting (even hiding)
their data, so that neither malicious users nor careless programmers may misuse
it. Objects may represent physical objects, conceptual objects, encapsulate
algorithms, aggregate other objects, act as intermediaries between other objects,
and so forth.
We will discuss implementations of xml and its models strictly using
the object-oriented paradigm, since practically all implementations are expected
to be either in object-oriented programming languages, or in other languages
using an object-oriented discipline.
For those unfamiliar with
UML,
a visual language for diagramming object-oriented systems, the following conventions
are used:
|
|
|
| named boxes | object classes (italics imply an
abstract class)
|
| lines | relations or associations between classes
|
| triangles | inheritance relations (class with
triangle pointing to it is the parent)
|
| diamonds | composition relations (class with
the diamond pointing to it is required for the composition of the class it
relates to)
|
| arrows | represent traversability (in association
or composition relationships)
|
| numbers | represent cardinality of objects (instances
in association or composition relationships)
|
Object model
For the sake of simplicity, we will assume a very straightforward object
model, based on the 20 December 1999 working draft of the
Infoset6. The only slight complexity in the model is the
Position abstraction, which expresses the commonality
between all information bearing the notion of position within a text string.
Basic implementations
Given a simple object model for XML, we can now begin to create an object-oriented
framework for processing that XML. Note that this framework is independent
of the steps or ordering in the abstract models posed, but is based on the
concepts in those models. The challenge in designing this framework, as with
the challenge in all object-oriented design, is to pick objects which will
represent the concepts appropriately, while still allowing for flexibility.
The simplest framework we can imagine is one where the document is represented
in one object, and the semantics and behavior in another:

Figure 2
. Basic implementation design
Although the concepts have all been covered by the design, we can pick
out some immediate weaknesses due to its simplicity:
- Since the Document class has no information pertaining
to the type of document it is, the XMLSemantics class has to
do a lot of work. Namely, it needs to provide all the accessor/mutator methods
for the data in the document hierarchy which has special meaning to the XML
application, as well as code to handle the behavior of the document in the
software, and so forth. The XMLSemantics class needs to be split
up.
- Also, since the Document has no conception of what
types of data should be protected, or the effects of modifications to that
data, the document hierarchy could easily be broken. A wrapper around the Document
class should be provided to remedy this.
We can combine these two issues into a cleaner, safer, second version
of our processing model framework:

Figure 3
. Basic implementation design, revisited
We have now split the semantics of an XML application into two classes,
which we call XMLProcess (to signify the operations on a document
when processing a particular application type) and XMLBehavior
(which codifies the interactions between the XML application and the rest
of the software). The XMLProcess class also serves to protect
the Document from malice or accident. In fact, this basic model
is sufficiently robust to handle most of the complexities present in the general
abstract model described previously.
Advanced implementations
The basic implementation we now have serves as an excellent abstraction
for the concepts in the general processing model. We can attach one or more
"processes" to a document (representing, for example,
XHTML
with embedded
SVG), and may attach
zero or more "behaviors" to each process (representing the semantics of an
application, as defined in
"Vocabulary"
).
There are, however, inefficiencies in the model in some contexts.
Whenever the XMLProcess class attempts to extract information
from the document tree, it must either do exhaustive searching through portions
of the tree, or must store references to salient features of the tree. Either
way, the space-time tradeoffs just amount
to gross inefficiencies: either a real-time application will be slowed down
enormously by the tree traversals, or the target system will be required to
have large amounts of memory.
Taking the concepts from our basic models and applying some simple design
patterns yields better results. We will go through a three-step process to
convert the framework from a document basis to a node basis:
First, we would like all elements in our object model to inherit some
core properties from a generic XMLObject:

Figure 4
. Inheritance of nodes from XMLObject class
We now re-create the abstraction of our basic implementation, with our
new XMLObject as the core:

Figure 5
. Replacing the Document with XMLObject
Finally, since it is inefficient to store an overabundance of XMLProcess
and XMLBehavior objects in memory, we split these objects into flyweights (separating the algorithms from the context):

Figure 6
. Final efficient implementation
A variety of techniques may now be used to provide safety and organization
to the resultant structure; the simplest would be to provide a subclass of XMLProcess
to attach to the Document node, as before, which would marshall
the entire process.
Notes on implementations
Although the final implementation we developed for real-time processing
is efficient, and conceptually relatively simple (in that it contains very
few object classes, all of which are predictable and flexible); it is obviously
a challenge to properly implement, especially in the presence of a chosen
abstract model.
Fortunately, in some cases, where real-time processing is not a concern,
or efficiency is decided not to be a concern, we can gracefully choose the
basic model (see
Figure 3).
The most important factor in designing the implementation is that it
remains independent, and does not restrict the algorithm specified in the
abstract processing model. The algorithm itself should be specified in one
class (or an encapsulated group of classes) and should theoretically be swappable;
this is a basic concept in object-oriented design.
Lastly, we have one question left open from this section:
- What is "efficient enough"? What lengths do we have to go to, and
what pains must we take to accomplish this level of efficiency?
Future work, and the mythical, extensible real-time model
Up to this point we have discussed, in great length, all sorts of models
surrounding XML processing. We introduced abstract models for processing of
certain types of XML, an abstract model for processing any type of XML, a
simple model for implementing these algorithms, as well as a complex implementation.
We have discussed the challenges associated with generality, and those associated
with real-time processing. But we haven't found the magic bullet. We still
have holes in our models, and questions unanswered.
Fortunately, we can learn from our own questions. The key question we
may be able to draw insight from is: given the general model we posed in section
"The general abstract processing model"
, how do we solve
its ambiguities? It is the author's belief that an XML application (or other
language, depending on mood) should be developed to specify the parameters
or "bindings" for the general model. This constitutes an extensible solution;
and combined with our software implementations, the mythical "Extensible Model
for Real-Time XML Processing" can be realized.
This has interesting ramifications for the future of XML application
and software development, as well as for the future of computing and communication.
When such a model is finally implemented, it could stand as the core for
all XML-based applications. The
W3C, the Mozilla project (
http://www.mozilla.org),
Yomu, and others have all recognized the intrigue in this idea: namely, with
the development of this model, software and markup will be inextricably linked.
And the new world of applications born of this convergence will change the
computing field profoundly.
Acknowledgements
I would like to deeply thank Ben Trafford
for laying the groundwork for this research, enlightening me of it over Christmas
dinner, and giving me the opportunity to write and present it; and Chris Maden and Deborah Hooker
for being so painfully smart.
I would also like to thank professors Steven
J. DeRose and Steven P. Reiss
for their profound answers to my shallow questions.
I would finally like to acknowledge Jimi Hendrix,
without whom this paper would never have been completed.