One of the uniquely advantageous aspects of XML is the separation of XML
syntax from the semantics of XML documents. Standards such as XHTML [1],
MathML [2], XSLT [3], XLink [4], and others all rely on this separation. The
XML specification intentionally leaves semantics undefined, so that any
language described in XML (thus inheriting the common XML syntax) may have its
own unique semantics, behavior, or functionality.
Most XML applications, such as those mentioned, require special processing:
for example, XHTML elements are rendered, MathML expressions are rendered or
evaluated, XSLT transformations are executed, and XLinks are reified or
embedded. This processing is performed by software programs conformant to each
particular XML application.

With the recent growth in the number of XML applications and conformant
software, many common problems have begun to present themselves. Most software
applications (web browsers, for example, which are capable of processing
XHTML) use home-grown software engineering techniques to translate from XML
content to program behavior. The lack of uniformity among these techniques is
only a minor problem, however. The major problem is that, since all XML
applications share the same syntax by definition, a processor without prior
knowledge of each particular application can see no fundamental difference
between them. It is
safe to say that a processor of one XML application will have knowledge about
that application, and it is relatively safe to assume that a processor of two
(possibly related) XML applications will have knowledge about both, as well as
of the well-defined interactions between them (for example, web browsers will
soon be able to handle XHTML with embedded SVG [5]). But if a processor of an
arbitrary number of XML applications must assume prior knowledge of all the
interactions among them (for example, XHTML with SVG, embedded through XLinks,
transformed through XSLT), the programmers of such a processor would have a
nightmarish job. Conformance to one application has proven a difficult enough
task for programmers in the past; conformance to several in the same software
would prove impossible given current techniques.
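
The point about shared syntax can be illustrated with a minimal sketch. It is
not a processing model from this paper: it assumes, purely for illustration,
that applications are told apart by namespace URI, and the handler table and
the `process` function are hypothetical names. A generic parser reads the
mixed document without complaint, but behavior exists only for the
applications the processor already knows about.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# A small XHTML document with embedded SVG; syntactically it is just XML.
DOC = f"""
<html xmlns="{XHTML_NS}">
  <body>
    <p>A circle:</p>
    <svg xmlns="{SVG_NS}"><circle r="5"/></svg>
  </body>
</html>
"""


def split_tag(tag):
    """Split ElementTree's '{namespace}local' tag form into (namespace, local)."""
    if tag.startswith("{"):
        ns, local = tag[1:].split("}", 1)
        return ns, local
    return "", tag


# Hypothetical per-application handlers: each stands in for the "prior
# knowledge" a processor must have about one XML application.
HANDLERS = {
    XHTML_NS: lambda local: f"render XHTML <{local}>",
    SVG_NS: lambda local: f"draw SVG <{local}>",
}


def process(xml_text, handlers):
    """Walk the tree in document order and dispatch on namespace (an assumption)."""
    actions = []
    for elem in ET.fromstring(xml_text).iter():
        ns, local = split_tag(elem.tag)
        handler = handlers.get(ns)
        # Without a registered handler the element is well-formed but meaningless.
        actions.append(handler(local) if handler else f"unknown <{local}>")
    return actions


if __name__ == "__main__":
    for action in process(DOC, HANDLERS):
        print(action)
```

Removing the SVG entry from `HANDLERS` leaves the parse unchanged but turns the
`svg` and `circle` elements into unknowns. Note also that this table says
nothing about how the two applications interact, which is the harder problem
described above.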

This paper will detail current work on XML processing models, a new approach
to this problem that combines conventional wisdom from both the markup and the
software engineering communities. Variations on abstract processing models
based on principles of semistructured data will be discussed, as well as
various techniques to implement these models in software. The concept of an
extensible, real-time processing model will be introduced, and finally, as
this is a very young field of research combining two historically distinct
disciplines, many open questions will be posed.