An extensible model for real-time XML processing
Dan Rosen


Abstract
One of the uniquely advantageous aspects of XML is the separation of XML syntax from the semantics of XML documents. Standards such as XHTML [1], MathML [2], XSLT [3], XLink [4] and others all rely on this separation. The XML specification intentionally leaves semantics undefined, so that any language described in XML (and thus inheriting the common XML syntax) may have its own unique semantics, behavior, or functionality. Most XML applications, including those mentioned, require special processing: for example, XHTML elements are rendered, MathML expressions are rendered or evaluated, XSLT transformations are executed, and XLinks are reified or embedded. This processing is performed by software conformant to each particular XML application.
With the recent growth in the number of XML applications and conformant software, many common problems have begun to surface. Most software (web browsers, for example, which are capable of processing XHTML) uses home-grown software engineering techniques to translate XML content into program behavior. The lack of uniformity among these techniques, however, is only a minor problem. The major problem is that, since all XML applications share the same syntax by definition, a processor without prior knowledge of a particular application cannot see any fundamental difference between that application and any other. It is safe to say that a processor of one XML application will have knowledge of that application, and it is relatively safe to assume that a processor of two (possibly related) XML applications will have knowledge of both, as well as of the well-defined interactions between them (for example, web browsers will soon be able to handle XHTML with embedded SVG [5]). But if a processor of an arbitrary number of XML applications must have prior knowledge of every interaction among them (for example, XHTML with SVG, embedded through XLinks, transformed through XSLT), its programmers would face a nightmarish job. Conformance to one application has proven a difficult enough task for programmers in the past; conformance to several in the same software would prove impossible given current techniques.
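To make the problem concrete, here is a minimal sketch of the kind of home-grown technique described above. It is not the processing model this paper proposes. It assumes Python's standard xml.sax module with namespace processing enabled; the DispatchingHandler class, its register method, and the "render"/"draw" actions are hypothetical names chosen for illustration. As the document streams in, each start-element event is routed to a handler registered for that element's namespace URI. Elements from a namespace the processor was never told about are merely recorded, because to the processor they are only syntax.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"


class DispatchingHandler(ContentHandler):
    """Hypothetical handler: routes start-element events by namespace URI."""

    def __init__(self):
        super().__init__()
        self.handlers = {}  # namespace URI -> callable(local_name, attrs)
        self.unknown = []   # (namespace, local_name) pairs with no handler

    def register(self, namespace, handler):
        # "Prior knowledge" of an XML application is just a registry entry.
        self.handlers[namespace] = handler

    def startElementNS(self, name, qname, attrs):
        # With feature_namespaces on, name is a (namespace URI, local name) tuple.
        namespace, local_name = name
        handler = self.handlers.get(namespace)
        if handler is None:
            # Without prior knowledge, the element is indistinguishable syntax.
            self.unknown.append((namespace, local_name))
        else:
            handler(local_name, attrs)


log = []
dispatcher = DispatchingHandler()
dispatcher.register(XHTML_NS, lambda name, attrs: log.append(("render", name)))
dispatcher.register(SVG_NS, lambda name, attrs: log.append(("draw", name)))

doc = b"""<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>
    <m:math xmlns:m="http://www.w3.org/1998/Math/MathML"/>
  </body>
</html>"""

parser = xml.sax.make_parser()
parser.setFeature(feature_namespaces, True)  # deliver startElementNS events
parser.setContentHandler(dispatcher)
parser.parse(io.BytesIO(doc))

# log now holds render/draw events for the XHTML and SVG elements;
# the MathML element, which has no registered handler, lands in dispatcher.unknown.
```

Note that each handler sees its own application in isolation. Nothing in this registry captures how the SVG fragment relates to its XHTML parent, or how an XLink or XSLT layer would alter either; in this style every such interaction would have to be hand-coded, which is the combinatorial burden described above.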
This paper will detail current work on XML processing models, a new approach to this problem that combines conventional wisdom from both the markup and the software engineering communities. Variations on abstract processing models based on principles of semistructured data will be discussed, as well as various techniques for implementing these models in software. The concept of an extensible, real-time processing model will be introduced, and finally, as this is a very young field of research combining two historically distinct disciplines, many open questions will be posed.