XML Europe 2001
21-25 May 2001
Internationales Congress Centrum (ICC)
Berlin, Germany

Schema Adjuncts

Relating Schemas to Applications

Scott Vorthmann <scottv@tibco.com>

ABSTRACT

Schema adjuncts represent a mechanism for relating an XML schema to a particular XML processing task, such as analyzing content, triggering activity, or mapping to relational or other non-XML representations. An adjunct takes the form of an XML document external to the schema, containing application-specific metadata associated with particular attributes, elements, and types in the schema. Since a schema adjunct is external to both the schema and the application implementation, applications can quickly adapt to new input schemas, and schemas can quickly be enabled in new processing applications.

1. Schema Adjunct Concepts

Schema adjuncts provide a mechanism for associating application behavior with XML schemas or with XML documents. A schema adjunct is an XML document that contains data that is both specific to a particular target schema and specific to a particular application program. A schema adjunct can be viewed from either of two perspectives: as an extension of an XML schema with application-specific data, or as a parameterization of an XML processing application with schema-specific data.

To state this more concretely, an XML processing task such as HTML form generation might be repurposed for several different schemas (purchase orders, invoices, order tracking) by writing a schema adjunct for each schema that states how forms are generated for it. Conversely, several different processing applications (form generation, RDBMS persistence, extended validation) might be applied to a new schema by writing a schema adjunct for each application that states how it is applied to the new schema.

Like XSLT stylesheets [XSLT], all schema adjuncts share the same general structure, but differ in the details that make them specific to certain schemas and certain applications. The sample adjunct below illustrates these aspects.

<schema-adjunct 
  target="targetNSURI" 
  xmlns:targ="targetNSURI" 
  xmlns="http://www.extensibility.com/namespaces/saf" 
  xmlns:adj="adjunctDataNSURI"> 
<!-- the global association is useful for information that applies to the target 
schema or instance document as a whole -->
<global>
<adj:aGlobalProperty>value</adj:aGlobalProperty>
</global>
<!-- this element association matches any element in the "targ" namespace that 
has an ancestor named "el1", and has type "someType" -->
<element type="targ:someType" context="targ:el1//targ:*">
<!-- This can be any XML information that is meaningful to the processing 
application. -->
<adj:aMarker/>
<adj:aProperty>property value</adj:aProperty>
<adj:structured>
<adj:yadda/>
<adj:yaddaYadda>cool stuff</adj:yaddaYadda>
</adj:structured>
</element>
<!-- "attribute" and "type" associations are similar to "element" -->
</schema-adjunct>

The general idea should be clear from this example: a schema adjunct contains a number of element, attribute, and type associations. Each may state a type or a context or both. The type name syntax allows identification of any type in a schema, simple or complex, whether separately declared or implicitly defined by an attribute or element declaration. The context expression is a simplified form of XPath [XPath], allowing an association to be restricted on the basis of the context of a node in an instance document. Associations can contain any well-formed XML content; that content may be structured in whatever form is most convenient for the application that will use the information.
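As a rough illustration of this structure, the following Python sketch parses a shortened copy of the sample adjunct and lists each association with its type, its context, and the names of the elements it carries. The function name `list_associations` is hypothetical and is not part of the Schema Adjunct Framework; the sketch uses only the standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET

SAF = "http://www.extensibility.com/namespaces/saf"

# A shortened copy of the sample adjunct shown above.
ADJUNCT = """\
<schema-adjunct target="targetNSURI"
    xmlns:targ="targetNSURI"
    xmlns="http://www.extensibility.com/namespaces/saf"
    xmlns:adj="adjunctDataNSURI">
  <global>
    <adj:aGlobalProperty>value</adj:aGlobalProperty>
  </global>
  <element type="targ:someType" context="targ:el1//targ:*">
    <adj:aMarker/>
    <adj:aProperty>property value</adj:aProperty>
  </element>
</schema-adjunct>
"""

def list_associations(adjunct_xml):
    """Return (kind, type, context, payload_tags) for each association."""
    root = ET.fromstring(adjunct_xml)
    found = []
    for child in root:
        # ElementTree spells qualified names as "{namespace}local".
        namespace, _, kind = child.tag[1:].partition("}")
        if namespace != SAF:
            continue  # not part of the adjunct vocabulary
        payload = [item.tag for item in child]
        found.append((kind, child.get("type"), child.get("context"), payload))
    return found

associations = list_associations(ADJUNCT)
for kind, type_name, context, payload in associations:
    print(kind, type_name, context, payload)
```

The payload is returned untouched: the framework only locates it, and interpreting it is left to the application.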

For many schema adjuncts, there are two possible perspectives on the function of the schema adjunct. In one perspective, the adjunct data is being associated with the components of a schema. Adjuncts with this character tend to state associations in terms of type, rather than context. Conversely, schema adjuncts that use context to state associations are naturally viewed from the other perspective: the adjunct data is being associated with elements and attributes of XML instance documents.

The Schema Adjunct Framework makes no assumptions about how schema adjuncts are used. This is important because schema adjuncts can play both design-time roles (where they apply to schema documents) and run-time roles (where they apply to XML instance documents). Adjunct-driven processing of XML documents at run-time can take many forms. The processor may traverse a DOM representation of the instance document, checking for associated adjunct data at each node. Alternatively, the processor may "execute" the associations in the schema adjunct one at a time, applying each to the set of matching nodes in the instance document. Schema adjuncts may even be "compiled", generating executable code to optimize performance and improve traceability.
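The first two run-time strategies can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: contexts are restricted to a single prefixed element name, the prefix map `NS` is hard-coded rather than read from the adjunct, the instance root itself is never matched by the second strategy, and the `adj:index` marker and all function names are invented for the example.

```python
import xml.etree.ElementTree as ET

SAF = "http://www.extensibility.com/namespaces/saf"
NS = {"bib": "http://www.example.com/namespaces/bibliography"}

ADJUNCT = """\
<schema-adjunct xmlns="http://www.extensibility.com/namespaces/saf"
    xmlns:bib="http://www.example.com/namespaces/bibliography"
    xmlns:adj="adjunctDataNSURI">
  <element context="bib:title"><adj:index/></element>
  <element context="bib:author"><adj:index/></element>
</schema-adjunct>
"""

INSTANCE = """\
<bibliography xmlns="http://www.example.com/namespaces/bibliography">
  <book><title>T1</title><author>A1</author><publisher>P1</publisher></book>
  <book><title>T2</title><author>A2</author></book>
</bibliography>
"""

def contexts(adjunct_root):
    """Yield the context expression of every element association."""
    for assoc in adjunct_root.findall(f"{{{SAF}}}element"):
        yield assoc.get("context")

def expand(qname):
    """Turn a prefixed name such as 'bib:title' into '{uri}title'."""
    prefix, _, local = qname.partition(":")
    return f"{{{NS[prefix]}}}{local}"

def by_traversal(adjunct_root, instance_root):
    """Strategy 1: walk the instance, checking for adjunct data at each node."""
    wanted = {expand(c) for c in contexts(adjunct_root)}
    return [node for node in instance_root.iter() if node.tag in wanted]

def by_execution(adjunct_root, instance_root):
    """Strategy 2: apply each association in turn to its matching nodes."""
    matched = []
    for context in contexts(adjunct_root):
        matched.extend(instance_root.findall(f".//{context}", NS))
    return matched

adjunct = ET.fromstring(ADJUNCT)
instance = ET.fromstring(INSTANCE)
walked = [n.text for n in by_traversal(adjunct, instance)]
executed = [n.text for n in by_execution(adjunct, instance)]
print(walked)    # nodes in document order
print(executed)  # nodes grouped by association
```

Both strategies reach the same nodes but in a different order: traversal follows the instance document, while execution follows the adjunct. Which is preferable depends on whether the application is organized around the document or around the associations.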

The generality of the schema adjunct concept cannot be overstated. Schema adjuncts can be applied in any XML application that seeks to parameterize its behavior on a per-schema basis. Familiar examples include mapping XML documents to relational databases, and generating HTML input forms from schemas. Those examples have been well covered in other papers, and indeed there are commercial enterprises implementing solutions for those problems that employ schema adjuncts. In this paper, I will restrict myself to two simple applications for schema adjuncts.

2. Application: Extended Validation

Among the various possible applications of schema adjuncts, a straightforward and easily motivated application is the logical extension of schemas for the purpose of stating additional validation constraints. Although XML schemas offer a sometimes bewildering array of capabilities for expressing the structure of valid XML documents, they actually enable only the simplest of constraints to be expressed. There is a very broad spectrum of richer validation constraints that users might want to state. Consider the following sample constraints:

  1. Optional element "spouseName" can be present if and only if optional element "spouseSSN" is present.

  2. If attribute "role" has the value "developer", then one or more "developmentPlatform" elements must be present.

  3. If element "customerStatus" does not have the value "preferred", then the sum of the "estimatedValue" attributes on all "collateralItem" elements must be greater than $5000.

Even the simplest of these cannot be expressed using the XML Schema language itself. While there are a large number of possible ways to state such constraints, any such statement has to refer to elements and attributes in the schema, in a way that allows the constraints to be bound to particular contexts. In other words, no matter what language is used to state the constraints themselves, a schema adjunct is a useful way to associate the constraints with the relevant schema constructs and/or instance contexts.
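To make the three sample constraints concrete, here is one way they might be hand-coded as Python predicates over `xml.etree.ElementTree` elements. This is not the adjunct-based approach; it only shows what a validator has to compute. The function names are hypothetical, and the sketch assumes that the named elements are direct children of the element being checked and carry no namespace.

```python
import xml.etree.ElementTree as ET

def spouse_fields_paired(person):
    """Constraint 1: spouseName is present if and only if spouseSSN is."""
    return (person.find("spouseName") is None) == (person.find("spouseSSN") is None)

def developer_has_platform(person):
    """Constraint 2: role="developer" requires a developmentPlatform child."""
    if person.get("role") != "developer":
        return True
    return person.find("developmentPlatform") is not None

def collateral_sufficient(application):
    """Constraint 3: non-preferred customers need collateral worth over 5000."""
    if application.findtext("customerStatus") == "preferred":
        return True
    total = sum(float(item.get("estimatedValue", "0"))
                for item in application.findall("collateralItem"))
    return total > 5000

enough = ET.fromstring(
    "<application><customerStatus>new</customerStatus>"
    '<collateralItem estimatedValue="3000"/>'
    '<collateralItem estimatedValue="2500"/></application>')
too_little = ET.fromstring(
    "<application><customerStatus>new</customerStatus>"
    '<collateralItem estimatedValue="3000"/></application>')
lone_name = ET.fromstring("<person><spouseName>Pat</spouseName></person>")
developer = ET.fromstring('<person role="developer"/>')

print(collateral_sufficient(enough), collateral_sufficient(too_little))
print(spouse_fields_paired(lone_name), developer_has_platform(developer))
```

Each predicate hard-wires the element and attribute names of one schema into code, which is exactly the dependency that an externally stated constraint avoids.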

The example below illustrates how a schema adjunct might be used to state the third constraint, above. This example postulates the existence of an adjunct-driven extended validation processor that interprets XML statements in the "extendedValidation" namespace. The language consists of XPath expressions wrapped in a framework providing a "constraint" construct and certain logic constructs not available in XPath itself.

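<schema-adjunct
  target="http://www.example.com/namespaces/loanApplication"
  xmlns:loan="http://www.example.com/namespaces/loanApplication"
  xmlns:xval="http://www.example.com/namespaces/extendedValidation"
  xmlns="http://www.extensibility.com/namespaces/saf">
  <element type="loan:application/*">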
    <xval:constraint>
      <xval:test>
        <xval:if condition="customerStatus!='preferred'"
          implication="sum(collateralItem/@estimatedValue)>5000"/>
      </xval:test>
      <xval:message> The total value of collateral must exceed $5000,
        unless the applicant is a preferred customer.
      </xval:message>
    </xval:constraint>
  </element>
</schema-adjunct> 

It is important to realize that the Schema Adjunct Framework does not itself address the problem of extended XML validation, or any other particular problem. One could say, with some justification, that the "interesting" part of the adjunct shown above is the constraint statement, and not the association syntax containing it. However, the association syntax allows constraints to be related to a particular type or instance context in a standard way. The Schema Adjunct Framework seeks to solve only this problem: the need to associate application-specific metadata with instance nodes of a particular type or in a particular context. In the example above, this is accomplished by the "type" attribute on the element association, which associates the stated constraint with the "application" element (actually the anonymous complex type that it defines).
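A toy version of such an adjunct-driven processor might look like the Python sketch below. It is far from a real XPath engine: the `holds` function recognizes only the two expression shapes used in the example, and the names `holds` and `validate` are invented here. Three further assumptions are made: the framework namespace is declared as the default namespace of the adjunct, as in the other samples in this paper; the type name "loan:application/*" is resolved by matching instance elements named "loan:application"; and the children of that element are unqualified.

```python
import re
import xml.etree.ElementTree as ET

SAF = "http://www.extensibility.com/namespaces/saf"
XVAL = "http://www.example.com/namespaces/extendedValidation"
LOAN = "http://www.example.com/namespaces/loanApplication"

ADJUNCT = """\
<schema-adjunct xmlns="http://www.extensibility.com/namespaces/saf"
    xmlns:loan="http://www.example.com/namespaces/loanApplication"
    xmlns:xval="http://www.example.com/namespaces/extendedValidation">
  <element type="loan:application/*">
    <xval:constraint>
      <xval:test>
        <xval:if condition="customerStatus!='preferred'"
          implication="sum(collateralItem/@estimatedValue)>5000"/>
      </xval:test>
      <xval:message>The total value of collateral must exceed $5000,
        unless the applicant is a preferred customer.</xval:message>
    </xval:constraint>
  </element>
</schema-adjunct>
"""

NOT_EQUAL = re.compile(r"^(\w+)!='([^']*)'$")
SUM_GREATER = re.compile(r"^sum\((\w+)/@(\w+)\)>(\d+)$")

def holds(expression, node):
    """Evaluate one of the two expression shapes used in the example."""
    match = NOT_EQUAL.match(expression)
    if match:
        child, literal = match.groups()
        return any(c.text != literal for c in node.findall(child))
    match = SUM_GREATER.match(expression)
    if match:
        child, attribute, limit = match.groups()
        total = sum(float(c.get(attribute, "0")) for c in node.findall(child))
        return total > float(limit)
    raise ValueError(f"unsupported expression: {expression}")

def validate(adjunct_xml, instance_root):
    """Return the message of every constraint the instance violates."""
    messages = []
    for assoc in ET.fromstring(adjunct_xml).findall(f"{{{SAF}}}element"):
        # Assumption: "loan:application/*" is matched by element name.
        local_name = assoc.get("type").split(":")[1]
        if local_name.endswith("/*"):
            local_name = local_name[:-2]
        targets = [n for n in instance_root.iter()
                   if n.tag == f"{{{LOAN}}}{local_name}"]
        for constraint in assoc.findall(f"{{{XVAL}}}constraint"):
            rule = constraint.find(f"{{{XVAL}}}test/{{{XVAL}}}if")
            text = constraint.findtext(f"{{{XVAL}}}message")
            message = " ".join(text.split())  # collapse line breaks
            for node in targets:
                if (holds(rule.get("condition"), node)
                        and not holds(rule.get("implication"), node)):
                    messages.append(message)
    return messages

INSTANCE = """\
<loan:application xmlns:loan="http://www.example.com/namespaces/loanApplication">
  <customerStatus>new</customerStatus>
  <collateralItem estimatedValue="1200"/>
  <collateralItem estimatedValue="800"/>
</loan:application>
"""

problems = validate(ADJUNCT, ET.fromstring(INSTANCE))
print(problems)
```

Note how little of the code concerns the association itself: locating the constrained element takes a few lines, and everything else belongs to the constraint language, which is consistent with the division of labor described above.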

Since these "extended validation" constraints can be considered a logical extension of the schema itself, it is reasonable in some cases to embed the extensions within the schema document using the "appinfo" mechanism provided by the XML Schema language [Schema]. However, in many cases the validation extensions are enterprise-specific constraints placed upon industry standard schemas. In those cases, the schema document itself is not the property of the enterprise wishing to state the additional validation constraints, and they therefore must be stated in a separate document. A schema adjunct is ideally suited for this purpose. Company A and Company B might communicate using a standard schema, but each company can have its own schema adjunct expressing its additional validity requirements for documents and messages of that type.

3. Application: Document Templates

As another application of the Schema Adjunct Framework, consider a program used for editing XML documents in a schema-aware fashion. That is, the editor can check that the document being edited remains valid, or it can guide the editing such that it only presents valid choices, or both. Clearly, such behavior can be implemented using the schema document alone (using the word "valid" to mean schema valid per the XML Schema specification).

Now, consider this schema-aware editor as applied to a schema that embodies a large, complex e-commerce standard like RosettaNet or CBL. Within the broad class of documents valid against this schema, there will inevitably be a much smaller class of "typical" documents, focused on particular scenarios. Users creating new documents with the editor will seldom wish to start with the minimum content for a particular root element; rather, they will prefer to select from a prepared set of "template" documents designed for their enterprise and their business situation.

This behavior is already very useful, but it is really too rigid. The argument for "typical" documents and templates applies for any element in the schema, regardless of whether it appears as the root of a document. A document template provides a starting point, but there will still be choices to make, and each decision opens up additional choices. Wherever a choice exists, there will be "typical" choices.

The schema adjunct shown below captures a set of defaults for a bibliography schema. This adjunct states all of its associations in terms of "context", although there could certainly be situations in which default behaviors could be associated with simple or complex types.

<schema-adjunct 
 target="http://www.example.com/namespaces/bibliography" 
 xmlns:bib="http://www.example.com/namespaces/bibliography" 
 xmlns:et="http://www.example.com/namespaces/element-templates" 
 et:template-name="Joe E. Gomaniac" 
 xmlns="http://www.extensibility.com/namespaces/saf"> 
<global> 
<!-- default for a "bibentry" choice group is a "journalArticle" element -->
<et:choice group="bibentry">bib:journalArticle</et:choice>
</global>
<element context="bib:bibliography">
<!-- default length for the list of bibliography entries -->
<et:list-length value="5"/>
<!-- every bibliography should contain this book entry -->
<et:list-insert xmlns="http://www.example.com/namespaces/bibliography">
<book year="2001">
<title>Cite Thyself: Who Knows Better?</title>
<author>Joe E. Gomaniac</author>
<publisher>Ransom House</publisher>
</book>
</et:list-insert>
</element>
<element context="bib:author">
<!-- the default value for the author in any bibliography entry -->
<et:value>Joe E. Gomaniac</et:value>
</element>
</schema-adjunct>

As in the scenario described above for extended validation rules, the element defaults stated in this adjunct apply in a particular situation, presumably when Joe E. Gomaniac is editing his documents. Clearly, Joe's preferences in this regard do not make sense for all users of the schema, so it would be impractical to store his preferences (or anybody else's) in the schema itself as "appinfo" elements.
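As a sketch of how an editor might consume such defaults, the Python fragment below reads the "et:value" entries from a shortened copy of the adjunct and uses them to pre-fill newly created elements. The functions `load_defaults` and `new_element` are hypothetical, contexts are assumed to be single prefixed names, and the other template constructs (choice groups, list lengths, list insertions) are ignored.

```python
import xml.etree.ElementTree as ET

SAF = "http://www.extensibility.com/namespaces/saf"
ET_NS = "http://www.example.com/namespaces/element-templates"
BIB = "http://www.example.com/namespaces/bibliography"

# A shortened copy of the bibliography adjunct shown above.
ADJUNCT = """\
<schema-adjunct target="http://www.example.com/namespaces/bibliography"
    xmlns:bib="http://www.example.com/namespaces/bibliography"
    xmlns:et="http://www.example.com/namespaces/element-templates"
    xmlns="http://www.extensibility.com/namespaces/saf">
  <element context="bib:bibliography">
    <et:list-length value="5"/>
  </element>
  <element context="bib:author">
    <et:value>Joe E. Gomaniac</et:value>
  </element>
</schema-adjunct>
"""

def load_defaults(adjunct_xml):
    """Map each context name (e.g. 'bib:author') to its default text."""
    defaults = {}
    for assoc in ET.fromstring(adjunct_xml).findall(f"{{{SAF}}}element"):
        value = assoc.find(f"{{{ET_NS}}}value")
        if value is not None:
            defaults[assoc.get("context")] = value.text
    return defaults

def new_element(parent, local_name, defaults):
    """Append a bibliography element, pre-filled when a default exists."""
    child = ET.SubElement(parent, f"{{{BIB}}}{local_name}")
    child.text = defaults.get(f"bib:{local_name}", "")
    return child

defaults = load_defaults(ADJUNCT)
book = ET.Element(f"{{{BIB}}}book")
author = new_element(book, "author", defaults)
title = new_element(book, "title", defaults)
print(repr(author.text), repr(title.text))
```

Swapping in a different user's adjunct changes the pre-filled values without touching either the schema or the editor code.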

4. Conclusion

Schema adjuncts provide a way for application behavior to be bound to XML elements and attributes. With simple syntax and association semantics, schema adjuncts serve as a framework for a wide variety of schema-driven XML processing tasks. As XML schemas become more and more pervasive as data models in applications of all kinds, schema adjuncts offer a way to relate application behavior to the underlying models, while avoiding "schema lock-in", that is, application code with hard-wired schema dependencies.

An effort is currently underway to promote the Schema Adjunct Framework as a standard. An industry group is in the process of revising the specification, with the intent to submit it to the W3C for consideration. Interested parties are invited to send mail to schema-adjuncts@extensibility.com, or contact the author.

Bibliography

[Schema] XML Schema Part 1: Structures, Henry Thompson et al., eds.
[XPath] XML Path Language, James Clark and Steve DeRose, eds., 16 November 1999.
[XSLT] Extensible Stylesheet Language Transformations, James Clark, W3C, 21 April 1999.

Biography

Scott Vorthmann
TIBCO Software
USA
Email: scottv@tibco.com

Scott Vorthmann is a Senior Architect for TIBCO Software, Inc. His responsibilities include new product design and development, and integration of XML technologies into the product suite. His previous work focused on language tools and meta-tools, including compilers, language-based editors, integrated development environments, and generators for all of the above. At Carnegie Mellon University, he performed basic research in these areas. In 1995, Mr. Vorthmann and a partner formed GenieWorks, LLC, and created and marketed SpotCheck, a Java-aware program editor. Mr. Vorthmann earned his Ph.D. in Computer Science from Georgia Tech in 1990.