Using XML effectively in eBusiness architectures
Ron Bodkin


Abstract
XML is being incorporated into the foundation of eBusiness applications in increasing numbers and in greater variety. This paper addresses how and where XML is effective and how it impacts implementations. It addresses how to use XML, approaches to integrating transactions between business partners, and approaches to selecting and designing vocabularies. It discusses architectural patterns for leveraging XML at each layer of an application architecture, and new skill sets for working with XML.

Contents
  1. Introduction
  2. Case studies
    1. Dealer extranet
    2. Software exchange
  3. Architectural considerations
    1. Vocabularies
    2. Data tier
    3. Functionality tier
    4. Presentation tier
  4. Organizational impacts
  5. Summary
  6. Acknowledgements
  7. Bibliography

Introduction
The Internet is changing the economy by redefining supply chains and allowing companies to interact with customers, manufacturers, and distributors in ways previously not possible. The eagerness of companies to take advantage of this new market channel has resulted in rapid business innovation, causing the explosive growth of eBusiness—especially for B2B cooperation.
XML has emerged as an important new Internet standard. It has rapidly achieved endorsement and adoption from nearly all software platform providers, spurring adoption in most industries. XML combines a heritage of markup languages for publishing applications with the value of structured data for online applications. The resulting merger of ideas has led to new ideas and revitalized old ones. C-bridge believes that XML will change how applications are built. It will unlock new possibilities by standardizing business information, allowing more universal data processing, and enabling a world of mass-customized services provided over the Web.
This paper analyzes how XML has changed and will continue to change application architectures for eBusiness solutions. Since effective architecture conforms to business needs, we will consider why and how XML has been used in sample projects for C-bridge clients. This in turn will lead us to consider the impact on architectures and on business organizations.
Case studies
We present two case studies of use of XML in an eBusiness solution. The cases vary most notably in the degree to which each uses XML. One solution uses XML at the edge of the application architecture (for communication with partner systems). The other solution uses XML intensively throughout the application architecture, including interfaces both with partner systems and within the system itself.
Dealer extranet
The first case is an extranet C-bridge created with a Fortune 500 client to service their dealer network. The application provides e-Commerce capability (order transactions, order status, participation in promotions) and access to a wide variety of transactional and informational data. The application users are employees of separate companies in the client’s dealer network. The initial release of the application did not use XML. The application had a typical Web architecture, including high availability clusters for database, Java-based application servers, and Web servers. The most important application design deliverable was a logical database model. Other important design deliverables were the physical systems architecture, the object design for application services, and the UI prototype.
After the initial application launch, the dealer extranet was extended with supplier integration. Users can place orders from a supplier online, and subsequently perform online order status queries. The integration needed to be automatic, so users would not be aware that multiple systems are involved. To accomplish this, the supplier system and the extranet system were extended to communicate directly with each other. The resulting high-level application architecture, shown in Figure 1, demonstrates how XML is used at the edge of an application:
Figure 1 . Dealer extranet application architecture
XML allows a standardized set of messages that can be easily evolved, leverages standardized development tools (such as parsers), and, combined with HTTPS, integrates easily with almost any technical infrastructure that partners might have.
The systems pass XML messages over the HTTPS protocol. Application-level code is used both to provide more protocol information (e.g., the use of HTTPS POST, variable names to use) and to choreograph the series of messages and confirmations required to implement reliable messaging and a solution-specific two-phase commit protocol.
The message formats were defined using XML DTDs. The DTDs were based on Ariba’s cXML 1.0 specification, and extended to capture additional required data items and message types.
The application processes incoming messages with a factory that uses a DOM parser to convert specific DTD types into specific object types. Figure 2 and Figure 3 below illustrate the approach used. The application generates outgoing messages by creating objects of the given type and having them emit an XML representation. The code that maps between XML and Java objects is hand-written.
<order>
  <orderID>1</orderID>
  <date>5-Aug-2000</date>
  <shipToLocation>Helsinki</shipToLocation>
  <orderItemList>
    <orderItem>
      <quantity>200</quantity>
      <productSKU>3307</productSKU>
    </orderItem>
    <orderItem>
      <quantity>30</quantity>
      <productSKU>1205</productSKU>
    </orderItem>
  </orderItemList>
</order>
Figure 2 . Sample order document
public class order {
    int orderID() { ... }
    String date() { ... }
    String shipToLocation() { ... }
    List orderItems() { ... }
}
Figure 3 . Sample order class
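The factory approach described above can be sketched as follows. This is a minimal illustration rather than the project's code: the class names MessageFactory, Order, and OrderItem, the field layout, and the dispatch on the root element name are all assumptions made for the example. Only the document structure comes from Figure 2, and the standard JAXP DOM parser stands in for whatever parser the application used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical domain classes modeled on Figures 2 and 3 (not the project's code).
final class OrderItem {
    final int quantity;
    final String productSKU;

    OrderItem(int quantity, String productSKU) {
        this.quantity = quantity;
        this.productSKU = productSKU;
    }
}

final class Order {
    final int orderID;
    final String date;
    final String shipToLocation;
    final List<OrderItem> orderItems;

    Order(int orderID, String date, String shipToLocation, List<OrderItem> orderItems) {
        this.orderID = orderID;
        this.date = date;
        this.shipToLocation = shipToLocation;
        this.orderItems = Collections.unmodifiableList(orderItems);
    }
}

// Hypothetical factory: chooses the object type from the document's root element.
final class MessageFactory {
    static Object fromXml(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse message", e);
        }
        Element root = doc.getDocumentElement();
        switch (root.getTagName()) {
            case "order":
                return toOrder(root);
            default:
                throw new IllegalArgumentException("Unsupported message type: " + root.getTagName());
        }
    }

    // Simple elements become fields; the repeated complex element becomes a list of objects.
    private static Order toOrder(Element order) {
        List<OrderItem> items = new ArrayList<>();
        NodeList nodes = order.getElementsByTagName("orderItem");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element item = (Element) nodes.item(i);
            items.add(new OrderItem(Integer.parseInt(text(item, "quantity")), text(item, "productSKU")));
        }
        return new Order(Integer.parseInt(text(order, "orderID")), text(order, "date"),
                text(order, "shipToLocation"), items);
    }

    // Trimmed text of the first descendant element with the given tag name.
    private static String text(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        Order o = (Order) fromXml("<order><orderID>1</orderID><date>5-Aug-2000</date>"
                + "<shipToLocation>Helsinki</shipToLocation><orderItemList><orderItem>"
                + "<quantity>200</quantity><productSKU>3307</productSKU></orderItem>"
                + "</orderItemList></order>");
        System.out.println("Order " + o.orderID + " ships to " + o.shipToLocation);
    }
}
```

Supporting another message type in this sketch means adding one case to the switch and one mapping method, which keeps the hand-written mapping code in a single place.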
This application uses XML for messaging, but only at the boundary of communication with other systems, i.e., it uses XML at the edge of its architecture. This allows the application to take advantage of the benefits of standards-based supplier integration, with minimal impact to the existing architecture and limited requirements for new skills.
Software exchange
The second case is a software exchange application C-bridge created with a dot.com client. This system allows users to publish, locate, and acquire software components. It also allows communities of interested users to share information, and it provides news for them. Both publishers and users of software components use this system. The client’s business has challenging requirements for integrating third party capabilities, customization and adaptability, and leveraging XML information sources.
The software exchange uses XML to
The most important design deliverable was the logical data model, since it was used to determine the underlying data storage and service interfaces. This model was captured in XML schema (the draft standard). XML-oriented services, object designs, and a UI prototype were also important design deliverables to specify the system. This application has a great deal of persistent data, and the schema for the logical model was used to determine how data is stored in a relational database and in an LDAP directory. Using XML schemas to derive the database design represents a significant shift from traditional application development where a logical relational model is used to determine the application data model. The schemas were also used to determine the data interfaces for internal services.
The application architecture is illustrated in Figure 4. Highlights include
Figure 4 . Software exchange application architecture
By contrast with the first case study, this application uses XML intensively throughout. This placed an increased reliance on evolving standards and on less mature tools, but it enabled the application to achieve significant benefits and laid a foundation for the future.
Architectural considerations
This section assesses the impact of XML throughout eBusiness application architectures, using the three-tier architectural pattern as our frame of reference. We first describe how XML impacts a given part of the architecture, and then discuss the impact on the case studies in more detail.
Vocabularies
There are several schema languages for representing the structure of XML data. Today, the most commonly used format is the XML DTD. The XML Schema Definition Language (XSDL) is a public working draft of the XML Schema Working Group of the W3C that is not expected to change significantly when released (http://www.w3.org/TR/xmlschema-1/). XSDL provides significant benefits over DTD such as much better data type support, better modularity, refinement of types, and the use of XML to define types. It is generally believed that XSDL will quickly become the standard representation format for XML documents when the W3C publishes it, because of these benefits and the momentum behind the standard. For example, the Gartner Group recommends that organizations “plan to use XML schema (when approved by the W3C)”.
Figure 5 . XML domain vocabularies
Applications that use XML can take advantage of this standardized data by using a variety of standard schemas to capture key concepts. However, these applications need to deal with the complexities of schemas evolving, managing overlapping schemas and integrating with partners that use different schemas to represent the same business concepts. While there is a great deal of activity in creating domain vocabularies, there is still a lot of work to be done and in many domains there is not yet a clear standard to use.
During the transition period to XSDL, application architectures will generally use DTD or draft XSDL to represent data. Using DTD allows use of more existing domain schemas and has the widest range of tool support. It can be augmented with tools that capture additional information (e.g., those that support Datatypes for DTD, http://www.w3.org/TR/2000/NOTE-dt4dtd-20000113). Today, many domain vocabularies are being represented in DTD, XSDL, and in other schema formats (which pre-date XSDL and which C-bridge expects to be superseded by it).
Using XSDL allows immediate benefits, prepares for the future (minimizing the need for rework), and allows incorporation of future schema standards. Describing proposed standard vocabularies in draft XSDL is an important step toward better specifications, as well as preparing for the future. While fewer tools support draft XSDL today, this is changing quickly; both validating parsers (e.g., Apache Xerces) and tools to convert between draft XSDL and DTDs (e.g., Extensibility) currently exist. For more information on the use of XSDL, DTD, or other formats see also [Mik99], [Stl99], and [XscBP].
It is instructive to consider some examples of whether to use DTD or XSDL to represent data. The Dealer Extranet uses DTD because it was completed in 1999 when XSDL was still undergoing significant changes. It also uses DTD because the limited use of XML reduced the benefits of using XSDL. The importance of working with a wide variety of partner companies and their different technical infrastructures made it critical to support DTD so partners could use the widest array of tools for integration.
The Software Exchange uses XSDL because it uses XML intensively throughout the architecture, it benefits greatly from more precise and modular specifications, and it would be costly and risky to retrofit a subsequent change from using XML DTD. The Software Exchange incorporates some custom code for handling schemas, in addition to using a validating parser.
Both example applications use standard vocabularies for part of their data representation (e.g., cXML, DSIG, vCard). However, additional information was modeled without a standard vocabulary, and the parts that did use standard vocabularies needed to extend and adapt them for their specific business problems.
There are important disciplines for selecting schemas, as well as for extending or creating them, much like the disciplines for buying and extending or building software components. The Gartner Group provides these guidelines:
Groups of trading partners should not wait for standard schemas developed by duly authorized standards committees to emerge. Instead, they should look to smaller consensus groups, including dominant vendors, or create bilateral agreements when the business case dictates. At the same time, these enterprises should limit the use of these "rogue" schemas to those projects in which the business need is compelling. They should realize that the standards will evolve, and there will eventually be more consensus, but only after periods of confusion. When that occurs, they will need to change what they have already done in favor of the de jure or de facto standards to be compliant.
Forrester Research advises:
Monitor relevant standards organizations.
Bet on leading XML vocabularies.
Match XML commerce vocabularies to internal XML projects.
Use XML first in projects with low agreement requirements.
There are a number of good sources for information on effective DTD and Schema design: http://wdvl.com/Authoring/Languages/XML/Schema.html provides a number of good links on designing Schemas; see also [Meg98] and [Kra99]. The emerging field of XML Design Patterns has devoted significant attention to this area. For example, see "Introduction to XML Design Patterns".
Data tier
The data tier in a three-tier architecture is responsible for access to and storage of persistent data. There are a number of different approaches to persistent storage of XML information:
Key questions that determine the best approach include the size, complexity, and type of information (e.g., is it data or a document), the frequency and type of access (e.g., is it frequently updated, or mostly read-only), and existing storage formats and mechanisms.
Important additional sources of data include existing content stores, transactional systems, and partner Web-based systems. There are a great number of converters and adapters that allow these existing assets to be integrated as XML data and to be communicated with using XML APIs. Newer systems are building native XML interfaces.
Relational databases are a frequently used approach. Relational databases and XML documents have different data type systems. This discrepancy is referred to as impedance mismatch, a term that was first used to describe the same type of mismatch between relational databases and OOP objects (see [Sri97] and [Ban92]). The difference in type systems makes it important to design a strategy for mapping between the two representations.
[Bou99] and [Buck] discuss different approaches to storing data, the trade-offs among them, and best practices for working with them.
We will explore some of the issues by considering our case studies. Both applications use a relational database to store the majority of their data, and needed a strategy for mapping between XML and the relational database. To demonstrate the different approaches, we present a simplified example of a sample order document (from Figure 2 above).
In the Dealer Extranet, the database is represented in a conventional relational form, and the data in an XML document is broken out into appropriate tables, which can be joined together to represent a complete XML entity. This approach is illustrated in Table 1 and Table 2. This approach was taken because:
Accordingly, the time required to translate between an XML format and database tables was not a problem for this application.
OrderID  Date        ShipToLocationID
1        5-Aug-2000  5
...
Table 1 . Conventional relational form: order table
OrderID  SequenceNbr  Quantity  ProductSKU
1        1            200       3307
1        2            30        1205
...
Table 2 . Conventional relational form: order item table
In the Software Exchange, the database is represented in an XML-specific relational form, whereby the data in an XML document is stored as text as a single logical item, and elements that need to be indexed are stored in a separate table of element values. The Software Exchange does not need to index attributes, though this could be handled similarly. The XML data will often exceed the largest varchar2 that can be stored in the database, so multiple columns are used to store the data, and a large object (or additional rows) can be used for larger elements. This approach (see Table 3 and Table 4) was taken because there is a great deal of complexity and structure in the XML data being stored, significant parts of the data did not need to be indexed, and the information was primarily being read. The index table supports optimized reading of data and simplifies full-text search. It is also possible to further optimize performance by caching the XML data and/or the indices. The biggest drawback of having a row per element for indexing arises with those (rare) queries that search on many elements at once. For these queries, the database needs to do multiple joins.
RootID  SchemaID  XMLData1                    XMLData2
1       50        <order>                     <orderItemList>
                  <orderID>1</orderID>        <orderItem><quantity>200</quantity>
                  <date>5-Aug-2000</date>     <productSKU>3307</productSKU>
                  <shipToLocation>Helsinki
                  </shipToLocation>
...
Table 3
RootID  SequenceNbr  Path                                      Value
1       1            order/orderID                             1
1       2            order/date                                5-Aug-2000
1       3            order/shipToLocation                      Helsinki
1       4            order/orderItemList/orderItem/quantity    200
1       5            order/orderItemList/orderItem/productSKU  3307
Table 4
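To make the index table concrete, the following sketch derives rows of the shape shown in Table 4 from an XML document. It is an illustration under stated assumptions, not the Software Exchange's implementation: the names XmlIndexer and IndexRow are hypothetical, and for simplicity the sketch indexes every leaf element, whereas the application indexes only the elements that need to be searchable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical row type mirroring the columns of Table 4.
final class IndexRow {
    final int rootId;
    final int sequenceNbr;
    final String path;
    final String value;

    IndexRow(int rootId, int sequenceNbr, String path, String value) {
        this.rootId = rootId;
        this.sequenceNbr = sequenceNbr;
        this.path = path;
        this.value = value;
    }

    @Override
    public String toString() {
        return rootId + "  " + sequenceNbr + "  " + path + "  " + value;
    }
}

// Hypothetical indexer: one row per leaf element, in document order.
final class XmlIndexer {
    static List<IndexRow> index(int rootId, String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse document", e);
        }
        List<IndexRow> rows = new ArrayList<>();
        walk(root, root.getTagName(), rootId, rows);
        return rows;
    }

    // Depth-first walk that builds the slash-separated element path.
    private static void walk(Element element, String path, int rootId, List<IndexRow> rows) {
        boolean hasChildElement = false;
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                hasChildElement = true;
                Element childElement = (Element) child;
                walk(childElement, path + "/" + childElement.getTagName(), rootId, rows);
            }
        }
        if (!hasChildElement) {
            // Sequence numbers start at 1 and follow document order.
            rows.add(new IndexRow(rootId, rows.size() + 1, path, element.getTextContent().trim()));
        }
    }

    public static void main(String[] args) {
        String xml = "<order><orderID>1</orderID><date>5-Aug-2000</date>"
                + "<shipToLocation>Helsinki</shipToLocation></order>";
        for (IndexRow row : index(1, xml)) {
            System.out.println(row);
        }
    }
}
```

A query on a single element value then becomes a lookup on the Path and Value columns, while a query on several elements needs one join against the index table per element, which is the drawback noted above.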
Functionality tier
The functionality tier is responsible for all the business logic, routing, and integration required to convert between application-focused data and core system data. Applications that process XML can use object-oriented, data-oriented, or a mixture of development approaches.
Processing XML in an object-oriented fashion involves encapsulating XML documents through native objects, which are specific to the problem domain. This provides the benefits of OO programming, which are compelling for applications that are doing complex or intensive computation (encapsulating data, flexibility, reuse of methods, proven object design patterns).
While there are some similarities between XML documents and objects, there is still an impedance mismatch, just as there is between relational databases and objects. This raises the implementation question of how to map between XML and objects. A typical approach is to map simple XML elements and all XML attributes into object attributes and to map complex elements into objects. As with other mapping problems, there is more complexity involved when mapping associations between items. Key issues include determining when to aggregate elements or when to use references, and how to represent references between items.
XML makes the use of data-oriented development a viable alternative. There are a number of different data-oriented approaches:
Data-oriented development allows straightforward, principled manipulation of data. There are a number of data-oriented techniques that can be applied to the problem at hand. Many parts of an e-Business application are fundamentally responsible for data flow without complex processing: retrieving, transforming, displaying, validating, and posting data. In these cases, data-oriented development offers significant benefits. Indeed, the continued popularity of 4GL development environments for IS applications can be attributed to the benefits of data-oriented development. However, conventional 4GLs were limited by modeling data as a flat relational database result set. By contrast, XML data-oriented development benefits from modeling data with a rich structure.
Data-oriented development for XML is a new approach, and when assessing the different techniques, it is important to consider the maturity, performance, and existence of available tools (including tools for supporting tasks like editing and debugging). Likewise, it is important to consider developer skills. Understanding XML data and how to operate on it is likely to become a common skill for developers; as this occurs, there will be a benefit from increased productivity. However, there is an investment involved in data-oriented development, unlike encapsulating XML in native objects, which lets developers operate using known techniques and built-in language facilities.
XSLT is sometimes viewed as a universal tool to solve XML processing problems. C-bridge believes that XSLT is appropriate for a subset of the problems that are best solved with data-centric approaches, where structural pattern matching is helpful. XSLT is neither appropriate for natural language translation, nor for complex algorithmic transformations. [Cla99] provides notes on when to use and not to use XSLT and on the two basic styles for XSLT transformations: push and pull.
DOM development allows more flexibility in the data structures that can be handled than encapsulating XML in native objects. However, this reduces type checking, which can make development and testing harder.
One of the most interesting challenges in processing XML is handling evolution, overlap, and alternative schemas. Object-oriented designs can use encapsulation including multiple interfaces for different parts of an object, flexible mappings from schemas to classes, and parameterized objects to address this challenge. Data-oriented designs can use internal representation schemas, metadata, pattern matching, rules, transformations, bindings, refinements, and parameterized schemas.
The Dealer Extranet uses an object-oriented approach in which a factory class converts received XML into a set of domain objects. On output, a template generates XML messages, and the data is retrieved by binding from specific object attributes to output parts in the template. This approach leverages the existing development environment.
The Software Exchange uses several of the data-oriented approaches discussed earlier for much of the work of reading, transforming, and presenting data. Object-oriented approaches are used for operations like iterating over a set of items, performing calculations, and handling user-driven queries.
Presentation tier
XML has most frequently been applied in the presentation layer of eBusiness applications. One common approach is using XSL transforms to convert XML into HTML (or WML) on a server. In the future, XML browsers are likely to display XML directly, possibly using XSL formatting objects. There are a number of new efforts underway to use XML to describe device-independent displays, such as XUL (http://www.mozilla.org/xpfe/), XForms (http://www.w3.org/MarkUp/Forms), and UIML (http://www.uiml.org). XML is a natural format for describing presentations in a display-independent fashion. For more information on how XSL and XML can be used to present data see [Hol99].
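As an illustration of server-side transformation, the sketch below applies a small XSLT 1.0 stylesheet to the sample order document using the standard javax.xml.transform API. The stylesheet, the HTML layout, and the class name OrderPage are assumptions made for this example. Neither case study is described as rendering orders this way.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical page renderer: turns an order document into an HTML fragment.
final class OrderPage {
    // Illustrative stylesheet: one template for the order, one applied to each order item.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html' indent='no'/>"
        + "<xsl:template match='/order'>"
        + "<html><body><h1>Order <xsl:value-of select='orderID'/></h1>"
        + "<ul><xsl:apply-templates select='orderItemList/orderItem'/></ul>"
        + "</body></html>"
        + "</xsl:template>"
        + "<xsl:template match='orderItem'>"
        + "<li><xsl:value-of select='quantity'/> x SKU <xsl:value-of select='productSKU'/></li>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String toHtml(String orderXml) {
        try {
            // Compile the stylesheet, then run it over the order document.
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(orderXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toHtml("<order><orderID>1</orderID><orderItemList><orderItem>"
                + "<quantity>200</quantity><productSKU>3307</productSKU></orderItem>"
                + "</orderItemList></order>"));
    }
}
```

Because the stylesheet is data rather than compiled code, a different stylesheet over the same XML could target another display format such as WML without changing the Java code.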
In addition to presenting information to users, B2B applications often present information to other applications, to support integration with partners. Such applications require agreement on systems interactions and standardized vocabularies. The eCo Architecture provides a good framework for analyzing the different layers of complexity involved (see Figure 6). Agreement needs to address inter-company workflows, negotiation of which vocabularies to use, and technical messaging methods. Many applications today pass XML over HTTP and HTTPS (as in our case studies). This type of messaging leverages the widespread support for XML and for HTTP to allow diverse systems to communicate in a loosely coupled fashion. The Gartner Group refers to XML over Web protocols as the “Digital Dial Tone” and Forrester describes XML and HTTP as central parts of “Internet Middleware”. However, XML over HTTP or HTTPS leaves many aspects of messaging open, requiring custom techniques to integrate among applications, especially for tighter integration. There are a number of proposals to solve these problems by addressing standard middleware issues (such as reliable messaging, name resolution, and how to pass variables). The W3C is convening a panel on XML and Protocols at WWW9. http://www.LWProtocols.org provides more information and additional links on this important topic.
Figure 6 . eCo architecture
Source: CommerceNet eCo Project
The Dealer Extranet uses XML only for server-to-server messaging, as was described in the Case Study section.
The Software Exchange uses templates that are subdivided into logically distinct sections. Each section uses a parametric model to convert XML data into an HTML presentation format. This system enables presentation and workflow that is highly customizable for different roles. The presentation layer is structured as a set of components that expose internal XML interfaces.
The application uses the model-view-controller paradigm (see [Bur92]), with the model being the functionality layer, and separate view and controller (application flow) components. Model-view-controller remains effective in a system with XML data and a declarative development approach.
Organizational impacts
We now consider how using XML for eBusiness affects organizations. XML is leading to standardization of business information through common vocabularies and defined agreements. This new level of standardization of business information requires organizations to participate in standards efforts, including decisions about how to influence standards, who to work with, how to engage, and how aggressively to track emerging standards. These standards will be driven at the much faster pace of the Internet era. The lessons from dealing with emerging technology standards must now be applied to business.
When integrating Web services into a solution, the quality of service (including availability, performance, reliability, security, and accuracy) is crucial. It is important to select partners who can deliver on Service Level Agreements. When first integrating partners, it is important to work collaboratively throughout the development cycle, including shared definition of requirements, shared design of interfaces (vocabulary and agreement), jointly agreed upon project milestones, and close coordination on testing and refinement throughout development. After successfully integrating with pilot partners, the process needs to shift to a standardized approach, making integration of additional partners simple. One best practice is creating “starter kits” for the most common technology environments of partners, which minimize complexity and effort for partners. Likewise, producing a standardized (and rigorous) compliance test suite is important.
The data modeling and development techniques enabled by XML also have an impact on the skill sets required for application development. XML metadata allows better separation of tasks between display, business logic, and storage, and it allows targeting different display types. B2B solutions require more flexibility for customization, which in turn requires new techniques and new design skills. Use of XML to model data requires new skills and new tools, as does working with XML in databases and other data storage media; monitoring, tuning, and analyzing data stored in XML-specific forms is not the same as for traditional relational information. Indeed, the extensibility and richness of XML data require the same kind of conceptual shift in data modelers as was required of application designers in shifting from procedural to object-oriented development.
As XML and data-centric development evolve, there is a constant stream of innovative new components, tools, and standards. It is important to track these carefully and to determine their functional and technical qualities and to balance the value add against risk.
Summary
XML is already having an impact on business, especially for B2B processes. It is a young, fast-growing technology and those who use it face the challenges of tracking its rapid evolution and of dealing with overlapping and evolving business vocabularies. Of particular note is the challenge of allowing deep integration between systems in a standards-based manner, so businesses can connect and collaborate quickly without reliance on proprietary technology.
XML offers significant benefits for e-Business, and is growing to play a major role. Today, XML is mostly being used at the edge of architectures and intensively within pilot systems. XML promises increased benefits with more intensive use, but the final extent of XML use when it matures is still an open question.
Among the most important benefits of XML are standardized business data, increased data flexibility, and common tools for working with data. B2B applications require more customization, and integration with partner Web services, which will be major impetuses for adoption. XML will have significant impact on, and provide benefits for, application architectures, extended enterprise integration, and organizational skill sets. XML is an important enabling technology for the B2B revolution.
Acknowledgements
Thanks to Alan Spencer, Scott Fleming, Steve Donelow, Jaikumar Nihalani, James Tauber, Mike Plusch, and Bill Pope for their collaboration, review, and contribution of ideas to the projects described. Thanks to Jim D’Augustine, Kelly Parr, and Alex Burdenko for reviewing this paper.
Bibliography
[Ban92] F. Bancilhon, C. Delobel, and P. Kanellakis, Building an Object-Oriented Database System: The Story of O2, Morgan Kaufmann Publishers, Inc., San Mateo, CA (1992).
[Bou99] R. Bourret, “XML and Databases”, Technical University of Darmstadt, (1999).
[Bur92] S. Burbeck, “Applications Programming in Smalltalk-80(TM): How to use Model-View-Controller (MVC)”, http://st-www.cs.uiuc.edu/users/smarch/st-docs/mvc.html, (1992).
[Buck] L. Buck, “Modeling Relational Data in XML”, http://apps.xmlschema.com/white_papers/modeling.htm.
[Cla99] J. Clark, “XSLT In Perspective”, http://www.jclark.com/xml/xslt-talk.htm, (1999).
[Dod00] L. Dodds, “Databases, Querying, and the Document Object Model”, XML-Deviant, http://www.xml.com/pub/2000/04/05/deviant/index.html, (2000).
[Fow97] M. Fowler, Analysis Patterns, Addison Wesley Longman, Inc., Menlo Park, CA (1997).
[Hol99] G. K. Holman, “What’s the Big Deal with XSL”, XML.com, http://www.xml.com/xml/pub/1999/04/holman/xsl.html, (1999).
[Kra99] A. Kramer, “FpML – Initial Design Tradeoffs”, XML Developer’s Conference, http://metalab.unc.edu/bosak/conf/xmldev99/kramer.htm, August 1999.
[Mar00] B. Martin, “Build distributed applications with Java and XML”, JavaWorld, http://www.javaworld.com/javaworld/jw-02-2000/jw-02-ssj-xml.html, February 2000.
[Meg98] D. Megginson, Structuring XML Documents, Prentice Hall Computer Books, (1998).
[Mik99] N. Mikula, K. Levey, “Schemas Take DTDs to the Next Level”, XML Magazine, http://www.xmlmag.com/upload/free/features/xml/1999/01win99/nmwin99/nmwin99.asp, Winter 1999/2000, (1999).
[Sri97] V. Srinivasan, D. T. Chang, “Object persistence in object-oriented applications”, IBM Systems Journal 36, No. 1, p. 66, http://www.research.ibm.com/journal/sj/361/srinivasan.html, (1997).
[Roz00] C. Rozwell, “XML and the Evolution of E-Commerce Standards”, Gartner Group Research Note Tactical Guidelines, 17 February 2000.
[Stl99] S. St. Laurent, “Describing Your Data: DTDs and XML Schemas”, XML.com, http://xml.com/pub/1999/12/dtd/index.html, (1999).
[Wal99] Joshua Walker, et al., “A Rational Approach to XML”, The Forrester Report, April 1999.
[XscBP] XMLschema.com’s Best Practices white papers (collection), http://apps.xmlschema.com/white_papers/index_best.htm