The application of XSL for XML transformations in e-business solutions
The Extensible Markup Language (“XML”) sets a standard for
the exchange of business data that is completely platform- and vendor-neutral.
XML is in increasingly wide use for web applications, especially for business-to-business
integration.
Because XML data comes in many forms, the most important capability
needed for XML applications is transforming data from one form to
another (“vocabulary translation”), and converting it to visible
renderings, for example in HTML or PDF documents.
The Extensible Stylesheet Language (“XSL”) specification,
from the W3C standards body, defines a powerful language to easily transform
XML data from one form to another. This paper introduces XSL and studies several
application scenarios that benefit from the use of XSL to solve real-world
e-business problems.
The Tower of Babel problem
The Extensible Markup Language standard (“XML”) is now two
years old, and a lot of progress has been made since the W3C “recommended”
the XML specification. XML.ORG, a registry and repository for XML vocabularies
overseen by OASIS, now has well over a hundred standard vocabularies for industry-specific
usage. With the ebXML initiative, standards common to all industries
(purchase orders and the like) will begin to emerge. Even so, compared
to the enormous potential of using XML for web-based applications, these are
still the early days.
Some might fear that a large number of vocabularies represents a fragmentation
of the standard. On the contrary, XML is intended as a meta-language for establishing
these vocabularies.
XML differs from HTML in that it describes the data, but not its presentation.
While XML can easily be understood by programmers and programs, we need to
be able to display the data on web pages and in page-oriented documents. To maximize
the flexibility of using this data, the presentation should be specified outside
of the XML document, for example using stylesheets to define its appearance.
We recognize that the unique business structures that give each company
its own competitive edge can be represented in private vocabularies. Companies
can organize their departments the same as individual enterprises, again with
vocabularies that reflect their way of doing business. But ultimately information
in the private definitions must be converted to a public standard for exchange
with other organizations.
We also expect that new versions of vocabularies, even with completely
different structures, are bound to replace the old as we learn better ways
to do business.
All of this points to a need for automatic conversion from one form
of XML to another, from XML to HTML, and from XML to completely different
presentation formats, such as PDF. What we need, then, is a general way to
accomplish mechanical translations from XML to all of these different forms.
The solution: XSL transformations
The Extensible Stylesheet Language specification, known as XSL, describes
powerful tools to accomplish the required transformation of XML data. XSL
consists of the XSLT language for transformation, and Formatting Objects (“FO’s”),
a vocabulary for describing the layout of documents. XSLT uses XPath, a separate
specification that describes a means of addressing XML documents and defining
simple queries. The XSLT and XPath 1.0 specifications are complete, having
become W3C “Recommendations” on November 16, 1999 (see
http://www.w3.org/Style/XSL/).
The XSL specification (which also describes Formatting Objects) is expected
to reach W3C recommended status soon.

There are now several implementations of processors for XSLT. In particular,
the Xalan project from the Apache Software Foundation (see http://xml.apache.org)
is a robust and highly-compliant XSLT and XPath implementation. This tool
was donated to Apache by IBM; it was developed at Lotus Development Corporation,
an IBM company, by Scott Boag and his team. While Boag’s team continues
to develop Xalan, being part of Apache means Xalan will enjoy contributions
from individuals and other companies in the industry. With the XSLT specification
in place and with the release of Xalan 1.0 in March 2000, XSLT is now stable
and ready for real-world use.
The XSLT language offers powerful means of transforming XML documents
into other forms, producing XML, HTML, and other formats. Its features include
sorting, selecting, and numbering, among many others. It operates by reading
a stylesheet, which consists of one or more templates, then matching the templates
as it visits the nodes of the XML document. Templates can be matched on
names and patterns. Templates include literal text that is used in the resulting
transform, interspersed with directives to include specific data. Transformations
are thus declared “by example”, a simple programming model.
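The template-matching model described above can be sketched in a few lines. The example below is only an illustration: the `order` and `item` element names and the class name are invented here, and the stylesheet is run through the standard JAXP interface (`javax.xml.transform`) bundled with the JDK rather than through any particular Xalan release.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TemplateMatchingSketch {
    // Hypothetical input document; the vocabulary is invented for illustration.
    static final String XML =
        "<order><item>Widget</item><item>Gadget</item></order>";

    // Two templates: literal HTML tags are copied to the result, while the
    // xsl: directives pull data from the node currently being visited.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:template match='order'>"
      + "<ul><xsl:apply-templates select='item'/></ul>"
      + "</xsl:template>"
      + "<xsl:template match='item'>"
      + "<li><xsl:value-of select='.'/></li>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the document and returns the result as text.
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Each `item` element in the input should yield one `li` element in the output, wrapped in the `ul` emitted by the template for `order`.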

XSLT is not a general-purpose programming language in the sense of Java
or C++. For example, symbolic “variables” cannot be assigned
a new value once bound, so they are really constant definitions. This limitation means
that counters and accumulators are not available. Java-like “for”
or “while” statements are also not available; xsl:for-each visits each
node of a selected set, and other iteration can be accomplished using recursion.
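To illustrate recursion standing in for an accumulator, the sketch below totals quantity times price over a list of items with a named template that calls itself, passing the running total as a parameter. The `order`/`item` vocabulary, the template name `total`, and the class name are assumptions made for this example; the stylesheet is run with the JDK's built-in JAXP processor.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class RecursiveTotalSketch {
    // Hypothetical order: 2 * 3 + 1 * 4.
    static final String XML =
        "<order><item qty='2' price='3'/><item qty='1' price='4'/></order>";

    // The 'total' template handles the first item, then calls itself on the
    // remaining items with an updated 'acc' parameter. No variable is ever
    // reassigned; each recursive call binds fresh parameters.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/order'>"
      + "<xsl:call-template name='total'>"
      + "<xsl:with-param name='items' select='item'/>"
      + "</xsl:call-template>"
      + "</xsl:template>"
      + "<xsl:template name='total'>"
      + "<xsl:param name='items'/>"
      + "<xsl:param name='acc' select='0'/>"
      + "<xsl:choose>"
      + "<xsl:when test='$items'>"
      + "<xsl:call-template name='total'>"
      + "<xsl:with-param name='items' select='$items[position() &gt; 1]'/>"
      + "<xsl:with-param name='acc'"
      + " select='$acc + $items[1]/@qty * $items[1]/@price'/>"
      + "</xsl:call-template>"
      + "</xsl:when>"
      + "<xsl:otherwise><xsl:value-of select='$acc'/></xsl:otherwise>"
      + "</xsl:choose>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the document and returns the result as text.
    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)),
                           new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

The recursion ends when the `items` node-set is empty, at which point the accumulated value is written to the output.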
The limitations in the language definition are intended to support powerful
optimization techniques. XSLT supports an escape hatch to allow processing
more easily done in conventional languages: the extension mechanism can call
modules developed, for example, in Java.
The most important feature of XSL is the ability to develop transformations
quickly, with few lines of code. A transformation that could be developed
and tested in an hour might take days to write using Java, even when an off-the-shelf
XML parser like Xerces (again, contributed to Apache by IBM) is used. One
could write transformations in Perl, using XML4P to add XML parsing and DOM
access support, but for many transformations it would be faster to use XSL.
XSL is a very new technology, and as an industry we have only begun
to invent various uses for it. In the following sections, we see some of the
ways in which it is used in these days of its infancy. These are not intended
as design patterns or definitive approaches, but rather as examples of the
many ways in which it may be employed. The purpose in studying these approaches
is to stimulate your thinking for solving problems with XSL in ways that we
have not yet invented.
XSL application scenarios: rendering XML as HTML
XSL was originally developed with conversion of XML to HTML in mind,
hence “Stylesheet” is its middle name. In this role, XSL can be
run on the client, using a stylesheet either local to the client’s system,
or one stored as a resource on a server. Using XSL on the client allows processing
to be distributed to each client’s computer.
Most corporations find it more convenient to offload processing from
client workstations to server computers. This simplifies the task of upgrading
the power of an entire system; if more power is needed, the server can be
upgraded or supplemented with other servers, for scalability. An important
advantage to the use of servers is that applications can be upgraded in only
one place, on the server, rather than requiring a redeployment of application
software on many client machines.
XSL works well on a server. A common way to provide access is to use
Servlets which respond to a client’s request by starting XSL and returning
the resulting stream.

One can even imagine an architecture where it is used both on a server
and on the client. For example, the server might select records that match
a query, and “prune” parts of the tree that contain information
not needed by the client to reduce transmission latency. The client could
then run XSL locally to format the XML data according to the appearance they
want for viewing.
Recent studies have concluded that browsing will become a small part
of e-business in the coming years. They suggest that even though there are
uncountable web sites today, a larger use of the internet — some say
ten times as much — will be in the exchange of information in XML from
one server to another, in scenarios that do not include a browser. Thus, business-to-business
exchange frequently involves vocabulary translation, translating from one XML application
to another, rather than transformation of XML to HTML.
XSL application scenarios: transcoding
Translation on demand, whether to HTML, XML, or some other form, is
recognized as a common use case on an application server. A transcoding server
can be as simple as a servlet that can accept requests for a specified document
rendered for a specific device or XML vocabulary, and run XSL to produce and
return the result.
IBM has recently introduced a product called the WebSphere Transcoding
Publisher (see
http://www.software.ibm.com/developer/features/feat-transcoding.html),
which automatically provides XSL translation on demand. It is capable of rendering
XML to several different forms. As such, it is the logical extension of the
server XSL transformation model discussed in the previous section.
Transcoding can be used to create HTML renderings, or PDF (via XSL Formatting
Objects and a formatting objects processor like James Tauber’s FOP on
Apache), thus supporting conventional desktop and laptop clients, as discussed
in the previous section.
It can also reformat the data to WML and other forms suitable for handheld
devices. Doing so often requires pruning the data to a simpler form, as well
as adapting it to the device requirements for handhelds. In the “Copernicus”
project, IBM used transcoding technology to build a system with SABRE’s
travel management system coupled to Nokia intelligent telephones. Information
from SABRE is transcoded to an appropriate form for the device and sent to
the device, at which point the mobile user can make changes to his itinerary
as required, using HTTPS to talk to specialized business objects on the server.
The flexibility of the transcoding technology allows the system to expand
to support many other types of handheld devices, even when they involve vocabularies
other than WML.
Finally, aside from converting XML to devices for direct client use,
the Transcoding Publisher can be used for automated vocabulary translation,
such as may be required for business-to-business transactions.
The major advantage of the transcoding server model is that it can start
with support for a few devices, then add stylesheets to support others as
the need arises. In addition to the applications listed above, it could be used
to support traditional print media (newspapers, magazines, books)
as well as web publishing, or even the new offline e-book readers. It could
support a fax-on-demand system. Cars will eventually be able to connect to
the network, and transcoding can be set up to send information in the form
they require. As set-top boxes integrate our home entertainment system with
the home computer, transcoding will play a role.

IBM’s Transcoding Publisher runs the XSL processor from a servlet
used to handle requests. It also supports caching of transformed data, so
that multiple requests for the same transformation do not require running
XSL for each request.
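The idea of caching transformed data can be sketched as a map from a request key to the rendered result. This is a minimal illustration and not a description of how the Transcoding Publisher is implemented: the class, the key format (document id plus target format), and the sample vocabulary are all assumptions, and a real server would also need to invalidate entries when the source data changes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TranscodingCacheSketch {
    // Hypothetical source document and a stylesheet rendering it as plain text.
    static final String XML = "<flight><code>XY123</code></flight>";
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='flight'>"
      + "Flight <xsl:value-of select='code'/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final AtomicInteger transformsRun = new AtomicInteger();

    // Returns the rendering for (docId, target), running XSL only on a miss.
    String render(String docId, String target, String xml, String xsl) {
        return cache.computeIfAbsent(docId + "|" + target,
                                     key -> runXsl(xml, xsl));
    }

    // Number of times the XSL processor was actually invoked.
    int transformsRun() {
        return transformsRun.get();
    }

    private String runXsl(String xml, String xsl) {
        transformsRun.incrementAndGet();
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)),
                           new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        TranscodingCacheSketch server = new TranscodingCacheSketch();
        System.out.println(server.render("itinerary-1", "text", XML, XSL));
        System.out.println(server.render("itinerary-1", "text", XML, XSL));
        System.out.println("transforms run: " + server.transformsRun());
    }
}
```

The second request for the same document and target is answered from the map, so the processor runs only once.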
XSL application scenarios: application integration
XML is being embraced by every major software vendor. The ability to
emit XML, and to incorporate data expressed in XML, is being added to most
software products where it makes sense.
Because XML is a common and portable data format that is, or will be,
available in these products, there is a tremendous opportunity to use XML
data to integrate software into a complete system. However, because the XML
data may be in a variety of vocabularies, we may need a quick and mechanical
means of converting it from the form in which we received it into the one we need.
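As a small illustration of such a conversion, the sketch below translates a terse private vocabulary into a more descriptive shared one. Both vocabularies (`staff`/`emp`/`nm` and `Employees`/`Employee`/`Name`) are invented for this example, as is the class name; the stylesheet runs on the JAXP processor bundled with the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class VocabularyTranslationSketch {
    // Hypothetical private vocabulary used inside one department.
    static final String XML =
        "<staff><emp id='7'><nm>Ada</nm></emp></staff>";

    // Each template maps one private element onto its shared counterpart.
    // The {@id} attribute value template copies the id into a new attribute.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='staff'>"
      + "<Employees><xsl:apply-templates select='emp'/></Employees>"
      + "</xsl:template>"
      + "<xsl:template match='emp'>"
      + "<Employee number='{@id}'>"
      + "<Name><xsl:value-of select='nm'/></Name>"
      + "</Employee>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the document and returns the result as text.
    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)),
                           new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```

Supporting another target vocabulary then means writing another stylesheet, while the Java code that invokes the processor stays the same.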
We can imagine that a company’s internal structure might evolve
into a series of entities with well-defined interfaces, and XML vocabularies
that reflect their function. In this sense, the company’s structure
begins to resemble the structure of business-to-business between companies,
on a smaller scale.
In the diagram shown below, XML is the exchange medium between departments
of a company, and XSL is used to transform data from the private form favored
by a department into a form needed for processing in another.

The same model can be applied to the exchange of information between
companies.
XSL application scenarios: business integration
We are seeing a new trend in developing companies where one company
specializes in one aspect of a complete business cycle. Such companies optimize
their processes to be cost-effective. Since on their own they may not be able
to provide certain products or services, they may seek complementary products
or services from other small companies, together offering the complete product
or service required by their consumers. This arrangement might be a one-time
partnership, or may exist in the long term. For all intents and purposes,
a “soft merger” of this type begins to look like a “virtual
company”. Indeed, the virtual company may have a name different from
the partners involved in creating the service or product.

In the new economy, this kind of business aggregation requires the ability
to respond quickly to a new opportunity.
When companies expose their services and products as processes represented
in XML, it is possible to use XSL with little programming to assemble an
operating e-business from the partners’ component systems. Such companies
can be described as “integration-ready”. Prior to the standardization
of XML and XSL, building virtual companies from partners could take a long
time (days, weeks, or months) to configure middleware to work together and
to write the required business logic. While XML and XSL do not eliminate
these requirements, they do provide a quickly implemented and efficient means
of aggregating the partners’ business data.
In most cases there is no requirement that a company be involved in
only one such partnership. One could easily imagine a company that specializes
in, say, warehousing and fulfillment, providing the same service to a large
number of partnerships. The picture above shows a company participating in
two “virtual companies”.
XSL application scenarios: portals
Portals like “myYahoo” are familiar to many web users. They
allow the client to design a custom home page with live, updated information
according to the user’s wishes. “MyYahoo” allows the user
to request an up-to-date weather forecast for their area, current stock prices
they want to watch, news headlines, and the like, gathering data from many
sources. This information is combined into a single web page that has different
parts of the screen allocated to presenting each part of the customized report.
This model can also benefit a business worker. Suppose a clerk is employed
to manage the supply of a particular line of parts needed for his company’s
manufacturing process. A portal could be designed to display prices or availability
for certain critical components from various vendors. Information from the
company’s ERP system, such as inventory and forecasted demand, can be
incorporated on the same page. The similarity with the “myYahoo”
type portal is the ability to gather data from a variety of resources, select
according to a user’s profile, and format the data for a particular
screen.

When the sources of such data can provide it in XML, XSL can be used
to automate the transformation required for portals. One can imagine sending
HTML streams to sub-objects on the browser as a means of managing regions
for display.
XSL application scenarios: code generation
In all of the examples above, XML is treated as data to be converted
from one form to another, either for consumption by a client or by another
server. Yet another way of using XML is to generate procedural code based
on specifications described by XML data.
For example, IBM has recently announced the submission to XML.ORG of
a technology called “Trading Partner Agreements Markup Language”,
or tpaML, for consideration as a standard (see
http://www.software.ibm.com/developer/library/tpaml.html).
A Trading Partner Agreement used to be a paper document created by the lawyers
of two potential business partners. IBM recognizes the value of coding such
agreements electronically, so that the terms of the agreement can be implemented
as software. This is especially helpful for the new aspects of starting e-business
with another company, such as the technical details needed to configure the
middleware servers of each partner to begin the conversation.
A filled-out tpaML document can be interpreted by a program which generates
Java code to configure the middleware on each side automatically. We refer
to the source as an “executable document”. The configuration code can be produced
using XSL, or it may be more practical to use Java business logic mixed with
DOM traversal.
Of course, the software products on both sides would need to support
tpaML, but they need not be the same product. Generating Java code for configuration
of the servers provides a rapid way of getting set up for e-business transactions.
The alternative involves manual configuration of the software, possibly writing
additional code, a process which could take weeks or even months.
The sections above list just a few application categories where XSL
can be gainfully employed; we expect many other usages to emerge as the technology
is embraced by creative developers around the world.
Limits of mechanical translation
XSL can solve many problems by translating XML mechanically. However,
it is just one tool, and it won’t address every need for changing XML
documents.
The language itself is not intended as a general-purpose programming language.
Unlike Java or C++, for example, variables can be set only once; they are
really more like symbolic constants in that respect. They cannot be incremented,
so conventional loop counting is not possible. If there is a need to parse a “lastname,
firstname” string into separate components, it can be done in XSL, but
not easily. Such situations may call for the use of extensions plugged into
XSL. With the Java version of Xalan, Java classes can be used to extend the
power of the XSL processor.
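For the simplest form of the “lastname, firstname” case, the XPath 1.0 string functions `substring-before`, `substring-after`, and `normalize-space` are enough, as the sketch below suggests. It assumes exactly one comma in the string; irregular input (a missing comma, suffixes, multiple commas) needs more elaborate handling, which is where an extension written in Java may be the more practical route. The element names and class name are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NameSplitSketch {
    // Hypothetical input with a single "lastname, firstname" string.
    static final String XML =
        "<person><name>Smith, Jane</name></person>";

    // substring-before/after split on the comma; normalize-space trims the
    // blank that follows the comma. Assumes exactly one comma is present.
    static final String XSL =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "<xsl:template match='name'>"
      + "<name>"
      + "<last><xsl:value-of select=\"substring-before(., ',')\"/></last>"
      + "<first><xsl:value-of"
      + " select=\"normalize-space(substring-after(., ','))\"/></first>"
      + "</name>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the document and returns the result as text.
    static String transform(String xml, String xsl) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)))
                .transform(new StreamSource(new StringReader(xml)),
                           new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XML, XSL));
    }
}
```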
Mechanical translation must be done with care. When converting from
one vocabulary to another, it is important to consider the meaning of the
data between tags, not just the tag name. Even with a common tag name like
<name>, we cannot be sure what the name means - customer name?
company name? or something else?
In addition to the meaning of the data, the format of the data must
be understood. When combining listings from two catalogs of electronic parts,
for example, the specifications of particular components must be expressed
in a similar standard. The working voltage of a capacitor could be expressed
as a fixed value, a range, or a fixed value with a percent tolerance. The
application which eventually consumes such data may understand only one form.
Both of these problems are best addressed by having the vocabularies
be very well defined and agreed upon between companies. XML.ORG oversees the definition
and development of such vocabularies within an industry, and it is important
that the specifications reflect the input of all companies that will be using
the vocabulary for e-business.
Conclusions
XSL is a powerful transformation facility that provides mechanical translation
of XML documents from one form to another. It can convert to HTML, another
XML vocabulary, or text which is not XML at all. Many transformations can
be designed using only an XSL processor, and it is possible to add extensions
to the processor to support particular requirements that are not easily met using
only XSL.
We have studied several scenarios where XSL has a role. We recognize
that these initial ideas about using XSL represent solutions to certain problems
we see today, but that XSL can be used in many ways that have yet to be invented.
Finally, XSL by itself cannot address all incompatibilities between
XML documents. When vocabularies are not well defined, either by the exact
meaning of a tag or the exact format of the data associated with it, mechanical
translation will not solve the problem. This underscores the importance of
developing well-defined standard vocabularies for e-business usage under the
auspices of a neutral standards organization such as XML.ORG.