The state of XML
This paper examines the current state of XML, from standards initiatives
to commercial tool support. From its catalytic effect in electronic business
to its long term influence on the future of the Web, XML is having a radical
impact on the world of computing. Yet we are still at the beginning of a long
and exciting journey.
Introduction
Writing about "the state of XML" is an ambitious undertaking, and one
in which I am almost doomed to fail. Happily, the reason for this is the widespread
success and adoption of XML, such that you can find it almost anywhere you
care to look right now. As well as the obvious data exchange and publishing
applications, new uses of XML, from Internet messaging clients to GUI design
programs, are popping up everywhere.
I'm delighted to be able to say I'm also utterly overwhelmed by some
of the new inventions and techniques in the XML world. At the XTech 2000 conference
in San Jose earlier this year, the quality and quantity of innovation was
outstanding. Each day of presentations was at the same time completely fascinating
and completely exhausting!
As editor of XML.com, I'm all too aware of the increasingly widespread nature
of XML applications. Receiving press releases on subjects ranging over anything
from healthcare to legal matters, I feel hard-pushed to be as versatile as
XML is! Tim Bray is fond of the aphorism that "XML is the new ASCII". Soon
XML will be everywhere. Bray foresees the doom of conferences like XML Europe,
where people gather to talk about new ways of handling XML. Well, yes, and
no. Later in this paper I'll observe that some areas of XML, once solved,
do indeed become dull. It is my contention, however, that with XML lies the
long-term future of the Web itself: that's a topic that won't run out of steam!
In this paper I will attempt to provide a "long view" on XML, taking
in where XML is now, and where it's going.
XML standards
It is most appropriate to begin any review of XML with a look at standards.
XML and web standards are inextricably linked. Interoperability of data -
XML's core strength - requires that we all agree and implement certain things.
Vendors have a responsibility to their users to implement standards, and users
have a responsibility to demand such implementations.
Yet the standards bodies themselves also have a large responsibility
to address their activities to the right area: to solve the correct problems,
and to solve them in a way that vendors and programmers can readily adopt.
The W3C last year introduced the phase
of "Candidate Recommendation" into its standards development process. This
means that there is a mandatory period of implementation time, where feedback
is solicited from developers implementing a standard. The move was long
overdue, but it should ensure that fewer retrofits are needed, and it is some
insurance against a standard withering and dying for want of tools.
XPath and XSLT
XSLT has undeniably been the W3C's success story of the last year. Reinforcing
the need for the Candidate Recommendation phase, XSLT's success rides partly
on the back of several quality XSLT processor implementations from James Clark,
Michael Kay and Lotus. These garnered both grassroots support for XSLT, and
feedback as the standard progressed. It is interesting to note that XSLT processors
were developed more or less in parallel with the XSLT Working Draft - whether
it would have progressed in the same fashion had implementation been done
when the specification was less malleable, as with the Candidate Recommendation
phase, is uncertain.
One must also applaud Microsoft for their support of XSLT. Although
it is widely regarded as unfortunate that IE5.0's XSLT implementation is somewhat
non-standard, by providing a tool for getting instant utility out of XSLT,
Microsoft made a definite contribution to the adoption of the standard.
As a technology XSLT has shown itself useful for far more than just
straight transformations. Two inventions in particular have caught my attention.
One of these is Rick Jelliffe's Schematron [1].
The Schematron is an XSLT-based tool for validating and constraining
XML, and is an alternative to using DTDs or XML Schemas. In Jelliffe's words,
"Schematron rejects the idea that the result of validation is a binary valid/invalid...
Schematron puts natural language descriptions on an equal footing to machine-usable
expressions. Diagnosis is just as important as prescription". Utilizing the
power of XPath, Schematron allows users to write constraints like "if there
are 3 'foo' elements, there must be at least 2 'bar' elements". It allows usage
recommendations to be built into the schema, rather than simple valid/invalid
constraints.
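To make the quoted constraint concrete, here is a minimal sketch of the same rule checked in Python rather than in Schematron itself, so none of this is Schematron syntax. The function name `check_foo_bar` and the sample documents are hypothetical, and I have read the rule as "exactly 3 'foo' elements". The point it illustrates is that the check returns a natural-language diagnostic rather than a bare valid/invalid flag.

```python
import xml.etree.ElementTree as ET


def check_foo_bar(xml_text):
    """Return a list of human-readable diagnostics (empty if the rule holds).

    Rule, read here as "exactly 3 foo": if the root element has 3 'foo'
    children, it must also have at least 2 'bar' children.
    """
    root = ET.fromstring(xml_text)
    foo_count = len(root.findall("foo"))
    bar_count = len(root.findall("bar"))
    diagnostics = []
    if foo_count == 3 and bar_count < 2:
        diagnostics.append(
            "Found 3 'foo' elements but only %d 'bar' element(s); "
            "at least 2 are expected." % bar_count
        )
    return diagnostics


# Hypothetical sample documents.
good = "<doc><foo/><foo/><foo/><bar/><bar/></doc>"
bad = "<doc><foo/><foo/><foo/><bar/></doc>"

print(check_foo_bar(good))
print(check_foo_bar(bad))
```

A real Schematron schema would express the same test as an XPath assertion with the message attached to it; the sketch only mirrors that pairing of test and diagnosis.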
The second innovative use of XSLT that caught my eye was a project at
Sun, presented at XTech 2000 earlier this year by Jacek Ambroziak. It uses
XSLT to drive the indexing process for documentation. The stylesheets perform
such functions as selecting elements to index or ignore, assigning tokenizers
to process text content of elements and computing metadata to be stored in
the index.
XML Schemas
The road forward has been less than easy for the XML Schemas specification.
Its completion has been eagerly awaited by many XML developers. Accompanying
the clamour was the concern that those who had proceeded with their own schema
technologies (Microsoft, Commerce One, etc.) needed to adopt the new W3C XML
Schemas. Unfortunately, the need to please everyone, and the sheer complexity
of the task, have delayed the delivery of the specification. The spec itself
has not been without its detractors, having been described by some as "monstrously
complex", and criticized by others as omitting required features. Whether
the schema Working Group has effectively reached the "80/20" point with the
specification, as they claim, will doubtless be discovered during the Candidate
Recommendation phase.
More than any other XML technology so far, XML Schemas will depend on
the volume of available tool support for their adoption. Authoring the various
constraints in a schema is not a trivial task. Other schema initiatives, such
as the above-mentioned Schematron, and Murata Makoto's "RELAX" schema language,
may well gain ground with those who feel no need to utilize (or memorize)
the full depth of XML Schemas. Nevertheless, the XML Schema specification
is an important one for establishing the "contracts" used in machine-to-machine
XML communication: it delivers the necessary tools to those using strongly
typed languages and wishing to use XML for data exchange.
SVG and XHTML
As the Web has grown older, the rate of progress in user-interface and
presentation technology has gradually slowed. One can attribute this
to various reasons: the increasing spread of the Web to non-technical folk
who are reluctant to upgrade browsers, and the cooling of the "browser wars"
as Internet Explorer gained dominance. Things reached a point where people
wanted to concentrate on selling things over the Web rather than furthering
the browser technology itself. Happily, things are set to hot up again on
that front, with the advent of the Mozilla browser.
Two new technologies this year are set to change the face of web browsing.
The first of these is SVG, the Scalable Vector Graphics standard. Widely
acclaimed as one of the W3C's success stories, SVG already has multiple implementations
in the form of browser plug-ins. Its integration with the DOM and JavaScript
means that it is in a position to radically improve the toolset available
for web designers to convey information. One note of warning: what will inevitably
happen is that people will script user interface elements (e.g. buttons, maybe
even windows) with SVG to make up for the continuing woeful lack of UI elements
in HTML. Such a move would be unfortunate, and to the long-term disadvantage
of both the user and developer. There is a W3C initiative underway, called
XForms, to enhance the UI facilities of browsers, but this looks to be a long
way off. Perhaps the best bet for now is XUL, the XML-based user interface
language built into the Mozilla browser.
XHTML, the reformulation of HTML 4 into XML, paves the way for more
XML on the web. While not an especially large step forward from the user's
point of view, bringing XHTML support into Web browsers sets the scene for
the transfer of web markup from HTML to XML. The formulation in XML also enables
XML document authors to import XHTML semantics by using namespaces, rather
than continually reinventing the <p> tag, for instance.
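As a rough illustration of importing XHTML semantics through namespaces, the following Python sketch builds a document in a made-up vocabulary that borrows the XHTML paragraph element. The `review` and `summary` element names and the `xhtml` prefix are assumptions for the example; only the XHTML namespace URI is the standard one.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Ask ElementTree to serialize this namespace with a readable prefix.
ET.register_namespace("xhtml", XHTML_NS)

# <review> and <summary> belong to an invented vocabulary.
review = ET.Element("review")
summary = ET.SubElement(review, "summary")
# Borrow the paragraph element from XHTML instead of inventing another one.
para = ET.SubElement(summary, "{%s}p" % XHTML_NS)
para.text = "A paragraph borrowed from XHTML."

xml_text = ET.tostring(review, encoding="unicode")
print(xml_text)

# A consumer can pick out the XHTML paragraphs by namespace alone.
parsed = ET.fromstring(xml_text)
paragraphs = parsed.findall(".//{%s}p" % XHTML_NS)
print(len(paragraphs))
```

Because the paragraph is identified by its namespace, any XHTML-aware tool can recognize it without knowing anything about the surrounding vocabulary.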
XLink and RDF
XLink is the final missing piece for using XML on the Web, supplying
as it does the componentry to turn XML documents into hypertext. Within its
relatively simple specification lies a whole host of questions, from the issues
of implementation through to the intellectual property and legal ramifications.
XLink could make Napster look like child's play.
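For readers who have not met XLink markup, this is a small sketch of how a program might find the simple links in a document. The `catalogue` vocabulary, the URLs and the function name `simple_links` are invented for illustration; the namespace URI and the `xlink:type` and `xlink:href` attributes are those of the XLink specification.

```python
import xml.etree.ElementTree as ET

XLINK_NS = "http://www.w3.org/1999/xlink"

# The <catalogue> vocabulary and the URLs are invented for illustration.
document = """
<catalogue xmlns:xlink="http://www.w3.org/1999/xlink">
  <item xlink:type="simple" xlink:href="http://example.org/specs/a.xml">Spec A</item>
  <item>No link here</item>
  <item xlink:type="simple" xlink:href="http://example.org/specs/b.xml">Spec B</item>
</catalogue>
"""


def simple_links(xml_text):
    """Return (text, href) pairs for every element marked as a simple link."""
    root = ET.fromstring(xml_text)
    links = []
    for element in root.iter():
        if element.get("{%s}type" % XLINK_NS) == "simple":
            links.append((element.text, element.get("{%s}href" % XLINK_NS)))
    return links


for text, href in simple_links(document):
    print(text, "->", href)
```

Note that any element can become a link simply by carrying the XLink attributes, which is what allows arbitrary XML vocabularies to take part in hypertext.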
Within RDF, frequently maligned and misunderstood, lurk some of the
most exciting possibilities for the Web. Ubiquitous machine-readable metadata
all over the Web opens up great opportunities. There is not room here to present
a detailed vision for RDF. I will say, however, that for RDF, as for XLink,
tool support is probably the critical factor in its long-term success. Most
likely that tool support needs to come from the web browser, or client software
that is in as regular use (e.g. "Outlook" or its equivalents). RDF is one
of those technologies that sets the mind racing with ideas of exciting implementations.
However, for it to become reality, somebody needs to write the "killer" application
for it.
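To give a flavour of what machine-readable metadata means in practice, here is a minimal sketch that reads statements out of a small RDF/XML document. It handles only the simplest form (an `rdf:Description` with literal-valued property elements) and is in no way a complete RDF parser. The resource URL, the creator value and the function name `triples` are assumptions made up for the example; the RDF and Dublin Core namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Sample metadata; the resource URL and the creator name are invented.
rdf_xml = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/paper.html">
    <dc:title>The state of XML</dc:title>
    <dc:creator>A. N. Author</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""


def triples(xml_text):
    """Return (subject, predicate, object) tuples for literal properties."""
    root = ET.fromstring(xml_text)
    result = []
    for description in root.findall("{%s}Description" % RDF_NS):
        subject = description.get("{%s}about" % RDF_NS)
        for prop in description:
            # ElementTree tags look like "{namespace}local"; join the two
            # parts back into a single predicate URI.
            namespace, local = prop.tag[1:].split("}")
            result.append((subject, namespace + local, prop.text))
    return result


for statement in triples(rdf_xml):
    print(statement)
```

Each statement says something about a resource identified by a URI, which is what lets software from different sources combine metadata about the same thing.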
OASIS and vertical consortia
OASIS's flagship effort is undoubtedly ebXML. With ebXML, OASIS is working
with the United Nations to produce standard vocabularies and mechanisms for
the establishment and conduct of electronic commerce with XML. Involvement
here is from the big vendors such as IBM and Sun. Microsoft has pledged support
for ebXML, but meanwhile continues to implement its own BizTalk technologies.
One suspects that the constituency for BizTalk is likely to be larger than
that for ebXML and, at the very least, tomorrow's eBusiness servers must be
compatible with both initiatives.
This year will be a testing time for OASIS in its relationship with
the grassroots XML community. Having introduced personal memberships, it provides
a forum for support for more grassroots initiatives, as well as those that
are outside of the scope of the W3C. OASIS's remit is that of a "standards organization":
that is, it aids the standardization of existing technology, unlike
an institution such as the W3C, which also invents the technology it standardizes.
It is by no means certain at the time of writing that OASIS has won the confidence
of the grassroots community: suggestions that OASIS should be the guardian
for SAX met a decidedly mixed response earlier in the year. As far as OASIS
and the XML community are concerned, one might ask whether there is any appeal
for the small developer in unpaid positions on committees. Despite this, Jon
Bosak, the "Father of XML", and several other respected XML developers, believe
strongly in OASIS participation, a significant enough reason for most to consider
OASIS. The contribution of OASIS in hosting the XML-DEV mailing list shows
their commitment to the community.
The number of vertical consortia focusing on implementing XML in specific
markets has exploded over the last year. Yet it is by no means clear that
members of the same vertical industry are even aware of these efforts, despite
several notable successes. Just because standards work is going on in a particular
area, it does not mean that it will be implemented. Many companies will just
plough straight ahead with their own requirements. For a lot of companies,
where 100% integration isn't business critical, this is probably the most
pragmatic route to take. Committees and standardization can take many months,
frequently years, and business cannot wait that long. The beauty of XML is
that it allows companies to do that, and to translate to interchange standards
at a later date. The core constraint is that a company must ensure that it
still retains all the information items which might be needed at a later date.
That problem is nothing new to XML, however.
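The following sketch shows what such a later translation might look like, assuming an in-house order format and an interchange vocabulary that are both invented here. The element names, `FIELD_MAP` and the function `to_interchange` are hypothetical. The translation fails loudly when an information item was never captured, which is the constraint described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from in-house element names to an interchange vocabulary.
FIELD_MAP = {"custName": "BuyerName", "ref": "OrderNumber", "total": "Amount"}


def to_interchange(in_house_xml):
    """Translate an in-house order into the (invented) interchange format."""
    source = ET.fromstring(in_house_xml)
    target = ET.Element("Order")
    missing = []
    for local_name, standard_name in FIELD_MAP.items():
        value = source.findtext(local_name)
        if value is None:
            # The information item was never recorded, so it cannot be supplied.
            missing.append(local_name)
        else:
            ET.SubElement(target, standard_name).text = value
    if missing:
        raise ValueError("in-house record lacks: " + ", ".join(missing))
    return ET.tostring(target, encoding="unicode")


in_house = (
    "<order><custName>Acme</custName><ref>42</ref>"
    "<total>9.99</total><note>rush</note></order>"
)
print(to_interchange(in_house))
```

Renaming elements is the easy part; no amount of translation can recover a field that the in-house format did not keep.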
XML has a tendency to place a magnifying glass on whatever you're doing.
Bringing your data out into a readily readable format that encourages structure
will highlight the weaknesses of your information infrastructure. Perhaps
the biggest challenge of XML usage inside an organization is the focus it
brings to the way you gather, structure and store your information. The problems
there need solving before you even touch XML.
General comments on standards
XML is still too young for us to draw general conclusions about which
standards are effective and which aren't. It has been said often that XML
itself came in "Fast and low and under the radar", rather than being conceived
by committee. Groups working on XML technologies through the W3C work via
committees, and probably face more procedural challenges than the original
XML 1.0 Working Group. Happily, we've not seen any XML technology go down
in flames yet (or, more likely, die quietly) but it is not inconceivable that
some future recommendations may go by the wayside.
Currently, the W3C's problem seems to be the reverse. There is demand
for standardization on subjects such as XML Protocols and XML Packaging that
is not being met. It was remarked earlier in the year that although many businesses
are embracing and building upon XML, the core number of developers at the
heart of XML doesn't seem to have increased. There are some companies that
need to pay back to XML by contributing resource towards W3C activities.
One factor having a major bearing on the success of standards is tool
support. The most finely honed and thoroughly agreed business interchange
vocabulary doesn't mean a thing if there's no software to use it with.
XML tools
The last year has seen a lot of growth in commercial XML support. It
is an interesting time, as we're only just starting to figure out what we
want to do with XML. One relatively stable product category is that of SGML
repositories and content management systems that have adapted to XML. Perhaps
the most interesting change in the industry so far is that effected by eXcelon
Corp, formerly Object Design, who have migrated their whole business from
object databases to XML storage.
However, as most of us still haven't figured out what kind of XML-based
products we require, many so-called "XML products" are simply relaunches of
existing products with simple XML compatibility. While this is good news,
it's not a sign of much progress: import/export facilities are relatively
simple features. XML has an accompanying market hype which causes companies
to get their XML product and press releases out fast, despite the fact that
often, to quote Elliotte Rusty Harold, "there's no there there"! I'm heartily
sick of 95% of the press releases I receive, containing unremarkable news
about a product that they don't even define to the point where I can tell
what it does.
XML is delivering today for the marketers. We are at the beginning of
a journey in terms of seeing commercial XML products deliver value for systems
developers. There are many wrinkles to iron out, and many production situations
to test. Interoperability between tools, despite being a key promise of XML,
is not yet a reality. At the moment, even though you use XML, you still have
to go for a "platform" decision.
XML databases and application servers
Support for XML in relational databases is advancing in leaps and bounds,
with Oracle in particular making an excellent contribution in this area.
There is also an emerging class of "XML servers". In the general case,
I am unsure as to whether this kind of product is a good idea or not. Clearly,
if you are storing documents, then an XML-aware repository makes sense.
I detect, however, a missionary enthusiasm in some to use XML everywhere.
What's wrong with that? Well, I applaud XML everywhere outside your application,
but not necessarily everywhere inside your application. It may make a lot
more sense to keep your lean, business-specific, data structures and databases,
and use XML purely for externalization and interchange, than to completely
replace your structures with DOMs and your databases with XML servers. Several
developers have complained to me of this kind of XML-misuse, and have had
to forcibly remove XML from the internals of some projects.
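One way to picture "XML outside, native structures inside" is the sketch below: the application works with a plain Python data class, and XML appears only at the boundary. The `Invoice` class, its fields and the element names are all hypothetical and chosen purely for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Invoice:
    """Lean internal structure; application logic never touches a DOM."""
    number: str
    amount_cents: int

    def to_xml(self):
        # Externalize to XML only when the data leaves the application.
        element = ET.Element("invoice", {"number": self.number})
        ET.SubElement(element, "amountCents").text = str(self.amount_cents)
        return ET.tostring(element, encoding="unicode")

    @classmethod
    def from_xml(cls, xml_text):
        # Convert incoming XML straight back into the internal structure.
        element = ET.fromstring(xml_text)
        return cls(
            number=element.get("number"),
            amount_cents=int(element.findtext("amountCents")),
        )


invoice = Invoice("INV-1", 1999)
wire_format = invoice.to_xml()
round_tripped = Invoice.from_xml(wire_format)
print(wire_format)
print(round_tripped == invoice)
```

The internal type stays small and strongly shaped for the business logic, while the XML form can change to suit whatever interchange format is agreed later.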
A sure win for XML though is the growing breed of Enterprise Integration
applications, whose main purpose in life is to act as a glorified import/export filter,
bringing disparate data together. In both the intranet and e-business exchange
situations, this is an application of XML that delivers value today.
The farthest I will commit myself on commercial XML support, and XML
servers in particular, would be to say that the game is hardly played out
yet. There is plenty of time to see what will sink and what will swim. My
advice is to carefully consider changes to infrastructure and big platform
decisions. XML-awareness alone, as we know, is no silver bullet.
XML browsers
By the end of this year, we can look forward to widely-available cross-platform
XML browsing support. This is an important step forward to a web full of XML.
The open-source Mozilla browser, although distinctly laggardly in its
development, is finally coming to fruition. It is notable for its excellent
support of W3C standards: so much so in fact, that you can use W3C Recommendations
as developer documentation.
Microsoft's Internet Explorer, although making huge leaps forward in
XML support last year, is currently lagging a little in its support for W3C
standards. The exception here is IE5 for the Macintosh, whose attention to
detail in implementing CSS is commendable. It is clear, though, that Microsoft's
core commitment is to its customers. Clearly that isn't the same as commitment
to an open platform for web authoring. For this reason, it is vital that diversity
in web browsers continues.
Opera 4.0 now contains XML and CSS support, a welcome move that means
all mainstream web browsers now support direct display of XML.
For all the browsers though, display isn't enough: implementation of
the XML DOM is what counts if we are to get true browser-based applications.
On this score Mozilla alone currently delivers.
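To show what DOM support buys beyond display, here is a small sketch using Python's `xml.dom.minidom` in place of a browser. The document content is invented. The calls used (`createElement`, `createTextNode`, `appendChild`, `getElementsByTagName`) are W3C DOM Core methods, so a browser script would call methods of the same names; the sketch says nothing about how any particular browser behaves.

```python
from xml.dom import minidom

# Parse a tiny invented document into a DOM tree.
document = minidom.parseString("<list><item>first</item></list>")

# Build a new element with the standard DOM factory methods and attach it.
new_item = document.createElement("item")
new_item.appendChild(document.createTextNode("second"))
document.documentElement.appendChild(new_item)

# Read the tree back through the same interface.
items = [node.firstChild.data for node in document.getElementsByTagName("item")]
print(items)
```

An application that can modify the document tree in this way is doing more than rendering XML, which is the distinction being drawn here.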
There still remain many open questions about interoperability: whether
it's possible to create cross-platform web applications using W3C technology
that will run identically on every web browser. From XML's perspective, this
support is a necessary precondition to filling the Web with XML - however
exciting the prospect of a "Semantic Web" is, it won't gain broad support
until tools are available for the widespread user creation and manipulation
of XML.
XML community
One of XML's crowning glories is its community. From the beginning,
XML has been supported by a lot of generous-spirited, intelligent developers.
In the same way that today's Internet companies stand on the shoulders of
free software giants, who wrote programs such as "BIND" and "sendmail", companies
making money out of XML benefit from the great work achieved by such people
as James Clark and Tim Bray.
Today the XML community is growing and diversifying. We're seeing more
work done in application and language specific areas. In particular, the Python
and Perl XML communities are strong. New ideas about XML processing are coming
out of these groups and into the general arena of interest.
Conclusions
One of the most remarkable things about XML has been its social effects.
It has sparked new co-operation within industries, and changes in the way
information is distributed on the Web. Those effects are worthy of a paper
in themselves.
Although many feel they've been working on XML for a long time, practically
speaking we're still at the beginning of the story. Tim Berners-Lee, in his
book "Weaving the Web", suggested that Amazon, AOL and others are merely
the background for the Web: the Web will move on, fuelled by these "incidental
economies". In a similar way it is my view that the B2B world and its associated
alphabet soup form the next generation background for the Web and XML, but
the technology will move on. Once you can exchange business info with XML
(and it isn't that hard), then XML ceases to be interesting in that sphere:
much in line with Tim Bray's XML/ASCII comparison. The ultimate future is
rooted more deeply in technologies such as XLink and RDF.
We are only just beginning to imagine the possibilities of a Web full
of XML. Today's business models won't work for that future. There's got to
be a lot of invention and hard work yet.