Building an XML application: key management issues
XML is a hot new technology for describing data. It is also being touted
as the technology that will enable the creation of next-generation Internet
applications. XML holds the promise of exchanging data seamlessly and efficiently
between applications running on multiple platforms. However, it is difficult
for an IT manager to adopt a new technology like XML and build applications
successfully with it. Developing applications with a relatively new technology
like XML involves a considerable investment of time, effort, money, and
manpower. Before making a capital expenditure, an IT manager has to fully
understand what the technology is, what it offers, and its advantages,
limitations, and future direction. This paper aims to dispel the myths
surrounding XML.
What is XML?
XML is the Extensible Markup Language. It is a subset of the Standard
Generalized Markup Language (SGML), a complex standard for describing structure
and content in documents. It is a markup (tag-based) language that is designed
to organize data rather than format it. It is also a meta-language - a language
for describing other languages. It lets you define your own customized markup
languages for different classes of documents.
XML is a project of the World Wide Web Consortium (W3C). The development
of the XML specification is done under the supervision of W3C's XML Working
Group. It is an open specification (non-proprietary) and the current specification
(version 1.0) was accepted by the W3C as a Recommendation on Feb 10, 1998.
A Recommendation by the W3C indicates that the specification is appropriate
for widespread use. But XML is still evolving with the addition of new features
and functionalities.
XML is more flexible than a fixed format markup language like HTML.
It adds context and gives meaning to data. In XML, you can define your own
custom tags that represent data logically.
Figure 1 shows a sample XML document.
<?xml version="1.0"?>
<employee>
  <id>12345</id>
  <firstname>John</firstname>
  <lastname>Smith</lastname>
  <jobtitle>CEO</jobtitle>
  <address>
    <street>123 Street</street>
    <city>New York</city>
    <state>NY</state>
    <zip>12345</zip>
    <country>USA</country>
  </address>
</employee>
Figure 1. Sample XML document
Figure 1 describes the personnel record of the employee "John Smith". From
this document, we can ascertain how the individual items of data relate to
each other and to the "employee" entity as a whole. Such data is also referred
to as self-describing data because the tags describe the information contained
within.
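As an illustration of what self-describing data buys a program, the following is a minimal sketch in Python that reads the record from Figure 1 with the standard library's ElementTree parser. The function name employee_summary and the choice of fields are assumptions made for this example; they are not part of XML itself.

```python
import xml.etree.ElementTree as ET

# The employee record from Figure 1, held in a string for this example.
EMPLOYEE_XML = """<?xml version="1.0"?>
<employee>
  <id>12345</id>
  <firstname>John</firstname>
  <lastname>Smith</lastname>
  <jobtitle>CEO</jobtitle>
  <address>
    <street>123 Street</street>
    <city>New York</city>
    <state>NY</state>
    <zip>12345</zip>
    <country>USA</country>
  </address>
</employee>"""


def employee_summary(xml_text):
    """Pull a few fields out of an employee record by tag name (hypothetical helper)."""
    employee = ET.fromstring(xml_text)
    return {
        "name": f"{employee.findtext('firstname')} {employee.findtext('lastname')}",
        "title": employee.findtext("jobtitle"),
        # The path mirrors the nesting in the document: <city> sits inside <address>.
        "city": employee.findtext("address/city"),
    }


print(employee_summary(EMPLOYEE_XML))
```

The program never relies on the position of a field, only on its tag name and its place in the nesting, which is the practical meaning of "self-describing".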
The grammar of XML
A DTD (Document Type Definition) defines the grammar of an XML document.
It describes the markup (elements) available, where they may occur, and how
they all fit together. It is essentially a description of the legal structure
of an XML document.
Figure 2 shows a sample DTD for the XML document specified in Figure 1.
<!ELEMENT employee (id,firstname,lastname,jobtitle,address)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT jobtitle (#PCDATA)>
<!ELEMENT address (street,city,state,zip,country)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT country (#PCDATA)>
Figure 2. Sample DTD
The XML standard does not insist on using DTDs, but using a DTD means
you can be certain that all documents of a particular type are constructed
and named in a conformant manner. It also helps to ensure that XML documents
adhere to relevant business rules if those rules are embedded in the DTD.
DTDs should therefore be designed carefully. They should cover all possible
document cases as well as allow for future enhancements.
This is key to deploying effective XML applications.
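To make the idea of "legal structure" concrete, here is a rough sketch in Python. Python's standard library parser does not validate against a DTD, so this example hand-copies the two content models from Figure 2 into a dictionary and checks child order itself. It is a stand-in for what a validating parser does, not a DTD implementation; the names CONTENT_MODELS and structure_errors are assumptions for this example.

```python
import xml.etree.ElementTree as ET

# Content models copied by hand from Figure 2: each parent element maps to the
# exact sequence of child elements it must contain. Elements that are not
# listed are the (#PCDATA) leaves and must have no child elements.
CONTENT_MODELS = {
    "employee": ["id", "firstname", "lastname", "jobtitle", "address"],
    "address": ["street", "city", "state", "zip", "country"],
}


def structure_errors(element):
    """Return a list of structural problems found in element and its descendants."""
    errors = []
    found = [child.tag for child in element]
    expected = CONTENT_MODELS.get(element.tag, [])
    if found != expected:
        errors.append(f"<{element.tag}>: expected children {expected}, found {found}")
    for child in element:
        errors.extend(structure_errors(child))
    return errors


ADDRESS = (
    "<address><street>123 Street</street><city>New York</city>"
    "<state>NY</state><zip>12345</zip><country>USA</country></address>"
)
VALID = (
    "<employee><id>12345</id><firstname>John</firstname>"
    "<lastname>Smith</lastname><jobtitle>CEO</jobtitle>" + ADDRESS + "</employee>"
)
# Same record with the required <jobtitle> element left out.
MISSING_TITLE = (
    "<employee><id>12345</id><firstname>John</firstname>"
    "<lastname>Smith</lastname>" + ADDRESS + "</employee>"
)

print(structure_errors(ET.fromstring(VALID)))
print(structure_errors(ET.fromstring(MISSING_TITLE)))
```

In a real project this check would come from a validating parser reading the DTD itself, so that the rules live in one shared place rather than in application code.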
Important features
One of the important features that make XML extremely powerful and useful
is its simplicity. The rules for creating a markup language in XML for
encapsulating data are quite simple. For example, XML documents are composed
of simple tags marked by angle brackets, with data stored between them as
plain text. The tags almost always come in pairs and can be nested to multiple
levels, as shown in Figure 1. Similarly, XML data is just ordinary text that
isn't tied to any particular programming language or platform. Standard text
editors can be used to create and edit XML documents, and the XML markup
usually makes sense to human readers as well.
XML supports the Unicode standard, a character-encoding system that
supports all of the world's major languages, so virtually all the characters
in use around the world are legal characters in XML. With software that
processes XML properly, this can be a huge benefit in developing web applications
that span national and cultural boundaries.
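A small sketch of what this means in practice, using Python's standard ElementTree parser. The cities document and the helper city_names are made up for this example; the point is only that text in several scripts passes through the parser unchanged when the document declares its encoding.

```python
import xml.etree.ElementTree as ET

# A hypothetical document mixing Latin, Japanese, and Greek text. It is encoded
# as UTF-8 bytes, matching the encoding named in its XML declaration.
CITIES_XML = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    "<cities><city>München</city><city>東京</city><city>Αθήνα</city></cities>"
).encode("utf-8")


def city_names(xml_bytes):
    """Return the text of every <city> element, decoded by the parser."""
    return [city.text for city in ET.fromstring(xml_bytes).findall("city")]


print(city_names(CITIES_XML))
```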
XML documents essentially have a rooted tree structure and for many
applications, a tree data structure is powerful enough to represent complex
data. It is also easy to write software programs that manipulate tree-structured
data and this again goes to the heart of XML - its simplicity.
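The following sketch shows how little code a tree walk needs. It is written in Python with the standard ElementTree parser; the generator name walk and the shortened employee document are assumptions for this example.

```python
import xml.etree.ElementTree as ET


def walk(element, depth=0):
    """Yield (depth, tag, text) for every element, parents before children."""
    text = (element.text or "").strip()
    yield depth, element.tag, text
    for child in element:
        # Each child is the root of its own subtree, so the same function applies.
        yield from walk(child, depth + 1)


# A shortened version of the employee record from Figure 1.
DOC = "<employee><id>12345</id><address><city>New York</city></address></employee>"

for depth, tag, text in walk(ET.fromstring(DOC)):
    print("  " * depth + tag, text)
```

The recursion follows the shape of the document directly: one call per element, with the depth increasing by one at each level of nesting.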
Key application areas
XML is not intended to replace HTML; instead it provides more flexible
document definition and processing capabilities. XML allows us to reformat
data for display on multiple devices and platforms. Since XML separates
display instructions from content definition, a web site designer can alter
the look and feel of the site simply by applying a different style sheet to
the same XML document. More importantly, this allows using the same content
for other systems or devices such as PDAs (personal digital assistants) and
wireless devices that do not use HTML for their display processing.
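The separation of content from display can be sketched as follows. In practice a stylesheet language such as XSL would play this role; here two small Python functions stand in for two style sheets, and both read the same XML content. The function names to_html and to_plain_text are assumptions for this example.

```python
import html
import xml.etree.ElementTree as ET

# One piece of content, based on the employee record in Figure 1.
CONTENT = (
    "<employee><firstname>John</firstname><lastname>Smith</lastname>"
    "<jobtitle>CEO</jobtitle></employee>"
)


def to_html(xml_text):
    """View 1: an HTML fragment for a desktop browser."""
    e = ET.fromstring(xml_text)
    name = html.escape(f"{e.findtext('firstname')} {e.findtext('lastname')}")
    title = html.escape(e.findtext("jobtitle"))
    return f"<h1>{name}</h1><p>{title}</p>"


def to_plain_text(xml_text):
    """View 2: a single line of text for a device that does not use HTML."""
    e = ET.fromstring(xml_text)
    return f"{e.findtext('firstname')} {e.findtext('lastname')}, {e.findtext('jobtitle')}"


print(to_html(CONTENT))
print(to_plain_text(CONTENT))
```

Changing the look of either view means changing only its rendering function; the XML content is untouched.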
XML excels in making on-line information search and retrieval fast and
efficient. This is because XML documents also store meta-information (i.e.,
information about information). By looking at the tags, we can determine what
each item of data is - whether it is the author's first name, last name, address, and
so on. Search engines can use this feature to efficiently search and retrieve
documents. For example, it would be possible to process search queries like
"find all documents where author's last name is Smith", a decisive advantage
over HTML.
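The query quoted above can be sketched in a few lines of Python. The document collection, the file names, and the document/author/lastname tag names are all hypothetical; a real search engine would work against an index rather than parsing every document on each query.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection of documents keyed by file name.
DOCUMENTS = {
    "report.xml": (
        "<document><title>Annual Report</title>"
        "<author><firstname>John</firstname><lastname>Smith</lastname></author></document>"
    ),
    "memo.xml": (
        "<document><title>Smith Street Office Move</title>"
        "<author><firstname>Ann</firstname><lastname>Jones</lastname></author></document>"
    ),
}


def documents_by_author_lastname(documents, lastname):
    """Return the names of documents whose author's last name matches exactly."""
    matches = []
    for name, xml_text in documents.items():
        root = ET.fromstring(xml_text)
        # Only text inside <author><lastname> counts, not text elsewhere.
        if any(el.text == lastname for el in root.iterfind("author/lastname")):
            matches.append(name)
    return matches


print(documents_by_author_lastname(DOCUMENTS, "Smith"))
```

Note that memo.xml contains the word "Smith" in its title but is not returned. A plain text search could not make that distinction.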
One of the hottest application areas of XML is messaging. XML enables
seamless and efficient transfer of data between applications. Since it is
text-based, it can be read on virtually any platform. It can be used as the
lowest common denominator for representing information. This makes it a
well-suited medium for exchanging information between organizations or within
an organization in a platform-independent way.
XML promises to dramatically improve the way companies exchange and
present information over the Internet. XML is gaining fast acceptance in the
development of next generation Business-to-Business (B2B) e-commerce applications.
XML can benefit e-commerce by enabling back-end systems to communicate business
transaction information. For example, business partners can standardize on
a specific XML syntax that describes a purchase order and can then automate
the transfer of that information across otherwise incompatible systems. XML
is a strong choice for building these systems because data can be formatted
for exchange between business partners in an easy-to-process, platform-neutral
way. With B2B e-commerce expected to reach $1.3 trillion by the year 2003 and
a staggering $7.3 trillion by the year 2004, XML promises to be a key enabling
technology.
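The purchase order scenario can be sketched as follows, assuming two partners have agreed on a very small order format. The tag names (purchaseorder, accountnumber, item, sku, quantity) and both function names are invented for this example and do not come from any published vocabulary.

```python
import xml.etree.ElementTree as ET


def build_purchase_order(account_number, items):
    """Sender side: turn internal data into the agreed XML text."""
    order = ET.Element("purchaseorder")
    ET.SubElement(order, "accountnumber").text = account_number
    for sku, quantity in items:
        item = ET.SubElement(order, "item")
        ET.SubElement(item, "sku").text = sku
        ET.SubElement(item, "quantity").text = str(quantity)
    return ET.tostring(order, encoding="unicode")


def read_purchase_order(xml_text):
    """Receiver side: recover the data using only the agreed tag names."""
    order = ET.fromstring(xml_text)
    items = [
        (item.findtext("sku"), int(item.findtext("quantity")))
        for item in order.findall("item")
    ]
    return order.findtext("accountnumber"), items


message = build_purchase_order("A-1001", [("WIDGET-1", 10), ("GADGET-7", 2)])
print(message)
print(read_purchase_order(message))
```

The only thing the two sides share is the text of the message and the agreement on tag names; the sender and receiver could be written in different languages and run on different platforms.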
Getting started
XML is best suited for developing applications that exchange data between
systems or for building applications that offer different views of the same
data. Adopting a new technology like XML involves a considerable investment
of time, effort, money, manpower, and other resources. A four-pronged approach
can be used to ease the adoption of XML into an organization's technology
mix.
- Provide XML education:
XML and its related technologies have a steep learning curve. The first
step is to get the development team energized and quickly up to speed. XML
articles, books, tutorials, white papers,
and case studies can greatly aid in understanding the technology and how it
can be used to solve real-life problems.
- Launch an XML project:
Launch an XML pilot project with a reasonable scope. The criteria for
selecting the project should be that it adds value to the organization and
provides a solid learning experience.
- Tabulate results:
Upon project completion, tabulate the results by identifying the lessons
learned, the costs incurred, and the benefits achieved. Take note of the problems
encountered, interoperability issues with existing systems, and how the issues
were resolved.
- Formulate strategy:
The IT manager should now be in a better position to accurately determine
how the organization can better leverage the technology. The manager can then
revise application development plans and incorporate XML as appropriate.
Using XML development tools
XML development tools and application servers can significantly
aid in building XML-based systems quickly and efficiently.
- XML tools:
A host of XML tools are available, both from vendors and as
free open-source software. The most widely used XML tools are parsers, programs
that read XML markup and make its content available to applications. Other
useful tools include XML generators, XML document editors, DTD editors, and
stylesheet editors and formatters. These tools are available for a wide variety
of languages, such as Java, Perl, Tcl, and C++. One example is the popular IBM
XML for Java parser. These tools are typically quite robust and can significantly
reduce application development time, effort, and cost. A word of caution about
using free open-source tools: you are unlikely to get any service and support.
- XML application servers:
These are middleware applications that automate the exchange of XML
data. They store and retrieve data from various sources, apply the appropriate
markup tags, and distribute it to applications. Care should be taken in selecting
an XML application server because the storage methods and capabilities of
these servers vary significantly between products from different vendors.
The XML server should preferably work well with your existing database and
web application servers. Software AG and Bluestone Software have recently
introduced XML application servers into the market.
Security
XML is expected to facilitate Internet-based Business-to-Business (B2B)
messaging. But one of the biggest concerns in Internet B2B messaging
is security. The Internet is a public network, and messages can be stolen or
modified during transmission. XML by itself doesn't provide any security features.
One possible solution is to make use of cryptographic protocols such as SSL
to make the communication secure. Also, commercial products that can be used
to digitally sign, encrypt, verify and decrypt XML documents have started
to arrive in the market. One recent example is the X/Secure product from Baltimore
Technologies. W3C and IETF (Internet Engineering Task Force) are also working
on a digital signature standard for XML documents.
Another security issue arises when XML documents refer to resources
such as DTDs stored on external systems that are not adequately secured. An
attack on these systems that makes even a small change to these external
resources can wreak havoc on XML processing. The easiest solution is to copy
necessary resources to local secure systems. But this reduces flexibility,
especially if the resources are being shared. These are critical issues that
an IT manager needs to be aware of when designing XML based systems.
Limitations and drawbacks
In spite of all the hype surrounding XML, it isn't a one-stop solution
for all issues. It is hardly a good choice for building internal standalone
systems. It is also not the ideal choice when security and efficient low-level
communication are of critical importance.
XML is limited when it comes to the data types that it supports. It
is a text-based format and has no native facilities for binary data or other
complex data types such as multimedia data. It also lacks data typing. Even
though a DTD is well suited to validating the structure of a document, it
doesn't check for errors in the data contained within the document.
This may soon change with the adoption of the XML Schema standard that
the W3C is currently working on.
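The gap between structural validity and data validity can be sketched as follows. The record below has every element the DTD in Figure 2 requires, in the right order, yet its zip element holds letters. The rule that id and zip must contain digits only is a hypothetical application rule introduced for this example; a DTD cannot express it, so the check has to live in application code.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical application rule: these fields must contain digits only.
NUMERIC_FIELDS = ("id", "address/zip")

# Structurally this matches Figure 2, but the zip code is not a number.
RECORD = (
    "<employee><id>12345</id><firstname>John</firstname>"
    "<lastname>Smith</lastname><jobtitle>CEO</jobtitle>"
    "<address><street>123 Street</street><city>New York</city>"
    "<state>NY</state><zip>ABCDE</zip><country>USA</country></address>"
    "</employee>"
)


def data_errors(xml_text):
    """Return a message for each numeric field whose content is not all digits."""
    employee = ET.fromstring(xml_text)
    errors = []
    for path in NUMERIC_FIELDS:
        value = employee.findtext(path, default="")
        if not re.fullmatch(r"[0-9]+", value):
            errors.append(f"{path}: expected digits, found {value!r}")
    return errors


print(data_errors(RECORD))
```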
Another limitation is the lack of XML support on the client side. Among
the popular web browsers, only Microsoft's Internet Explorer (version 5) offers
some support for displaying XML documents. Netscape's Communicator/Navigator
offers hardly any built-in XML support at all.
One of the major hindrances in adopting XML is the lack of standard
vocabularies or tag sets. There is no clear consensus yet on how key business
terms like customer or invoice are defined within vertical or horizontal
industry segments. For example, one company may define an XML tag for purchase
order using customer name and account number, whereas another company may use
only the account number. Information can get lost or be interpreted differently
when data is transmitted between these two companies. The problem is exacerbated
when data is exchanged between companies in different industries. Standard
XML vocabularies, at least for specific industries, will ensure that systems
can exchange data in a consistent manner. In fact, this is one of the most
important issues surrounding XML today.
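The purchase order mismatch described above can be sketched as follows. Both documents and their tag names are invented for this example: company A sends a customer name and an account number, company B sends only the account number, and a receiver written against company A's vocabulary reads both.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase orders from two companies with different vocabularies.
ORDER_A = (
    "<purchaseorder><customername>Acme Corp</customername>"
    "<accountnumber>A-1001</accountnumber></purchaseorder>"
)
ORDER_B = "<purchaseorder><accountnumber>B-2002</accountnumber></purchaseorder>"


def customer_fields(xml_text):
    """Read an order the way a receiver built for company A's format would."""
    order = ET.fromstring(xml_text)
    # findtext returns None when the element is absent from the document.
    return order.findtext("customername"), order.findtext("accountnumber")


print(customer_fields(ORDER_A))
print(customer_fields(ORDER_B))
```

Both documents are well-formed XML and both parse without error, yet the receiver ends up with no customer name for company B's order. Nothing in XML itself flags the difference, which is why agreed vocabularies matter.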
A work in progress
XML is still a work in progress. New features and functionality are
being added to it, and new technologies are being developed around it.
A few of the important technologies and standards that are rapidly evolving
around XML include:
- XHTML: Extensible Hypertext Markup Language
(XHTML) is a markup language written in XML. It is the result of rewriting
HTML (version 4.0) as an XML application and it creates a middle ground between
HTML and XML. It is a technology that will help broaden the number of devices
that can access information from the Web and increase the capabilities of
those that already do so, such as cellular phones, personal digital assistants
and other miniature devices.
- XSL: Extensible Stylesheet Language (XSL)
allows applying formatting rules to XML documents. It can be used to specify
presentation format for XML documents (for example, font size). It can also
be used to transform XML documents into different formats like HTML, PDF or
even audio. For example, once an XML document is converted into HTML using
XSL, it can be viewed in any browser. In addition, XSL can transform an XML
document into another XML document.
- XML Schema: An XML Schema essentially defines
the elements that can appear within an XML document, along with their attributes.
It also defines the structure of the document - the parent and the child elements,
the number of child elements, the sequence in which the child elements can
appear, and whether an element can be empty or whether it can include text.
It can also define default values for attributes. It provides a more powerful
mechanism than DTDs for describing the structure of XML documents.
- XPointer: XML Pointer Language (XPointer)
is a language that supports addressing into the internal structure of an XML
document. Essentially, it provides a mechanism to refer to elements, character
strings, selections, and other parts of an XML document.
- XLink: XML Linking Language (XLink) specifies
constructs that may be inserted into XML documents to describe links between
objects. It can be used to describe the simple unidirectional hyperlinks of
today's HTML as well as more sophisticated bi-directional, multi-directional,
and typed links.
In addition to the core standards and technologies, the use of XML as a
tool for data exchange hinges on developing standard definitions of key business
terms. Several industry initiatives are already under way to develop XML business
vocabularies. The following are some of the ongoing initiatives in this regard.
- ebXML: The United Nations body for Trade
Facilitation and Electronic Business (UN/CEFACT) and the Organization for
the Advancement of Structured Information Standards (OASIS) have joined forces
to initiate a worldwide project to standardize XML business specifications.
They have established the Electronic Business XML Working Group (ebXML) to
develop a technical framework that will enable XML to be utilized in a consistent
manner for the exchange of all electronic business data.
- FpML: The Financial products Markup Language,
based on XML, is a new initiative enabling e-commerce activities in the field
of financial derivatives. The development of the standard, controlled by market
participant firms, will allow the electronic integration of a range of services,
from electronic trading and confirmations to portfolio specification for risk
analysis.
- HL7 initiative: Health Level 7 (HL7) is a
standards organization serving the health-care industry. It is currently
working on an architecture based on XML for exchanging data between
health-care organizations.
Most of these XML standards are in their early stages of development.
An intelligent approach is to move forward with XML projects, but at the same
time keep a careful watch on standards as they continue to evolve. It may
be necessary to support emerging vocabulary standards, but there are tools
available in the market to aid the transition.
Conclusion
XML is fast becoming the key language for an increasing number of new
applications. It is poised to fundamentally alter the way information is delivered
and used as well as enable the creation of new and powerful applications.
It has a lot going for it and it certainly looks like a key technology that
has the potential to shape the future, especially of the World Wide Web. A
good understanding of the technology along with its advantages, limitations
and future direction is the key to building applications successfully using
XML.
Acknowledgements
The authors would like to acknowledge the support they got from their
parents in writing this paper.