Building an XML application
key management issues
Jaideep Roy
Anupama Ramanujan


Abstract
XML is a hot new technology for describing data. It is also being touted as the technology that will enable the creation of next-generation Internet applications. XML holds the promise of exchanging data seamlessly and efficiently between applications running on multiple platforms. However, it is difficult for an IT manager to adopt a new technology like XML and build applications successfully using it. Developing applications using a relatively new technology like XML involves a considerable investment in time, effort, money and manpower, and an IT manager has to fully understand what the new technology is, what it offers, and its advantages, limitations and future direction before making capital expenditure. This paper aims to dispel the myths surrounding XML.

Contents
  1. What is XML?
  2. The grammar of XML
  3. Important features
  4. Key application areas
  5. Getting started
  6. Using XML development tools
  7. Security
  8. Limitations and drawbacks
  9. A work in progress
  10. Conclusion
  11. Acknowledgements
  12. Bibliography

What is XML?
XML is the Extensible Markup Language. It is a subset of the Standard Generalized Markup Language (SGML), a complex standard for describing structure and content in documents. It is a markup (tag-based) language that is designed to organize data rather than format it. It is also a meta-language - a language for describing other languages. It lets you define your own customized markup languages for different classes of documents.
XML is a project of the World Wide Web Consortium (W3C). The development of the XML specification is done under the supervision of W3C's XML Working Group. It is an open specification (non-proprietary) and the current specification (version 1.0) was accepted by the W3C as a Recommendation on Feb 10, 1998. A Recommendation by the W3C indicates that the specification is appropriate for widespread use. But XML is still evolving with the addition of new features and functionalities.
XML is more flexible than a fixed format markup language like HTML. It adds context and gives meaning to data. In XML, you can define your own custom tags that represent data logically. Figure 1 shows a sample XML document.
<?xml version="1.0"?>
<employee>
<id>12345</id>
<firstname>John</firstname>
<lastname>Smith</lastname>
<jobtitle>CEO</jobtitle>
<address>
<street>123 Street</street>
<city>New York</city>
<state>NY</state>
<zip>12345</zip>
<country>USA</country>
</address>
</employee>
Figure 1. Sample XML document
Figure 1 describes the personnel record of employee "John Smith". Note that from this document, we can ascertain key relationships about different items of data with regard to the whole "employee" entity. This is also referred to as self-describing data because the tags describe the information contained within.
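As a rough illustration of how a program can use this self-describing structure, the following sketch reads the document of Figure 1 with Python's standard xml.etree.ElementTree module. The function name describe_employee and the choice of fields are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# The document from Figure 1, embedded as a string for the example.
EMPLOYEE_XML = """<?xml version="1.0"?>
<employee>
<id>12345</id>
<firstname>John</firstname>
<lastname>Smith</lastname>
<jobtitle>CEO</jobtitle>
<address>
<street>123 Street</street>
<city>New York</city>
<state>NY</state>
<zip>12345</zip>
<country>USA</country>
</address>
</employee>"""


def describe_employee(xml_text):
    """Pick out a few fields by tag name (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    address = root.find("address")  # nested element: the address belongs to the employee
    return {
        "name": f"{root.findtext('firstname')} {root.findtext('lastname')}",
        "title": root.findtext("jobtitle"),
        "city": address.findtext("city"),
    }


record = describe_employee(EMPLOYEE_XML)
print(record)
```

The program never relies on the position of a value in the text, only on the tag that names it.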
The grammar of XML
A DTD (Document Type Definition) defines the grammar of an XML document. It describes the markup (elements) available, where they may occur, and how they all fit together. It is essentially a description of the legal structure of an XML document. Figure 2 shows a sample DTD for the XML document specified in Figure 1.
<!ELEMENT employee (id,firstname,lastname,jobtitle,address)>
<!ELEMENT id (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT jobtitle (#PCDATA)>
<!ELEMENT address (street,city,state,zip,country)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT country (#PCDATA)>
Figure 2. Sample DTD
The XML standard does not insist on using DTDs, but using a DTD means you can be certain that all documents of a particular type are constructed and named consistently. It also helps to ensure that XML documents adhere to relevant business rules if those rules are embedded in a DTD. DTDs should, however, be designed carefully. They should cover all possible document cases and allow for future enhancements. This is key to deploying effective XML applications.
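To make the idea of a content model concrete, here is a minimal sketch that checks a document against a hand-written table mirroring Figure 2. It is not a DTD validator: Python's standard library does not validate against DTDs, so the CONTENT_MODEL table and the functions conforms and is_valid_employee are hypothetical stand-ins that only handle fixed child sequences.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for Figure 2: each parent maps to its required
# children, in order. Elements not listed are treated as #PCDATA leaves.
CONTENT_MODEL = {
    "employee": ["id", "firstname", "lastname", "jobtitle", "address"],
    "address": ["street", "city", "state", "zip", "country"],
}


def conforms(element):
    """Return True if the element and its descendants match CONTENT_MODEL."""
    children = [child.tag for child in element]
    expected = CONTENT_MODEL.get(element.tag)
    if expected is None:
        return not children  # text-only element: no child elements allowed
    if children != expected:
        return False
    return all(conforms(child) for child in element)


def is_valid_employee(xml_text):
    root = ET.fromstring(xml_text)
    return root.tag == "employee" and conforms(root)


VALID = (
    "<employee><id>12345</id><firstname>John</firstname>"
    "<lastname>Smith</lastname><jobtitle>CEO</jobtitle>"
    "<address><street>123 Street</street><city>New York</city>"
    "<state>NY</state><zip>12345</zip><country>USA</country></address>"
    "</employee>"
)
MISSING_LASTNAME = VALID.replace("<lastname>Smith</lastname>", "")

print(is_valid_employee(VALID))             # True
print(is_valid_employee(MISSING_LASTNAME))  # False
```

A validating parser performs this kind of check from the DTD itself, so the rules live in one shared place rather than in each application.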
Important features
One of the important features that make XML extremely powerful and useful is that it is a simple language. The rules for creating a markup language in XML for encapsulating data are quite simple. For example, XML documents are composed of simple tags marked by angle brackets, with data stored between them as plain text. The tags almost always come in pairs and they can be nested to multiple levels, as shown in Figure 1. Similarly, XML data is just ordinary text that isn't tied to any particular programming language or platform. Standard text editors can be used to create and edit XML documents, and the XML markup usually makes sense to us humans as well.
XML supports the Unicode standard, a character-encoding system that supports all of the world's major languages. So virtually all the characters that are used in the world are legal characters in XML. With software that processes XML properly, this can be a huge benefit in developing web applications that span national and cultural boundaries.
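A small sketch of what this means in practice, assuming a UTF-8 encoded document and Python's standard xml.etree.ElementTree module. The element names and sample values are invented for the example.

```python
import xml.etree.ElementTree as ET

# A record containing accented Latin and Japanese characters (sample data).
doc = "<employee><firstname>José</firstname><city>東京</city></employee>"

# Encode to bytes as it would travel over the network, then parse it back.
data = doc.encode("utf-8")
root = ET.fromstring(data)

# Serialize again; the output carries an encoding declaration.
serialized = ET.tostring(root, encoding="utf-8")
round_tripped = ET.fromstring(serialized)

print(round_tripped.findtext("firstname"), round_tripped.findtext("city"))
```

The characters survive the round trip unchanged, provided both sides agree on the encoding.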
XML documents essentially have a rooted tree structure and for many applications, a tree data structure is powerful enough to represent complex data. It is also easy to write software programs that manipulate tree-structured data and this again goes to the heart of XML - its simplicity.
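The following sketch shows how little code is needed to walk such a tree. It uses Python's standard xml.etree.ElementTree module; the function leaf_paths is a hypothetical helper that lists every leaf value together with its path from the root.

```python
import xml.etree.ElementTree as ET


def leaf_paths(element, prefix=""):
    """Recursively collect (path, text) pairs for elements without children."""
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        return [(path, (element.text or "").strip())]
    result = []
    for child in children:
        result.extend(leaf_paths(child, path))
    return result


SAMPLE = (
    "<employee><id>12345</id>"
    "<address><city>New York</city><state>NY</state></address>"
    "</employee>"
)

for path, value in leaf_paths(ET.fromstring(SAMPLE)):
    print(path, "=", value)
```

The same few lines work for any well-formed document, whatever its tag names or nesting depth.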
Key application areas
XML is not intended to replace HTML; instead it provides more flexible document definition and processing capabilities. XML allows us to reformat data to be displayed in multiple devices and platforms. Since XML separates display instructions from content definition, a web site designer can alter the look and feel of the site simply by applying a different style sheet to the same XML document. More importantly, this allows using the same content for other systems or devices such as PDAs (personal digital assistants) and wireless devices that do not use HTML for their display processing.
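The separation of content from presentation can be sketched as follows. In practice a style sheet would do this work; here two small Python functions, as_html and as_text (both hypothetical), stand in for two style sheets applied to the same content.

```python
import xml.etree.ElementTree as ET
from html import escape

EMPLOYEE = (
    "<employee><firstname>John</firstname><lastname>Smith</lastname>"
    "<jobtitle>CEO</jobtitle></employee>"
)


def full_name(root):
    return f"{root.findtext('firstname')} {root.findtext('lastname')}"


def as_html(root):
    """View for a desktop web browser."""
    return f"<h1>{escape(full_name(root))}</h1><p>{escape(root.findtext('jobtitle'))}</p>"


def as_text(root):
    """Compact view for a small-screen device."""
    return f"{full_name(root)} ({root.findtext('jobtitle')})"


root = ET.fromstring(EMPLOYEE)
print(as_html(root))
print(as_text(root))
```

Changing the look of either view means changing one function, while the XML content stays untouched.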
XML excels at making online information search and retrieval fast and efficient. This is because XML documents also store meta-information (i.e., information about information). By looking at the tags, we can determine what each data item is: whether it is the author's first name, last name, address and so on. Search engines can use this feature to search and retrieve documents efficiently. For example, it would be possible to process search queries like "find all documents where author's last name is Smith", a decisive advantage over HTML.
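A minimal sketch of such a query, using Python's standard xml.etree.ElementTree module. The library, document, title and author tags are invented for this example, as is the function titles_by_author_lastname.

```python
import xml.etree.ElementTree as ET

LIBRARY_XML = """<library>
<document>
<title>XML Basics</title>
<author><firstname>John</firstname><lastname>Smith</lastname></author>
</document>
<document>
<title>Smith on Markup</title>
<author><firstname>Jane</firstname><lastname>Doe</lastname></author>
</document>
</library>"""


def titles_by_author_lastname(xml_text, lastname):
    """Return titles of documents whose author's last name matches exactly."""
    root = ET.fromstring(xml_text)
    return [
        doc.findtext("title")
        for doc in root.iter("document")
        if doc.findtext("author/lastname") == lastname
    ]


print(titles_by_author_lastname(LIBRARY_XML, "Smith"))
```

A plain text search for "Smith" would also return the second document because of its title; the tag-aware query returns only the first.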
One of the hottest application areas of XML is messaging. XML enables seamless and efficient transfer of data between applications. Since it is text-based, it is easily understood by all platforms. It can be used as the least common denominator for representing information. This makes it the perfect medium for exchanging information between organizations or within an organization in a platform independent way.
XML promises to dramatically improve the way companies exchange and present information over the Internet. XML is rapidly gaining acceptance in the development of next-generation Business-to-Business (B2B) e-commerce applications. XML can benefit e-commerce by enabling back-end systems to communicate business transaction information. For example, business partners can standardize on a specific XML syntax that describes a purchase order and can then automate the transfer of that information across otherwise incompatible systems. XML is well suited to building these systems because data can be formatted for exchange between business partners in an easy-to-process, platform-neutral way. With B2B e-commerce expected to reach $1.3 trillion by year 2003 and a staggering $7.3 trillion by year 2004, XML promises to be a key enabling technology.
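The purchase order exchange described above might look like the following sketch. The purchaseorder, account, item, sku and quantity tags are an assumed vocabulary agreed between two hypothetical partners, not an existing standard; the code uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET


def build_purchase_order(account, items):
    """Sender side: turn internal data into the agreed XML syntax."""
    order = ET.Element("purchaseorder")
    ET.SubElement(order, "account").text = account
    for sku, quantity in items:
        item = ET.SubElement(order, "item")
        ET.SubElement(item, "sku").text = sku
        ET.SubElement(item, "quantity").text = str(quantity)
    return ET.tostring(order, encoding="unicode")


def read_purchase_order(xml_text):
    """Receiver side: recover the data, whatever system produced the text."""
    order = ET.fromstring(xml_text)
    items = [
        (item.findtext("sku"), int(item.findtext("quantity")))
        for item in order.findall("item")
    ]
    return order.findtext("account"), items


message = build_purchase_order("A-1001", [("WIDGET", 3), ("GADGET", 12)])
print(message)
print(read_purchase_order(message))
```

Only the text of the message crosses the boundary between the partners, so neither side needs to know how the other stores its orders.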
Getting started
XML is best suited for developing applications that exchange data between systems or for building applications that offer different views of the same data. Adopting a new technology like XML involves a considerable investment in time, effort, money, manpower and other resources. A four-pronged approach can be used to ease the adoption of XML into an organization's technology mix.
Using XML development tools
Use of XML development tools and application servers can significantly aid in building XML based systems quickly and efficiently.
Security
XML is expected to facilitate Internet-based Business-to-Business (B2B) messaging. But one of the biggest concerns in Internet B2B messaging is security. The Internet is a public network and messages can be stolen or modified during transmission. XML by itself doesn't provide any security features. One possible solution is to use cryptographic protocols such as SSL to secure the communication. Also, commercial products that can be used to digitally sign, encrypt, verify and decrypt XML documents have started to arrive in the market. One recent example is the X/Secure product from Baltimore Technologies. W3C and IETF (Internet Engineering Task Force) are also working on a digital signature standard for XML documents.
Another security issue arises when XML documents refer to resources such as DTDs stored on external systems that are not adequately secured. A hacker attack on these systems that makes a small change to these external resources can cause havoc to XML processing. The easiest solution is to copy necessary resources to local secure systems. But this reduces flexibility, especially if the resources are being shared. These are critical issues that an IT manager needs to be aware of when designing XML based systems.
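One way to enforce the "local copies only" policy is to inspect a document's DOCTYPE declaration before processing it. The sketch below uses the expat bindings in Python's standard library, which report the declaration without fetching the external DTD by default. The TRUSTED_DTDS set and both function names are assumptions for this example.

```python
import xml.parsers.expat

# Hypothetical list of DTDs that have been copied to a local, secured system.
TRUSTED_DTDS = {"employee.dtd"}


def doctype_system_id(xml_text):
    """Return the system identifier of the DOCTYPE, or None if there is none."""
    found = []
    parser = xml.parsers.expat.ParserCreate()

    def on_doctype(name, system_id, public_id, has_internal_subset):
        found.append(system_id)

    parser.StartDoctypeDeclHandler = on_doctype
    parser.Parse(xml_text, True)
    return found[0] if found else None


def uses_trusted_dtd(xml_text):
    system_id = doctype_system_id(xml_text)
    return system_id is None or system_id in TRUSTED_DTDS


REMOTE = (
    '<?xml version="1.0"?>'
    '<!DOCTYPE employee SYSTEM "http://partner.example.com/employee.dtd">'
    "<employee/>"
)
LOCAL = '<?xml version="1.0"?><!DOCTYPE employee SYSTEM "employee.dtd"><employee/>'

print(uses_trusted_dtd(REMOTE))
print(uses_trusted_dtd(LOCAL))
```

A check like this trades flexibility for safety, which is the same trade-off noted above for shared resources.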
Limitations and drawbacks
In spite of all the hype surrounding XML, it isn't a one-stop solution for all issues. It is hardly a good choice for building internal standalone systems. It is also not the ideal choice when security and efficient low-level communication are of critical importance.
XML is limited when it comes to the data types that it supports. It is a text-based format and it doesn't have facilities for supporting binary data or other complex data types such as multimedia data. It also lacks data typing. Even though XML is excellent at validating the structure of a document using DTDs, it doesn't check for errors in the data contained within a document. But this may soon change with the adoption of the XML Schema standard that the W3C is currently working on.
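Until such typing is available, the application has to check values itself. The sketch below assumes, purely for illustration, that the id and zip fields of the employee record must consist of digits; the DTD of Figure 2 declares both as #PCDATA and would accept any text. The function type_errors is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def type_errors(xml_text):
    """Report fields whose text is not numeric (an application-level rule)."""
    root = ET.fromstring(xml_text)
    errors = []
    for path in ("id", "address/zip"):
        value = root.findtext(path, default="")
        if not value.isdigit():
            errors.append(f"{path}: expected digits, got {value!r}")
    return errors


# Structurally fine with respect to its tags, but the zip value is not a number.
BAD_ZIP = (
    "<employee><id>12345</id>"
    "<address><zip>ABCDE</zip></address>"
    "</employee>"
)

print(type_errors(BAD_ZIP))
```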
Another limitation is the lack of XML support on the client side. Among the popular web browsers, only Microsoft's Internet Explorer (version 5) offers some support for displaying XML documents. Netscape's Communicator/Navigator offers hardly any built-in XML support.
One of the major hindrances in adopting XML is the lack of standard vocabularies or tag sets. There is no clear consensus yet on how key business terms like customer or invoice are defined within vertical or horizontal industry segments. For example, one company may define an XML tag for a purchase order using customer name and account number, whereas another company may just use the account number. Information can get lost or be interpreted differently when data is transmitted between these two companies. The problem is exacerbated when data is exchanged between companies in different industries. Standard XML vocabularies, at least for specific industries, will ensure that systems can exchange data in a consistent manner. In fact, this is one of the most important issues surrounding XML today.
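The purchase order example can be sketched as a translation between two private vocabularies. All tag names here, the A_TO_B mapping and the function translate_a_to_b are hypothetical; the point is only to show where information gets lost.

```python
import xml.etree.ElementTree as ET

# Company A sends customer name and account number; company B's format
# (assumed here) has room for the account number only.
A_TO_B = {"accountnumber": "account"}


def translate_a_to_b(xml_text):
    """Convert A's purchase order to B's format and report dropped fields."""
    source = ET.fromstring(xml_text)
    target = ET.Element("order")
    dropped = []
    for child in source:
        new_tag = A_TO_B.get(child.tag)
        if new_tag is None:
            dropped.append(child.tag)  # no counterpart in B's vocabulary
            continue
        ET.SubElement(target, new_tag).text = child.text
    return ET.tostring(target, encoding="unicode"), dropped


ORDER_FROM_A = (
    "<purchaseorder><customername>Acme</customername>"
    "<accountnumber>42</accountnumber></purchaseorder>"
)

translated, dropped = translate_a_to_b(ORDER_FROM_A)
print(translated)
print(dropped)
```

Every pair of trading partners would need its own mapping of this kind, which is why shared industry vocabularies matter.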
A work in progress
XML is still a work in progress. New features and functionality are being added to it, and new technologies are being developed around it. A few of the important technologies and standards that are rapidly evolving around XML include:
In addition to the core standards and technologies, the use of XML as a tool for data exchange hinges on developing standard definitions of key business terms. Several industry initiatives are already under way to develop XML business vocabularies. The following lists some of the ongoing initiatives in this regard.
Most of these XML standards are in their early stages of development. An intelligent approach is to move forward with XML projects, but at the same time keep a careful watch on standards as they continue to evolve. It may be necessary to support emerging vocabulary standards, but there are tools available in the market to aid the transition.
Conclusion
XML is fast becoming the key language for an increasing number of new applications. It is poised to fundamentally alter the way information is delivered and used as well as enable the creation of new and powerful applications. It has a lot going for it and it certainly looks like a key technology that has the potential to shape the future, especially of the World Wide Web. A good understanding of the technology along with its advantages, limitations and future direction is the key to building applications successfully using XML.
Acknowledgements
The authors would like to acknowledge the support they got from their parents in writing this paper.
Bibliography
[XMLS] Information about XML standards - http://www.w3.org/XML
[XMLC] Comprehensive information about XML and related technologies
[XMLB1] E. R. Harold, XML Bible, IDG Books Worldwide, 1999.
[XMLB2] H. Maruyama et al., XML and Java: Developing Web Applications, Addison Wesley, 1999.
[XMLB3] J. Bosak and T. Bray, "XML and the Second-Generation Web", Scientific American, May 1999.
[XMLB4] D. Megginson, Structuring XML Documents, Prentice Hall, 1998.
[XMLB5] N. Bradley, The XML Companion, Prentice Hall, 1998.
[XMLB6] F. Boumphrey et al., Professional XML Applications, Wrox Press Inc., 1998.