XML's Promise: Delivering Customized Information Everywhere
ABSTRACT
The promise of the Web is to make information available to everyone. The challenge companies face is how best to use the Web to deliver customized information to customers at a time and place of their choice, in the format they prefer. The solution to that challenge is XML. However, there are many paths a company can choose to implement XML. This session will explore best- and worst-case practices for implementing an XML solution that stays ahead of fast-paced Internet technology. When it is done correctly, you'll have satisfied customers, repeat business and higher revenues.
Most enterprises today find themselves grappling with the fundamental problem of delivering information to multiple types of media – Web, print, CD-ROM, wireless devices. At one extreme, however, some leading organizations not only successfully accommodate all media types, but also fully exploit the capabilities of each medium while delivering tailored information to each audience.
The benefits include not only achieving competitive advantage through superior sharing of business-critical information, but also significantly reduced costs in creating and distributing this information. In these times, the potential for increased revenues and reduced costs should prove nearly irresistible for any company looking for an edge.
This paper explores the opportunities and challenges of delivering meaningful, relevant information to all media that information consumers require: Web, print, CD-ROM and wireless devices. In addition, this paper proposes the key elements of a solution architecture that enables such a solution.
1. The "Content Gap"
To understand the future, it's important first to understand the past. This section summarizes several critical business problems, taken from real-world experience, and discusses the common problems at work.
-
A multi-divisional company buys from its competitors instead of from itself because its own divisions cannot provide, even internally, the critical product information required by the ultimate customer. If its most highly motivated buyers turn elsewhere, how often do ordinary customers do the same?
-
Each of four divisions in a major corporation looks like a different company – and the CEO says that must stop. How can entities that barely interact with each other act as one to their customers to enable up-selling and cross-selling opportunities?
-
A company selling commoditized and mostly undifferentiated products took over market leadership by focusing on service. They succeeded in providing a vastly superior product selection and self-service experience to their customers, not only on their Web site but also through every medium with which they interact with their customers (print and CD-ROM). And while their differentiation was sparse, they also succeeded in communicating their differentiation, improving the perception of extra value delivered.
-
With product information highly fragmented, geographically dispersed and manually updated, a company's Australian division failed to update its product catalog to remove an obsolete product and replace it with an improved version that sells for a higher price. The error was caught by order fulfillment, so the company had to sell new product at the old price to the entire pipeline of existing orders.
-
According to Roper Starch Worldwide, "On average, Web-rage is uncaged after twelve minutes of fruitless searching, although about seven percent say ire starts rising within three minutes. The main culprit: All that information--overwhelming at times--which is actually driving some people offline and back to telephoning customer service or other information resources from the pre-cyber generation."
What's going on here? The common problem across all these examples relates to each organization's inability to share information that is accurate, consistent, complete, fresh, and easy to use. We refer to that inability as a "Content Gap" – it's a "gap" because the information typically exists somewhere within the organization; the challenge is in sharing it with others.
The Content Gap exists because in most organizations, the processes to create, assemble and deliver information cannot meet today's needs and expectations. Ten years ago, the typical organization's information process supported print output reasonably well. As additional media became mainstream – CD-ROM initially, followed by the vastly greater requirement for Web content – these organizations added on to their existing processes.
Growing processes to support additional types of media led to both information and process fragmentation, where the content for each medium exists in a separate format that must be separately maintained. Everyone responsible for making content available on multiple types of media is familiar with this problem.
Maintenance and updates are especially difficult because changes to the information are costly, slow and manual. And what's the business impact of failing to make a change?
-
The company could list an obsolete product and continue to take orders for it
-
The company could fail to update a warning, resulting in personal injury or business loss
-
The company could ultimately become regarded as a less reliable source of information
2. Current Challenges
Three key aspects frame the fundamental issues at stake:
-
Tools — In most cases, information creation tools are considered sacrosanct: users have their favorites and they're unwilling to change. As a result, downstream processes must accommodate the limitations of current tools and carry the entire burden. In fact, different tools are sold for different problems (e.g., Microsoft sells Word, Publisher and FrontPage), and while each of these is a wonderful tool for a specific application, none of them is tuned for filling in the Content Gap.
-
Processes — Achieving a different outcome requires a different process. Force fitting existing tools and practices into a new process doesn't work – to gain an outcome suited to your business, you must build a process that's tuned to your business. By definition, out-of-the-box tools that support an out-of-the-box process cannot meet a business's unique needs.
-
People — desktop tools give their users the ability not only to create information, but also to control how that information looks. And people love to polish the look of what they produce, despite the fact that the value added in such polishing is extremely low relative to the value of the information they produce. Continuing to use existing desktop tools can work, but only if people are willing to change how they use those tools. They must follow a different process to achieve a different outcome, but that requires breaking habits that they really enjoy.
3. Solutions
Over the last few years, content management systems have gained enormous attention, primarily because of their capability to deal with two key problems:
-
Information fragmentation — content management systems provide a mechanism for centralizing control over information that is otherwise spread out across each individual user's desktop
-
Workflow — because even a single change ripples through many different people – for example, one change may affect many different documents on multiple types of media – content management systems provide sophisticated workflow controls to coordinate all of the people involved in the process
Content management systems do not deal with overlapping information formats (e.g., Web, print, CD-ROM), nor do they fully automate the process of delivering information to all of those formats. Further improvements can be gained through what we call a "Single Source Publishing Architecture (sm)," (SSPA) which eliminates overlapping data formats and fully automates the publishing process.
The key elements of the SSPA include:
-
Common approach to publishing on multiple media types – while we once thought that this challenge would be largely addressed through tools built on a single stylesheet standard, development of such a standard has proven elusive. The World Wide Web Consortium (W3C) has come close with XSLT and XSL-FO, similar languages with important differences, so tools are emerging that support both through a single interface.
-
Fully automated publishing to multiple media – with true "lights out" publishing that produces high-quality results that fully exploit the capabilities of each medium, new and changed content can be deployed instantly, and at the lowest possible cost.
-
Conversion of existing content formats – even though the lowest-cost process is to create information directly in XML in the first place, it's important to ease adoption by allowing existing formats to co-exist. By supporting the conversion of this information into reusable modules of XML, an SSPA maximizes the use of existing information.
-
Content processing – automation of the publishing process often involves the creation of additional processing functions in support of each organization's unique business process. Any system that offers a single-source approach should offer a standards-based API (e.g., the W3C's DOM) that's accessible from a variety of standard languages such as C, C++, Java, Visual Basic, and others.
-
Data integration – because XML can not only express narrative information but also tabular data, an SSPA must provide hooks into relational databases so that prices, part numbers, customer information and other data may be merged into a document in order to meet specific, customized requirements.
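The "Content processing" and "Data integration" elements above can be illustrated with a minimal sketch. It assumes Python's standard xml.dom.minidom module as the DOM implementation and an in-memory SQLite database as a stand-in for the relational product database. The element names, the part attribute, the table layout and the sample prices are all hypothetical and chosen only for illustration.

```python
import sqlite3
from xml.dom import minidom

# Hypothetical source document: <price> is left empty so that the
# current value can be merged in from the database at publishing time.
SOURCE = """<catalog>
  <product part="A-100"><name>Widget</name><price/></product>
  <product part="B-200"><name>Gadget</name><price/></product>
</catalog>"""


def build_price_db():
    """Create an in-memory stand-in for the relational product database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (part TEXT PRIMARY KEY, price TEXT)")
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?)",
        [("A-100", "4.00"), ("B-200", "12.50")],
    )
    return conn


def merge_prices(xml_text, conn):
    """Fill each <price> element with the current value for its part number."""
    doc = minidom.parseString(xml_text)
    for product in doc.getElementsByTagName("product"):
        part = product.getAttribute("part")
        row = conn.execute(
            "SELECT price FROM prices WHERE part = ?", (part,)
        ).fetchone()
        if row is None:
            continue  # leave the element empty when the database has no price
        price = product.getElementsByTagName("price")[0]
        price.appendChild(doc.createTextNode(row[0]))
    return doc


if __name__ == "__main__":
    merged = merge_prices(SOURCE, build_price_db())
    print(merged.documentElement.toxml())
```

Because the price lives only in the database, a price change requires no edit to the document itself; the next publishing run picks up the new value.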
4. Role of XML
By deploying an SSPA, organizations can use XML to go beyond the capabilities of content management systems and create a "single source of truth" – i.e., the elimination of all information redundancy in a standards-based, media-neutral form that enables all versions and forms of information to change with a single click.
The following paragraphs describe the key benefits that XML provides in support of delivering customized content:
-
Precise personalization — you could add attributes to every tag that allow you to specify the target audience for the content that the tags enclose. For example, consider the following:
<para security="employee">Price is $4.00.</para>
For the example above, you could set up an application so that the contents of the <para> tag appear only if the user is an employee.
-
Separation of content, format and code – One of XML's most important capabilities is its separation of content from formatting and code. This means that XML separates content from embedded instructions (i.e., metadata) that describe how to display or print that content. This also means that XML separates content from scripting or programming code that can add other behaviors to that content.
To appreciate the power inherent in this separation, it's helpful to understand traditional content file formats. Word processing and desktop publishing file formats embed within the content a description of how that content should appear. For example, a word processing program may embed with this paragraph formatting instructions such as "11pt. Times Roman," which means that this paragraph should be displayed with a font size of 11 points in the Times Roman font.
The problem with embedding formatting within the content is that in order to change the formatting, you have to change the content. This inhibits the use of the same content within different documents and on different media. Because XML keeps the formatting information separate from the content, the content itself is independent of any particular medium. (See the next bullet item, "Media independence," for more information about that feature of XML.)
On the Web, many pages contain embedded scripting within the content. This provides immense additional power because you can write scripts to make your content very dynamic and easy to use. The problem with this approach is that you have to let both your programmers and your authors have access to the same files, and either group can accidentally "break" the other's work.
By allowing you to keep content and code separate, XML allows you to restrict access to each type of file and allows you to make changes to one file without breaking the other.
-
Media independence – XML's media-neutral format allows the capture of content in a form that's independent of any particular medium. That means that instead of embedding formatting information within the content, it's applied separately through a "stylesheet" that can vary based on the target medium. By using a different stylesheet for each medium, you can generate many different forms from the same source of content – which enables information to remain consistent regardless of media type: Web, print, CD-ROM, wireless. Further, you can adjust the output to take advantage of the capabilities of each medium.
For instance, print output should make use of multiple columns, page headers and footers, chapter-level tables of contents, indices, and other features particularly suited to print. And Web output should make use of moderate page sizes, intra- and inter-document linking, automatic stripping of irrelevant information, and linking actions to information (for example, click on a spare part to order it).
Further, specific elements should behave differently depending on the medium. For example, you can set up a stylesheet for printing content so that footnotes show up at the end of each chapter. For the same content displayed on the Web, you can have footnotes displayed in a different color or show footnotes in pop-up windows that appear when the mouse hovers over a footnote mark.
-
Processability — Another vital capability of XML is its support for automatic processing of content. Because XML provides an absolutely consistent data format, it enables you to write computer programs so that you can employ automation for faster results and greater functionality.
-
Easy reuse – XML supports the creation of small reusable components of content that can be created once and appear in many places within a document and in many different documents. This eases maintenance, so that content can be changed once and automatically updated wherever it's used.
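As a rough illustration of the "Precise personalization" and "Media independence" benefits above, the following sketch filters a single XML source by the security attribute and then renders the result for two media. It is a sketch under stated assumptions, not a definitive implementation: the two Python functions render_web and render_print are hypothetical stand-ins for per-medium stylesheets (a production system would more likely use XSLT), and the doc and title element names are invented for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical single source; the restricted paragraph follows the
# security="employee" example given earlier in this section.
SOURCE = """<doc>
  <title>Widget</title>
  <para>Ships in two days.</para>
  <para security="employee">Price is $4.00.</para>
</doc>"""


def select_for_audience(xml_text, roles):
    """Parse the source and drop elements the audience may not see.

    An element with no security attribute is visible to everyone.
    """
    root = ET.fromstring(xml_text)
    for parent in list(root.iter()):
        for child in list(parent):
            required = child.get("security")
            if required is not None and required not in roles:
                parent.remove(child)
    return root


def render_web(root):
    """Stand-in for a Web stylesheet: emit a small HTML fragment."""
    parts = ["<h1>%s</h1>" % escape(root.findtext("title", default=""))]
    parts += ["<p>%s</p>" % escape(p.text or "") for p in root.iter("para")]
    return "\n".join(parts)


def render_print(root):
    """Stand-in for a print stylesheet: plain text with an underlined title."""
    title = root.findtext("title", default="")
    lines = [title.upper(), "=" * len(title)]
    lines += [p.text or "" for p in root.iter("para")]
    return "\n".join(lines)


if __name__ == "__main__":
    public = select_for_audience(SOURCE, roles=set())
    staff = select_for_audience(SOURCE, roles={"employee"})
    print(render_web(public))   # omits the employee-only paragraph
    print(render_print(staff))  # includes it
```

The source is written once; the audience and the medium are both decided at publishing time, so neither choice requires touching the content.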
5. Five A's of Customized Content
This section lists the five key ingredients of "customized content" – content that truly meets the needs of each individual information consumer.
-
Aware of the needs of each user (personalization) – the best Web sites alter the content each user sees to meet the needs of that user. There are three primary types of personalization available, listed in order of prevalence:
-
Manual Personalization – each user controls which information is displayed on their home page, a capability common to portal sites such as "My Yahoo."
-
Relevance Personalization – based on evidence of the user's interests, the Web site automatically delivers content. For example, if the user has purchased shoes, a Web site could suggest socks.
-
Precision Personalization – based on the user's characteristics, the Web site delivers only content that's relevant and omits all irrelevant content. At its most extreme, this means that content could vary all the way down to a word in a sentence or a cell in a table.
Because XML supports very fine granularity of information, it's uniquely well suited to supporting the "Precision Personalization" described above.
In every case, the primary value of personalization is that it gives the user more control, either explicitly or implicitly, over the content that he or she sees.
-
Accessible: more easily searched — customized content helps users more easily find relevant information. The key difference is context: where typical Web searches look across everything, a contextual search limits the search to content that meets a profile. For example, if the user seeks repair information, a contextual search would allow the user to search for information only within those documents specifically related to repair.
XML's support for creating information hierarchies provides the key for improved information searching.
-
Adaptable to the medium: Web, print, CD-ROM, wireless — in nearly every organization, authors create content for a specific medium and a separate group massages that content for each additional medium.
In most cases, authors use word processing or desktop publishing tools to create content for print and a separate group converts that print content for the Web. In the process of conversion, the Web group breaks up large documents into Web-sized chunks for personalized assembly, adds navigation aids, adjusts formatting, generates additional links, and coordinates the deployment of related content additions and updates. In the process, the Web team creates a separate content repository with no automated connection to the original.
This effort pales in comparison to the cost of keeping the content fresh and correct. As authors and webmasters copy existing content from one document and paste it into new documents, the same content proliferates until there are dozens or even hundreds of copies of the same content to maintain. When it needs changing, the cost of tracking down and changing every instance dwarfs the original cost of creating and delivering it, and the risk of out-of-sync changes rises as well.
In contrast, customized content adapts automatically to the medium. From a single source, customized content can be transformed automatically not only to be compatible with each medium, but also to take advantage of each medium's capabilities and strengths.
Customized content's advantage of automatic transformation not only reduces the time and cost of deploying content across multiple media, it also reduces the costs of revisions by enabling a single change to propagate automatically to every document and medium where it appears.
-
Actionable (interactive) – any content that invites an action should let the user initiate that action. The most common example of actionable content is an online catalog where you can click on a product to order it. Additional examples of actionable content include:
-
While stepping through a diagnostic procedure, the user can not only read each step but also enter the result of each step to guide the next step.
-
When the user determines the need for a replacement part, the user can click on the part to learn price and availability.
-
Aggregated from multiple sources — one of the key benefits of customized content is its currency: by pulling together content from various sources at the moment the user requests it, the user receives the most current and accurate information.
A typical instance of aggregated content is a document that contains information from database fields. For example, a product description can include current price and availability drawn from a database as well as lists of features, benefits and specifications.
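The contextual search described under "Accessible" above can also be sketched briefly. The example assumes that each document records its type in a category attribute; that attribute, the library and document element names, and the sample content are hypothetical and serve only to show how a search can be limited to documents that match a profile.

```python
import xml.etree.ElementTree as ET

# Hypothetical library: each document declares its type in "category".
LIBRARY = """<library>
  <document category="repair"><title>Replacing the pump</title>
    <para>Disconnect power, then remove the pump housing.</para></document>
  <document category="marketing"><title>Why our pump is best</title>
    <para>Our pump outlasts the competition.</para></document>
  <document category="repair"><title>Cleaning the filter</title>
    <para>Rinse the filter under running water.</para></document>
</library>"""


def contextual_search(xml_text, term, category):
    """Return titles of documents in `category` whose text contains `term`."""
    root = ET.fromstring(xml_text)
    hits = []
    for doc in root.findall("document"):
        if doc.get("category") != category:
            continue  # outside the user's context, so never searched
        text = " ".join(doc.itertext()).lower()
        if term.lower() in text:
            hits.append(doc.findtext("title"))
    return hits


if __name__ == "__main__":
    # A user seeking repair information sees only repair documents,
    # even though the marketing document also mentions the pump.
    print(contextual_search(LIBRARY, "pump", "repair"))
```

A search across everything would return both pump documents; restricting the search to the repair context returns only the one the user can act on.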
6. What's Next
Over the last decade, enterprises have squeezed enormous amounts of waste out of many of their business processes. One of the greatest remaining opportunities for improvement lies in the process of creating, managing and delivering information. This process has been left to last because it may be the most difficult: it involves the greatest amount of change throughout the organization, with the most far-reaching implications and effects. It is also potentially the most rewarding.


