XML and XSL from servers to cell-phones
A new Internet content model
XML and XSL provide a powerful mechanism for separating content from
presentation. Content can be generated, assembled, and personalized from a
variety of sources and media. With XSL stylesheets matched to the end user's
environment, that content can then be formatted and rendered for the delivery
platform, program, and connection.
With next-generation client technologies, such as browsers that contain their
own XSLT processors, rendering can also be deferred to the individual devices,
freeing servers to concentrate on generating highly targeted and personalized
content.
This paper presents an application architecture that can be used to
implement a "generate once, display anywhere" scheme for Web-based content
delivery.
The underlying technology is used by EarthLink Networks Inc., the largest
independent Internet Service Provider in the U.S., to reach more than 3 million
users on devices ranging from desktop browsers to Internet-enabled cell phones.
Background
Today, most content on the web is coded in HTML, a markup language that mixes
presentation tags (e.g. <FONT>, <B>) with the content itself. Mixing display
code with content makes it difficult to show the material on browsers that do
not support the complete HTML standard.
XML allows content to be tagged based on the type of content itself,
for example:
<USER>
  <NAME>
    <FIRST>John</FIRST>
    <LAST>Doe</LAST>
  </NAME>
</USER>
The new XSL Transformations (XSLT) specification, released by the W3C in 1999,
provides a language for transforming XML data into HTML or other XML
vocabularies. By choosing the appropriate XSLT stylesheet to transform the XML,
the content can be "rendered" into the appropriate display markup. For example,
one stylesheet can translate the content into HDML or WML (markup languages for
cell-phones), while another can generate DHTML with animation and links to
streaming video.
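The sketch below shows the stylesheet switch in its simplest form. It uses the XSLT 1.0 processor bundled with the Java platform (the standard javax.xml.transform API) to render the USER example twice. The two stylesheets are hypothetical and deliberately tiny, and a real WML deck would also carry the WML document type declaration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class RenderDemo {
    // The content from the example above, with no presentation markup.
    static final String USER_XML =
        "<USER><NAME><FIRST>John</FIRST><LAST>Doe</LAST></NAME></USER>";

    // Hypothetical desktop stylesheet: the name becomes bold HTML.
    static final String HTML_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='html'/>"
        + "<xsl:template match='/USER'>"
        + "<html><body><b><xsl:value-of select=\"concat(NAME/FIRST, ' ', NAME/LAST)\"/></b></body></html>"
        + "</xsl:template></xsl:stylesheet>";

    // Hypothetical phone stylesheet: the same content as a simplified WML card.
    static final String WML_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/USER'>"
        + "<wml><card id='user'><p><xsl:value-of select=\"concat(NAME/FIRST, ' ', NAME/LAST)\"/></p></card></wml>"
        + "</xsl:template></xsl:stylesheet>";

    /** Applies one XSLT stylesheet to one XML document and returns the result as text. */
    static String render(String xml, String xsl) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Same content, two delivery platforms: only the stylesheet changes.
        System.out.println(render(USER_XML, HTML_XSL));
        System.out.println(render(USER_XML, WML_XSL));
    }
}
```

The stylesheets use concat() for the space between the names because whitespace-only text between two xsl:value-of elements is stripped from an XSLT stylesheet.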
A web server can defer the rendering decision until the last moment, so the
same content can be experienced regardless of the mode of browsing. The server
chooses the stylesheet that best matches a user's immediate needs and renders
the content with it.
Although this approach is very flexible, it puts a large processing burden on
the server. In this model, all web pages are processed dynamically (assembled,
rendered, or both), so performance is key to the user experience. Advanced XSLT
tools and intelligent caching techniques can keep the processing time to a
minimum. Further gains come from browsers that can perform the XSLT
transformation themselves.
Web application architectures
Until now, web-based applications either sent static HTML files directly to the
browser (Figure 1) or sent HTML generated dynamically by an application server
(Figure 2).

Figure 1

Figure 2
Today, with XML and XSL technologies, HTML can be generated on the fly, with the
added benefit that the flavor of display markup can be chosen at runtime
(Figure 3).

Figure 3
In the case of cell-phones, WML (Wireless Markup Language) content has to be
encoded into a compact binary form by an intermediate server, the WAP gateway,
and then sent to the phone via WAP (Figure 4).

Figure 4
The advent of browsers with built-in XML and XSL processing allows XML content
to be sent directly to the browser, which can then format the content to best
match its own capabilities (Figure 5).

Figure 5
Through intelligent browser-side caching, XSL stylesheets can be preloaded into
the browser and used to rapidly process incoming XML into visible form. Today,
Microsoft's IE5 and Netscape 6.0 under Windows are desktop browsers with
built-in XML/XSL processing (Figure 6).

Figure 6
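The deferral decision can be sketched on the server side. This version assumes the server already knows whether a client can run XSLT: the clientHasXslt flag and the stylesheet path are hypothetical. Capable clients receive raw XML tagged with the W3C xml-stylesheet processing instruction, and all other clients get server-side rendering.

```java
import java.util.function.UnaryOperator;

public class DeliveryDemo {
    /** Inserts the standard xml-stylesheet processing instruction so the browser can render the XML itself. */
    static String attachStylesheet(String xml, String href) {
        // href is assumed to be a trusted, already-escaped path.
        String pi = "<?xml-stylesheet type=\"text/xsl\" href=\"" + href + "\"?>";
        // The instruction must come after the XML declaration, if the document has one.
        if (xml.startsWith("<?xml ")) {
            int end = xml.indexOf("?>") + 2;
            return xml.substring(0, end) + pi + xml.substring(end);
        }
        return pi + xml;
    }

    /**
     * Sends raw XML to clients assumed to run XSLT themselves and renders on the server otherwise.
     * clientHasXslt is a hypothetical capability flag, e.g. derived from a device profile.
     */
    static String deliver(String xml, String href, boolean clientHasXslt,
                          UnaryOperator<String> serverRender) {
        return clientHasXslt ? attachStylesheet(xml, href) : serverRender.apply(xml);
    }

    public static void main(String[] args) {
        String xml = "<USER><NAME><FIRST>John</FIRST></NAME></USER>";
        System.out.println(deliver(xml, "/xsl/desktop.xsl", true, x -> "<p>rendered</p>"));
        System.out.println(deliver(xml, "/xsl/desktop.xsl", false, x -> "<p>rendered</p>"));
    }
}
```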
Next generation technologies
The next generation of web-based applications will have to provide support
for more than just the desktop browser. To do this, they have to support content
generation, rendering, and interactivity. XML and XSLT technologies are ideally
suited for this. XML and XSLT applications in each area include:
Content personalization via assembly/generation
An XSLT processing engine with "plug-in extensions" can obtain data from remote
sources and assemble the content into a personalized XML content file. The
content may include third-party syndicated material (e.g. news, horoscopes,
sports), direct database access, remote services (via ActiveX, RMI, or CORBA),
application-generated data, and legacy HTML. The XSLT processor can use the
user's preferences to assemble content targeted specifically to a single user,
while the XSLT stylesheet contains the "rules" for obtaining and formatting each
information source (Figure 7).

Figure 7
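The assembly step might look like the following in Java. This sketch swaps in plain DOM code for the XSLT extension mechanism described above. The ContentSource interface, the section names, and the PAGE element are hypothetical stand-ins for real plug-ins such as syndicated feeds or database queries.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class AssemblyDemo {
    /** Hypothetical plug-in contract: contributes one section, created with the given document. */
    interface ContentSource {
        Element fetch(Document page);
    }

    /** Builds a PAGE document containing only the sections the user asked for, in their order. */
    static String assemble(List<String> preferredSections, Map<String, ContentSource> sources) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element page = doc.createElement("PAGE");
            doc.appendChild(page);
            for (String name : preferredSections) {
                ContentSource source = sources.get(name);
                if (source != null) { // skip preferences that name no available source
                    page.appendChild(source.fetch(doc));
                }
            }
            return serialize(doc);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Creates an element with a single text child; stands in for a real feed or database call. */
    static Element textElement(Document doc, String tag, String text) {
        Element element = doc.createElement(tag);
        element.setTextContent(text);
        return element;
    }

    private static String serialize(Document doc) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, ContentSource> sources = new LinkedHashMap<>();
        sources.put("news", doc -> textElement(doc, "NEWS", "Top headline"));
        sources.put("horoscope", doc -> textElement(doc, "HOROSCOPE", "A good day"));
        // This user's preferences list only news.
        System.out.println(assemble(Arrays.asList("news"), sources));
    }
}
```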
Rendering
The personalized content can be rendered to best match a user's preferences
(e.g. themes) as well as browser type, device type, and line speed. A variety of
algorithms could be used within a content-matching engine (CME) to map these
input parameters to the optimal stylesheet for a given type of content
(Figure 8).

Figure 8
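Here is one possible content-matching rule. It assumes each stylesheet is tagged with a target device, a minimum line speed, and a theme. The Candidate fields and the scoring weights are hypothetical. The point is only that a simple CME can reduce to picking the best-scoring stylesheet that satisfies the hard constraints.

```java
import java.util.Arrays;
import java.util.List;

public class StylesheetMatcher {
    /** Hypothetical description of one stylesheet available for a content type. */
    static final class Candidate {
        final String file;
        final String device;  // e.g. "desktop", "phone"
        final int minKbps;    // slowest connection the layout is designed for
        final String theme;   // user-selectable look and feel

        Candidate(String file, String device, int minKbps, String theme) {
            this.file = file;
            this.device = device;
            this.minKbps = minKbps;
            this.theme = theme;
        }
    }

    /**
     * Returns the best-scoring stylesheet file, or null if none fits the device.
     * Device and line speed are hard constraints. A matching theme dominates the score
     * (assuming minKbps stays below 100,000); otherwise the richest layout the line supports wins.
     */
    static String choose(List<Candidate> candidates, String device, int kbps, String theme) {
        Candidate best = null;
        int bestScore = Integer.MIN_VALUE;
        for (Candidate c : candidates) {
            if (!c.device.equals(device) || c.minKbps > kbps) {
                continue;
            }
            int score = c.minKbps + (c.theme.equals(theme) ? 100_000 : 0);
            if (score > bestScore) {
                best = c;
                bestScore = score;
            }
        }
        return best == null ? null : best.file;
    }

    public static void main(String[] args) {
        List<Candidate> catalog = Arrays.asList(
            new Candidate("desktop-lite.xsl", "desktop", 0, "classic"),
            new Candidate("desktop-video.xsl", "desktop", 300, "classic"),
            new Candidate("phone-wml.xsl", "phone", 0, "classic"));
        System.out.println(choose(catalog, "desktop", 56, "classic"));
        System.out.println(choose(catalog, "desktop", 1000, "classic"));
    }
}
```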
Interactivity
When a user selects a link or fills out a form, the request is transmitted to
the server. The server converts the HTTP request into an XML request, processes
it, and generates a response back to the user. Using XML and XSL, user requests
can be mapped onto any custom application code (Figure 9).

Figure 9
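The following is a minimal sketch of the HTTP-to-XML mapping. It assumes the web server has already decoded the form fields. The REQUEST/PARAM vocabulary and the handler registry are hypothetical.

```java
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.UnaryOperator;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class RequestMapper {
    /**
     * Turns an action name and already-decoded form fields into an XML request.
     * Field names go into attributes because they need not be valid XML element names,
     * and the serializer escapes characters such as < and & in the values.
     */
    static String toXmlRequest(String action, Map<String, String> fields) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element request = doc.createElement("REQUEST");
            request.setAttribute("action", action);
            doc.appendChild(request);
            for (Map.Entry<String, String> field : fields.entrySet()) {
                Element param = doc.createElement("PARAM");
                param.setAttribute("name", field.getKey());
                param.setTextContent(field.getValue());
                request.appendChild(param);
            }
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Hands the XML request to whatever application code is registered for the action. */
    static String dispatch(String action, String xmlRequest, Map<String, UnaryOperator<String>> handlers) {
        UnaryOperator<String> handler = handlers.get(action);
        if (handler == null) {
            throw new IllegalArgumentException("no handler for action: " + action);
        }
        return handler.apply(xmlRequest);
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("q", "stocks & bonds");
        String xml = toXmlRequest("search", fields);
        Map<String, UnaryOperator<String>> handlers = new LinkedHashMap<>();
        handlers.put("search", req -> "<RESPONSE>ok</RESPONSE>"); // placeholder application code
        System.out.println(xml);
        System.out.println(dispatch("search", xml, handlers));
    }
}
```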
Performance tuning
To maximize performance, the following techniques can be employed throughout
the application flow:
Caching
- Server-side (XML, XSL, legacy HTML) - Content that has already been
processed can be cached and reused until the source changes. This can be used
to avoid re-assembly and/or re-rendering.
- Client-side (XSL, HTML/WML) - Requests for content that has not changed
on the server may be serviced from the browser cache. Programmable caches
allow relatively static files (such as stylesheets and images) to be stored
on the client.
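Server-side caching can be sketched as a map from (source, stylesheet) pairs to rendered output. Each entry is tagged with the version of the source it was built from, and a file modification time could serve as that version. The class is a hypothetical illustration. Two threads that miss at the same moment may both render, which is harmless here because they produce the same output for the same version.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class RenderCache {
    /** One cached rendering plus the source version it was produced from. */
    private static final class Entry {
        final long sourceVersion;
        final String output;

        Entry(long sourceVersion, String output) {
            this.sourceVersion = sourceVersion;
            this.output = output;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final AtomicInteger renders = new AtomicInteger(); // cache misses, for illustration

    /**
     * Returns the cached output for (sourceId, stylesheetId) while the source version is
     * unchanged; otherwise calls render, stores the new output, and returns it.
     */
    String get(String sourceId, String stylesheetId, long sourceVersion, Supplier<String> render) {
        String key = sourceId + "|" + stylesheetId;
        Entry cached = entries.get(key);
        if (cached != null && cached.sourceVersion == sourceVersion) {
            return cached.output;
        }
        String output = render.get();
        renders.incrementAndGet();
        entries.put(key, new Entry(sourceVersion, output));
        return output;
    }

    int renderCount() {
        return renders.get();
    }

    public static void main(String[] args) {
        RenderCache cache = new RenderCache();
        System.out.println(cache.get("news.xml", "desktop.xsl", 1L, () -> "<p>v1</p>"));
        System.out.println(cache.get("news.xml", "desktop.xsl", 1L, () -> "<p>unused</p>"));
        System.out.println(cache.get("news.xml", "desktop.xsl", 2L, () -> "<p>v2</p>"));
        System.out.println("renders: " + cache.renderCount());
    }
}
```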
Binary compilation
- Stylesheets: An XSLT compiler can translate XSL source into compressed
binary form, avoiding time-consuming reparsing on subsequent requests.
- Content: An XML compiler can translate the XML source into compressed
binary data leading to faster load times and smaller file sizes.
- Advantages: Smaller file size, faster processing due to removal of the
parsing stage, and a smaller XSLT engine, since a parser is no longer
necessary.
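The Java platform offers a comparable compile-once step. TransformerFactory.newTemplates parses a stylesheet into a reusable Templates object, and the JDK's built-in XSLTC processor compiles it to Java bytecode. This is only an analogous technique, not the Activare binary format, and the in-memory stylesheet store below is hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CompiledStylesheets {
    private final TransformerFactory factory = TransformerFactory.newInstance();
    private final Map<String, Templates> compiled = new ConcurrentHashMap<>();
    private final Map<String, String> sources; // hypothetical store: stylesheet name -> XSL text

    CompiledStylesheets(Map<String, String> sources) {
        this.sources = sources;
    }

    /** Parses and compiles the named stylesheet the first time it is requested, then reuses it. */
    Templates templates(String name) {
        return compiled.computeIfAbsent(name, n -> {
            String xsl = sources.get(n);
            if (xsl == null) {
                throw new IllegalArgumentException("unknown stylesheet: " + n);
            }
            try {
                synchronized (factory) { // TransformerFactory is not guaranteed to be thread-safe
                    return factory.newTemplates(new StreamSource(new StringReader(xsl)));
                }
            } catch (TransformerConfigurationException e) {
                throw new IllegalStateException("cannot compile " + n, e);
            }
        });
    }

    /** Renders one document; each call creates only a Transformer from the compiled form. */
    String render(String name, String xml) {
        try {
            StringWriter out = new StringWriter();
            templates(name).newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    int compiledCount() {
        return compiled.size();
    }

    public static void main(String[] args) {
        Map<String, String> xsl = new HashMap<>();
        xsl.put("plain", "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'><xsl:value-of select='/MSG'/></xsl:template>"
            + "</xsl:stylesheet>");
        CompiledStylesheets stylesheets = new CompiledStylesheets(xsl);
        System.out.println(stylesheets.render("plain", "<MSG>first request</MSG>"));
        System.out.println(stylesheets.render("plain", "<MSG>second request</MSG>"));
        System.out.println("compiled: " + stylesheets.compiledCount());
    }
}
```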
Optimization
- Distributed networking: content sources can be distributed across
multiple machines. XSLT aggregation and rendering may be performed on separate
servers.
- XSLT analysis: XSL source may be analyzed for optimal instruction
processing and XPath optimization.
- Multithreading: A multithreaded XSLT engine can process multiple
requests simultaneously.
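A multithreaded engine can share one compiled stylesheet across worker threads. In JAXP, a Templates object is safe to share, but a Transformer must stay on one thread, so this sketch creates one Transformer per task. The greeting stylesheet and the pool size are illustrative assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ParallelRender {
    // Hypothetical stylesheet: greets the user named in each request document.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/'>Hello, <xsl:value-of select='/USER/NAME/FIRST'/></xsl:template>"
        + "</xsl:stylesheet>";

    /** Renders every document on a fixed-size thread pool; results keep the input order. */
    static List<String> renderAll(List<String> documents, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Compiled once and shared: Templates objects are thread-safe.
            Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSL)));
            List<Future<String>> pending = new ArrayList<>();
            for (String xml : documents) {
                pending.add(pool.submit(() -> {
                    // Transformers are not thread-safe, so each task gets its own.
                    StringWriter out = new StringWriter();
                    templates.newTransformer()
                        .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
                    return out.toString();
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : pending) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (TransformerConfigurationException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(renderAll(Arrays.asList(
            "<USER><NAME><FIRST>Ann</FIRST></NAME></USER>",
            "<USER><NAME><FIRST>Bob</FIRST></NAME></USER>"), 2));
    }
}
```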
Case study: EarthLink Networks
Company background
EarthLink is the largest independent Internet Service Provider in the
United States, second only to America Online in total customers. In 1996
EarthLink developed the first user-personalizable start page, PSP 1.0, for use
by its access customers. Since that time the product has gone through five
iterations leading up to the state-of-the-art portal it is today. The
members-only version of this product is on track to generate US$50 million in
revenue for the year 2000, with less than US$1.5 million in capital investment
in hardware and a development team of 5 Java/C++ engineers and 10 XSL/markup
engineers.
Development goals
The current version of the EarthLink portal was built with the following
goals in mind:
- 1. Open Access: It must support
all forms of content access (PC browsers, Internet appliances, PDAs, and
wireless phones), with minimal changes required to support new devices and no
changes to the content-harvesting process with content partners.
- 2. Flexible: Allow for elegant
and simple user-customization of content and interface and "consistency of
identity" across multiple devices.
- 3. Profitable: It must scale across
multiple versions of the portal, customized for affinity marketing partners
and key OEM relationships (EarthLink currently supports over 95 separate versions
of this portal).
- 4. Scalability: It must scale to
allow open access to the portal for key strategic partners and everyday
Internet users, with a target traffic level of over 15 million requests per
day on the current hardware and network infrastructure: a fourfold increase
in capacity over the previous system.
- 5. Maintainable: It must scale
internally across the development organization, and require no new personnel
to support these additional devices.
How it was done
In order to support these conflicting goals, EarthLink worked in conjunction
with Activare to develop a pure XML/XSL solution to this problem.
- 1. Open Access: Content is assembled
from third-party sources, using XML, XSL, and XSL plug-ins, then automatically
rendered in real-time using XSL stylesheets for any number of devices.
- 2. Flexible: User preferences are
stored directly in XML files and drive the assembly and rendering process.
New types of content and services can be added more quickly than with the
widely used application-server/database model.
- 3. Profitable: The portal can be
cloned very quickly with a new user-interface, look-and-feel, and branding
with simple changes to core XSL stylesheets. XML/XSL architecture allows integration
of the portal with third-party advertising, E-commerce, and co-branding opportunities.
- 4. Scalability: Through the use
of compiled binary representations of all XML and XSL objects within the portal
engine, EarthLink was able to use its existing hardware and disk infrastructure
while allowing a fourfold increase in user base and traffic.
- 5. Maintainable: Existing HTML
markup personnel were retrained in XML and XSL to maintain the system, eliminating
the need for scarce C++ and Java programming talent.
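Branding and preference-driven rendering can be sketched with XSLT top-level parameters. In this approach, one stylesheet serves every partner portal, and the user's preference file decides which sections appear. The parameter names, the PREFS vocabulary, and the default values are hypothetical. The source does not describe EarthLink's actual stylesheet interfaces.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BrandedPortal {
    // Hypothetical core stylesheet shared by every portal version.
    static final String PORTAL_XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        // Defaults used when no partner value is supplied.
        + "<xsl:param name='brand' select=\"'Portal'\"/>"
        + "<xsl:param name='color' select=\"'#003366'\"/>"
        + "<xsl:template match='/PREFS'>"
        + "<div style='color:{$color}'><h1><xsl:value-of select='$brand'/> Start Page</h1>"
        // The user's preference file decides which sections are rendered.
        + "<xsl:if test=\"SECTION[@name='news']\"><p>News</p></xsl:if>"
        + "</div></xsl:template></xsl:stylesheet>";

    /** Renders a user's preferences for one partner; null arguments keep the stylesheet defaults. */
    static String render(String prefsXml, String brand, String color) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(PORTAL_XSL)));
            if (brand != null) {
                transformer.setParameter("brand", brand);
            }
            if (color != null) {
                transformer.setParameter("color", color);
            }
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(prefsXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String prefs = "<PREFS><SECTION name='news'/></PREFS>";
        System.out.println(render(prefs, null, null));
        System.out.println(render(prefs, "Example Partner", "#990000"));
    }
}
```

Cloning a portal then means supplying new parameter values rather than editing the stylesheet itself.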
The solution involves a pure C++-based XSLT processing engine and XML/XSL
compiler from Activare, with support for C++ and Java plugins. For optimal
performance, the display rendering system for the EarthLink portal was written
in C++. For maximum flexibility and time-to-market, the core personalization
system was written in Java using a JNI version of the Activare XSLT system.
EarthLink has developed custom versions of the portal for Apple Computer,
Sprint and Sprint PCS, Palm, USAA, and Sony, all of which are accessible from
PC browsers, Sprint PCS hand-held phones, and Palm devices.
Conclusion
XML and XSL are highly flexible technologies for use in development
of next-generation web-based applications. XSL is an ideal solution for deployment
on both servers and clients, allowing the existing infrastructure to handle
the demands of future content-distribution systems. Highly customizable content,
delivered to any device, any place, is finally within reach.