TECHNICAL TRACK
WEDNESDAY, MARCH 1
8:30 am
Morning Keynote: XML Stands
on the Shoulders of an IT Giant (presentation
coming soon)
Benoit Lheureux, Gartner Group
Session #5: Small Devices
9:00 am
XML and Jini - On Using
XML and the "JAVA Border Service Architecture" to
Integrate Mobile Devices into the JAVA Intelligent
Network Infrastructure (presentation
coming soon)
Stefan
Mueller-Wilken, Research Assistant, University of
Hamburg
smueller@informatik.uni-hamburg.de
Biography:
Stefan
Mueller-Wilken has been with Prof. Dr. Lamersdorf's Distributed
Systems Group at the University of Hamburg since
1994. After finishing his master's thesis in Computer
Science on service mediation in distributed
middleware environments, he became a research assistant
in the same group in 1997. His main research centers
on how to integrate mobile devices
into distributed system environments. Positions
as visiting researcher brought him to the Distributed
Systems and Technology Center (DSTC) in Brisbane,
Australia and to 'Ericsson Eurolabs Deutschland'
(EED) in Aachen, Germany.
Abstract:
Since
its introduction early this year, the "JAVA Intelligent
Network Architecture" (JINI) has brought a new and
fascinating approach to the field of lightweight
middleware systems. Using building blocks such as
'resource leasing', 'distributed events' and a centralized
'lookup service' to store registered service offers,
JINI offers good potential for realizing highly
dynamic distributed computing scenarios with participants
ranging from large scale server applications to
consumer electronics and mobile systems such as
mobile phones and personal digital assistants (PDAs).
Among JINI's outstanding features is the ability
to not only register the access path of a server
application (URL, socket, etc.) as other middlewares
do, but to register service proxies to be used on
the client side to access the service. These proxies
can be used to implement a communication method
secretly shared between client and server. The transfer
of a user interface to a human client is possible.
Changes to a JINI application will simply lead to
different proxies being registered and transferred
to the client side, without any modification
of the client-side code becoming necessary - all
simply taking place under the hood the next time the
leases are due. While the JINI approach brings great
flexibility to the field of distributed system design,
there is currently one huge drawback: JINI is inherently
based on the JAVA programming language and therefore
not accessible from the vast number of small devices
such as WAP phones or simple PDAs in use today.
The University of Hamburg is currently developing
the 'Hydepark' infrastructure (hyper distributed
environment for personal appliances) which will
allow for the integration of non-JAVA devices into
the JINI application scenarios. As one important
project a JAVA border service architecture (JBSA)
is being designed to make service GUIs (like those
registered as part of JINI service offers) accessible
from simple browsing devices (such as WAP phones
or PDAs), thus giving direct means to integrate
such appliances into JINI application scenarios.
Like other approaches, the JAVA border service is
based on principles of introducing an abstract layer
between application logic and presentation layer.
This abstract layer is based on using XML and optimized
XSLT processing to transfer GUI descriptions into
concrete representations in XHTML, WML, VoiceML
or VRML. But where Gamma's half bridge pattern or
the W3C's XFDL approach rely on modifications to
the code at design time, the JAVA border service
architecture is based on runtime analysis of the
active application and dynamic transformation into
a representation as requested by the mobile client.
We call this approach an 'n+0 tier design' with
a mobile client being co-located to a running desktop-client
application. In the best case, the application wouldn't
even notice the difference between being used locally
and being used remotely, and the client device
could change with the application being 'alive'
- from direct access to access via HTML browser,
on to access via WAP telephone and back to access
from the desktop PC. To allow for this flexibility,
the JAVA border service architecture provides numerous
services in addition to the core analysis and transformation
functionality:
* authentication support,
* a session management facility,
* pluggable device adapter support,
* an application factory to start and host service GUIs, and
* a device and application classification mechanism.
These additional services provide a complete infrastructure
for XML-based device integration with the JINI architecture.
The JAVA border service architecture is designed around
a JAVA GUI analysis functionality. Using a so-called
'application shadow' that runs in the same virtual
machine as the application, the object hierarchy that
makes up the user interface is scanned at regular
intervals and the results are converted into the
Java Swing Markup Language (JSML), an XML dialect
specially designed for Swing GUI representations.
This JSML 'snapshot' is routed through the 'BSA
gateway' and dispatched to one of the XSLT processor (XSLTP)
instances that are kept online, one per client-side
representation style sheet, for performance reasons. At this point
in time transformations to XHTML and WML are possible.
We plan to integrate support for VoiceML and VRML.
Results of this transformation process are then forwarded
to the 'external communication adapter' (ECA) corresponding
to the target representation, where they will be offered
to the client device in a manner suitable to the device
class (for WAP phones using a WAP gateway, for HTML-aware
PDAs through a servlet engine etc.). Client interactions
such as a button being clicked or a list item being
picked are caught by the 'ECA', forwarded through
the 'BSA gateway' and routed back through the 'application
shadow' where they are retranslated into GUI events
and inserted into the Swing event queue. The application
now processes these events as if they originated locally.
Results of the client interaction are scanned by the
'application shadow' and the process can start over
again. Early prototypes have led to very promising
results with respect to performance and flexibility
of the XML based approach chosen for the 'JAVA border
service architecture'. This rapid prototyping has
only been possible through the use of the Extensible Markup
Language and XSL Transformations. Ongoing changes
to the JSML design can be rapidly incorporated into
the architecture by simply adjusting stylesheets as
opposed to rewriting large sections of code. As a
consequence, the JBSA is currently being integrated
into a first real world application scenario, where
fieldworkers can use their mobile phone to gain access
to company information (mostly tasks, addresses and
dates), stored in central databases and usually presented
through a small JAVA application they have on their
PC desktops when in the office. While a lot of effort
currently goes into design and implementation of the
JAVA border service architecture, the 'Hydepark' infrastructure
is not restricted to integration of non-JAVA clients
into the JINI architecture. In other sub-projects,
integration support for non-JAVA services is being
realized and first prototypes on using simple PC104-based
servers within the JINI environment have been built.
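The scan-and-transform loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Widget` class stands in for a Swing component hierarchy, the element names of the snapshot are invented rather than taken from JSML, and a hand-written Python function replaces the XSLT stylesheet step because the Python standard library has no XSLT processor.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Widget:
    """Stand-in for a GUI component; this is not the Swing API."""
    kind: str                      # e.g. "frame", "label", "button"
    text: str = ""
    children: list = field(default_factory=list)


def snapshot(widget):
    """Walk the widget hierarchy and emit a JSML-like snapshot (invented names)."""
    node = ET.Element(widget.kind, {"text": widget.text})
    for child in widget.children:
        node.append(snapshot(child))
    return node


def to_wml(jsml):
    """Hand-written stand-in for the stylesheet step: snapshot -> one WML card."""
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", {"title": jsml.get("text", "")})
    for node in jsml.iter():
        if node.tag == "label":
            ET.SubElement(card, "p").text = node.get("text")
        elif node.tag == "button":
            # A button becomes a link that an adapter could map back to a click.
            paragraph = ET.SubElement(card, "p")
            link = ET.SubElement(paragraph, "a", {"href": "#" + node.get("text")})
            link.text = node.get("text")
    return wml


gui = Widget("frame", "Tasks", [Widget("label", "Call customer"),
                                Widget("button", "Done")])
wml_text = ET.tostring(to_wml(snapshot(gui)), encoding="unicode")
print(wml_text)
```

Swapping `to_wml` for a different mapping function is the analogue of selecting a different stylesheet per target representation.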
In short, Hydepark offers the following benefits for
distributing applications:
* JINI services become available to non-Java clients.
* All XML-based target representations that allow
for interaction (like XHTML and WML) may be incorporated.
Biography:
Ken
Rabold is a software engineer at BSQUARE Corporation.
He is working on providing XML-based solutions for
Windows CE devices. He has worked with XML since 1998,
exchanging information between medical computer
information systems.
Abstract:
The BSQUARE software deployment application uses
XML as the basis for software deployment and device
configuration for embedded Windows CE. Placing the
responsibility for initiating software updates on
the CE device (pull) rather than on a dedicated centralized
server (push), the software deployer uses XML to form
a custom software update description language in
which update scripts can be written.
The
XML based scripts are downloaded by the CE device
from a web server, interpreted by the software update
component, and executed. Elements within the XML file
allow for updating data files, downloading and installing
executables or COM objects, CE registry modification,
and downloading a whole new operating system image.
This
presentation describes the use of XML in the software
update component, the design of the software update
package schema, and how XML and the internet protocols
are used on an embedded Windows CE device. The component
nature of the product allows it to be incorporated
into custom applications and platforms. One such platform,
a Windows Based Terminal (WBT), is a natural for hosting
a software deployment application. WBTs are designed
to provide a low Total Cost of Ownership by running
Windows NT sessions as a terminal client on low cost
hardware. To help facilitate a low TCO, updating software
on WBTs remotely by an administrator is an absolute
necessity. By incorporating a software update component
into WBTs, BSQUARE is able to provide a mechanism
by which network administrators can deploy new versions
of software, modify settings on the device, and through
the use of Active Server Pages on the web server,
track devices that have requested software updates
as well as serve up customized XML update packages.
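To make the idea of an interpreted XML update script concrete, here is a minimal sketch. The element and attribute names (`update`, `file`, `registry`, `src`, `dest`, `key`, `value`) and the URL are hypothetical, not the BSQUARE package schema, and the interpreter only returns the actions it would perform instead of touching a device.

```python
import xml.etree.ElementTree as ET

# Hypothetical update package; the vocabulary is invented for illustration.
PACKAGE = """
<update version="2">
  <file src="http://example.invalid/app.dat" dest="/Data/app.dat"/>
  <registry key="Software/Example/Mode" value="kiosk"/>
</update>
"""


def plan_update(xml_text):
    """Interpret a package and return the requested actions in document order."""
    actions = []
    for step in ET.fromstring(xml_text.strip()):
        if step.tag == "file":
            actions.append(("download", step.get("src"), step.get("dest")))
        elif step.tag == "registry":
            actions.append(("set-registry", step.get("key"), step.get("value")))
        else:
            # Refuse scripts that contain steps this interpreter does not know.
            raise ValueError(f"unknown update step: {step.tag}")
    return actions


actions = plan_update(PACKAGE)
for action in actions:
    print(action)
```

In a pull model, the device would fetch such a package from the web server on its own schedule and then run the interpreter locally.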
10:00 am
Hardwired
XML
John
Aloysius Ogilvie, President, Killdara Corp
jogilvie@killdara.com
Biography:
John
Ogilvie received his degree in Systems Engineering
in 1983. Since then, John has been a prolific software
developer in the U.S., Canada and the U.K., including
stints with Norpak, Bell Northern Research (Nortel),
Videotron, Virtual Prototypes, Oracle and other innovators
in graphic telecommunications and scientific computing.
Projects have included medical imaging, remote sensing/mapping,
public-access, entertainment, retail and training,
among others. John currently runs Killdara Corporation,
a venture-funded XML product company.
Abstract:
In
this session we will explore an underappreciated area
of XML: how it can be used as a 'lingua franca' for
communication between intelligent, automated hardware/software
devices known as 'bots' (robots). Bots are already
common, although usually invisible to the user.
Sophisticated
websites and transactional services are built from
bots, and they have been embedded in cars, appliances
and computers for years. A bot is an autonomous piece
of computing power which is dedicated to a specific
task, and which has no user interface.
We
can see a time in the near future when hundreds of
millions of these cheap, simple, single-purpose 'micro-servers'
are embedded everywhere. Precursors of these devices
are already built by pioneers like Axis Communications
and Cobalt; researchers have even built complete webservers
which fit in the palm of your hand.
So
these microservers are increasingly ubiquitous, but
they are unable to speak to one another. There have
been interesting initiatives such as JINI, but in
my opinion these initiatives are overdesigned (similar
to CORBA) and will not be the solution. XML will be
the solution.
XML
is a good foundation for inter-bot communication for
several reasons:
a. It's
a comparatively lightweight messaging protocol, so
it fits easily into devices which have limited memory
and computing power.
b. Messages
can be rigidly structured, making them easy to generate
and interpret.
c. It's
vendor neutral and carries no licensing fees.
d. It
can be implemented on any platform using freely-available
software components written in Java.
My
prediction is that by mid-decade we will take it
for granted that we are surrounded by a constant,
inaudible XML chatter among the bots. Your home/computer
(deliberate punctuation) will host bots which synchronize
your life with your colleagues and family, and surf
the net for good deals on groceries and airfare.
Your office, car, phone, PDA will all have similar
responsibilities and capabilities. Cars and garages
will know one dialect (DTD), and another dialect
will be used between your TV set and PDA. A given
bot may use ten different DTDs in a day's operation.
The
chatter will be delivered as intermittent, low-bandwidth
traffic over conventional TCP/IP networks, often
using wireless transmission. The traffic will take
a variety of forms: E-mail, web post (HTTP) or file
transfer (FTP) will be the basic protocols.
The
messages will be concise XML documents, using de facto
industry standard DTDs. Of course, someone will
first have to design DTDs for "Vending Machine Out
Of Order Report", and "Flight Delay Information
Request".
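As a rough illustration of how small such a message could be, the sketch below builds and reads an out-of-order report. The element names (`outOfOrderReport`, `machine`, `fault`) are invented for this example; no such industry DTD is implied.

```python
import xml.etree.ElementTree as ET


def build_report(machine_id, fault):
    """Serialize a report as a concise XML document (hypothetical vocabulary)."""
    report = ET.Element("outOfOrderReport")
    ET.SubElement(report, "machine").text = machine_id
    ET.SubElement(report, "fault").text = fault
    return ET.tostring(report, encoding="unicode")


def read_report(xml_text):
    """Parse a report on the receiving bot and return (machine, fault)."""
    report = ET.fromstring(xml_text)
    return report.findtext("machine"), report.findtext("fault")


message = build_report("vm-017", "coin mechanism jammed")
print(message)
print(read_report(message))
```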
Where
security is an issue, the documents will be digitally
encrypted and signed using public-key infrastructure
(PKI) techniques. Each bot will have its own unique
and legally-recognized digital ID or signature.
The chatter will be much more secure than even existing
financial transactions.
Biography:
Walter
E. Perry, PhD. is a founder and the CTO of net.uniqueness,
Inc., a New York- and London-based firm applying XML
to database functions and to databased application
support. He has sixteen years' experience developing
enterprise systems which apply distributed databased
solutions in financial settlements and other transnational
processing.
Abstract:
For
nearly twenty years two-phase commit (2PC) has been
the basis of transaction processing on distributed
systems and of peer-to-peer transactions between systems.
Improvements in the efficiency and availability of
2PC-based systems have been realized
through increasingly reliable hardware and through
software which has refined the definition of transactions
to a granularity best suited to the environment
in which they are processed. The parties to transaction
processing are confident of the ability of their
counterparties to execute because either both systems
are distributed within a single organization or
the counterparty's systems are so familiar that
they can be treated with the same trust as the enterprise's
own. In addition--and perhaps most crucially--the
premise of two-phase commit is that each atomic
transaction be identically defined by the software
of the two systems. If the transactions work at
all, that is proof that they are identically understood
by both parties.
The
current excitement over business-to-business transactions
across the Internet should be tempered by the understanding
that this will be an increasingly unsuitable environment
for two-phase commit and indeed for the past twenty
years' common understanding of transaction processing.
The inherent topology of the Internet is of autonomous,
largely anonymous nodes. Much of the promise of
Internet commerce lies in the prospect of doing
business with parties previously unknown or inaccessible.
Yet even when those prospective counterparties can
be identified and reached through the network, much
about their systems and processes will remain opaque.
It will certainly not be reasonable for a business
to act as if those systems and processes are closely
analogous to its own. More important, a business
cannot expect the central requirement of 2PC-based
transaction processing--the identical definition
of the transaction--from a counterparty that was
never previously an identified participant in that
market or industry sector.
XML,
through its inherent extensibility, is the crucial
tool for defining a transaction model--and more
important for building transaction processors--which
work in the Internet topology of autonomous, largely
anonymous nodes. We must assume a world where neither
the boundaries nor the particular constituent components
of a transaction are understood identically by two
largely anonymous parties. Yet if a transaction
is doable at all, it must be understood by each
of the parties as an extension or modification of
some already familiar transaction, for otherwise
they could not comprehend the transaction at all.
We cannot expect that one such party will know how--or
be willing--to process a transaction in precisely
the way, and on precisely the terms, that the other
party would, or as that other party might expect
its counterparty on a 2PC-based transaction to do.
Yet if, through an exchange of messages containing
nothing but the specifics of the proposed transaction
as each party sees them, each is able to understand
in its own terms what the other is proposing, then
each can independently execute its part of the transaction,
once it has satisfied its own definition of the
data required. In other words, instead of two-phase
commit, we have autonomous, asynchronous separate
execution of differently-composed and differently-bounded
transactions, which are nevertheless counterparts
to one another in the instance of execution.
This
presentation will describe how to implement such a
transaction model by exploiting inherent benefits
of XML. Key to this is the process of constructing
a data model from the parse of a message received
and then instantiating the elements of that data in
(probably very different) structural terms understood
by the receiver. We will describe how the XML markup
describing these structures and relationships can
be generated from a general-purpose process driven
by the parse of each new message. We will also cover
how processing constraints can be implemented, in
XML markup, very differently on the two systems and
yet be simultaneously applied in the execution of
each transaction. Finally we will explore what this
transaction processing model demands of--and teaches
us about--the nature of data vocabularies in general.
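One way to picture the receiving side is sketched below. It is a simplified illustration, not the presenter's method: the message vocabulary, the path notation, and the `RECEIVER_VIEW` mapping are all assumptions made for this example. The parse of the message is flattened into a schema-free data model, and the receiver then instantiates only the data its own definition of the transaction requires.

```python
import xml.etree.ElementTree as ET


def data_model(xml_text):
    """Flatten a parsed message into {path: text} without assuming a schema."""
    model = {}

    def walk(node, path):
        here = f"{path}/{node.tag}"
        if node.text and node.text.strip():
            model[here] = node.text.strip()
        for child in node:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return model


# Hypothetical: the receiver's own field names, mapped to paths it recognizes.
RECEIVER_VIEW = {"amount": "/order/total", "payer": "/order/buyer/name"}


def instantiate(model, view):
    """Build the receiver's record; fail if its own data requirements are unmet."""
    missing = [name for name, path in view.items() if path not in model]
    if missing:
        raise ValueError(f"cannot execute, missing data for: {missing}")
    return {name: model[path] for name, path in view.items()}


message = "<order><buyer><name>Acme</name></buyer><total>120.00</total></order>"
local_txn = instantiate(data_model(message), RECEIVER_VIEW)
print(local_txn)
```

The sender never sees `RECEIVER_VIEW`; each side decides independently whether the message gives it enough to execute its part.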

11:30 am
XML
Messaging and Java/XML/SQL Conversion
David Orchard, IBM
orchard@pacificspirit.com
1:30 pm
Afternoon
Keynote: When XML Turns Ugly
David Megginson, Megginson Technologies, conference
co-chair
David Megginson of Megginson
Technologies currently co-chairs the W3C XML Core
Working Group and serves as a member of the W3C's
XML Coordination Group. He led the initiative that
created SAX, the Simple API for XML, and organized
the XMLNews initiative, which promotes the use of
open standards for the exchange of news and information.
David has been a Linux user since 1993 and has been
writing free software for over a decade.
Session
#7: APIs
2:00 pm
EasySAX:
SAX made Pythonic
Paul Prescod, Consulting
Engineer, ISOGEN
paul@prescod.net
Biography:
Paul
Prescod is a leading researcher and implementor of
document processing technologies. His formal education
was in mathematics and computer science from the University
of Waterloo. His research interests include formalisms
for document modelling, queries and schemata. As a
consulting engineer at ISOGEN, he helps organizations
apply ISO and W3C standards to large-scale documentation
problems.
Among
his accomplishments, Paul has been very involved in
the development and promotion of new standards. He
worked within the XML Working Group of the World Wide
Web consortium to develop the XML family of standards
and co-wrote the most popular book on that family
of standards: The XML Handbook. Paul wrote the first
and most popular tutorials on the DSSSL style language
and the grove paradigm. He writes widely on other
topics both abstract and concrete. On the implementation
side, Paul can integrate a wide variety of tools and
techniques. He has experience with programming languages
such as C++, Python, Java and Omnimark; authoring
systems such as FrameMaker+SGML and AdeptEditor and
SGML toolkits such as James Clark's SP and Jade.
Abstract:
EasySAX
is a high level SAX-based API for working with XML
event streams in Python. Where SAX was specifically
designed as a low-level API, EasySAX is designed first
and foremost to be easy to use, convenient and flexible.
EasySAX
has dynamic event handler dispatch mechanisms that
make XML processing convenient by building on Python's
dynamism. Where SAX users typically dispatch events
using switch statements or hand-coded dispatch tables,
EasySAX builds a dispatch table automatically based
upon method names and metadata.
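The idea of name-based dispatch can be sketched on top of the standard `xml.sax` module. This is not the EasySAX API, which is not shown here; the `start_<name>` and `end_<name>` method naming convention is an assumption made for this example.

```python
import xml.sax


class DispatchingHandler(xml.sax.ContentHandler):
    """Routes SAX events to start_<name> / end_<name> methods if they exist."""

    def startElement(self, name, attrs):
        handler = getattr(self, "start_" + name, None)
        if handler is not None:
            handler(attrs)

    def endElement(self, name):
        handler = getattr(self, "end_" + name, None)
        if handler is not None:
            handler()


class TitleCollector(DispatchingHandler):
    """Collects the text of every <title> element and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None

    def start_title(self, attrs):
        self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def end_title(self):
        self.titles.append("".join(self._buffer))
        self._buffer = None


collector = TitleCollector()
xml.sax.parseString(
    b"<talks><title>EasySAX</title><speaker>PP</speaker><title>JAXP</title></talks>",
    collector,
)
print(collector.titles)
```

No switch statement or hand-written table is needed: adding a handler for a new element means adding one method.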
EasySAX
also combines some of the best features of tree-based
and event-based interfaces by allowing trees to
be built "on-demand" from portions of parse streams.
This allows the performance degradation of tree
building to be minimized.
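Python's standard `xml.dom.pulldom` module offers a comparable hybrid and can serve as an illustration of the on-demand approach; it is a stand-in, not EasySAX itself. The sample document is invented. Events stream past cheaply, and a DOM subtree is built only for the one element of interest.

```python
from xml.dom import pulldom

DOC = ("<log><entry id='1'><msg>ok</msg></entry>"
       "<entry id='2'><msg>disk full</msg></entry></log>")


def message_for(xml_text, wanted_id):
    """Stream through the document; expand only the matching <entry> into a tree."""
    events = pulldom.parseString(xml_text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "entry":
            if node.getAttribute("id") == wanted_id:
                events.expandNode(node)  # builds the subtree for this element only
                msg = node.getElementsByTagName("msg")[0]
                return msg.firstChild.data
    return None


print(message_for(DOC, "2"))
```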
EasySAX
is currently in testing and the final release is
expected in time for the conference.
2:30 pm
XML
in the Java Platform
James Davidson, Staff Engineer,
Sun Microsystems
james.davidson@eng.sun.com
Biography:
James
Davidson is a staff engineer at Sun Microsystems.
He is currently leading the specification for the
Java API for XML Parsing. Since joining Sun in 1997,
James has worked on the Java Servlet team
as the author of the Servlet API specification and
on other web technologies. James sits on the W3C DOM
Working Group. He has also played an instrumental
role in founding the Apache Jakarta project and continues
to chair the Jakarta project management committee.
Abstract:
This
session will provide a technical overview along with
detailed examples of XML technologies in the Java
2 Platform. Attendees will learn about current XML
technologies being developed through the Java Community
Process, including the Java API for XML Parsing (JAXP),
Project Adelard, and XML integration in Java 2 Platform,
Enterprise Edition (J2EE). Attendees will also learn
how to leverage the synergistic relationship of XML
and the Java technology to create powerful Web applications
for large-scale enterprise applications down to small
devices.
3:00 pm
Dynamic
Classes API for XML DOM
- Read
Me First
Robert
Houben, Vice President R & D, Liberty Integration
Software Inc.
roberth@libertyodbc.com
Dr. Philip Mansfield, President,
Schema Software Inc.
philipm@schemasoft.com
Dr. Yuri Khramov, Schema Software
Inc.
yurik@schemasoft.com
Biographies:
Robert
Houben has 19 years experience in the software industry.
He has authored several ODBC drivers including the
Red Brick ODBC driver, and the Liberty ODBC driver,
as well as the Liberty JDBC driver. He has deployed
eCommerce and eBusiness applications that integrate
Legacy Line-Of-Business DBMS systems using Web technology
for over 3 years. He is a founder of Liberty Integration
Software Inc., of Vancouver Canada.
Philip Mansfield is the president of SchemaSoft and
a member of the W3C SVG working group. Prior to
SchemaSoft, he worked at Paradigm Development Corp.
and taught at the University of Toronto. He received his Ph.D.
from Yale University.
Yuri Khramov has more than 20 years of experience
in the software industry; he has been involved in XML and
other Web technologies for more than 4 years. He is
one of the founding partners of SchemaSoft. Prior
to that, he worked at Paradigm Development Corp. in
Vancouver, Canada, Graphica in Tokyo, and several industrial
and academic institutions in Moscow. He holds a Ph.D.
in Computer Science from Moscow Management Institute.
Abstract:
The
acceptance of a technology by the Basic programming
community is tantamount to becoming a "mainstream"
one. The XML DOM appears to be the model and the tool
of choice to deal with XML documents; that's why we
decided to concentrate on better integration of the
XML DOM with Basic, and particularly Visual Basic
Script (VBScript).
The
goal of the bizDOM project is to provide a tool and
API that will make the DOM more natural, easier to
use and more in accordance with the way the VB community
thinks and works.
The
central idea of BizDOM is a dynamic class generation
based on the content of a loaded XML document. This
feature allowed us to create a very simple and intuitive
API well suited to VBScript. The syntax construct
for these dynamic classes is called Nodepath.
The
following sections describe some of the most important
advantages of BizDOM.
1.
Tree navigation and node addressing
The
ease of addressing different nodes in the DOM tree
is crucial for the acceptance of the tool by the
programmers community. With the existing DOM tools,
the programmer has to operate with generic terms
and methods. For example, to get the element that
represents the fifth <line> inside the <details>
section of the <invoice> document, the programmer
has to write several lines of code using "getChildList", "get_Name"
methods, iterators, etc.
With
our dynamic classes and the Nodepath notation, the
application programmer is able to write clear and
intuitive code like the following: Invoice.Details.Lines(5)
The
Nodepaths also work for addressing attributes.
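The effect of such dynamic classes can be approximated in Python, shown here purely as an analogy: the `NodePath` and `NodeCollection` classes below are hypothetical and are not the BizDOM API. Child elements are resolved by tag name at attribute-access time, and collections are indexed from 1 as in the VB example. This sketch uses the tag name `line` directly rather than a plural form.

```python
import xml.etree.ElementTree as ET


class NodePath:
    """Wraps an element so that child elements read like properties."""

    def __init__(self, element):
        self._element = element

    def __getattr__(self, name):
        # Called only when normal lookup fails; match child tags by name.
        if name.startswith("_"):
            raise AttributeError(name)
        matches = [c for c in self._element if c.tag.lower() == name.lower()]
        if not matches:
            raise AttributeError(name)
        return NodeCollection(matches)

    def __str__(self):
        # Rough analogue of a default property: the element's leading text.
        return (self._element.text or "").strip()


class NodeCollection:
    """All children with one tag name; callable with a 1-based index."""

    def __init__(self, elements):
        self._elements = elements

    def __call__(self, index):
        return NodePath(self._elements[index - 1])

    def __len__(self):
        return len(self._elements)

    def __iter__(self):
        return (NodePath(e) for e in self._elements)

    def __getattr__(self, name):
        # Continue the path through the first matching element.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(NodePath(self._elements[0]), name)


doc = ET.fromstring(
    "<invoice><details><line>pens</line><line>paper</line>"
    "<line>ink</line></details></invoice>"
)
invoice = NodePath(doc)
print(invoice.details.line(2))
print([str(item) for item in invoice.details.line])
```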
2.
Collections
VB
programmers use collections very widely; constructs
like "for each" are ubiquitous in scripts. The current
DOM tools lack collections completely, so we implemented
APIs that create the VB collections that would be
the most important in the VB scripts: child elements
of a node, child elements with a specific tag name.
3.
Default Properties
The
notion of the default property is very important
in VB, and nonexistent in the W3C DOM spec. We
implemented it such that the default property of
an attribute is (naturally) its value; for an element,
the default property is the value of its first child
text node.
4.
W3C DOM Functionality
The
philosophy of the implementation was "make the common
things simple by implementing special APIs, and
keep all the W3C DOM available with the standard
interfaces".
To
provide the access to the complete W3C functionality
we are exposing the "underlying" DOM objects that
implement the complete W3C specification. The BizDOM
class object provides access to the W3C Document
object, and the BizNode object allows the user to
access a corresponding W3C Element object. Through
those objects, the users can create and access objects
of such W3C classes as Attribute, NodeList, NodeMap,
etc.
5.
Implementation Details
We
have implemented bizDOM as an ActiveX object on top
of the MS IXML DOM ActiveX object, delegating most W3C
DOM functions to it. This method guarantees us full
compatibility for the current version and for all
versions to come. It also allowed us to reduce the
time from the "conception of the product" to the
beta version to little more than 3 months.
The
bizDOM beta is about to be released, but we already
have a number of "early adopters" signed into our
beta program. By the end of February, we expect
to have many customers using bizDOM in industrial
applications.
Biographies:
Mike
Champion is a long-time member of the Document Object
Model Working Group and an author of the core XML
portion of the W3C DOM Level 1 Recommendation. He
spent several years at Arbortext working on interfaces
between an XML authoring system and various XML repositories.
He now works for Software AG's development organization
and acts as a contact with the W3C and the XML community.
Simon
St. Laurent is a web developer, network administrator,
computer book author, and XML troublemaker living
in Ithaca, NY. His books include XML: A Primer, XML
Elements of Style, Building XML Applications, Inside
XML DTDs: Scientific and Technical, and Cookies.
Don Park is the CTO of Docuverse, a bleeding-edge
company specializing in providing tools and services
to the e-commerce industry. Mr. Park has been actively
consulting for the past 18 years. As a vocal member
of the XML community, he has participated in the design
of the SAX and DOM standards. Recently, Mr. Park founded
the SML-DEV group to address growing concerns over complexity
in XML standards.
Abstract:
Over
the last several years, eXtensible Markup Language
("XML") has generated enormous currency in the marketplace.
It is sold as the universal syntax for making business
information accessible, independent of the software
deployed. XML was bootstrapped as a simplification
of the popular Standard Generalized Markup Language
("SGML"). XML retained much of SGML's power and existing
market acceptance--yet was easier to implement. Since
then, XML has made great progress towards making information
processes more commoditized, replaceable, and thus
accountable.
As
a result of its SGML heritage, XML has brought with
it a document publishing bias, including features
such as external parsed entities, document type definitions
("DTD"), notations, CDATA sections,
and the like. However, in many business domains, especially
electronic commerce, many of these carry-overs simply
aren't needed. And in fact, they form the bulk of
XML's complexity. They tend to increase development,
testing, and training costs. And they hinder interoperability.
Certainly XML is a huge improvement over SGML; however,
for many domains the simplification was stopped prematurely.
In
November 1999, a group of practitioners gathered
to continue this simplification. By stripping XML
down to the core, they hope to maintain the bulk of
XML's applicability while relieving the majority of
its complications. From the start, there was unanimous
agreement to eliminate DTDs, notations, external
parsed entities, and CDATA sections. The question
then became, how much further? As of January 2000,
two key SGML features were still up for debate,
attributes and mixed content. There are valid reasons
for not wanting to include either of these in a
simplified markup language. However, there is an
equally valid reason to stop just short of these
syntax elements. So, rather than choose, the group
decided that it may be better to provide two simplifications.
The
first simplification, Common XML, maintains both
attributes and mixed content. It will be an XML
usage guideline, highlighting the most commonly
used aspects and clearly marking troublesome areas.
The
second simplification, Simple Marker Language ("SML"),
goes much further. An element can either have a text
value or a list of child elements, but not both.
Attributes are gone and so is mixed content. Namespaces
are still being discussed as is the <degenerate/>
tag. This minimal bounded tagging language may be
especially useful in environments where high performance,
a minimal footprint, and/or guaranteed interoperability
are important, such as B2B messaging. Further, with
simplicity comes a better foundation upon which
layered structures can be built. For example, a
special text tag could be added to allow for mixed-content.
A coloring layer could be added to support attributes.
And a rhythmic embedding could be used to express
alternating map and list structures, like those
found in the GROVES model.
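The two restrictions stated above (no attributes, and either text or child elements but never both) are simple enough to check mechanically. The following is a sketch of such a check, not part of any SML specification; treating whitespace-only text between child elements as insignificant is an assumption of this example.

```python
import xml.etree.ElementTree as ET


def sml_violations(xml_text):
    """List the ways a document breaks the two restrictions described above."""
    problems = []
    for element in ET.fromstring(xml_text).iter():
        if element.attrib:
            problems.append(f"<{element.tag}> has attributes")
        has_children = len(element) > 0
        # Text can sit before the first child or after any child (its tail).
        has_text = bool((element.text or "").strip()) or any(
            (child.tail or "").strip() for child in element
        )
        if has_children and has_text:
            problems.append(f"<{element.tag}> mixes text and child elements")
    return problems


print(sml_violations("<order><item>pen</item><qty>2</qty></order>"))
print(sml_violations('<p id="1">Hello <b>world</b></p>'))
```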
When
this group is finished, the Common XML usage guideline
and the Simple Marker Language specification will
provide simplified subsets of XML, allowing a more
granular learning and adoption of tagging systems.
4:30 pm
SML
and Ockham's Razor: Too Close a Shave?
Evan Lenz, student, North
Seattle Community College
elenz@ricochet.net
Biography:
In
1998, Mr. Lenz received a Bachelor of Music degree
from Wheaton College (IL), with majors in piano performance
and philosophy. Making a living as a securities trader,
he is currently studying web application development
at North Seattle Community College and living with
his wife in Seattle.
Abstract:
"Entities
are not to be multiplied beyond necessity"
--
William of Ockham
The
stir of support behind the recently established SML-DEV
group is prompting the invocation of Ockham's Razor
against XML 1.0. The claim is that, for many simple
applications such as e-commerce transactions, XML
retains too much unnecessary baggage inherited from
document-centric SGML. A drastically simplified subset,
"Simple Marker Language," has been proposed. Using
only XML's "essential" features, it will purportedly
be easier to learn and implement.
The
notion that a subset is necessary at this point in
time undermines precisely what is revolutionary about
XML--its ability to function across many types of
applications without losing its identity as one language.
Whether for storage, document display, data interchange,
or e-commerce messaging, XML has achieved an impressive
compromise, and, in terms of industry support, has
hit a sweet spot. XML is optimized for broad usage
precisely because it is not optimized for any particular
usage. The attempt to isolate one application domain
and create a subset for it is a classic case of premature
optimization.
XML
is also revolutionary in its human readability. The
advantage of structuring data in a text file as opposed
to a binary format is that people can peek inside
an XML transaction, for example, and easily edit it
by hand. This allows for robust and easily replaceable
systems. In the proposed simple subset, the removal
of attributes would severely hamper the human readability
that made XML so revolutionary. Attributes are conceptually
distinct from elements; they provide us with a separate,
nonrecursive channel of markup which allows us to
structure data in more logical, human-readable ways.
The
saving grace in XML is that, while our parsers must
still support the XML specs, we don't have to use
every XML feature in our applications. If the prospect
of structured messages, say, that use only element
and text nodes appeals to us, then XML gives us
that freedom--without splitting itself into confusing
dialects. The burden is on SML-DEV to demonstrate
that the speed and ease of implementation resulting
from a simplified subset are so compelling as to
warrant splitting XML into subsets. The result,
of course, would not be XML anymore, at least not
the XML we know--the one that allows us to speak
the same language, choose whatever parsers we want,
and use whatever features we like. Another way of
stating Ockham's Razor is particularly appropriate
here: "Plurality is not to be posited without necessity."
5:00 pm
Tired
of complicated specifications? You just RELAX!
Makoto Murata, INSTAC XML SWG
Masayuki Hiyama, INSTAC XML SWG
Motohiro Kosaki, Matsushita AVC Multimedia Software
Biography:
Murata
graduated from Kyoto University. He has participated
in the XML activity at W3C since 1997. He is also
the chair of a Japanese committee (INSTAC XML SWG)
which published the Japanese XML Profile as a JIS
technical report. He is interested in theoretical
aspects of SGML/XML, especially hedge automaton
theory.
Hiyama is a member of the W3C SYMM WG. He is also
a member of the INSTAC XML SWG. He has authored a
number of DTDs and is interested in hedge regular
languages.
Abstract:
RELAX
(REgular LAnguage for XML) is a language for representing
regular sets of XML documents as grammars. A RELAX
grammar generates a set of XML documents. Conversely,
XML documents can be validated against a RELAX grammar.
RELAX
consists of RELAX core and RELAX modularization. RELAX
core provides modules, which declare and constrain
elements and attributes in a single namespace. The
design of RELAX core (Version 1.0) has been completed,
and this presentation is mainly concerned with RELAX
core. RELAX modularization provides mechanisms for
attaching namespaces to modules and combining these
modules to form a single grammar. A white paper on
RELAX modularization is expected to be released in
early 2000.
A
RELAX module consists of rules and patterns. Intuitively
speaking, rules correspond to element type declarations
and parameter entities used therein, and patterns
correspond to attribute list declarations and parameter
entities used therein. As a special case, a RELAX
grammar of a single namespace is a RELAX module.
RELAX
is based on the theory of tree (or hedge) automata.
From a RELAX grammar, one can effectively construct
a hedge automaton. By executing this hedge automaton,
XML documents can be validated against the grammar.
Operations on hedge automata can be applied to RELAX
grammars so as to examine their properties. In particular,
one can check whether one RELAX grammar is upward compatible
with another by computing the difference of the two
grammars.
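As a rough illustration of validation by a hedge automaton, the Python sketch below assigns states to elements bottom-up and accepts a document when its root can reach a final state. It is not RELAX syntax and not the authors' implementation: the rule table, the state names, and the functions `states_of` and `is_valid` are hypothetical, and the content check is a brute-force search rather than a constructed automaton. The toy rules let `sec` elements nest exactly two levels deep, a constraint that needs two states for one element name.

```python
import itertools
import re
import xml.etree.ElementTree as ET

# Each rule is (state, element name, regular expression over child states).
# In the expressions every state name is followed by one space.
RULES = [
    ("Para", "p", r""),
    ("Leaf", "sec", r"(Para )*"),
    ("Top", "sec", r"(Leaf )+"),
    ("Doc", "doc", r"(Top )+"),
]
FINAL_STATES = {"Doc"}

def states_of(elem):
    """Return every state the automaton can assign to elem (bottom-up)."""
    child_state_sets = [states_of(child) for child in elem]
    reachable = set()
    for state, name, content in RULES:
        if name != elem.tag:
            continue
        # Brute force: try each way of picking one state per child.
        for choice in itertools.product(*child_state_sets):
            word = "".join(s + " " for s in choice)
            if re.fullmatch(content, word):
                reachable.add(state)
                break
    return reachable

def is_valid(xml_text):
    """A document is valid if its root element can reach a final state."""
    return bool(states_of(ET.fromstring(xml_text)) & FINAL_STATES)

# Hypothetical sample documents.
GOOD = "<doc><sec><sec><p>text</p></sec></sec></doc>"
TOO_DEEP = "<doc><sec><sec><sec><p>x</p></sec></sec></sec></doc>"

print(is_valid(GOOD))
print(is_valid(TOO_DEEP))
```

Because the same element name `sec` maps to two different states depending on its content, the rule table expresses something a single DTD element type declaration for `sec` cannot.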
RELAX
is more expressive than a DTD in representing structural
constraints on elements and attributes. RELAX, however,
does not provide mechanisms for declaring entities,
notations, and default values, which DTDs already
handle. Rather, RELAX is intended to be used in
conjunction with DTDs; XML documents containing DTDs
are first parsed by XML processors and then validated
against RELAX grammars.
Unlike
the W3C's XML Schema, RELAX does not affect the information
emitted by XML processors. Thus, existing APIs such
as SAX and DOM can be used without loss of information,
even when the XML document has an associated RELAX
grammar. Information embedded in RELAX grammars
can be obtained by parsing RELAX grammars as XML
documents, if necessary.
DSD
is another proposal based on tree automaton
theory. In comparison to DSD, RELAX is simpler,
is internationalized, and provides rich datatypes.
A
RELAX validator receives a RELAX grammar and an
XML document. The validator first invokes some XML
processor to parse the grammar and document, and
then receives the result via some API. The current
prototype uses DOM to access the RELAX grammar and
the SAX-like API of XML4C to access the document.
A RELAX validator reports either "This document
is valid" or "This document is invalid." Some error
messages and warnings may be reported as well.
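The two-API arrangement described above (a tree API for the grammar, an event API for the document) might look roughly like the following Python sketch. It uses the standard `xml.dom.minidom` and `xml.sax` modules in place of XML4C, and the grammar format, `load_grammar`, `ChildChecker`, and `validate` are all invented for illustration. The toy grammar only lists which child elements each element may contain, which is far weaker than what a RELAX grammar expresses.

```python
import xml.dom.minidom
import xml.sax

# Toy grammar format invented for this sketch; it is NOT RELAX syntax.
TOY_GRAMMAR = """
<grammar root="order">
  <element name="order" allows="id item"/>
  <element name="item" allows="name"/>
  <element name="id" allows=""/>
  <element name="name" allows=""/>
</grammar>
""".strip()

def load_grammar(grammar_text):
    """Read the grammar through DOM: returns (root name, allowed-children map)."""
    dom = xml.dom.minidom.parseString(grammar_text)
    allowed = {}
    for node in dom.getElementsByTagName("element"):
        allowed[node.getAttribute("name")] = set(node.getAttribute("allows").split())
    return dom.documentElement.getAttribute("root"), allowed

class ChildChecker(xml.sax.ContentHandler):
    """Check parent/child pairs as SAX events arrive."""

    def __init__(self, root, allowed):
        super().__init__()
        self.root, self.allowed = root, allowed
        self.stack = []    # names of the currently open elements
        self.errors = []

    def startElement(self, name, attrs):
        if not self.stack:
            if name != self.root:
                self.errors.append(f"unexpected root <{name}>")
        elif name not in self.allowed.get(self.stack[-1], set()):
            self.errors.append(f"<{name}> not allowed in <{self.stack[-1]}>")
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()

def validate(grammar_text, document_text):
    """Return the verdict string and any error messages."""
    root, allowed = load_grammar(grammar_text)
    handler = ChildChecker(root, allowed)
    xml.sax.parseString(document_text.encode("utf-8"), handler)
    verdict = "This document is valid" if not handler.errors else "This document is invalid"
    return verdict, handler.errors

print(validate(TOY_GRAMMAR, "<order><id>42</id><item><name>pen</name></item></order>"))
print(validate(TOY_GRAMMAR, "<order><name>pen</name></order>"))
```

Reading the grammar as a tree keeps it available for repeated lookups, while streaming the document avoids holding it in memory; whether the prototype made the split for these reasons is not stated in the abstract.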
A
RELAX validator has been developed in C++ and its
source code is available under the GPL. The construction
of automata from content models is done by an automaton
construction toolkit called Grail. A converter
from DTDs to RELAX grammars has been developed in
Java and is also freely available under the GPL. The
XML spec DTD was converted to RELAX by this program
and then revised by hand.
