Informix & XML
in, out, and shakin’ all about
An object-relational database management system (ORDBMS) is often described as
extensible server software. XML is the eXtensible Markup Language. In this talk,
you will learn what happens when the most advanced data management technology
and the most advanced Internet technology collide. In a nutshell, you will
learn how easy it is to get XML data into an object-relational DBMS, how easy
it is to get XML out, and how you can take advantage of an ORDBMS to ‘shake
XML all about’.
Introduction
The web changes everything. By expanding access to information, and
our ability to share it, the web enhances individual productivity
and improves overall organizational efficiency. The scale of these changes
has been trivialized and obscured by the recent dot-com mania, but it is hard
to overestimate the impact the Internet will have on our lives, and on the
lives of our children.
Once, IT professionals built information systems that were like islands.
Travel between these data islands was very difficult, so exchanges between
them were rare. As a result, other challenges, such as language differences
and standards, attracted little attention. But the web makes travel
between our data islands far easier. And this makes XML important, because
it represents a powerful way to overcome semantic barriers to information
exchange.
So technology vendors like INFORMIX are changing. We are making XML
an integral part of our products and solutions. Traditionally, INFORMIX has
been a leading provider of DBMS software. With more people sharing more
information, there is greater demand than ever for our scalable, transactional
data management products. Also, one of the basic requirements for web software
is flexibility: web sites evolve rapidly, changing their look-and-feel, their
content, and the kinds of services they provide. This makes declarative,
query-centric interfaces, where a web application can ask and answer ad hoc
questions, very useful.
But how do we view the combination of XML and DBMSs? And how do we think
it ought to be done?
Extensible or object-relational DBMSs
Fortunately, recent changes in DBMS technology make this integration
easier than it would have been before. Today, the best DBMS engines are extensible.
This means that they allow developers to embed modules of procedural code
within them, and to use these modules within an abstracted, logical data model.
In other words, instead of storing only INTEGER, VARCHAR, DECIMAL, FLOAT and BLOB
data types, and relying on middleware or client-side logic to turn this data
into information, the columns in an object-relational database’s tables
can contain instances of atomic objects as exotic and varied as Java beans,
temperature readings {120 F, 41 C, 304 K}, physical quantities {85 Kg, 180
Lb}, geographic points and polygons, fingerprints, and so on.
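To make the idea of a richer column type concrete, here is a minimal Java sketch of the logic that might sit behind a temperature type. The class name Temperature and its methods are hypothetical, not part of any INFORMIX interface; the point is only that values are normalized on input, so comparisons work no matter which unit the data arrived in.

```java
// Hypothetical sketch, not the INFORMIX type API: the logic an extensible
// DBMS could run for a temperature column instead of storing a bare FLOAT.
final class Temperature implements Comparable<Temperature> {

    enum Unit { F, C, K }

    // Every value is normalized to kelvin, so comparisons ignore the input unit.
    private final double kelvin;

    private Temperature(double kelvin) {
        this.kelvin = kelvin;
    }

    // Parses literals such as "120 F", "41 C" or "304 K".
    static Temperature parse(String literal) {
        String[] parts = literal.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected '<value> <unit>': " + literal);
        }
        double value = Double.parseDouble(parts[0]);
        switch (Unit.valueOf(parts[1].toUpperCase())) {
            case F:
                return new Temperature((value - 32.0) * 5.0 / 9.0 + 273.15);
            case C:
                return new Temperature(value + 273.15);
            default: // K
                return new Temperature(value);
        }
    }

    double inKelvin() {
        return kelvin;
    }

    double inCelsius() {
        return kelvin - 273.15;
    }

    @Override
    public int compareTo(Temperature other) {
        return Double.compare(kelvin, other.kelvin);
    }
}
```

Because both sides are converted to kelvin first, 120 F (about 48.9 C) correctly sorts above 41 C, something a FLOAT column holding mixed units cannot do on its own.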
Moreover, ORDBMSs allow developers to reason about these objects in
the query language. For example, consider a business-to-business (B2B) e-commerce
exchange where retail buyers locate perishable foodstuffs, check on the availability
of space in a refrigerated moving van, and send messages that are bids to
buy the inventory and reserve van space. Within an ORDBMS, the schema and a
query for such an exchange might look like this:
CREATE TABLE Perishable_Food (
    What       Food_Type  NOT NULL,
    Where      Geo_Point  NOT NULL,
    Available  Period     NOT NULL
);

CREATE TABLE Freight_Space (
    From_To    Geo_Path   NOT NULL,
    When       Period     NOT NULL,
    Capacity   Mass       NOT NULL,
    Space      Volume     NOT NULL,
    Goods      SET(Packages NOT NULL)
);

SELECT F.From_To, F.When
  FROM Freight_Space F, Perishable_Food P
 WHERE Geo_Within ( Circle (P.Where, '10 Miles'), From (F.From_To) )
   AND Time_Within ( P.Available, Start (F.When) )
   AND Has_Space_For ( P.What, F.Capacity, F.Space, F.Goods );
Figure 1. Object-relational schema and query example
Illustrates an object-relational schema that stores extensible objects within
tables, and an example of a declarative query expressed over these objects.
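The query in Figure 1 leans on routines such as Time_Within and Start that the DBMS invokes as if they were built in. Assuming a Period is simply a closed range of calendar dates (the paper does not define it), the logic behind those two routines might be as small as this sketch. DatePeriod, start and timeWithin are illustrative names only.

```java
import java.time.LocalDate;

// Hypothetical stand-in for the Period type in Figure 1: a closed interval of
// calendar dates. Named DatePeriod to avoid clashing with java.time.Period.
final class DatePeriod {
    final LocalDate from;
    final LocalDate to;

    DatePeriod(LocalDate from, LocalDate to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("period ends before it starts");
        }
        this.from = from;
        this.to = to;
    }

    // Plays the role of Start(period): the first day of the period.
    LocalDate start() {
        return from;
    }

    // Plays the role of Time_Within(period, day): true when the day falls
    // inside the period, with both end points included.
    static boolean timeWithin(DatePeriod period, LocalDate day) {
        return !day.isBefore(period.from) && !day.isAfter(period.to);
    }
}
```

Registered with the server as user-defined routines, functions like these let the WHERE clause filter on availability windows directly, rather than shipping raw dates to client code for comparison.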
The point of this figure is to illustrate how sophisticated modern data
management systems can be. It also hints at why this application needs XML.
Where do the values in these tables come from? Given the variety of data
islands involved (each wholesale supplier and trucking company probably
has its own existing management information systems, each with its own
formats and structures for storing data), how can all of this be unified? The
answer is XML.
The good news is that DBMS extensibility also means much of the plumbing
needed to make XML a reality can now be embedded directly into the DBMS.
(This does not mean, of course, that XML is the only way to talk to
an ORDBMS, or that an ORDBMS is the only use for XML!) Over the next few
pages, we will see how this can be done.
Getting XML in
The problem with building this kind of system is the number and variety
of islands of data involved. XML excels at overcoming this problem. Independently
of the DBMS, our B2B site developers can create a set of DTD specifications
describing how information is to be communicated. In our example application,
sample messages might look like this:
<goods>
  <food_item>
    <food_type>
      <name>Apples</name>
      <mass unit="Kg">12</mass>
      <space unit="M">1x1x1</space>
      <store unit="C">12</store>
    </food_type>
    <loc>(2.371,48.937)</loc>
    <available>
      <from>05/02/2000</from>
      <to>10/02/2000</to>
    </available>
  </food_item>
  <food_item>
    etc
  </food_item>
</goods>

<freight_spaces>
  <transport>
    <trip>(2.371,48.937,4.01,49.24)</trip>
    <when>
      <from>05/02/2000</from>
      <to>06/02/2000</to>
    </when>
    <cap unit="T">1.5</cap>
    <vol unit="M">3x3x2</vol>
    <goods>
      <item>Wine
        <mass unit="Kg">175</mass>
        <space unit="M">.5x.5x.5</space>
        <store unit="C">20</store>
      </item>
      <item>
        etc
      </item>
    </goods>
  </transport>
</freight_spaces>
Figure 2. Examples of XML exchange of business information
Illustrates how XML might be used as a standard means of representing
complex business data. The XML data in these examples would comply with an
appropriate, standardized Document Type Definition or Style Sheet.
The trick, of course, is bridging the gap between the kind of data you
see in Figure 2, which might come from a variety of sources, and the kind
of structure you see in Figure 1, where end users answer their questions.
One of XML’s strengths is that it is plain text. Although you must parse
an XML document before you can access the data in it, XML’s simple structure
makes writing a parser a relatively simple programming assignment. Consequently,
a variety of commercial-quality parsers are available free of charge from
various sources on the web. Many of these parsers are written in Java.
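The JDK itself ships one such parser behind the standard JAXP interfaces. As a minimal sketch, assuming the element and attribute names shown in Figure 2, the hypothetical FoodSaxReader below streams through a goods message with a SAX handler and collects each food name and the unit attached to each mass.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Minimal SAX sketch: the parser calls these methods as it meets each tag,
// so no document tree is ever built in memory.
final class FoodSaxReader extends DefaultHandler {
    final List<String> names = new ArrayList<>();
    final List<String> massUnits = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private boolean inName = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("name")) {
            inName = true;
            text.setLength(0);
        } else if (qName.equals("mass")) {
            massUnits.add(atts.getValue("unit")); // null if the attribute is missing
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // One text node may arrive in several chunks, so buffer it.
        if (inName) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("name")) {
            names.add(text.toString().trim());
            inName = false;
        }
    }

    static FoodSaxReader read(String xml) {
        FoodSaxReader handler = new FoodSaxReader();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        return handler;
    }
}
```

SAX reports elements as events rather than building a tree, which is one reason a streaming parser suits the large, high-volume messages discussed next.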
Extensible DBMSs can take Java code and run it directly within the
DBMS. Consequently, we are able to embed several free Java XML parsers directly
into the framework of our server. Figure 3 below illustrates the general
architecture.
For large documents and systems with high volumes of information exchange,
this approach has a performance advantage because it avoids the overhead
of moving queries and data between an external program and the DBMS. It is
also attractive from an administration and maintenance perspective,
because the embedded code is not statically linked into the ORDBMS, as it might
be with more conventional programs. Extensible DBMSs employ dynamic linking
and invocation techniques that make replacing such a module as easy as dropping
an empty database table.

Figure 3. Architecture for embedding an XML parser into the ORDBMS
Illustrates embedding an XML parser into the ORDBMS. The parser picks
apart the XML document, modifying the contents of the database depending
on what it finds. Note that the parser may elect to store the entire document
unparsed. The parser may use an XSL specification to map data in the XML
document to locations in the schema.
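A minimal sketch of the mapping step in Figure 3, assuming the element names of Figure 2 and a simplified version of the Perishable_Food table of Figure 1: each food_item element becomes one row. FoodRow and FoodItemMapper are hypothetical names, and the columns are plain strings rather than the Food_Type, Geo_Point and Period objects a real schema would hold. Inside the server each row would then be written with an INSERT; here the rows are simply returned so the mapping is easy to inspect.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical row shape for a simplified Perishable_Food table.
final class FoodRow {
    final String food;
    final String location;
    final String availableFrom;
    final String availableTo;

    FoodRow(String food, String location, String availableFrom, String availableTo) {
        this.food = food;
        this.location = location;
        this.availableFrom = availableFrom;
        this.availableTo = availableTo;
    }
}

// Hypothetical mapper: one <food_item> element becomes one FoodRow.
final class FoodItemMapper {

    // Trimmed text of the first descendant element with this tag, or null.
    private static String text(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    static List<FoodRow> toRows(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<FoodRow> rows = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("food_item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            // Scope <from>/<to> to <available>, so other dates cannot leak in.
            Element available = (Element) item.getElementsByTagName("available").item(0);
            rows.add(new FoodRow(
                    text(item, "name"),
                    text(item, "loc"),
                    available == null ? null : text(available, "from"),
                    available == null ? null : text(available, "to")));
        }
        return rows;
    }
}
```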
This process is made easier when the overall bundle also includes:
- Tools to map DTD and style sheets to corresponding relational schemas.
- Advanced data model features in the ORDBMS, like compound (multi-part)
types, collections (sets), and facilities for handling semi-structured data,
such as text indexing.
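The paper does not describe how such mapping tools work. Purely as an illustration, the hypothetical SchemaSketch below derives a flat table definition from one sample record instead of from a DTD (parsing DTDs would make it far longer): every leaf element becomes a VARCHAR column named after its path.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical mapping tool: infer a flat table definition from one sample
// record. Each leaf element becomes a VARCHAR column named after its path.
final class SchemaSketch {

    static String createTable(String sampleXml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(sampleXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<String> columns = new ArrayList<>();
        collectLeaves(root, "", columns);
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("sample record has no child elements");
        }
        return "CREATE TABLE " + root.getTagName() + " (\n    "
                + String.join(" VARCHAR(255),\n    ", columns)
                + " VARCHAR(255)\n);";
    }

    // Depth-first walk; an element with no child elements is a leaf, and its
    // column name is the chain of tag names below the record, joined by '_'.
    private static void collectLeaves(Element element, String path, List<String> columns) {
        boolean hasChildElement = false;
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElement = true;
                Element c = (Element) child;
                String childPath = path.isEmpty() ? c.getTagName() : path + "_" + c.getTagName();
                collectLeaves(c, childPath, columns);
            }
        }
        if (!hasChildElement && !path.isEmpty() && !columns.contains(path)) {
            columns.add(path);
        }
    }
}
```

It ignores attributes such as unit, and it does not turn repeated elements into SET collections; those are exactly the cases where the richer ORDBMS types in the second bullet help.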
Getting XML out
XML is a derivative of SGML. So is HTML. For some time, DBMS vendors
have been providing tools that can take the results of a SQL query and return
them marked up with HTML tags. It is a fairly straightforward engineering assignment
to rework these tools to handle XML too.
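Reworking a result-to-HTML tool for XML mostly means changing the tags and escaping values properly. A minimal sketch, with the hypothetical class ResultToXml, assuming each row arrives as a list of strings in column order and that column names are already legal XML names:

```java
import java.util.List;

// Minimal "XML out" sketch: each result row becomes a <row> element and each
// named column becomes a child element with the same name.
final class ResultToXml {

    // Replace the characters that would otherwise break well-formedness.
    static String escape(String s) {
        StringBuilder b = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case '&': b.append("&amp;"); break;
                case '<': b.append("&lt;"); break;
                case '>': b.append("&gt;"); break;
                case '"': b.append("&quot;"); break;
                case '\'': b.append("&apos;"); break;
                default: b.append(ch);
            }
        }
        return b.toString();
    }

    static String toXml(String rootTag, List<String> columns, List<List<String>> rows) {
        StringBuilder xml = new StringBuilder();
        xml.append('<').append(rootTag).append('>');
        for (List<String> row : rows) {
            xml.append("<row>");
            for (int i = 0; i < columns.size(); i++) {
                String value = row.get(i);
                if (value == null) {
                    continue; // a SQL NULL is represented by omitting the element
                }
                String col = columns.get(i);
                xml.append('<').append(col).append('>')
                   .append(escape(value))
                   .append("</").append(col).append('>');
            }
            xml.append("</row>");
        }
        return xml.append("</").append(rootTag).append('>').toString();
    }
}
```

With JDBC, the column names would come from ResultSet.getMetaData(), via getColumnCount() and getColumnName(i) (1-based).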
Web development tools like these rely on the fact that query results consist
of a set of named columns. Data in an ORDBMS’s columns can be compound
(a single column containing multiple elements) or a COLLECTION (a single
row/column data object consisting of a set of values). Fortunately, both of
these novelties map easily onto the XML data model.
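One way to see why compound and collection values fit XML so naturally: each one nests as child elements. The sketch below models a compound value as a Java Map from field name to value and a collection as a Java Collection, rendering the members of a collection as repeated item elements. Those choices, and the class name NestedValueToXml, are assumptions for illustration, not how INFORMIX itself represents ROW or SET values.

```java
import java.util.Collection;
import java.util.Map;

// Hypothetical sketch: render one column value as XML when it may be compound
// (modelled as a Map of named fields) or a collection (modelled as a Collection).
final class NestedValueToXml {

    static String element(String tag, Object value) {
        StringBuilder out = new StringBuilder();
        render(tag, value, out);
        return out.toString();
    }

    private static void render(String tag, Object value, StringBuilder out) {
        out.append('<').append(tag).append('>');
        if (value instanceof Map) {
            // Compound value: one child element per field, in the map's order.
            for (Map.Entry<?, ?> field : ((Map<?, ?>) value).entrySet()) {
                render(String.valueOf(field.getKey()), field.getValue(), out);
            }
        } else if (value instanceof Collection) {
            // Collection value: one repeated <item> child per member.
            for (Object member : (Collection<?>) value) {
                render("item", member, out);
            }
        } else if (value != null) {
            out.append(escape(String.valueOf(value)));
        }
        out.append("</").append(tag).append('>');
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```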
In terms of our islands of data, getting XML out of the database makes
it possible to encapsulate the functionality of a central server like the
one in our examples. Other systems wishing to exchange information
with it can send in their contributions in XML form and receive responses
in XML. An overall architecture that adopted this approach might look
like Figure 4.
In this figure we see how multiple, heterogeneous islands of data can
all share their information so that each business operates more efficiently.
Subsets of the information in systems developed by trucking companies and
food wholesalers are exchanged (using XML) with a central B2B service. Using
this service, other businesses can bid for allotments of perishable goods,
basing their valuation decisions not merely on the quality of the goods on
offer, but also on their geographic location and on whether or not they can
be delivered.
In this example, we see XML being used both to get the data into the
central store, and to get information out of the central store and back into
each external information system.

Figure 4. B2B infrastructure architecture
The "bigger picture": how the XML/ORDBMS strategies described
in this paper fit into an overall B2B infrastructure.
Shakin’ it all about
Most data management companies and many web applications will adopt
this kind of model. But it is not suitable for every kind of XML. Another potential
use for XML is in document exchange. In this problem domain, XML data usually
exhibits much less structure than in the scenario we envisioned earlier.
Nevertheless, it is still highly desirable to store the XML data in a transactional
system, and then to allow external users to interact with it: to query it,
read it, and so on. In other words, in addition to getting it in and getting
it out, any complete XML story also needs to deal with shakin’ it all
about.
Ultimately, an XML document can be completely unstructured but ‘marked
up’. In this kind of document, key words or phrases are tagged with a
label that conveys semantic information. Sometimes these tags are hints
for a user-interface program, but sometimes users want to ask questions about
the contents of such documents. For example, they may want to say “Show
me the documents in the repository where the word ‘Paris’ is tagged
up as a ‘destination’.”
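If the documents are kept as XML text, a question like this one maps directly onto an XPath expression. A minimal sketch using the XPath API in the JDK; the destination tag, the document ids and the class name TaggedWordSearch are illustrative assumptions. The search word is bound as an XPath variable rather than pasted into the expression string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Minimal sketch: return the ids of documents in which `word` is the text of
// some element named `tag`, e.g. ("destination", "Paris").
final class TaggedWordSearch {

    static List<String> documentsTagging(Map<String, String> docsById, String tag, String word) {
        if (!tag.matches("[A-Za-z_][A-Za-z0-9_.-]*")) {
            throw new IllegalArgumentException("not a simple element name: " + tag);
        }
        List<String> hits = new ArrayList<>();
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind $word before compiling, so the value is never parsed as XPath.
            xpath.setXPathVariableResolver(name -> "word".equals(name.getLocalPart()) ? word : null);
            XPathExpression query = xpath.compile("boolean(//" + tag + "[normalize-space(.) = $word])");
            for (Map.Entry<String, String> entry : docsById.entrySet()) {
                Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                        .parse(new ByteArrayInputStream(entry.getValue().getBytes(StandardCharsets.UTF_8)));
                if ((Boolean) query.evaluate(doc, XPathConstants.BOOLEAN)) {
                    hits.add(entry.getKey());
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("query failed", e);
        }
        return hits;
    }
}
```

This re-parses every document on every query, which is precisely the cost an index avoids.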
The appropriate way to manage this kind of document data is with techniques
such as document indexing, querying by document content, and so on.
Object-relational DBMSs can be extended with this kind of functionality too.
Whether or not you ultimately use a DBMS to store the data, an ORDBMS can play
an invaluable role as an index and scalable subject catalog.
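An index that knows about tags is what lets the ORDBMS answer such questions without re-reading every document. A minimal sketch of one such structure, an inverted index keyed by (element name, word) pairs, built with the JDK's SAX parser; TagWordIndex is a hypothetical name, and words are matched case-insensitively.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical tag-aware inverted index: each (element name, word) pair maps to
// the ids of documents in which that word appears directly inside that element.
final class TagWordIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    private static String key(String tag, String word) {
        return tag + "\u0000" + word.toLowerCase();
    }

    void add(String docId, String xml) {
        // One text buffer per open element; text goes to the innermost one.
        Deque<StringBuilder> open = new ArrayDeque<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                open.push(new StringBuilder());
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (!open.isEmpty()) {
                    open.peek().append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // (?U) makes \W Unicode-aware, so accented words stay whole.
                for (String word : open.pop().toString().split("(?U)\\W+")) {
                    if (!word.isEmpty()) {
                        postings.computeIfAbsent(key(qName, word), k -> new TreeSet<>()).add(docId);
                    }
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML: " + docId, e);
        }
    }

    Set<String> lookup(String tag, String word) {
        return postings.getOrDefault(key(tag, word), new TreeSet<>());
    }
}
```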
Alternatively, in the absence of a style sheet or DTD, it might be desirable
to shred an XML document. Shredding involves parsing the XML and assigning
the values in its elements to corresponding rows in a table. An obvious
challenge with such a strategy is how to maintain the XML document’s original
structure within the object-relational model.
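One common answer to that challenge, not one the paper prescribes, is an edge table: every element becomes a row that records its parent and its position among its siblings. The hypothetical EdgeShredder below produces such rows and then rebuilds the markup from the rows alone, which shows that nesting and order survive. Attributes and the interleaving of text with child elements are ignored to keep it short.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Edge-table shredding sketch: one row per element, enough to rebuild the tree.
final class EdgeShredder {

    static final class Edge {
        final int id;        // preorder number, starting at 1
        final int parentId;  // 0 for the document element
        final int position;  // 1-based order among sibling elements
        final String tag;
        final String text;   // the element's own text, trimmed

        Edge(int id, int parentId, int position, String tag, String text) {
            this.id = id;
            this.parentId = parentId;
            this.position = position;
            this.tag = tag;
            this.text = text;
        }
    }

    static List<Edge> shred(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        List<Edge> rows = new ArrayList<>();
        visit(root, 0, 1, rows);
        return rows;
    }

    private static void visit(Element element, int parentId, int position, List<Edge> rows) {
        int id = rows.size() + 1;
        StringBuilder text = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            short type = children.item(i).getNodeType();
            if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                text.append(children.item(i).getNodeValue());
            }
        }
        rows.add(new Edge(id, parentId, position, element.getTagName(), text.toString().trim()));
        int childPosition = 0;
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                visit((Element) children.item(i), id, ++childPosition, rows);
            }
        }
    }

    // Rebuild markup for the subtree rooted at row `id`, using only the rows.
    // Rows are stored in document order, so siblings come out by position.
    static String rebuild(List<Edge> rows, int id) {
        Edge self = rows.get(id - 1);
        StringBuilder out = new StringBuilder();
        out.append('<').append(self.tag).append('>').append(escape(self.text));
        for (Edge child : rows) {
            if (child.parentId == id) {
                out.append(rebuild(rows, child.id));
            }
        }
        return out.append("</").append(self.tag).append('>').toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

In SQL terms, rebuild corresponds to repeatedly selecting rows WHERE parent_id = ? ORDER BY position.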
Summary and conclusions
In this paper we have explored how XML and extensible, or object-relational,
DBMS technology complement one another. In the short term, the importance
and usefulness of XML in building web applications lies in its role as a data
interchange format, enabling information exchange between islands of data. But
using XML efficiently requires changes to how database management systems are built,
and developers wishing to build effective web applications would do well to
use object-relational DBMSs somewhat differently from how they used relational
DBMSs in the past.
The key points of this paper are:
1. The core extensibility of an object-relational DBMS allows vendors
and our customers to embed logic into the DBMS to parse XML data, and then
modify the database based on the contents of the document. Similarly, the
best DBMS technology allows developers to embed logic to convert SQL query
results into XML.
2. Because the object-relational data model includes facilities like
compound (multi-element) data structures and COLLECTIONS (sets), the task
of mapping between XML and ORDBMS SQL is not as complex as the task of mapping
between XML and earlier versions of SQL. Further, using XML is necessary to
truly support such systems, because of the complexity of the data involved.
In summary, the three things you need to support XML are the capacity
to get XML data into your database, get XML out when an external system requires
it, and shake XML all about when you need to store it.