|
Quilt: an XML query language
|
 |
XML is an extremely versatile markup language, capable of labeling the
information content of diverse data sources including structured and semi-structured
documents, relational databases, and object repositories. A query language
of similar versatility is needed to realize the potential of XML as a universal
medium for data interchange. Most existing proposals for XML query languages
are robust for particular types of data sources but weak for other types.
In this paper, the authors combine features from several sources to propose
a new query language called Quilt, which is designed to be broadly applicable
across all types of XML data sources.
Introduction
"Mediocre composers borrow; great composers steal." -- Igor Stravinsky
As increasing amounts of information are stored in XML, exchanged in
XML, and presented as XML through various interfaces, the ability to intelligently
query XML data sources becomes increasingly important. The data model of XML
is quite different from the data models of traditional databases, and traditional
database languages are not well suited to querying XML sources. In addition,
XML blurs the distinction between data and documents, allowing documents to
be treated as data sources and traditional data sources to be treated as documents.
Query languages, including XML query languages, still tend to be designed
either for documents or for data. Since XML may represent a rich variety of
information coming from many sources and structured in many ways, an XML query
language must try to provide the same rich variety in the queries that may
be formulated.
Quilt is a query language for XML. It originated when the authors attempted
to apply XML query languages such as XML-QL, XPath, XQL, YATL, and XSQL to
a variety of use cases. We found that each of these languages had strong advantages
for some of the queries we examined, but was unable to express other queries
we found equally important. Therefore, we chose to take some of the best ideas
from these languages, plus some ideas from SQL and OQL, integrating them with
a fresh syntactic approach. Our goal was to design a small, implementable
language that met the requirements specified in the W3C XML Query Working
Group's
http://www.w3.org/XML/Group/xmlquery/xmlquery-req XML
Query Requirements. During our design work, we have adapted features from
various languages, carefully assembling them to form a new design--hence the
name "Quilt". The resulting language supports queries that draw information
from various sources and patch them together in a new form. This is another
reason for the name "Quilt".
Although Quilt borrows from many languages, its conceptual integrity
comes from a deep reliance on the structure of XML, which is based on hierarchy,
sequence, and reference. Quilt is able to express queries based on document
structure and to produce query results that either preserve the original document
structure or generate a new structure. Quilt can operate on flat structures,
such as rows from relational databases, and generate hierarchies based on
the information contained in these structures. It can also express queries
based on parent/child relationships or document sequence, and can preserve
these relationships or generate new ones in the output document. Although
Quilt can be used to retrieve data from objects, relational databases, or
other non-XML sources, this data must be expressed in an XML view, and Quilt
relies solely on the structure of the XML view in its data model.
We begin by describing the Quilt language, and then illustrate how the
language may be used for a variety of queries, including queries typically
used for relational data, semi-structured data, and structured documents.
A first look at Quilt
Two languages that were particularly influential in the design of Quilt
were XML-QL and XQL. We will begin our description of Quilt by discussing
some of the concepts it inherited from these two languages. In the discussion
that follows, we conceptualize the nested elements of an XML document as nodes
in a tree-like data model. Sometimes we will speak in terms of the elements
and attributes of the XML document, and sometimes in terms of the nodes of
the tree that models the document.
The core of an XML-QL query consists of a WHERE clause that binds one
or more variables, and a CONSTRUCT clause that uses the values of the bound
variables to construct a new document, possibly having a structure quite different
from that of the original document(s). This is a very powerful concept because
it allows documents to be joined, grouped, and transformed in arbitrary ways.
However, the patterns that are used in XML-QL for binding variables tend to
be unnecessarily verbose. More important, the result of the WHERE clause is
a relation composed of scalar values, which is not sufficient to preserve
all the information about the hierarchic and sequential relationships among
the elements of the original document. As a result, XML-QL is a very powerful
language for asking certain types of questions, but is weak at expressing
queries based on hierarchy and sequence, such as "Find the second quotation
in the fourth chapter." From XML-QL, Quilt borrows the concept of constructing
a new document from bound variables, but Quilt uses a different paradigm for
binding the variables and a richer data model to represent the result of the
binding.
XQL-98 contains a powerful syntax for navigating in a hierarchic document
and for selecting a set of nodes that satisfy some complex condition. This
syntax was a precursor of the abbreviated syntax of XPath, which has been
adopted as a W3C Recommendation and has been used in various XML-related applications
such as XPointer and XSLT. For selecting sets of nodes, Quilt uses abbreviated
XPath expressions, enhanced by certain operators borrowed from XQL. Since
an XPath expression, in general, returns a "forest" of nodes in which hierarchy
and sequence are preserved, Quilt is able to express queries based on hierarchy
and sequence more easily than languages such as XML-QL.
In this section, we will introduce the Quilt language using some example
queries based on the following XML document:
<?xml version="1.0"?>
<!DOCTYPE bib SYSTEM "books.bibitemd">
<bib>
<book>
<title>Harold and the Purple Crayon</title>
<author>
<lastname>Johnson</last>
<firstname>Crockett</first>
</author>
<pubinfo>
<publisher>Harper and Row</publisher>
<price>$4.76</price>
<year>1955</year>
</pubinfo>
</book>
<book>
<title>Harold's Fairy Tale</title>
<author>
<lastname>Johnson</last>
<firstname>Crockett</first>
</author>
<pubinfo>
<publisher>Harper and Row</publisher>
<price>$4.76</price>
<year>1956</year>
</pubinfo>
</book>
<book>
<title>Rise Up Singing</title>
<author>
<lastname>Blood</last>
<firstname>Peter</first>
</author>
<author>
<lastname>Patterson</last>
<firstname>Annie</first>
</author>
<pubinfo>
<publisher>Sing Out Corporation</publisher>
<price>$15.45</price>
<year>1988</year>
</pubinfo>
</book>
</bib>
A simple form of a Quilt query consists of FOR, WHERE, and RETURN clauses.
The FOR clause uses XPath expressions to bind the values of one or more variables.
In general, an XPath expression evaluates to a set of nodes. The FOR clause
generates an ordered list of tuples, each containing of a value for each of
the bound variables. A tuple is generated for each possible way of binding
the list of variables to nodes that satisfy their respective XPath expressions.
When a node is bound to a variable, its descendant nodes are carried along
with it. The WHERE clause applies a filter to the tuples, retaining only those
tuples that satisfy a given search condition. The RETURN clause then generates
a new document structure using the values of the bound variables.
The following example finds every book written by Crockett Johnson.
The FOR clause generates a list of bindings in which $b is bound to individual
book elements in the document found at the given URL, and $a is bound to individual
author elements that are descendants of $b. The WHERE clause retains only
those tuples of bindings in which the author is Crockett Johnson, and the
RETURN clause uses the resulting values of $b to generate a list of books.
By default, the ordering of book elements in the original document is preserved.
FOR $b IN document("http://www.biblio.com/books.xml")//book,
$a IN $b/author
WHERE $a/firstname = "Crockett" AND $a/lastname = "Johnson"
RETURN $b
The semantics of comparisons in Quilt is the same as in XPath. For example,
in the query above, consider the comparison $a/lastname = "Johnson". In general,
an XPath expression such as $a/lastname evaluates to a set of nodes. The comparison
is considered to be True if at least one of the nodes returned by $a/lastname
has a string-value equal to "Johnson".
The query shown above returns a list of book elements without a common
root element to hold them. Since a well-formed XML document must have precisely
one root element, we may desire to generate a new element to enclose the results
of our query. The easiest way to do this is to embed the entire query within
an element constructor, as illustrated in the following query:
<Result>
(
FOR $b IN document("http://www.biblio.com/books.xml")//book,
$a IN $b/author
WHERE $a/firstname = "Crockett"
AND $a/lastname = "Johnson"
RETURN $b
)
</Result>
In the above queries, the result consists of selected nodes from the
original document. Each selected node carries its descendants with it to the
result document, preserving part of the structure of the original document.
Many queries, however, need to create an output document that is structured
in a new way. One example of such a transformation is a query that inverts
the hierarchy of the input data, converting it from authors nested inside
books to books nested inside authors. This inversion can be accomplished by
the following query:
<Result>
(
FOR $a IN DISTINCT document("http://www.biblio.com/books.xml")//author
RETURN
<BooksByAuthor>
<Author>
$a/lastname/text()
</Author>
(
FOR $b IN document("http://www.biblio.com/books.xml")//book[author=$a]
RETURN $b/title SORTBY(title)
)
</BooksByAuthor> SORTBY(Author)
)
</Result>
The result of this query on the sample data shown above is as follows:
<Result>
<BooksByAuthor>
<Author>Blood</Author>
<title>Rise Up Singing</title>
</BooksByAuthor>
<BooksByAuthor>
<Author>Johnson</Author>
<title>Harold and the Purple Crayon</title>
<title>Harold's Fairy Tale</title>
</BooksByAuthor>
<BooksByAuthor>
<Author>Patterson</Author>
<title>Rise Up Singing</title>
</BooksByAuthor>
</Result>
The following points are worth noting about the above example:
- 1. This example illustrates how a query can be nested inside another
query. The variable $a, defined in the outer query, ranges over the set of
distinct authors in the input document. The variable $b, defined in the inner
query, ranges over the individual books written by a given author.
- 2. The data contained in the "Author" elements of the output document
is selected from the "lastname" elements of the input document. The contents
of these elements are extracted using XPath expressions that use the text
() function, and then enclosed in new tags in the RETURN clause.
- 3. The order of elements in the output document is controlled by SORTBY
clauses in both the inner and outer queries.
- 4. The predicate [author = $a] involves a comparison of complex types.
Quill compares elements that have substructure by normalizing whitespace and
testing for the same content and structure, in the same sequence. One concrete
way to do this is by serializing content and markup. For example, the normalized
string value of an element might be formed by serializing the markup, removing
leading and trailing whitespace for each element, and removing whitespace
between tags. Here is a sample <author> element, normalized using this
scheme: <last>Johnson</last><first>Crockett</first>
Once a complex element has been converted to a normalized string form,
it can be compared to the normalized string value of another complex element.
The queries discussed above explicitly state the URL of the document
to be queried. This is not always necessary, and is sometimes undesirable.
Some queries operate on a set of nodes for which there is no URL, or for which
it is not convenient to provide a URL; for instance, a query may operate on
the set of documents found in a web site or document repository, the set of
nodes found in a collection used in some programming language, or the results
of a previous query. Furthermore, it is useful to be able to write queries
that may be applied to a variety of data sources, including data sources that
do not exist at the time that the query is written. If the document () function
does not occur in a variable binding, the binding is applied to the set of
input nodes for the query, which is implicitly determined by the environment
in which the query is executed. The following query is equivalent to the first
query shown in this paper, except that it binds the variable $b to all book
elements contained within the implicit set of input nodes for the query:
FOR $b IN //book,
$a IN $b/author
WHERE $a/first = "Crockett"
AND $a/last = "Johnson"
RETURN $b
At this point, we have seen the basic structure of a Quilt query. The
next section describes the syntax of Quilt in more detail.
The Quilt language
This section discusses the structure and meaning of a Quilt query, starting
with top-level syntax and then examining each clause in detail.
Syntax
The following syntax shows the top-level structure of a Quilt query.
The grammar is still being developed as the language evolves, and is incomplete
in the current version of this document. Terminal symbols are shown in angular
brackets and their lexical structure is not further specified.
| Query |
::= | Function_defn* Expression |
| Function_defn | ::= |
'FUNCTION' |
|
| <Function_name> |
| | '(' <Variable> * ')' '{' Expression '}' |
| Expression | ::= |
<Variable> | <Constant> | |
| | <Function_name>
'(' ExpressionList ')' | Expression Operator Expression | |
| | <XPathExpression> | ElementConstructor | FLWR_Expression
| |
| |
'(' Expression ')' |
| ExpressionList |
::= | Expression | Expression
',' Expression |
| ElementConstructor |
::= | StartTag ExpressionList
Enbibitemag |
| StartTag | ::= | '<' <String> Attribute*
'>' |
| Enbibitemag | ::= |
'</' <String> '>' |
| Attribute |
::= | <String> '=' Expression
| Expression |
| FLWR_Expression |
::= | (FOR_clause | LET_clause)+
WHERE_clause? RETURN_clause |
| FOR_clause |
::= | 'FOR' FOR_binding (,
FOR_binding)* |
| FOR_binding |
::= | <Variable> 'IN'
'DISTINCT'? Expression |
| LET_clause |
::= | 'LET' LET_binding (,
LET_binding)* |
| LET_binding |
::= | <Variable> ':=' 'DISTINCT'?
Expression |
| WHERE_clause | ::= | 'WHERE' Expression |
| RETURN_clause | ::= |
'RETURN' Expression SORTBY_clause? |
| SORTBY_clause | ::= |
'SORTBY' '(' ExpressionList ')' ( 'ASCENDING' | |
| | 'DESCENDING' )? |
| Operator |
::= | '<' | '<=' | '>'
| '>=' | '=' | '!=' | '+' | .... |
Table
1
. Quilt grammar
A Quilt query can begin with the definition of one or more functions
that are used in the body of the query. The body of the query is simply an
expression. One important type of query is based on an "FLWR_expression",
which is constructed from FOR, LET, WHERE, and RETURN clauses. But a query
may also be based on other types of expressions, such as an XPath expression
or a element constructor. Element constructors are often used to generate
new output elements that contain data computed by the query.
In this grammar, XPathExpression is a terminal symbol. This symbol represents
an expression based on the abbreviated syntax of XPath, enhanced by the following
operators which are borrowed from XQL '99:
- 1. The BEFORE and AFTER operators. "a BEFORE b" evaluates to the set
of nodes in 'a' that occur before some node in 'b'. "a AFTER b" evaluates
the set of nodes in 'a' that occur after some node in 'b'. In the section
"Querying Structured Documents, we will see how these operators are used for
expressing conditions based on sequence.
- 2. The INTERSECT operator. "a INTERSECT b" evaluates to the set of
nodes in 'a' that are also found in 'b'. This is an enhancement to XPath,
which contains an operator for set union but not for intersection.
- 3. The EXCEPT operator. "a EXCEPT b" evaluates to the set of nodes
in 'a' that are not also found in 'b'.
We also use the document() function, which is taken from XSLT. In this
paper, the argument of the document() function is always a URL. When used
in this way, the document() function returns the root node of the document.
Some other functions are also used in the example queries, and are introduced
where they are used. There is not yet a well-defined standard function library
for Quilt.
The overall flow of information through an FLWR_expression is illustrated
in
Figure 1. The input to the expression consists of
one or more XML documents or fragments that are named in the FOR and/or LET
clauses. Each of these fragments can be conceptualized as an "ordered forest"--that
is, an ordered list of nodes representing XML elements, each with its tree
of descendant nodes. The result of the FOR and LET clauses is an ordered list
of tuples, each containing a value for each of the bound variables. The value
of a variable bound by a FOR clause is a tree (a single node and its descendants).
The value of a variable bound by a LET clause is a (possibly empty) ordered
forest of nodes. The WHERE clause serves as a filter that discards some of
the tuples and retains others. Finally, the RETURN clause is executed for
each surviving tuple, generating output nodes from the values of the bound
variables. The nodes generated by the RETURN clause can be linearized into
an output XML document or fragment.

Figure 1
. Flow of data in an FLWR expression
FOR
The FOR clause generates bindings for one or more variables. Each variable
introduced in the FOR clause is associated with an expression (typically,
an XPath expression.) In general, each of these expressions returns a set
of nodes. The result of the FOR clause is a set of tuples, each of which contains
a binding for each of the variables. The variables are bound to individual
nodes in the sets returned by their respective expressions, together with
their descendants. The number of tuples generated by a FOR clause is the product
of the cardinalities of the node-sets returned by the respective expressions.
The implied ordering among the tuples generated by the FOR clause is derived
from the ordering of their elements in the input document, with the first
bound variable taking precedence, followed by the second bound variable, etc.
In a future version of the Quilt grammar, we plan to provide a way in
which the ordering of the tuples generated by a FOR clause can be relaxed,
in order to permit more efficient processing of queries in which order is
not important.
The DISTINCT keyword can be applied independently to each expression
in a FOR clause, serving to eliminate duplicate values from the node-sets
returned by the expression. Equality is defined by equality of value rather
than by identity. A node set generated using the keyword DISTINCT is unordered,
and the tuples of bindings generated from such a node set are unordered also.
When DISTINCT is specified and several candidate nodes of equal value are
available for binding, Quilt does not specify which of the candidate nodes
is bound to the variable.
The following queries illustrate use of the DISTINCT keyword:
FOR $p IN //publisher
RETURN $p
<publisher>Harper and Row</publisher>
<publisher>Harper and Row</publisher>
<publisher>Sing Out Corporation</publisher>
FOR $p IN DISTINCT //publisher
RETURN $
<publisher>Sing Out Corporation</publisher>
<publisher>Harper and Row</publisher>
Figure 2
. Using DISTINCT
Query (Without DISTINCT):
Result:
Query (Using DISTINCT):
Result (Unordered):
The tuples generated by a FOR clause can be used to construct tabular
representations of hierarchical data, such as might be used in HTML tables
or relational databases. The following example query creates part of an XHTML
document in which names of authors and book titles appear in separate columns
of a table:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title> Books By Author </title>
</head>
<body>
<p>
The following table lists authors and books written
by each author.
</p>
<table wibibitemh="50%" border="1">
(
FOR $a IN DISTINCT //author ,
$b IN //book[author = $a]
RETURN
<tr>
<td>
$a/first/text(),
$a/last/text()
</td>,
<td>
$b/title/text()
</td>
</tr>
)
</table>
</body>
</html>
The table generated by the above query might have the following appearance:
| Crockett Johnson | Harold and the Purple Crayon |
| Crockett Johnson | Harold's Fairy Tale |
| Peter Blood | Rise Up Singing |
| Annie Patterson | Rise Up Singing |
Table
2
LET
The LET clause binds a variable to the value of an expression (typically
an XPath expression.) Since an expression in general returns an ordered list
of nodes, the value of a variable bound in a LET clause is an ordered list
of nodes, together with their descendants (an "ordered forest"). The value
bound by the LET clause preserves the ordering of nodes in the input document,
unless DISTINCT is specified. DISTINCT causes duplicate nodes to be eliminated
and the ordering of the nodes to become undefined.
The FOR and LET clauses work together to generate tuples of variable
bindings. Unlike a FOR clause, however, a LET clause does not affect the number
of tuples that are generated. Each LET clause binds its variable to exactly
one value (an "ordered forest"). If a query contains a LET clause but no FOR
clause, exactly one tuple of variable bindings is generated.
Each binding in a FOR or LET clause may contain references to variables
that have already been bound in earlier clauses or in an outer-level query.
A LET clause is often used to bind a variable to a set of values that
is used as the argument of some aggregate function such as avg(). For example,
the following query returns the average price of all the books in the input
document:
LET $b := //book
RETURN
<avgprice> avg($b/price) </avgprice>
A LET clause can be combined with a FOR clause to generate several sets
and to perform some computation on each. The following example uses variable
$pub to iterate over all the publishers in the input document, binds variable
$pri to the set of prices of books published by each publisher, and then generates
an output document that lists each publisher with its average book price.
FOR $pub IN DISTINCT //publisher
LET $pri := document("http://www.biblio.com/books.xml")
//book/pubinfo[publisher=$pub]/price
RETURN
<pubinfo>
$pub ,
<avgprice> avg($pri) </avgprice>
</pubinfo>
WHERE
The FOR and LET clauses generate tuples of bindings for the variables
that are introduced in these clauses. Each of these tuples is subject to further
filtering by the WHERE clause. Only those tuples of bindings for which the
condition in the WHERE clause is true are used to invoke the RETURN clause.
In the WHERE clause, predicates may be combined using parentheses, AND,
OR, and NOT. Predicates are based on XPath expressions that contain the variables
bound in the FOR and LET clauses. The WHERE clause may also use several operators
taken from XQL: set intersection is expressed with the INTERSECT keyword,
sequence is expressed with the BEFORE and AFTER operators, and set difference
is expressed using the EXCEPT operator. These operators will be illustrated
by example queries throughout the remainder of this paper.
In the following example, the WHERE clause uses a built-in function
named exists(), which returns False if its argument is the empty set, and
otherwise returns True. The query finds books that have titles but no authors.
<orphan_books>
(
FOR $b IN //book
WHERE exists($b/title)
AND NOT exists($b/author)
RETURN $b/title
)
</orphan_books>
Remember that variables bound in a FOR clause are bound to individual
nodes (with their descendants), but variables bound in a LET clause are bound
to ordered sets of nodes (with their descendants). In the WHERE clause, appropriate
predicates must be used with each type of variable. For example, in the following
query, $bk is bound to a set of books, and the WHERE clause appropriately
applies a count() function to count the number of books in the set. The query
returns publishers who have published more than 100 books.
FOR $pub IN DISTINCT //publisher
LET $bk := //book[pubinfo/publisher=$pub]
WHERE count($bk) > 100
RETURN $pub
If it were desired to add an additional condition on books, such as
"find publishers who published more than 100 books in 1999", this condition
could not be added to the WHERE clause, since the WHERE clause has access
only to sets of books, not to individual books. The proper place to add such
a condition would be in the XPath expression that defines $bk, as follows:
FOR $pub IN DISTINCT //publisher
LET $bk := //book[pubinfo/publisher=$pub AND pubinfo/year="1999"]
WHERE count($bk) > 100
RETURN $pub
RETURN
The RETURN clause generates the output of the FLWR-expression, which
may be a node, an "ordered forest" of nodes, or a primitive value. The RETURN
clause is invoked once for each tuple of variable bindings that is generated
by the FOR and LET clauses and satisfies the condition in the WHERE clause.
If an ordering exists among the tuples of variable bindings, the RETURN clause
is invoked on each tuple, in order, and the order is preserved in the output
document. The RETURN clause contains an expression that may include structured
XML text, bound variables and XPath expressions based on these variables,
and subqueries. We have already seen many examples of RETURN clauses, and
we will see more in the queries that follow.
It is often important to specify an order for the elements in a query
result, supplementing or superceding the ordering derived from the bindings
of variables. If the query result contains several levels of elements, an
ordering may be required among the elements at each level. Quilt provides
a SORTBY clause that may be used after a constructed element, variable, or
path expression to specify an ordering among the generated elements. The arguments
of the SORTBY clause are evaluated relative to the individual nodes to be
sorted, and may be followed by ASCENDING or DESCENDING to specify the direction
of the sort (ASCENDING is the default).
The following query illustrates some of the features of SORTBY. It generates
an alphabetical list of publishers. Within each publisher it generates an
alphabetical list of book titles, followed by a list of authors ordered by
their last and first names. The notation "SORTBY(.)" indicates that the generated
elements are to be ordered by their content.
<result>
(
FOR $p IN DISTINCT //publisher
RETURN
<publisher>
<name>
$p/text()
</name> ,
<books>
(
FOR $t IN DISTINCT //book[pubinfo/publisher = $p]/title
RETURN $t SORTBY(.)
)
</books> ,
<authors>
(
FOR $a IN DISTINCT //book[pubinfo/publisher = $p]/author
RETURN $a SORTBY(lastname, firstname)
)
</authors>
</publisher> SORTBY(name)
)
</result>
Filtering documents
An important class of queries extracts selected elements from a document
while preserving the relationships among the selected elements. In many cases,
both hierarchic and sequential relationships must be preserved. This type
of query can be expressed in Quilt by using the keyword FILTER in a LET clause.
As we have seen, a LET clause binds a variable to an ordered set of
nodes that are selected by an XPath expression. We will refer to these nodes
as the "selected nodes". In general, each of the selected nodes carries with
it all its descendant nodes, so we may think of the selected nodes as a list
of trees (node hierarchies). A LET clause may also contain the keyword FILTER
followed by a second XPath expression called the filter. Each of the subtrees
in the selected list is "filtered" in such a way that only those nodes are
retained that satisfy the filter XPath expression. Descendants of nodes that
satisfy the filter are not retained unless they independently satisfy the
filter. The hierarchical and sequential relationships among the surviving
nodes are preserved (for example, if node Y is a descendant of node X and
all the intervening nodes between X and Y fail to satisfy the filter, then
Y becomes an immediate descendant of X.)
If the right-hand side of a LET clause contains only the FILTER keyword
followed by a filter expression, the input to the filter is the implicit input
document of the query.
Figure 3 shows the effect of applying a filter
to a tree of nodes. In this case, the filter retains only nodes of type A
and B, causing the original tree to split into two trees.

Figure 3
. Effects of filtering on a tree of nodes
We will illustrate the use of FILTER by writing some queries that operate
on a document named "cookbook.xml". The document contains many different kinds
of elements, including Section elements that may be nested inside each other
to many levels of nesting. Each Section element immediately contains a Title
element. Mixed among the other elements, at various levels of nesting, are
Figure elements, which may also contain Title elements. The following fragment
illustrates the structure of the cookbook:
<Section><Title>Desserts</Title>
<Para>This section tells you all about yummy desserts.</Para>
<Section><Title>Cookies</Title>
<Para>Cookies are flat things that sometimes have chocolate chips.
<Figure image="cookie1"><Title>A Big Cookie</Title></Figure>
</Para>
</Section>
<Section><Title>Candy</Title>
<Para>Candy is dandy.</Para>
</Section>
<Section><Title>Pies and Tarts</Title>
<Para>Pies and tarts are even better.</Para>
<Figure image="applepie"><Title>An Apple Pie</Title></Figure>
<Figure image="peachpie"><Title>A Peach Pie</Title></Figure>
</Section>
<Section><Title>Cakes and Cupcakes</Title>
<Para>Cakes and Cupcakes are absolutely awesome.</Para>
<Section><Title>Cakes</Title>
<Figure image="choc_cake"><Title>A Chocolate Cake</Title></Figure>
</Section>
<Section><Title>Cupcakes</Title>
<Para>Cupcakes are not as big but just as good.</Para>
</Section>
</Section>
</Section>
Suppose that we wish to generate a table of contents for the cookbook,
in which the hierarchy of nested sections is preserved but only the title
of each section is included. We might think of this as a process of "projecting"
the sections and section titles out of the document, preserving their original
hierarchy and sequence, while eliminating other kinds of elements. The following
query could be used for this purpose:
<toc>
(
LET $s := document("cookbook.xml")
FILTER //Section | //Section/Title | //Section/Title/text()
RETURN $s
)
</toc>
In this example, the LET clause uses FILTER to bind variable $s to a
copy of the cookbook document that contains only Section elements and Title
elements that are immediately contained within Section elements, preserving
the original relationships among these elements. All other elements are removed,
even if they are descendants of a selected element. Since $s then contains
exactly the elements that we wish to include in the output, the query can
simply return $s without any further processing. The result of applying this
query to the document fragment above is as follows:
<Section><Title>Desserts</Title>
<Section><Title>Cookies</Title>
</Section>
<Section><Title>Candy</Title>
</Section>
<Section><Title>Pies and Tarts</Title>
</Section>
<Section><Title>Cakes and Cupcakes</Title>
<Section><Title>Cakes</Title>
</Section>
<Section><Title>Cupcakes</Title>
</Section>
</Section>
</Section>
The hierarchy and sequence of the query results are determined by the
document, and could not have been known in advance by the person formulating
the query; in fact, the purpose of the query is to determine this unknown
hierarchy and sequence.
Now suppose that we wish to extend our table of contents to include
the number of Figures in each Section. This is a much more complex transformation
in the structure of the document, and we will accomplish it by defining a
function. As shown in the Quilt grammar, a query can begin by defining a function
that takes one or more parameters. In this example, we will define a function
named "section_summary" that takes a Section element as its parameter. A function
is allowed to call itself recursively, and we will make use of this capability
in this example. The "section_summary" function generates a new Section element
that contains only the Title of the original Section and the number of Figures
that are immediately contained in it; it also invokes itself recursively to
process all immediately nested Sections in the same way.
The main part of the query invokes the "section_summary" function on
all the top-level Sections of the document. But before doing this, it filters
the input document so that it contains only Section, Title, and Figure elements.
The filtering step is necessary so that all Figure elements appear to be immediately
contained within their respective Sections, even if they were originally nested
inside some other element such as a paragraph or a list. By eliminating intervening
elements, the FILTER enables each Section to compute an accurate count of
its Figures.
FUNCTION section_summary($s)
{
<Section>
$s/Title ,
(
LET $f := $s/Figure
RETURN <Figcount> count($f) </Figcount>
) ,
(
FOR $ss IN $s/Section
RETURN section_summary($ss)
)
}
<toc>
LET $stf := document("cookbook.xml")
FILTER //Section | //Title | //Figure
FOR $s IN $stf/Section
RETURN section_summary($s)
</toc>
The result of applying this query to the document fragment shown above
is as follows:
<Section><Title>Desserts</Title>
<Figcount> 0 </Figcount>
<Section><Title>Cookies</Title>
<Figcount> 1 </Figcount>
</Section>
<Section><Title>Candy</Title>
<Figcount> 0 </Figcount>
</Section>
<Section><Title>Pies and Tarts</Title>
<Figcount> 2 </Figcount>
</Section>
<Section><Title>Cakes and Cupcakes</Title>
<Figcount> 0 </Figcount>
<Section><Title>Cakes</Title>
<Figcount> 1 </Figcount>
</Section>
<Section><Title>Cupcakes</Title>
<Figcount> 0 </Figcount>
</Section>
</Section>
</Section>
Querying relational data
Since much of the world's business data is now stored in relational
databases, access to relational data is a vitally important application for
an XML query language. In this section, we will illustrate the use of Quilt
to access relational data by a series of examples based on a very simple database
that describes an online auction. The three tables in the auction database
are shown below. The USERS table has a row for each user who is registered
with the auction, either as a seller or a bidder, with the USERID column as
its primary key. The ITEMS table lists all the items that are currently for
sale, with the ITEMNO column as primary key and the OFFERED_BY column as a
foreign key that matches the USERID of the offering user. The BIDS table lists
the bids that have been received for various items, using the USERID and ITEMNO
columns as foreign keys to identify the userid of the bidder and the item
to which the bid applies.
Table
3
. USERS
| ITEMNO | DESCRIPTION | OFFERED_BY |
RESERVE_PRICE |
Table
4
. ITEMS
| USERID | ITEMNO | BID_AMOUNT |
BID_DATE |
Table
5
. BIDS
In this section, we will assume that a layer of software running on
top of a relational database system presents each table in the form of an
XML document. The name of the document element is the same as the name of
the table, and each row of the table is represented by a nested element that
has the same name as the table with the suffix "_tuple". Inside each row element
are nested a series of elements that represent the column-values stored in
that row. For example, two rows of the USERS table might be represented by
the following XML view:
<users>
<user_tuple>
<userid> 1005 </userid>
<name> Baker </name>
<rating> B </rating>
</user_tuple>
<user_tuple>
<userid> 1007 </userid>
<name> Carter </name>
<rating> A </rating>
</user_tuple>
</users>
The Document Type Definitions for the XML views of the three tables
in our online auction database are as follows:
<!DOCTYPE users [
<!ELEMENT users (user_tuple*)>
<!ELEMENT user_tuple (userid, name, rating)>
<!ELEMENT userid (#PCDATA)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT rating (#PCDATA)>
]>
<!DOCTYPE items [
<!ELEMENT items (item_tuple*)>
<!ELEMENT item_tuple (itemno, description, offered_by, reserve_price)>
<!ELEMENT itemno (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ELEMENT offered_by (#PCDATA)>
<!ELEMENT reserve_price (#PCDATA)>
]>
<!DOCTYPE bids [
<!ELEMENT bids (bid_tuple*)>
<!ELEMENT bid_tuple (userid, itemno, bid_amount, bid_date)>
<!ELEMENT userid (#PCDATA)>
<!ELEMENT itemno (#PCDATA)>
<!ELEMENT bid_amount (#PCDATA)>
<!ELEMENT bid_date (#PCDATA)>
]>
It is clear that many applications will require structured XML views
of relational data, in which elements derived from different tables are nested
to represent their logical relationships. We believe that an XML query language
such as Quilt could be used to construct such structured XML views from simple
default XML views of individual tables such as the ones described above. At
the end of this section, we will give an example of how such a view can be
constructed.
The example queries below illustrate several common types of access
to relational data. Since SQL is a well-known relational language, we introduce
each query by expressing it in English and in SQL, and then show its representation
in Quilt. For each query, we also show a partial result containing enough
data to clarify the expected structure of the query result.
Most of the queries shown below generate their output in the form of
a single element named <result>, with nested elements that contain the
data of the query result. This is a convention that is not required for all
queries, as we will see in some examples near the end of the section.
Our first example query is a very simple relational query that retrieves
selected columns from the rows of one table that satisfy some search-condition.
SELECT i.itemno, i.description, i.reserve_price
FROM items AS i
WHERE i.description LIKE 'Bicycle'
AND i.reserve_price < 50
ORDER BY i.itemno
<result>
(
FOR $i IN document("items.xml")//item_tuple
WHERE contains($i/description, "Bicycle")
AND $i/reserve_price < 50
RETURN
<item_tuple>
$i/itemno,
$i/description,
$i/reserve_price
</item_tuple> SORTBY(itemno)
)
</result>
<result>
<item_tuple>
<itemno> 10577 </itemno>
<description> Mountain Bicycle </description>
<reserve_price> 45 </reserve_price>
</item_tuple>
<item_tuple>
<itemno> 21684 </itemno>
<description> 10-speed Bicycle </description>
<reserve_price> 40 </reserve_price>
</item_tuple>
</result>
Figure 4
. Q1
Find the item number, description, and reserve price for all bicycles
that have a reserve price of less than $50, in order by item number.
SQL expression:
Quilt expression:
Partial result:
In Q1, the analogy between the clauses in SQL and Quilt is straightforward.
The Quilt FOR-clause, like the SQL FROM-clause, binds a variable that ranges
over the rows of the table. In both languages, the WHERE-clause is used to
filter the set of rows by applying a search condition. In Q1, the search-condition
could alternatively have been expressed by using predicates in square brackets
in the FOR-clause. The Quilt RETURN-clause provides the function of the SQL
SELECT and ORDER BY clauses, generating output, including tags, and controlling
the order of the output data.
Query Q2 illustrates how data can be formed into groups and an aggregate
function can be applied to each group.
SELECT b.itemno, max(b.bid_amount) AS bid
FROM bids AS b
GROUP BY b.itemno
ORDER BY b.itemno
<result>
(
FOR $i IN DISTINCT document("bids.xml")//itemno
LET $b := document("bids.xml")//bid_tuple[itemno = $i]
RETURN
<highbid>
$i,
<bid> max($b/bid_amount) </bid>
</highbid> SORTBY(itemno)
)
</result>
<result>
<highbid>
<itemno> 10579 </itemno>
<bid> 55 </bid>
</highbid>
<highbid>
<itemno> 21648 </itemno>
<bid> 45 </bid>
</highbid>
</result>
Figure 5
. Q2
For each item that has received a bid, list the item number and the
highest bid received for the item, in order by item number.
SQL expression:
Quilt expression:
Partial result:
The following aspects of the Quilt expression for Q2 are worth noting:
- 1. Q2 binds two variables. Variable $i ranges over all the distinct
item numbers in the BIDS table. Variable $b represents the set of rows in
the BIDS table that contain item numbers that match $i. The RETURN clause
is executed once for each distinct item number.
- 2. The argument of the max function is $b/bid_amount. $b is bound to
a set of bid_tuple elements, $b/bid_amount represents the set of bid_amount
elements that are nested inside these bid_tuple elements, and max($b/bid_amount)
returns the maximum of the values of the bid_amount elements in the set.
- 3. In Q2, the condition that appears in square brackets in the LET
clause could not have been expressed in a WHERE-clause, since it is essential
to the definition of the set to which $b is bound. Any search-conditions that
apply to $b in a WHERE-clause would accept or reject the whole set represented
by $b rather than individual elements of the set.
Query Q3 represents the type of query called a "join" by relational
systems.
SELECT i.itemno, i.description, u.name AS seller
FROM items AS i, users AS u
WHERE i.offered_by = u.userid
AND i.description LIKE 'Bicycle'
AND u.rating = 'A';
<result>
(
FOR $i IN document("items.xml")//item_tuple,
$u IN document("users.xml")//user_tuple
WHERE $i/offered_by = $u/userid
AND contains($i/description, "Bicycle")
AND $u/rating = "A"
RETURN
<interesting_item>
$i/itemno,
$i/description,
<seller> $u/name/text() </seller>
</interesting_item>
)
</result>
<result>
<interesting_item>
<itemno> 10579 </itemno>
<description> Mountain Bicycle </description>
<seller> Jones </seller>
</interesting_item>
<interesting_item>
<itemno> 20860 </itemno>
<description> Racing Bicycle </description>
<seller> Smith </seller>
</interesting_item>
</result>
Figure 6
. Q3
Find item numbers and descriptions of Bicycles offered by users whose
rating is "A", and the names of the offering users (in any order).
SQL expression:
Quilt expression:
Partial result:
The result of Query Q3 includes some elements whose names are derived
from the original document and some other elements whose names are generated
by the query itself. For example, the expression $i/itemno evaluates to an
element named "itemno", which is included in the query result with its original
tag. The expression $u/name would have generated an element named "name".
Since it is desired to include the text content of the name element in the
query result with a different tag ("seller"), Q3 uses the text() function
of XPath to extract the text content and encloses it in explicit begin- and
end-tags.
Query Q4 is a relatively complex query that combines the relational
functions of join, grouping, and ordering.
SELECT i.itemno, i.description,
count(*) AS bid_count,
max(b.bid_amount) AS high_bid
FROM items AS i, bids AS b
WHERE i.itemno = b.itemno
GROUP BY i.itemno, i.description
HAVING count(*) >= 50
ORDER BY high_bid DESC
<result>
(
FOR $i IN document("items.xml")//item_tuple,
LET $b := document("bids.xml")//bid_tuple[itemno = $i/itemno]
WHERE count($b) >= 50
RETURN
<popular_item>
$i/itemno,
$i/description,
<bid_count> count($b) </bid_count>,
<high_bid> max($b/bid_amount) </high_bid>
</popular_item> SORTBY(high_bid) DESCENDING
)
</result>
<result>
<popular_item>
<itemno> 93558 </itemno>
<description> Tennis Racket </description>
<bid_count> 61 </bid_count>
<high_bid> 20 </high_bid>
</popular_item>
<popular_item>
<itemno> 45076 </itemno>
<description> Toaster </description>
<bid_count> 57 </bid_count>
<high_bid> 11 </high_bid>
</popular_item>
</result>
Figure 7
. Q4
List item number, description, number of bids, and highest bid for items
that have received at least 50 bids, in descending order by highest bid.
SQL expression:
Quilt expression:
Partial result:
In the Quilt expression for Q4, the WHERE clause applies a filtering
condition to the sets that are bound to $b in the LET clause. In this case,
the WHERE clause of Quilt is analogous to the HAVING clause of SQL.
Query Q5 matches information from the USERS table with information from
the ITEMS table, preserving those rows of the USERS table that have no counterpart
in the ITEMS table. In relational terminology, this type of query is called
an "outer join."
SELECT u.name, i.description AS offering
FROM users AS u OUTER JOIN items AS i ON u.userid = i.offered_by
ORDER BY name, offering
<result>
(
FOR $u IN document("users.xml")//user_tuple
RETURN
<user>
$u/name,
(
FOR $i IN document("items.xml")//item_tuple
[itemno = $u/offered_by]
RETURN
<offering>
$i/description/text()
</offering> SORTBY(.)
)
</user> SORTBY(name)
)
</result>
<result>
<user>
<name> Andrews </name>
<offering> Golf club </offering>
<offering> Tennis racket </offering>
</user>
<user>
<name> Baker </name>
</user>
<user>
<name> Caswell </name>
<offering> Coffee maker </offering>
</user>
</result>
Figure 8
. Q5
List the names of all users and the descriptions of the items they have
offered for sale. Users who have not offered any items should be included,
and output should be ordered by alphabetically by user name and secondarily
by item description.
SQL expression:
Quilt expression:
Partial result:
The following features of the Quilt expression for Q5 are worth noting:
- 1. The results of the SQL query and the Quilt query are not exactly
the same. Since SQL is a relational language, the result of an SQL query is
always a table. In the SQL query result for Q5, each user's name is repeated
in the query output for each item offered by that user. Quilt, on the other
hand, takes advantage of the hierarchical nature of XML to avoid repeating
user names. In the Quilt query result for Q5, each user name appears once,
in an element with the tag USER, and all the items offered by a given user
are nested inside the appropriate USER element.
- 2. In previous queries, when a SORTBY clause was used to specify the
ordering of a set of elements in the query result, the argument of SORTBY
was the name of the nested element to be used as the sort key. In the result
of Q5, however, we wish the <offering> elements to be sorted by their own
string values rather than by the value of a nested element. Therefore, we
specify "SORTBY(.)", indicating that the sort key is the element itself.
Q6 illustrates the use of the exists() function, which takes a set of
elements as its argument and returns True if the set is non-empty. In Q6,
the argument of the exists() function is an XPath expression.
SELECT u.name
FROM users AS u
WHERE NOT EXISTS
(SELECT i.itemno
FROM items AS i
WHERE i.offered_by = u.userid)
ORDER BY name;
FOR $u IN document("users.xml")//user_tuple
WHERE NOT exists
(document("items.xml")//item_tuple[offered_by = $u/userid])
RETURN $u/name SORTBY(.)
<name> Baker </name>
<name> Blackstone </name>
<name> Thompson </name>
Figure 9
. Q6
List names of users who have not offered anything for sale, in alphabetical
order.
SQL expression:
Quilt expression:
Partial result:
Unlike our previous examples, Q6 does not "wrap" its result in an enclosing
element such as <result>. Instead, Q6 returns a list of <name> elements,
ordered by their content. It is also possible for a Quilt query to return
a simple value, as in Q7.
SELECT max(bid_amount)
FROM bids;
LET $b := document("bids.xml")//bid_amount
RETURN max($b)
Figure 10
. Q7
Find the highest amount that has ever been bid for any item.
SQL expression:
Quilt expression:
Example result:
We have previously mentioned the importance of providing a nested view
of a relational database. A Quilt query can be used to define such a view,
based on simple default views such as the ones in the above examples. A view
definition facility might take the form of a DEFINE statement that uses a
query to define a view that can be subsequently used in other queries and
view definitions as though it were a real document. A system that implements
such a facility would need to maintain a catalog of view definitions. Although
the details of view definition in Quilt remain to be worked out, we provide
in Q8 an example of what a DEFINE statement might look like.
DEFINE $auction AS
<auction>
(
FOR $u IN document("users.xml")//user_tuple
RETURN
<User>
$u/userid,
$u/name,
$u/rating,
(
FOR $i IN document("items.xml")//item_tuple
WHERE $i/offered_by = $u/userid
RETURN
<Item>
$i/itemno,
$i/description,
$i/reserve_price,
(
FOR $b IN document("bids.xml")//bid_tuple
WHERE $b/itemno = $i/itemno
RETURN
<Bid>
$b/userid,
$b/bid_amount,
$b/bid_date,
</Bid> SORT BY bid_date
)
</Item> SORT BY itemno
)
</User> SORT BY userid
)
</auction>
Figure 11
. Q8
Construct a nested XML view of the auction database, consisting of a
single Auction element. Inside the Auction element, include a User element
for each registered user, in order by userid. If a user has offered any items
for sale, these items are represented by Item elements nested inside the User
element, ordered by item number. If an item has received any bids, these bids
are represented by Bid elements nested inside the Item element, ordered by
bid date.
Quilt Expression:
Querying structured documents
In XML, the two main forms of structure are hierarchy and sequence.
Since purely relational databases have neither hierarchy nor sequence, most
database systems do not support queries that take advantage of this structure.
Several queries shown earlier in this paper create hierarchies from flat relational
data, map one hierarchy onto another hierarchy, or flatten hierarchical data
to show a tabular view. We have also seen some queries that perform some transformation
on a document while preserving its hierarchical structure. In this section,
we will investigate queries that use the hierarchy or sequence of a document
to formulate conditions. Our queries will be based the following surgical
data, provided to us by Dr. Tom Lincoln, taken from a fictitious patient record
encoded in a HL7 Patient Record Architecture format:
<section><section.title>Procedure</section.title>
The patient was taken to the operating room where
she was placed in a supine position and
<Anesthesia>induced under general anesthesia.</Anesthesia>
<Prep>
<action>Foley catheter was placed to decompress the bladder</action>
and the abdomen was then prepped and draped in sterile fashion.
</Prep>
<Incision>
A curvilinear incision was made
<Geography>in the midline immediately infraumbilical</Geography>
and the subcutaneous tissue was divided
<Instrument>using electrocautery.</Instrument>
</Incision>
The fascia was identified and
<action>#2 0 Maxon stay sutures were placed
on each side of the midline.</action>
<Incision>
The fascia was divided using
<Instrument>electrocautery</Instrument>
and the peritoneum was entered.
</Incision>
<Observation>The small bowel was identified</Observation>
and
<action>
the
<Instrument>Hasson trocar</Instrument>
was placed under direct visualization.
</action>
<action>
The
<Instrument>trocar</Instrument>
was secured to the fascia using the stay sutures.
</action>
<!-- The section continues, but we excerpt it here... -->
Note that this document has a structure like that generally associated
with documents, but contains data that partly resembles traditional database
data. This is a useful general approach to embedding data in narrative or
other kinds of documents.
First we will examine a few queries that use absolute position. XPath
expressions use subscripts to indicate absolute position. Consider the following
query:
FOR $s IN //section[section.title = "Procedure"]
RETURN ($s//Incision)[2]/Instrument
Figure 12
. Q9
In each section with title "Procedure", what Instruments were used in
the second Incision?
Note that the precedence rules of XPath do not give the desired result
without the parentheses in the RETURN clause. The expression "$s//Incision[2]"
would return nodes that are the second Incision node within their immediate
parent. Since we are looking for the second Incision among all the descendants
of $s, regardless of level, we use the expression "($s//Incision)[2]".
Subscripts can also be used to indicate ranges. Consider the following
query:
FOR $s IN //section[section.title = "Procedure"]
RETURN ($s//Instrument)[1-2]
Figure 13
. Q10
In each section with title "Procedure", what are the first two instruments
to be used?
Now we will consider three queries that use the BEFORE and AFTER operators
of XQL. The expression "exp1 BEFORE exp2", in which exp1 and exp2 are path
expressions, evaluates to the list of nodes selected by exp1 that occur before
at least one node selected by exp2. We will start with a query that combines
AFTER with subscripts:
FOR $i IN (//Incision)[2] , $a IN (//action AFTER $i)[1-2]
RETURN $a//Instrument
Figure 14
. Q11
What instruments were used in the first two actions after the second
incision?
Next we will consider a query where the sequence has a more dramatic
significance:
FOR $proc IN //section[section.title="Procedure"]
WHERE NOT exists( $proc//Anesthesia BEFORE ($proc//Incision)[1] )
RETURN $proc
Figure 15
. Q12
Find "Procedure" sections where no anesthesia element occurs before
the first incision.
In many cases, BEFORE and AFTER are used to reconstruct a narrative,
as in the following example.
FOR $proc IN //section[section.title="Procedure"][1],
$bet IN $proc//((* AFTER ($proc//incision)[1]) BEFORE ($proc//incision)[2])
RETURN $bet
Figure 16
. Q13
In the first procedure, what happened between the first incision and
the second incision?
Exploiting references in documents
References from one element to another are an important part of XML.
Since XPath expressions are a basic building block of the Quilt grammar, Quilt
could exploit references by using the ID function of XPath, which was designed
for this purpose. Because of the importance of references, however, Quilt
introduces a new reference-related abbreviation that can be used in XPath
expressions in Quilt queries.
In this section, we will describe the Quilt syntax for following references.
Our examples will be based on intra-document references using attributes of
type ID and IDREF, but we believe that a straightforward extension can accomodate
inter-document references based on the XPointer specification.
Quilt introduces the notation X-> as an abbreviation for the XPath function
ID(X). The expression X-> evaluates to the set of nodes that are referenced
by X. The commonest use of this notation is with an attribute of type IDREF
or IDREFS. For example, suppose that an element of type Figref has an IDREF-type
attribute named "target" that references a Figure element. Then, in a Quilt
query, the following expression could be used to find all Figure elements
that are targets of Figref elements:
The Quilt "right-arrow" notation allows an expression containing a reference
to be read from left to right, and is intended to be easier to read than the
XPath functional notation. For example, the following expression finds the
titles of all Figure elements that are targets of Figref elements:
To illustrate the use of references in Quilt queries, suppose that we
have a repository containing a collection of papers. The repository might
be represented as a large XML document in which each paper is represented
by a Paper element with a unique ID attribute. Nested immediately inside each
Paper element is a Title element. Many of the papers reference other papers,
and these references are represented by Reference elements nested at some
level inside the Paper elements, each containing an attribute named "paperid"
of type IDREF that matches the ID attribute of the referenced paper.
The example query below illustrates the use of references, and also
illustrates how a Quilt query can transform the structure of a document. In
this example, Title elements in the input document are transformed into attributes
in the output document.
FOR $p IN //Paper
RETURN
<Paper title=$p/Title>
(
FOR $r IN $p//Reference
RETURN
<Reference title = $r/@paperid->/Title />
)
</Paper>
<Paper title="The Genius of Mark Twain">
<Reference title="Innocents Abroad" />
<Reference title="Huckleberry Finn" />
</Paper>
<Paper title="The Social Commentary of John Steinbeck">
<Reference title="Grapes of Wrath" />
<Reference title="Cannery Row" />
</Paper>
Figure 17
. Q14
Generate a list of papers, each identified by its title, and each containing
a list of the titles of all the papers that it references.
Quilt Expression:
Result:
Conclusion
Traditionally, documents and databases have been considered to be separate
and distinct forms of information. Documents have an irregular structure in
which hierarchy and sequence are very important. Databases have a much more
regular structure and, in relational databases, place little importance on
hierarchy and sequence. With the emergence of XML, the sharp distinction between
different forms of information such as documents and databases is quickly
disappearing. Unfortunately, most existing query languages are either document-oriented
or database-oriented, and find it awkward or impossible to express queries
outside the domain for which they were designed.
The Quilt language attempts to pull together features from several languages
that enable it to operate on a broad range of data sources. From XPath and
XQL it draws a powerful path expression syntax that can navigate inside a
hierarchical document, selecting a set of nodes that satisfy a complex predicate.
From XML-QL it draws the notion of bound variables and a versatile syntax
that can generate an output document of arbitrary structure.
This paper has illustrated the versatility of Quilt by using it to express
queries against diverse sources, ranging from documents to relational databases.
On relational data, we have written queries involving joins, outer joins,
and grouping. On documents, we have written queries that preserve hierarchy
and exploit ordering and references. We believe that Quilt represents a promising
approach to a query language that can help to realize the potential of XML
as a universal medium for data interchange.
Bibliography
| [Exemplars] | Mary Fernandez et al. XML Query Languages:
Experiences and Exemplars. See http://www-db.research.bell-labs.com/user/simeon/xquery.html |
| [Lorel] | Serge Abiteboul, Dallan Quass, Jason McHugh, Jennifer
Widom, and Janet L. Wiener. The Lorel Query Language for Semistructured Data. International
Journal on Digital Libraries, 1(1):68-88, April 1997. See http://www-db.stanford.edu/~widom/pubs.html |
| [Tree Structure] | Jonathan Robie. The Tree Structure
of XML Queries. See http://www.w3.org/XML/Group/1999/10/xquery-tree.html |
| [ODMG] | Rick Cattell et al. The Object Database Standard:
ODMG-93, Release 1.2. Morgan Kaufmann Publishers, San Francisco,
1996. |
| [XML] | World Wide Web Consortium. Extensible Markup
Language (XML) 1.0. W3C Recommendation. See http://www.w3.org/TR/1998/REC-xml-19980210 |
| [XML-QL] | Alin Deutsch, Mary Fernandez, Daniela Florescu, Alon
Levy, and Dan Suciu. A Query Language for XML. See http://www.research.att.com/~mff/files/final.html |
| [XML Query Requirements] | World Wide Web Consortium. XML
Query Requirements. W3C Working Draft. See http://www.w3.org/TR/xmlquery-req |
| [XPath] | World Wide Web Consortium. XML Path Language
(XPath) Version 1.0. W3C Recommendation. See http://www.w3.org/TR/xpath.html |
| [XPointer] | World Wide Web Consortium. XML Pointer
Language (XPointer). W3C Working Draft. See http://www.w3.org/TR/WD-xpt |
| [XQL] | J. Robie, J. Lapp, D. Schach. XML Query Language
(XQL). See http://www.w3.org/TandS/QL/QL98/pp/xql.html |
| [XSLT] | World Wide Web Consortium. XSL Transformations
(XSLT). W3C Recommendation. See http://www.w3.org/TR/xslt |
| [YATL] | S. Cluet, S. Jacqmin, and J. Simeon. The New
YATL: Design and Specifications. Technical Report, INRIA, 1999. |