A canonical query language & its efficient implementation
The XML community is eagerly awaiting a W3C recommendation for an XML
query language. However, it has been estimated that it may take more than
a year until that recommendation reaches a final stage. Therefore, it may
be worthwhile to study, in the meantime, the intrinsic characteristics of
query languages for XML.
A number of query languages have initially been proposed for XML. In
addition, some simple query mechanisms are part of (proposed)
recommendations, such as XSL and XPath.
It has been argued (e.g. in Cotton, P., and Malhotra, A., “Candidate
Requirements for XML Query”, Nov. 30, 1998) that the syntax of the
query formalism should be XML itself, that it should support full-text
queries, and that it should be possible to generate fast retrieval programs
automatically from the query formalism.
These requirements are so general and intuitively acceptable that it may
be conjectured that they will form the core of any future query language
for structured documents. We therefore propose the name “canonical”
query language (“CQL”) for a language that meets them.
In this paper we show how the requirements for CQL can be fully met
by an extension of well-known techniques for (a) the description of
grammars, (b) pattern matching and (c) the generation of finite state
automata. These techniques can be extended in a straightforward way
towards pattern grammars for natural language.