A canonical query language & its efficient implementation
Gert van der Steen


Abstract
The XML community is eagerly awaiting a W3C recommendation for an XML query language. However, it has been estimated that this recommendation may take more than a year to reach its final stage. In the meantime, it may therefore be worthwhile to study the intrinsic characteristics of query languages for XML.
A number of query languages have already been proposed for XML, and some simple query mechanisms are part of (proposed) recommendations such as XSL and XPath.
It has been argued (e.g. in Cotton, P., and Malhotra, A., “Candidate Requirements for XML Query”, Nov. 30, 1998) that the query formalism should itself use XML syntax, that it should support full-text queries, and that fast retrieval programs should be generated automatically from the query formalism.
These requirements are so general and intuitively acceptable that they may be conjectured to form the core of any future query language for structured documents. We therefore call a language that meets them the “canonical” query language (“CQL”).
In this paper we show that the requirements for CQL can be met fully by extending well-known techniques for a. the description of grammars, b. pattern matching, and c. the generation of finite state automata. These techniques can be extended straightforwardly to pattern grammars for natural language.