Spatial/temporal datatypes
an approach to specifying and querying multimedia objects and scheduled structures in XML documents
Peiya Liu
Liang H. Hsu


Abstract
Many useful XML applications require a smooth integration of time- and space-dependent media objects and structures in XML documents. XML tree document structures offer only limited support for the spatial and temporal relationships needed to query multimedia objects. These relationships can, however, be specified with spatial and temporal datatypes. The XML Schema Part 2: Datatypes framework opens up an opportunity to explore this dimension in a new way, and this paper shows how multimedia objects and structures can be specified and queried within that framework. This abstract datatype approach offers potential advantages in query processing of multimedia objects and structures in XML.

Contents
  1. Introduction
  2. The spatial/temporal datatype approach
    1. A structured video document
    2. A brief introduction to MMDOC-QL
  3. Specifying multimedia objects as spatial and temporal datatypes
  4. Querying multimedia objects from spatial and temporal relationships
  5. Related work
  6. Conclusion remarks
  7. Bibliography

Introduction
Many useful XML applications require multimedia objects and structures in documents. Such multimedia document applications span many industries: electronic product manuals in heavy industry, web TV programs in entertainment, e-commerce web documents, geographic information systems, and so on. In a multimedia document, the content may include both static/spatial media (such as text, graphics, drawings, and images) and time-based media (such as video, audio, and animation). The media components and content can be further organized into three major document structures: hierarchical, hyperlinked, and scheduled (including both temporal and spatial). The scheduled structure plays an important role in organizing and accessing space- and time-dependent media objects.
Proposed document query languages [XML-QL 99] [Lorel 00] [YATL 98] [XQL 98] [SDQL 96] focus on hierarchical and hyperlinked structures, which are mainly used for organizing textual information. Multimedia documents usually contain non-textual media objects in spatial and temporal relationships, which are non-hierarchical. These relationships cannot be modeled effectively in pure hierarchical or hyperlinked structures to support multimedia object retrieval. XML document tree structures mainly model parent/child and sibling relationships of document elements. They support hierarchical and sequence queries effectively, but are not appropriate for spatial/temporal queries of multimedia objects in XML documents. Integrating multimedia objects into document models for querying requires a scheme to specify spatial/temporal structures and constraints. The scheme affects the efficiency and effectiveness of spatial and temporal information retrieval and processing. Currently, there is no standard way to specify these spatial and temporal media objects and structures in XML.
In this paper, we propose a spatial/temporal datatype scheme based on abstract data types (ADT) to specify scheduled structures and to query multimedia objects in XML documents. Examples of spatial datatypes are points, polylines, and areas. Examples of temporal datatypes are instants, intervals, and periods. Spatio-temporal datatypes can also be defined by combining spatial and temporal datatypes into composite ones such as "changing area over a period of time". These spatial/temporal datatypes are used to structure (or schedule) multimedia objects in documents. Based on abstract data types, many spatial and temporal operators, such as inside, nearby, before, and after, can be defined for querying multimedia objects in scheduled structures with efficient indexing support.
This ADT approach has several advantages. It can be formalized within the XML Schema Part 2: Datatypes framework. It is extensible: basic spatial/temporal datatypes or operators can be composed into composite ones. The spatial and temporal relationships of multimedia objects in documents can be specified by using spatial and temporal datatypes and their operators in queries. Furthermore, spatial and temporal query operators can be designed to run efficiently, since indexing techniques for optimizing query processing are often based on datatypes.
The spatial/temporal datatype approach
Multimedia objects in XML can be specified as XML elements with spatial and temporal datatypes. These spatial and temporal element datatypes can be formalized within the new W3C XML Schema work [XML Schema Part 1: Structures], particularly the datatype part [XML Schema Part 2: Datatypes]. The spatial and temporal relationships are derived from element datatypes and their associated operations rather than from hierarchical relationships between elements. Examples of datatype operations are spatial distance operations, spatial direction operations, and temporal order operations. In this way, multimedia object queries can be specified based on spatial and temporal relationships. A similar technique for specifying moving objects in relational databases was proposed by [Erwig 99]. In the following, we give an example of a video document along with its XML schema to illustrate a proposed document query language, MMDOC-QL, and the spatial and temporal element datatypes.
A structured video document
A video document can consist of video segments corresponding to scenes in a video. Each scene consists of video objects that appear in a sequence of shots. In this industrial video, each scene highlights certain gas turbine locations, described as video objects, to show maintenance and service operations. Shots indicate the video frames in which the video objects undergo significant motion changes. This video document can be generated automatically by our video AIU extractor, which is based on advanced scene change detection and video segmentation techniques [Chakraborty 99]. The video is shown in Figure 1.
<xsd:schema xmlns:xsd="http://www.mymind.com/VideoDocSchema">
<xsd:element name="videodoc">
<xsd:complexType>
<xsd:element name="videoseg" minOccurs="1" maxOccurs="*">
<xsd:complexType>
<xsd:element name="videoAIU" minOccurs="1" maxOccurs="*">
<xsd:complexType>
<xsd:element name="shot" minOccurs="1" maxOccurs="*">
<xsd:complexType>
<xsd:element name="area" type="region"/>
<xsd:element name="frame" type="xsd:integer"/>
</xsd:complexType>
</xsd:element>
<xsd:attribute name="id" type="ID"/>
</xsd:complexType>
</xsd:element>
</xsd:complexType>
</xsd:element>
</xsd:complexType>
</xsd:element>
<xsd:complexType name="point">
<xsd:element name="x" type="xsd:integer"/>
<xsd:element name="y" type="xsd:integer"/>
</xsd:complexType>
<xsd:complexType name="region">
<xsd:element name="loc" type="point" minOccurs="3" maxOccurs="*"/>
</xsd:complexType>
<xsd:element name="mousepos" type="point"/>
<xsd:element name="focusarea" type="region"/>
</xsd:schema>
<videodoc>
<videoseg>
<videoAIU id="object01">
<shot>
<area>
<loc><x>254</x><y>161</y></loc>
<loc><x>254</x><y>270</y></loc>
<loc><x>370</x><y>270</y></loc>
<loc><x>370</x><y>161</y></loc>
</area>
<frame>1</frame>
</shot>
<shot>
<area> ...</area>
<frame>66</frame>
</shot>
...
</videoAIU>
<videoAIU id="object02">... </videoAIU>
...
</videoseg>
<videoseg>
<videoAIU id= ...> ... </videoAIU>
<videoAIU id= ...> ... </videoAIU>
...
</videoseg>
</videodoc>
Figure 1. An industrial video
(Left: keyframes in video segments found by the AIU extractor, with regions of interest outlined in red. Right: video object locations in each video segment, with a sequence of shots showing the motion changes of the video objects)
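To make the shape of the instance document concrete, the following Python sketch reads such a document into plain point and region values. It is only an illustration under stated assumptions: the helper names (read_region, read_shots) and the SAMPLE string are hypothetical, and SAMPLE contains only the first shot of object01, since the remaining shots are elided in the listing.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

Point = Tuple[int, int]

# Hypothetical sample: only the first shot of "object01" from the listing,
# written out with explicit end tags. Everything elided there is omitted here.
SAMPLE = """
<videodoc>
  <videoseg>
    <videoAIU id="object01">
      <shot>
        <area>
          <loc><x>254</x><y>161</y></loc>
          <loc><x>254</x><y>270</y></loc>
          <loc><x>370</x><y>270</y></loc>
          <loc><x>370</x><y>161</y></loc>
        </area>
        <frame>1</frame>
      </shot>
    </videoAIU>
  </videoseg>
</videodoc>
"""


def read_region(area: ET.Element) -> List[Point]:
    """Turn an <area> element (datatype "region") into a list of (x, y) vertices."""
    return [(int(loc.findtext("x")), int(loc.findtext("y")))
            for loc in area.findall("loc")]


def read_shots(xml_text: str) -> Dict[str, List[Tuple[int, List[Point]]]]:
    """Map each videoAIU id to its (frame, region) pairs in document order."""
    root = ET.fromstring(xml_text)
    shots: Dict[str, List[Tuple[int, List[Point]]]] = {}
    for aiu in root.iter("videoAIU"):
        entries = shots.setdefault(aiu.get("id"), [])
        for shot in aiu.findall("shot"):
            entries.append((int(shot.findtext("frame")),
                            read_region(shot.find("area"))))
    return shots
```

Once the regions are available as values of their own, spatial relationships can be computed from the values alone, independently of where the elements sit in the document tree.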
A brief introduction to MMDOC-QL
MMDOC-QL is our proposed multimedia document query language for structured information retrieval. An example query is "find the ids of all video objects that appear at the mouse click position (x0, y0) in a shot".
GENERATE:<List>%objectnum </List>
FROM: video.xml
PATTERN: {"object"[0-9][0-9]*/%objectnum};
{<mousepos><x>x0</><y>y0</></mousepos>/%mpos};
CONTEXT: {(<videoAIU> with id=%objectnum) containing
{<area>/&reg}
and POINT-INSIDE(&reg %mpos)};
MMDOC-QL has four clauses. The GENERATE clause describes the final result documents. The FROM clause names the source documents to query. The PATTERN clause describes the domains of logical variables in the form of regular expressions or document elements. The CONTEXT clause describes document element constraints as first-order logical expressions. A logical expression consists of primitive document path expressions and element datatype expressions, including spatial and temporal datatype operators.
The PATTERN clause describes the domains of logical variables. There are two kinds of logical variables: string and element. By default, the domain of a string variable is the set of allowable strings in tag names, tag attributes or tag values in the documents of the FROM clause. The domain of an element variable is the set of allowable elements in the documents of the FROM clause. The domains bound the values that logical variables may take to satisfy the first-order logical expressions in the CONTEXT clause. Free variables are indicated by "%". A free variable carries a "for all" quantifier in the CONTEXT clause. In the above example, "%objectnum" and "%mpos" are free variables. A variable indicated by "&" is a bound variable with a "there exists" quantifier in the CONTEXT clause. In the above example, the element variable "&reg" denotes the existence of one <area> element satisfying the document path expression "(<videoAIU> with id=%objectnum) containing <area>".
A document path expression is a logical statement that specifies document element constraints. The constraints are specified by using element path relationships: the parent/child relationship, the sibling relationship, and tag attributes. Parent/child relationship constraints are described by the keywords inside, directly inside, containing, directly containing, etc. Sibling relationship constraints are described by the keywords before, immediately before, after, immediately after, sibling, immediately sibling, etc. Element attribute constraints are described by the keyword with. In the above example, the document path expression "(<videoAIU> with id=%objectnum) containing {<area>/&reg}" constrains the element "<videoAIU>" through its tag attribute id and its child relationship to the element "<area>". "&reg" is a variable that denotes the existence of one element "<area>" which satisfies the path expression.
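As a rough illustration of how such a path expression could be evaluated, the Python sketch below emulates "(<videoAIU> with id=%objectnum) containing <area>" over a parsed document. It is not the MMDOC-QL implementation. It assumes that "containing" means "has a descendant at any depth" and that "directly containing" means "has a child"; the function names and the DOC string are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Domain of %objectnum, taken from the PATTERN clause: "object"[0-9][0-9]*
OBJECT_ID = re.compile(r"object[0-9][0-9]*")

# Hypothetical test document: only object01 has both a matching id and an <area>.
DOC = """<videodoc><videoseg>
<videoAIU id="object01"><shot><area/><frame>1</frame></shot></videoAIU>
<videoAIU id="object02"><shot><frame>5</frame></shot></videoAIU>
<videoAIU id="clip7"><shot><area/></shot></videoAIU>
</videoseg></videodoc>"""


def containing(parent: ET.Element, tag: str) -> bool:
    """Assumed reading of 'containing': some descendant at any depth has the tag."""
    return any(el is not parent for el in parent.iter(tag))


def directly_containing(parent: ET.Element, tag: str) -> bool:
    """Assumed reading of 'directly containing': some child has the tag."""
    return parent.find(tag) is not None


def aius_containing_area(xml_text: str) -> List[str]:
    """Ids in the %objectnum domain whose <videoAIU> contains at least one <area>."""
    root = ET.fromstring(xml_text)
    return [aiu.get("id") for aiu in root.iter("videoAIU")
            if OBJECT_ID.fullmatch(aiu.get("id", "")) and containing(aiu, "area")]
```

The existential variable "&reg" corresponds to the any(...) test: one matching <area> is enough for the constraint to hold.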
The element datatype expressions describe arithmetic expressions over datatype operations, including spatial/temporal datatype operations. Aggregation functions can be viewed as a special case of these datatype operations, since they operate on input data of the real number datatype. The spatial and temporal datatype operations are stereotypical functional computations of temporal and spatial relationships, such as SIZE, DISTANCE, DIRECTION, COVER, or TIME-BEFORE. In the above example, POINT-INSIDE(element1: region element2: point) is a spatial operation that returns a value of boolean datatype: "true" if the point is inside the region, and "false" otherwise. The details of spatial and temporal datatype operations are addressed in the next section.
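One possible realization of POINT-INSIDE is the standard ray-casting (even-odd) test, sketched below in Python. The paper does not prescribe an algorithm, so this choice is an assumption, as are the function names. The argument order (region first, point second) follows the call in the example query. Points lying exactly on the boundary are not treated specially.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Region = List[Point]


def point_inside(region: Region, point: Point) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from the point."""
    x, y = point
    inside = False
    n = len(region)
    for i in range(n):
        x1, y1 = region[i]
        x2, y2 = region[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > y) != (y2 > y):
            # x-coordinate at which the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def objects_at(shots: Dict[str, List[Region]], mouse: Point) -> List[str]:
    """Emulates the example query: ids with at least one shot area containing the point."""
    return [obj_id for obj_id, regions in shots.items()
            if any(point_inside(region, mouse) for region in regions)]
```

With the region of the first shot of object01, [(254, 161), (254, 270), (370, 270), (370, 161)], a click at (300, 200) falls inside the region and a click at (100, 200) falls outside.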
Specifying multimedia objects as spatial and temporal datatypes
In general, there are three kinds of datatypes for modeling multimedia objects: spatial, temporal and spatio-temporal. All of these datatypes can be formalized as XML element datatypes. Stereotypical spatial and temporal operators can then be defined to specify the scheduled relationships of multimedia objects. We believe that this ADT scheme is general enough to specify multidimensional coordinate spaces, such as the finite coordinate space (FCS) and event schedules in HyTime documents, for multimedia object querying and processing. For ease of operator composition, every spatial and temporal datatype operator is required to produce its output in a legal datatype, so that the output of one operator can serve as the input of another.
All spatial and temporal datatypes can be formalized in the XML Schema framework. Some examples are shown below.
<xsd:complexType name="polyline">
<xsd:element name="loc" type="point" minOccurs="3" maxOccurs="*"/>
</xsd:complexType>
<xsd:simpleType name="instant" base="xsd:time">
</xsd:simpleType>
<xsd:complexType name="interval">
<xsd:element name="start" type="instant"/>
<xsd:element name="end" type="instant"/>
</xsd:complexType>
<xsd:complexType name="moving-point">
<xsd:element name="loc" type="point"/>
<xsd:element name="timestamp" type="instant"/>
</xsd:complexType>
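The datatypes above can be mirrored as plain value types on which temporal operators are defined. The Python sketch below is an illustration only: the class names follow the schema, while the semantics chosen for TIME-BEFORE (the first interval ends strictly before the second one starts) is an assumption, since the operator is named but not defined here.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class Point:
    x: int
    y: int


@dataclass(frozen=True)
class Interval:
    start: time  # "instant" is modelled as a time of day, as in the schema
    end: time

    def __post_init__(self) -> None:
        if self.end < self.start:
            raise ValueError("interval must not end before it starts")


@dataclass(frozen=True)
class MovingPoint:
    loc: Point
    timestamp: time


def time_before(a: Interval, b: Interval) -> bool:
    """Assumed TIME-BEFORE semantics: a ends strictly before b starts."""
    return a.end < b.start
```

Because time_before returns a boolean, which is itself a legal datatype, its result can be combined with other operators in a CONTEXT expression.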
Querying multimedia objects from spatial and temporal relationships
Complex multimedia object queries can be formed by using spatial and temporal operators to specify the scheduled constraints. In the following example, we specify the query "find all video objects, together with their frame numbers, that appear in a focus area of the video display window".
GENERATE:<List>%objectnum %frame-number </List>
FROM: video.xml
PATTERN: {"object"[0-9][0-9]*/%objectnum};
{[0-9][0-9]*/%frame-number};
{<focusarea> ...</focusarea>/%focus};
CONTEXT: {(<videoAIU> with id=%objectnum) containing
(<frame> containing "%frame-number")
sibling {<area> /%reg}
and OVERLAP(%reg %focus)};
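The effect of this query can be emulated with a few lines of Python. The sketch below is not the MMDOC-QL engine, and it swaps in a simplification that should be stated plainly: OVERLAP is approximated by comparing axis-aligned bounding boxes. That is exact for upright rectangles such as the area of the first shot in Figure 1, but may report overlaps that do not exist for other polygons. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, int]
Region = List[Point]


def bounding_box(region: Region) -> Tuple[int, int, int, int]:
    """Smallest axis-aligned box (min_x, min_y, max_x, max_y) around the region."""
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return min(xs), min(ys), max(xs), max(ys)


def overlap(a: Region, b: Region) -> bool:
    """Bounding-box approximation of OVERLAP (an assumption, not the paper's definition)."""
    ax1, ay1, ax2, ay2 = bounding_box(a)
    bx1, by1, bx2, by2 = bounding_box(b)
    return ax1 <= bx2 and bx1 <= ax2 and ay1 <= by2 and by1 <= ay2


def objects_in_focus(shots: Dict[str, List[Tuple[int, Region]]],
                     focus: Region) -> List[Tuple[str, int]]:
    """(object id, frame number) pairs whose shot area overlaps the focus area."""
    return [(obj_id, frame)
            for obj_id, entries in shots.items()
            for frame, region in entries
            if overlap(region, focus)]
```

Here shots maps each object id to its (frame, area) pairs, mirroring the sibling relationship between <frame> and <area> inside a <shot>.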
Related work
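Two types of related work are described here. The first is related to query languages. Based on the underlying models and media types, query languages can be broadly classified into free-form queries, relational structured queries and document structured queries, as shown in Table 1. The second, described after the table, is standardization work on spatial and temporal specifications and queries.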
Media Types / Data Modeling | Free Forms | Relational Tables | Structured Documents
Textual | Free-Text Retrieval | Relational Structured Query (SQL) | Document Structured Query ([XML-QL 99] [Lorel 00] [YATL 98] [XQL 98] [SDQL 96] MMDOC-QL)
Non-Textual | Content-Based Query [Del Bimbo 99] | Multimedia Database Query (SQL/MM, SQL/Temporal) | Multimedia Document Query (MMDOC-QL)
Table 1. The spectrum of information retrieval
MMDOC-QL distinguishes itself from other work in dealing with multimedia objects queries based on spatial and temporal relationships in structured documents. In the following, we describe related standardization work on spatial and temporal specifications and queries.
ISO HyTime [HyTime 97], which is based on SGML, uses FCS to define scheduled structures and events. These event schedules are designed for HyTime document presentation. FCS defines an abstract, system-independent method of specifying spatial and temporal information, separated from the content to be presented, as event schedules in a multidimensional coordinate space. The design is motivated by presentation abstraction rather than information retrieval, and HyTime offers only limited indexing support for querying spatial/temporal media objects and structures.
W3C SMIL [SMIL 98] is based on XML and defines spatial and temporal layouts for SMIL document playout. The layout information relates to media display windows on a screen and to media playing time. Thus, the spatial and temporal structures provided in SMIL also serve presentation rather than a storage representation to be accessed. Furthermore, presentation and storage representations differ structurally [Rutledge 98] [Liu 99], and presentation forms are often not sufficient for storage representation. Spatial and temporal query processing generally receives less emphasis in presentation-oriented multimedia specifications.
SQL/MM and SQL3/Temporal [SQL Standardization Projects] are new ISO standardization projects that extend database query language capabilities to specify and manage multimedia objects and temporal information in the relational data model. Both focus on integrating time- or space-dependent multimedia objects into relational data models for querying. However, multimedia document models impose querying requirements that differ considerably from those of the relational table model, since not only document content but also document structures must be available for retrieval. Query specifications based on relational data models would therefore limit the retrieval capability for document models.
Conclusion remarks
Proposed XML document query languages are limited in their ability to query multimedia document objects that stand in temporal or spatial relationships. Most of these languages focus on textual and hierarchical structures and offer little support for the temporal and spatial relationships of multimedia objects, which are not hierarchical. In this paper, we tackle these limitations by specifying multimedia objects as spatial and temporal datatypes in XML. The relationships between multimedia objects can then be specified by using spatial and temporal datatypes and their operators for XML document retrieval.
This paper makes two main contributions. (1) It provides a method to specify spatial and temporal datatypes in XML for querying multimedia objects. We illustrate several spatial and temporal datatypes defined as element datatypes, which can be formalized within the XML Schema Part 2: Datatypes framework. Multimedia objects and scheduled structures can therefore be smoothly integrated into XML document models for retrieval. (2) It designs a multimedia document query language, MMDOC-QL, along with stereotypical spatial and temporal operators, to retrieve multimedia objects in XML documents. Many spatial/temporal indexing methods [Manolopoulos 00] [Vazirgiannis 98] are currently available for supporting and optimizing query processing over these spatial/temporal datatypes.
Bibliography
[SDQL 96] ISO/IEC 10179:1996, Information Technology - Processing Languages - Document Style Semantics and Specification Language (DSSSL).
[HyTime 97] ISO/IEC 10744:1997, Hypermedia/Time-based Structuring Language (HyTime), Second Edition.
[Rutledge 98] L. Rutledge, L. Hardman, J. van Ossenbruggen and D. C. A. Bulterman, “Structural Distinctions Between Hypermedia Storage and Presentation,” in Proc. ACM Multimedia 98, September 1998, pp. 145-150.
[Allen 83] J. F. Allen, Maintaining Knowledge about Temporal Intervals, Comm. ACM 26(11), 1983.
[Vazirgiannis 98] M. Vazirgiannis, Y. Theodoridis and T. Sellis, Spatio-Temporal Composition and Indexing for Large Multimedia Applications, ACM Multimedia Systems, 6(4), 1998, pp. 284-298.
[Erwig 99] M. Erwig, R. H. Güting, M. Schneider and M. Vazirgiannis, Spatio-Temporal Data Types: An Approach to Modeling and Querying Moving Objects in Databases, GeoInformatica, Vol. 3, No. 3, 1999.
[Manolopoulos 00] Y. Manolopoulos, Y. Theodoridis and V. J. Tsotras, Advanced Database Indexing, Kluwer Academic Publishers, 2000.
[SMIL 98] Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, W3C Recommendation, 15 June 1998.
[XML 98] Extensible Markup Language (XML) 1.0, W3C Recommendation, 10 February 1998.
[XML Schema Part 1: Structures] XML Schema Part 1: Structures, W3C Working Draft, 25 February 2000.
[XML Schema Part 2: Datatypes] XML Schema Part 2: Datatypes, W3C Working Draft, 25 February 2000.
[SQL Standardization Projects] http://www.jcc.com/SQLPages/jccs_sql.htm (SQL Standard Reference Page)
[Chakraborty 99] A. Chakraborty, P. Liu and L. Hsu, Authoring and Viewing Video Documents using SGML structure, 1999 IEEE International Conference on Multimedia Computing and Systems, pp. 654-660, Florence, Italy.
[Liu 99] P. Liu, Y. F. Day and L. H. Hsu, Automatic Generation of DSSSL Specifications for Transforming SGML Documents into Card-Based Presentations, GCA Markup Technologies 99, PA, USA.
[XML-QL 99] A. Deutsch, M. Fernandez, D. Florescu, A. Levy and D. Suciu, A Query Language for XML, WWW'99.
[YATL 98] Your Mediators Need Data Conversion, ACM-SIGMOD 1998.
[Lorel 00] S. Abiteboul, P. Buneman and D. Suciu, Data on the Web, Morgan Kaufmann, 2000.
[XQL 98] J. Robie and J. Lapp, XML Query Language, QL'98, http://www.w3c.org/TandS/QL/QL98/
[Del Bimbo 99] A. Del Bimbo, Visual Information Retrieval, Morgan Kaufmann, 1999.