Spatial/temporal datatypes
an approach to specifying and
querying multimedia objects and scheduled structures in XML documents
Many useful XML applications require a smooth integration of time-
and space-dependent media objects and structures in XML documents. The tree
structure of XML documents offers limited support for the spatial and temporal
relationships needed to query multimedia objects. However, these relationships
can be specified with spatial and temporal datatypes. The XML Schema Part 2: Datatypes
framework opens up an opportunity to explore this dimension in a new way,
and this paper shows how multimedia objects and structures can be specified
and queried in such a framework. This abstract datatype approach
offers potential advantages in query processing of multimedia objects and
structures in XML.
Introduction
Many useful XML applications require multimedia objects and structures
in documents. These multimedia document applications span industries:
electronic product manuals in heavy industry, web TV programs in the
entertainment industry, e-commerce web documents, geographic information
systems, etc. In a multimedia document, the content may include both static/spatial
media (such as text, graphics, drawings, images, etc.) and time-based media
(such as video, audio, animation, etc.). The media components and content
can be further organized into three major document structures: hierarchical,
hyperlinked, and scheduled (including both temporal and spatial). The scheduled
structure plays an important role in organizing and accessing space- and
time-dependent media objects.
Proposed document query languages
[XML-QL 99] [Lorel 00] [YATL 98] [XQL 98] [SDQL 96] focus on hierarchical and hyperlinked structures,
which are mainly used for organizing textual information. Multimedia documents
usually contain non-textual media objects in spatial and temporal relationships,
which are non-hierarchical. These relationships cannot be modeled effectively
in pure hierarchical or hyperlinked structures to support multimedia object
retrieval. XML document tree structures mainly model parent/child
and sibling relationships of document elements. They can effectively support
hierarchical and sequence queries, but are not appropriate for spatial/temporal
queries of multimedia objects in XML documents. Integrating multimedia
objects into document models for querying requires a scheme to specify spatial/temporal
structures and constraints. The choice of scheme affects the efficiency and
effectiveness of spatial and temporal information retrieval and processing. Currently, there
is no standard way to specify these spatial and temporal media objects and
structures in XML.
In this paper, we propose a spatial/temporal datatype scheme based
on abstract data types (ADT) to specify scheduled structures and to
query multimedia objects in XML documents. Examples of spatial datatypes are
points, polylines, areas, etc. Examples of temporal datatypes are instants,
intervals, periods, etc. Spatio-temporal datatypes can also be defined by
combining spatial and temporal datatypes into composite ones such as
"changing area over a period of time". These spatial/temporal datatypes are
used to structure (or schedule) multimedia objects in documents. Based on
abstract data types, many spatial and temporal operators, such as inside,
nearby, before, after, etc., can be defined for querying multimedia objects
in scheduled structures with efficient indexing support.
This ADT approach has several advantages. It can be formalized within
the XML Schema Part 2: Datatypes framework. It is extensible: basic
spatial/temporal datatypes and operators can be composed into composite ones. The spatial
and temporal relationships of multimedia objects in documents can be specified
by using spatial and temporal datatypes and their operators in queries. Furthermore,
spatial and temporal query operators can be implemented efficiently, since
indexing techniques for optimizing query processing are often based on datatypes.
The spatial/temporal datatype approach
The multimedia objects in XML can be specified as XML elements with
spatial and temporal datatypes. These spatial and temporal element datatypes
can be formalized within the new W3C XML Schema development
[XML Schema Part 1: Structures],
particularly the datatype part
[XML Schema Part 2: Datatypes]. The spatial and
temporal relationships are derived from element datatypes and their associated
operations rather than from element hierarchical relationships. Examples of
datatype operations are spatial distance operations, spatial direction
operations, temporal order operations, etc. In this way, multimedia object
queries can be specified based on spatial and temporal relationships. A similar
technique for specifying moving objects in relational databases was proposed by
[Erwig 99].
In the following, we give an example of a video document
along with its XML schema to illustrate a proposed document query language, MMDOC-QL,
and the spatial and temporal element datatypes.
A structured video document
A video document could consist of video segments corresponding to scenes
in a video. Each scene consists of video objects appearing in a sequence of
shots. In this industrial video, each scene highlights certain gas turbine
locations, described as video objects, to show maintenance and service operations.
Shots indicate those video frames with significant motion changes in video
objects. This video document can be automatically generated by our video AIU
extractor based on advanced scene change detection and video segmentation techniques
[Chakraborty 99]. The video is shown in
Figure 1.
<xsd:schema xmlns:xsd="http://www.mymind.com/VideoDocSchema">
  <xsd:element name="videodoc">
    <xsd:complexType>
      <xsd:element name="videoseg" minOccurs="1" maxOccurs="*">
        <xsd:complexType>
          <xsd:element name="videoAIU" minOccurs="1" maxOccurs="*">
            <xsd:complexType>
              <xsd:element name="shot" minOccurs="1" maxOccurs="*">
                <xsd:complexType>
                  <xsd:element name="area" type="region"/>
                  <xsd:element name="frame" type="integer"/>
                </xsd:complexType>
              </xsd:element>
              <xsd:attribute name="id" type="ID"/>
            </xsd:complexType>
          </xsd:element>
        </xsd:complexType>
      </xsd:element>
    </xsd:complexType>
  </xsd:element>
  <xsd:complexType name="point">
    <xsd:element name="x" type="xsd:integer"/>
    <xsd:element name="y" type="xsd:integer"/>
  </xsd:complexType>
  <xsd:complexType name="region">
    <xsd:element name="loc" type="point" minOccurs="3" maxOccurs="*"/>
  </xsd:complexType>
  <xsd:element name="mousepos" type="point"/>
  <xsd:element name="focusarea" type="region"/>
</xsd:schema>
<videodoc>
  <videoseg>
    <videoAIU id="object01">
      <shot>
        <area>
          <loc><x>254</x><y>161</y></loc>
          <loc><x>254</x><y>270</y></loc>
          <loc><x>370</x><y>270</y></loc>
          <loc><x>370</x><y>161</y></loc>
        </area>
        <frame>1</frame>
      </shot>
      <shot>
        <area> ...</area>
        <frame>66</frame>
      </shot>
      ...
    </videoAIU>
    <videoAIU id="object02">... </videoAIU>
    ...
  </videoseg>
  <videoseg>
    <videoAIU id= ...> ... </videoAIU>
    <videoAIU id= ...> ... </videoAIU>
    ...
  </videoseg>
</videodoc>

Figure 1. An industrial video
(Left: keyframes in video segments found by the AIU extractor, with
regions of interest outlined in red. Right: video object locations in
each video segment, with a sequence of shots exhibiting the motion
changes of video objects.)
A brief introduction to MMDOC-QL
MMDOC-QL is our proposed multimedia document query language for structured
information retrieval. An example query is "find the ids of all
video objects that appear at the mouse click position
(x0, y0) in a shot".
GENERATE:<List>%objectnum <List>
FROM: video.xml
PATTERN: {"object"[0-9][0-9]*/%objectnum};
{<mousepos><x>x0</><y>y0</></mousepos>/%mpos};
CONTEXT: {(<videoAIU> with id=%objectnum) containing
{<area>/&reg}
and POINT-INSIDE(&reg %mpos)};
MMDOC-QL has four clauses. The GENERATE clause
describes the final results of the query as documents. The FROM clause
names the source documents to query. The PATTERN clause
describes the domains of logical variables in the form of regular
expressions or document elements. The CONTEXT clause
describes document element constraints as first-order logical expressions.
A logical expression consists of primitive document path expressions
and element datatype expressions, including spatial
and temporal datatype operators.
The logical variables declared in the PATTERN clause are of two
kinds: string and element. By
default, the domain of a string variable is the set of allowable strings in
tag names, tag attributes or tag values of the documents named in the FROM clause. The domain
of an element variable is the set of allowable elements in those
documents. The domains bound the values that logical variables may take to satisfy
the first-order logical expressions in the CONTEXT clause. Free
variables are indicated by "%". A free variable is interpreted under the "for all" quantifier
in the CONTEXT clause. In the above example, "%objectnum" and "%mpos"
are free variables. A variable indicated by "&" denotes a bound variable
under the "there exists" quantifier in the CONTEXT clause. In the above
example, the element variable "&reg" denotes the existence of one
element <area> satisfying the document path expression "(<videoAIU>
with id=%objectnum) containing <area>".
A document path expression is a logical statement that specifies document
element constraints. The constraints are expressed through element path relationships:
the parent/child relationship, the sibling relationship, and tag attributes. Parent/child
constraints are described by the keywords inside, directly
inside, containing, directly containing,
etc. Sibling constraints are described by the keywords before, immediately
before, after, immediately after, sibling, immediately
sibling, etc. Element attribute constraints are described by the keyword with.
In the above example, the document path expression "(<videoAIU> with id=%objectnum)
containing {<area>/&reg}" constrains the element "<videoAIU>"
through its tag attribute id and its containment of an element "<area>".
"&reg" is a variable denoting the existence of one element "<area>"
that satisfies the path expression.
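The path expression in the example can be read as an existence test over the document tree. The following Python sketch is only an illustration of that reading, not the MMDOC-QL implementation: the helper names (containing, directly_containing, aiu_areas) and the small inline document are hypothetical, and the standard xml.etree.ElementTree module stands in for a document store.

```python
import re
import xml.etree.ElementTree as ET

# Domain of %objectnum as given in the PATTERN clause: "object"[0-9][0-9]*
OBJECT_ID = re.compile(r"object[0-9][0-9]*")

def containing(element, tag):
    """Descendants of `element` with the given tag, at any depth."""
    return [e for e in element.iter(tag) if e is not element]

def directly_containing(element, tag):
    """Immediate children of `element` with the given tag."""
    return element.findall(tag)

def aiu_areas(root):
    """Hypothetical evaluation of '(<videoAIU> with id=%objectnum) containing <area>'.

    Returns {id: [area elements]} for every videoAIU whose id lies in the
    %objectnum domain and for which at least one <area> exists.
    """
    bindings = {}
    for aiu in root.iter("videoAIU"):
        obj_id = aiu.get("id", "")
        if OBJECT_ID.fullmatch(obj_id):
            areas = containing(aiu, "area")
            if areas:  # "there exists" an <area> inside this videoAIU
                bindings[obj_id] = areas
    return bindings

# A tiny illustrative document (not the video of Figure 1).
doc = ET.fromstring(
    "<videodoc><videoseg>"
    '<videoAIU id="object01"><shot><area/><frame>1</frame></shot></videoAIU>'
    '<videoAIU id="object02"/>'
    '<videoAIU id="scene1"><shot><area/></shot></videoAIU>'
    "</videoseg></videodoc>"
)
print(sorted(aiu_areas(doc)))  # ['object01']
```

Here "object02" is rejected because it contains no <area>, and "scene1" because its id falls outside the pattern domain. Note that directly_containing would return nothing for "object01", since its <area> sits one level down inside <shot>.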
Element datatype expressions describe arithmetic expressions
over datatype operations, including spatial/temporal datatype operations.
Note that aggregation functions can be viewed as a special case of these datatype
operations, since they operate on input data of the real number datatype. The spatial
and temporal datatype operations are stereotypical functional computations
of temporal and spatial relationships such as SIZE, DISTANCE, DIRECTION, COVER,
TIME-BEFORE, etc. In the above example, POINT-INSIDE(element1:region
element2:point) is a spatial operation that returns a value of the boolean datatype:
"true" if the point is inside the region, and "false" otherwise.
The details of spatial and temporal datatype operations are addressed in the
next section.
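As an illustration of what a POINT-INSIDE operator might compute, the following Python sketch uses the standard ray-casting test for a point against a polygonal region. This is one common technique chosen for the example, not necessarily the one used in our system; the function name point_inside and the tuple-based representation of points and regions are assumptions. Points lying exactly on the region boundary are not treated specially.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Region = List[Point]  # polygon vertices, listed in order around the boundary

def point_inside(region: Region, point: Point) -> bool:
    """Ray-casting test: count boundary crossings of a ray going right from `point`."""
    x, y = point
    inside = False
    n = len(region)
    for i in range(n):
        x1, y1 = region[i]
        x2, y2 = region[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# The region of the first shot of object01 in the video document above.
area = [(254, 161), (254, 270), (370, 270), (370, 161)]
print(point_inside(area, (300, 200)))  # True
print(point_inside(area, (100, 100)))  # False
```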
Specifying multimedia objects as spatial and temporal datatypes
In general, three kinds of datatypes are used
to model multimedia objects: spatial, temporal and spatio-temporal. All of these
datatypes can be formalized as XML element datatypes. Stereotypical spatial
and temporal operators can be defined for specifying scheduled relationships
of multimedia objects. We believe that this ADT scheme is general enough to
specify multidimensional coordinate spaces, such as the finite coordinate space (FCS) and event schedules
in HyTime documents, for multimedia object querying and processing. For ease
of operator composition, every spatial and temporal datatype operator
is required to produce output in a legal datatype.
- The primitive spatial datatypes are point, polyline and region.
Composite spatial datatypes can be constructed from these primitives.
Spatial datatypes are used to model geometric data, such as cities as points,
routes as polylines, etc. Typical spatial operators are SIZE(element:region),
DISTANCE(element1:point element2:point), POINT-INSIDE(element1:region element2:point),
EAST-DIRECTION(element1:region element2:region), OVERLAP(element1:region
element2:region), etc.
- The primitive temporal datatypes are instant, interval and period.
Temporal datatypes are used to specify multimedia objects in dynamic media
such as audio, animation, video, etc. The 13 temporal interval relations of [Allen 83] can be defined as operators on the temporal XML datatypes. Some
examples are MEET(element1:interval element2:interval), TIME-BEFORE(element1:interval
element2:interval), etc.
- The spatio-temporal datatypes are used to model spatial data changing
over a period of time. Examples are geometric objects moving in video clips,
the moving paths of weather systems, etc. All spatio-temporal datatypes are composite
datatypes. For instance, moving points are constructed from the "point" spatial
datatype and the "instant" temporal datatype. Moving regions are constructed from
the "region" spatial datatype and the "instant" temporal datatype. Examples of spatio-temporal
operators are DURATION(element:moving-point), MAX-COVER-AREA(element:moving-region),
and so on.
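The temporal operators mentioned in the list above can be sketched in Python as follows. The sketch classifies a pair of intervals into one of the 13 relations of [Allen 83]; MEET and TIME-BEFORE then become simple tests on the result. The Interval class, the use of plain numbers for instants, and the relation names returned as strings are assumptions made for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    start: float  # instants are plain numbers here, e.g. seconds or frame numbers
    end: float

    def __post_init__(self):
        if not self.start < self.end:
            raise ValueError("an interval needs start < end")

def allen_relation(a: Interval, b: Interval) -> str:
    """Return which of the 13 Allen relations holds between a and b."""
    if a.end < b.start:
        return "before"
    if a.end == b.start:
        return "meets"
    if b.end < a.start:
        return "after"
    if b.end == a.start:
        return "met-by"
    # From here on, the two intervals share more than a single instant.
    if a.start == b.start and a.end == b.end:
        return "equals"
    if a.start == b.start:
        return "starts" if a.end < b.end else "started-by"
    if a.end == b.end:
        return "finishes" if a.start > b.start else "finished-by"
    if b.start < a.start and a.end < b.end:
        return "during"
    if a.start < b.start and b.end < a.end:
        return "contains"
    return "overlaps" if a.start < b.start else "overlapped-by"

def meet(a: Interval, b: Interval) -> bool:
    """MEET(element1:interval element2:interval)."""
    return allen_relation(a, b) == "meets"

def time_before(a: Interval, b: Interval) -> bool:
    """TIME-BEFORE(element1:interval element2:interval)."""
    return allen_relation(a, b) == "before"

print(allen_relation(Interval(0, 5), Interval(5, 10)))  # meets
print(allen_relation(Interval(2, 7), Interval(5, 10)))  # overlaps
```

Because every operator returns a value in a legal datatype (here a boolean or a string), the results can be combined freely in larger expressions, in line with the composition requirement stated above.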
All spatial and temporal datatypes can be formalized in the XML Schema framework.
Some examples are shown below.
<xsd:complexType name="polyline">
  <xsd:element name="loc" type="point" minOccurs="3" maxOccurs="*"/>
</xsd:complexType>
<xsd:simpleType name="instant" base="xsd:time">
</xsd:simpleType>
<xsd:complexType name="interval">
  <xsd:element name="start" type="instant"/>
  <xsd:element name="end" type="instant"/>
</xsd:complexType>
<xsd:complexType name="moving-point">
  <xsd:element name="loc" type="point"/>
  <xsd:element name="timestamp" type="instant"/>
</xsd:complexType>
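To make the composite datatypes more concrete, the following Python sketch shows how operators such as SIZE, DURATION and MAX-COVER-AREA might be computed once the elements have been read into memory. It rests on several assumptions that are not part of the schema above: a moving point or moving region is represented as a list of (value, timestamp) samples, timestamps are plain numbers, SIZE is computed with the shoelace formula, and MAX-COVER-AREA is read as the largest area among the sampled regions.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Region = List[Point]                        # polygon vertices in boundary order
MovingPoint = List[Tuple[Point, float]]     # (location, timestamp) samples
MovingRegion = List[Tuple[Region, float]]   # (region, timestamp) samples

def size(region: Region) -> float:
    """SIZE: polygon area via the shoelace formula."""
    total = 0.0
    n = len(region)
    for i in range(n):
        x1, y1 = region[i]
        x2, y2 = region[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def duration(moving_point: MovingPoint) -> float:
    """DURATION: time span between the earliest and latest sample."""
    times = [t for _, t in moving_point]
    return max(times) - min(times)

def max_cover_area(moving_region: MovingRegion) -> float:
    """MAX-COVER-AREA, read here as the largest sampled area (an assumption)."""
    return max(size(region) for region, _ in moving_region)

# Illustrative samples; only the rectangle comes from the video document above.
track = [((254, 161), 0.0), ((260, 170), 2.5), ((275, 180), 4.0)]
rect = [(254, 161), (254, 270), (370, 270), (370, 161)]
tri = [(0, 0), (0, 10), (10, 10)]

print(duration(track))                              # 4.0
print(size(rect))                                   # 12644.0
print(max_cover_area([(rect, 0.0), (tri, 1.0)]))    # 12644.0
```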
Querying multimedia objects from spatial and temporal relationships
Complex multimedia object queries can be formed by using spatial and
temporal operators to specify the scheduled constraints. In the following
example, we specify the query "find all video objects, together with their frame numbers,
that appear in a focus area of the video display window".
GENERATE:<List>%objectnum %frame-number <List>
FROM: video.xml
PATTERN: {"object"[0-9][0-9]*/%objectnum};
{[0-9][0-9]*/%frame-number};
{<focusarea> ...</focusarea>/%focus};
CONTEXT: {(<videoAIU> with id=%objectnum) containing
(<frame> containing "%frame-number")
sibling {<area> /%reg}
and OVERLAP(%reg %focus)};
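The intent of this query can be sketched procedurally in Python. The sketch below is not an MMDOC-QL engine: it walks a small well-formed sample document with the standard xml.etree.ElementTree module and approximates OVERLAP by comparing bounding boxes, which is a simplification chosen for brevity. The sample data for "object02", the focus area coordinates, and all function names are hypothetical.

```python
import xml.etree.ElementTree as ET

VIDEO_XML = """
<videodoc>
  <videoseg>
    <videoAIU id="object01">
      <shot>
        <area>
          <loc><x>254</x><y>161</y></loc>
          <loc><x>254</x><y>270</y></loc>
          <loc><x>370</x><y>270</y></loc>
          <loc><x>370</x><y>161</y></loc>
        </area>
        <frame>1</frame>
      </shot>
    </videoAIU>
    <videoAIU id="object02">
      <shot>
        <area>
          <loc><x>10</x><y>10</y></loc>
          <loc><x>10</x><y>40</y></loc>
          <loc><x>50</x><y>40</y></loc>
        </area>
        <frame>66</frame>
      </shot>
    </videoAIU>
  </videoseg>
</videodoc>
"""

def read_region(area):
    """Turn an <area> element into a list of (x, y) vertices."""
    return [(int(loc.findtext("x")), int(loc.findtext("y")))
            for loc in area.findall("loc")]

def bbox(region):
    xs = [x for x, _ in region]
    ys = [y for _, y in region]
    return min(xs), min(ys), max(xs), max(ys)

def overlap(r1, r2):
    """Approximate OVERLAP: true when the bounding boxes intersect."""
    ax0, ay0, ax1, ay1 = bbox(r1)
    bx0, by0, bx1, by1 = bbox(r2)
    return ax0 <= bx1 and bx0 <= ax1 and ay0 <= by1 and by0 <= ay1

def objects_in_focus(xml_text, focus):
    """Collect (object id, frame number) pairs whose area overlaps `focus`."""
    root = ET.fromstring(xml_text)
    results = []
    for aiu in root.iter("videoAIU"):
        for shot in aiu.findall("shot"):
            area = shot.find("area")     # <area> and <frame> are siblings
            frame = shot.findtext("frame")
            if area is None or frame is None:
                continue
            if overlap(read_region(area), focus):
                results.append((aiu.get("id"), int(frame)))
    return results

FOCUS = [(300, 200), (300, 400), (500, 400), (500, 200)]  # hypothetical focus area
print(objects_in_focus(VIDEO_XML, FOCUS))  # [('object01', 1)]
```

A full scan like this is exactly what a spatial index is meant to avoid; the bounding-box comparison used here is also the usual first filtering step in such indexes before an exact geometric test.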
Related work
Two types of related work are described here. One is related to query
languages. Based on underlying models and media types, query languages can
be generally classified as free form query, relational structured query and
document structured query, as shown in Table 1.
| Media Types / Data Modeling | Free Forms | Relational Tables | Structured Documents |
| Textual | Free-Text Retrieval | Relational Structured Query (SQL) | Document Structured Query ([XML-QL 99] [Lorel 00] [YATL 98] [XQL 98] [SDQL 96] MMDOC-QL) |
| Non-Textual | Content-Based Query [Del Bimbo 99] | Multimedia Database Query (SQL/MM, SQL/Temporal) | Multimedia Document Query (MMDOC-QL) |
Table 1. The spectrum of information retrieval
MMDOC-QL distinguishes itself from other work by supporting multimedia
object queries based on spatial and temporal relationships in structured
documents. The other type of related work is standardization work on spatial
and temporal specifications and queries, which we describe in the following.
ISO HyTime
[HyTime 97], based on SGML, uses
the finite coordinate space (FCS) to define scheduled structures
and events. These event schedules are designed specifically for HyTime document
presentation. FCS defines an abstract and system-independent method of specifying
spatial and temporal information, separated from the content to be presented, as
event schedules in a multidimensional coordinate space. The design is motivated
by presentation abstraction rather than information retrieval.
HyTime provides only limited indexing support for querying spatial/temporal
media objects and structures.
W3C SMIL
[SMIL 98] is based on XML and defines spatial
and temporal layouts for SMIL document playout. The layout information
relates to media display windows on a screen and media playing time. Thus,
the spatial and temporal structures provided in SMIL are also for presentation
purposes rather than for a storage representation to be accessed. Furthermore,
there are structural differences between the storage and presentation representations
[Rutledge 98] [Liu 99]. Often, the presentation forms are not sufficient for
storage representation. Spatial and temporal query processing is often less
emphasized in presentation-oriented multimedia specifications.
SQL/MM and SQL3/Temporal
[SQL Standardization Projects] are new ISO standardization
projects for extending database query language capability to specify and manage
multimedia objects and temporal information in the relational data model.
Both focus on integrating time- or space-dependent multimedia objects
into relational data models for querying. However, multimedia document models
impose querying requirements that are quite different from those of the relational
table model, since not only document content but also document structures must
be available for retrieval. Query specifications based on relational
data models would therefore limit the retrieval capability for document models.
Concluding remarks
Proposed XML document query languages have limitations in querying
multimedia document objects that are related temporally or spatially.
Most of these languages focus on textual and hierarchical structures and
offer little support for the temporal and spatial relationships of multimedia
objects, which are not hierarchical. In this paper, we tackle
these limitations by specifying multimedia objects as spatial and temporal
datatypes in XML. The relationships among multimedia objects can then be specified
by using spatial and temporal datatypes and their operators for XML document
retrieval.
The main contributions of this paper are twofold. (1) We provide a method to
specify spatial and temporal datatypes in XML for querying multimedia objects.
We illustrate several flavors of spatial and temporal datatypes defined as element
datatypes, which can be formalized within the XML Schema Part 2: Datatypes
framework. Therefore, multimedia objects and scheduled structures can be smoothly
integrated into XML document models for retrieval. (2) We design a multimedia
document query language, MMDOC-QL, along with stereotypical spatial and temporal
operators to retrieve multimedia objects in XML documents. Many spatial/temporal
indexing methods
[Manolopoulos 00] [Vazirgiannis 98]
are currently available for supporting and optimizing query processing over
these spatial/temporal datatypes.
Bibliography
| [SDQL 96] | ISO/IEC 10179:1996 Information Technology - Processing Languages
- Document Style Semantics and Specification Language (DSSSL). |
| [HyTime 97] | ISO/IEC 10744:1997 Hypermedia/Time-based Structuring
Language (HyTime), Second Edition. |
| [Rutledge 98] | L. Rutledge, L. Hardman, J. van Ossenbruggen and
D. C. A. Bulterman, “Structural Distinctions Between Hypermedia Storage
and Presentation,” in Proc. ACM Multimedia 98, September 1998, pp. 145-150. |
| [Allen 83] | J. F. Allen, Maintaining Knowledge about Temporal Intervals,
Comm. ACM 26(11), 1983. |
| [Vazirgiannis 98] | M. Vazirgiannis, Y. Theodoridis and T. Sellis, Spatio-Temporal
Composition and Indexing for Large Multimedia Applications, ACM Multimedia
Systems, 6(4), 1998, pp. 284-298. |
| [Erwig 99] | M. Erwig, R. H. Güting, M. Schneider and M. Vazirgiannis,
Spatio-Temporal Data Types: An Approach to Modeling and Querying Moving Objects
in Databases, GeoInformatica, Vol. 3, No. 3, 1999. |
| [Manolopoulos 00] | Y. Manolopoulos, Y. Theodoridis and V. J. Tsotras,
Advanced Database Indexing, Kluwer Academic Publishers, 2000. |
| [SMIL 98] | Synchronized Multimedia Integration Language (SMIL)
1.0 Specification, W3C Recommendation, 15 June 1998. |
| [XML 98] | Extensible Markup Language (XML) 1.0, W3C Recommendation,
10 February 1998. |
| [XML Schema Part 1: Structures] | XML Schema Part 1: Structures,
W3C Working Draft, 25 February 2000. |
| [XML Schema Part 2: Datatypes] | XML Schema Part 2: Datatypes,
W3C Working Draft, 25 February 2000. |
| [SQL Standardization Projects] | SQL Standard Reference Page,
http://www.jcc.com/SQLPages/jccs_sql.htm |
| [Chakraborty 99] | A. Chakraborty, P. Liu and L. Hsu, Authoring
and Viewing Video Documents using SGML structure, 1999 IEEE International
Conference on Multimedia Computing and Systems, Florence, Italy, pp. 654-660. |
| [Liu 99] | P. Liu, Y. F. Day, L. H. Hsu, Automatic Generation of
DSSSL Specifications for Transforming SGML Documents into Card-Based Presentations,
GCA Markup Technologies 99, PA, USA. |
| [XML-QL 99] | A. Deutsch, M. Fernandez, D. Florescu, A. Levy and
D. Suciu, A Query Language for XML, WWW'99. |
| [YATL 98] | Your Mediators Need Data Conversion, ACM-SIGMOD 1998. |
| [Lorel 00] | S. Abiteboul, P. Buneman, and D. Suciu, Data on the
Web, Morgan Kaufmann, 2000. |
| [XQL 98] | J. Robie and J. Lapp, XML Query Language, QL'98, http://www.w3c.org/TandS/QL/QL98/ |
| [Del Bimbo 99] | A. Del Bimbo, Visual Information Retrieval,
Morgan Kaufmann, 1999. |