|
Answer is just a question [of matching Topic Maps]
|
 |
Expressive power of Topic Maps, commonly perceived as method for indexing
of information resources, places the standard very close to artificial intelligence
and knowledge modelling. Topic Maps resemble semantic networks and conceptual
graphs, but offer more - a unique, standards-based way of encoding and exchange
of the knowledge.
Starting from description of similarities and differences between artificial
intelligence and Topic Maps, paper demonstrates artificial intelligence features
worth incorporating into Topic Maps domain along with practical benefits they
provide.
Further on it focuses on applications in information filtering and retrieval
using proposed topic-map-defined query language and user profile definition.
Introduction
TM, ISO/IEC 13250:1999 standard, describe what
an information set is about, by formally declaring topics and associations,
and by linking the relevant parts of the information set to the appropriate
topics. Topic Maps help organise and retrieve online information in a way
that it can be mastered by information owners and information users.
TM play the same role as indexes play in books,
and that thesauri play in editorial environment.
Due to its generality and expressive power they go far beyond meta-information
modelling. Topic Maps can express facts, procedures and fairly complex relations
between concepts. As such they come close to the knowledge representation,
branch of
AI.
Current paper presents the AI knowledge representation foundations as
well as points out which of them may be useful for
TM
field of applications. In the following chapters information retrieval and
filtering model is introduced and applied to
TM
environment.
Last chapter presents proposal for
TMQL
syntax and semantic. It contains discussion of
TMQL
usage scenarios. In particular it discusses
TMQL
application to user profile definition as well as
ad
hoc query formulation on basis of already defined profile.
Knowledge representation
AI makes the claim that computers
can be made to "think" on a level equal to humans, or at least states that
some "thinking-like" features can be added to computers to make them more
useful tools. "Think" and "thinking-like" means to use knowledge representation
data structures and apply to it thinking representation processing. Knowledge
representation techniques like semantic networks or conceptual graphs turn
what we know about a particular domain into a form in which a computer can
understand it.
Semantic networks
SN [Lehm92] create
models involving nodes and links (arcs or arrows) between nodes. The nodes
represent objects or concepts and the links represent relations between nodes.
The links are directed and labelled, thus, a semantic network is a labeled,
directed graph. The network defines a set of binary relations on a set of
nodes. For instance, suppose that one wanted to represent the fact that Frederic
Chopin was born in Poland. One might represent Chopin as one node, Poland
as another node, and the relation
was born in as a link between
the two nodes. The fact that Chopin also composed mazurka might be represented
by adding a node "mazurka" and connecting it to the "Chopin" node with instance
of a
composed link (see
Figure 1).

Figure 1
In a semantic net nodes represent objects and links represent relations
among these objects. One can think of the objects as concepts or meanings
and the relations among them conceptual relations (an intensional account)
or think of the objects as entities in the world and relations among them
(an extensional account). Notice that while the relations allowed are only
binary this does not restrict the expressiveness of the semantic net. Any
n-ary relation can be decomposed into a binary relation. The commitment to
binary relations does has some effect on the fluency of the representation
as this decomposition may do some damage to original conception underlying
the construction of the network. Another restriction on semantic network representations
as opposed to the more powerful frame based representations is a difficulty
in representing kinds of things and natural classes. It is hard to represent
what we mean by saying Paris is a city in a pure semantic net since
a notion of inheritance of properties in a type hierarchy is lacking.
Conceptual Graphs
The idea of
CG [Sowa00]
is a combination of logic with the semantic networks used in AI and computational
linguistics. A concept graph is finite contacted bipartite graph. There are
two kinds of nodes; concept nodes (displayed as a box in graph notation) which
represent entities, attributes, states, and events, and relation nodes (displayed
as a circle in graph notation) which represent the relationship among concept
nodes. A single concept by itself (
Figure 2-a) may form a
conceptual graph but it is not the case with the relation (
Figure 2-b),
since every relation node should have one or more arcs each of which must
be linked to some concept (
Figure 2-c).

Figure 2
A conceptual graph can be constructed by assembling percepts. In the
process of assembly, concept relations specify the role that each percept
plays and concepts represent percepts themselves. A concept can be an individual
or generic. The function referent maps
concepts into a generic markers denoted with names starting from * or individual
markers usually denoted by numbers. The function type
maps concepts into a set of type labels. A concept c with type(c)=t
and reference(c)=f is displayed as [t:f] in the
linear form. The function type can be applied to relation also. A relation r
with type(r)=t is displayed as (t) in the linear
form.
Types of concepts (type labels) are organized into a hierarchy called
type hierarchy. Type hierarchy forming operations includes conjunction and
disjunction operations, so the type hierarchy will be a formal lattice.
For example, a sentence Chopin composed mazurka can be represented
in conceptual graphs as:
[CHOPIN]<-(AGNT)<-[COMPOSE]->(OBJ)->[MAZURKA]
or if we introduce individual concept - mazurka Opus 23:
[CHOPIN]<-(AGNT)<-[COMPOSE]->(OBJ)->[MAZURKA:23]
For example, a sentence "Chopin composed mazurka in his beloved Paris"
can be represented by the following linear form:
[COMPOSE]-
->(AGNT)->[CHOPIN:*x]
->(OBJ)->[MAZURKA]
->(PLACE)->[PARIS]<-(THME)<-[LIKE]<-(AGNT)<-[CHOPIN:*x]
This linear form finds the following graph representation (
Figure 3).

Figure 3
Lesson to be learned from AI knowledge representation techniques
TMs are also to be used for knowledge representation purposes. Therefore
they will face with typical
AI problems,
including completeness of the language, effectivity and nonambiguity. It's
worth to make a comparison of the
AI
languages presented above with topic maps.
Similarities and differences
SN use nodes and labeled, directed
arcs to express knowledge. It is not clearly defined whether used entities
represent classes or particular instances of classes. Lack of clear specification
of roles, which can be played by entities representing concepts and relations,
was found to cause misinterpretations and ambiguity of the semantic in the
model
[4].
Conceptual graphs are more precise and mature formulation of
SN. They relay on more atomic constructs, which
used in defined roles of concept and relation are able to build higher order
assertions.
CG have expressive power
equivalent to predicate logic and thus may represent natural language statements
as well as any other data or knowledge expressible by natural language. This
includes ability to express
TM data
using conceptual graphs.
Topic maps offer larger building blocks than
CG:
topic types, associations with types and association roles with association
role types.
TM identify scope, although
it could be expressed with special association type. The same applies to occurrences
and facets. Generally,
TM offer constructs
with more predefined semantic that
CG
and
SN
This is good on one hand but bad on the other, since we have a lot of
ready made instruments to use, on contrary if we need to build a custom one
- we have no basic materials. This would be the case with implementing
CG concept of
context
- used for instance to express assertions within the content of somebodies
belief.
One of the lessons to be learned from
AI
is that
TM should treat all of its
constructs (including associations, association roles) as first class objects
(topics), which is not allowed at the moment by the standard. This heterogenity
may cause problems in building elegant and compact data models leveraging
work done in
AI [5].
Lesson of semantic networks shows that we have to keep clear distinction
between class and its instance levels for association and its association
type (e.g. association type: composed,
association: composed#23182), association
role and its type (e.g. association role type: composer;
association role: composer#98723) and
for other such constructs.
As shown in the previous sections,
CG
research has developed linear and graph notations that may be taken over by
TM users.
Based on achievements in textual representation of
AI
knowledge structures we may propose analogous language for coding of
TMs. Such language will enable
TM
interchange using more compact language and without in-depth knowledge of
SGML or
XML syntax.
The proposed language is based conceptually on linear notation for conceptual
graphs, though because
TM use number
of semantic constructs we are forced to introduce more verbose syntax.
Table 1 summarises basic constructs of the proposed language.
Square and round brackets used in
CG
for concept and relation are substituted by explicit name of construct with
its identifier in square brackets. Directed arcs are replaced by dot, since
we no longer need to mark directionality of relations which is expressed in
TMs by association roles.
Presented language will be used later as a syntax foundation for proposed
TM query language.
| topic | topic[] |
| topic with id "chopin" | topic["chopin"] |
| topic with declared topic type "person" |
topic[].topic-type["person"] |
| association with id "as243" | assoc["as243"] |
| association with declared association type
"born-in" | assoc["as243"].assoc-type["born-in"] |
| topic "chopin" plays a unspecified role in
association which type is "born-in" | topic["chopin"].assoc-role[].assoc[].assoc-type["born-in"] |
| topic "chopin" plays a role of type
"who" in association which type is "born-in" | topic["chopin"].assoc-role[ar1].assoc[].assoc-type["born-in"],
assoc-role[ar1].assoc-role-type["who"] |
| topic "chopin" plays a role of type "who" in association
which type is "born-in" where the other role of type "where" is played by
"poland", topic "poland" is of type "country" | topic["chopin"].assoc-role[ar1].assoc[a].assoc-role[ar2].topic["poland"],assoc[a].assoc-type["born-in"], assoc-role[ar1].assoc-role-type["who"],assoc-role[ar2].assoc-role-type["where"] topic["poland"].topic-type["country"] |
Table
1
In fact whether we use "
assoc[a]" instead of "
(a)"
and "
." instead of "
->" is a matter of markup minimization
and configuration of delimiters used. This issue is well known to SGML community.
So one could propose isomorphic syntax but with more compact delimiters as
a useful language e.g. for rapid coding of prototype
TMs.
Graphical notation
As useful as textual notation is graphical representation of
TM structures. It may serve as one of possible
graphical user interfaces for
TM display
on screen, as well as for editing and browsing purposes. Current proposal
was inspired by graphical representation of conceptual graphs.
Table 1
provides graphical equivalents for the previously introduced TM statements:
| topic |  |
| topic with id "chopin" |  |
| topic with
declared topic type "person" |  |
| association with id "as243" |  |
| association
with declared association type "born-in" |  |
| topic "chopin"
plays a unspecified role in association which type is "born-in" |  |
| topic
"chopin" plays a role of type "who" in association which type is "born-in" |
 |
| topic "chopin" plays a role of type "who" in association
which type is "born-in" while the other role of type "where" is played by
topic "poland", which is of type "country" |  |
Table
2
Information retrieval and filtering for Topic Maps
After having defined linear and graphical notations for
TMs
we may try to approach
IR
for
TM-encoded data. Information retrieval
is usually described
[Belk92] as way of leading the
user to those documents that will best enable him/her to satisfy his/her need
for information.
IR is typically concerned
with single uses of the system, by a person with one-time goal and one-time
query, while information filtering is concerned with repeated uses by a person
or persons with long-term goals or interests. In
IR ad hoc goals of user are expressed by him/her in
a form of a query. In information filtering stable interests of user are expressed
in the form of user profile, which within a filtering process plays the role
similar to the role of a query.
In case of
TM-based application
retrievable resources may be considered as consisting of:
- information resources (actual data, documents, illustrations, audio,
also non-TM metadata)
- topic map domain data (metadata expressed in form of TM)
Thus potentially retrieval may be twofold: at the level of information
resources and at the level of topic map. In current research we focus for
simplicity on retrieval in the topic map domain. General model will require
combining of standard retrieval and filtering techniques for resources domain
with the presented one for
TMs.
In typical applications content is embered in a fixed and known data
structure. In TM applications user is responsible for creating content, its
structure and data model for information. This results in structural diversity
and complexity of
TM data undergoing
retrieval. Because of lack of data model (in the sense of e.g. relational
databases) for the
TM domain, retrieval
has to incorporate meta-queries about data structure along with queries for
data.
Thus
TM querying capabilities
may be perceived as generalisation of relational queries. (It's good design
principle to extend existing functionality instead of proposing completely
new one.) So one of the principles while designing
TMQL
was to leverage existing and popular querying solutions, namely
SQL. It was also inspired by research
done in the area of information retrieval in conceptual graphs
[8]
and graphical query languages developed for hypertext structures
[1] [3] .
RDB as a Topic Map
The most frequently used information representation and retrieval model
RDB and
SQL
is worth being investigated for similarities with
TM
retrieval.
Typical
RDB structures may be
expressed using
TM model. Sample representation
is shown on
Figure 4.

Figure 4
Presented
TM model for relational
database shows simple two level hierarchical entity structure. It consists
of tables (
employee and
address)
grouped by
contacts association and individual
field values (e.g.
John) bound in records
to the table by association of type
record in table.
The designed query language should extend
SQL
in a way that it handles the above but also more liberal data structures.
TMQL proposal
In
SQL pair of names connected
with a dot:
employee.name denotes all values of field
name
in the table
employee. Because of fixed data architecture, roles
played by the two parts of the expression are set to
table
and
field, and we evaluate
field part. But it is frequently of interest what
are the tables that contain field called
name. Such reverse querries
or querries about data structure are not easily expressible in
SQL1. In our
TM model of relational
database we would just use the same association which is used to answer the
question for field value to answer the question of table names where the field
belongs to. It would be enough to pick the other end of topic association.
The role entities play in our
TM
data model may be defined by linear notation introduced in section
"Linear notation for
TM"
. If we preserve SQL division of select query into traditional
parts:
select,
from
and
where we may start to build out first
query:
select topic[x]
from
topic[x]
where
x="Chopin"
or the other version of it, which returns all topics of type "composer":
select
topic[x]
from
topic-type["composer"].topic[x]
On the other hand we could ask for all the types of topic "Chopin",
expecting that we will learn that besides the fact that he was "composer"
he was also "pianist":
select
topic[t]
from
topic-type[t].topic["Chopin"]
If one would like to know all the topics that are subtype of type "musician",
he would need to use join:
select
topic[x]
from
topic-type["musician"].topic[t1], topic-type[t2].topic[x]
where
t1 = t2
or equivalently:
select
topic[x]
from
topic-type["musician"].topic[t], topic-type[t].topic[x]
Figure 5
If we come back to the
TM model
for
RDB we could reformulate the basic
SQL query:
select
name
from
employee.name
writing the following TMQL equivalent:
select
topic[x]
from
topic-type["table"].topic["employee"].asoc-role[].assoc[a]
.assoc-role[ar].topic[x],
assoc[a].assoc-type["record in table"],
assoc-role[ar].assoc-role-type["name"]
TMQL select query returns a topic
map. In the examples above, it was just a list of topics, but select clause
may contain more complex structures which enable user to create useful views
over
TM infopool. The following query
represents a view that for every topic having supertype "person" makes "person"
its explicit topic type.
select
topic-type["person"].topic[x]
from
topic-type["person"].topic[t], topic-type[t].topic[x]
User profile as a TMQL query
User profile may be considered as a query that is stable over time and
returns wide range of objects related to the interest of the user. After specifying
set of such queries they may be used for filtering of incoming data or may
serve as a default filter switched on when number of hits from ad hoc queries is to large.
Let's assume that some student is interested in piano music. We could
assume that everything that is connected with both concepts: "piano" and "music"
is relevant for him/her. Thus student may encode personal interest profile
with the following query. Double dots ".." mean omission of association role,
which improves reading in cases where association roles are free.
select topic[x]
from
topic[x]..assoc[]..topic[prof-item1],
topic[x]..assoc[]..topic[prof-item2]
where
prof-item1 = "piano" and
prof-item2 like "music%"
The above query may be stored as "my-piano-music-profile" by the user.
To benefit from the profile any query executed by user should for instance
include extended where clause invoking
condition contained in the profile query. If we use as an example query asking
for all information related to Paris, the following query is expected to return
great number of hits:
select topic[x]
from
topic["Paris"]..assoc[a]..topic[x],
asoc-type["related to"].assoc[a]
Figure 6
And here is its version filtered by the user's profile, which is likely
to return information about piano concerts in Paris and hopefully also topic
of Chopin:
select topic[x]
from
topic["Paris"]..assoc[a]..topic[x],
asoc-type["related to"].assoc[a]
where
topic[x] in (my-piano-music-profile)
For useful exploration of information resources in the presented way
we need some processing of topic map structures, which enable to interpret
the association "lived in" as a specialisation of association "related to"
so that the above query is able to find "Chopin" even in case where there
is no direct association of the type "related to" between "Chopin" and "Paris"
in the map. To achieve this intelligent inference of information contained
in the map is needed. We may use the support from thesauri, which are in ideal
case to be found in the supplementary
TM.
One of the
TM properties of great importance
for information inference applications is
transitivity
property of associations, unfortunately missing in the standard at the moment.
TMQL query expressed as Topic Map
We argued that
TMs seem to take
over duties of
AI information representation
techniques. If so it should be no problem to encode
TMQL
queries as topic maps also. Let's have a look at the last query based on user
profile definition displayed with help of graphical notation:

Figure 7
What has to be expressed additionally is: what of results of this query
has to be returned back to the user. This could be easily done with a help
of
TM construct - scope, which shuld
be set as a flag (to a predefined theme) for entities to be returned.
Interesting aspect of presented approach is that users may formulate
queries either by using
TMQL syntax
or by drawing the query using graphical tool similar to
TM
editor. Query in this representation looks very much like pattern of the requested
answer with some unknown parameters. This fact should enable wide range of
users to formulate advanced queries without knowing the syntax of
TMQL. If we would add natural processing unit
at the front of query engine, then system would be capable of turning natural
language queries into their topic map representation (which could be again
refined by user) and use this form as high value input for
TM
query engine.
Figure 8 presents overview of
TM
retrieval application model.

Figure 8
This was one of the goals of
CG
research. For instance question "What is related to Paris?" can be directly
mapped to the
TM representation demonstrated
on
Figure 6 and
Figure 7.
Summary
Based on previous research in
AI
we have introduced linear and graphical notations for
TM
knowledge structures. Those notations are used further on in the definition
of proposed topic map query language.
TMQL
is positioned as a generalisation of
SQL,
such that it covers also
TM data having
more complex structure than just relational. The paper reports work in progress
rather than a result of finite research. Significant number of questions is
still to be soved.
Presented graphical notation for the queries shall enable users to express
their information needs in a form of topic pattern. Such pattern is later
on matched with information resources in order to find unknown parameters
and present them to the user in a form of view over a topic map.
Therefore answer to information request is a question of matching right
topic maps: one for a query, one for user's profile and finally one for information
resources.
Bibliography
| [Aman92] | Amann B., Scholl M. (1992) Gram:
A Graph Data Model and Query Language, |
| [Belk92] | Belkin N.J., Croft W.B. (1992) Information
Filtering And Information Retrieval: Two Sides Of The Same Coin?,
Communications of the ACM. |
| [Cruz87] | Cruz I.F., Mendelzon A.O., Wood P.T. (1987) A Graphical Query Language Supporting Recursion,
International Conference on Management of Data and Symposium on Principles
of Database Systems, ACM. |
| [Grif82] | Griffith R.L. (1982) Three Principles
of Representation for Semantic Networks, ACM Transactions on Database
Systems. |
| [Ksie99] | Ksiezyk R. (1999) Trying not
to get lost with a topic map, XML Europe '99, GCA. |
| [Lehm92] | Lehmann F., Rodin E.Y. (1992) Semantic
Networks in Artificial Intelligence, Pergamon Press. |
| [Sowa00] | Sowa J.F. (2000) Knowledge Representation:
logical, philosophical, and computational foundations, Brooks/Cole. |
| [Yang93] | Yang G., Oh J. (1993) Knowledge
Acquisition and Retrieval Based on Conceptual Graphs, Symposium
on Applied Computing, ACM. |