Answer is just a question [of matching Topic Maps]
Rafal Ksiezyk
Find


Abstract
Expressive power of Topic Maps, commonly perceived as method for indexing of information resources, places the standard very close to artificial intelligence and knowledge modelling. Topic Maps resemble semantic networks and conceptual graphs, but offer more - a unique, standards-based way of encoding and exchange of the knowledge.
Starting from description of similarities and differences between artificial intelligence and Topic Maps, paper demonstrates artificial intelligence features worth incorporating into Topic Maps domain along with practical benefits they provide.
Further on it focuses on applications in information filtering and retrieval using proposed topic-map-defined query language and user profile definition.

Contents
  1. Introduction
  2. Knowledge representation
    1. Semantic networks
    2. Conceptual Graphs
  3. Lesson to be learned from AI knowledge representation techniques
    1. Similarities and differences
    2. Linear notation for TM
    3. Graphical notation
  4. Information retrieval and filtering for Topic Maps
    1. RDB as a Topic Map
    2. TMQL proposal
    3. User profile as a TMQL query
    4. TMQL query expressed as Topic Map
  5. Summary
  6. Bibliography

Introduction
TM, ISO/IEC 13250:1999 standard, describe what an information set is about, by formally declaring topics and associations, and by linking the relevant parts of the information set to the appropriate topics. Topic Maps help organise and retrieve online information in a way that it can be mastered by information owners and information users. TM play the same role as indexes play in books, and that thesauri play in editorial environment.
Due to its generality and expressive power they go far beyond meta-information modelling. Topic Maps can express facts, procedures and fairly complex relations between concepts. As such they come close to the knowledge representation, branch of AI.
Current paper presents the AI knowledge representation foundations as well as points out which of them may be useful for TM field of applications. In the following chapters information retrieval and filtering model is introduced and applied to TM environment.
Last chapter presents proposal for TMQL syntax and semantic. It contains discussion of TMQL usage scenarios. In particular it discusses TMQL application to user profile definition as well as ad hoc query formulation on basis of already defined profile.
Previous Previous Table of Contents
Knowledge representation
AI makes the claim that computers can be made to "think" on a level equal to humans, or at least states that some "thinking-like" features can be added to computers to make them more useful tools. "Think" and "thinking-like" means to use knowledge representation data structures and apply to it thinking representation processing. Knowledge representation techniques like semantic networks or conceptual graphs turn what we know about a particular domain into a form in which a computer can understand it.
Semantic networks
SN [Lehm92] create models involving nodes and links (arcs or arrows) between nodes. The nodes represent objects or concepts and the links represent relations between nodes. The links are directed and labelled, thus, a semantic network is a labeled, directed graph. The network defines a set of binary relations on a set of nodes. For instance, suppose that one wanted to represent the fact that Frederic Chopin was born in Poland. One might represent Chopin as one node, Poland as another node, and the relation was born in as a link between the two nodes. The fact that Chopin also composed mazurka might be represented by adding a node "mazurka" and connecting it to the "Chopin" node with instance of a composed link (see Figure 1).
Figure 1
In a semantic net nodes represent objects and links represent relations among these objects. One can think of the objects as concepts or meanings and the relations among them conceptual relations (an intensional account) or think of the objects as entities in the world and relations among them (an extensional account). Notice that while the relations allowed are only binary this does not restrict the expressiveness of the semantic net. Any n-ary relation can be decomposed into a binary relation. The commitment to binary relations does has some effect on the fluency of the representation as this decomposition may do some damage to original conception underlying the construction of the network. Another restriction on semantic network representations as opposed to the more powerful frame based representations is a difficulty in representing kinds of things and natural classes. It is hard to represent what we mean by saying Paris is a city in a pure semantic net since a notion of inheritance of properties in a type hierarchy is lacking.
Conceptual Graphs
The idea of CG [Sowa00] is a combination of logic with the semantic networks used in AI and computational linguistics. A concept graph is finite contacted bipartite graph. There are two kinds of nodes; concept nodes (displayed as a box in graph notation) which represent entities, attributes, states, and events, and relation nodes (displayed as a circle in graph notation) which represent the relationship among concept nodes. A single concept by itself ( Figure 2-a) may form a conceptual graph but it is not the case with the relation ( Figure 2-b), since every relation node should have one or more arcs each of which must be linked to some concept ( Figure 2-c).
Figure 2
A conceptual graph can be constructed by assembling percepts. In the process of assembly, concept relations specify the role that each percept plays and concepts represent percepts themselves. A concept can be an individual or generic. The function referent maps concepts into a generic markers denoted with names starting from * or individual markers usually denoted by numbers. The function type maps concepts into a set of type labels. A concept c with type(c)=t and reference(c)=f is displayed as [t:f] in the linear form. The function type can be applied to relation also. A relation r with type(r)=t is displayed as (t) in the linear form.
Types of concepts (type labels) are organized into a hierarchy called type hierarchy. Type hierarchy forming operations includes conjunction and disjunction operations, so the type hierarchy will be a formal lattice.
For example, a sentence Chopin composed mazurka can be represented in conceptual graphs as:
[CHOPIN]<-(AGNT)<-[COMPOSE]->(OBJ)->[MAZURKA]
or if we introduce individual concept - mazurka Opus 23:
[CHOPIN]<-(AGNT)<-[COMPOSE]->(OBJ)->[MAZURKA:23]
For example, a sentence "Chopin composed mazurka in his beloved Paris" can be represented by the following linear form:
[COMPOSE]-
->(AGNT)->[CHOPIN:*x]
->(OBJ)->[MAZURKA]
->(PLACE)->[PARIS]<-(THME)<-[LIKE]<-(AGNT)<-[CHOPIN:*x]
This linear form finds the following graph representation ( Figure 3).
Figure 3
Previous Previous Table of Contents
Lesson to be learned from AI knowledge representation techniques
TMs are also to be used for knowledge representation purposes. Therefore they will face with typical AI problems, including completeness of the language, effectivity and nonambiguity. It's worth to make a comparison of the AI languages presented above with topic maps.
Similarities and differences
SN use nodes and labeled, directed arcs to express knowledge. It is not clearly defined whether used entities represent classes or particular instances of classes. Lack of clear specification of roles, which can be played by entities representing concepts and relations, was found to cause misinterpretations and ambiguity of the semantic in the model [4].
Conceptual graphs are more precise and mature formulation of SN. They relay on more atomic constructs, which used in defined roles of concept and relation are able to build higher order assertions. CG have expressive power equivalent to predicate logic and thus may represent natural language statements as well as any other data or knowledge expressible by natural language. This includes ability to express TM data using conceptual graphs.
Topic maps offer larger building blocks than CG: topic types, associations with types and association roles with association role types. TM identify scope, although it could be expressed with special association type. The same applies to occurrences and facets. Generally, TM offer constructs with more predefined semantic that CG and SN
This is good on one hand but bad on the other, since we have a lot of ready made instruments to use, on contrary if we need to build a custom one - we have no basic materials. This would be the case with implementing CG concept of context - used for instance to express assertions within the content of somebodies belief.
One of the lessons to be learned from AI is that TM should treat all of its constructs (including associations, association roles) as first class objects (topics), which is not allowed at the moment by the standard. This heterogenity may cause problems in building elegant and compact data models leveraging work done in AI [5].
Lesson of semantic networks shows that we have to keep clear distinction between class and its instance levels for association and its association type (e.g. association type: composed, association: composed#23182), association role and its type (e.g. association role type: composer; association role: composer#98723) and for other such constructs.
As shown in the previous sections, CG research has developed linear and graph notations that may be taken over by TM users.
Linear notation for TM
Based on achievements in textual representation of AI knowledge structures we may propose analogous language for coding of TMs. Such language will enable TM interchange using more compact language and without in-depth knowledge of SGML or XML syntax.
The proposed language is based conceptually on linear notation for conceptual graphs, though because TM use number of semantic constructs we are forced to introduce more verbose syntax. Table 1 summarises basic constructs of the proposed language.
Square and round brackets used in CG for concept and relation are substituted by explicit name of construct with its identifier in square brackets. Directed arcs are replaced by dot, since we no longer need to mark directionality of relations which is expressed in TMs by association roles.
Presented language will be used later as a syntax foundation for proposed TM query language.
topictopic[]
topic with id "chopin"topic["chopin"]
topic with declared topic type "person" topic[].topic-type["person"]
association with id "as243"assoc["as243"]
association with declared association type "born-in"assoc["as243"].assoc-type["born-in"]
topic "chopin" plays a unspecified role in association which type is "born-in"topic["chopin"].assoc-role[].assoc[].assoc-type["born-in"]
topic "chopin" plays a role of type "who" in association which type is "born-in"topic["chopin"].assoc-role[ar1].assoc[].assoc-type["born-in"], assoc-role[ar1].assoc-role-type["who"]
topic "chopin" plays a role of type "who" in association which type is "born-in" where the other role of type "where" is played by "poland", topic "poland" is of type "country"topic["chopin"].assoc-role[ar1].assoc[a].assoc-role[ar2].topic["poland"],assoc[a].assoc-type["born-in"], assoc-role[ar1].assoc-role-type["who"],assoc-role[ar2].assoc-role-type["where"] topic["poland"].topic-type["country"]
Table 1
In fact whether we use "assoc[a]" instead of "(a)" and "." instead of "->" is a matter of markup minimization and configuration of delimiters used. This issue is well known to SGML community. So one could propose isomorphic syntax but with more compact delimiters as a useful language e.g. for rapid coding of prototype TMs.
Graphical notation
As useful as textual notation is graphical representation of TM structures. It may serve as one of possible graphical user interfaces for TM display on screen, as well as for editing and browsing purposes. Current proposal was inspired by graphical representation of conceptual graphs. Table 1 provides graphical equivalents for the previously introduced TM statements:
topic
topic with id "chopin"
topic with declared topic type "person"
association with id "as243"
association with declared association type "born-in"
topic "chopin" plays a unspecified role in association which type is "born-in"
topic "chopin" plays a role of type "who" in association which type is "born-in"
topic "chopin" plays a role of type "who" in association which type is "born-in" while the other role of type "where" is played by topic "poland", which is of type "country"
Table 2
Previous Previous Table of Contents
Information retrieval and filtering for Topic Maps
After having defined linear and graphical notations for TMs we may try to approach IR for TM-encoded data. Information retrieval is usually described [Belk92] as way of leading the user to those documents that will best enable him/her to satisfy his/her need for information. IR is typically concerned with single uses of the system, by a person with one-time goal and one-time query, while information filtering is concerned with repeated uses by a person or persons with long-term goals or interests. In IR ad hoc goals of user are expressed by him/her in a form of a query. In information filtering stable interests of user are expressed in the form of user profile, which within a filtering process plays the role similar to the role of a query.
In case of TM-based application retrievable resources may be considered as consisting of:
Thus potentially retrieval may be twofold: at the level of information resources and at the level of topic map. In current research we focus for simplicity on retrieval in the topic map domain. General model will require combining of standard retrieval and filtering techniques for resources domain with the presented one for TMs.
In typical applications content is embered in a fixed and known data structure. In TM applications user is responsible for creating content, its structure and data model for information. This results in structural diversity and complexity of TM data undergoing retrieval. Because of lack of data model (in the sense of e.g. relational databases) for the TM domain, retrieval has to incorporate meta-queries about data structure along with queries for data.
Thus TM querying capabilities may be perceived as generalisation of relational queries. (It's good design principle to extend existing functionality instead of proposing completely new one.) So one of the principles while designing TMQL was to leverage existing and popular querying solutions, namely SQL. It was also inspired by research done in the area of information retrieval in conceptual graphs [8] and graphical query languages developed for hypertext structures [1] [3] .
RDB as a Topic Map
The most frequently used information representation and retrieval model RDB and SQL is worth being investigated for similarities with TM retrieval.
Typical RDB structures may be expressed using TM model. Sample representation is shown on Figure 4.
Figure 4
Presented TM model for relational database shows simple two level hierarchical entity structure. It consists of tables (employee and address) grouped by contacts association and individual field values (e.g. John) bound in records to the table by association of type record in table.
The designed query language should extend SQL in a way that it handles the above but also more liberal data structures.
TMQL proposal
In SQL pair of names connected with a dot: employee.name denotes all values of field name in the table employee. Because of fixed data architecture, roles played by the two parts of the expression are set to table and field, and we evaluate field part. But it is frequently of interest what are the tables that contain field called name. Such reverse querries or querries about data structure are not easily expressible in SQL1. In our TM model of relational database we would just use the same association which is used to answer the question for field value to answer the question of table names where the field belongs to. It would be enough to pick the other end of topic association.
The role entities play in our TM data model may be defined by linear notation introduced in section "Linear notation for TM" . If we preserve SQL division of select query into traditional parts: select, from and where we may start to build out first query:
select topic[x]
from
topic[x]
where
x="Chopin"
or the other version of it, which returns all topics of type "composer":
select
topic[x]
from
topic-type["composer"].topic[x]
On the other hand we could ask for all the types of topic "Chopin", expecting that we will learn that besides the fact that he was "composer" he was also "pianist":
select
topic[t]
from
topic-type[t].topic["Chopin"]
If one would like to know all the topics that are subtype of type "musician", he would need to use join:
select
topic[x]
from
topic-type["musician"].topic[t1], topic-type[t2].topic[x]
where
t1 = t2
or equivalently:
select
topic[x]
from
topic-type["musician"].topic[t], topic-type[t].topic[x]
Figure 5
If we come back to the TM model for RDB we could reformulate the basic SQL query:
select
name
from
employee.name
writing the following TMQL equivalent:
select
topic[x]
from
topic-type["table"].topic["employee"].asoc-role[].assoc[a]
.assoc-role[ar].topic[x],
assoc[a].assoc-type["record in table"],
assoc-role[ar].assoc-role-type["name"]
TMQL select query returns a topic map. In the examples above, it was just a list of topics, but select clause may contain more complex structures which enable user to create useful views over TM infopool. The following query represents a view that for every topic having supertype "person" makes "person" its explicit topic type.
select
topic-type["person"].topic[x]
from
topic-type["person"].topic[t], topic-type[t].topic[x]
User profile as a TMQL query
User profile may be considered as a query that is stable over time and returns wide range of objects related to the interest of the user. After specifying set of such queries they may be used for filtering of incoming data or may serve as a default filter switched on when number of hits from ad hoc queries is to large.
Let's assume that some student is interested in piano music. We could assume that everything that is connected with both concepts: "piano" and "music" is relevant for him/her. Thus student may encode personal interest profile with the following query. Double dots ".." mean omission of association role, which improves reading in cases where association roles are free.
select topic[x]
from
topic[x]..assoc[]..topic[prof-item1],
topic[x]..assoc[]..topic[prof-item2]
where
prof-item1 = "piano" and
prof-item2 like "music%"
The above query may be stored as "my-piano-music-profile" by the user. To benefit from the profile any query executed by user should for instance include extended where clause invoking condition contained in the profile query. If we use as an example query asking for all information related to Paris, the following query is expected to return great number of hits:
select topic[x]
from
topic["Paris"]..assoc[a]..topic[x],
asoc-type["related to"].assoc[a]
Figure 6
And here is its version filtered by the user's profile, which is likely to return information about piano concerts in Paris and hopefully also topic of Chopin:
select topic[x]
from
topic["Paris"]..assoc[a]..topic[x],
asoc-type["related to"].assoc[a]
where
topic[x] in (my-piano-music-profile)
For useful exploration of information resources in the presented way we need some processing of topic map structures, which enable to interpret the association "lived in" as a specialisation of association "related to" so that the above query is able to find "Chopin" even in case where there is no direct association of the type "related to" between "Chopin" and "Paris" in the map. To achieve this intelligent inference of information contained in the map is needed. We may use the support from thesauri, which are in ideal case to be found in the supplementary TM. One of the TM properties of great importance for information inference applications is transitivity property of associations, unfortunately missing in the standard at the moment.
TMQL query expressed as Topic Map
We argued that TMs seem to take over duties of AI information representation techniques. If so it should be no problem to encode TMQL queries as topic maps also. Let's have a look at the last query based on user profile definition displayed with help of graphical notation:
Figure 7
What has to be expressed additionally is: what of results of this query has to be returned back to the user. This could be easily done with a help of TM construct - scope, which shuld be set as a flag (to a predefined theme) for entities to be returned.
Interesting aspect of presented approach is that users may formulate queries either by using TMQL syntax or by drawing the query using graphical tool similar to TM editor. Query in this representation looks very much like pattern of the requested answer with some unknown parameters. This fact should enable wide range of users to formulate advanced queries without knowing the syntax of TMQL. If we would add natural processing unit at the front of query engine, then system would be capable of turning natural language queries into their topic map representation (which could be again refined by user) and use this form as high value input for TM query engine. Figure 8 presents overview of TM retrieval application model.
Figure 8
This was one of the goals of CG research. For instance question "What is related to Paris?" can be directly mapped to the TM representation demonstrated on Figure 6 and Figure 7.
Previous Previous Table of Contents
Summary
Based on previous research in AI we have introduced linear and graphical notations for TM knowledge structures. Those notations are used further on in the definition of proposed topic map query language. TMQL is positioned as a generalisation of SQL, such that it covers also TM data having more complex structure than just relational. The paper reports work in progress rather than a result of finite research. Significant number of questions is still to be soved.
Presented graphical notation for the queries shall enable users to express their information needs in a form of topic pattern. Such pattern is later on matched with information resources in order to find unknown parameters and present them to the user in a form of view over a topic map.
Therefore answer to information request is a question of matching right topic maps: one for a query, one for user's profile and finally one for information resources.
Previous Previous Table of Contents
Bibliography
[Aman92]Amann B., Scholl M. (1992) Gram: A Graph Data Model and Query Language,
[Belk92]Belkin N.J., Croft W.B. (1992) Information Filtering And Information Retrieval: Two Sides Of The Same Coin?, Communications of the ACM.
[Cruz87]Cruz I.F., Mendelzon A.O., Wood P.T. (1987) A Graphical Query Language Supporting Recursion, International Conference on Management of Data and Symposium on Principles of Database Systems, ACM.
[Grif82]Griffith R.L. (1982) Three Principles of Representation for Semantic Networks, ACM Transactions on Database Systems.
[Ksie99]Ksiezyk R. (1999) Trying not to get lost with a topic map, XML Europe '99, GCA.
[Lehm92]Lehmann F., Rodin E.Y. (1992) Semantic Networks in Artificial Intelligence, Pergamon Press.
[Sowa00]Sowa J.F. (2000) Knowledge Representation: logical, philosophical, and computational foundations, Brooks/Cole.
[Yang93]Yang G., Oh J. (1993) Knowledge Acquisition and Retrieval Based on Conceptual Graphs, Symposium on Applied Computing, ACM.
Previous Previous Table of Contents