Extreme
Markup Languages 2000
Thursday, August 17, 2000
9:00
- 9:45
Yellow Track
Validating Topic
Maps with constraints
Hans Holger Rath, STEP Electronic Publishing
Solutions GmbH
A Topic
Map can be expressed validly, in terms of the ISO/IEC
13250 standard, and yet contain information that is
inconsistently or incompletely expressed. For example,
creators and maintainers of large, complex Topic Maps
need ways to use computers to identify trouble spots,
such as topics that have been incompletely specified
(where the criteria for "completeness" are arbitrary
and specifiable). The values of "scope" attributes
can be used to specify value constraints that
can be tested algorithmically. The extensions to the
standard that make this possible are minor, and they
can express several important kinds of combinatorial
constraints.
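As a rough illustration of the kind of check this abstract describes, the following Python sketch flags topics that fail a set of "completeness" criteria. The dictionary-based data model, the REQUIRED table, and the function find_incomplete are hypothetical names invented for this example; they are neither ISO/IEC 13250 syntax nor the constraint mechanism proposed in the talk.

```python
# Hypothetical in-memory topic map: each topic has a type, a list of names,
# and occurrences keyed by occurrence role. Not ISO/IEC 13250 syntax.
topics = {
    "puccini": {"type": "composer", "names": ["Giacomo Puccini"],
                "occurrences": {"biography": ["bio/puccini.html"]}},
    "verdi": {"type": "composer", "names": ["Giuseppe Verdi"],
              "occurrences": {}},
    "tosca": {"type": "opera", "names": [], "occurrences": {}},
}

# Assumed completeness criteria: occurrence roles required per topic type.
REQUIRED = {"composer": {"biography"}, "opera": set()}


def find_incomplete(topics, required):
    """Return {topic_id: [problems]} for topics that fail the criteria."""
    problems = {}
    for topic_id, topic in topics.items():
        issues = []
        if not topic["names"]:
            issues.append("no name")
        missing = required.get(topic["type"], set()) - set(topic["occurrences"])
        for role in sorted(missing):
            issues.append(f"missing occurrence: {role}")
        if issues:
            problems[topic_id] = issues
    return problems


report = find_incomplete(topics, REQUIRED)
for topic_id, issues in sorted(report.items()):
    print(topic_id, "->", "; ".join(issues))
```

Because the criteria live in a table rather than in the checking code, a maintainer could change what counts as "complete" without touching the function.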
Blue
Track
Structured content out of Microsoft Word: technologies
and tricks
Irina Golfman, Inera Incorporated
The need
to convert documents created in Microsoft Word (which
may be represented in Word Binary, RTF, or Word Object
Model) to more structured tagged information is a
perennial problem. Generating structured content from
documents created in Microsoft Word can be done reliably
by employing several strategies: the use of good templates,
an easy way for users to apply the template, and identification
of elements through pattern recognition. The Word
Object Model does not expose enough information
through VBA to convert a document completely to XML; it
is necessary to work with RTF. If you have the need
to do bi-directional conversions, however, the most
expedient method is to use a combination of the Word
Object Model and RTF. Implementation strategies, techniques,
and solutions for selecting the most authoritative source,
validation, bi-directional parsing, dealing with math
and tabular formats, and the "solution" of Word 2000
are discussed.
9:45
- 10:30
Yellow Track
Semantic
interoperability on the Web
Jeff Heflin and James Hendler, both
of University of Maryland
"Semantic
interoperability" (the ability to make use of
information outside of its semantic universe of origin)
is highly desirable because we all live in
a worldwide universe of (somewhat disjunct) semantic
universes. As different semantic universes increasingly
share one worldwide Web, the problem of semantic interoperability
becomes more urgent and less ignorable. XML is able
to accommodate an unbounded number of diverse markup
vocabularies, each of which makes sense in its own
semantic universe, but XML, by itself, does not make
semantics portable among universes. RDF (the W3C's
Resource Description Framework Recommendation) facilitates
some aspects of semantic interoperability. The SHOE
(Simple HTML Ontology Expressions) language has many
features necessary for the expression of semantic
webs, and may be better suited for semantics on the
Web than either XML DTDs or RDF.
Blue
Track
Advantages and difficulties with TEI tagging: Experiences
from an aided document composition and translation
tool
Arantza Casillas, Universidad de Alcalá,
Joseba Abaitua, Universidad de Deusto, Bilbao,
and Raquel Martínez, Universidad
Complutense de Madrid
Translation
memories and SGML-authoring can be hybridized to produce
substantial machine translation coverage. Based on
the idea of using DTDs as document-generation grammars,
we present an interactive editing tool that integrates
the process of source document composition and translation
into the target language. The tool benefits from a
collection of complementary language databases automatically
derived from a TEI conformant tagged and aligned parallel
corpus.
11:00
- 12:30 Plenary
Topic Maps and RDF
Eric Freese, ISOGEN/DataChannel
RDF
and Topic Maps
Eric Miller, Online Computer Library Center
Very similar
claims are made for RDF (the W3C Resource Description
Framework and related Recommendations) and Topic Maps
(ISO/IEC 13250:2000). Both are vigorously promoted
(by different parties) as the absolute-best way to
associate arbitrary metadata with arbitrary content,
and to support an unbounded variety of information-finding
and other functionalities. Indeed, both have been
openly described by respected pundits as panaceas
for every kind of information management woe -- but
never by the same pundits. If we subtract from the
discussion all political posturings, rivalries, and
hard-to-compare claims and counterclaims made by competing
economic interests, what are the comparative technical
and business merits? As this conference program was
going to press, nobody seemed to have a commanding
grasp of both paradigms. Two distinguished speakers
have agreed to share their differing perspectives
with us. Both of them have been encouraged to profile
both technologies in their talks, in the hope that
two perspectives on each technology will illuminate
both of them for the rest of us. Some of the questions
we hope each of them will address include:
How are RDF and Topic Maps supposed to be implemented?
Where are the places where miracles (proprietary and/or
nonproprietary software magic) are supposed to occur?
How is this magic described and/or constrained in
each paradigm's documentation?
Is it true that both RDF and Topic Maps are primarily
about expressing relationships between things? If
so, what kinds of relationships can they express,
and how are the relationships characterized? What
kinds of things can participate in such relationships?
Do both RDF/RDFS and Topic Maps formalize the context(s)
within which particular relationships are regarded
as relevant or valid?
What are the practical constraints on the use of RDF/RDFS?
On the use of Topic Maps?
How do the use cases of RDF/RDFS and Topic Maps differ?
In the (purely hypothetical) event that the public
could choose between RDF and Topic Maps on the basis
of its own best interests, under what circumstances
would the public rationally choose one, the other,
either, or neither, and why?
Each speaker
will have half an hour to make his case. After both
presentations, there will be a half hour of facilitated
discussion, during which questions will be invited.
2:00
- 2:45 Plenary
Invited Keynote
Douglas B. Lenat, President, Cycorp
Douglas
B. Lenat has been a professor of computer science
at Carnegie-Mellon University and Stanford University,
and authored hundreds of publications. His work includes
the first meta-representation language (RLL) and forays
in natural-language understanding, automatic program
synthesis, and machine learning by discovery. But
in 1984, he concluded that "each subfield of Artificial
Intelligence has hit a brick wall, the very
same brick wall, namely the need for our programs
to have the breadth and depth of common-sense knowledge
and reasoning abilities as people do. To achieve that,
I'm afraid that elegant, 'free lunch'
tactics are not going to substitute for long, hard
work. It's time to bite the bullet." To put his
money where his mouth was, Lenat formed the CYC common-sense
project at MCC in Austin in 1984; the project reached
fruition, as planned, after a decade and was spun off
as a separate company, Cycorp (www.cyc.com), of which
Dr. Lenat is President and CEO.
2:45
- 3:30
Yellow Track
Constructing
a navigable Topic Map by inductive semantic acquisition
methods
Helka Folch, Électricité
de France, Benoit Habert
Once it
has been made, a well-made Topic Map can make desired
information easily findable, even when the desired
information is a very small part of a very large library
of resources. However, the effort involved in making
a useful Topic Map for a very large corpus of very
diverse materials can be quite large. The Scriptorium
Project of EDF (the French national electrical monopoly)
makes this problem manageable using several data mining
methods including ALCESTE, part of a process that
subjects the text content of EDF's enormous backlog
of heterogeneous resources to statistical analysis.
The semantic classes thus generated become topics
in the resulting Topic Maps. A side effect of the
process is the division of the library into manageable
(<10 Mb) corpora, the identity of each of which
is reflected in the "scope" specifications of the
resulting topic characteristics.
Blue
Track
XML-izing Eiffel: Why language designers and programmers
should embrace 20th century markup technology
Sam Hunting, Chasse, Balisage LLC
While there
have been many proposals to optimize XML and SGML
syntax for processing by programs, there have been
few proposals to reform programming syntax to reap
the benefits of the revolution in markup technologies
that began with ISO 8879. Such a reform would enable
the users of that subset of documents called programs
to enjoy the interchange, component management, validation,
and longevity advantages enjoyed by the markup community.
Eiffel, an industrial-strength object-oriented programming
language, could be translated into XML syntax, which
would allow the use of technologies like RELAX and
XLink in an Eiffel programming environment.
4:00
- 4:45
Yellow Track
Building dynamic Web sites with Topic Maps and XSLT
Nikita Ogievetsky, Cogitech Inc.
ISO/IEC
13250 Topic Maps cannot be expressed in HTML, but
HTML offers an excellent way to deliver browsable
information via the Web. The use of a Topic Map as
the maintained "source code" or "sitemap" of a website,
for example, is one of the applications of Topic Maps
that offer convenience, power, reliability, and rapid
reconfigurability to the maintainers of large, complex
websites. There are many ways in which Topic Maps
can be used to create and maintain commercial websites:
XSLT transformations can be used to generate richly-linked
HTML pages from Topic Maps, and Topic Maps constructs
(occurrence roles, topic names, association roles,
etc.) can play specific roles in the process of automatically
creating the delivered HTML.
Blue
Track
A case for
the implementation of groves in a PDM environment
Trish Laedtke, ISOGEN International
Basically,
Product Data Management (PDM) systems facilitate three
kinds of activity: 1) identifying new data, 2) adding
value to new data, and to the entire dataset, and
3) making the data available to multiple sites in
a variety of ways. As new data is put into the system,
the new item is parsed and "understood". In a PDM
utilizing the grove paradigm, the result of parsing/understanding
is a grove: a set of interconnected data nodes, each
consisting of named properties and values for those
properties. Every node in every grove is fully addressable
and available for every application's purpose.
Groves offer many advantages over traditional
PDM processing.
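To make the idea of nodes with named properties and full addressability concrete, here is a minimal Python sketch. The Node class, the resolve function, and the property names ("document-element", "content", "type", "number") are illustrative assumptions for this example only; they are not taken from the grove definition in the standards or from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A grove-like node: a set of named properties with values."""
    properties: dict = field(default_factory=dict)


def resolve(node, path):
    """Follow property names (and list indexes) from a node to a value."""
    current = node
    for step in path:
        if isinstance(current, Node):
            current = current.properties[step]  # named property lookup
        else:
            current = current[step]  # index into a list of child nodes
    return current


# Hypothetical product data: an assembly containing one part.
part = Node({"type": "part", "number": "A-100"})
assembly = Node({"type": "assembly", "content": [part]})
root = Node({"document-element": assembly})

# Any node or property value can be reached by a path from the root.
print(resolve(root, ["document-element", "content", 0, "number"]))
```

Because every value is reachable by such a path, different applications could address the same node without agreeing on anything beyond the property names.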
4:45
- 5:30
Yellow Track
Simultaneous Topic Maps and RDF metadata structures
in SVG
David Dodds, Open Text
Topic Maps
constructs can be embedded as RDF metadata in Scalable
Vector Graphics (SVG) resources. SVG resources have
distinct "title" and "description" elements that permit
XML Namespaces to be invoked. Collections of SVG objects,
such as the bars in a bar chart, can be accompanied
by RDF metadata that can be both rendered for human
perception and understood by machines. Using such
metadata, software can "know" that a bar chart is
a bar chart whose axes are expressed in terms of certain
measurement domains, that it has a specific number
of bars in it, and the quantitative significance of
the lengths and positions of the bars with respect
to those measurement domains. Similarly, a Topic Map
can be embedded in an RDF element (rdf:parseType="Literal"),
allowing the Topic Map information to be handled by
an external Topic Map processing system.
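The bar-chart scenario can be sketched in Python with the standard library's XML parser. This is only an illustration under stated assumptions: the ex: vocabulary (chartType, barCount) and its namespace URI are hypothetical, and placing the RDF in an SVG "metadata" element is a choice made for this example, which may differ from the placement the talk describes.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EX_NS = "http://example.org/chart#"  # hypothetical vocabulary for this sketch

# A tiny SVG bar chart carrying RDF metadata about itself.
svg_text = f"""
<svg xmlns="{SVG_NS}" xmlns:rdf="{RDF_NS}" xmlns:ex="{EX_NS}">
  <metadata>
    <rdf:RDF>
      <rdf:Description rdf:about="#chart">
        <ex:chartType>bar</ex:chartType>
        <ex:barCount>3</ex:barCount>
      </rdf:Description>
    </rdf:RDF>
  </metadata>
  <g id="chart">
    <rect x="0" y="40" width="10" height="60"/>
    <rect x="15" y="20" width="10" height="80"/>
    <rect x="30" y="70" width="10" height="30"/>
  </g>
</svg>
"""

root = ET.fromstring(svg_text)

# Locate the RDF description inside the SVG metadata element.
description = root.find(
    f"{{{SVG_NS}}}metadata/{{{RDF_NS}}}RDF/{{{RDF_NS}}}Description"
)
chart_type = description.find(f"{{{EX_NS}}}chartType").text
declared = int(description.find(f"{{{EX_NS}}}barCount").text)

# Compare the declared bar count with the rectangles actually drawn.
actual = len(root.findall(f".//{{{SVG_NS}}}rect"))
print(chart_type, declared, declared == actual)
```

Software reading the metadata can learn that the graphic is a bar chart with three bars without having to infer it from the shapes.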
Blue
Track
Demonstrational interface for XSLT stylesheet generation
Teruo Koyanagi, Kouichi Ono, and Masahiro
Hori, all of IBM
XSLT plays
an important role in the conversion of data among
different XML representations. Converting XML into
HTML is a particularly practical task because it lets
Web browsers render XML documents in human-readable
form. Describing the desired HTML rendering of a document
can be done by people who are skilled in Web page
styling but who may not have the skills to write XSL
transformations. For such users we suggest XSLT stylesheet
authoring by demonstration. First, we introduce the
paradigm of programming by demonstration, and briefly
explain a model of WYSIWYG editing. We then elaborate
a process of XSLT rule generation based on the user's
operation history recorded behind the WYSIWYG editor.
Finally, we give an example of XSLT rules for HTML
rendering, created by a rule generation module.