Extreme Markup Languages 2000

Thursday, August 17, 2000

9:00 - 9:45

Yellow Track

Validating Topic Maps with constraints
Hans Holger Rath, STEP Electronic Publishing Solutions GmbH

A Topic Map can be valid in terms of the ISO/IEC 13250 standard and yet contain information that is inconsistently or incompletely expressed. Creators and maintainers of large, complex Topic Maps therefore need ways to use computers to identify trouble spots, such as topics that have been incompletely specified (where the criteria for "completeness" are arbitrary and specifiable). One possible use of the values of "scope" attributes is to specify value constraints that can be tested algorithmically. The extensions to the standard that make this possible are minor, and they can express several important kinds of combinatorial constraints.
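The idea of algorithmically testing topics for "completeness" can be illustrated with a minimal sketch. It is not the constraint mechanism proposed in the talk, and it does not use ISO/IEC 13250 syntax; the `Topic` and `Constraint` classes, their fields, and the `find_incomplete` function are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A simplified, hypothetical in-memory topic (not ISO/IEC 13250 syntax)."""
    topic_id: str
    types: set = field(default_factory=set)
    names: list = field(default_factory=list)
    occurrence_roles: set = field(default_factory=set)


@dataclass
class Constraint:
    """A hypothetical completeness rule for every topic of a given type."""
    applies_to: str
    min_names: int = 1
    required_occurrence_roles: frozenset = frozenset()


def find_incomplete(topics, constraints):
    """Return (topic_id, message) pairs for each violated constraint."""
    problems = []
    for topic in topics:
        for c in constraints:
            if c.applies_to not in topic.types:
                continue  # the constraint does not concern this topic
            if len(topic.names) < c.min_names:
                problems.append(
                    (topic.topic_id, f"needs at least {c.min_names} name(s)"))
            missing = c.required_occurrence_roles - topic.occurrence_roles
            for role in sorted(missing):
                problems.append(
                    (topic.topic_id, f"missing occurrence with role '{role}'"))
    return problems


topics = [
    Topic("t1", {"person"}, ["Ada"], {"biography"}),
    Topic("t2", {"person"}, [], set()),
]
constraints = [Constraint("person", 1, frozenset({"biography"}))]
report = find_incomplete(topics, constraints)
```

Here `report` lists only problems for the topic `t2`, which has neither a name nor a "biography" occurrence. The criteria themselves are data, so they can be changed without touching the checking code.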

Blue Track

Structured content out of Microsoft Word: technologies and tricks
Irina Golfman, Inera Incorporated

The need to convert documents created in Microsoft Word (which may be represented in Word Binary, RTF, or the Word Object Model) to more structured tagged information is a perennial problem. Generating structured content from documents created in Microsoft Word can be done reliably by employing several strategies: the use of good templates, an easy way for users to apply the templates, and identification of elements through pattern recognition. The Word Object Model, as exposed to VBA, does not provide enough information to convert a document completely to XML; it is necessary to work with RTF. If you need to do bi-directional conversions, however, the most expedient method is to use a combination of the Word Object Model and RTF. Implementation strategies, techniques, and solutions for selecting the most authoritative source, validation, bi-directional parsing, dealing with math and tabular formats, and the "solution" of Word 2000 are discussed.

9:45 - 10:30

Yellow Track

Semantic interoperability on the Web
Jeff Heflin and James Hendler, both of University of Maryland

"Semantic interoperability" — the ability to make use of information outside of its semantic universe of origin — is highly desirable because we all live in a worldwide universe of (somewhat disjunct) semantic universes. As different semantic universes increasingly share one worldwide Web, the problem of semantic interoperability becomes more urgent and less ignorable. XML is able to accommodate an unbounded number of diverse markup vocabularies, each of which makes sense in its own semantic universe, but XML, by itself, does not make semantics portable among universes. RDF (the W3C’s Resource Description Framework Recommendation) facilitates some aspects of semantic interoperability. The SHOE (Simple HTML Ontology Expressions) language has many features necessary for the expression of semantic webs, and may be better suited for semantics on the Web than either XML DTDs or RDF.

Blue Track

Advantages and difficulties with TEI tagging: Experiences from an aided document composition and translation tool
Arantza Casillas, Universidad de Alcalá, Joseba Abaitua, Universidad de Deusto, Bilbao, and Raquel Martínez, Universidad Complutense de Madrid

Translation memories and SGML authoring can be hybridized to produce substantial machine translation coverage. Based on the idea of using DTDs as document-generation grammars, we present an interactive editing tool that integrates the process of source document composition and translation into the target language. The tool benefits from a collection of complementary language databases automatically derived from a TEI-conformant tagged and aligned parallel corpus.

11:00 - 12:30 Plenary

Topic Maps and RDF
Eric Freese, ISOGEN/DataChannel

RDF and Topic Maps
Eric Miller, Online Computer Library Center

Very similar claims are made for RDF (the W3C Resource Description Framework and related Recommendations) and Topic Maps (ISO/IEC 13250:2000). Both are vigorously promoted (by different parties) as the absolute-best way to associate arbitrary metadata with arbitrary content, and to support an unbounded variety of information-finding and other functionalities. Indeed, both have been openly described by respected pundits as panaceas for every kind of information management woe -- but never by the same pundits. If we subtract from the discussion all political posturings, rivalries, and hard-to-compare claims and counterclaims made by competing economic interests, what are the comparative technical and business merits? As this conference program was going to press, nobody seemed to have a commanding grasp of both paradigms. Two distinguished speakers have agreed to share their differing perspectives with us. Both of them have been encouraged to profile both technologies in their talks, in the hope that two perspectives on each technology will illuminate both of them for the rest of us. Some of the questions we hope each of them will address include:

• How are RDF and Topic Maps supposed to be implemented? Where are the places where miracles (proprietary and/or nonproprietary software magic) are supposed to occur? How is this magic described and/or constrained in each paradigm’s documentation?

• Is it true that both RDF and Topic Maps are primarily about expressing relationships between things? If so, what kinds of relationships can they express, and how are the relationships characterized? What kinds of things can participate in such relationships? Do both RDF/RDFS and Topic Maps formalize the context(s) within which particular relationships are regarded as relevant or valid?

• What are the practical constraints on the use of RDF/RDFS? On the use of Topic Maps?

• How do the use cases of RDF/RDFS and Topic Maps differ? In the (purely hypothetical) event that the public could choose between RDF and Topic Maps on the basis of its own best interests, under what circumstances would the public rationally choose one, the other, either, or neither, and why?

Each speaker will have half an hour to make his case. After both presentations, there will be a half hour of facilitated discussion, during which questions will be invited.

2:00 - 2:45 Plenary

Invited Keynote
Douglas B. Lenat, President, Cycorp

Douglas B. Lenat has been a professor of computer science at Carnegie Mellon University and Stanford University, and has authored hundreds of publications. His work includes the first meta-representation language (RLL) and forays in natural-language understanding, automatic program synthesis, and machine learning by discovery. But in 1984, he concluded that "each subfield of Artificial Intelligence has hit a brick wall -- the very same brick wall -- namely the need for our programs to have the breadth and depth of common-sense knowledge and reasoning abilities as people do. To achieve that, I’m afraid that elegant, ‘free lunch’ tactics are not going to substitute for long, hard work. It’s time to bite the bullet." To put his money where his mouth was, Lenat formed the CYC common-sense project at MCC in Austin in 1984; the project reached fruition, as planned, after a decade and was spun off as a separate company, Cycorp (www.cyc.com), of which Dr. Lenat is President and CEO.

2:45 - 3:30

Yellow Track

Constructing a navigable Topic Map by inductive semantic acquisition methods
Helka Folch, Électricité de France, Benoit Habert

Once it has been made, a well-made Topic Map can make desired information easily findable, even when the desired information is a very small part of a very large library of resources. However, the effort involved in making a useful Topic Map for a very large corpus of very diverse materials can be quite large. The Scriptorium Project of EDF (the French national electrical monopoly) makes this problem manageable using several data mining methods including ALCESTE, part of a process that subjects the text content of EDF’s enormous backlog of heterogeneous resources to statistical analysis. The semantic classes thus generated become topics in the resulting Topic Maps. A side effect of the process is the division of the library into manageable (<10 Mb) corpora, the identity of each of which is reflected in the "scope" specifications of the resulting topic characteristics.

Blue Track

XML-izing Eiffel: Why language designers and programmers should embrace 20th century markup technology
Sam Hunting, Chasse, Balisage LLC

While there have been many proposals to optimize XML and SGML syntax for processing by programs, there have been few proposals to reform programming syntax to reap the benefits of the revolution in markup technologies that began with ISO 8879. Such a reform would enable the users of that subset of documents called programs to enjoy the interchange, component management, validation, and longevity advantages enjoyed by the markup community. Eiffel, an industrial-strength object-oriented programming language, could be translated into XML syntax, which would allow the use of technologies like RELAX and XLink in an Eiffel programming environment.

4:00 - 4:45

Yellow Track

Building dynamic Web sites with Topic Maps and XSLT
Nikita Ogievetsky, Cogitech Inc.

ISO/IEC 13250 Topic Maps cannot be expressed in HTML, but HTML offers an excellent way to deliver browsable information via the Web. The use of a Topic Map as the maintained "source code" or "sitemap" of a website, for example, is one of the applications of Topic Maps that offer convenience, power, reliability, and rapid reconfigurability to the maintainers of large, complex websites. There are many ways in which Topic Maps can be used to create and maintain commercial websites: XSLT transformations can be used to generate richly-linked HTML pages from Topic Maps, and Topic Maps constructs (occurrence roles, topic names, association roles, etc.) can play specific roles in the process of automatically creating the delivered HTML.

Blue Track

A case for the implementation of groves in a PDM environment
Trish Laedtke, ISOGEN International

Product Data Management (PDM) systems facilitate three kinds of activity: 1) identifying new data, 2) adding value to new data and to the entire dataset, and 3) making the data available to multiple sites in a variety of ways. As new data is put into the system, the new item is parsed and "understood". In a PDM system utilizing the grove paradigm, the result of parsing/understanding is a grove: a set of interconnected data nodes, each consisting of named properties and values for those properties. Every node in every grove is fully addressable and available for every application’s purpose. Groves offer many advantages over traditional PDM processing.
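The description of a grove as interconnected nodes with named properties, each node being addressable, can be pictured with a small sketch. This is only an illustration of the general idea; it does not follow the grove property sets defined in the ISO standards, and `GroveNode`, `resolve`, and the property names (`class`, `number`, `children`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GroveNode:
    """A hypothetical node: a bag of named properties.

    A property value may be a plain value, another GroveNode,
    or a list of GroveNodes.
    """
    properties: dict = field(default_factory=dict)


def resolve(root, path):
    """Follow an address from the root node.

    Each step is either a property name (str) or a list index (int).
    """
    current = root
    for step in path:
        if isinstance(step, int):
            current = current[step]             # pick one node from a list
        else:
            current = current.properties[step]  # look up a named property
    return current


# Assumed example data: an assembly that contains two parts.
bolt = GroveNode({"class": "part", "number": "A-100"})
nut = GroveNode({"class": "part", "number": "A-101"})
assembly = GroveNode({"class": "assembly", "children": [bolt, nut]})

part_number = resolve(assembly, ["children", 1, "number"])
```

Because every node is reached through the same kind of address, any application can refer to the second part's number (`"A-101"` here) without knowing how the original data was stored or parsed.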

4:45 - 5:30

Yellow Track

Simultaneous Topic Maps and RDF metadata structures in SVG
David Dodds, Open Text

Topic Maps constructs can be embedded as RDF metadata in Scalable Vector Graphics (SVG) resources. SVG resources have distinct "title" and "description" elements that permit XML Namespaces to be invoked. Collections of SVG objects, such as the bars in a bar chart, can be accompanied by RDF metadata that can be both rendered for human perception and understood by machines. Using such metadata, software can "know" that a bar chart is a bar chart whose axes are expressed in terms of certain measurement domains, that it has a specific number of bars in it, and the quantitative significance of the lengths and positions of the bars with respect to those measurement domains. Similarly, a Topic Map can be embedded in an RDF element (rdf:parseType="Literal"), allowing the Topic Map information to be handled by an external Topic Map processing system.

Blue Track

Demonstrational interface for XSLT stylesheet generation
Teruo Koyanagi, Kouichi Ono, and Masahiro Hori, all of IBM

XSLT plays an important role in the conversion of data among different XML representations. Converting XML into HTML is a particularly practical task because it lets Web browsers render XML documents in human-readable form. Describing the desired HTML rendering of a document can be done by people who are skilled in Web page styling but who may not have the skills to write XSL transformations. For such users we suggest XSLT stylesheet authoring by demonstration. First, we introduce the paradigm of programming by demonstration, and briefly explain a model of WYSIWYG editing. We then elaborate a process of XSLT rule generation based on the user’s operation history recorded behind the WYSIWYG editor. Finally, we give an example of XSLT rules for HTML rendering, created by a rule generation module.

