XML Europe 2001, 21-25 May 2001
Internationales Congress Centrum (ICC)
Berlin, Germany

A Framework for Value-Added Publishing Concepts

Klaus Kreulich <klaus.kreulich@mbv.tu-chemnitz.de>
Michael Reiche <michael.reiche@mbv.tu-chemnitz.de>
Arved C. Hübler <arved.huebler@mbv.tu-chemnitz.de>

ABSTRACT

At the Institute for Print- and Media Technology at the Technical University of Chemnitz (Institute pm) a common framework for value-added publishing concepts was designed. The first application within the common framework, a Study-Guide-on-Demand (SGoD), which provides the key features of the framework, is now online. This application provides personalized and individualized information about the Institute pm and the offered training courses at the Institute. The content of the application is selectable via a standard Internet browser and is published on-demand as a print version, HTML version, and PDF document.

The generic XML-based publishing concept behind the case study can be applied to various publisher products. Examples are teaching materials with guided or individual tours, internal enterprise information documents, individually arrangeable travel guides, etc.

The paper is divided into two main parts. The first part presents the publishing concept that was developed, with emphasis on modeling data and metadata using feasible XML solutions. The second part presents the SGoD.

1. INTRODUCTION

One aim of the XML standardization efforts is to open up new opportunities for innovative publishing processes. XML is particularly well suited to personalized and individualized on-demand publishing solutions. This paper presents a common model for such XML-based publishing applications and reports on a case study of a first prototype built on that model.

At the Institute for Print- and Media Technology at the Technical University of Chemnitz (Institute pm) a common framework for value-added publishing concepts was designed. Some of the background tests, particularly those concerning the use of the XML Schema Recommendation, were presented at the XML Europe 2000 conference [HüKr 00]. The first application within the common framework, a Study-Guide-on-Demand (SGoD), which provides the key features of the framework, is now online. This application provides personalized and individualized information about the Institute pm and the training courses offered at the Institute. The content of the application is selectable via a standard Internet browser and is published on-demand as a print version, HTML version, and PDF document.

The generic publishing concept behind the case study can be applied to various publisher products. Examples are teaching materials with guided or individual tours, internal enterprise information documents, individually arrangeable travel guides, etc.

The remainder of this paper is divided into two main parts. The first part presents the publishing concept that was developed, with emphasis on modeling data and metadata using feasible XML solutions.

The second part presents the SGoD. First, an overview of the system architecture is given. Then the data model, that is, the Document Type Definition and the implementation of the metadata, is presented.

2. GENERIC BOOK MODEL

For the publishing and printing industry, the ongoing changes in the media field bring far-reaching opportunities for new publication concepts. The wide availability of the Internet allows customers to access publishers' content quickly, directly, and individually. At the same time, advances in Print-on-Demand workflow and production technologies make individualized on-demand production of print products possible. The Generic Book Model takes these developments into account and describes a simple workflow for the production of individual books.

The basic approach of the Generic Book Model is to combine independent, already existing information objects into a new book. The textual basis for a book is therefore not the complete draft of an author or a group of authors, as in the classic book production workflow, but a collection of book units from different sources. These book units can be, for instance, entire chapters as well as single tables or images. The essential criterion for a book unit is that it forms a coherent unit with regard to its content.

The book units can be aggregated into a concrete book in different ways, ranging from manual compilation by the reader to automatic compilation by Generic Book software.

Figure 1 shows the main production steps of the Generic Book Model. The first step is to select the book units that are relevant to a concrete user request. The result is an unordered set of book units. In the second step, these contents are brought into a context-based order, that is, the selected information objects are structured logically: they are assigned, for example, to chapters, paragraphs, or appendices. Once the logical structure is fixed, the layout of the book is determined in a further, independent production step. The strict separation of content selection, logical structure, and design makes it easier to use the same book units in different books or for output in different media. The last production step is the printing of the book. In parallel to printing, the result can be output in another medium, for instance as an HTML document on the WWW.

Figure 1: Product steps
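
The four production steps can be illustrated with a minimal Python sketch. It is not the authors' software; the class `BookUnit`, the function names, the sample units, and the style name "a5-brochure" are all assumptions made for illustration. The point is only that selection, logical structure, layout, and output are separate functions, so the same units can feed different books and media.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BookUnit:
    """A hypothetical book unit: one coherent piece of content."""
    unit_id: str
    topic: str
    text: str


def select_units(pool, topics):
    """Step 1: pick the units relevant to a request (result is unordered)."""
    return [u for u in pool if u.topic in topics]


def structure_units(units, topic_order):
    """Step 2: logical structure, here one chapter per topic in a given order."""
    chapters = []
    for topic in topic_order:
        members = [u for u in units if u.topic == topic]
        if members:
            chapters.append((topic, members))
    return chapters


def apply_layout(chapters, style):
    """Step 3: attach a layout without touching content or structure."""
    return {"style": style, "chapters": chapters}


def render(book, medium):
    """Step 4: output for one medium ('html' or plain text for print)."""
    lines = []
    for number, (topic, units) in enumerate(book["chapters"], start=1):
        heading = f"{number}. {topic}"
        lines.append(f"<h1>{heading}</h1>" if medium == "html" else heading.upper())
        for unit in units:
            lines.append(f"<p>{unit.text}</p>" if medium == "html" else unit.text)
    return "\n".join(lines)


pool = [
    BookUnit("u1", "courses", "Course list"),
    BookUnit("u2", "staff", "Staff list"),
    BookUnit("u3", "exams", "Exam rules"),
]
selected = select_units(pool, {"exams", "courses"})
book = apply_layout(structure_units(selected, ["courses", "exams"]), "a5-brochure")
html_output = render(book, "html")
print_output = render(book, "print")
```

Because the layout step only wraps the structured chapters, the same `book` value is rendered twice here, once per medium, without repeating the selection.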

2.1. Technical View

For flexible and effective book production, it is reasonable to divide the process into single work steps, as specified in the Generic Book Model. These work steps should be largely independent of each other, because then the result of each production step can be produced within its own process and reused in other book projects.

From a data point of view, XML is an ideal basis for such a separation of work steps. XML clearly separates the logical structure, the textual content, and the design of the single book units. In addition to the "basis standard" XML, some of the standards accompanying XML are useful for the technical implementation of this concept. In particular, the standards XSL/XSLT, XQL, RDF, and Topic Maps play a crucial role. XSL/XSLT offers a standardized concept for the transformation and formatting of media-neutral XML documents. With the support of XSL/XSLT, data formats like HTML, PDF and others can be generated. XQL, the XML Query Language, is still under development. As soon as it is recommended, however, XML-based XQL processors can be expected to provide broad functionality for combining parts of XML documents. The importance of RDF and Topic Maps for the modeling of metadata in Generic Book applications is described in more detail in Section 2.3.

2.2. Data Modeling

In order to implement the general publication concept technically, a suitable book model is needed first. As described above, the starting point is simply a set of semantically coherent book units. Within a specific project these book units can be combined into a new book.

To compile an entire book appropriately, different characteristics of the book units, such as size, format, author rights, and semantics, have to be considered. Put generally, besides the actual contents of the book units, information about the contents, namely metadata, is essential. With the aid of suitable metadata, the best possible compilation for concrete user and project demands can be found. The actual contents of a book unit can certainly provide further important information. The assumption here, however, is that the metadata are prepared well enough that further analysis of the content becomes unnecessary.
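
How a compilation can be driven by metadata alone, without inspecting the content, is sketched below in Python. The record `UnitMetadata`, its fields, and the greedy selection rule are assumptions chosen for illustration; the paper does not prescribe a particular selection algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnitMetadata:
    """Hypothetical metadata record for one book unit."""
    unit_id: str
    pages: int              # size
    media_format: str       # format, e.g. "text" or "image"
    reuse_allowed: bool     # author rights
    keywords: frozenset     # semantics


def compile_book(units, wanted_keywords, media_format, max_pages):
    """Choose unit ids using metadata only.

    Units must be reusable, have the requested format, and share at least
    one keyword with the request. Better keyword matches are preferred,
    and units are added greedily while they fit into the page budget.
    """
    wanted = set(wanted_keywords)
    candidates = [
        u for u in units
        if u.reuse_allowed and u.media_format == media_format and u.keywords & wanted
    ]
    candidates.sort(key=lambda u: (-len(u.keywords & wanted), u.pages))
    chosen, used_pages = [], 0
    for unit in candidates:
        if used_pages + unit.pages <= max_pages:
            chosen.append(unit.unit_id)
            used_pages += unit.pages
    return chosen


units = [
    UnitMetadata("a", 4, "text", True, frozenset({"xml", "print"})),
    UnitMetadata("b", 3, "text", False, frozenset({"xml"})),
    UnitMetadata("c", 5, "text", True, frozenset({"xml"})),
    UnitMetadata("d", 2, "text", True, frozenset({"travel"})),
]
small_book = compile_book(units, {"xml", "print"}, "text", max_pages=8)
large_book = compile_book(units, {"xml", "print"}, "text", max_pages=9)
```

Unit "b" is excluded by its rights metadata and unit "d" by its keywords, so only the page budget decides whether "c" joins "a".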

2.3. Metadata

XML and its surrounding technologies offer various approaches to the implementation of metadata. The current options range from explicit "inline-markup" to separate metadata descriptions that can have a special encoding like RDF or Topic Maps. The different methods are described briefly below.

Inline-markup approach

Metadata, together with the logical structure of a document, can be specified within the DTD of an XML application. There are no explicit rules for the decision whether the metadata are defined as elements or attributes. In practice, both versions are used successfully. The following example shows the use of a keyword as a simple semantic description for a paragraph. In the respective DTD the inline-markup element <keyword> is part of the content-model of the element <paragraph>.


<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd">
<book> ...
 <chapter>
  <paragraph>This is an example for using <keyword>inline-markup</keyword>
  as metadata information.</paragraph>
 </chapter>
</book>
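
A short Python sketch shows how such inline keywords could be used to select paragraphs. It relies only on the standard library module `xml.etree.ElementTree`; the sample document (without the DOCTYPE line) and the function name `paragraphs_with_keyword` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

BOOK_XML = """<book>
  <chapter>
    <paragraph>This is an example for using <keyword>inline-markup</keyword>
    as metadata information.</paragraph>
    <paragraph>A paragraph without keywords.</paragraph>
  </chapter>
</book>"""


def paragraphs_with_keyword(book_xml, wanted):
    """Return the normalized text of every paragraph tagged with `wanted`."""
    root = ET.fromstring(book_xml)
    hits = []
    for paragraph in root.iter("paragraph"):
        keywords = {(k.text or "").strip() for k in paragraph.iter("keyword")}
        if wanted in keywords:
            # itertext() yields the paragraph text including the keyword text
            hits.append(" ".join("".join(paragraph.itertext()).split()))
    return hits


matches = paragraphs_with_keyword(BOOK_XML, "inline-markup")
```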

Metadata-record approach

Often, metadata are given in a separate part of a document, for instance, the document header. This part can also be saved in an external document. An example from the publishing field is NITF, the "News Industry Text Format" [NITF 00] http://www.nitf.org/. NITF was developed for the markup of news, and it allows the embedding of metadata in the <head> element. The following example from the NITF Website [NITF 00] is part of a "news summary with links to related story and photograph".


<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE nitf SYSTEM "nitf.dtd"> 
 <nitf>
	<head> 
	 <meta name="ap-cycle" content="AP"/> 
	 <meta name="ap-online-code" content="1000"/> 
	 <meta name="ap-routing" content="ENTITLEMENTS,
      pfONLINE,pf1000,pfbrief"/> 
	 <meta name="ap-format" content="bx"/> 
	 <meta name="ap-category" content="a"/> 
	 <meta name="ap-selector" content="-----"/> 
	 <meta name="ap-transref" content="V0607"/> 
	 <docdata> 
		... 
	 </docdata> 
	</head>
 <body> 
	... 
 </body> 
</nitf> 
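
Reading such a metadata record is straightforward, as the following Python sketch suggests. It uses the standard library parser on a shortened copy of the example (DOCTYPE omitted); the function name `read_head_metadata` is an assumption, and the sketch handles only the name/content pairs in <head>, not the <docdata> element.

```python
import xml.etree.ElementTree as ET

NITF_XML = """<nitf>
  <head>
    <meta name="ap-cycle" content="AP"/>
    <meta name="ap-online-code" content="1000"/>
    <meta name="ap-category" content="a"/>
  </head>
  <body/>
</nitf>"""


def read_head_metadata(nitf_xml):
    """Collect the <meta> name/content pairs from the document header."""
    root = ET.fromstring(nitf_xml)
    return {m.get("name"): m.get("content") for m in root.findall("./head/meta")}


metadata = read_head_metadata(NITF_XML)
```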
	

RDF approach

An important method for encoding metadata is RDF, the Resource Description Framework [RDF 99] http://www.w3.org/RDF/. RDF is an abstract model for the implementation of metadata. Concrete RDF metadata descriptions can be written, among other forms, as well-formed XML documents. RDF is used by many important standardization initiatives. For the publishing industry, PRISM [PRISM 00] http://www.idealliance.org/prism/ and XMLNews [XMLN 99] http://www.xmlnews.org/ deserve mention. PRISM, the Publishing Requirements for Industry Standard Metadata initiative, is on the way to "develop an XML metadata vocabulary for the magazine, catalogue, mainstream journal, news, and book industries" which is based upon RDF. XMLNews was first developed as a subset of NITF; today, however, it is in some parts a supplement to NITF. This applies in particular to XMLNews-Meta, an RDF application for describing the content of news.

The following RDF example document from the PRISM website [PRISM 00] uses the PRISM rights vocabulary to describe an image by a freelance photographer. In this case, "Pop Star Magazine" negotiated the right to resell one of their photos as long as the photo has not been edited.


<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:dc="http://purl.org/metadata/dublin_core#"
 xmlns:prism="http://prismstandard.org/1.0#">
 <rdf:Description rdf:about=
	 "http://popstarmag.com/photos/20000807/bubblestempo.jpg">
	<dc:identifier rdf:resource=
	 "http://freelance.com/photos/20000807/13245"/>
	<dc:description>
 Bubbles leaving the Tempo restaurant in Los Angeles.</dc:description>
	<dc:rights rdf:parseType="Resource">
	 <rdf:value>
	This image can be reused as long as it is not modified in any way.</rdf:value>
	<prism:copyright>
	Copyright (c) 2000 Pop Star Magazine,inc. All Rights Reserved.</prism:copyright>
	 <prism:isCopyrighted>yes</prism:isCopyrighted>
	 <prism:isForReuse>yes</prism:isForReuse>
	 <prism:providerType rdf:resource=
		 "http://prismstandard.org/1.0/provider.xml#internal"/>	  
	 <!-- Cropping and embedding okay. Editing not okay. -->
	 <prism:right rdf:resource=
		 "http://prismstandard.org/1.0/right.xml#extract"/>
	 <prism:right rdf:resource=
		 "http://prismstandard.org/1.0/right.xml#embed"/>
	</dc:rights>
 </rdf:Description>
</rdf:RDF>
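
The following Python sketch reads the granted rights from a shortened copy of this description. It treats the RDF/XML serialization as ordinary namespaced XML using the standard library; it is not an RDF processor and would not cope with other, equivalent RDF serializations. The function name `granted_rights` is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/metadata/dublin_core#",
    "prism": "http://prismstandard.org/1.0#",
}

RDF_XML = """<rdf:RDF
 xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:dc="http://purl.org/metadata/dublin_core#"
 xmlns:prism="http://prismstandard.org/1.0#">
 <rdf:Description rdf:about="http://popstarmag.com/photos/20000807/bubblestempo.jpg">
  <dc:rights rdf:parseType="Resource">
   <prism:isForReuse>yes</prism:isForReuse>
   <prism:right rdf:resource="http://prismstandard.org/1.0/right.xml#extract"/>
   <prism:right rdf:resource="http://prismstandard.org/1.0/right.xml#embed"/>
  </dc:rights>
 </rdf:Description>
</rdf:RDF>"""


def granted_rights(rdf_xml):
    """Map each described resource to the names of its prism:right entries."""
    root = ET.fromstring(rdf_xml)
    about_attr = "{%s}about" % NS["rdf"]
    resource_attr = "{%s}resource" % NS["rdf"]
    result = {}
    for description in root.findall("rdf:Description", NS):
        rights = [
            right.get(resource_attr).rsplit("#", 1)[-1]
            for right in description.findall("dc:rights/prism:right", NS)
        ]
        result[description.get(about_attr)] = rights
    return result


rights_by_resource = granted_rights(RDF_XML)
```

A composer could use such a lookup to exclude units whose rights do not cover the intended use, for example units lacking the "embed" right.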

Topic Maps approach

Another option for implementing metadata is offered by the Topic Maps standard XTM [XTM 01] http://www.topicmaps.org/. Topic Maps enable different kinds of information objects to be classified and navigated in a consistent manner. Within a topic map, abstract themes and connections between these themes can be defined. Figuratively speaking, a topic map can be seen as a semantic net laid over the actual contents of the information objects. The connection to the contents is made via so-called occurrences. A topic map therefore contains content-relevant meta information, so encoding semantic metadata implicitly within Topic Maps seems reasonable. External metadata can also be assigned to the actual contents of a book unit by means of Topic Maps. In the following example, the meta information "'Günter Grass' is the author of the novel 'Die Blechtrommel'", together with a link to an online review of the novel, is represented in a topic map extract:


 <topicmap>
  <topic id="author">
   <baseName>
    <baseNameString>author</baseNameString>
   </baseName>
  </topic>

  <topic id="book">
   <baseName>
    <baseNameString>book</baseNameString>
   </baseName>
  </topic>

  <topic id="written-by">
   <baseName>
    <scope>
     <topicRef xlink:href="#author"/>
    </scope>
    <baseNameString>is author of</baseNameString>
   </baseName>
   <baseName>
    <scope>
     <topicRef xlink:href="#book"/>
    </scope>
    <baseNameString>written by</baseNameString>
   </baseName>
  </topic>

  <topic id="review-online">
   <baseName>
    <scope>
     <topicRef xlink:href="#book"/>
    </scope>
    <baseNameString>online review</baseNameString>
   </baseName>
  </topic>

  <topic id="Die_Blechtrommel">
   <instanceOf>
    <topicRef xlink:href="#book"/>
   </instanceOf>
   <occurrence>
    <instanceOf>
     <topicRef xlink:href="#review-online"/>
    </instanceOf>
    <resourceRef xlink:href="http://www.goethe.de/so/jak/degrass1.htm#g5"/>
   </occurrence>
  </topic>

  <topic id="Günter_Grass">
   <instanceOf>
    <topicRef xlink:href="#author"/>
   </instanceOf>
  </topic>

  <association>
   <instanceOf>
    <topicRef xlink:href="#written-by"/>
   </instanceOf>
   <member>
    <roleSpec> <topicRef xlink:href="#author"/> </roleSpec>
    <topicRef xlink:href="#Günter_Grass"/>
   </member>
   <member>
    <roleSpec> <topicRef xlink:href="#book"/> </roleSpec>
    <topicRef xlink:href="#Die_Blechtrommel"/>
   </member>
  </association>

 </topicmap>
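
To suggest how a composer could read such an association, the Python sketch below resolves the role players of each association in a shortened extract. It is plain XML processing with the standard library, not a Topic Maps engine. The sample declares the XLink namespace on the root element so that the parser accepts the `xlink:href` attributes; the helper names `referenced_topic` and `associations` are assumptions.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

TOPIC_MAP = """<topicmap xmlns:xlink="http://www.w3.org/1999/xlink">
  <association>
    <instanceOf><topicRef xlink:href="#written-by"/></instanceOf>
    <member>
      <roleSpec><topicRef xlink:href="#author"/></roleSpec>
      <topicRef xlink:href="#Günter_Grass"/>
    </member>
    <member>
      <roleSpec><topicRef xlink:href="#book"/></roleSpec>
      <topicRef xlink:href="#Die_Blechtrommel"/>
    </member>
  </association>
</topicmap>"""


def referenced_topic(element):
    """Return the topic id that the direct <topicRef> child points to."""
    return element.find("topicRef").get(XLINK_HREF).lstrip("#")


def associations(topic_map_xml):
    """List (association type, {role: player}) for every association."""
    root = ET.fromstring(topic_map_xml)
    result = []
    for association in root.findall("association"):
        players = {
            referenced_topic(member.find("roleSpec")): referenced_topic(member)
            for member in association.findall("member")
        }
        result.append((referenced_topic(association.find("instanceOf")), players))
    return result


found = associations(TOPIC_MAP)
```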
		 

All of the metadata approaches presented can be used to realize a Generic Book application. A crucial intellectual task is the selection of metadata that are suitable for the application. The choice of the metadata concept certainly also depends on the available tools. It is therefore likely that in the future an appropriate Topic Maps engine or a suitable RDF processor will cover the essential "composer functionalities" of a Generic Book application.

Altogether, as XML software tools continue to develop and offer increasingly convenient support for the various XML standards, implementing a Generic Book application is expected to become a matter of integrating appropriate modules. In parallel, complex Content-Management and Knowledge-Management systems will continue to develop, so that Generic Book projects can be implemented as applications of these systems.

The following part of this paper will present a simple prototype of a Generic Book application.

3. STUDY-GUIDE-ON-DEMAND APPLICATION

Since the beginning of the winter term 2000/2001, the Institute pm at the Technical University Chemnitz has offered students and other interested persons a Study-Guide-on-Demand (SGoD) based on the Generic Book Model. Users can select personalized and individually relevant study information directly via the Internet [SGoD 00] http://www.tu-chemnitz.de/pm/sgod.html.

Based on the selected contents, a brochure is produced automatically, printed on demand, and delivered by mail. The contents of the printed brochure can also be delivered automatically via email as a PDF document. Furthermore, the contents are available as HTML documents on the WWW. These HTML documents are the foundation for the individual selection of information. Via a simple marking mechanism the user can put single parts of the study guide into the so-called print basket. The print basket is similar to the shopping baskets of common online shops: at the end of the selection process the user gives his email address and, if desired, his mailing address and completes his order.
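
The print basket mechanism can be pictured with the following Python sketch. The class `PrintBasket`, its methods, the delivery labels, and the sample address are assumptions for illustration and do not describe the actual SGoD implementation, which runs in the browser and on the web server.

```python
from dataclasses import dataclass, field


@dataclass
class PrintBasket:
    """Hypothetical print basket holding the ids of marked study guide parts."""
    unit_ids: list = field(default_factory=list)

    def mark(self, unit_id):
        """Put a part into the basket; marking twice has no further effect."""
        if unit_id not in self.unit_ids:
            self.unit_ids.append(unit_id)

    def unmark(self, unit_id):
        """Remove a part from the basket if it is there."""
        if unit_id in self.unit_ids:
            self.unit_ids.remove(unit_id)

    def checkout(self, email, mailing_address=None):
        """Complete the order: email is required, a mailing address is optional."""
        if not self.unit_ids:
            raise ValueError("the print basket is empty")
        if "@" not in email:
            raise ValueError("an email address is required")
        deliveries = ["pdf-by-email"]
        if mailing_address:
            deliveries.append("print-by-mail")
        return {
            "units": list(self.unit_ids),
            "email": email,
            "mailing_address": mailing_address,
            "deliveries": deliveries,
        }


basket = PrintBasket()
basket.mark("courses")
basket.mark("exams")
basket.mark("courses")
basket.unmark("exams")
order = basket.checkout("student@example.org")
```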

3.1. Overview of the Technical System

Figure 2 shows the components of the SGoD system. As in many publication applications, the entire system is divided into two sectors, editing and production. The two sectors are connected by the shared XML database.

Figure 2: SGoD system components

Editing has to fulfill two essential tasks. First, the authors need to collect the SGoD contents and enter them into the XML database using appropriate tools. Second, appropriate content management is necessary to manage the different parts of the study guide adequately. A full-featured Content-Management system will be needed particularly for future upgrades of the SGoD. With a growing database and a growing number of authors, classic document-management features such as versioning and check-in/check-out will also become more important.

The actual Generic Book functionalities are found in the production sector. This means mainly the entire workflow for the composition of an individual study guide, but also the web-based user-interface as well as the PoD service.

Figure 3 shows the technical architecture of the production system. The system is a typical three-tier application. The system services are offered by a web server that communicates with the lower data layer via middleware.

Figure 3: Production system architecture

Web Server

The web server provides classic Internet services. For the user interface, HTML pages including some JavaScript are offered. An integrated email server is responsible for sending the requested PDF documents to the PoD provider or directly to the user. The web server communicates with the SGoD middleware via its CGI interface.

Middleware

The middleware contains the core functionality of the SGoD application. On the basis of user requests, single information objects are retrieved from the XML database and composed into a complete document. The "XML composer" module, which is responsible for this process, is a Perl application that includes a customized XML parser. The resulting XML document is combined with a style sheet (actually a "FrameMaker+SGML EDD") and transferred to the program "FrameMaker+SGML". FrameMaker+SGML produces a PDF document on the fly that is available for printing on demand or for reading on screen. Because of its better support of the XML standard family, XSL in particular, the XSL-compatible formatting module FOP of the "Apache XML Project" is intended to serve as the XML-to-PDF converter module in the next SGoD release.
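
The composition step can be sketched as follows. The actual "XML composer" is a Perl application with a customized parser; the Python version below is only an illustrative stand-in, and the in-memory `DATABASE` dictionary, the element names, and the function `compose` are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the XML database: unit id -> XML fragment
DATABASE = {
    "courses": "<section id='courses'><title>Courses</title>"
               "<paragraph>Print technology.</paragraph></section>",
    "exams": "<section id='exams'><title>Exams</title>"
             "<paragraph>Exam regulations.</paragraph></section>",
    "staff": "<section id='staff'><title>Staff</title>"
             "<paragraph>Contact persons.</paragraph></section>",
}


def compose(database, requested_ids, title):
    """Retrieve the requested fragments and join them into one document."""
    guide = ET.Element("studyguide")
    ET.SubElement(guide, "title").text = title
    for unit_id in requested_ids:
        # A KeyError for an unknown id is left to the caller to handle
        guide.append(ET.fromstring(database[unit_id]))
    return ET.tostring(guide, encoding="unicode")


document = compose(DATABASE, ["exams", "courses"], "My Study Guide")
section_ids = [s.get("id") for s in ET.fromstring(document).findall("section")]
```

The composed document keeps the order of the request, and formatting is left entirely to the subsequent style sheet step.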

At present the layout of the study guide is fixed. In future versions, however, users will be able to choose the layout, as far as the PoD processing allows. Future XSL style sheets will therefore offer varying fonts, flexible page layouts, and different page formats. In the future, several media are also to be served via XSL style sheets, in line with the general XML style sheet philosophy. A timetable information service via mobile phones is currently in preparation. To transform XML documents into Wireless Markup Language (WML) documents suitable for mobile phones, it is sufficient to process a corresponding XSLT style sheet with one of the available XSLT engines.
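
The kind of mapping such a style sheet would express can be sketched in Python. This is explicitly a stand-in: it uses the standard library instead of an XSLT engine, and the timetable format, the card id, and the function `timetable_to_wml` are assumptions, since the paper does not show the timetable data. Only the basic WML structure of a deck with a card and paragraphs is produced, without the XML prolog and DOCTYPE a real deck would carry.

```python
import xml.etree.ElementTree as ET

# Hypothetical timetable document; the real SGoD format is not shown in the paper
TIMETABLE_XML = """<timetable day="Monday">
  <lecture time="09:15" room="A10">XML Publishing</lecture>
  <lecture time="11:30" room="B02">Print Technology</lecture>
</timetable>"""


def timetable_to_wml(timetable_xml):
    """Map each lecture to one <p> line inside a single WML card."""
    source = ET.fromstring(timetable_xml)
    wml = ET.Element("wml")
    card = ET.SubElement(wml, "card", {"id": "timetable", "title": source.get("day")})
    for lecture in source.findall("lecture"):
        line = ET.SubElement(card, "p")
        line.text = f"{lecture.get('time')} {lecture.text} ({lecture.get('room')})"
    return ET.tostring(wml, encoding="unicode")


wml_document = timetable_to_wml(TIMETABLE_XML)
wml_lines = [p.text for p in ET.fromstring(wml_document).iter("p")]
```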

Data layer

Single files as well as database contents can be managed within the data layer. The Institute pm is currently testing "Tamino" and "Oracle 9i" as the underlying database. The data to be managed are XML documents, the style sheets needed for future layout flexibility, and some non-XML documents, e.g. pictures in GIF or JPEG format.

3.2. Data Modeling

The SGoD is implemented as an XML application. For the time being, the structure of the SGoD contents is described by a Document Type Definition (DTD). The metadata of SGoD documents are encoded according to the "inline-markup" method described above. A detailed explanation of this data modeling follows.

Document Type Definition

The functionality and flexibility of an XML application are mainly determined by the underlying Document Type Definition (DTD). In order to define a suitable and general document structure, the contents as well as the possible usage of the documents have to be considered. In the case of the study guide described here, the possible structural variations for assembling an individual study guide depend directly on the DTD.

The contents of the study guide are basically text or book oriented, which means that the logical structure of a study guide document can be characterized by typical book structures, such as continuous text organized in paragraphs, sections, chapters, and so on. Correspondingly, the design of the underlying DTD was oriented towards established book-oriented industrial DTDs.

First, the "ISO 12083" DTD, which is used in the publishing industry for books, serials, and articles, was considered. In addition, some concepts were taken from the "DocBook" DTD. Both DTDs were developed for very general usage and describe the logical structure of a document. Concrete semantic information was not taken into consideration in these DTDs.

To describe the semantic information of each component of the SGoD, some parts of the DTDs mentioned above were extended by SGoD-specific elements. Figure 4 shows an extract of the SGoD-DTD. In this extract some of the typically used book-like structure elements can be seen: One "section" consists of an optional "title group" followed by any number of "subsections 1". One "subsection 1" consists in turn of an optional "title group" followed by any number of "paragraphs", followed by any number of "subsections 2".

Semantic elements are on a higher logical structure level. The elements "course-of-studies" and "course" are arranged within the book structure at the level of a chapter or subchapter.

Figure 4: SGoD-DTD extract
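
The content models described for "section" and "subsection 1" can be checked with a small Python sketch. Since the DTD extract itself is not reproduced here, the element names `section`, `titlegroup`, `subsect1`, `paragraph`, and `subsect2` are assumptions, and the regular-expression check is only a simplified stand-in for validation against the DTD.

```python
import re
import xml.etree.ElementTree as ET

# Assumed element names; each model is a regex over the space-terminated child tags
CONTENT_MODELS = {
    "section": r"(titlegroup )?(subsect1 )*",
    "subsect1": r"(titlegroup )?(paragraph )*(subsect2 )*",
}


def violations(element):
    """Return messages for all elements whose children break their content model."""
    problems = []
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if not re.fullmatch(model, children):
            problems.append(f"<{element.tag}> has invalid children: {children.strip()}")
    for child in element:
        problems.extend(violations(child))
    return problems


valid = ET.fromstring(
    "<section><titlegroup/><subsect1><titlegroup/><paragraph/><paragraph/>"
    "<subsect2/></subsect1><subsect1/></section>"
)
invalid = ET.fromstring(
    "<section><subsect1><subsect2/><paragraph/></subsect1></section>"
)
valid_problems = violations(valid)
invalid_problems = violations(invalid)
```

In the second document a paragraph follows a "subsection 2", which the described content model does not allow, so exactly one violation is reported.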

4. CONCLUDING REMARKS

The SGoD application presented here is a first prototype for Generic Book applications. To keep the implementation simple and based on available XML tools, the "inline-markup" approach was chosen. The implementation of the SGoD could, for the most part, build on available XML developer tools. As suggested in this paper, future Generic Book applications can be expected to be implementable as standard applications within knowledge-management systems. To what extent the presented metadata encodings based on RDF and Topic Maps can be employed is currently being investigated at the Institute pm, from a scientific as well as from a pragmatic technical point of view.

Bibliography

[HüKr 00] Arved C. Hübler, K. Kreulich. Modules for an XML Schema in the Book-on-Demand Process. Conference Proceedings, XML Europe 2000, 12-16 June 2000, Paris.
[NITF 00] News Industry Text Format (NITF) VERSION 2.5. International Press Telecommunications Council (IPTC), 20 September 2000. http://www.nitf.org/
[PRISM 00] Publishing Requirements for Industry Standard Metadata (PRISM) Specification. Version 1.0 beta B. IDEAlliance, 20 November 2000. http://www.idealliance.org/prism/
[RDF 99] Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation, 22 February 1999. http://www.w3.org/RDF/
[SGoD 00] Study-Guide-on-Demand. Institute for Print and Media Technology of Technical University Chemnitz, 20 October 2000. http://www.tu-chemnitz.de/pm/sgod.html
[XMLN 99] XMLNews-Meta Technical Specification. XMLNews.org, 5 April 1999. http://www.xmlnews.org/
[XTM 01] XML Topic Maps (XTM) 1.0, TopicMaps.Org AG Review Specification. TopicMaps.Org XTM Authoring Group, 10 February 2001. http://www.topicmaps.org/

Biography

Klaus Kreulich
Researcher
Technical University Chemnitz
Institute for Print and Media Technology
Chemnitz
Germany
Email: klaus.kreulich@mbv.tu-chemnitz.de

Klaus Kreulich has been working as a researcher at the Institute for Print and Media Technology [pm] at Chemnitz Technical University since 1997. Before joining the pm institute he worked as a software developer and project manager in the IT industry. He has been involved in various research projects dealing with XML and Publishing-on-Demand technologies. Currently he is active on the CUSTOMDP project. He is also working on his Ph.D. thesis concerning XML publishing models.

Michael Reiche
Technical University Chemnitz
Institute for Print and Media Technology
Chemnitz
Germany
Email: michael.reiche@mbv.tu-chemnitz.de

Michael Reiche works as a researcher at the Institute for Print and Media Technology [pm] at Chemnitz Technical University. Previously, he worked for debis AG and other companies as a project manager and SAP consultant. Besides his activities in the publishing field, he is involved in launching an XML-based museum project.

Arved C. Hübler
Professor
Technical University Chemnitz
Institute for Print and Media Technology
Chemnitz
Germany
Email: arved.huebler@mbv.tu-chemnitz.de

Prof. Dr. Arved C. Hübler, Director of the Institute for Print and Media Technology [pm] at Chemnitz Technical University, was formerly Technical Director with the Bertelsmann Group in Gütersloh, Germany. At the pm Institute, several projects in XML technology relating to BoD workflow, new publishing models, XML document structuring, automated book generation and document digitalisation are in progress. In addition, consulting projects on implementing XML-based PoD production have been carried out in several companies. Arved Hübler is a member of TAGA. He has participated in several TAGA, IARIGAI, IS&T and other conferences.