Modules for an XML Schema in the book-on-demand process
Klaus Kreulich
Prof. Dr. Arved C. Hübler

Abstract
In this paper we discuss an XML Schema design approach for developing Book-on-Demand applications. We consider the Book-on-Demand (BoD) process and discuss its implementation as an XML application. Our starting point is an analysis of the BoD process. Based on an evaluation of today's BoD technology, we propose an up-to-date BoD workflow model. The presented model enables users to produce their own personal book from a structured information base, provided, for instance, by a publisher's website. Within the production workflow, the user can select parts of relevant documents, structure them for his or her own purposes, choose a layout, and forward the resulting new document to a Print-on-Demand provider.
We review an XML-based approach to the technical implementation of the presented BoD workflow model. Given the importance of the PDF format for the publishing and printing industry, we also briefly discuss an alternative approach based on an underlying PDF document information base.
The XML approach requires an appropriate XML Document Type Definition (DTD) or, for a higher degree of automated processing, an XML Schema. XML Schema provides new opportunities for Book-on-Demand applications; in particular, it directly supports finer-grained content management for individual user requirements. This paper illustrates how to use and implement these capabilities.

Contents
  1. Introduction
  2. Book-on-demand applications
  3. Book-on-demand production
    1. Production chain
  4. Book-oriented DTDs
  5. Benefits of XML Schema
  6. Conclusion
  7. Bibliography

Introduction
Presently, XML applications are spreading into all fields of electronic data processing. XML, a simplified variant of SGML, is well suited both as a data exchange format and as a data container format. As simplified SGML, it offers all the fundamental advantages of its complex predecessor without losing essential functionality. The success of XML in turn drives further improvements and developments of the related standards. At present, one of the most important of these is the XML Schema specification [Schema 00], which eliminates substantial shortcomings of the Document Type Definition (DTD) concept and thereby lays the foundation for the next generation of automated Web applications.
The publishing and printing industries profit from this development as well. In these industries, XML can be used for Book-on-Demand (BoD) applications. In typical BoD applications, data units are compiled on the fly according to a user request and printed as a book. BoDs can thus be characterized as products of a database production process, in which the user is given access to deeply structured data through a Web interface. This is exactly where the strength of XML lies, which makes XML a natural choice for modeling BoD applications.
Book-on-demand applications
Alongside the development of digital printing and the progress in Print-on-Demand (PoD), new concepts for BoD production are continually evolving. The various applications of these concepts can be divided into groups according to their objectives:
The current approaches are only a first step towards a publication workflow that is fully aligned with the user's demands. A consistent extension of the existing BoD procedures has been discussed for several years, see e.g. [Ahon 96], [Kreu 98]. The objective is to include the user in the editorial process, giving him or her more options for editing the publication copy. At the same time, this production stage can be automated. Concrete requirements for the next BoD generation, hereafter also called "Generic Books", are:
At present, some BoD approaches are already heading towards Generic Books. One example is the European Union funded project TRIAL-Solution [Trial], whose objective is to develop mathematics textbooks that can be composed to suit an individual learning level. Another example is the Study-Guide-on-Demand project [SGoD] at Chemnitz Technical University, in which study information can be compiled into an individual booklet and offered via the Internet.
Book-on-demand production
Production chain
In the production of a conventional book, the reader or user stands at the end of a linear process chain. The author compiles the book's contents in a manuscript, while the publisher is in charge of the book's presentation and marketing. The book is then printed in the pressroom and, finally, offered in bookstores.
The process chain for the production of a BoD is more open to the user, who takes part in the production of the book from the selection of contents through layout to printing. With the help of a future BoD system, users could generate books themselves and then enlist the services of a PoD provider. In practice, however, the question of copyright must first be settled.
In terms of its technical sequence, the BoD production chain can be subdivided into five fundamental stages:
Regardless of how many stages of the BoD data workflow operate on XML data, using XML requires a sound data model. The current method for realizing XML data models or XML applications is to define Document Type Definitions (DTDs).
Book-oriented DTDs
Since the SGML standard was adopted in 1986, the many uses of SGML and XML in automated document management have produced a large number of DTDs. Some of these DTDs have since established themselves as official or unofficial standards. Some prominent book-oriented DTDs are:
In the development of BoD applications, these DTDs can serve as the basis for modeling textual structures. In doing so, however, the general weaknesses of DTDs must be taken into account.
A serious disadvantage of DTDs is that they offer only very imprecise data types. At the end of a hierarchical element definition there is #PCDATA content, which means that the element may contain any sequence of characters. In addition, attributes can only be weakly typed in a DTD. In practice, this means that a standard XML parser often cannot check whether data has the correct content. For example, a typical DTD definition of a date might be:
<!ELEMENT date (year, month, day)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT day (#PCDATA)>
According to this definition, an XML parser accepts the following date as valid:
<date>
  <year>this is not a year</year>
  <month>04</month>
  <day>10</day>
</date>
In automated further processing, and especially in communication between different business partners such as a PoD provider and a publisher, costly safeguards are needed in order to extract the intended digits. In other words, with DTDs the burden of adding program logic to deal with unspecified data falls on the developer.
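The extra program logic mentioned here can be sketched in a few lines of Python using only the standard library. The parser guarantees only the year/month/day structure, so the application itself must reject content such as "this is not a year" or a thirteenth month. The function `check_date` is a hypothetical helper for illustration, not part of any BoD system described in this paper.

```python
import xml.etree.ElementTree as ET
from datetime import date

# The instance from the text: valid against the DTD, because #PCDATA
# accepts any character data.
INSTANCE = """<date>
  <year>this is not a year</year>
  <month>04</month>
  <day>10</day>
</date>"""


def check_date(xml_text: str) -> date:
    """Return the date described by a <date> element, or raise ValueError.

    The DTD only guarantees the year/month/day structure, so the content
    checks below have to be written by the application developer.
    """
    root = ET.fromstring(xml_text)
    parts = [(root.findtext(name) or "").strip() for name in ("year", "month", "day")]
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"non-numeric date component in {parts!r}")
    year, month, day = (int(part) for part in parts)
    return date(year, month, day)  # also rejects e.g. month 13 or 31 April


try:
    check_date(INSTANCE)
except ValueError as err:
    print("rejected:", err)

print(check_date(INSTANCE.replace("this is not a year", "2000")))  # 2000-04-10
```

Every receiving partner has to reimplement checks like these, which is exactly the effort a typed schema can move into the validating parser.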
Another disadvantage of DTDs is their insufficient support for reusing content models. DTDs offer parameter entities, which are a convenient way to reuse strings. However, a logical link between two elements with different names but similar content models cannot be expressed. The DTDs listed above provide some subtle solutions for customizing them to particular application contexts, but all of these solutions are based on entities, so processing programs generally have to be adjusted to the customized elements. With its type concept, XML Schema offers new opportunities here.
As indicated above, completing the production process and carrying out the business transaction are also part of the BoD workflow, so data concerning the ordering process and the print and postpress workflows must be taken into account. For this purpose, the DTDs above have to be complemented by suitable DTDs from other fields or by newly developed DTDs. For versatile use, it would be helpful to modularize the entire set of BoD DTDs into a content part, a control part, a printing part, a postpress-workflow part, and so on. The existing mechanisms for modularizing DTDs are rudimentary; XML Schema brings improvements in this area as well.
Benefits of XML Schema
To overcome the shortcomings of DTDs, the W3C chartered a new Working Group to develop an XML Schema Recommendation. The current Working Draft contains fundamental approaches for improving data management with XML. Overall, XML Schema simplifies the automatic processing of data and documents. It also simplifies the interface between XML documents and databases and thus supports the handling of dynamic data.
One of the fundamental improvements over DTDs is a uniform syntax for document instances and document definitions: XML Schemas are themselves XML instances of the W3C schema definition. XML Schemas and XML instances can therefore be processed by the same software tools, and XML Schemas lend themselves to automated processing far better than DTDs. This reduces the effort required both for administering the data and for programming.
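Because a schema is itself an XML document, any XML parser can read it. The following minimal sketch uses Python's standard `xml.etree.ElementTree` to parse a small schema and an instance with the same call, and then checks that the instance's root element is declared in the schema. The two-element schema is a hypothetical example; the namespace URI is the one used in this paper's import example, whereas schemas written against the final W3C Recommendation use `http://www.w3.org/2001/XMLSchema`.

```python
import xml.etree.ElementTree as ET

# Schema namespace as used in this paper's import example (assumption: the
# final W3C Recommendation uses http://www.w3.org/2001/XMLSchema instead).
XSD_NS = "http://www.w3.org/1999/XMLSchema"

# Hypothetical two-element schema, just large enough for the demonstration.
SCHEMA = f"""<schema xmlns="{XSD_NS}">
  <element name="isbn" type="string"/>
  <element name="title" type="string"/>
</schema>"""

INSTANCE = "<title>Modules for an XML Schema</title>"


def declared_elements(schema_text: str) -> list:
    """List the names of top-level element declarations in a schema document."""
    root = ET.fromstring(schema_text)
    return [el.get("name") for el in root.findall(f"{{{XSD_NS}}}element")]


# The same parser handles both the schema and the instance document.
names = declared_elements(SCHEMA)
instance_root = ET.fromstring(INSTANCE)
print(names)                       # ['isbn', 'title']
print(instance_root.tag in names)  # True
```

A DTD, by contrast, has its own non-XML syntax and needs a dedicated parser before a program can inspect its declarations.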
An important extension beyond the capabilities of DTDs is the ability to use strongly typed data. Like a modern programming language, XML Schema offers an extensive set of built-in data types. In addition, derived or entirely new types can be declared within XML Schemas.
Another important property of XML Schema is its support for Namespaces [NameSp 99], which makes it easier to reuse entire XML vocabularies as well as individual structure definitions. The Namespace concept was developed independently of XML Schema, but only in combination with XML Schema does it become practically usable for validation.
The reuse of Schemas, or parts of them, is supported by an object-oriented approach to type definition. Just as classes in an object-oriented programming language can inherit properties from a base class, types in an XML Schema can pass their properties on to the types derived from them. These derived types can have exactly the same properties as their base types, but they can also be modified by restriction or extension.
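To make derivation by restriction concrete, the following Python sketch models an integer-based simple type as a base type plus range facets and derives hypothetical `month` and `day` types from it, which is the kind of typing the DTD date example above lacks. The class `IntegerType` and its methods are illustrative assumptions, not part of any XML Schema API or library; a real schema expresses the same idea declaratively with a restricted simple type.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Lexical form accepted by the (simplified) root integer type.
INTEGER_LEXICAL = re.compile(r"[+-]?[0-9]+")


@dataclass(frozen=True)
class IntegerType:
    """Hypothetical model of an integer-based simple type with range facets."""
    name: str
    base: Optional["IntegerType"] = None
    min_inclusive: Optional[int] = None
    max_inclusive: Optional[int] = None

    def restrict(self, name, min_inclusive=None, max_inclusive=None):
        """Derive a new type by restriction; it inherits every constraint of self."""
        return IntegerType(name, self, min_inclusive, max_inclusive)

    def is_valid(self, lexical: str) -> bool:
        # A derived type can only narrow, never widen, what its base accepts.
        if self.base is not None and not self.base.is_valid(lexical):
            return False
        if not INTEGER_LEXICAL.fullmatch(lexical):
            return False
        value = int(lexical)
        if self.min_inclusive is not None and value < self.min_inclusive:
            return False
        if self.max_inclusive is not None and value > self.max_inclusive:
            return False
        return True


integer = IntegerType("integer")
month = integer.restrict("month", min_inclusive=1, max_inclusive=12)
day = integer.restrict("day", min_inclusive=1, max_inclusive=31)

print(month.is_valid("04"), month.is_valid("13"), month.is_valid("this is not a year"))
# True False False
```

Because each derived type first delegates to its base, any program that understands the base type can rely on its guarantees for every value of a derived type.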
The new features and improvements of XML Schema suggest various uses in BoD applications. The following examples illustrate several of them:
Example 1: Use of data types
Data types can be used in BoD applications in many ways. A typical example from the field of metadata is the ISBN of a book. While a DTD can only declare the ISBN as an unspecified sequence of characters, a Schema can specify its format exactly:
<element name="isbn">
  <simpleType base="string">
    <pattern value="\d{1,3}-\d{1,5}-\d{1,5}-(\d{1}|[A-Z])"/>
  </simpleType>
</element>
According to this definition, an ISBN is a character string consisting of 1 to 3 digits, a hyphen, 1 to 5 digits, a hyphen, 1 to 5 digits, a hyphen, and finally either a digit or a capital letter. A corresponding valid fragment of an XML instance would be:
<isbn>1-456-4897-X</isbn>
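The pattern facet above can be tried out with Python's `re` module. XML Schema patterns must match the entire value, so `re.fullmatch` (not `re.search`) is the closest analogue. The sketch checks format only, exactly as the paper's pattern does; it does not verify the ISBN check digit. The function name `is_valid_isbn` is a hypothetical helper.

```python
import re

# Pattern copied from the schema fragment; anchored matching mirrors
# the whole-value semantics of an XML Schema pattern facet.
ISBN_PATTERN = re.compile(r"\d{1,3}-\d{1,5}-\d{1,5}-(\d{1}|[A-Z])")


def is_valid_isbn(value: str) -> bool:
    """Check a value against the ISBN pattern (format only, no checksum)."""
    return ISBN_PATTERN.fullmatch(value) is not None


print(is_valid_isbn("1-456-4897-X"))    # True
print(is_valid_isbn(" 1-456-4897-X "))  # False: the spaces are part of the string value
print(is_valid_isbn("1-456-4897"))      # False: the final check character is missing
```

The second case shows why the instance value must not carry leading or trailing spaces: for a string-based type they remain part of the value, and the pattern no longer matches.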
Example 2: Use of occurrence attributes
In DTDs, the number of occurrences of an element within a structure is either exactly one or one of the options "zero or one", "one or more", or "zero or more", expressed by the occurrence indicators "?", "+", and "*". XML Schema provides the attributes "minOccurs" and "maxOccurs", with which the permitted number of repetitions can be specified exactly. In BoD applications, this can be used to constrain the structure of the book, for example by limiting the number of chapters or the number of images within one passage.
<element name="chapter" minOccurs="3" maxOccurs="10"/>
This declaration means that the element "chapter" must occur at least 3 and at most 10 times. Such a limitation can be particularly useful in connection with a corresponding cost model for a BoD.
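A schema validator enforces such bounds automatically. The Python sketch below shows the equivalent check done by hand, which is what an application has to do when the bounds are not part of the document model. The `<book>` wrapper element and the helper names `check_chapter_count` and `make_book` are hypothetical; the paper does not show the parent element of `chapter`.

```python
import xml.etree.ElementTree as ET

# Bounds taken from the declaration minOccurs="3" maxOccurs="10".
MIN_CHAPTERS, MAX_CHAPTERS = 3, 10


def check_chapter_count(book_xml: str, lo: int = MIN_CHAPTERS, hi: int = MAX_CHAPTERS) -> int:
    """Return the number of direct <chapter> children, or raise ValueError if out of bounds."""
    count = len(ET.fromstring(book_xml).findall("chapter"))
    if not lo <= count <= hi:
        raise ValueError(f"{count} chapters; expected between {lo} and {hi}")
    return count


def make_book(n: int) -> str:
    # Hypothetical <book> wrapper containing n empty chapters.
    return "<book>" + "<chapter/>" * n + "</book>"


print(check_chapter_count(make_book(5)))  # 5
```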
Example 3: Use of inheritance
A typical application of inheritance is the relationship between a publisher's series and a particular book in this series. For instance, a publishing house could define a general type for academic publications within an XML Schema and derive the characteristics of informatics books from it.
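Derivation by extension can be sketched in Python as follows. The model treats a complex type as an ordered sequence of child elements; a derived type keeps the base sequence and appends its own children, so software written for the general academic-publication type still finds the inherited children in the expected order. The class `ComplexType` and the element names (`title`, `author`, `series`, `programmingLanguage`) are hypothetical, chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ComplexType:
    """Hypothetical model: a complex type whose content is a fixed child sequence."""
    name: str
    own_children: Tuple[str, ...]
    base: Optional["ComplexType"] = None

    def extend(self, name: str, *extra: str) -> "ComplexType":
        """Derive by extension: the base sequence first, then the new children."""
        return ComplexType(name, tuple(extra), self)

    def content_model(self) -> Tuple[str, ...]:
        """Full child sequence, including everything inherited from base types."""
        inherited = self.base.content_model() if self.base is not None else ()
        return inherited + self.own_children

    def validate(self, element: ET.Element) -> bool:
        """True if the element's children match the content model exactly, in order."""
        return tuple(child.tag for child in element) == self.content_model()


# Hypothetical series type and a derived type for informatics books.
academic = ComplexType("AcademicPublication", ("title", "author", "series"))
informatics = academic.extend("InformaticsBook", "programmingLanguage")

book = ET.fromstring(
    "<book><title>Sample Title</title><author>N. N.</author>"
    "<series>Informatics</series><programmingLanguage>Java</programmingLanguage></book>"
)

print(informatics.content_model())
# ('title', 'author', 'series', 'programmingLanguage')
print(informatics.validate(book))  # True
print(academic.validate(book))     # False: the extra child is not in the base model
```

A change to the publisher's base type (for example an additional series element) would propagate automatically to every derived book type, which is the reuse benefit the text describes.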
Example 4: Use of Namespaces
The Namespace concept makes it easy for other communities to use the Schemas. Besides the reuse of existing Schemas, it also supports the modularization of a BoD Schema.
Assuming that in the future the communities involved in the BoD process define appropriate Namespaces and corresponding XML Schemas, a universal XML Schema could use all these Schemas directly with the help of the "import" mechanism. Part of a universal BoD Schema, e.g. one based on Dublin Core metadata, the Adobe Portable Job Ticket Format (PJTF), and the ISO 12083 book model, could look as follows:
<schema xmlns="http://www.w3.org/1999/XMLSchema">
  ...
  <import namespace="http://www.purl.org/DC#"
          schemaLocation="http://www.purl.org/DC/dc.xsd"/>
  <import namespace="http://www.adobe.com/PJTF"
          schemaLocation="http://www.adobe.com/PJTF/pjtf.xsd"/>
  <import namespace="http://www.iso.org/../iso12083"
          schemaLocation="http://www.iso.org/../iso12083/iso12083.xsd"/>
  ...
</schema>
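How namespaces keep the imported vocabularies apart can be shown on the instance side with Python's `xml.etree.ElementTree`, which reports every element as `{namespace-URI}local-name`. The sketch uses the Dublin Core and PJTF namespace URIs exactly as written in the schema fragment (they are illustrative and not verified against the published vocabularies); the root element `order` and the element `pjtf:copies` are hypothetical. Grouping elements by namespace URI mirrors the modularization into content, printing, and other parts discussed earlier.

```python
import xml.etree.ElementTree as ET

# Namespace URIs as written in the schema fragment (illustrative only).
NS = {
    "dc": "http://www.purl.org/DC#",
    "pjtf": "http://www.adobe.com/PJTF",
}

# Hypothetical order document combining two vocabularies.
ORDER = """<order xmlns:dc="http://www.purl.org/DC#"
       xmlns:pjtf="http://www.adobe.com/PJTF">
  <dc:title>My Generic Book</dc:title>
  <pjtf:copies>1</pjtf:copies>
</order>"""


def split_tag(tag: str):
    """Split ElementTree's '{uri}local' notation into (namespace URI, local name)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


root = ET.fromstring(ORDER)

# Group child elements by the vocabulary (namespace) they belong to.
by_module = {}
for child in root:
    uri, local = split_tag(child.tag)
    by_module.setdefault(uri, []).append(local)

print(by_module)
print(root.findtext("dc:title", namespaces=NS))  # My Generic Book
```

Since elements are identified by namespace URI plus local name, a `title` from the metadata vocabulary cannot collide with a `title` defined in the book model, and each part of the workflow can pick out the elements it is responsible for.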
Conclusion
Future BoD applications will bring further progress in the individualization and personalization of books. This progress will affect the structural composition as well as the layout of the products.
The technical realization of BoD can be divided into two production sections: the compilation of the contents on the supplier's website, and the book production by the PoD provider. XML is the natural format for the first section. In the second section, PDF is currently establishing itself and replacing PostScript workflows; here, XML is (still) of secondary importance.
The use of XML Schema instead of DTDs promises considerable automation of the processes. However, XML Schema first has to gain acceptance as a new standard against the widespread and established DTD-based applications. Despite all the advantages described, it remains to be seen how quickly high-performance XML Schema parsers and corresponding tools will become available.
Bibliography
[Schema 00] World Wide Web Consortium: XML Schema Working Draft, Feb 2000. http://www.w3.org/TR/xmlschema-1/
[Ahon 96] H. Ahonen, B. Heikkinen, O. Heinonen, J. Jaakkola, P. Kilpeläinen, G. Linden, and H. Mannila: Intelligent Assembly of Structured Documents. Report C-1996-40, Department of Computer Science, University of Helsinki, 1996.
[Kreu 98] K. Kreulich: The Generic Book as an Application of Intelligent Information Retrieval Systems. Abstracts of the 22nd Annual Conference of the German Society for Classification, Dresden, March 1998, p. 65.
[Trial] Interoperability and Intelligent Reuse of Distributed Teaching Materials: The TRIAL-Solution Project. http://www.trial-solution.de/
[SGoD] Study-Guide-on-Demand Project of the Institute for Print- and Media Technology at Chemnitz Technical University, Germany. http://www.pm.tu-chemnitz.de/sf/
[NameSp 99] World Wide Web Consortium: Namespaces in XML Recommendation, Jan 1999. http://www.w3.org/TR/1999/REC-xml-names-19990114/