XHub: Creating OEB E-Books from XML Documents*
* Talk given at Extreme Markup Languages 2000, Montréal, Canada, Aug. 15-18, 2000.
Background: E-Books and the OEB specification
XHub
Ramifications of XHub
There are many E-Books "out there", mostly in HTML and proprietary formats.
Lots of attention on hardware, software, and business aspects
E-Books and especially E-content are not, however, new
In order for E-Books to succeed for a wide audience of readers:
Content creation has to be easy, device independent, and versatile
Content has to serve many purposes
There is lots of potential content out there
HTML, XML, word-processor formats, and of course SGML
Industry Consortium develops interoperable specification for content of E-Books
Representatives from publishers, researchers, hardware, software, e-commerce, telecommunications
OEB 1.0 finished in September 1999, after a year's hard labor
[More on all this on Friday; it's a different talk]
Product of a consensus-building process among E-Book manufacturers, publishers, and experts
Work easily with current content while adapting to future developments and enhancements
Relies on existing standards and specifications
Uses the following:
XML (must be well-formed, may be valid)
HTML 4.0 subset
Dublin Core metadata elements
CSS1.0 subset
DTD for a package file.
Identifies the content, structure and some navigational features of an E-Book
The content of an E-Book may be in either Basic or Extended OEB format.
Basic: The document is structured using a subset of HTML 4.0 with XML syntax. It must be well-formed.
Extended: The document may use arbitrary XML elements and must be well-formed.
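The Basic/Extended distinction can be checked mechanically. The sketch below assumes that a document which parses as XML and uses only HTML element names is a Basic candidate, and that any other well-formed document is Extended. `BASIC_TAGS` and `classify` are hypothetical; the tag set is an abbreviated stand-in for the HTML 4.0 subset defined in OEB 1.0, and attributes and content models are ignored, so this is a first filter rather than a validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abbreviated stand-in for the Basic OEB tag set; the
# authoritative list is the HTML 4.0 subset in the OEB 1.0 specification.
BASIC_TAGS = {
    "html", "head", "title", "link", "style", "body", "div", "span", "p",
    "h1", "h2", "h3", "h4", "h5", "h6", "em", "strong", "i", "b", "a",
    "img", "br", "ul", "ol", "li", "table", "tr", "td", "th",
}


def classify(xml_text: str) -> str:
    """Return 'basic', 'extended', or 'not-oeb' for one source document."""
    try:
        root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    except ET.ParseError:
        return "not-oeb"  # both OEB formats require well-formed XML
    # Drop any "{namespace}" prefix so XHTML-namespaced tags compare by local name.
    local_names = {el.tag.rsplit("}", 1)[-1] for el in root.iter()}
    return "basic" if local_names <= BASIC_TAGS else "extended"
```

A document such as `<sermon><p>Text</p></sermon>` is well-formed but uses a non-HTML element, so it lands in Extended.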
Metadata
Manifest (list of all files)
Spine (list of significant files in reading order)
Tours (different paths through the document)
Guide (list of structural components, e.g. TOC)
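The package file can be generated from a list of files. This is a minimal sketch that uses the element names as we read them in OEB 1.0 (`package`, `dc-metadata`, `manifest`/`item`, `spine`/`itemref`, `guide`/`reference`). The Dublin Core namespace URI and the function name `build_package` are assumptions, tours are omitted, and the output should be checked against the package DTD.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.0/"  # assumed Dublin Core namespace URI
ET.register_namespace("dc", DC_NS)


def build_package(title, book_id, items, spine, guide=()):
    """Serialize a minimal package file.

    items: (id, href, media_type) tuples for the manifest
    spine: manifest ids in reading order
    guide: (type, title, href) tuples, e.g. ("toc", "Contents", "toc.html")
    """
    pkg = ET.Element("package", {"unique-identifier": "BookId"})
    dc = ET.SubElement(ET.SubElement(pkg, "metadata"), "dc-metadata")
    ET.SubElement(dc, f"{{{DC_NS}}}Title").text = title
    ET.SubElement(dc, f"{{{DC_NS}}}Identifier", {"id": "BookId"}).text = book_id

    # Manifest: every file in the publication.
    manifest = ET.SubElement(pkg, "manifest")
    for item_id, href, media_type in items:
        ET.SubElement(manifest, "item",
                      {"id": item_id, "href": href, "media-type": media_type})

    # Spine: only manifest items may appear, in reading order.
    known = {item_id for item_id, _, _ in items}
    spine_el = ET.SubElement(pkg, "spine")
    for item_id in spine:
        if item_id not in known:
            raise ValueError(f"spine refers to unlisted item {item_id!r}")
        ET.SubElement(spine_el, "itemref", {"idref": item_id})

    # Guide: optional pointers to structural components such as the TOC.
    if guide:
        guide_el = ET.SubElement(pkg, "guide")
        for ref_type, ref_title, href in guide:
            ET.SubElement(guide_el, "reference",
                          {"type": ref_type, "title": ref_title, "href": href})
    return ET.tostring(pkg, encoding="unicode")
```

In OEB 1.0, Basic documents would be listed in the manifest with the media type `text/x-oeb1-document`. Rejecting spine entries that are not in the manifest catches the most common inconsistency early.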
What this talk is really about
A system for creating OEB documents from pre-existing structured documents
Initially available on the web for non-commercial use
XHub components may also be licensed for commercial use.
User wants a copy of the WWP Sermons on a handheld to read on the train
Goes to the XHub site, types in the URL of the WWP text
Makes OEB Extended publication
Converts to .lit or Peanut Press format
Downloads into Visor
User goes to XHub web site, signs in, starts to convert the Extreme Website
Gets hints about conversions
Selects interesting sections
Makes OEB publication, then .lit file
Downloads into Jornada
Existing documents can be converted into binary, system-specific E-Books at various points and by various agencies, e.g.:
Pre-processed: Binaries ready for downloading
On-demand: User isn't aware of conversion. It happens as part of the download process.
XHub components fit into either process
It's too easy to bypass OEB and go from HTML or WP formats to binary E-Books.
Need an equally easy way to get to OEB
STG is disinterested: we don't make hardware or publish documents
We want to simplify the use of OEB, and enable it to be tested and to develop further
Only handles XML or XML-like source documents. Relies on XML tools.
We focused on HTML: the worst possible scenario, but widely used.
We also handle the XML-ified TEI-Lite DTD. Less widely used, more correct.
Will ultimately handle other popular DTDs, as well as LaTeX output via HTML.
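Converting TEI-Lite to Basic OEB largely comes down to deciding where the two tag sets line up. This sketch uses a hypothetical and deliberately small mapping. Most elements are renamed one-to-one, unmapped ones fall back to `<span>`, and TEI's generic `<head>` becomes `h1` to `h6` depending on how deeply its `<div>` is nested, because HTML encodes heading level in the tag name. Attributes such as `rend` are dropped. XHub's actual conversion rules are not shown here.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-to-one renamings for a few TEI-Lite elements.
TEI_TO_BASIC = {
    "body": "body", "div": "div", "p": "p", "hi": "em",
    "lb": "br", "list": "ul", "item": "li",
}


def tei_to_basic(el: ET.Element, depth: int = 0) -> ET.Element:
    """Copy a TEI-Lite tree, renaming elements to Basic OEB (HTML) tags."""
    if el.tag == "div":
        depth += 1  # each nested <div> is one section level deeper
    if el.tag == "head":
        # TEI has one generic <head>; HTML encodes the level in the tag name.
        tag = f"h{min(max(depth, 1), 6)}"
    else:
        tag = TEI_TO_BASIC.get(el.tag, "span")  # unmapped elements degrade to <span>
    out = ET.Element(tag)  # attributes such as rend= are deliberately dropped
    out.text, out.tail = el.text, el.tail
    for child in el:
        out.append(tei_to_basic(child, depth))
    return out
```

The `<head>` rule shows one spot where the DTDs diverge structurally and not only in naming.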
User initiates Web Grab or other upload
Check files for valid or well-formed content on upload (using Tidy and an XML validator)
Fix bad HTML files if Tidy can do it
Store information about users and their jobs
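The upload steps amount to a triage. Well-formed files are accepted, the rest get a repair attempt, and the outcome is recorded per user job. `Job`, `ingest`, and the `repair` callback are hypothetical names. `repair` stands in for a Tidy wrapper and should return XML text or `None`. The sketch checks only well-formedness, because DTD validation needs a validating parser and the Python standard library does not include one.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    """Hypothetical per-user record of one conversion job."""
    user: str
    accepted: Dict[str, str] = field(default_factory=dict)  # filename -> XML text
    rejected: List[str] = field(default_factory=list)       # files nothing could fix


def is_well_formed(text: str) -> bool:
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def ingest(job: Job, files: Dict[str, str],
           repair: Callable[[str], Optional[str]]) -> Job:
    """Accept well-formed files; try `repair` (a Tidy stand-in) on the rest."""
    for name, text in files.items():
        if not is_well_formed(text):
            fixed = repair(text)
            # Re-check: a repair that still yields malformed XML counts as failure.
            if fixed is None or not is_well_formed(fixed):
                job.rejected.append(name)
                continue
            text = fixed
        job.accepted[name] = text
    return job
```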
Most work then centers on building a valid package file
Central part of the XHub interface
Shows the user all uploaded files; the user can then add and remove files
Can change and re-order spine documents
Can create fallbacks for unsupported media types.
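When a reading system cannot render an item's media type, it is expected to use that item's fallback, another manifest item. The sketch below resolves such a chain and also shows the spine reordering the editor offers. The manifest layout (id mapped to media type and fallback id), the `supported` set, and the names `resolve` and `move` are hypothetical. A cycle is treated as an error.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical manifest view: item id -> (media type, fallback item id or None).
Manifest = Dict[str, Tuple[str, Optional[str]]]


def resolve(item_id: str, manifest: Manifest, supported: Set[str]) -> Optional[str]:
    """Follow fallback links until an item with a supported media type is found."""
    seen = set()
    current: Optional[str] = item_id
    while current is not None:
        if current in seen:
            raise ValueError(f"fallback cycle at {current!r}")
        seen.add(current)
        media_type, fallback = manifest[current]
        if media_type in supported:
            return current
        current = fallback
    return None  # chain exhausted: prompt the user to supply a fallback


def move(spine: List[str], item_id: str, new_index: int) -> List[str]:
    """Return a copy of the spine with item_id moved to position new_index."""
    rest = [i for i in spine if i != item_id]
    rest.insert(new_index, item_id)
    return rest
```

A `None` result is the case the editor should flag, since the item has no renderable version anywhere in its chain.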
All information about source and OEB files is stored in an XML file
Any user action results in some file activity and a transformation of the XML file using XSLT
Updated information displayed once again
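The edit cycle keeps all state in a single XML document. Each action produces a new version of that document, which is then redisplayed. XHub performs the transformation with XSLT. The Python standard library has no XSLT processor, so this sketch substitutes a direct ElementTree update with the same shape (XML in, XML out). The `<job>`/`<files>`/`<spine>` schema, the action names, and `apply_action` are all hypothetical.

```python
import xml.etree.ElementTree as ET


def apply_action(state_xml: str, action: str, file_id: str) -> str:
    """Return the next version of the job's XML state after one user action.

    Stand-in for XHub's XSLT step: XML in, XML out, no other hidden state.
    """
    root = ET.fromstring(state_xml)
    files, spine = root.find("files"), root.find("spine")
    if action == "remove":
        # Removing a file must also drop any spine reference to it.
        for parent in (files, spine):
            for el in list(parent):
                if el.get("id") == file_id:
                    parent.remove(el)
    elif action == "add-to-spine":
        if not any(f.get("id") == file_id for f in files):
            raise ValueError(f"no uploaded file {file_id!r}")
        ET.SubElement(spine, "ref", {"id": file_id})
    else:
        raise ValueError(f"unknown action {action!r}")
    return ET.tostring(root, encoding="unicode")
```

Because every action is a pure function from one state document to the next, each step can be logged or replayed.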
Currently, the process ends with a Basic OEB publication being mailed to the user
Complete XHub will offer a choice of binary formats for further conversion
Have already created valid .lit files
Waiting for other vendors to support OEB, so that we can incorporate their binary formats
Guides and Tours are not yet available
Extended OEB
Lots more DTDs
More sophisticated user interface, based on need
Semantic Heterogeneity, or rather Semantic Rapprochement
Annotation support
Vested interest in further development of XHub specification
Proliferation of DTDs caused problems for work on diverse documents
Need to figure out not only where they diverge, but where they are the same
Conversion to OEB Basic already forces this kind of thinking
What might the ramifications of a richer Basic DTD be?
Easy to do simplistically; hard to do well
Different types of structure based annotation
Need to merge and share annotations
OEB offers a platform for developing sophisticated annotation capability
Annotation is also linking and hypertext
Need for semantic interoperability could lead to the definition or adoption of a more powerful DTD than HTML 4.0
Study of the uses of XHub can show how consistent users are with the fixed HTML tagset
One motivation for XHub is to aid and abet OEB
Keep STG happy doing interesting work
By doing so, it opens the way for hands on investigation of current markup problems and features
OEB Web Page: http://www.openebook.org
Authors: elli_mylonas@brown.edu, carole_mah@brown.edu