SGML-oriented Integral Editorial System
Beatriz del Aguila Olmos


Abstract
To keep responding quickly to growing market demand, to meet new customer requirements (information on demand) and, of course, to reduce production time and costs, all the information must be stored in a neutral database. Beyond that, a system is needed that enables and simplifies the useful management of that information. This presentation describes the global project we have developed (and keep improving) that covers all the processes involved in editorial work, from retrieving documents from any of our sources to generating a publication in the desired medium or media.

Keywords

Contents
  1. General overview
    1. CISSPRAXIS
    2. Why create an Integral Editorial System
    3. Our goals
  2. Objective 1.- To create an Editorial Database (BDE)
  3. Objective 2.- To create an Editorial System (SE)
    1. Creation of a system to enable the internal management of the information stored in the DB
    2. Creation of an automated system to build-up publications
      1. Build-up of the Table of Contents (TOC)
      2. Updating of the TOC
    3. Work-flow
      1. Why use intranet/extranet technology?
      2. Intranet module
      3. Extranet module
    4. Other support utilities: a Thematic Index and a thesaurus
      1. Thematic Index
      2. Thesaurus
  4. Objective 3.- Publications delivery
    1. Physical build-up of the publication
      1. Paper:
      2. CD:
      3. Internet:
    2. Publication's updating
      1. Paper:
      2. CD:
      3. Internet:
  5. Technical information
    1. Database
    2. Editorial System
    3. Data processing
    4. Typesetting
  6. Problems, advice, ...
  7. Conclusions
  8. Acknowledgements

General overview
CISSPRAXIS
CISSPRAXIS is a Spanish publishing company, part of Wolters Kluwer, with more than 25 years of experience. It is a leader in providing information services to professionals and companies on legal, fiscal (tax), labour and mercantile law, business management, accounting issues, etc. Our publication media are mainly loose-leaf publications, magazines and CD-ROMs, and we have just begun to publish on the Internet.
In fact, CISSPRAXIS is the result of the integration of two companies, Editorial CISS and PRAXIS, which were both part of Wolters Kluwer as well. Each company was working on its own project and, since April, those projects have become one. Therefore, all the applications developed must become one too. On the one hand, we had been developing different and complementary modules; on the other hand, we had been using different tools (Visual Basic - Delphi; ADEPT Editor - WordPerfect + SGML, ...). Fortunately, the policy of both companies has been to create open, modular systems and processes, so we have had few problems putting them all together. This experience highlights the importance of building modular applications and using standard languages, programs, etc.
Why create an Integral Editorial System
Until the creation of the SE, all the information contained in our publications was "stored" on "paper" or held by our external providers (especially typesetting companies), i.e., it was not available in a reusable digital format (it was not saved, classified or indexed). Therefore, we had to retrieve, manipulate and correct the same information again and again, every time we needed it for a different (or even the same) publication and/or medium. Moreover, most of our external providers used their own "proprietary and closed" systems (typesetting and formatting engines) or formats, often incompatible with our internal tools. In the end, all this cost us time and money.
Added to this is the rapid evolution of communication technologies and, as a result, the changes in our customers' requirements.
Our goals
We have been working on this project for the last 3 years, the main objective being the creation of an Integral Editorial System oriented not only to storing information but also to producing publications in any medium (paper, CD-ROM, Internet).
In general, the company aims were (and still are):
To achieve all these goals we defined several milestones in our project, most of which we have already achieved:
First, the creation of an Editorial Database to store all the information required to make up our publications (Legislation, Jurisprudence, Authors' Comments and Added Value related to the publications).
Besides that, we decided to use the SGML standard to mark up the information (documents) we want to store in the database, so that we can take advantage of the features of this language (structured information, standardization, ...) as well as of the various tools on the market (Arbortext, WordPerfect+SGML, Omnimark, FM+SGML, ...) that simplify the automation of subsequent data processing.
In addition, we had to develop a tool (the Editorial System, SE) that would enable the internal management of the information stored in the BDE and simplify the creation and updating of publications, replacing the usual "Cut & Paste" with a set of electronic applications that make the work easier and allow full exploitation of the available information. As a complement to this, we had to create a "Workflow System" module to make communication easier among system users (including internal staff and external authors) and to keep track of tasks and activities.
The scope of the project goes beyond the SE, as all this has enabled us to automate the production processes (completely where possible, otherwise as far as possible) for both print and electronic publications. This will give us greater independence from external providers and will allow us to reduce costs and production time. To achieve this goal, we have taken advantage of some SGML tools, like Omnimark for conversion and FM+SGML for our paper publications. But we are still investigating in this area, looking for the tool or system that best fits our current and future needs (QuarkXPress, 3B2, ...).
Objective 1.- To create an Editorial Database (BDE)
The final aim is to have, at any one time, all the information used in the Company (Legislation, Jurisprudence, Authors' Comments and Added Value) stored in electronic format, so that it can be used whenever needed. The idea is to process the information just once and to reuse it in several publications and media.
To achieve this, we must create a Database with the most suitable structure and according to our needs.
The structure of our DB is based on the different kinds of documents it will store (Legislation, Jurisprudence, Authors' Comments and Notes, ...), i.e., there are specific tables for each kind of document and its metadata. Also, it contains all the information related to links between documents and all the references to documents contained in the different publications as well.
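As an illustration only, the structure just described could be sketched with Python's built-in `sqlite3` module standing in for the production database. The table and column names (`legislation`, `jurisprudence`, `doc_link`, `publication_ref`) are hypothetical; the point is one table per document kind plus shared tables for links and for publication references, which is what lets the system answer questions such as "which publications use this document?".

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for the production RDBMS.
# One table per document kind, plus shared tables for links between
# documents and for references from publications to document elements.
SCHEMA = """
CREATE TABLE legislation (
    doc_id   TEXT PRIMARY KEY,
    title    TEXT NOT NULL,
    pub_date TEXT,            -- ISO 8601 date of official publication
    sgml     BLOB NOT NULL    -- SGML source, stored unmodified
);
CREATE TABLE jurisprudence (
    doc_id      TEXT PRIMARY KEY,
    court       TEXT,
    ruling_date TEXT,
    sgml        BLOB NOT NULL
);
CREATE TABLE doc_link (
    from_doc  TEXT NOT NULL,
    from_elem TEXT NOT NULL,  -- identifier of the element inside from_doc
    to_doc    TEXT NOT NULL,
    to_elem   TEXT,
    link_type TEXT            -- e.g. 'cites', 'modifies'
);
CREATE TABLE publication_ref (
    publication TEXT NOT NULL,
    doc_id      TEXT NOT NULL,
    elem_id     TEXT NOT NULL
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def publications_using(conn: sqlite3.Connection, doc_id: str) -> list[str]:
    """Return the publications that reference any element of doc_id."""
    rows = conn.execute(
        "SELECT DISTINCT publication FROM publication_ref "
        "WHERE doc_id = ? ORDER BY publication",
        (doc_id,),
    )
    return [row[0] for row in rows]


def legislation_between(conn: sqlite3.Connection, start: str, end: str) -> list[str]:
    """Metadata query: legislation published between two ISO dates."""
    rows = conn.execute(
        "SELECT doc_id FROM legislation WHERE pub_date BETWEEN ? AND ? "
        "ORDER BY pub_date",
        (start, end),
    )
    return [row[0] for row in rows]
```

Keeping links and publication references in their own tables means a change to one source document can be traced to every publication that reuses it with a single query.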
In addition, we decided to store the information in a standard and structured way. The chosen standard is SGML, to take full advantage of both the structure it provides and the existing tools that simplify the automation of the data processing.
It is important to underline that the "source" documents must be stored in a "neutral" or "pure" state, i.e., with content identical to the original. All Added Value, that is, the additional information provided by editorial work, is stored so that it keeps its relationship with the original documents without altering their content.
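One way to picture this separation is stand-off annotation: editorial content is stored apart from the source and points into it by element identifier. The following is a minimal Python sketch under that assumption; the classes `SourceDocument` and `AddedValue` are hypothetical and do not describe the BDE's actual model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDocument:
    """A 'neutral' source document; frozen so it cannot be edited in place."""
    doc_id: str
    elements: tuple[tuple[str, str], ...]  # (element id, text), in order

    def element_ids(self) -> set[str]:
        return {elem_id for elem_id, _ in self.elements}


@dataclass(frozen=True)
class AddedValue:
    """Editorial content stored apart from the source, pointing into it."""
    target_doc: str
    target_elem: str
    kind: str  # e.g. 'comment', 'note'
    text: str


def dangling(source: SourceDocument, annotations: list[AddedValue]) -> list[AddedValue]:
    """Annotations on this document whose target element does not exist."""
    ids = source.element_ids()
    return [a for a in annotations
            if a.target_doc == source.doc_id and a.target_elem not in ids]


def render(source: SourceDocument, annotations: list[AddedValue]) -> list[str]:
    """Build a view with added value interleaved; the source is untouched."""
    by_elem: dict[str, list[AddedValue]] = {}
    for a in annotations:
        if a.target_doc == source.doc_id:
            by_elem.setdefault(a.target_elem, []).append(a)
    lines = []
    for elem_id, text in source.elements:
        lines.append(text)
        for a in by_elem.get(elem_id, []):
            lines.append(f"[{a.kind}] {a.text}")
    return lines
```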
On the basis of all this we can clearly differentiate 2 phases or milestones:
Objective 2.- To create an Editorial System (SE)
Creation of a system to enable the internal management of the information stored in the DB
The objective is to create an electronic tool to make the editorial work easier and to take advantage of all the possibilities provided by having the information stored on a DB (BDE).
The developed application, the Editorial System, must enable the interaction with the DB, providing tools for:
Of particular interest is the tool that enables the introduction and management of documents both in force and no longer in force, as this kind of information is heavily used in our publications. This tool lets us retrieve the version of a document in force on a given date, the versions in force within a given period, or the document's whole history.
Apart from this, the Editorial System has different ways of controlling the information introduced into the DB, as well as controlling user access and allowed actions. This guarantees minimum quality levels for the information and avoids duplication and loss of documents.
Besides all that, the application contains a specific tool for "Creation of Table Of Contents" (TOC) (see section "Build-up of the Table of Contents (TOC)").
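A minimal sketch of the three retrieval modes (in force on a date, in force during a period, whole history), assuming each version of a document carries a validity interval. The `Version` class and the convention that `valid_to` is the last day in force (inclusive, `None` while still in force) are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Version:
    """One version of a document and the period during which it was in force."""
    text: str
    valid_from: date
    valid_to: Optional[date] = None  # last day in force; None = still in force


def in_force_on(history: list[Version], day: date) -> Optional[Version]:
    """Version in force on `day`, or None if the document was not in force."""
    for v in history:
        if v.valid_from <= day and (v.valid_to is None or day <= v.valid_to):
            return v
    return None


def in_force_during(history: list[Version], start: date, end: date) -> list[Version]:
    """Versions in force at any moment of the period [start, end]."""
    return [v for v in history
            if v.valid_from <= end and (v.valid_to is None or v.valid_to >= start)]


def full_history(history: list[Version]) -> list[Version]:
    """Every version, oldest first."""
    return sorted(history, key=lambda v: v.valid_from)
```

Storing intervals rather than overwriting text is what keeps superseded wording retrievable for historical publications.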
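As a hypothetical illustration of these two kinds of control, the sketch below combines a role-to-action permission table with duplicate detection by hashing normalized text. The roles, the normalization rules and the `DocumentStore` class are assumptions, not the SE's actual rules.

```python
import hashlib
import unicodedata

# Hypothetical role -> allowed actions table.
PERMISSIONS = {
    "author": {"read"},
    "publisher": {"read", "insert", "update"},
    "admin": {"read", "insert", "update", "delete"},
}


def fingerprint(text: str) -> str:
    """SHA-256 of the text after normalizing Unicode form, case and whitespace."""
    norm = unicodedata.normalize("NFC", text).casefold()
    norm = " ".join(norm.split())
    return hashlib.sha256(norm.encode("utf-8")).hexdigest()


class DocumentStore:
    """Toy store that enforces permissions and rejects duplicate content."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}    # doc_id -> text
        self._owner: dict[str, str] = {}   # fingerprint -> doc_id

    def _check(self, role: str, action: str) -> None:
        if action not in PERMISSIONS.get(role, set()):
            raise PermissionError(f"role {role!r} may not {action}")

    def insert(self, role: str, doc_id: str, text: str) -> None:
        self._check(role, "insert")
        if doc_id in self._docs:
            raise ValueError(f"{doc_id} already exists")
        fp = fingerprint(text)
        if fp in self._owner:
            raise ValueError(f"duplicate of {self._owner[fp]}")
        self._docs[doc_id] = text
        self._owner[fp] = doc_id

    def delete(self, role: str, doc_id: str) -> None:
        self._check(role, "delete")
        text = self._docs.pop(doc_id)
        del self._owner[fingerprint(text)]
```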
An important advantage provided by this system is the way it tracks changes in documents. This can be used to provide useful information to the publishers when updating a publication, as the system will be able to warn them about modifications in documents contained in this publication. The publisher can then decide whether to apply them or not.
Another advantage is the wide range of search possibilities, through queries based on the different metadata already registered. In the future, this could also be offered to our customers as a service through the Internet.
Finally, we would like to emphasize that the idea is to create an open, modular system, so that once the main part (the "kernel") is finished, we can add new modules with improvements and integrate other applications, whether developed internally (e.g., the Work-flow tool; see section "Work-flow") or externally (e.g., the application for retrieving Legislation documents from the DB of La Ley, another WKE company).
Creation of an automated system to build-up publications
In the process of generating a product we can distinguish 2 phases:
Within the above-mentioned Editorial System, the most significant innovation is the tool developed to create TOCs of the publications, by reusing information existing in the DB.
The Editorial System stores and checks the information so that, at any moment, it can determine which documents in a publication have been modified, helping publishers with updating.
All this entails a new way of working for the publishing areas. From now on, they will actually just maintain and update one big "product", the BDE, from which any of our publications can be generated, in any medium. This system can also change the concept of "publication", since we can now focus on "information on demand", monographs on specific subjects, etc. Of particular importance are the many new possibilities that open up for electronic media, especially the Internet.
Looking more closely at each of the phases mentioned above:
Build-up of the Table of Contents (TOC)
This process is independent of the medium or media in which the publication will be delivered. There are two possibilities:
The criteria to decide which option is chosen will depend on the content of the publication in each medium as well as on the updating-periods, etc.
The TOC is an SGML document that contains all the information of the publication, but only "by reference", i.e., not physically included. In short, the TOC contains everything specific to the publication itself, such as titles and the main structure (chapters, sections, etc.), and, wherever "information" should appear, just a reference indicating which piece of which document stored in our DB must be included. For that reason, we have assigned an identifier to each of our documents and to every element within them. In addition, for every piece of a document we want to reuse, we can specify whether to reuse its associated images and notes (if any, and which ones).
Figure 1. Example of a Table of Contents of one chapter of the publication "Procedimiento, Sanciones y Recursos"
With this solution, we can reuse information, save disk space and obtain more manageable publication documents. In section "Physical build-up of the publication" we explain how we fill in this TOC to obtain the complete publication.
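The sketch below shows what a reference-only TOC of this kind might look like and how its references can be collected with Python's `xml.etree.ElementTree`. XML syntax is used for convenience, and the element and attribute names (`TOC`, `CHAPTER`, `SECTION`, `REF`, `DOC`, `ELEM`, `IMAGES`, `NOTES`) are hypothetical, not the actual DTD.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical TOC: titles and structure belong to the publication,
# while REF elements only point to pieces of documents stored in the DB.
TOC_XML = """
<TOC PUB="PUB-1">
  <CHAPTER>
    <TITLE>Chapter 1</TITLE>
    <REF DOC="LEG-1" ELEM="ART-3" IMAGES="yes" NOTES="no"/>
    <SECTION>
      <TITLE>Case law</TITLE>
      <REF DOC="JUR-7" ELEM="FJ-2" IMAGES="no" NOTES="yes"/>
    </SECTION>
  </CHAPTER>
</TOC>
"""


@dataclass(frozen=True)
class Ref:
    doc: str      # identifier of the stored document
    elem: str     # identifier of the element inside it
    images: bool  # reuse the associated images?
    notes: bool   # reuse the associated notes?


def collect_refs(toc_xml: str) -> list[Ref]:
    """List every reference in the TOC, in document order."""
    root = ET.fromstring(toc_xml.strip())
    return [
        Ref(r.get("DOC"), r.get("ELEM"),
            r.get("IMAGES") == "yes", r.get("NOTES") == "yes")
        for r in root.iter("REF")
    ]
```

Because the TOC holds only identifiers, the same stored article can appear in several publications without being copied.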
Updating of the TOC
An update can involve different situations: new information has to be added and/or some of the existing information has changed or has been removed from the publication.
In this aspect, the SE provides support to the publishers, by warning them about changes in documents contained in the publication, so that they can decide whether to apply or to ignore them (the SE obtains this information from the control and checking processes it carries out).
Each time an update is finished, it is saved as a version of the publication, so earlier versions can always be consulted. Each update starts from the latest version of the TOC, in which modified documents are highlighted, so the publisher always has a "full image" of the publication.
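The interplay between saved versions and change warnings might be sketched as follows, assuming each saved TOC version records, for every referenced element, the revision number it was built from. `PublicationHistory` and the revision numbers are hypothetical; comparing them with the current revisions in the DB yields the list of modified or deleted pieces, which the publisher then decides to apply or ignore.

```python
from dataclasses import dataclass, field

Ref = tuple[str, str]  # (document id, element id); hypothetical reference key


@dataclass
class PublicationHistory:
    """Saved TOC versions; each maps a reference to the DB revision it used."""
    versions: list[dict[Ref, int]] = field(default_factory=list)

    def save(self, refs: dict[Ref, int]) -> int:
        """Store a finished update as a new version; return its 1-based number."""
        self.versions.append(dict(refs))
        return len(self.versions)

    def changes_since_last(self, db_revisions: dict[Ref, int]) -> dict[Ref, str]:
        """Compare the last saved version with the current DB revisions."""
        if not self.versions:
            return {}
        report = {}
        for ref, rev in self.versions[-1].items():
            current = db_revisions.get(ref)
            if current is None:
                report[ref] = "deleted"    # element no longer in the DB
            elif current != rev:
                report[ref] = "modified"   # newer revision available
        return report
```

Earlier versions stay in `versions` untouched, so any past state of the publication can still be consulted.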
Work-flow
The introduction of the SGML/XML technology and the new tools provided by the SE imply a change of approach, and due to this we have had to make a thorough analysis of the repercussions in order to:
In addition, using SGML as the standard for neutral markup of information has enabled us to unify the document analysis tasks for source information (Legislation and Jurisprudence) across the different BUs of CISSPRAXIS, speeding up electronic publication updating processes.
The objective of the Editorial System Workflow development is to facilitate communication between system users, to provide users with transparent access to internal and external information repositories, and to control internal staff activity, allowing publishers and authors to manage documents and their added-value relationships.
Why use intranet/extranet technology?
By using the Intranet/Extranet technology in the current production process we pursue:
Intranet module
It allows a 1:1 information service, i.e., personalized access to the internal SGML Primary Information database and its added value (comments, relationships, links), based on specific profiles created and maintained by each internal collaborator.
In addition, we have developed complementary services related to the day-to-day work and available to all areas of the company:
Figure 2. Intranet application interface
Extranet module
This development provides BU publishers with an extranet tool accessible by browser and personalized for each external author or collaborator. This allows the integration and management of internal and external workflow tasks related to electronic documents management processes and product maintenance:
Figure 3. Extranet application interface
This module will also allow us to reduce the time needed to update our products and to adapt the current workflow processes to the new updating demands of online services.
Other support utilities: a Thematic Index and a thesaurus
Apart from what has already been mentioned in previous sections, we have planned to create certain support utilities that enable us to take full advantage of the SE and the information stored on the BDE.
Thematic Index
The idea of a Thematic Index is to group all the information in the Database by the subject/s the different documents are related to. This can be very useful for internal use and to provide information on demand on a certain subject.
First of all, we have to develop an application that allows the creation of the Thematic Index: a tree structure organized by the subject.
Once that tool is finished, the publishing areas will link the documents to the related subject(s) of the thematic index.
The application to create the Thematic Index is already finished. It has been tested and it is ready to be used. The main index is established. The next step is to link the documents with the subjects.
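A minimal sketch of the Thematic Index as a subject tree with documents linked to its nodes, assuming that a query on a subject should also return the documents linked to its sub-subjects. The `Subject` class and the sample subjects in the usage note are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Subject:
    """Node of a hypothetical thematic index tree."""
    name: str
    children: list["Subject"] = field(default_factory=list)
    doc_ids: set[str] = field(default_factory=set)

    def add(self, name: str) -> "Subject":
        """Create and attach a sub-subject."""
        child = Subject(name)
        self.children.append(child)
        return child

    def find(self, name: str) -> Optional["Subject"]:
        """Depth-first search for a subject by name."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def all_docs(self) -> set[str]:
        """Documents linked to this subject or to any of its sub-subjects."""
        docs = set(self.doc_ids)
        for child in self.children:
            docs |= child.all_docs()
        return docs
```

For example, linking a ruling to "Exemptions" under "VAT" makes it appear both in an "Exemptions" query and in a broader "VAT" query.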
Thesaurus
The thesaurus is closely related to, and complements, the Thematic Index. It enables us to improve the results of searches based on a certain word or subject, as it can also take synonyms of that word into account.
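Query expansion with a thesaurus can be sketched with synonym rings: a search term is replaced by the set of its equivalent terms before matching. The sample rings and the simple word matching below are assumptions for illustration; a real thesaurus would typically also model broader and narrower terms.

```python
import re

# Hypothetical thesaurus: each set is a ring of equivalent terms.
THESAURUS = [
    {"dismissal", "termination", "discharge"},
    {"tax", "levy", "duty"},
]


def expand(term: str, rings: list[set[str]] = THESAURUS) -> set[str]:
    """Return the term together with all its synonyms (case-folded)."""
    t = term.casefold()
    for ring in rings:
        if t in ring:
            return set(ring)
    return {t}


def search(docs: dict[str, str], term: str) -> list[str]:
    """IDs of documents containing the term or any of its synonyms."""
    terms = expand(term)
    return sorted(doc_id for doc_id, text in docs.items()
                  if terms & set(re.findall(r"\w+", text.casefold())))
```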
The main tasks to build it are:
Objective 3.- Publications delivery
There are 2 different phases (both in the first delivery and in the periodic updating):
Physical build-up of the publication
By "physical build-up" we mean the whole process of replacing the references to the documents (or pieces of them) to be reused with the documents (or pieces) themselves. From this point on, the procedure is medium-dependent. These are the different processes:
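The substitution step common to all media can be sketched with Python's `xml.etree.ElementTree`, using XML syntax and hypothetical element names (`REF` with `DOC`, `ELEM`, `IMAGES` and `NOTES` attributes; `NOTE` and `IMG` inside source documents). Each reference is replaced by a deep copy of the referenced element, with images and notes kept only when the reference asks for them, so the stored source is never modified.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical store of parsed source documents, keyed by document id.
SOURCES = {
    "LEG-1": ET.fromstring(
        '<LEG ID="LEG-1"><ART ID="ART-3"><P>Text of article 3.</P>'
        '<NOTE>Footnote.</NOTE><IMG SRC="fig1.tif"/></ART></LEG>'
    ),
}


def find_element(doc_root: ET.Element, elem_id: str) -> ET.Element:
    """Locate the element carrying the given ID attribute."""
    for el in doc_root.iter():
        if el.get("ID") == elem_id:
            return el
    raise KeyError(elem_id)


def strip(el: ET.Element, tag: str) -> None:
    """Remove every descendant of `el` with the given tag."""
    for parent in list(el.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)


def build_up(toc_root: ET.Element, sources: dict[str, ET.Element]) -> ET.Element:
    """Replace every REF in the TOC by a copy of the element it points to."""
    for parent in list(toc_root.iter()):
        for i, child in enumerate(list(parent)):
            if child.tag != "REF":
                continue
            target = find_element(sources[child.get("DOC")], child.get("ELEM"))
            piece = copy.deepcopy(target)  # never modify the stored source
            if child.get("IMAGES") != "yes":
                strip(piece, "IMG")
            if child.get("NOTES") != "yes":
                strip(piece, "NOTE")
            piece.tail = child.tail        # keep the surrounding whitespace
            parent[i] = piece
    return toc_root
```

The filled-in tree would then go to the medium-specific processing (typesetting, CD indexing or web conversion).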
Paper:
CD:
there are 2 possibilities:
Internet:
The procedure will be different depending on the kind of information to be published:
In both cases, some data processing is needed to give the information the right format and appearance. Given the popularity XML has gained on the Internet, we are in an advantageous position, as converting SGML to XML is very simple (almost direct).
Publication's updating
The procedure depends on the delivery medium:
Paper:
This is one of the most critical parts of the project. Automating pagination is very complex because of the updating system we use, which is based on minimizing the number of pages sent.
One of the aims within this general objective (updating system) is to study different possibilities to carry out the updating process, trying to make it as automatic as possible and, if we find a proper solution, develop it (internal or external solution and development). Depending on the level of automation achieved, the time and resources needed for typesetting will vary.
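To make the page-minimization idea concrete, here is a hypothetical sketch of one possible loose-leaf convention: only pages whose content changed are reprinted, and overflow pages are inserted with letter suffixes (12a, 12b, ...) so that all following pages keep their numbers and need not be resent. The convention, the one-page-per-leaf simplification and the function name `replacement_pages` are assumptions, not a description of our actual system.

```python
def replacement_pages(old_pages: list[str],
                      new_pages: dict[int, list[str]]) -> list[tuple[str, str]]:
    """Return (page label, page text) for every leaf to send.

    old_pages: current text of each page; page n is old_pages[n - 1].
    new_pages: page number -> list of page texts that now take its place.
    Assumes one page per leaf and at most 26 overflow pages per page.
    """
    to_send = []
    for number in sorted(new_pages):
        pages = new_pages[number]
        if not pages:
            # Content removed: send a blank leaf so numbering stays continuous.
            to_send.append((str(number), ""))
            continue
        if pages == [old_pages[number - 1]]:
            continue  # unchanged: nothing to reprint
        if len(pages) > 27:
            raise ValueError(f"page {number}: too many overflow pages")
        for k, text in enumerate(pages):
            # First page keeps its number; overflow pages become 12a, 12b, ...
            label = str(number) if k == 0 else f"{number}{chr(ord('a') + k - 1)}"
            to_send.append((label, text))
    return to_send
```

The trade-off is that suffix pages accumulate over successive updates, so a periodic full repagination of a chapter may still be needed.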
Whichever system is finally adopted (manual, partly automated or completely automated), the steps to be followed are listed below:
CD:
All the information is processed again, so that all lists, indexes, etc. are completely updated. Therefore, the process is the same as the one described for the initial build-up (see section "CD:").
Internet:
The data we show on the web must be updated as often as possible. This is possible because we feed our DB daily.
Regarding on-line products, the process is similar to the one mentioned when speaking about CD Products.
Figure 4. Diagram of the different publication production processes
Technical information
In this section, we provide brief information about the tools we have used and why we have chosen them.
Database
We decided to use Oracle 8 because, apart from its efficiency and scalability, it enables us to store an entire (SGML) document in a binary field of up to 4 GB, which comfortably covers our requirements. It also allows us to search the content of documents.
Editorial System
In the development of this Editorial System, the main tools used are:
Data processing
We have to carry out different kinds of data processing. In all these cases we use Omnimark as the conversion programming language, as it meets our requirements well. Sometimes we also use other auxiliary languages, programs, etc. to optimize each particular process as far as possible.
Typesetting
This is one of the least developed parts. At the moment we have only tested and used FM+SGML, which covers only part of our needs. Therefore, we are still looking for tools that could complement it to cover our extra needs, or replace it in the future, at least for certain kinds of publications.
Problems, advice, ...
From our experience (we have had to face some problems, constraints, etc.), there is some advice we consider useful for anybody thinking of getting involved in a similar project. We also had certain misconceptions that confused us at the beginning. In this section, we would like to mention just a few of them.
First of all, regarding "information" (data):
Conclusions
We have developed a global project that covers all the processes involved in editorial work, from retrieving documents from any of our sources to generating a publication in the desired medium or media.
To perform it we defined several milestones:
Acknowledgements
I would like to thank all the people who have been, or still are, involved in the project through its different phases, from the very beginning of the pilot project until now, after the merger, because all of them have made it possible to turn the initial project into a reality.
And special thanks to Salvador Martinez, Manel Montero, Jose Garzó and Jordi Mulet, for their help in the preparation of this document and all the conference arrangements.