SGML-oriented Integral Editorial System
To keep responding quickly to growing market demand, to meet new customer
requirements (information on demand) and, of course, to reduce production
time and costs, all the information must be stored in a neutral database.
In addition, a system is needed that makes that information easy to manage
in a useful way. This presentation describes the global project we have developed
(and are still improving), which covers all the processes involved in editorial
work, from retrieving documents from any of our sources to generating a
publication in the desired medium or media.
General overview
CISSPRAXIS
CISSPRAXIS is a Spanish publishing company, part of Wolters Kluwer,
with more than 25 years of experience. It is a leader in providing information
services to professionals and companies on legal, fiscal (tax), labour and
mercantile law, business management, accounting issues, etc. Our publication
media are mainly loose-leaves, magazines and CD-ROMs, and we have just begun
to publish on the Internet.
CISSPRAXIS is in fact the result of the integration of two companies,
Editorial CISS and PRAXIS, both of which were already part of Wolters Kluwer.
Each company was working on its own project and, since April, those projects
have become one. All the applications developed therefore had to become one too.
On the one hand, we had been developing different and complementary modules; on
the other hand, we had been using different tools (Visual Basic - Delphi;
ADEPT Editor - WordPerfect + SGML, ...). Fortunately, the policy of both
companies had been to create open, modular systems and processes, so we
have had few problems in putting them all together. This experience highlights
the importance of building modular applications and using standard
languages, programs, etc.
Why create an Integral Editorial System
Until the creation of the Editorial System (SE), all the information contained
in our publications was "stored" on paper or by our external providers
(especially typesetting companies), i.e., it was not available in a reusable
digital format (neither saved, nor classified, nor indexed). We therefore had
to recover, manipulate and correct the same information again and again, every
time we needed it for a different, or even the same, publication and/or medium.
In addition, most of our external providers used their own proprietary, closed
systems (typesetting and formatting engines) or formats, often incompatible
with our internal tools. In the end, all this cost time and money.
To the above, we have to add the rapid evolution of communication technologies
and, therefore, the changes in our customers' requirements.
Our goals
We have been working on this project for the last 3 years. The main objective
is the creation of an Integral Editorial System, oriented not only to the
storage of information but also to the production of publications in any
medium (paper, CD-ROM, Internet).
In general, the company aims were (and still are):
- To ensure that CISSPRAXIS controls all the information contained
in its publications (the whole publishers' portfolio), stored in a neutral
electronic format (SGML), independent of software-specific solutions and providers
and using open standards.
- To develop tools to create and maintain information in electronic
format, allowing the different business units (BUs) to reuse the
added value existing in different publications, irrespective of the final
format (paper, CD-ROM, online) of their products and services to date.
- To provide the BUs with the necessary tools to develop new "one to one"
services and products, with the customers' precise needs as our main objective.
- To integrate XML/SGML technology into the current production processes for
paper and electronic publications, ensuring a smooth transition
from the traditional production systems and automating the current
paper typesetting processes as far as possible.
To achieve all these goals we defined several milestones in our project,
most of which we have already achieved:
On the one hand, the creation of an Editorial Database to store all the
information required to make up our publications (Legislation, Jurisprudence,
Authors' Comments and Added Value related to the publications).
Besides that, we decided to use the SGML standard to mark up the information
(documents) we want to store in the database, so that we can take advantage
of the features of this language (structured information, standard, ...) as
well as of the different tools existing in the market (Arbortext,
WordPerfect+SGML, Omnimark, FM+SGML, ...), which simplify the automation of
the subsequent data processing.
In addition, we had to develop a tool (the Editorial System, SE) that would
enable the internal management of the information stored in the Editorial
Database (BDE) and would simplify the creation and updating of publications,
replacing the usual "Cut & Paste" with a set of electronic applications
which make the work easier and enable full exploitation of the available
information. As a complement to this, we needed a "Workflow System" module to
make communication easier among system users (including internal staff and
external authors) and to keep control of tasks and activities.
The scope of the project goes beyond the SE, as all this has enabled us to
automate (completely or, where that is not possible, as far as we can) the
production processes for both paper and electronic publications,
which will give us greater independence from external providers and will allow
us to keep down costs and time. To achieve this goal, we have taken advantage
of some SGML tools, such as Omnimark for conversion and FM+SGML for our paper
publications. We are still investigating this area, seeking the tool
or system that best fits our current and future needs (QuarkXPress, 3B2, ...).
Objective 1.- To create an Editorial Database (BDE)
The final aim is to have, at any one time, all the information used
in the Company (Legislation, Jurisprudence, Authors' Comments and Added Value)
stored in electronic format, so that it can be used whenever needed. The idea
is to process the information just once and to reuse it in several publications
and media.
To achieve this, we must create a database with the structure that best suits
our needs.
The structure of our DB is based on the different kinds of documents it will
store (Legislation, Jurisprudence, Authors' Comments and Notes, ...), i.e.,
there are specific tables for each kind of document and its metadata. It also
contains all the information about links between documents and all the
references to documents contained in the different publications.
In addition, we decided to store the information in a standard and structured
way. The chosen standard is SGML, to take full advantage of both the structure
it provides and the existing tools that simplify the automation of the data
processing.
It is important to underline that the "source" documents must be stored
in a "neutral" or "pure" state, i.e., identical in content to the original.
All Added Value, that is, the additional information provided by editorial
work, is stored so that it keeps its relationship with the original documents
without modifying their content.
On the basis of all this we can clearly differentiate 2 phases or milestones:
- Database creation.
- Feeding the DB, in two ways:
- Converting all the editorial collections information into SGML and
storing it in the DB.
- Feeding new information into the DB day to day, in order to keep it up to date.
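A minimal sketch of such a structure is given below. It uses Python's built-in sqlite3 module purely for illustration, and every table and column name is a hypothetical assumption rather than the real schema. Only the principle comes from the text: one table per kind of document, the source text stored untouched, and Added Value and links kept in separate tables that point back to the original.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE legislation (
    doc_id TEXT PRIMARY KEY,
    title  TEXT NOT NULL,
    sgml   TEXT NOT NULL        -- source document, stored unmodified
);
CREATE TABLE jurisprudence (
    doc_id TEXT PRIMARY KEY,
    title  TEXT NOT NULL,
    sgml   TEXT NOT NULL
);
CREATE TABLE added_value (
    note_id     INTEGER PRIMARY KEY,
    target_doc  TEXT NOT NULL,  -- document the note refers to
    target_elem TEXT,           -- element inside that document, if any
    sgml        TEXT NOT NULL
);
CREATE TABLE doc_link (
    from_doc TEXT NOT NULL,
    to_doc   TEXT NOT NULL
);
"""


def create_database() -> sqlite3.Connection:
    """Create an empty in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_note(conn, target_doc, target_elem, sgml):
    """Attach editorial Added Value to a document without touching its text."""
    conn.execute(
        "INSERT INTO added_value (target_doc, target_elem, sgml) VALUES (?, ?, ?)",
        (target_doc, target_elem, sgml),
    )


conn = create_database()
source = "<law id='L1'><art id='L1.a1'>Text of article 1</art></law>"
conn.execute("INSERT INTO legislation VALUES (?, ?, ?)", ("L1", "Sample law", source))
add_note(conn, "L1", "L1.a1", "<note>Author's comment on article 1</note>")

# The source document is still identical to what was checked in.
stored = conn.execute("SELECT sgml FROM legislation WHERE doc_id = 'L1'").fetchone()[0]
notes = conn.execute("SELECT sgml FROM added_value WHERE target_doc = 'L1'").fetchall()
```

Keeping the notes in their own table is what lets the same source document serve several publications, each with a different selection of Added Value.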
Objective 2.- To create an Editorial System (SE)
Creation of a system to enable the internal management of the information stored in the DB
The objective is to create an electronic tool that makes editorial
work easier and takes advantage of all the possibilities offered by having
the information stored in a DB (BDE).
The application developed, the Editorial System, must enable interaction
with the DB, providing tools for:
- document check-in (validation + introduction), queries (based on
different kinds of searches) and extraction;
- input of Added Value information (metadata, links, authors' notes,
voices, ...);
- modification, amending and updating of stored information.
Of particular interest is the tool for entering and managing both documents
in force and documents no longer in force, as this kind of information is
heavily used in our publications. This tool lets us retrieve a document as
in force on a certain date, within a stipulated period of time, or with its
whole history.
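The three kinds of retrieval just described can be illustrated with a small Python sketch. The `Version` record, its field names and the convention that `valid_to` is exclusive are assumptions made for the example; the text does not say how validity periods are actually stored.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Version:
    """One version of a document (hypothetical structure)."""
    doc_id: str
    valid_from: date
    valid_to: Optional[date]  # exclusive end date; None means still in force
    text: str


def in_force_on(versions: List[Version], day: date) -> Optional[Version]:
    """Return the version in force on the given day, if any."""
    for v in versions:
        if v.valid_from <= day and (v.valid_to is None or day < v.valid_to):
            return v
    return None


def in_force_between(versions: List[Version], start: date, end: date) -> List[Version]:
    """Return every version whose validity overlaps the period [start, end]."""
    return [
        v for v in versions
        if v.valid_from <= end and (v.valid_to is None or v.valid_to > start)
    ]


def history(versions: List[Version]) -> List[Version]:
    """Return the whole background of the document, oldest version first."""
    return sorted(versions, key=lambda v: v.valid_from)


versions = [
    Version("L1", date(1998, 7, 1), None, "current wording"),
    Version("L1", date(1995, 1, 1), date(1998, 7, 1), "old wording"),
]
```

With this data, `in_force_on(versions, date(1997, 5, 1))` returns the old wording, while a query for the period from January 1998 to January 1999 returns both versions.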
Apart from this, the Editorial System has different ways to control
the information entered into the DB, as well as users' access and permitted
actions, which guarantees certain minimum levels of quality of the information
and avoids duplication and loss of documents.
Besides all that, the application contains a specific tool for "Creation
of Table Of Contents" (TOC) (see section "Build-up of the Table of Contents (TOC)").
An important advantage provided by this system is the way it tracks
changes in documents. This can be used to provide useful information to the
publishers when updating a publication, as the system will be able to warn
them about modifications in documents contained in this publication.
The publisher can then decide whether to apply them or not.
Another advantage is the wide range of possibilities for
searching for documents, by making queries based on the different
metadata already registered. In the future this could also be offered as a
service to our customers through the Internet.
Finally, we would like to emphasize that the idea is to create an open,
modular system, so that once the main part (the "kernel") is finished, we can
append new modules with improvements and integrate other applications,
irrespective of whether they have been developed internally (e.g., the
Work-flow tool; see section "Work-flow") or externally (e.g., the application
for retrieving Legislation documents from the DB of La Ley, another WKE
company).
Creation of an automated system to build-up publications
In the process of generating a product we can distinguish 2 phases:
- Phase 1: definition of the product; here we decide which information it
contains (the result is what we have called the TOC, or TDC in Spanish).
- Phase 2: physical build-up or production of the publication.
Within the above-mentioned Editorial System, the most significant innovation
is the tool developed to create the TOCs of the publications by reusing
information existing in the DB.
The Editorial System stores and checks the information so that, at any
moment, it can determine which modifications have been made to the documents
used in a publication, helping publishers with updating.
All this entails a new way of working for the publishing areas. From
now on, they will actually just maintain and update one big "product", the
BDE, from which any of our publications can be generated, in any medium.
Besides, this system can change the concept of "publication", as we can now
also focus on "information on demand", monographs on a specific subject, etc.
Of particular importance are the many different possibilities that open up
for electronic media and especially for the Internet.
Going deeper into each of the above-mentioned phases:
Build-up of the Table of Contents (TOC)
This process is independent from the medium/media in which the publication
will be delivered. There are two possibilities:
- one TOC is enough for the publication (no dependence on the media);
- a different TOC is needed for each medium.
The choice between the two options depends on the content
of the publication in each medium as well as on the updating periods, etc.
The TOC is an SGML document which contains all the information of the
publication, but just "referenced", i.e., not physically included. In short,
the TOC contains all the information specific to the publication itself, such
as titles and main structure (chapters, sections, etc.) and, in the place
where the "information" should appear, there is just a reference to the piece
of the document stored in our DB that must be included. For that reason, we
have assigned an identifier to our documents and to every element within them.
Besides, for every piece of document we would like to reuse, we can specify
whether we want to reuse its associated images and notes (if there are any,
and which ones) or not.

Figure 1. Example of a Table of Contents of one chapter of the publication
"Procedimiento, Sanciones y Recursos"
With this solution, we can reuse information, save disk space and
obtain more manageable publication documents. In section
"Physical build-up of the publication" we explain how we fill in this
TOC to obtain the complete publication.
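To make the idea of a "referenced" TOC more concrete, here is a small Python sketch. The element and attribute names (`toc`, `chapter`, `ref`, `doc`, `elem`, `notes`, `images`) are invented for the example and are not those of the real DTD. The sample is written as well-formed XML only so that Python's standard xml.etree.ElementTree parser can read it; the real TOC is an SGML document.

```python
import xml.etree.ElementTree as ET

# Hypothetical TOC: structure and titles belong to the publication itself,
# while the content is only referenced by document and element identifiers.
TOC = """
<toc publication="Sample publication">
  <chapter>
    <title>Chapter 1</title>
    <ref doc="L1" elem="L1.a1" notes="yes" images="no"/>
    <ref doc="J7" elem="J7.p3" notes="no" images="no"/>
  </chapter>
</toc>
"""


def list_references(toc_text):
    """Return (document id, element id, reuse notes?, reuse images?) per reference."""
    root = ET.fromstring(toc_text)
    return [
        (r.get("doc"), r.get("elem"), r.get("notes") == "yes", r.get("images") == "yes")
        for r in root.iter("ref")
    ]


references = list_references(TOC)
```

Because the TOC holds only identifiers, the same stored fragment can appear in any number of publications without being copied.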
An update can involve different situations: there is new information
to be added and/or some of the existing information has either changed or
been removed from the publication.
Here the SE supports the publishers by warning them about changes in documents
contained in the publication, so that they can decide whether to apply or
ignore them (the SE obtains this information from the control and checking
processes it carries out).
Each time an update is finished, it is saved as a version of the publication,
so we can always go back and consult it. The update is based on the
last version of the TOC, in which modified documents are highlighted. The
publisher therefore always has a "full image" of the publication.
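The warning mechanism can be pictured with the following Python sketch. It assumes, purely for illustration, that every stored document carries an integer version number and that each saved TOC version records which document versions it used; the text does not describe how the SE actually tracks changes.

```python
from typing import Dict, Iterable, List, Tuple


def pending_changes(used: Dict[str, int], latest: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """List (document id, version used in the TOC, latest version in the DB)
    for every document that changed since the last saved TOC version."""
    return [
        (doc_id, used_version, latest[doc_id])
        for doc_id, used_version in sorted(used.items())
        if latest.get(doc_id, used_version) > used_version
    ]


def apply_update(used: Dict[str, int], latest: Dict[str, int],
                 accepted: Iterable[str]) -> Dict[str, int]:
    """Build the version map for the new TOC version.
    Only the changes the publisher accepted are taken over."""
    new_version = dict(used)
    for doc_id in accepted:
        if doc_id in latest:
            new_version[doc_id] = latest[doc_id]
    return new_version


used = {"L1": 1, "J7": 3}    # versions referenced by the last saved TOC
latest = {"L1": 2, "J7": 3}  # current versions in the database
warnings = pending_changes(used, latest)
next_toc = apply_update(used, latest, accepted=["L1"])
```

Here the publisher is warned that document L1 has moved from version 1 to version 2, and the new TOC version adopts the change only because it was accepted.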
Work-flow
The introduction of SGML/XML technology and the new tools provided
by the SE imply a change of approach, so we have had to analyze the
repercussions thoroughly in order to:
- enable a smooth adaptation to the new technological environment
in the daily tasks of both publication maintenance and electronic and paper
production;
- analyze the competitive improvements achieved by the BUs
(production time, costs, quality) when applying SGML/XML technology.
In addition, using SGML as the standard for neutral markup of information
has enabled us to unify the tasks related to documentary analysis of source
information (Legislation and Jurisprudence) among the different BUs
of CISSPRAXIS, speeding up the updating of electronic publications.
The objective of the Editorial System Workflow development is to facilitate
communication between system users, transparent access to internal and
external information repositories, and control of internal staff activity,
allowing publishers and authors to manage documents and their added-value
relationships.
Why use intranet/extranet technology?
By using Intranet/Extranet technology in the current production
process we pursue:
- Greater independence from the final repository of documents, allowing
changes on the "backend" side in a way that is transparent to the user.
- More active involvement of external collaborators in updating and
publication maintenance processes.
- Bridging the "gap" between the technology that our customers can
easily obtain and that used internally by part of the BUs' staff,
in order to have personalized access to the information.
Intranet module
It allows a 1:1 information service, i.e., personalized access to the
internal SGML Primary Information database and its added value (comments,
relationships, links), based on specific profiles created and maintained
by each internal collaborator.
In addition, we have developed complementary services related to the
day-to-day work and available to all areas of the company:
- Personalized Agenda
- Product Portfolio and FAQs
- Links to the most important Web sites for each BU.
- Shared folders with links to internal documents and external internet
URLs.

Figure 2. Intranet application interface
Extranet module
This development provides BU publishers with an extranet tool accessible by
browser and personalized for each external author or collaborator. It allows
the integration and management of internal and external workflow tasks related
to electronic document management and product maintenance:
- check-in of referential information associated with SGML primary information;
- inclusion of added value.

Figure 3. Extranet application interface
This module will also allow us to reduce the time needed to update
our products and to prepare the current workflow processes for the new
updating demands of on-line services.
Other support utilities: a Thematic Index and a thesaurus
Apart from what has already been mentioned in previous sections, we
plan to create certain support utilities that enable us to take full
advantage of the SE and the information stored in the BDE.
Thematic Index
The idea of a Thematic Index is to group all the information in the
Database by the subject(s) the different documents relate to. This can
be very useful for internal use and for providing information on demand on a
certain subject.
First of all, we have to develop an application that allows the creation
of the Thematic Index: a tree structure organized by subject.
Once that tool is finished, the publishing areas will link the documents
with the related subject(s) of the thematic index.
The application to create the Thematic Index is already finished. It
has been tested and is ready to be used. The main index is established.
The next step is to link the documents with the subjects.
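A tree of subjects with documents linked to its nodes can be sketched in a few lines of Python. The `Subject` class, its fields and the sample subjects are hypothetical; the sketch only illustrates how linking documents to subjects makes it possible to collect everything stored under a branch of the index.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Subject:
    """One node of the thematic index (hypothetical structure)."""
    name: str
    children: List["Subject"] = field(default_factory=list)
    doc_ids: Set[str] = field(default_factory=set)

    def add_child(self, name: str) -> "Subject":
        """Create a sub-subject under this node and return it."""
        child = Subject(name)
        self.children.append(child)
        return child

    def documents(self) -> Set[str]:
        """All documents linked to this subject or to any subject below it."""
        found = set(self.doc_ids)
        for child in self.children:
            found |= child.documents()
        return found


# Build a tiny index and link some documents to it.
root = Subject("Tax law")
vat = root.add_child("VAT")
income = root.add_child("Income tax")
vat.doc_ids.add("L1")
income.doc_ids.update({"L2", "J7"})
```

Asking the root for its documents returns every document in the branch, which is the kind of grouping needed to offer information on demand on a given subject.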
Thesaurus
The thesaurus is closely related to the previous utility; the two are
complementary. This kind of utility enables us to improve the results of
searches based on a certain word or subject, as it can also consider synonyms
of that word.
The main tasks to build it are:
- To define the themes of the documents.
- To choose the set of terms that represents each theme.
- To establish the relationships between the concepts represented
by those terms.
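As an illustration of how a thesaurus can widen a search, the following Python sketch expands a query term with its synonyms before matching documents. The thesaurus entries, the sample documents and the whitespace-based matching are all simplifying assumptions; a real implementation would rely on the indexing engine.

```python
from typing import Dict, List, Set

# Hypothetical thesaurus: each term maps to the terms treated as equivalent.
THESAURUS: Dict[str, Set[str]] = {
    "tax": {"levy", "duty"},
    "dismissal": {"termination"},
}


def expand(term: str, thesaurus: Dict[str, Set[str]]) -> Set[str]:
    """Return the term together with its synonyms (synonymy is symmetric)."""
    term = term.lower()
    related = set(thesaurus.get(term, set()))
    for key, synonyms in thesaurus.items():
        if term in synonyms:
            related.add(key)
            related |= synonyms
    return related | {term}


def search(term: str, documents: Dict[str, str], thesaurus: Dict[str, Set[str]]) -> List[str]:
    """Return the ids of documents containing the term or one of its synonyms."""
    terms = expand(term, thesaurus)
    return sorted(
        doc_id for doc_id, text in documents.items()
        if terms & set(text.lower().split())
    )


documents = {
    "D1": "Annual levy on property",
    "D2": "Rules on dismissal",
    "D3": "Import duty rates",
}
hits = search("tax", documents, THESAURUS)
```

A search for "tax" finds D1 and D3 even though neither contains that word, because "levy" and "duty" are registered as equivalent terms.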
Objective 3.- Publications delivery
There are 2 different phases (both in the first delivery and in the
periodic updating):
- Phase 1.- Build-up of the TOC.
- Phase 2.- Production.
Physical build-up of the publication
By "Physical build-up" we mean the whole process of replacing the
references to the documents (or pieces of them) to be reused with the documents
(or pieces) themselves. The procedure to be followed from here on is
medium-dependent.
These are the different processes:
Paper:
- Replace the references to documents to be reused with the real content
of the documents, keeping the SGML tags (sometimes some manipulation may be
needed to adapt the document to the requirements of FM+SGML or to take real
advantage of its possibilities).
- Partly automatic typesetting and application of format rules. (A large part
is done automatically by FM+SGML, but some manual work is always needed.)
- Generate PDF files to send to the printing company. (A copy of the
PDF is stored in the DB so that we have the publication as it was at a
certain date, whenever we need it.)
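The first step of the paper process, replacing each reference with the tagged content it points to, could look roughly like the Python sketch below. The `ref` element, its attributes and the fragment store are hypothetical, and a regular expression is used only to keep the example short; the real conversions are written in Omnimark.

```python
import re

# Hypothetical store: (document id, element id) -> tagged fragment from the DB.
FRAGMENTS = {
    ("L1", "L1.a1"): "<art id='L1.a1'>Text of article 1</art>",
    ("J7", "J7.p3"): "<para id='J7.p3'>Ruling, paragraph 3</para>",
}

# Matches an empty reference element; any further attributes are ignored here.
REF = re.compile(r'<ref doc="([^"]+)" elem="([^"]+)"[^>]*/>')


def build_up(toc_text, fragments):
    """Replace every reference in the TOC by the fragment it points to,
    keeping the tags of the fragment."""
    def substitute(match):
        key = (match.group(1), match.group(2))
        if key not in fragments:
            raise KeyError(f"unresolved reference: {key}")
        return fragments[key]

    return REF.sub(substitute, toc_text)


toc = (
    '<chapter><title>Chapter 1</title>'
    '<ref doc="L1" elem="L1.a1" notes="yes"/>'
    '<ref doc="J7" elem="J7.p3"/>'
    '</chapter>'
)
publication = build_up(toc, FRAGMENTS)
```

Failing loudly on an unresolved reference is a deliberate choice in this sketch: a publication with a silently missing fragment would be harder to detect later in typesetting.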
CD:
There are 2 possibilities:
- An application for automatic generation of the CD (content) by loading
all the information directly from the DB.
- Internal development of programs which convert the SGML-tagged TOC to the
format required by the CD-load application. The next step is the load process
itself (whether it is done by a provider or internally).
Internet:
The procedure will be different depending on the kind of information
to be published:
- On the one hand, we have the on-line products, each with its own specific
structure (determined by its TOC) and established updating periods.
- On the other hand, the new work system enables us to provide the
customer with information on demand. In this case we will have a mirror DB
(a copy of the BDE) from which the customer will get the information,
depending on the query and the customer's access rights.
In both cases, some data processing is needed to give the
information the right format/appearance. Given the popularity XML has acquired
on the Internet, we are in an advantageous position, as the conversion from
SGML to XML is very simple (almost direct).
Publication updating
The procedure will depend on the delivery medium:
Paper:
This is one of the most critical parts of the project. Automating
the paging is really complex because of the updating system we use, which is
based on minimizing the number of pages to send.
One of the aims within this general objective (updating system) is to
study different ways of carrying out the updating process, making it as
automatic as possible and, if we find a suitable solution, to develop
it (internally or externally). The time and resources needed for typesetting
will vary with the level of automation achieved.
Whichever system is finally used (manual, partly automated
or completely automated), the steps to be followed are listed below:
- Based on the TOC, we just have to generate those "units" that contain
changes ("unit" = minimum predefined part of the publication: chapter, section).
- Those parts will be typeset and compared with the previous version
to minimize the number of pages to send to the printing company. (This will be
more or less manual, depending on the system finally used.)
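The first of these steps, selecting only the units that contain changes, can be sketched as follows in Python. The sketch assumes that a fingerprint of each unit's resolved content was saved with the previous version of the publication; that assumption, the unit identifiers and the use of SHA-256 are all illustrative. It does not address the second step, the page-level comparison after typesetting.

```python
import hashlib
from typing import Dict, List


def fingerprint(text: str) -> str:
    """Short, stable summary of a unit's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def units_to_typeset(previous: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return the ids of units that are new or whose content has changed.

    previous: unit id -> fingerprint saved with the last version
    current:  unit id -> resolved content of the unit now
    """
    return [
        unit_id for unit_id, text in current.items()
        if previous.get(unit_id) != fingerprint(text)
    ]


previous = {"ch1": fingerprint("Chapter one"), "ch2": fingerprint("Chapter two")}
current = {"ch1": "Chapter one", "ch2": "Chapter two, amended", "ch3": "Chapter three"}
changed = units_to_typeset(previous, current)
```

In this example only the amended second chapter and the new third chapter need to be generated and typeset again.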
CD:
All the information contained will be processed again, so that all the
lists, indexes, etc. are completely updated. The process is therefore the
same as described for creation (see section "CD:").
Internet:
The data we show on the web must be updated as often as possible. This
is possible because we feed our DB daily.
Regarding on-line products, the process is similar to the one described
for CD products.

Figure 4. Schema of the different publication production processes
Technical information
In this section, we provide brief information about the tools we have
used and why we have chosen them.
Database
We decided to use Oracle 8 because, apart from its efficiency and scalability,
it enables us to store an entire (SGML) document in a binary field of up to
4 GB, which more than covers our requirements. It also allows us to search
through the content of documents.
Editorial System
In the development of this Editorial System, the main tools used are:
- VB 5.0 (32 bits) and S4 (from I4I) to create the application for document
management and maintenance. We used VB because we had experience
with this tool and knew it quite well. We chose S4 because it worked well
with VB and, at that moment, it was the only tool we knew of for developing
our SGML system.
- Delphi 5.0 (32 bits) to develop the process that imports SGML Primary
Information from WKE SGML databases, and the quality-control check-in of
referential information.
- Active Server Pages 2.0 and COM objects to develop the 1:1 Intranet/Extranet
information system and the check-in of added value by authors and external
staff.
- Verity Information Server as indexer for Primary (source) Information
and support tool for added-value introduction.
Data processing
We have to carry out different kinds of data processing.
- From Whatever-Format to SGML: we have a lot of "source" information in
different electronic formats (proprietary provider tags, txt files,
QuarkXPress files, RTF, ...) that we would like to recover and reuse. This
information should be converted to SGML as automatically as possible. We
therefore need a tool for developing conversion programs that automate the
SGML markup process, completely or at least up to what we call "minus-SGML",
which means adding tags up to a certain level to reduce manual work.
Part of the information we must insert in the DB cannot be recovered in
electronic format. For these cases we have defined certain rules for the
external provider who types the documents. From these rules we
can easily convert to SGML.
- SGML to output format: once the TOC has been created, we must fill it in with
the real content and convert the SGML document to HTML/XML, RTF, SGML
(compliant with another DTD), etc., or we must split the document into several
documents, into fields of a DB, etc.
In all these situations we use Omnimark as the conversion programming
language, as it covers our requirements quite well. Sometimes we use
other auxiliary languages, programs, etc. to optimize each particular process
as far as possible.
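The idea of converting up to "minus-SGML" can be illustrated with a short rule-based sketch. It is written in Python rather than Omnimark, and the provider codes (`@T1:`, `@ART:`) and target element names are invented for the example. Lines that match no rule are wrapped in a generic element so that they can be tagged by hand afterwards; escaping of special characters is left out.

```python
import re

# Hypothetical provider codes mapped to hypothetical SGML elements.
RULES = [
    (re.compile(r"^@T1:(.*)$"), r"<title>\1</title>"),
    (re.compile(r"^@ART:(.*)$"), r"<art>\1</art>"),
]


def to_minus_sgml(lines):
    """Tag every line a rule recognizes; leave the rest for manual markup."""
    tagged = []
    for line in lines:
        for pattern, template in RULES:
            if pattern.match(line):
                tagged.append(pattern.sub(template, line))
                break
        else:
            # No rule matched: keep the text in a generic paragraph element.
            tagged.append(f"<p>{line}</p>")
    return tagged


sample = ["@T1:General provisions", "@ART:Article 1", "Free text"]
result = to_minus_sgml(sample)
```

Adding a rule for each code a provider uses raises the level of automatic tagging step by step, which is what reduces the remaining manual work.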
Typesetting
This is one of the least developed parts. At the moment we have only
tested and used FM+SGML. This tool covers only part of our needs. We therefore
keep looking for tools which may complement it and cover our extra needs,
or replace it in the future, at least for certain kinds of publications.
Problems, advice, ...
From our experience (we have had to face some problems, constraints,
etc.), there is some advice we consider useful to anybody who is thinking
of getting involved in a similar project. We also had certain misunderstandings
that confused us at the beginning. In this section, we would like to name
just a few of them.
First of all, regarding "information" (data):
- XML or SGML:
- On the one hand, it depends a lot on the kind of information you
are going to manage. In our case, our documents have quite a complex structure
and we need to use INCLUSIONs; therefore we keep using SGML. On the other
hand, you should take the market trend into account. XML is clearly growing,
and the number of tools for the XML world increases day by day. If you focus
on Web publications, your decision is clear.
- In any case, you should not expect SGML/XML to solve all
your problems on its own. In the end, it is just a standard which tells you
how to structure your information. You still need to build a system around it
to manage this information, although you have many options for doing so.
- Tools:
- Try to build a system as open and modular as possible, to be independent
from market trends, etc.
- To avoid the mistake of applying the latest available technology
just because it is "the latest" or "the best", it is highly advisable to carry
out a thorough analysis of basic aspects, such as:
- the existence of external providers and reliable software;
- the use of stable development environments and tools;
- training the internal staff in the new working tools and methods;
- to what extent and at what rate users update their tools.
- Each company and each situation is different. It is therefore very
important to analyze your company's case and define your system requirements,
in order to decide whether to purchase an existing product (out-of-the-box
or customized in some way) or to design your own. If you choose the
latter, you must decide whether you want to (and can) develop it on your own
or would rather outsource the development.
- Regarding existing tools, there is currently a wide variety (at
different prices). You must analyze your real needs: perhaps a cheap
SGML processor fits your typesetting needs. In general, all the
current tools cover the basics. The difference usually lies in the way they
deal with certain complex issues or in how friendly their interface is.
Sometimes purchasing a more expensive one is worth it, because
it saves a lot of work, time, etc.
Besides, if you decide to develop a system yourself, you should try
to use tools, programming languages, etc. that you know well, when possible.
- Others:
- A good piece of advice is to create a work team with people from all
the departments involved. This provides different points of view, which
is very helpful in the development of the project. For example, at the beginning
we were a group of 5 people from the publishing and technical areas, working
together on a Pilot Project to study its viability and to define a real Proposal
according to our company's needs. That team then evolved, incorporating new
people, to track the different aspects of the project.
- When planning your project and its implementation, allow for the
transition period people need to get used to this new way of working.
Conclusions
We have developed a global project that covers all the processes involved
in editorial work, from retrieving documents from any of our sources
to generating a publication in the desired medium/media.
To carry it out we defined several milestones:
- We had to store all the information in electronic format. We decided
to create an Editorial Database.
- To take advantage of the information we were going to store, we decided
to use the SGML standard, as it provides structural information which
helps in the data processing for publication delivery.
- We needed an application to manage all the information stored in
the DB; we therefore decided to create a Content Management System. In our
case, we decided to design and develop our own proprietary system, because
none of the Content Management Systems existing at that moment fitted our
requirements well.
Acknowledgements
I would like to thank all the people who have been, or still are, involved
in the project through its different phases, from the very beginning during
the pilot project until now, after the merger, because all of them have made
it possible to turn the initial project into a reality.
Special thanks to Salvador Martinez, Manel Montero, Jose Garzó
and Jordi Mulet for their help in the preparation of this document and all
the conference arrangements.