Quality management considerations for implementing SGML
Quality management principles of customer focus, quality measurement,
and continuous improvement can be used to bridge the gap between the theoretical
promise of SGML as a tool for information owners and the practical compromises
required to accommodate application limitations.
Introduction
SGML was originally developed to meet the requirements of publishers--that
is, people who own information and need to process it in multiple formats
and disseminate it in a variety of media. SGML was NOT developed to meet the
requirements of the developers of the processing systems... nor of the owners
of the distribution media....
In this paper we want to describe what happens when quality management
principles meet markup technology in the area of technical publications. We
will introduce some of the important principles of quality management and
discuss their application to many aspects of technical publications.
Our management goal is to improve the quality of technical publications
at all levels and phases--production, storage, access, and delivery. Ultimately,
the value of any information management system is determined by the degree
to which it enhances or impedes communication among humans. We believe quality
management principles, applied to standardized markup technologies, will help
us meet our goal.
Principles of quality management
In this section we will introduce some of the principles of quality
management and discuss how we have applied them in the area of technical publications.
Communication is the product
At the outset we must be clear about what is the real product (the ultimate
goal) of technical publications. We don't believe that it's just a stack of
manuals, or a CD-ROM, or a website. Those are means to an end. The goal is
communication.
In humans, the task of communicating is dynamic. The elements of sight
and sound, independently or collectively, are what modify behavior as a result
of communication. The stimulus we deliver to, for instance, five people could
result in five different behaviors. In technical publications, Simplified
English, for instance, is one quality card in the playing deck. Can SGML implementation
ease the burden of technical writing with Simplified English? What types
of authoring and behavior changes need to be in place to ensure complete,
efficient implementation? Should Simplified English be viewed as a natural
derivative of the SGML and quality culture or just as an additional
feature? Using quality management considerations, the determinant would be
our goal of worldwide understanding among our customers, not just from
person to person but from country to country, across languages.
Standards of communication in our industry include subjects such as
Simplified English, standard abbreviations, standard dictionaries, units of
measure, and military specifications. These are all important but often fail
to address the real standard, that of consistent behavior. The human at the
end of the information assembly line needs a change to take place in the mind.
Technical publications deliverables of the future must consistently and effectively
manipulate the matter between our ears.
Quality product
A quality product or service, in any field of commerce, is characterized
by an abundance of useful features and an absence of defects. In technical
publications, what features and defects are important? Your lists may be different
from ours, but the important thing is to identify the quality characteristics
that determine the success or failure of your endeavor.
Features
- well-organized
- complete
- easy to find
- engaging presentation
- up-to-date
- etc.
Defects
- incorrect data
- unintelligible content
- incorrect references
- out-of-date material
- etc.
Quality culture
Humans are at the beginning and end of all communication processes.
It stands to reason that better quality products would come out of organizations
that have better "quality cultures". A quality culture is characterized by
specific attitudes, abilities, and behavior of each member of the organization.
Quality culture attitudes and abilities can be summarized (and measured)
as the degree of self-control employees have in their work. Specifically,
there are three requirements for employee self-control:
- 1. Know what the quality goals are.
- 2. Be able to recognize quality shortcomings.
- 3. Have the means and authority to correct defects.
We believe that SGML can enhance the quality culture of an
organization by enabling and empowering employees to monitor and correct many
more dimensions of data quality.
J. M. Juran is one of the pioneers of quality management. In Quality
planning and analysis [Juran 1993], he says that to
be superior in quality we must pursue two courses of action: 1) develop technologies
to create products and processes that meet customers' needs; and 2) stimulate
a culture of quality throughout the organization--one that continually views
quality as the primary goal. Juran goes further to say "technology touches
the head, but a quality culture touches the heart". Our goal is to develop
pride and ownership of mind-altering communication. SGML promises to unlock
what we call "the shackles of intense development time" and allow our imaginations
the time to create and innovate. We feel that once dedication to the process
becomes habit, confidence in our abilities will follow, and with that, policies
designed to listen to and understand customers continually will take us to
the top of the marketplace of the future.
Another aspect of quality culture is its personnel requirements, which
we feel are critical. The technical publications employee profile itself is
changing. The capacity for abstract thought and reasoning, coupled with technical
imagination, is a significant human quality that we need in order to serve
all our data recipients, both internal and external.
Quality process
Quality management encompasses three distinct processes: quality planning,
quality control, and quality improvement. Any of several different methods
can be used to actually implement "quality management". Each of them favors
a certain sequence of steps, uses slightly different terminology, and employs
different tools. What they have in common, though, is more important than
the differences. The common elements include: focusing on customers and their
needs; identifying and designing products with features that meet those needs;
and always measuring performance to enable continuous improvement of products
and processes.
Needs
As we have mentioned, our customers have specific needs related to their
goal of getting information for the purpose of maintaining or operating their
airplanes. The technical publications organization itself also has needs.
What are the needs of a quality process? They may be a set of tools such as
authoring guidelines which are ISO 9000 compliant, application skills acquired
through training, or a document instance. These needs originate within our
department. Customer needs, on the other hand, are best defined through marketing
surveys, technical committee meetings with operators, and personal contact
through our support groups. External and internal needs must be carefully
analyzed and prioritized to allow effective allocation of limited resources
toward the goal of complete customer satisfaction.
Features
Features such as delivery buttons, hotspot links, printing, and search
routines are a few of the vehicles we use to meet customer needs. To provide
our customers with information, we develop these features to deliver information
in a user-friendly, efficient form. In our organization, we are still formulating
our “features” list. The capabilities of SGML within Tech Pubs
have certainly not developed completely. We still don’t know or fully
understand all of the possibilities. We do, however, have high hopes of surpassing
all customer needs through feature development, conscientious planning, and
data ownership.
Measurement
According to Juran, things that get measured are things that will be
prioritized and accomplished. In our business we anticipate that SGML could allow
quality measurements to be automated. One such measurement, for example, would
divide the man-hours expended by the number of revised figures, paragraphs,
or words to yield some type of efficiency scale. Revision start and stop
times are another manpower-based measure. Hits on a web page, or a deeper analysis
of keystrokes or mouse clicks, are future possibilities. Our hope is that
SGML implementation will provide these management tools to serve as building
blocks for information reuse, repository development, and technical manual
customization. A rough count of customer comments presently represents one
of our measurement tools. SGML will, we predict, enable a much more scientific
measurement of perceived or real defects in the future.
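The efficiency scale mentioned above could be sketched as follows. This is only an illustration in Python, not a description of any system in use in our department: the `Revision` record, its field names, and the `hours_per_unit` function are all hypothetical, and the sample figures are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    """One revision cycle of a document (all field names are hypothetical)."""
    man_hours: float
    figures: int      # number of revised figures
    paragraphs: int   # number of revised paragraphs
    words: int        # number of revised words


def hours_per_unit(revisions, unit):
    """Return man-hours expended per revised unit.

    `unit` is one of "figures", "paragraphs", or "words".
    Returns None when nothing of that unit was revised.
    """
    total_hours = sum(r.man_hours for r in revisions)
    total_units = sum(getattr(r, unit) for r in revisions)
    if total_units == 0:
        return None
    return total_hours / total_units


# Invented sample data: two revision cycles.
revisions = [
    Revision(man_hours=12.0, figures=3, paragraphs=40, words=2000),
    Revision(man_hours=8.0, figures=1, paragraphs=10, words=500),
]

for unit in ("figures", "paragraphs", "words"):
    print(unit, hours_per_unit(revisions, unit))
```

A ratio of this kind says nothing by itself about quality; it only becomes useful when tracked over time and compared against a stated goal.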
Data quality, SGML, and technical publications
Note:
A word of clarification: In this paper when we say "data" we generally
mean some collection of detectable differences in a physical medium (e.g.,
bits or bytes in a computer storage device), as well as the patterns or organizing
schemes by which we make use of those detectable differences. We don't pretend
to say what "information" is, except that it seems to happen behind the eyeballs
and between the ears. We aren't overly precise in our usage, however, and
may sometimes talk about "finding information", as if it's something available
out in the physical world; and we may also say things like "the data tells
us", as if the data has some inherent meaning apart from the conventions we
assign to it. There are, of course, fields of application where the distinction
would have to be much more finely drawn, but we ask the reader to grant that
this is not one of them.
Data is not the product
We don't believe that data is the final product of technical publications.
Data is the means to the end. The goal is communication, which only happens
when someone's mind is changed or improved. The direct agent of communication
in our business is a "presentation instance"--that is, a formatted, printed
or displayed instance of a document.
Twenty years ago our product was a stack of manuals, and our "data"
was our product--that is, the data and the presentation instances were one
and the same. As technology has evolved, we have taken advantage of opportunities
to separate the data from the presentation instances. While we have become
comfortable with this paradigm shift on the production side of technical publications,
there is still a predominant tendency to speak of "delivering data" to the
end users. Actually, what we deliver to users is a series of presentation
instances derived from the data. In many cases, we also deliver (or require)
a specific environment for viewing the presentation instances.
Many factors determine the quality and effectiveness of the presentation
instances that we deliver. (Refer to Figure 1.) This paper
is based on the assumption that there is a very close correlation between
data quality and the quality of information delivery. Many people might think
this is too obvious even to bother noting the correlation. But what, exactly,
is the nature of this correlation? High quality data may be a necessary, but
not a sufficient, condition for good information delivery. Even this connection
is tenuous, because there are many ways that poor quality data can be dressed
up and used, especially when expectations or requirements are low. Conversely,
you can have exquisitely formed data that fails to meet users' information
needs.

Figure 1. Presentation instance quality cause-and-effect diagram
We take the position that data quality is an important contributing
factor to the success of technical publications delivery, and that SGML can
be used to enhance the quality of technical publications data. We definitely
are not saying that high quality data (in SGML or any other notation) will
ensure successful delivery of technical publications. There are opportunities
for further research in this area.
In the following sections we will discuss some specific approaches to
measuring data quality, and how SGML can be used to improve data quality in
the area of technical publications, as well as improve the processes of producing
data.
Dimensions of data quality
Thomas C. Redman, in Data quality for the information age [Redman 1996],
discusses several "dimensions" of data quality that
apply to three distinct aspects of data processing. His analysis provides
a starting point for applying quality measurements to SGML data. Although
Redman focuses on record-based relational data structures, many of the quality
dimensions are applicable to data structures that are used to represent documents.
We'll review Redman's dimensions of data quality, and discuss how they can
be used to measure and improve the quality of SGML data.
Dimensions of data quality (from Data quality for the
information age [Redman 1996])
Note:
The dimensions apply to various aspects of data processing, grouped
under "Conceptual view", "Values", and "Representation". As applied to SGML,
these aspects correspond roughly to DTDs, document instances, and notation
of storage objects. We have used Redman's original terms for the dimensions,
but have supplied our own definitions to suggest how they might be applied
to SGML data.
- 1. Conceptual view (DTDs)
- 1. Content
| relevance | The degree to which a component is related to the primary needs defined for the scope of application of the document instances. |
| obtainability | The ease with which content values can be obtained for inclusion in a document instance. |
| clarity of definition | How well is the component defined? |
- 2. Scope
| comprehensiveness | The proportion of defined needs met by this view. |
| essentialness | In the strictest sense, components are either essential or irrelevant to the application needs. In reality, there may be a scale of "essentialness", perhaps correlated to the seriousness of the consequences of omitting a particular component. |
- 3. Level of detail
| granularity | The fineness of component definition. (E.g., how many elements does it take to describe a postal address?) |
| precision of domains | The precision of specification supported by the view. |
- 4. Composition
| naturalness | The degree to which each component corresponds to a recognizable idea or thing within the scope of application. |
| identifiability | The ease of identifying components and distinguishing among components. |
| homogeneity | Similarity of like structures. |
| minimum redundancy | The degree to which the view requires or allows data redundancy. |
- 5. View consistency
| semantic consistency | The degree to which similar relationships are expressed by similar markup constructs. |
| structural consistency | The degree to which similar components are similarly structured. |
- 6. Reaction to change
| robustness | The number and type of application extensions which this view can accommodate. |
| flexibility | The ease with which the view itself can be changed. |
- 2. Values (data content, document instances)
| accuracy | How closely the element content or attribute values correspond to the "correct" values. |
| completeness | Proportion of the "required" content actually included in the document instance. |
| consistency | Constraints on a collection of values (content) that limit the range of acceptable values. |
| currency/cycle time | Is the content currently applicable? |
- 3. Representation (notation, storage)
- 1. Formats
| appropriateness | How well suited is the format to the functional characteristics of the working environment? |
| format precision | Can the notation represent all data content without ambiguity? |
| efficient storage | Compactness, compressibility. |
| interpretability | The set of rules for interpreting data in a specified notation. |
| format flexibility | Adaptability to different user needs, applications, and recording media. |
| portability | How easily can the data be used in different computing environments? |
| ability to represent null values | The difference between "no content" and "missing content". |
- 2. Physical instances
| representation consistency | Degree to which all instances are represented in similar notation. |
As mentioned previously, these dimensions are derived primarily from
data structures of a relational database model. We have not attempted to extend
this list by adding dimensions that apply specifically to document data structures.
But there appear to be some promising opportunities in this direction. Document
instances will have a greater range of quality dimensions in the "Values"
area, since they are composed of semantically complex structures instead of
simple typed data fields. Considerations such as simplified language, spelling,
and composition would apply. Linking and addressing, which are such challenging
areas of document management, should also have some applicable dimensions
of data quality.
A fundamental principle of quality management is that you can't control
what you don't measure. Only by identifying specific dimensions of data quality
can you begin the process of measuring data quality, comparing actual quality
with intended quality, and analyzing the process of creating the data to improve
the overall data quality. There is no single measurement or set of measurements
that is appropriate for all applications. The important thing is to select some appropriate measurements along one or more identifiable
dimensions of data quality.
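As one illustration of measuring along a single dimension, the sketch below scores the "completeness" of each task in a small document instance and reports the tasks that fall short of an intended target. It is a minimal sketch under several assumptions: the instance is available as well-formed XML so that Python's standard `xml.etree.ElementTree` parser can read it, and the element names (`task`, `title`, `effectivity`, `procedure`, `revdate`), the `REQUIRED` list, and the function names are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of child elements that every task is "required" to contain.
REQUIRED = ["title", "effectivity", "procedure", "revdate"]

# Invented sample instance; the second task omits two required elements.
SAMPLE = """
<manual>
  <task id="t1"><title>Replace filter</title><effectivity>All</effectivity>
    <procedure>Remove and replace.</procedure><revdate>1998-06-01</revdate></task>
  <task id="t2"><title>Inspect seal</title><procedure>Inspect.</procedure></task>
</manual>
"""


def completeness(task):
    """Proportion of required child elements present with non-empty text."""
    present = 0
    for name in REQUIRED:
        child = task.find(name)
        if child is not None and (child.text or "").strip():
            present += 1
    return present / len(REQUIRED)


def below_target(root, target):
    """Return (id, score) for each task whose completeness is below the target."""
    results = []
    for task in root.iter("task"):
        score = completeness(task)
        if score < target:
            results.append((task.get("id"), score))
    return results


root = ET.fromstring(SAMPLE)
print(below_target(root, target=1.0))
```

The comparison against `target` is the point of the exercise: the measurement only supports quality control when there is an intended value to compare it with.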
The remaining subsections will discuss some ways in which these dimensions
of data quality can be applied in technical publications to the areas of DTD
development, authoring, and processing and presentation of SGML documents.
DTD development
DTDs correspond to the "conceptual view" aspect of data quality, as
discussed by Redman. The importance of good DTDs to a successful SGML implementation
has long been recognized. But what is a "good" DTD? And what is the relationship
between the "goodness" of a DTD, and the effectiveness of information delivery?
As shown in Figure 1, DTDs are not a direct factor affecting
the final presentation instance. It is often possible to create bad document
instances from good DTDs, and good instances from bad DTDs.
Furthermore, as markup theory has evolved it has recognized that, ultimately,
the structure of the document instance is more important than the schema(s)
to which it conforms. This concept is embodied in HyTime's notion of "enabling
architectures", and, less robustly, in XML's notion of "well-formed" documents.
Nevertheless, as a practical matter, you want DTDs; and, if you aim
to produce quality document instances, quality DTDs can help. Notice the chain
of cause-and-effect assumptions here (refer to Figure 1):
the end user's experience depends (in part) on a quality presentation instance;
the quality of the presentation instance depends (in part) on the quality
of the document instance; the quality of the document instance depends (in
part) on the quality of the governing DTD. The level of effort you devote
to DTD development should be determined by the degree to which your final
product is affected by the DTDs. We do not know of any general rule for determining
this--many other parts of the process can compensate for a poor DTD, or dilute
the benefits of a good DTD. For example, enforcing specific writing procedures
can assure the production of good document instances even if a DTD allows
nonsensical structures. On the other hand, limitations of your processing
applications can obviate the benefits of clever markup constructs.
These caveats aside, we still want to be able to determine whether our
DTDs are "good", and if not, how to make them better. If we are just starting
out with SGML, we want to know whether an off-the-shelf DTD will work, or
if we should buy one from a consultant, or make one ourselves. DTD development
is a challenging effort, even in the best of circumstances. Fortunately, there
is excellent practical advice available in books such as Developing SGML
DTDs [Maler 1996] and Structuring XML documents [Megginson 1998]. These
books describe how to create
a high-quality DTD, and we know, based on our own experience, that these processes
can indeed help you produce good DTDs.
But if we want to apply quality management principles in this area,
we must identify the specific features that contribute to the quality of the
product (i.e., the DTD), and we must be able to measure, directly or indirectly,
the "performance" of those features. Consider the following statements that
explain why a DTD is "good":
- 1. This DTD is good because it was created using a proven DTD development
process.
- 2. This DTD is good because it allows the creation of quality document
instances, and enables efficient document processing that meets all our application
needs.
- 3. This DTD is good because all markup constructs exhibit semantic
and structural consistency; all elements satisfy our design goals of
naturalness and identifiability; all components are relevant to our
document content needs, have obtainable values, and are clearly defined.
(In the last statement, the terms semantic and structural consistency,
naturalness, identifiability, relevant, obtainable, and clearly defined
correspond to data quality dimensions of the conceptual view; see
"Dimensions of data quality".)
The first statement implies that the right process will always produce
a good DTD. The second implies that the quality of a DTD is determined by
seeing it in operation. Only the third statement recognizes explicitly that
the quality of a DTD depends on specific identifiable, measurable features
of the DTD itself. In practice, if you have created a DTD by following Maler's
guide [Maler 1996], you will almost certainly be able to
make a statement like the third about your DTD. But if you need to decide
whether to adopt an industry-standard DTD, or must maintain a hand-me-down
DTD, it helps to have a list of specific, measurable features and feature
goals.
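One way to keep such a list is as a simple scorecard that compares assessed scores with feature goals for each conceptual-view dimension. The sketch below is purely illustrative: the 0-5 scale, the goal values, the assessed scores, and the `shortfalls` function are hypothetical, and how a score is assigned to a dimension is left to the reviewers of the DTD.

```python
# Hypothetical feature goals for a DTD, on an assumed 0-5 scale.
GOALS = {
    "semantic consistency": 4,
    "structural consistency": 4,
    "naturalness": 3,
    "clarity of definition": 4,
}

# Invented review results for a candidate DTD; one dimension was not assessed.
ASSESSED = {
    "semantic consistency": 4,
    "structural consistency": 2,
    "naturalness": 3,
}


def shortfalls(goals, scores):
    """Return the dimensions that miss their goal.

    The value is the size of the gap, or None when the dimension
    has not been assessed at all.
    """
    gaps = {}
    for dimension, goal in goals.items():
        score = scores.get(dimension)
        if score is None:
            gaps[dimension] = None
        elif score < goal:
            gaps[dimension] = goal - score
    return gaps


print(shortfalls(GOALS, ASSESSED))
```

A scorecard like this does not make the judgments any less subjective, but it does make them explicit and comparable from one DTD, or one revision of a DTD, to the next.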
In our department, we've tried two different approaches to developing
DTDs. First we made some minor customizations to a couple of industry standard
DTDs. These DTDs currently govern the bulk of our SGML document instances.
Although we have been able to meet our production commitments while migrating
our data and processes to these DTDs, we recognize that there are several
deficiencies. We have been able to work around these deficiencies with creative
data processing. Since we do not have any requirements to deliver data conforming
to these industry-standard DTDs, we will probably evolve away from them, toward
more specifically customized DTDs.
We have also created one DTD from the ground up, following the procedure
outlined in Developing SGML DTDs [Maler 1996].
This effort was more time-consuming than the minimal customizations we did
on the industry-standard DTDs. But we were far more satisfied with the result,
and believe we have the basis of a robust DTD to govern the creation of another
large portion of our document instances.
Authoring
It is during authoring, perhaps more than any other phase of document
production, that SGML enables dramatic quality improvements. We take it for
granted that competent writers and editors can organize words, paragraphs,
and pictures to make good-looking, understandable documents. When all is said
and done, the final presentation instance, in paper or on screen, shouldn't look any different whether it was encoded in SGML
or Elbonian. So, why is SGML better for creating quality documents than some
other format?
It should not be news to anyone that writers, when authoring in SGML,
will concentrate primarily on the data content and the structural relationships
amongst document components, instead of on "how it looks". Dimensions of data
content ("Values") quality include accuracy, completeness, consistency,
and currency (or timeliness). It stands
to reason that, by focusing writers' attention on these values, the quality
should improve.
In our operation, we have also improved the quality culture by deploying
SGML editing applications. (Recall that the three preconditions of a quality
culture are employee awareness of quality goals, employee ability to detect
quality defects, and employee ability to correct quality defects.) Our previous
structured editor did not provide the means to detect many of the quality
defects that only showed up during batch processing of the data for CD-ROM
preparation. Even the most careful writers with the best intentions could
not possibly find and correct errors on their own, using the tools at their
disposal. (Many such errors involved cross-reference pointers to other documents.)
As a result, we had to rely on a feedback loop of returning error lists to
the writers, waiting for the writers to correct the documents, then reprocessing
the data, returning a list of remaining errors, etc. Not only was this time-consuming,
it was annoying to the writers who, to the best of their knowledge, thought
they were done working on the documents. Now we can enable the writers to
check for many types of errors within their authoring environment, which has
reduced (but still hasn't eliminated) the error-list feedback loop.
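The kind of check that can be moved from batch processing into the authoring environment is illustrated below for cross-reference pointers between documents. This is a minimal sketch, not our actual tooling: it assumes the documents can be read as well-formed XML by Python's standard `xml.etree.ElementTree` parser, and the element and attribute names (`xref`, `target`, `id`), the file names, and the `unresolved_xrefs` function are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample documents; doc-a.xml points at an id ("b9") that does not exist.
DOCS = {
    "doc-a.xml": '<doc><sect id="a1"><xref target="b1"/><xref target="b9"/></sect></doc>',
    "doc-b.xml": '<doc><sect id="b1"><xref target="a1"/></sect></doc>',
}


def unresolved_xrefs(docs):
    """Return (document name, target) for each xref whose target id is not declared
    in any document of the set."""
    parsed = {name: ET.fromstring(text) for name, text in docs.items()}

    # Pass 1: collect every id declared anywhere in the document set.
    known_ids = set()
    for root in parsed.values():
        for elem in root.iter():
            if "id" in elem.attrib:
                known_ids.add(elem.attrib["id"])

    # Pass 2: report each cross-reference that points at an unknown id.
    errors = []
    for name, root in parsed.items():
        for xref in root.iter("xref"):
            target = xref.get("target")
            if target not in known_ids:
                errors.append((name, target))
    return errors


print(unresolved_xrefs(DOCS))
```

Run by the writer before a document is released, a check of this sort addresses the second and third requirements for self-control: the writer can recognize the defect and has the means to correct it immediately.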
We would be thought naive if we didn't comment on the supposed resistance
of authors to working in an SGML environment. In our department, we had completely
different reactions from two groups of writers. The group that had already
been using a structured editor did not have much difficulty, for two main
reasons. First, they were already familiar with the concept of tagging document
components. Second, our previous editing application suffered from terribly
slow performance, so most writers were pleased with the responsiveness of
a native SGML editing application. The group that migrated from a traditional
word processing/desktop publishing application experienced more difficulty,
for a variety of reasons, including unfamiliarity with tagging concepts, lack
of control over formatting, and insufficient training.
Any organization that faces this problem should try to emphasize the
data quality improvements that are possible with SGML authoring. But many
important aspects of SGML data quality are somewhat abstract, and therefore
less apparent to writers than other, traditional indicators of document quality.
If quality principles are already well-understood and supported in an organization,
then SGML should be an "easy sell" to authors.
Processing and presentation
During processing and presentation of SGML data, Redman's "Representation"
dimensions of data quality come into play. As anyone who has worked with computers
for very long knows, you can almost always program a solution (or buy an application)
for any particular data processing need. The main question is not whether you can process the data, but how reliably
and efficiently you can process it, what knowledge and skill sets are required
of developers, and how many different processing paradigms you want to support.
These are all functions of the "Representation" dimensions of data quality,
particularly of interpretability, portability, flexibility,
and representation consistency. We do
not need to argue in this forum that SGML data outscores all other notations
on these scales.
We will, however, point out that these very features of SGML present
other opportunities for applying quality management principles to improve
processes and products. The portability and flexibility of SGML allow you
to implement modular solutions to different operations in the technical publications
production process. For example, your editing application and your storage
system can be chosen primarily based on how well they meet the needs in each
respective area. In our operation, we began with file-system storage of text
and graphic entities. While the security and versioning capabilities provided
by a database storage system are attractive, we concluded that these were
not essential to support initial migration and production.
As with any process or product controlled by quality management principles,
you will look at actual customer needs, then evaluate the features of proposed
products as they pertain to those needs, and implement a process, or obtain
the product, that best meets your needs. Largely because of its representational
data quality, SGML gives you the freedom to apply quality planning principles
precisely and appropriately to each segment of your production process.
Conclusions
Through a deliberate and well-executed effort, our technical publications
department will be an example of a completely customer-focused group for others
to emulate. The processes we've outlined in this paper illustrate one approach
to bridging the gap between theory and application of SGML. We have shown
how quality management principles and concepts can be applied to the problems
of SGML implementation in the area of technical publications. Quality considerations
affect many levels and aspects of this operation--data, processes, and culture.
Regardless of the specific process or techniques used for quality management,
the goals are always the same: improve product features and reduce product
defects.
Acknowledgements
The authors would like to thank Jim Laney, Director of Engineering Services
and Product Safety at Cessna Aircraft Company, for his long-standing support
of quality management principles and markup technologies in Cessna Technical
Publications.
Bibliography
| [Juran 1993] | Juran, J. M., and Frank M. Gryna. 1993. Quality planning and analysis: from product development through use. 3rd ed. New York: McGraw-Hill. |
| [Maler 1996] | Maler, Eve, and Jeanne El Andaloussi. 1996. Developing SGML DTDs: from text to model to markup. New Jersey: Prentice Hall PTR. |
| [Megginson 1998] | Megginson, David. 1998. Structuring XML documents. Prentice Hall. |
| [Redman 1996] | Redman, Thomas C. 1996. Data quality for the information age. Boston: Artech House. |