Quality management considerations for implementing SGML
Paul Tyson
Dan McPartland


Abstract
Quality management principles of customer focus, quality measurement, and continuous improvement can be used to bridge the gap between the theoretical promise of SGML as a tool for information owners and the practical compromises required to accommodate application limitations.

Contents
  1. Introduction
  2. Principles of quality management
    1. Communication is the product
    2. Quality product
    3. Quality culture
    4. Quality process
      1. Needs
      2. Features
      3. Measurement
  3. Data quality, SGML, and technical publications
    1. Data is not the product
    2. Dimensions of data quality
    3. DTD development
    4. Authoring
    5. Processing and presentation
  4. Conclusions
  5. Acknowledgements
  6. Bibliography

Introduction
SGML was originally developed to meet the requirements of publishers--that is, people who own information and need to process it in multiple formats and disseminate it in a variety of media. SGML was NOT developed to meet the requirements of the developers of the processing systems... nor of the owners of the distribution media....
Charles F. Goldfarb, "Entity Management in SGML", 1993, http://www.oasis-open.org/cover/goldenti.html
In this paper we want to describe what happens when quality management principles meet markup technology in the area of technical publications. We will introduce some of the important principles of quality management and discuss their application to many aspects of technical publications.
Our management goal is to improve the quality of technical publications at all levels and phases--production, storage, access, and delivery. Ultimately, the value of any information management system is determined by the degree to which it enhances or impedes communication among humans. We believe quality management principles, applied to standardized markup technologies, will help us meet our goal.
Principles of quality management
In this section we will introduce some of the principles of quality management and discuss how we have applied them in the area of technical publications.
Communication is the product
At the outset we must be clear about what is the real product (the ultimate goal) of technical publications. We don't believe that it's just a stack of manuals, or a CD-ROM, or a website. Those are means to an end. The goal is communication.
In humans, the task of communicating is dynamic. The elements of sight and sound, independently or collectively, are what modify behavior as a result of communication. The same stimulus delivered to, for instance, five people could result in five different behaviors. In technical publications, Simplified English, for instance, is one quality card in the playing deck. Can SGML implementation ease the burden of technical writing with Simplified English? What types of authoring and behavior changes need to be in place to ensure complete, efficient implementation? Should Simplified English be viewed as a natural derivative of SGML and the quality culture, or just as an additional feature? Using quality management considerations, the determinant would be our goal of worldwide understanding among our customers, not just from person to person but from country to country, across languages.
Standards of communication in our industry include subjects such as Simplified English, standard abbreviations, standard dictionaries, units of measure, and military specifications. These are all important but often fail to address the real standard, that of consistent behavior. The human at the end of the information assembly line needs a change to take place in the mind. Technical publications deliverables of the future must consistently and effectively manipulate the matter between our ears.
Quality product
A quality product or service, in any field of commerce, is characterized by an abundance of useful features and an absence of defects. In technical publications, what features and defects are important? Your lists may be different from ours, but the important thing is to identify the quality characteristics that determine the success or failure of your endeavor.
Features

Defects

Quality culture
Humans are at the beginning and end of all communication processes. It stands to reason that better quality products would come out of organizations that have better "quality cultures". A quality culture is characterized by specific attitudes, abilities, and behavior of each member of the organization.
Quality culture attitudes and abilities can be summarized (and measured) as the degree of self-control employees have in their work. Specifically, there are three requirements for employee self-control: employees must be aware of the quality goals, be able to detect quality defects, and be able to correct quality defects. We believe that SGML can enhance the quality culture of an organization by enabling and empowering employees to monitor and correct many more dimensions of data quality.
J. M. Juran is one of the pioneers of quality management. In Quality planning and analysis [Juran 1993], he says that to be superior in quality we must pursue two courses of action: 1) develop technologies to create products and processes that meet customers' needs; and 2) stimulate a culture of quality throughout the organization--one that continually views quality as the primary goal. Juran goes further to say "technology touches the head, but a quality culture touches the heart". Our goal is to develop pride in, and ownership of, mind-altering communication. SGML promises to unlock what we call "the shackles of intense development time" and allow our imaginations the time to create and innovate. We feel that once dedication to the process becomes habit, confidence in our abilities will follow, and with that, policies designed to constantly listen to and understand customers will deliver us to the top of the marketplace of the future.
Another aspect of quality culture is the personnel requirements, which we feel are critical. The technical publications employee profile itself is changing. The capacity for abstract thought and reasoning, coupled with technical imagination, is a significant human feature that we need to provide to all our data recipients, both internal and external.
Quality process
Quality management encompasses three distinct processes: quality planning, quality control, and quality improvement. Any of several different methods can be used to actually implement "quality management". Each of them favors a certain sequence of steps, uses slightly different terminology, and employs different tools. What they have in common, though, is more important than the differences. The common elements include: focusing on customers and their needs; identifying and designing products with features that meet those needs; and always measuring performance to enable continuous improvement of products and processes.
Needs
As we have mentioned, our customers have specific needs related to their goal of getting information for the purpose of maintaining or operating their airplanes. The technical publications organization itself also has needs. What are the needs of a quality process? They may be a set of tools such as authoring guidelines which are ISO 9000 compliant, application skills acquired through training, or a document instance. These needs originate within our department. Customer needs, on the other hand, are best defined through marketing surveys, technical committee meetings with operators, and personal contact through our support groups. External and internal needs must be carefully analyzed and prioritized to allow effective allocation of limited resources toward the goal of complete customer satisfaction.
Features
Features such as delivery buttons, hotspot links, printing, and search routines are a few of the vehicles we use to meet customer needs. To provide our customers with information, we develop these features to deliver it in a user-friendly, efficient form. In our organization, we are still formulating our “features” list. The capabilities of SGML within Tech Pubs have certainly not been developed completely, and we do not yet know or fully understand all of the possibilities. We do, however, have high hopes of surpassing all customer needs through feature development, conscientious planning, and data ownership.
Measurement
According to Juran, things that get measured are things that will be prioritized and accomplished. In our business we anticipate that SGML could allow quality measurements to be automated. One example would be to divide the man-hours expended by the number of revised figures, paragraphs, or words to develop some type of efficiency scale. Revision start and stop times are another manpower-based measurement. Hits on a web page, or a deeper analysis of keystrokes or mouse clicks, are future possibilities. Our hope is that SGML implementation will provide these management tools to serve as building blocks for information reuse, repository development, and technical manual customization. At present, a rough count of customer comments is one of our measurement tools. SGML will, we predict, enable a much more scientific measurement of perceived or real defects in the future.
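The efficiency scale suggested above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the manual names, and the numbers are hypothetical and do not come from our production data, and the choice to count figures and paragraphs together as "revised units" is one possible convention among many.

```python
from dataclasses import dataclass


@dataclass
class RevisionRecord:
    """One revision cycle for a manual (hypothetical record layout)."""
    manual: str
    man_hours: float
    revised_figures: int
    revised_paragraphs: int


def hours_per_revised_unit(record: RevisionRecord) -> float:
    """Man-hours divided by the number of revised figures and paragraphs."""
    units = record.revised_figures + record.revised_paragraphs
    if units == 0:
        raise ValueError("no revised units recorded for " + record.manual)
    return record.man_hours / units


# Illustrative values only, not measured data.
records = [
    RevisionRecord("maintenance-manual", man_hours=120.0,
                   revised_figures=10, revised_paragraphs=50),
    RevisionRecord("parts-catalog", man_hours=45.0,
                   revised_figures=30, revised_paragraphs=0),
]
for r in records:
    print(f"{r.manual}: {hours_per_revised_unit(r):.2f} hours per revised unit")
```

A number like this means little on its own. It becomes useful when tracked over successive revision cycles, so that a change in tools or process shows up as a change in the trend.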
Data quality, SGML, and technical publications
Note:
A word of clarification: In this paper when we say "data" we generally mean some collection of detectable differences in a physical medium (e.g., bits or bytes in a computer storage device), as well as the patterns or organizing schemes by which we make use of those detectable differences. We don't pretend to say what "information" is, except that it seems to happen behind the eyeballs and between the ears. We aren't overly precise in our usage, however, and may sometimes talk about "finding information", as if it's something available out in the physical world; and we may also say things like "the data tells us", as if the data has some inherent meaning apart from the conventions we assign to it. There are, of course, fields of application where the distinction would have to be much more finely drawn, but we ask the reader to grant that this is not one of them.
Data is not the product
We don't believe that data is the final product of technical publications. Data is the means to the end. The goal is communication, which only happens when someone's mind is changed or improved. The direct agent of communication in our business is a "presentation instance"--that is, a formatted, printed or displayed instance of a document.
Twenty years ago our product was a stack of manuals, and our "data" was our product--that is, the data and the presentation instances were one and the same. As technology has evolved, we have taken advantage of opportunities to separate the data from the presentation instances. While we have become comfortable with this paradigm shift on the production side of technical publications, there is still a predominant tendency to speak of "delivering data" to the end users. Actually, what we deliver to users is a series of presentation instances derived from the data. In many cases, we also deliver (or require) a specific environment for viewing the presentation instances.
Many factors determine the quality and effectiveness of the presentation instances that we deliver. (Refer to Figure 1.) This paper is based on the assumption that there is a very close correlation between data quality and the quality of information delivery. Many people might think this is too obvious even to bother noting the correlation. But what, exactly, is the nature of this correlation? High quality data may be a necessary, but not a sufficient, cause of good information delivery. Even this connection is tenuous, because there are many ways that poor quality data can be dressed up and used, especially when expectations or requirements are low. Conversely, you can have exquisitely formed data that fails to meet users' information needs.
Figure 1. Presentation instance quality cause-and-effect diagram
We take the position that data quality is an important contributing factor to the success of technical publications delivery, and that SGML can be used to enhance the quality of technical publications data. We definitely are not saying that high quality data (in SGML or any other notation) will ensure successful delivery of technical publications. There are opportunities for further research in this area.
In the following sections we will discuss some specific approaches to measuring data quality, and how SGML can be used to improve data quality in the area of technical publications, as well as improve the processes of producing data.
Dimensions of data quality
Thomas C. Redman, in Data quality for the information age [Redman 1996], discusses several "dimensions" of data quality that apply to three distinct aspects of data processing. His analysis provides a starting point for applying quality measurements to SGML data. Although Redman focuses on record-based relational data structures, many of the quality dimensions are applicable to data structures that are used to represent documents. We'll review Redman's dimensions of data quality, and discuss how they can be used to measure and improve the quality of SGML data.
Dimensions of data quality (from Data quality for the information age [Redman 1996])
Note:
The dimensions apply to various aspects of data processing, grouped under "Conceptual view", "Values", and "Representation". As applied to SGML, these aspects correspond roughly to DTDs, document instances, and notation of storage objects. We have used Redman's original terms for the dimensions, but have supplied our own definitions to suggest how they might be applied to SGML data.
As mentioned previously, these dimensions are derived primarily from data structures of a relational database model. We have not attempted to extend this list by adding dimensions that apply specifically to document data structures. But there appear to be some promising opportunities in this direction. Document instances will have a greater range of quality dimensions in the "Values" area, since they are composed of semantically complex structures instead of simple typed data fields. Considerations such as simplified language, spelling, and composition would apply. Linking and addressing, which are such challenging areas of document management, should also have some applicable dimensions of data quality.
A fundamental principle of quality management is that you can't control what you don't measure. Only by identifying specific dimensions of data quality can you begin the process of measuring data quality, comparing actual quality with intended quality, and analyzing the process of creating the data to improve the overall data quality. There is no single measurement or set of measurements that is appropriate for all applications. The important thing is to select some appropriate measurements along one or more identifiable dimensions of data quality.
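As one hedged illustration of measuring along a single dimension, the sketch below scores the completeness of a document instance and compares it with an intended level. It assumes the instance has already been normalized to an XML-compatible form, because the Python standard library provides an XML parser but no SGML parser. The element names, the list of required children, and the target value are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample instance; element names are illustrative only.
SAMPLE = """
<manual>
  <task id="t1"><title>Replace filter</title><warning>Depressurize first.</warning></task>
  <task id="t2"><title>Inspect seal</title></task>
  <task id="t3"><title></title><warning>Wear gloves.</warning></task>
</manual>
"""

REQUIRED_CHILDREN = ("title", "warning")  # assumed business rule, not taken from a DTD
TARGET_COMPLETENESS = 0.95                # assumed quality goal


def completeness(root, required=REQUIRED_CHILDREN):
    """Fraction of required child slots in <task> elements that hold text."""
    filled = total = 0
    for task in root.iter("task"):
        for name in required:
            total += 1
            child = task.find(name)
            if child is not None and (child.text or "").strip():
                filled += 1
    return filled / total if total else 1.0


root = ET.fromstring(SAMPLE)
score = completeness(root)
status = "meets" if score >= TARGET_COMPLETENESS else "falls short of"
print(f"completeness {score:.2f} {status} target {TARGET_COMPLETENESS:.2f}")
```

The same pattern (count, divide, compare with a goal) could be applied to other dimensions such as consistency or currency, each with its own application-specific definition.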
The remaining subsections will discuss some ways in which these dimensions of data quality can be applied in technical publications to the areas of DTD development, authoring, and processing and presentation of SGML documents.
DTD development
DTDs correspond to the "conceptual view" aspect of data quality, as discussed by Redman. The importance of good DTDs to a successful SGML implementation has long been recognized. But what is a "good" DTD? And what is the relationship between the "goodness" of a DTD, and the effectiveness of information delivery? As shown in Figure 1, DTDs are not a direct factor affecting the final presentation instance. It is often possible to create bad document instances from good DTDs, and good instances from bad DTDs.
Furthermore, as markup theory has evolved it has recognized that, ultimately, the structure of the document instance is more important than the schema(s) to which it conforms. This concept is embodied in HyTime's notion of "enabling architectures", and, less robustly, in XML's notion of "well-formed" documents.
Nevertheless, as a practical matter, you want DTDs; and, if you aim to produce quality document instances, quality DTDs can help. Notice the chain of cause-and-effect assumptions here (refer to Figure 1): the end user's experience depends (in part) on a quality presentation instance; the quality of the presentation instance depends (in part) on the quality of the document instance; the quality of the document instance depends (in part) on the quality of the governing DTD. The level of effort you devote to DTD development should be determined by the degree to which your final product is affected by the DTDs. We do not know of any general rule for determining this--many other parts of the process can compensate for a poor DTD, or dilute the benefits of a good DTD. For example, enforcing specific writing procedures can assure the production of good document instances even if a DTD allows nonsensical structures. On the other hand, limitations of your processing applications can obviate the benefits of clever markup constructs.
These caveats aside, we still want to be able to determine whether our DTDs are "good", and if not, how to make them better. If we are just starting out with SGML, we want to know whether an off-the-shelf DTD will work, or if we should buy one from a consultant, or make one ourselves. DTD development is a challenging effort, even in the best of circumstances. Fortunately, there is excellent practical advice available in books such as Developing SGML DTDs [Maler 1996] and Structuring XML documents [Megginson 1998]. These books describe how to create a high-quality DTD, and we know, based on our own experience, that these processes can indeed help you produce good DTDs.
But if we want to apply quality management principles in this area, we must identify the specific features that contribute to the quality of the product (i.e., the DTD), and we must be able to measure, directly or indirectly, the "performance" of those features. Consider the following statements that explain why a DTD is "good":
This DTD is good because it was created using a proven DTD development process.
This DTD is good because it allows the creation of quality document instances, and enables efficient document processing that meets all our application needs.
This DTD is good because all markup constructs exhibit semantic and structural consistency; all elements satisfy our design goals of naturalness and identifiability; all components are relevant to our document content needs, have obtainable values, and are clearly defined.
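(In the last statement, the terms semantic and structural consistency, naturalness, identifiability, relevant, obtainable values, and clearly defined correspond to data quality dimensions of the conceptual view. Refer to the table "Dimensions of data quality".)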
The first statement implies that the right process will always produce a good DTD. The second implies that the quality of a DTD is determined by seeing it in operation. Only the third statement recognizes explicitly that the quality of a DTD depends on specific identifiable, measurable features of the DTD itself. In practice, if you have created a DTD by following Maler's guide [Maler 1996], you will almost certainly be able to make a statement like the third about your DTD. But if you need to decide whether to adopt an industry-standard DTD, or must maintain a hand-me-down DTD, it helps to have a list of specific, measurable features and feature goals.
In our department, we've tried two different approaches to developing DTDs. First we made some minor customizations to a couple of industry standard DTDs. These DTDs currently govern the bulk of our SGML document instances. Although we have been able to meet our production commitments while migrating our data and processes to these DTDs, we recognize that there are several deficiencies. We have been able to work around these deficiencies with creative data processing. Since we do not have any requirements to deliver data conforming to these industry-standard DTDs, we will probably evolve away from them, toward more specifically customized DTDs.
We have also created one DTD from the ground up, following the procedure outlined in Developing SGML DTDs [Maler 1996]. This effort was more time-consuming than the minimal customizations we did on the industry-standard DTDs. But we were far more satisfied with the result, and believe we have the basis of a robust DTD to govern the creation of another large portion of our document instances.
Authoring
It is during authoring, perhaps more than any other phase of document production, that SGML enables dramatic quality improvements. We take it for granted that competent writers and editors can organize words, paragraphs, and pictures to make good-looking, understandable documents. When all is said and done, the final presentation instance, in paper or on screen, shouldn't look any different whether it was encoded in SGML or Elbonian. So, why is SGML better for creating quality documents than some other format?
It should not be news to anyone that writers, when authoring in SGML, will concentrate primarily on the data content and the structural relationships amongst document components, instead of on "how it looks". Dimensions of data content ("Values") quality include accuracy, completeness, consistency, and currency (or timeliness). It stands to reason that, by focusing writers' attention on these values, the quality should improve.
In our operation, we have also improved the quality culture by deploying SGML editing applications. (Recall that the three preconditions of a quality culture are employee awareness of quality goals, employee ability to detect quality defects, and employee ability to correct quality defects.) Our previous structured editor did not provide the means to detect many of the quality defects that only showed up during batch processing of the data for CD-ROM preparation. Even the most careful writers with the best intentions could not possibly find and correct errors on their own, using the tools at their disposal. (Many such errors involved cross-reference pointers to other documents.) As a result, we had to rely on a feedback loop of returning error lists to the writers, waiting for the writers to correct the documents, then reprocessing the data, returning a list of remaining errors, etc. Not only was this time-consuming, it was annoying to the writers who, to the best of their knowledge, thought they were done working on the documents. Now we can enable the writers to check for many types of errors within their authoring environment, which has reduced (but still hasn't eliminated) the error-list feedback loop.
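The kind of check that moves error detection from batch processing into the writer's hands can be sketched as follows. This is not our production tooling: it assumes XML-compatible instances, and the element name `xref` and the attribute names `id` and `target` are hypothetical. The sketch collects the identifiers declared across a set of documents and reports cross-reference pointers that resolve to none of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents; element and attribute names are illustrative only.
DOC_A = """
<manual id="a">
  <section id="a-s1"><para>See <xref target="a-s2"/> and <xref target="b-s7"/>.</para></section>
  <section id="a-s2"><para>See <xref target="b-s1"/>.</para></section>
</manual>
"""
DOC_B = """
<manual id="b">
  <section id="b-s1"><para>No references here.</para></section>
</manual>
"""


def collect_ids(roots):
    """Gather every id attribute value found in the given document trees."""
    return {el.get("id") for root in roots for el in root.iter() if el.get("id")}


def dangling_references(root, known_ids):
    """Return sorted xref targets in one document that match no known id."""
    targets = (ref.get("target", "") for ref in root.iter("xref"))
    return sorted({t for t in targets if t not in known_ids})


roots = [ET.fromstring(DOC_A), ET.fromstring(DOC_B)]
known = collect_ids(roots)
for root in roots:
    for target in dangling_references(root, known):
        print(f"{root.get('id')}: unresolved cross-reference to {target!r}")
```

Run inside or alongside the authoring environment, a check of this sort lets a writer find and correct a broken pointer before the document ever reaches batch processing.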
We would be thought naive if we didn't comment on the supposed resistance of authors to working in an SGML environment. In our department, we had completely different reactions from two groups of writers. The group that had already been using a structured editor did not have much difficulty, for two main reasons. First, they were already familiar with the concept of tagging document components. Second, our previous editing application suffered from terribly slow performance, so most writers were pleased with the responsiveness of a native SGML editing application. The group that migrated from a traditional word processing/desktop publishing application experienced more difficulty, for a variety of reasons, including unfamiliarity with tagging concepts, lack of control over formatting, and insufficient training.
Any organization that faces this problem should try to emphasize the data quality improvements that are possible with SGML authoring. But many important aspects of SGML data quality are somewhat abstract, and therefore less apparent to writers than other, traditional indicators of document quality. If quality principles are already well-understood and supported in an organization, then SGML should be an "easy sell" to authors.
Processing and presentation
During processing and presentation of SGML data, Redman's "Representation" dimensions of data quality come into play. As anyone who has worked with computers for very long knows, you can almost always program a solution (or buy an application) for any particular data processing need. The main question is not whether you can process the data, but how reliably and efficiently you can process it, what knowledge and skill sets are required of developers, and how many different processing paradigms you want to support. These are all functions of the "Representation" dimensions of data quality, particularly of interpretability, portability, flexibility, and representation consistency. We do not need to argue in this forum that SGML data outscores all other notations on these scales.
We will, however, point out that these very features of SGML present other opportunities for applying quality management principles to improve processes and products. The portability and flexibility of SGML allow you to implement modular solutions to different operations in the technical publications production process. For example, your editing application and your storage system can be chosen primarily based on how well they meet the needs in each respective area. In our operation, we began with file-system storage of text and graphic entities. While the security and versioning capabilities provided by a database storage system are attractive, we concluded that these were not essential to support initial migration and production.
As with any process or product controlled by quality management principles, you will look at actual customer needs, then evaluate the features of proposed products as they pertain to those needs, and implement a process, or obtain the product, that best meets your needs. Largely because of its representational data quality, SGML gives you the freedom to apply quality planning principles precisely and appropriately to each segment of your production process.
Conclusions
Through a deliberate and well-executed effort, we intend our technical publications department to be an example of a completely customer-focused group for others to emulate. The processes we've outlined in this paper illustrate one approach to bridging the gap between theory and application of SGML. We have shown how quality management principles and concepts can be applied to the problems of SGML implementation in the area of technical publications. Quality considerations affect many levels and aspects of this operation--data, processes, and culture. Regardless of the specific process or techniques used for quality management, the goals are always the same: improve product features and reduce product defects.
Acknowledgements
The authors would like to thank Jim Laney, Director of Engineering Services and Product Safety at Cessna Aircraft Company, for his long-standing support of quality management principles and markup technologies in Cessna Technical Publications.
Bibliography
[Juran 1993] Juran, J. M., and Frank M. Gryna. 1993. Quality planning and analysis: from product development through use. 3rd ed. New York: McGraw-Hill.
[Maler 1996] Maler, Eve, and Jeanne El Andaloussi. 1996. Developing SGML DTDs: from text to model to markup. New Jersey: Prentice Hall PTR.
[Megginson 1998] Megginson, David. 1998. Structuring XML documents. Prentice Hall.
[Redman 1996] Redman, Thomas C. 1996. Data quality for the information age. Boston: Artech House.