Internet Based Documentation Processing System
ABSTRACT
As the Internet becomes more mature, sophisticated and reliable, practical solutions are redefining our notions of time, distance, value and service. In this age of the Internet, where the boundaries between many aspects of our society, working environment and global economy are blurred, simple solutions help clear away the complications that exist in different areas of information technology. This paper presents some approaches, models and methods based on real world experience in the electronic documentation publishing field. It discusses some possibilities, based on the idea of singleness, that are reasonable, viable, simple and practical because of the Internet.
1. INTRODUCTION
The theme of my paper, "Internet based documentation processing system," builds upon the idea of "singleness," with the dynamic behavior of the Internet in the background.
1.1. The Internet and its Dynamic Behavior
Actively or passively, we have embraced the Internet because it has given us a very effective medium for communication and collaboration. It has brought us different methods of information sharing. With this capability, we are taking the opportunity to move various aspects of our society, such as education, entertainment, business and politics, to the Internet. However, we are not satisfied. We have complaints about infrastructure, speed, privacy and security. We also have complaints about the available services, the quality of the content, consistency, accuracy and style. Then again, we have opinions about cost, performance, availability and benefits. On top of all this, we complain that it keeps changing very fast. We even use the term "Internet time" to mean rapidly moving time.
This is the nature of the change and dynamism of the Internet. Various observations, such as Moore's law and Parkinson's Law of Data, appear to hold true. All of us, from various parts of the world, are contributing whatever we can, directly or indirectly, to this global community. We are all participating in the change, visibly or invisibly, because we are not satisfied with the current Internet and we want it to be a better place. We know that it falls far short of the dynamic potential that such a vast and diverse community can have. So there is no doubt that it will keep changing.
Now, imagine the Internet 3 years from now, 5 years from now, or even 10 years from now. Most probably we will still have our complaints and opinions about the Internet, but most likely they will be about the new issues that the future brings. Maybe we will complain less about the issues that concern us at present. Suppose the digital communication infrastructure is fast, low cost (even free) and easily available, and we are able to utilize this power and context. Some questions come to mind: How would we like to do our job? How would we balance the competition? What would be our value? What would be our communication and information sharing methods?
When the Internet matures and advances to the point where there is no distinction between the machine or service in front of us and one somewhere else in our country, across the continent or anywhere in the world, how would we like to manage our information? How would we like to distribute and share it? What processing or service model would we consider? How would we like to collaborate with our suppliers, partners and customers?
These questions are not related to any vendor, hardware, operating system, platform or other software tool. They are simple, neutral questions that hammer at our thoughts every day. They are questions related to our working patterns and working environment. Whether or not we get satisfactory answers, and whether or not we even try to answer them, these questions will continue the journey with us, because they have been with us for years in different contexts.
By human nature, we like to do our work in our own way. We like to work simply, uniquely and probably lazily, at our own pace. By the very same nature, we like to learn, create and evaluate only once, reuse as much as possible, and move on to more satisfying and valuable work. We expect the same of the Internet, and of any field that makes use of the Internet.
Drawing on my experiences and experiments in the electronic publishing field, I would like to elaborate on these concepts in the following sections, with the Internet in the background. We are not perfect, and the resources and infrastructure are not perfect, so I do not expect this discussion to be perfect. However, I do believe that it holds some meaning and some clues for you if you are considering the Internet seriously.
Although this paper focuses on the Internet, the concepts discussed here can be applied to any networked environment, of any scale, that can behave like the Internet.
1.2. Singleness: The Path to Completeness
A single form is probably the simplest form, because we can observe it in its entirety. Such a form is easy to see, examine, learn, test, understand, use and think about. Let us call such a single form an object. Planning, management, development and evolution become simple when different entities are viewed as single objects. We tend to prefer to have a single object in its completeness and not to worry about it unless we need to.
This is the reason we like to create a single place to store applications, a single place to distribute information, a single place to process information, a single place to filter information, a single place to delegate information, a single place to collect information and a single place to share information. Why? Because we want to do simple things simply and easy things easily. Singleness is the basis of all these ideas.
For these reasons, I believe that singleness is the path to completeness. Although this view is controversial, I believe that if there were only one of each object (resource), uniquely identifiable and addressable on the Internet, we would have a lot less trouble in planning, developing, finding, using and managing them.
2. INTERNET BASED PROCESSING
In our context, the term processing implies that data in one format is passed through a software program to produce data in another format. By Internet based processing, I mean that we use the Internet (or any other type of network) to reach resources in another location and produce the output we desire. There are obvious advantages in doing this, and there are different ways to achieve what we want. In the following sections, I will discuss some processing approaches, models and methods. Then I will show a real world example of a technical documentation processing system based on markup technology.
2.1. Processing Roles
There are basically two types of roles, the server and the client. On the Internet, any resource (machine) can play the role of server or client when configured with the right tools. When all the nodes connected to the Internet (even mobile phones, TVs, cars etc.) become addressable peers, many variations of processing approaches and methodologies will appear. Whatever the networking model, the roles and responsibilities will remain the same, only with more sophistication. In our context, to distinguish them from the static server and client roles, I would like to call the server the "Processing Server" and the client the "Processing Client."
2.2. Processing Approaches
Internet based processing approaches depend on factors such as connection bandwidth, server performance, client performance, size of the input and output data, size of the processing software, security, management and business logic. Whichever factor influences our decision, we will take one of the following approaches. Other variations can be imagined, but the following are the basic ones for documentation processing.
The key idea behind all these approaches is that the processing software remains on one machine while the input and the processing request come from another machine.
2.2.1. Do-it-on-the-server approach
The Processing Server is configured with the processing software and input is expected from the Processing Client. The Processing Server processes the input data and deals with the output according to the processing model as described below.
2.2.2. Do-it-on-the-client approach
The Processing Server is configured to return the processing software to the Processing Client in response to the processing request. The Processing Client processes the input data on the local machine and deals with the output according to the processing model as described below.
2.2.3. Do-it-somewhere approach
The Processing Server is configured to return the processing software to any other participating processing machine, to which the Processing Client sends the input data. The output is produced on this participating third machine, which acts as a temporary Processing Server. The output is dealt with according to the processing model as described below.
2.3. Processing Models
Just like the processing approaches, processing models depend on various factors. A processing model is determined by the nature of input and ways of dealing with the result of the processing.
2.3.1. Process-and-Receive model
In this model, both the input data and the output produced by the processing system are handled by the Processing Client. This kind of model is mostly used during testing and verification. It is also used when the user wants to carry out non-automated work on the result.
Output is returned to the processing client
2.3.2. Process-and-Forward model
In this model, the user wants to pass on the output produced by the processing system, so the result is forwarded to another collaborator for further work. A collaborator can be a group or division, partner, supplier, customer, user etc.
Output is forwarded to a collaborator
2.3.3. Process-and-Distribute model
In this model, when the user is satisfied with the output produced by the processing system, the result can be distributed to the users. The result can also be placed in a known location for common access.
Output is distributed to users
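The approaches and models described above can be combined freely, and a small sketch may make the combinations easier to see. The names `Approach`, `Model`, `processing_host` and `route_output` are hypothetical and are introduced only for illustration; they do not come from the system described later in this paper.

```python
from enum import Enum
from typing import Dict, List


class Approach(Enum):
    ON_SERVER = "do-it-on-the-server"
    ON_CLIENT = "do-it-on-the-client"
    SOMEWHERE = "do-it-somewhere"


class Model(Enum):
    RECEIVE = "process-and-receive"
    FORWARD = "process-and-forward"
    DISTRIBUTE = "process-and-distribute"


def processing_host(approach: Approach) -> str:
    """Name the machine on which the processing software actually runs."""
    hosts = {
        Approach.ON_SERVER: "processing server",
        Approach.ON_CLIENT: "processing client",
        Approach.SOMEWHERE: "participating third machine",
    }
    return hosts[approach]


def route_output(model: Model, output: bytes, client: str,
                 collaborator: str, users: List[str]) -> Dict[str, bytes]:
    """Map each recipient to the output, according to the processing model."""
    if model is Model.RECEIVE:
        recipients = [client]          # output goes back to the requester
    elif model is Model.FORWARD:
        recipients = [collaborator]    # output goes on for further work
    else:
        recipients = list(users)       # output goes to every user
    return {recipient: output for recipient in recipients}
```

The approach decides where the work happens and the model decides where the result goes, so the two choices are independent of each other.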
2.4. Processing Methods
Processing methods deal with the techniques of moving the input and output information and of communicating among the Processing Client, Processing Server, collaborator and user.
Because of the close relationship between the WWW browser and the Internet, and the browser's abundant use, it is better to draw a distinction between the processing methods. This distinction helps simplify our thinking, because data processing involves reading and writing files in a file system, which is an important issue in Internet based systems.
2.4.1. Processing through browser
Currently this method is the most widely used and best understood because of its simplicity. However, it requires standard or commonly used protocols (such as HTTP) and a supporting server program, on top of which the processing system needs to be developed.
Most of the time, suitable browser software that is already available can be used as the Processing Client. Through the client, the user sends the input data to the Processing Server, which then returns the output.
Other software that runs on top of the client, such as an applet, a certificate based process or a restriction relaxed process (or service), can also be considered a Processing Client.
2.4.2. Processing through special software
Special purpose processing software can be developed to satisfy our needs. A special protocol, server program and clients can be developed. Data can be serialized in one location and sent to another location for further work. However, this brings software life-cycle maintenance concerns, as well as the burdens of distribution, installation and user training. It is a build-your-own type of method.
Although this method can be difficult and time consuming, it is the most flexible method and can be used with all of the approaches discussed in this paper.
3. REAL WORLD IMPLEMENTATION: TECHNICAL DOCUMENTATION PROCESSING SYSTEM
This real world implementation is a technical documentation processing system. It was developed for Kyushu Matsushita Electric Co., Office Service Center (KME-OSC), Fukuoka, Japan. The system is being used to produce technical documentation for telecommunication equipment.
3.1. Case Study
As discussed above, this case study is an example which is based on the following ideas:
- Approach: Do-it-on-the-server
- Model: Process-and-Receive
- Method: Processing through browser
3.1.1. Background
KME-OSC produces about 1400 pages of documentation for a single model. About 4 to 5 products, in about 20 models per month, are documented concurrently. The documentation goes through about 20 to 30 revision cycles on average. About 50% of the information is reusable for new models and 80% is reusable for revised models. Currently the system needs to support 13 languages, including Chinese and Russian.
3.1.2. Output need
The processing system needed to support the following basic needs:
| Need | Purpose |
| Paper | Book binding |
| Paper distribution | Electronic paper/book distribution |
| Interactive, searchable, indexed documentation | For CD-ROM, Internet/Intranet online system |
| Translatable format | For translation memory |
| Interchangeable format | For data interchange with other group/division and suppliers |
To support the integrity and maintenance of the output, the processing system should also generate a processing log and reports on id and reference use, component use, graphics use etc.
To strengthen the idea of singleness of data, the style formats (such as A4, Letter, A5, Chinese etc.) should be separated from the content, allowing the user to specify the desired style format.
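As a minimal sketch of this separation, the same content can be passed unchanged to a renderer together with a style format chosen by name. The `StyleFormat` class, the `STYLES` table and the `render` function are hypothetical names used only for illustration; the page dimensions are the standard ones for these paper sizes, and the real system's style definitions are certainly richer than this.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class StyleFormat:
    name: str
    page_width_mm: float
    page_height_mm: float


# Hypothetical style table, kept apart from any document content.
STYLES: Dict[str, StyleFormat] = {
    "A4": StyleFormat("A4", 210.0, 297.0),
    "Letter": StyleFormat("Letter", 215.9, 279.4),
    "A5": StyleFormat("A5", 148.0, 210.0),
}


def render(content: str, style_name: str) -> str:
    """Apply the selected style format to content that is never modified."""
    style = STYLES[style_name]
    header = f"[{style.name} {style.page_width_mm}x{style.page_height_mm} mm]"
    return f"{header}\n{content}"
```

Because the content is only read and never rewritten, adding a new style format means adding one entry to the table, not touching any document.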
3.1.3. Output format solution
As a solution to the above needs, the following formats are used for output generation.
| Format | Output | Remark |
| Paper | MIF (FrameMaker®) | Direct conversion |
| Paper distribution | Through FrameMaker | |
| Interactive, searchable, indexed format | HTML (with JavaScript, CSS, DHTML) | Direct conversion |
| Translatable format | SGML text | Normalized output |
| Interchangeable format | SGML text | Normalized output |
The desired reports and logs are generated automatically, and style formats can be specified independently by the user.
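A report on id and reference use could be produced along the following lines. This is only a sketch: the attribute names `id` and `idref`, the regular expressions and the function `reference_report` are assumptions made for illustration, since the DTDs and the report format of the actual system are not described in this paper.

```python
import re
from typing import Dict, Iterable, Set

# Assumed attribute names; a real DTD may declare ID/IDREF attributes differently.
ID_RE = re.compile(r'\bid\s*=\s*"([^"]+)"', re.IGNORECASE)
IDREF_RE = re.compile(r'\bidref\s*=\s*"([^"]+)"', re.IGNORECASE)


def reference_report(texts: Iterable[str]) -> Dict[str, Set[str]]:
    """Collect ids and references across all components and compare them."""
    defined: Set[str] = set()
    referenced: Set[str] = set()
    for text in texts:
        defined.update(ID_RE.findall(text))
        referenced.update(IDREF_RE.findall(text))
    return {
        "unresolved": referenced - defined,  # referenced but never defined
        "unused": defined - referenced,      # defined but never referenced
    }
```

Run over all components of a documentation set, such a report points to broken references before the output reaches a reader.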
3.1.4. Input process
The input data is authored in SGML with 3 original DTDs at different supplier locations of KME-OSC. All authored documents are stored as single file components. These single files are stored in directories as external entities. A 4th original DTD is used to assemble the necessary external entities into a document instance that represents a set of documentation for the equipment. A few utilities were developed to help user productivity, efficiency and data interchange. An SGML editor is used for data input, and users spend 92% of their time in this one program. Another 5% goes to utility and text editor use. The remaining 3% goes to Processing Server use.
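The assembly of external entities into one document instance can be illustrated with a toy resolver. This is not an SGML parser and not the tool used in the system; it only handles `<!ENTITY name SYSTEM "file">` declarations and `&name;` references in a single pass, and the function name `assemble` and the file names in the usage example are hypothetical.

```python
import re
from typing import Dict

ENTITY_DECL = re.compile(r'<!ENTITY\s+([\w.-]+)\s+SYSTEM\s+"([^"]+)"\s*>')
ENTITY_REF = re.compile(r'&([\w.-]+);')


def assemble(instance: str, files: Dict[str, str]) -> str:
    """Replace each declared external entity reference with its file content."""
    declared = dict(ENTITY_DECL.findall(instance))  # entity name -> file name

    def expand(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name in declared:
            return files[declared[name]]
        return match.group(0)  # leave undeclared references untouched

    return ENTITY_REF.sub(expand, instance)
```

For example, an instance containing `<!ENTITY intro SYSTEM "intro.sgm">` and the body `<manual>&intro;</manual>` is expanded so that the content of `intro.sgm` appears inside `<manual>`. Each component stays a single file, and only the small document instance decides which components make up a documentation set.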
3.1.5. Output process
The user interface is browser based and lets the user select simple options that are relevant to the output (such as showing graphic file names in the output, or counting tables and figures). Users (Processing Clients) upload ZIP compressed input data to the Processing Server. The upload consists only of the files for external entities and a document instance that represents the documentation set.
Users anywhere in the world can, at any time, send the input data to the Processing Server (hosted by Daitec in Hiroshima, Japan) and get back the output. The same input source can be used to produce different output formats depending on the need. The output is also returned in ZIP format to save transmission time.
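The round trip described above (ZIP in, process, ZIP out) can be sketched with the Python standard library. The functions `pack`, `unpack` and `process_on_server` are hypothetical names, and `to_upper` is a trivial stand-in for the real converters, which are not shown in this paper.

```python
import io
import zipfile
from typing import Callable, Dict

Files = Dict[str, bytes]


def pack(files: Files) -> bytes:
    """Compress a set of named files into one in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def unpack(archive_bytes: bytes) -> Files:
    """Extract every file from an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


def process_on_server(upload: bytes, convert: Callable[[Files], Files]) -> bytes:
    """Server side: extract the upload, run the converter, return a ZIP."""
    return pack(convert(unpack(upload)))


def to_upper(files: Files) -> Files:
    """Stand-in converter: a real one would produce MIF, HTML or SGML output."""
    return {name + ".out": data.upper() for name, data in files.items()}
```

The client calls `pack` before sending and `unpack` after receiving, so only compressed data crosses the network in either direction.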
3.1.6. Data transmission
The average size of the uploaded input data in ZIP format is about 800KB, consisting of 850 to 1000 files (external entity files). Output processing starts after the files are extracted from the ZIP archive on the Processing Server. The output produced for the paper version (1400 pages) is about 9MB uncompressed (600KB compressed), excluding the graphics. All graphic objects used in the documentation are externally linked. Depending on the connection condition, it takes about 2 to 5 minutes to upload, process and download the data over an ISDN connection.
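A rough estimate shows how these sizes relate to the reported times. The sketch assumes an ISDN link of one or two 64 kbit/s B channels, takes 1KB as 1000 bytes, and ignores protocol overhead, latency and processing time, so it gives a lower bound on transfer time and not a measurement. The function names are hypothetical.

```python
ISDN_B_CHANNEL_KBIT_S = 64  # nominal rate of one ISDN B channel


def transfer_seconds(size_kb: float, link_kbit_s: float) -> float:
    """Ideal transfer time for size_kb kilobytes (1 KB = 8 kbit)."""
    return size_kb * 8 / link_kbit_s


def round_trip_seconds(upload_kb: float, download_kb: float,
                       channels: int = 1) -> float:
    """Ideal upload plus download time over the given number of B channels."""
    link = ISDN_B_CHANNEL_KBIT_S * channels
    return transfer_seconds(upload_kb, link) + transfer_seconds(download_kb, link)
```

Under these assumptions, `round_trip_seconds(800, 600)` gives 175 seconds (about 3 minutes) on one channel and `round_trip_seconds(800, 600, channels=2)` gives 87.5 seconds on two. Sending the 9MB output uncompressed would take `transfer_seconds(9000, 64)`, or 1125 seconds (close to 19 minutes), on one channel, which is why the compressed return matters.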
3.1.7. Selection of processing approach
All approaches described in this paper were considered for the solution development. The "Do-it-on-the-client approach" was not feasible because the processing system was larger than the uploaded or downloaded data. It would also have required developing special purpose server and client software, because a browser based system is not able to read and write data on the local system. For the same reasons, the "Do-it-somewhere approach" was also discarded.
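The reasoning in this section can be written down as a small decision rule. This is a simplification for illustration only: the function `choose_approach` is hypothetical, and the software size used in the usage note is an invented figure, since the actual size of the processing system is not given here.

```python
def choose_approach(software_kb: float, upload_kb: float, download_kb: float,
                    client_can_access_local_files: bool) -> str:
    """Prefer the server when moving the software costs more than moving the data,
    or when the client cannot read and write local files."""
    data_kb = upload_kb + download_kb
    if software_kb > data_kb or not client_can_access_local_files:
        return "do-it-on-the-server"
    return "do-it-on-the-client"
```

With the 800KB upload and 600KB download from the previous section, and a hypothetical 20000KB processing system behind a browser client without local file access, `choose_approach(20000, 800, 600, False)` selects the server. The rule would only favor the client for small processing software running in a client that is allowed to use the local file system.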
3.1.8. System architecture
The Internet based processing architecture relevant to the case described above is shown below.
3.1.9. Pros and cons
Apart from the pros and cons of using a markup technology based solution, here are the pros and cons of the Internet based documentation processing system.
- Pros
  - Reduced the burdens of distribution, installation, evolution and training.
  - Simplified the working environment by concentrating only on what is necessary.
  - Produced accurate information by generating all references at processing time.
  - Removed duplication of work by using a single source of information for all needed formats.
  - Improved the quality of content through a simplified working pattern.
  - Made a unified, consistent look and feel possible.
  - Enabled anywhere-anytime processing.
  - Made it possible to mobilize suppliers with very limited resources, even without a content management system.
  - Helped with foreign language translation of the documentation (by translation companies in native language countries).
- Cons (due to current conditions)
  - It requires a networked environment.
  - Network bandwidth is a concern.
- Cons (due to nature of solution)
  - The result cannot be verified until the output is produced.
  - Uploading and downloading ZIP files is a concern.
  - User transparency in data transmission is desirable.
3.2. Future Possibilities and Observations
Many promising Internet based technologies and methods are appearing. Internet oriented development environments, file systems, repositories, content management systems and resource management systems are becoming available. New techniques in linking, addressing, finding, identifying, capturing and reusing with identity will allow all connected parties, from users to suppliers to collaborators to customers to manufacturers (producers), to use a viable processing model. Using the same single input (shared information) will make it possible to produce multiple output formats, each one available the moment its output processor becomes available. Faster and cheaper (or free) network connections will make the Processing Servers transparent to the user. Any or all of the processing approaches discussed in this paper, and even variations of them, will become feasible and implementable. By creating and maintaining a process based on the idea of singleness, we will be able to change faster and progress faster.
4. CONCLUSION
With the idea of Internet based processing, it is possible to use distributed resources such as machines, manpower and professional services from all over the world. Even with the currently imperfect, incompatible and separated networks, it is possible to find simple solutions that are viable and useful. If we can keep our information in the state of singleness, invent effective methods of managing and assembling those single objects, and get them at the time of need, many of our confusions and complaints will start to disappear. Internet based processing systems will play a very important role in maintaining the singleness of resources, which will help make the Internet transparent and living. The dynamically changing behavior of the Internet and the energy of such a diverse community will certainly produce a better, more reliable and more efficient computing environment.
Acknowledgements
I would like to thank Mr. Masaru Yoshikawa of Kyushu Matsushita Electric Co., Office Service Center, Fukuoka, Japan, for kindly cooperating and allowing me to include the case study.


