Acquirement of XML skills in industry
Gert van der Steen


Abstract
Whether occasionally or full time, workers in industry are increasingly confronted with applications of XML in one or more of its many aspects. People are frequently unsure about the background knowledge they have to acquire and the resources that are available. To arrive at a more general solution, we observe that many tasks in automation can be described in terms of Information System Methodologies. Following the layers in these methodologies, the respective XML aspects can be identified, together with the background knowledge required for their proper application.
The next steps will be to identify the resources for training and the ways to set up training effectively. In that respect, typical working habits in industry have to be taken into account.

Keywords

Contents
  1. Introduction
  2. Components of a document processing system
  3. Layers of XML system integration
    1. XML+ standards
    2. XML+ engines and APIs for processes and data structures
    3. Integration of engines within XML+ tools and document processing systems
    4. Integration within infrastructure: databases, doc. management, network, OS
  4. System development methodology; XML aspects
    1. Definition study
    2. Information (data) analysis
    3. Global design
    4. Detail design
    5. Realization
    6. Implementation in organization, working procedures
  5. Conclusion

Introduction
The widespread introduction of XML in industry, government and other organizations calls for an inventory of the skills that are required for the proper use and implementation of XML and its related standards (“XML+”). This paper investigates which skills are needed.
Many kinds of people in organizations come into contact with XML. Among them are:
All these people need different types of skills and background knowledge. They may be heavily involved in large projects, or only superficially. Sometimes they use different terminology.
Where do people acquire new skills? For the new generation it may be in technical schools or universities, where XML is gaining popularity in regular courses. However, for most people in industry XML is new. They have to take training courses or rely on occasional advice.
The author has given a number of training courses in industry, from crash courses taking half an hour to regular courses taking a week or more, and is preparing courses at university level. There is a variety of demands and a variety of personalities and backgrounds involved. Some topics are basic, but some demand more research at an academic level.
This paper traces only the topics involved, and not the depth of training that is required for different target groups. It addresses the use of XML as a language for the exchange of messages as well as for the structuring of documents, as a partial replacement of SGML.
In order to locate the required skills we follow two approaches.
The first approach is a technical one. It takes the point of view of a developer who wants to construct a tool or a complete system, be it from scratch or as an addition of XML functionality to an existing one. This approach will recover mainly skills offered by Computer Science.
The second approach takes the view of an application that has to profit from the use of XML+. Following the steps of a System Development Methodology we encounter the required XML skills, independent of the size of the system to be developed. This approach is more or less covered by Information Science.
Besides the skills covered by Computer and Information Science, the introduction of XML+ may require other skills.
This paper has three main divisions.
Components of a document processing system
Every system that processes documents, small or large, has one or more of the following components. The (partial) markup of the documents calls for additional XML functionality.
Input:

Data storage:

Retrieval:

Output:

Document management:

Workbenches:

Layers of XML system integration
This section takes the point of view of a gradual integration of XML+ into a Document Management System. The following steps are covered:
XML+ standards
The XML+ standards are based upon formal languages and data structures. Therefore, members of the standardization committees should have a command of Theoretical Computer Science.
(The writing of specifications, like schemata and stylesheets, according to the standards is covered in the section on System Development Methodologies.)
Design of XML+ Syntax Structures Skills
  • In general
  • Design considerations for formal languages and grammars
  • DTDs, schemas
  • Ambiguity
  • Transformations with XSLT
  • Other transformation languages
  • Rewriting systems; aspects of reversibility
  • Query languages on documents and trees
  • Database theory
  • Context-sensitivity within transformations, queries and stylesheets
  • Expression of context-sensitivity
Table 1
Design of XML+ Information Structures Skills
  • Document tree, DOM
  • Tree walking languages
  • Operations on trees
  • External links
  • Hypertext, hypermedia
Table 2
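To make the notions of document tree and tree walking concrete, the following is a minimal sketch in Python, using only the standard library module xml.etree.ElementTree. The sample document, its element names and the helper `walk` are invented for this illustration and are not taken from any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = """<manual>
  <chapter id="c1"><title>Setup</title><para>Unpack the device.</para></chapter>
  <chapter id="c2"><title>Use</title><para>Press start.</para></chapter>
</manual>"""


def walk(element, depth=0):
    """Yield (depth, tag) pairs in document order (pre-order traversal)."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)


root = ET.fromstring(SAMPLE)
outline = list(walk(root))

# Print an indented outline of the element structure.
for depth, tag in outline:
    print("  " * depth + tag)
```

The same recursive pattern underlies most operations on trees: a visit function is applied to a node and then, in document order, to each of its children.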
XML+ engines and APIs for processes and data structures
The processes described in the standards may be implemented in dedicated engines. Most expertise stems from the theory of Formal Automata and Program Generation.
XML Skills
  • XML parser + DOM constructor
  • Techniques for parser generation
Table 3
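As an illustration of what an XML parser with a DOM constructor delivers to a developer, the sketch below uses Python's standard xml.dom.minidom. The order document and the helper `is_well_formed` are assumptions made for this example; a production engine would offer far more than is shown here.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical input; element and attribute names are illustrative only.
SAMPLE = '<order id="42"><item qty="2">bolt</item><item qty="5">nut</item></order>'

# Parsing builds a DOM tree that can be navigated through the DOM API.
dom = minidom.parseString(SAMPLE)
items = dom.getElementsByTagName("item")
quantities = {node.firstChild.data: int(node.getAttribute("qty")) for node in items}
print(quantities)


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        minidom.parseString(text)
        return True
    except ExpatError:
        return False


print(is_well_formed("<order><item></order>"))  # mismatched tags are rejected
```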
Conversions and transformations: 100% automatic and correct Skills
  • XSLT engine: add, delete, change components in trees
  • Other tree transformation techniques
  • Techniques for transducer generators
  • Programming languages: imperative, functional, logical, ..., event-based, pattern-based, rule-based
Table 4
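The three basic operations named above (add, delete, change) can be sketched without an XSLT engine. The example below is not XSLT; it applies the same operations imperatively with Python's xml.etree.ElementTree, which is one of the "other tree transformation techniques". The document and the element names `draft-note`, `para`, `p` and `status` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document.
SOURCE = (
    "<doc><title>Report</title>"
    "<draft-note>check figures</draft-note>"
    "<para>Text.</para></doc>"
)


def transform(root):
    """Apply one delete, one change and one add operation to the tree."""
    # Delete: drop every <draft-note> child of the root.
    for note in root.findall("draft-note"):
        root.remove(note)
    # Change: rename every <para> element to <p>.
    for para in root.iter("para"):
        para.tag = "p"
    # Add: append a <status> element at the end.
    status = ET.SubElement(root, "status")
    status.text = "final"
    return root


result = ET.tostring(transform(ET.fromstring(SOURCE)), encoding="unicode")
print(result)
```

A declarative, rule-based language such as XSLT expresses the same result as a set of template rules; the imperative version makes the order of operations explicit, which touches on the sequence (in)dependency of rules.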
XML+ Skills
  • In general
Knowledge of:
  • Responsiveness: off-line, on-line, real-time
  • Execution: compiled, interpreted
  • Time and space complexity: exponential-NP, polynomial, (sub)linear
  • Decidability and correctness
  • Handling of ambiguity
  • Handling of ill-formed input
  • Software engineering
  • OO-aspects: (multiple) inheritance, sending messages to objects
  • Sequence (in)dependency of rules
  • (Visual) developer workbenches
  • Browser engines on trees
  • Data storage manager
  • Query engine on documents and document trees (also DOMs)
  • Database techniques
  • Techniques for pattern recognition: string and tree matching and comparison
  • Revision tracking and storage
  • Theory of sequence comparison for strings and trees
Table 5
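For revision tracking, sequence comparison is the core technique. The following sketch compares two versions of a document, each represented as a list of serialized paragraphs, with Python's standard difflib. It covers string sequences only; comparison of trees requires other algorithms that are not shown. The two versions are invented for the example.

```python
import difflib

# Two hypothetical revisions of the same document, one paragraph per entry.
old = [
    "<para>Press the red button.</para>",
    "<para>Wait ten seconds.</para>",
]
new = [
    "<para>Press the green button.</para>",
    "<para>Wait ten seconds.</para>",
    "<para>Release.</para>",
]

matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)

# Keep only the operations that describe a difference between the revisions.
changes = [op for op in matcher.get_opcodes() if op[0] != "equal"]

for tag, i1, i2, j1, j2 in changes:
    print(tag, old[i1:i2], "->", new[j1:j2])
```

Storing only such change operations, instead of full copies, is one possible basis for revision storage.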
Integration of engines within XML+ tools and document processing systems
Tools for Document Processing may obtain XML functionality by integration of XML+ Engines.
This may concern existing tools, which have to be extended with XML+ functionality, or new tools, built from the ground up. It may concern shrink-wrapped tools or home-made ones that have grown out of dedicated systems.
The required skills stem, on the one hand, from engineering disciplines and, on the other hand, from the field of application for which a tool is constructed.
XML+ Skills
  • In general
  • Access to requirements analyses for the desired functionality
  • Working with APIs on internal data structures
Table 6
XML Editors Skills
  • In general
  • Psychology-Ergonomics
  • Techniques for technical authoring
  • Language Technology for Controlled Languages
  • If built into existing word processors (like MS Word)
  • How to operate on text buffers without hierarchical structure
  • If built from scratch
  • Rendering techniques
Table 7
XSL, XSLT Editors Skills
  • Creation of specifications by example
  • CASE systems, knowledge based systems
Table 8
Data storage and retrieval of XML objects: database techniques Skills
  • Relational, OO, relational-OO, full-text
  • Meta-data
  • Storage of document trees in databases
  • OO-paradigm
  • Tradeoffs between relational, OO, relational-OO for XML components
  • Database technology
  • Information Retrieval models, e.g., Boolean, Vector, Probabilistic, Fuzzy Set, Bayesian
  • Pattern matching
  • Indexing and search
  • Web query languages
  • Connection to Data Storage Manager for XML+
Table 9
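One of the possible ways to store a document tree in a relational database is an adjacency list: one row per element, with a reference to its parent and its position among its siblings. The sketch below uses Python's standard sqlite3 with an in-memory database. The table layout, the column names and the sample document are assumptions made for this illustration; other mappings (for example path-based or interval-based ones) make different trade-offs.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document to be stored.
SAMPLE = (
    "<book><chapter><title>One</title></chapter>"
    "<chapter><title>Two</title></chapter></book>"
)


def store(conn, element, parent_id=None, position=0):
    """Insert an element and, recursively, its children; return the row id."""
    cur = conn.execute(
        "INSERT INTO node (parent_id, position, tag, text) VALUES (?, ?, ?, ?)",
        (parent_id, position, element.tag, element.text),
    )
    node_id = cur.lastrowid
    for index, child in enumerate(element):
        store(conn, child, node_id, index)
    return node_id


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, "
    "position INTEGER, tag TEXT, text TEXT)"
)
store(conn, ET.fromstring(SAMPLE))

# Query: the titles of all chapters, in document order.
titles = [
    row[0]
    for row in conn.execute(
        "SELECT t.text FROM node AS t JOIN node AS c ON t.parent_id = c.id "
        "WHERE t.tag = 'title' AND c.tag = 'chapter' ORDER BY c.position"
    )
]
print(titles)
```

The choice of granularity (which elements become rows) directly affects which queries are cheap, which is one of the trade-offs listed above.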
Document management for XML components: Skills
  • Workflow, authorization, version control, content management and collaborative authoring
  • Workflow principles
  • Connection to Data Storage Manager for XML+
Table 10
Browsers Skills
  • Rendering XML
  • Hyperlinks
  • Java and Internet technologies
  • Multitier solutions, XSL-HTML
  • XSL(T) engines
Table 11
Composers Skills
  • Mapping XML structure to format (paper and electronic)
  • Graphic design
  • Design principles of the graphic industry
  • XSL
Table 12
Electronic Delivery Skills
In XML, HTML, PDF, LaTeX, etc.
  • Transformations
  • Server Technology
Table 13
Integration within infrastructure: databases, doc. management, network, OS
The integration of tools and subsystems (with or without XML+ functionality) into one overall system can be simplified by exchanging messages that are marked up in XML. In addition (for instance in the B2B paradigm), systems of different organizations can become more or less integrated.
XML+ Skills
  • XML as glue for exchange/interface for composite systems
  • DTD design
  • Parsers
  • SAX
Table 14
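When XML serves as glue between systems, an event-based API such as SAX lets the receiving side read a message without building a complete tree. The sketch below uses Python's standard xml.sax. The message format (an `order` with `line` elements) and the class `OrderHandler` are hypothetical and stand in for whatever the cooperating systems have agreed upon.

```python
import xml.sax

# Hypothetical message exchanged between two systems.
MESSAGE = (
    b'<order customer="ACME">'
    b'<line sku="A1" qty="3"/><line sku="B2" qty="1"/>'
    b"</order>"
)


class OrderHandler(xml.sax.ContentHandler):
    """Collect the customer name and the order lines from SAX events."""

    def __init__(self):
        super().__init__()
        self.customer = None
        self.lines = []

    def startElement(self, name, attrs):
        # Called by the parser for every start tag, in document order.
        if name == "order":
            self.customer = attrs.get("customer")
        elif name == "line":
            self.lines.append((attrs["sku"], int(attrs["qty"])))


handler = OrderHandler()
xml.sax.parseString(MESSAGE, handler)
print(handler.customer, handler.lines)
```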
System development methodology; XML aspects
The previous section took a bottom-up approach to the creation of a Document Processing System. Now we take the opposite view: constructing a system top-down, starting with a specific application in mind.
There exist several System Development Methodologies (SDM), some competing and some specializing in different areas of application. We may abstract from the differences between these methodologies because our goal is to recover the specific skills needed when XML+ is introduced. Therefore, we will follow the main steps of an SDM and use general terminology.
When XML+ functionality has to be added to an installed Information System as it evolves, methodologies for reverse engineering may apply, which may call for a mix of top-down and bottom-up strategies.
XML+ Skills
  • In general
  • System development methodologies, also for OO systems
Table 15
Definition study
In this phase the goals are defined and the constraints are formulated.
XML+ Skills
  • Business objectives
  • Strategic planning
  • Bottlenecks
  • Design lines
  • Costs / benefits
  • Critical success factors
  • Awareness at management level of the benefits of XML
  • Business Economics
  • Costs / benefits studies
  • Case studies
  • Design patterns
  • Creation of working group with authors, database specialists, stylists, administrators
  • Initial training
Table 16
Information (data) analysis
In this phase the flow of information is analyzed and defined. It is an important phase for the definition and planning of XML+ activities.
Data Skills
input
  • Who, where, quantities
  • Types of documents, known logical structure, coherence between documents, versions
  • Input media; from which platform(s), ways of delivery
  • Existing layout, styles, figures, formulas, tables
  • Constraints
  • Quality of documents, known inconsistencies and errors
  • Data analysis
storage
  • Document components
  • Entities to be shared by processes
  • Extended links
  • Specifications (DTDs, stylesheets, etc.)
  • Indexes
  • Trade-offs for the level of granularity
output
  • Types of products to be delivered
  • Requirements for electronic exchange
  • Required types of queries; reliance on logical structure; requirements for precision and recall
  • Required logical structures
  • Additional information to be produced
  • Live links, shared data with other systems
  • Sharing of information between documents and remote document repositories
  • Fault tolerance
  • Concepts of information retrieval
  • Concepts of document structuring and transformation
  • Principles of XML, XSL, XLL, XMI
Table 17
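The requirements for precision and recall mentioned above can be quantified once a set of relevant documents is known for a test query. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that were retrieved. A minimal sketch in Python follows; the document identifiers are invented for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: identifiers returned by the query
    relevant:  identifiers judged relevant for the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: four documents retrieved, three relevant overall.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d7"])
print(f"precision={p:.2f} recall={r:.2f}")
```

Queries that rely on the logical structure of documents typically aim to raise precision without lowering recall; figures like these make such requirements testable.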
Global design
In this phase the techniques are defined by analyzing the requirements, making selections and applying restrictions.
Document management Skills
  • On the XML component level: requirements for version control, approvals, production flow, working procedures, archiving
  • Trade-offs: storage of process data within XML markup, metadata or elsewhere
Table 18
XML+ Skills
  • Choice of XML+ standards
  • Which standards, and when they are required
  • Choices, (dis)advantages, availability of tools
Table 19
Document analysis Skills
  • Determine logical structure of documents and granularity for retrieval
  • Inference of specifications
  • Diagram techniques
  • Design of data models
Table 20
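Inference of specifications can start from a simple inventory: for every element name, which sequences of child elements actually occur in a sample of documents? The sketch below collects that inventory with Python's standard library. It is a first step only, under the assumption that well-formed samples are available; generalizing the observed sequences into a content model (with optional and repeatable parts) remains a design decision. The memo samples are invented.

```python
from collections import defaultdict
import xml.etree.ElementTree as ET

# Hypothetical sample documents of the same type.
SAMPLES = [
    "<memo><to>A</to><body><para>x</para></body></memo>",
    "<memo><to>B</to><cc>C</cc><body><para>y</para><para>z</para></body></memo>",
]


def observed_children(documents):
    """Map each element name to the set of child-name sequences observed."""
    seen = defaultdict(set)
    for text in documents:
        for element in ET.fromstring(text).iter():
            seen[element.tag].add(tuple(child.tag for child in element))
    return dict(seen)


model = observed_children(SAMPLES)
for tag, sequences in sorted(model.items()):
    print(tag, sorted(sequences))
```

From this output an analyst might conclude, for instance, that `cc` is optional in `memo` and that `para` is repeatable in `body`.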
Required conversions Skills
  • Converters from non-XML, once only (legacy) or regular
  • Converters from XML
  • Machine aided human conversion, degree
  • Human aided machine conversion, e.g., reformatting existing documents with stylesheets
  • Volumes
  • (Needed:) knowledge about the effectiveness of conversion strategies in different situations
Table 21
XML functionality of system (behavior) Skills
  • Requirements for user interfaces
  • Input- and output structure
  • Interface design
  • Ergonomics
Table 22
Selection of tools Skills
  • In general
  • Needed: studies on the effectiveness of different XML functionality and strategies within tools
  • Criteria for selection
  • Quantification of usability, as in ISO 9126
  • Evaluation of tools on the market
  • Types of tools
    • common characteristics
    • requirements
    • usability analyses
  • Evaluation strategies
  • In-house building of tools / combining tools
  • (See Integration)
  • Building visual workbenches for the development of specifications
  • (See Integration)
Table 23
Detail design
In this phase the techniques to be used are further detailed and specifications are written, prior to realization.
XML+ Skills
  • In general
  • Knowledge of standards for XML+
  • Know how and when to apply them
  • Aspects of usability
    • understandable
    • correct
    • customizable
    • effective/efficient
  • Applicability of alternatives
  • Analyses of user experiences
  • Design of Control Information: DTDs for input, storage, delivery
  • Ontologies for names
  • Initiatives of OASIS, IDEAlliance, BizTalk: Registry, Repository, Conformance
  • Meta-data
  • Creation of authoring instructions
  • Explaining the DTDs; shortcuts for keyboard
  • Readability instructions linked with structure, as in Information Mapping
  • Controlled Language guidelines
  • Stylesheets for word processors with subsequent conversion to XML
  • Design of XSL stylesheets
  • Knowledge of standard; experience
  • Design of XSLT transformations
  • Knowledge of standard; experience
  • Design of standard queries
  • Trade-off between different query languages for XML
  • Design/adaptation of database structures
  • Design of user interfaces
  • XML messages for dynamic adaptation of user interfaces
  • Design of XML specifications for the exchange of messages between processes
  • Schema (DTD) design
Table 24
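A "standard query" is a parameterized query that is designed once and reused by applications. The sketch below shows one such query written against the limited XPath subset supported by Python's xml.etree.ElementTree. The catalog structure, the attribute names and the function `titles` are hypothetical; a full query language for XML would offer more expressive power than this subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical repository index.
CATALOG = ET.fromstring(
    "<catalog>"
    "<manual lang='en' status='final'><title>Pump</title></manual>"
    "<manual lang='nl' status='draft'><title>Pomp</title></manual>"
    "<manual lang='en' status='draft'><title>Valve</title></manual>"
    "</catalog>"
)


def titles(lang, status):
    """Standard query: titles of manuals with a given language and status.

    The parameters are assumed to come from a trusted, fixed vocabulary;
    they are inserted into the path expression without further checks.
    """
    path = f"./manual[@lang='{lang}'][@status='{status}']/title"
    return [title.text for title in CATALOG.findall(path)]


print(titles("en", "draft"))
```

Such a query depends on the logical structure laid down in the schema, so the design of schemas and the design of standard queries are best considered together.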
Design of conversions Skills
  • Determine methods and phases of conversion
  • The symbols in the document on which the conversion will be based (also dependent upon consistency and errors)
  • Degree of pre-, inter- and post-editing
  • Defining the operator environment, the functionality, the user interface
  • Quantity and qualification of operators
  • Selection of tools for conversion
  • Parsing on correctness of input document
  • Trade-offs between the pattern matching/replacement approach and the pattern grammar approach
  • Programming systems for pattern matching and replacement (like OmniMark and Perl)
  • Systems for pattern grammars and transduction
  • Tree transformations
  • Creation of working procedures
  • Experience in writing instructions for XML lay people
Table 25
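The pattern matching and replacement approach to conversion can be sketched as follows. The example converts a legacy plain-text document into XML on the basis of one typographic symbol (a numbered heading line) and then parses the result as a check on well-formedness. It is written in Python rather than in OmniMark or Perl; the input conventions and the target element names `doc`, `section`, `title` and `para` are assumptions made for this illustration.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical legacy input: numbered headings followed by body lines.
LEGACY = """1. Safety
Wear gloves & goggles.
2. Operation
Press start."""

# The symbol on which the conversion is based: "<number>. <heading text>".
HEADING = re.compile(r"^(\d+)\.\s+(.*)$")


def convert(text):
    """Convert the legacy text into an XML string, one section per heading."""
    parts = ["<doc>"]
    open_section = False
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            if open_section:
                parts.append("</section>")
            number, heading = match.group(1), escape(match.group(2))
            parts.append(f'<section n="{number}"><title>{heading}</title>')
            open_section = True
        elif line.strip():
            parts.append(f"<para>{escape(line.strip())}</para>")
    if open_section:
        parts.append("</section>")
    parts.append("</doc>")
    return "".join(parts)


xml_text = convert(LEGACY)
root = ET.fromstring(xml_text)  # raises ParseError if the output is ill-formed
print(xml_text)
```

The approach works only as far as the symbols in the input are used consistently; inconsistent input is where pre-, inter- and post-editing by operators come in.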
Realization
In this phase, the detailed design materializes into a concrete working system that can be tested.
XML+ Skills
  • Building of tools and systems, also for conversions
  • (See Integration)
  • Writing of documentation and user manuals
  • Training in Operations
  • Stylesheets for input
  • Training of all personnel
  • Psychology: motivation of authors in using XML
  • Introduction of XML in technical writing
Table 26
Implementation in organization, working procedures
The last phase we mention here concerns the introduction of the system within the organization.
XML+ Skills
  • Arrangement of user organization
  • Arrangement of central organization for the control of all XML+ specifications
  • Training in Operations
Table 27
Conclusion
It is possible to define the XML skills that are required in industry by following the steps of bottom-up and top-down approaches for System Development.
The acquisition of skills has to be dealt with elsewhere. The variety of people and of working habits calls for different approaches to setting up training courses.
The checklists in this paper are of a rather general nature. They can be extended and refined further. The author welcomes additions and other comments.