Acquirement of XML skills in industry
Whether occasionally or full time, workers in industry are increasingly confronted
with applications of XML in one or more of its many aspects. People are
frequently unsure about the background knowledge they have to acquire and
the resources that are available. To arrive at a more general solution,
we observe that many tasks in automation may be described in terms of Information
System Methodologies. For each layer in these methodologies, the
relevant XML aspects may be identified, together with the background
knowledge required for their proper application.
The next steps will be to identify the resources for training and the
ways to set up training effectively. In that respect, typical working habits
in industry have to be taken into account.
Introduction
The widespread introduction of XML in industry, government and other
organizations calls for an inventory of the skills that are required
for the proper use and implementation of XML and its related standards (“XML+”).
This paper investigates which skills are needed.
Many kinds of people in organizations come into contact with XML. Among
them are:
- managers who have to assess the consequences of the introduction
of XML
- the actual end users who have to profit from the use of XML
- authors who have to write documents in XML
- business consultants who have to know about the applicability and
usability of standards and tools
- members of standardization committees who are developing new languages
- computer scientists who are developing data structures and efficient
algorithms and who study the intrinsic properties of the standards
- developers of tools and systems for XML+.
All these people need different types of skills and background knowledge.
They may be heavily involved in large projects, or only marginally. Sometimes
they use different terminology.
Where do people acquire new skills? For the new generation it may be
in technical schools or universities, where XML is gaining popularity in regular
courses. However, for most people in industry XML is new. They have to attend
training courses or seek occasional advice.
The author has given a number of training courses in industry, from
crash courses taking half an hour to regular courses taking a week or more,
and is preparing courses at the university level. There is a variety of demands
and a variety of personalities and backgrounds involved. Some topics are basic,
but some demand more research at an academic level.
This paper traces only the topics involved, and not the depth of training
that is required for different target groups. It addresses the use of XML
as a language for the exchange of messages as well as for the structuring
of documents, as a partial replacement of SGML.
In order to locate the required skills we follow two approaches.
The first approach is a technical one. It takes the point of view of
a developer who wants to construct a tool or a complete system, be it from
scratch or as an addition of XML functionality to an existing one. This approach
will recover mainly skills offered by Computer Science.
The second approach takes the view of an application that has to profit
from the use of XML+. Following the steps of a System Development Methodology
we encounter the required XML skills, independent of the size of the system
to be developed. This approach is more or less covered by Information Science.
Besides the skills covered by Computer and Information Science the introduction
of XML+ may require other skills.
This paper has three main divisions.
- Part one outlines the components of a complete document processing
system, to be referenced in the other parts.
- Part two identifies the depth of XML integration in a document processing
system, seen as a gradual integration of XML with tools, systems and infrastructure,
in a bottom-up fashion.
- Part three traces the sequential steps of system development, taken
in chronological order and from top to bottom.
Components of a document processing system
Every system that processes documents, small or large, has one or more
of the following components. The (partial) markup of the documents calls for
additional XML functionality.
Input:
- Author environment: for the creation and maintenance of XML tagged
data
- Conversion: for the translation of data and documents into XML
- Parsing: for the validation of an XML document against a DTD.
Data storage:
- Repositories, whether (distributed) databases (relational, OO, relational-OO)
or simple flat files
- Data storage manager for XML data, connected to the repositories,
for the access to document objects, hyperlinks, entities and text.
Retrieval:
Output:
- Transformation: within and from XML data and documents
- Composition: mapping XML structure to format (paper and electronic)
- Electronic Delivery.
Document management:
- On the XML component level: workflow, authorization, version control
and content management.
Workbenches:
- For development and maintenance of specifications: document and
database schemas, stylesheets, transformations.
Layers of XML system integration
This section takes the point of view of a gradual integration of XML+
into a Document Management System. The following steps are covered:
- The XML+ Standards themselves
- The building of Engines based upon the standards
- The integration of engines within XML+ Tools and Systems
- The integration of tools and systems within the Infrastructure:
Databases, Document Management, Operating Systems and Networks.
XML+ standards
The XML+ standards are based upon formal languages and data structures.
Therefore, members of the standardization committees should have a command
of Theoretical Computer Science.
(The writing of specifications, like schemata and stylesheets, according
to the standards is covered in the section on System Development Methodologies.)
| Design of XML+ Syntax Structures | Skills |
| | Design considerations for formal languages and grammars |
| Transformations with XSLT; Other transformation languages | Rewriting systems; aspects of reversibility |
| Query languages on documents and trees | |
| Context-sensitivity within transformations, queries and stylesheets | Expression of context-sensitivity |
Table 1
| Design of XML+ Information Structures | Skills |
| | Tree walking languages; Operations on trees |
Table 2
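As a minimal sketch of what a tree walking operation involves, the following Python fragment visits every element of a small document in document order and records its depth. The sample document and the function name `walk` are invented for this illustration; the standard library module `xml.etree.ElementTree` is used only because it is widely available, not because the standards prescribe it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = (
    "<report><title>Skills</title>"
    "<section><para>One</para><para>Two</para></section></report>"
)

def walk(element, depth=0):
    """Yield (depth, tag) pairs in document order (pre-order traversal)."""
    yield depth, element.tag
    for child in element:
        yield from walk(child, depth + 1)

root = ET.fromstring(SAMPLE)
outline = list(walk(root))

# Print an indented outline of the element structure.
for depth, tag in outline:
    print("  " * depth + tag)
```

The same recursive pattern underlies most operations on trees, such as counting, filtering or copying nodes.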
XML+ engines and APIs for processes and data structures
The processes described in the standards may be implemented
within dedicated Engines. Most expertise stems from the theory of Formal Automata
and Program Generation.
| XML | Skills |
| XML parser + DOM constructor | Techniques for parser generation |
Table 3
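To make the combination of parser and DOM constructor concrete, here is a small sketch that uses the `xml.dom.minidom` module from the Python standard library. It only checks well-formedness; it does not validate against a DTD. The helper name `build_dom` and the sample memo are assumptions made for this example.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

def build_dom(text):
    """Return a DOM Document, or None if the text is not well-formed XML."""
    try:
        return minidom.parseString(text)
    except ExpatError:
        # The underlying expat parser rejects ill-formed input.
        return None

# Hypothetical sample inputs: one well-formed, one with a missing end tag.
good = build_dom("<memo><to>Authors</to><body>Use XML.</body></memo>")
bad = build_dom("<memo><to>Authors</memo>")

print(good.documentElement.tagName)
print(good.getElementsByTagName("to")[0].firstChild.data)
print(bad is None)
```

Returning `None` for ill-formed input is only one possible design; a real engine would more likely report the position and nature of the error.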
| Conversions and transformations: 100% automatic and correct | Skills |
| XSLT engine: add, delete, change components in trees; Other tree transformation techniques | Techniques for transducer generators; Programming languages: imperative, functional, logical, ..., event-based, pattern-based, rule-based |
Table 4
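The three basic operations mentioned for a transformation engine (add, delete, change) can be sketched without XSLT. The fragment below is not an XSLT engine; it is one of the "other tree transformation techniques", written imperatively in Python. The element names and the function `transform` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document, invented for this illustration.
source = ET.fromstring(
    "<article><title>XML skills</title><draft-note>check</draft-note>"
    "<para>Text.</para></article>"
)

def transform(article):
    """Build a new tree: rename the root, drop draft notes, add a status."""
    result = ET.Element("document")              # change: new root name
    for child in article:
        if child.tag == "draft-note":            # delete: skip this component
            continue
        copy = ET.SubElement(result, child.tag)  # copy the remaining children
        copy.text = child.text
    ET.SubElement(result, "status").text = "converted"  # add: new component
    return result

output = ET.tostring(transform(source), encoding="unicode")
print(output)
```

An XSLT stylesheet would express the same result declaratively with template rules; the trade-off between the two styles is part of the skills listed above.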
| XML+ | Skills |
| | Knowledge of: Responsiveness: off-line, on-line, real-time; Execution: compiled, interpreted; Time and space complexity: exponential-NP, polynomial, (sub)linear; Decidability and correctness; Handling of ambiguity; Handling of ill-formed input; Software engineering; OO-aspects: (multiple) inheritance, sending messages to objects; Sequence (in)dependency of rules; (Visual) developer workbenches |
| Query engine on documents and document trees (also DOMs) | Database techniques; Techniques for pattern recognition: string and tree matching and comparison |
| Revision tracking and storage | Theory of sequence comparison for strings and trees |
Table 5
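Revision tracking rests on sequence comparison. As a hedged illustration, the sketch below compares the paragraph sequences of two versions of a document with `difflib.SequenceMatcher` from the Python standard library. It compares flat sequences of strings only; comparison of full trees is a harder problem and is not attempted here. Both sample documents and the helper `paragraphs` are invented.

```python
import difflib
import xml.etree.ElementTree as ET

# Two hypothetical versions of the same document.
old = ET.fromstring("<doc><p>Intro</p><p>Parsing</p><p>Output</p></doc>")
new = ET.fromstring(
    "<doc><p>Intro</p><p>Parsing and DOM</p><p>Storage</p><p>Output</p></doc>"
)

def paragraphs(root):
    """Return the text of the direct <p> children as a list of strings."""
    return [p.text for p in root.findall("p")]

matcher = difflib.SequenceMatcher(a=paragraphs(old), b=paragraphs(new))

# Keep only the edit operations that describe a difference.
changes = [op for op in matcher.get_opcodes() if op[0] != "equal"]
for tag, i1, i2, j1, j2 in changes:
    print(tag, paragraphs(old)[i1:i2], "->", paragraphs(new)[j1:j2])
```

Each opcode names an operation and the index ranges it covers in the old and the new sequence, which is the kind of information a revision store would need to keep.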
Integration of engines within XML+ tools and document processing systems
Tools for Document Processing may obtain XML functionality by integration
of XML+ Engines.
This may concern existing tools, which have to be extended with
XML+ functionality, or new tools, constructed from the ground up. It may concern
shrink-wrapped tools or home-made ones that have grown out of dedicated systems.
The required skills stem, on the one hand, from engineering disciplines
and, on the other hand, from the field of application for which a tool is
constructed.
| XML+ | Skills |
| | Access to requirements analyses for the desired functionality; Working with APIs on internal data structures |
Table 6
| XML Editors | Skills |
| | Psychology-Ergonomics; Techniques for technical authoring; Language Technology for Controlled Languages |
| If built into existing word processors (like MS Word) | How to operate on text buffers without hierarchical structure |
Table 7
| XSL, XSLT Editors | Skills |
| Creation of specifications by example | CASE systems, knowledge-based systems |
Table 8
| Data storage and retrieval of XML objects: database techniques | Skills |
| Relational, OO, relational-OO, full-text; Meta-data; Storage of document trees in databases | OO-paradigm; Tradeoffs between relational, OO, relational-OO for XML components; Database technology; Information Retrieval models, e.g., Boolean, Vector, Probabilistic, Fuzzy Set, Bayesian; Pattern matching; Indexing and search; Web query languages; Connection to Data Storage Manager for XML+ |
Table 9
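One of the trade-offs named above is how to store a document tree in a relational database. A simple option, sketched here under our own assumptions, is a single table of nodes in which every row points to its parent. The table layout, the function `store_tree` and the sample manual are hypothetical, and an in-memory SQLite database stands in for a real repository.

```python
import sqlite3
import xml.etree.ElementTree as ET

def store_tree(conn, root):
    """Store every element as one row that refers to its parent row."""
    conn.execute(
        "CREATE TABLE node (id INTEGER PRIMARY KEY, parent INTEGER, "
        "tag TEXT, text TEXT)"
    )

    def insert(element, parent_id):
        cur = conn.execute(
            "INSERT INTO node (parent, tag, text) VALUES (?, ?, ?)",
            (parent_id, element.tag, (element.text or "").strip()),
        )
        node_id = cur.lastrowid
        for child in element:
            insert(child, node_id)

    insert(root, None)

conn = sqlite3.connect(":memory:")
store_tree(conn, ET.fromstring(
    "<manual><chapter><title>Setup</title></chapter>"
    "<chapter><title>Use</title></chapter></manual>"
))

# Retrieve all titles whose parent is a chapter, in document order.
titles = [row[0] for row in conn.execute(
    "SELECT n.text FROM node n JOIN node p ON n.parent = p.id "
    "WHERE n.tag = 'title' AND p.tag = 'chapter' ORDER BY n.id"
)]
print(titles)
```

This layout keeps the schema independent of any DTD, at the price of one join per level of nesting in a query; an OO or native storage would shift that trade-off.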
| Document management for XML components | Skills |
| Workflow, authorization, version control, content management and collaborative authoring | Workflow principles; Connection to Data Storage Manager for XML+ |
Table 10
| Browsers | Skills |
| | Java and Internet technologies; Multitier solutions, XSL-HTML; XSL(T) engines |
Table 11
| Composers | Skills |
| Mapping XML structure to format (paper and electronic) | Graphic design; Design principles of the graphics industry; XSL |
Table 12
| Electronic Delivery | Skills |
| In XML, HTML, PDF, LaTeX, etc. | Transformations; Server Technology |
Table 13
Integration within infrastructure: databases, document management, network, OS
The integration of tools and subsystems (with or without XML+ functionality)
in one overall system can be simplified by the exchange of messages which
are marked up with XML. Also (for instance in the B2B paradigm), systems of
different organizations can become more or less integrated.
| XML+ | Skills |
| XML as glue for exchange / interface for composite systems | |
Table 14
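The role of XML as glue between subsystems can be illustrated with a message that one component writes and another reads. The sketch below is an assumption-laden example: the `order` vocabulary and the functions `encode_order` and `decode_order` are invented, and in practice the two sides would agree on a schema first.

```python
import xml.etree.ElementTree as ET

def encode_order(order_id, items):
    """Sending side: serialize an order as an XML message string."""
    message = ET.Element("order", id=str(order_id))
    for name, quantity in items:
        item = ET.SubElement(message, "item", quantity=str(quantity))
        item.text = name
    return ET.tostring(message, encoding="unicode")

def decode_order(text):
    """Receiving side: parse the message back into Python values."""
    message = ET.fromstring(text)
    items = [(i.text, int(i.get("quantity"))) for i in message.findall("item")]
    return int(message.get("id")), items

wire_text = encode_order(42, [("bolt", 10), ("nut", 25)])
print(wire_text)
print(decode_order(wire_text))
```

Because the receiver depends only on the message format, either side can be replaced without touching the other, which is the simplification that the integration layer aims for.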
System development methodology; XML aspects
The previous section took a bottom-up approach to the creation of a Document
Processing System. Now we take the opposite view: constructing a system top-down,
starting with a specific application in mind.
There exist several System Development Methodologies (SDM), some competing
and some specializing in different areas of application. We may abstract from
the differences between these methodologies because our goal is to recover
the specific skills needed when XML+ is introduced. Therefore, we will follow
the main steps of an SDM and will use general terminology.
It may be the case that in the evolution of an installed Information
System, XML+ functionality has to be added. In that case methodologies for
reverse engineering may apply, which may call for a mix of top-down and bottom-up
strategies.
| XML+ | Skills |
| | System development methodologies, also for OO systems |
Table 15
Definition study
This is the phase of the definition of goals and the formulation of
constraints.
| XML+ | Skills |
| Business objectives; Strategic planning; Bottlenecks; Design lines; Costs / benefits; Critical success factors | Awareness on management level of benefits of XML; Business Economics; Costs / benefits studies; Case studies; Design patterns |
| Creation of working group with authors, database specialists, stylists, administrators | |
Table 16
Information (data) analysis
In this phase the flow of information is analyzed and defined. It is
an important phase for the definition and planning of XML+ activities.
| Data | Skills |
| Input: Who, where, quantities; Types of documents, known logical structure, coherence between documents, versions; Input medium(s); from which platform(s), ways of delivery; Existing layout, styles, figures, formulas, tables; Constraints; Quality of documents, known inconsistencies and errors | |
| Storage: Document components; Entities to be shared by processes; Extended links; Specifications (DTDs, stylesheets, etc.); Indexes | Trade-offs for the level of granularity |
| Output: Types of products to be delivered; Requirements for electronic exchange; Required types of queries; reliance on logical structure; requirements for precision and recall; Required logical structures; Additional information to be produced; Live links, shared data with other systems; Sharing of information between documents and remote document repositories; Fault tolerance | Concepts of information retrieval; Concepts of document structuring and transformation; Principles of XML, XSL, XLL, XMI |
Table 17
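Requirements for precision and recall are easier to discuss once both measures are written down. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved. The following sketch computes both for one query; the function name and the document identifiers are made up for this example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, given document ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result of one query against a document repository.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d2", "d4", "d7"],
)
print(precision, recall)
```

Queries that rely on logical structure typically aim to raise precision, since markup lets a query exclude matches in the wrong context.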
Global design
In this phase the techniques are defined by analyzing the requirements,
making selections and applying restrictions.
| Document management | Skills |
| On the XML component level: requirements for version control, approvals, production flow, working procedures, archiving | Trade-offs: storage of process data within XML markup, metadata or elsewhere |
Table 18
| XML+ | Skills |
| | Which standards, and when they are required; Choices, (dis)advantages, availability of tools |
Table 19
| Document analysis | Skills |
| Determine logical structure of documents and granularity for retrieval | Inference of specifications; Diagram techniques; Design of data models |
Table 20
| Required conversions | Skills |
| Converters from non-XML, once only (legacy) or regular; Converters from XML; Machine aided human conversion, degree; Human aided machine conversion, e.g., reformatting existing documents with stylesheets; Volumes | (Still needed:) knowledge about the effectiveness of conversion strategies in different situations |
Table 21
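A converter from non-XML data is often the first XML tool an organization needs. As a minimal sketch, assuming legacy data in comma-separated form, the fragment below turns each record into an element. The column names, the `parts` vocabulary and the function `csv_to_xml` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical legacy data; note the ampersand, which must be escaped in XML.
LEGACY = "part,description\nA-1,Bolt\nA-2,Nut & washer\n"

def csv_to_xml(text):
    """Convert CSV records to a <parts> document, one <part> per record."""
    root = ET.Element("parts")
    for row in csv.DictReader(io.StringIO(text)):
        part = ET.SubElement(root, "part", number=row["part"])
        part.text = row["description"]
    # The serializer takes care of escaping special characters.
    return ET.tostring(root, encoding="unicode")

converted = csv_to_xml(LEGACY)
print(converted)
```

Building the tree first and serializing it afterwards, rather than concatenating strings, guarantees well-formed output, which matters for once-only legacy conversions that nobody will inspect by hand.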
| XML functionality of system (behavior) | Skills |
| Requirements for user interfaces; Input and output structure | Interface design; Ergonomics |
Table 22
| Selection of tools | Skills |
| | Still needed: studies on the effectiveness of different XML functionality and strategies within tools |
| | Quantification of usability, as in ISO 9126 |
| Evaluation of tools on the market | Types of tools (common characteristics, requirements, usability analyses); Evaluation strategies |
| In-house building of tools / combining tools | |
| Building visual workbenches for the development of specifications | |
Table 23
Detail design
In this phase the techniques to be used are further detailed and specifications
are written, prior to realization.
| XML+ | Skills |
| | Knowledge of standards for XML+; Know how and when to apply them; Aspects of usability (understandable, correct, customizable, effective/efficient); Applicability of alternatives; Analyses of users' experiences |
| Design of Control Information: DTDs for input, storage, delivery | Ontologies for names; Initiatives of OASIS, IDEAlliance, BizTalk: Registry, Repository, Conformance; Meta-data |
| Creation of authoring instructions | Explaining the DTDs; shortcuts for keyboard; Readability instructions linked with structure, like in Information Mapping; Controlled Language guidelines; Stylesheets for word processors with subsequent conversion to XML |
| Design of XSL stylesheets | Knowledge of standard; experience |
| Design of XSLT transformations | Knowledge of standard; experience |
| Design of standard queries | Trade-off between different query languages for XML; Design/adaptation of database structures |
| Design of user interfaces | XML messages for dynamic adaptation of user interfaces |
| Design of XML specifications for the exchange of messages between processes | |
Table 24
| Design of conversions | Skills |
| Determine methods and phases of conversion; The symbols in the document on which the conversion will be based (also dependent upon consistency and errors); Degree of pre-, inter- and post-editing; Defining the operator environment, the functionality, the user interface; Quantity and qualification of operators; Selection of tools for conversion | Parsing to check the correctness of the input document; Trade-offs between pattern matching / replacement and pattern grammar approach; Programming systems for pattern matching and replacement (like Omnimark and Perl); Systems for pattern grammars and transduction; Tree transformations |
| Creation of working procedures | Experience in writing instructions for XML lay people |
Table 25
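The pattern matching / replacement approach to conversion can be shown in a few lines. The sketch below, written in Python rather than Omnimark or Perl, recognizes numbered headings by a regular expression and wraps every other non-empty line in a paragraph element. The heading convention, the element names and the function `markup_lines` are assumptions for this example; real legacy text is rarely this regular.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Symbol on which the conversion is based: a line that starts with "N. ".
HEADING = re.compile(r"^(\d+)\.\s+(.*)$")

def markup_lines(text):
    """Wrap numbered lines as headings and all other lines as paragraphs."""
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            parts.append('<heading number="%s">%s</heading>'
                         % (match.group(1), escape(match.group(2))))
        else:
            parts.append("<para>%s</para>" % escape(line))
    return "<doc>" + "".join(parts) + "</doc>"

result = markup_lines("1. Scope\nThis & that.\n")
print(result)

# Parsing the result checks that the converter produced well-formed XML.
ET.fromstring(result)
```

A pattern grammar approach would instead describe the whole input as a grammar and derive the markup from the parse, which handles nesting better but requires more consistent input.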
Realization
In this phase, the detailed design materializes into a concrete working
system that can be tested.
| XML+ | Skills |
| Building of tools and systems, also for conversions | |
| Writing of documentation and user manuals | Training in Operations; Stylesheets for input |
| Training of all personnel | Psychology: motivation of authors in using XML; Introduction of XML in technical writing |
Table 26
Implementation in organization, working procedures
The last phase we mention here regards the introduction of the system
within the organization.
| XML+ | Skills |
| Arrangement of user organization; Arrangement of central organization for the control of all XML+ specifications | |
Table 27
Conclusion
It is possible to define the XML skills that are required in industry
by following the steps of bottom-up and top-down approaches to System Development.
The acquisition of these skills has to be dealt with elsewhere. The variety
of people and working habits calls for different approaches to the
way training courses are set up.
The checklists in this paper are of a rather general nature. They can
be extended and further refined. The author welcomes additions and other comments.