Impact of an XML repository in development of XML messages
Without common semantic definitions, there is no guarantee that senders
and receivers will interpret content in the same way. As such definitions are
not provided by DTDs or schemas, they have to be defined in external databases,
such as tag repositories. Without such repositories, no real standardisation of
messages or documents will be possible. In the German project described here,
the semantic content of a repository using the proprietary xDT standard is
transferred into an XML repository and is used to establish XML messages for
the communication between physicians' offices and hospital information systems.
Problems of generalisation and specialisation will be described. The repository
will be available on the Internet to improve and speed up its maintenance
and to give users continuous access to the newest versions.
First experiences with this approach will be described and demonstrated.
XML is going to be used not only in the management of textual documents
but also as an interchange format for messages exchanged between application
systems in healthcare. It has already been decided that XML should become
the interchange format of version 3 messages in HL7, which was demonstrated
at HIMSS 1999 and 2000. DTDs have already been developed for HL7 version
2.3 messages. In the EDIFACT world too, and in particular in CEN TC 251 in
healthcare, a movement towards XML as the preferred interchange format
can be seen (XML/EDI projects and workshops [1]).
The message for the exchange of healthcare record data, developed by project
team PT 29, has successfully been mapped directly from UML into XML. Rules
for directing this mapping process were developed. One of the first work items
established in the new ISO TC 215 on standards in healthcare is dealing with
this standardisation process.
Using XML as an interchange format for messages has several specific
implications. Messages are used to transfer structured information in such a
manner that the receiver can directly store and integrate the transmitted
information in his own systems. This requires that sender and receiver have
agreed unequivocally upon how each information item is identified and represented
in the message. This is the main task of the standards development organisations
(SDOs). In nearly all of the communication standards currently in use, such as
HL7, EDIFACT and X12, the information items are identified by their position
in the message. The HL7 standard defines, for example, that the name of the patient
is transferred in the 5th field of the PID segment and that the family name
is the first component of this field, the first given name the second component,
etc. The gender is transmitted in field 8 of the same segment and is represented
by the characters f (female), m (male), o (other) and u (unknown). Another
specification defines that codes, which are used frequently in healthcare, should
be transmitted with the code value in the first position, the descriptive text
in the second and the code system used in the third position. Nearly the complete
HL7 standard consists of such definitions. The receiver identifies the
transmitted information items by their position in the message,
adapts the content, where necessary, to the representation in his database
and stores the data. Specific interface programs are required
if an application system is to make use of the advantages of a communication
standard.
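The positional identification described above can be sketched in a few lines
of Python. This is only an illustration: the sample segment, the coded
diagnosis and the function names are made up, and the delimiters `|` and `^`
are assumed to be the usual HL7 version 2 defaults. Only the field positions
and the gender codes are taken from the text; the family name is taken as the
first component of the name field, as in HL7 version 2.

```python
# Sketch of position-based parsing of an HL7 v2 PID segment.
# Sample data and function names are illustrative assumptions.

GENDER_CODES = {"f": "female", "m": "male", "o": "other", "u": "unknown"}


def parse_pid(segment: str) -> dict:
    """Pick patient name and gender out of a PID segment by position."""
    fields = segment.split("|")      # fields[0] is the segment name "PID"
    if fields[0] != "PID":
        raise ValueError("not a PID segment")
    name = fields[5].split("^")      # field 5: patient name, split into components
    return {
        "family_name": name[0],                            # first component
        "given_name": name[1] if len(name) > 1 else "",    # second component
        "gender": GENDER_CODES.get(fields[8].lower(), "unknown"),  # field 8
    }


def parse_coded_value(field: str) -> dict:
    """Split a coded value into code, text and code system by position."""
    parts = (field.split("^") + ["", "", ""])[:3]   # pad missing components
    return {"code": parts[0], "text": parts[1], "system": parts[2]}


patient = parse_pid("PID|1||12345||Mustermann^Erika||19700101|f")
print(patient)
print(parse_coded_value("J45^Asthma^ICD-10"))
```

Nothing in the segment itself says what field 5 or field 8 means; that
knowledge lives entirely in the interface program, which is why such programs
have to be written for every application system.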
The situation is completely different when using XML as an interchange
format. The position in a message is no longer of importance, since the
tag, which marks up the content with a start and an end tag, identifies the
content of an information item. To understand each other, however, it is also
necessary to define in a generally available form what information content
should be transmitted when using a specific tag. What is to be transmitted
in a tag like <diagnose>: some textual information, some code
or a complex of other information items? DTDs define the syntactical
structure of the document only. They do not provide any relationship to semantic
information. One can see in the DTD, for example, that the content of <diagnose>
consists of some other information items. Using XML Schema one can also define
some constraints, such as the representation codes for gender described above.
But up to now it is nowhere defined which semantic content should be transmitted
within the start and end tags <diagnose></diagnose>. With
this tag it seems self-explanatory what content is to be transmitted.
But why should one not use the tag <diag> or <dgn>
or just a number such as <t6121>? Which tag should be used to transmit
the gender: <gender>, <sex> or even a number such as <t3105>?
One can see that some agreements are required, and they have to be made in a
generally applicable way. This is the task of an XML tag repository or, described
more comprehensively, a semantic repository. The necessity of tag repositories
has been described several times, among others in the white paper of the
XML/EDI Group [2] and by Steven Newcomb [3]. The
latter describes in particular that Namespaces cannot fulfil the tasks
of a tag repository either.
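What such a repository contributes can be sketched as follows. The repository
structure, the concept names and the definitions are hypothetical; only the
tag spellings are the examples used in the text. The sketch assumes that
several tags may be registered for one semantic definition.

```python
# Sketch of a tag repository: several tag spellings resolve to one
# semantic definition. All entries are illustrative assumptions.
import xml.etree.ElementTree as ET

REPOSITORY = {
    "gender": {
        "definition": "Gender of the patient",
        "tags": {"gender", "sex", "t3105"},
    },
    "diagnosis": {
        "definition": "Diagnosis, as text or coded value",
        "tags": {"diagnose", "diag", "dgn", "t6121"},
    },
}

# Reverse index from each registered tag to its concept.
TAG_INDEX = {
    tag: concept for concept, entry in REPOSITORY.items() for tag in entry["tags"]
}


def interpret(xml_text: str) -> dict:
    """Map each child element to the concept its tag stands for."""
    result = {}
    for element in ET.fromstring(xml_text):
        concept = TAG_INDEX.get(element.tag)
        if concept is None:
            raise KeyError(f"tag <{element.tag}> is not defined in the repository")
        result[concept] = element.text
    return result


message_a = "<patient><gender>f</gender><diagnose>Asthma</diagnose></patient>"
message_b = "<patient><t3105>f</t3105><dgn>Asthma</dgn></patient>"
print(interpret(message_a) == interpret(message_b))
```

Both messages are well-formed XML and could each be valid against some DTD,
but only the lookup in the repository tells the receiver that they carry the
same information.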
In Germany we already have some experience with a tag repository as
an integral part of a communication standard. For physicians' office systems
(POS) in Germany a special communication standard has developed historically
and is implemented in nearly all available systems: the so-called xDT (ADT,
BDT, GDT, LDT etc.) standards. The xDT standard uses tags to identify
transmitted information items. A field in xDT is defined by the field length,
the descriptor (tag) and the content. The descriptors are four-digit
numbers defined in a central repository that is run and maintained by a physicians'
reimbursement association (ZI). Each item has to be defined in the repository
before it can be used within the standard. POS vendors and users regularly receive
updates of this repository. It works quite successfully and has shown that
tags do not have to be self-explanatory. If a comprehensive repository is continuously
available, purely numeric tags can be used as well.
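A field built from length, descriptor and content can be read with very little
code once the repository is at hand. The sketch below is a simplification
under stated assumptions: a three-digit length that counts the length digits,
the descriptor and the content (line terminators are ignored), and two
descriptor numbers borrowed from the examples in this text with invented
meanings. It does not reproduce the actual xDT specification or the real
repository.

```python
# Sketch of reading xDT-style fields: length, four-digit descriptor, content.
# ASSUMPTIONS: three-digit length covering length digits + descriptor + content;
# the descriptor meanings are placeholders, not real repository entries.

DESCRIPTORS = {"3105": "gender", "6121": "diagnosis"}


def encode_field(descriptor: str, content: str) -> str:
    """Build one field line from a descriptor and its content."""
    length = 3 + 4 + len(content)
    return f"{length:03d}{descriptor}{content}"


def decode_field(line: str) -> tuple:
    """Return (meaning, content) for one field line, using the repository."""
    length, descriptor, content = int(line[:3]), line[3:7], line[7:]
    if length != len(line):
        raise ValueError("declared field length does not match the line")
    if descriptor not in DESCRIPTORS:
        raise KeyError(f"descriptor {descriptor} is not in the repository")
    return DESCRIPTORS[descriptor], content


line = encode_field("3105", "f")
print(line)
print(decode_field(line))
```

The numeric descriptor carries no meaning by itself; a receiver without the
repository entry for it cannot interpret the field at all.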
In Germany we now have the situation that there is a complete gap between
hospitals and physicians' office systems as far as communication standards
are concerned. In hospitals, HL7 is the well-accepted standard, which is completely
incompatible with the xDT standard of POS. Since the requirement for communication
between hospitals and POS is increasing, we had to consider how to overcome
this communication gap in a standardised way. A working group has been established
consisting of experts from both the HL7 and the xDT side and of vendors and
users of hospital information systems and of POS. A European project has
been launched within the framework of the ISIS programme. The first decision,
made both in the ISIS project proposal and in the working
group, concerned the interchange format. There was no doubt that
we have to move to XML as the new standard for harmonising the HL7 and xDT
standards. It was also decided that we do not intend to develop a new German
standard. We have to stay as close as possible to the existing
efforts in HL7, CEN TC 251 and ISO TC 215 while utilising the experience gained
with the xDT standards. On the other hand, we cannot wait for international
initiatives to emerge. We have to proceed immediately, since the vendors,
in particular of POS, need the new standard as soon as possible, not only for
establishing the communication between POS and hospital information systems
but also for developing networks for the exchange of information between POS.
The described work is to become a pilot study to analyse the impact and the
requirements of applying semantic repositories in XML.
In the first steps we tried to map the available repositories onto each other,
in particular the xDT repository, the HL7 data dictionary and the version 3
Reference Information Model (RIM). Several problems occurred. The xDT repository
consists of atomic items only. A patient's family name, given name and title,
for example, have distinct descriptors. It has no notion of data aggregates
such as the datatypes in HL7. In the HL7 data dictionary, only the complete
field content is described. Atomic items within data types are not included.
This is also true of the RIM.
In the XML repository one has to start with the tags of the atomic data items.
In our first step of development we are now assembling the required atomic
data items by analysing the three sources already described: the xDT repository,
the HL7 data dictionary and datatype definitions, and the RIM.
The repository also has to contain data aggregates like the complex types
in the new schema proposal. Besides the semantic
content of the atomic data items, the structure of frequently used data
aggregates such as names, addresses or the representation of coded values like
diagnoses, procedures etc. has to be defined in a repository; otherwise several
variants of such constructs will be applied. This is not as serious as in HL7
as long as the standard tags are used. But the application of the standard will be
much more convenient if frequently used complex types are also unequivocally
defined in the repository.
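The relationship between atomic items and aggregates can be sketched as a
repository with two levels, where every aggregate is composed only of atomic
items that are themselves defined. All item names, definitions and the
checking function are illustrative assumptions, not the project's actual
repository content.

```python
# Sketch of a repository holding atomic items and aggregates built from them.
# All names and definitions are illustrative assumptions.
import xml.etree.ElementTree as ET

ATOMIC_ITEMS = {
    "family_name": "Family name of a person",
    "given_name": "Given name of a person",
    "title": "Title of a person",
    "code": "Code value",
    "text": "Textual description of a code",
    "code_system": "Code system the code is taken from",
}

AGGREGATES = {
    "person_name": ["family_name", "given_name", "title"],
    "coded_value": ["code", "text", "code_system"],
}

# Every aggregate may only be composed of atomic items defined above.
for members in AGGREGATES.values():
    assert all(member in ATOMIC_ITEMS for member in members)


def check_aggregate(element: ET.Element, aggregate: str) -> list:
    """Return a list of problems; an empty list means the element conforms."""
    allowed = AGGREGATES[aggregate]
    problems = []
    for child in element:
        if child.tag not in ATOMIC_ITEMS:
            problems.append(f"<{child.tag}> is not defined in the repository")
        elif child.tag not in allowed:
            problems.append(f"<{child.tag}> is not part of {aggregate}")
    return problems


good = ET.fromstring(
    "<name><family_name>Mustermann</family_name><given_name>Erika</given_name></name>"
)
bad = ET.fromstring("<name><family_name>Mustermann</family_name><code>J45</code></name>")
print(check_aggregate(good, "person_name"))
print(check_aggregate(bad, "person_name"))
```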
Here too, problems arise which have to be solved. Some components
of HL7 datatypes should be represented as attributes in XML. Repeatable components
have to be represented differently in XML than in the position-oriented HL7,
etc. Our first experiences have shown that mapping these three data
sources is a time-consuming but solvable problem. The attempts we have made
have convinced the participating partners that the development of a semantic
repository is absolutely necessary if XML is to be used successfully as an
interchange format for exchanging messages.
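The two mapping problems named here, components that become attributes and
repetitions that become repeated elements, can be illustrated with a coded
value. Which components end up as attributes is a design decision the text
leaves open; putting the code and the code system into attributes, the
attribute names and the sample field are assumptions of this sketch. The
separators `~` and `^` are the usual HL7 version 2 defaults.

```python
# Sketch: mapping a repeatable, position-oriented HL7 coded field to XML.
# Attribute names and the choice of attributes are illustrative assumptions.
import xml.etree.ElementTree as ET


def coded_values_to_xml(field: str, tag: str) -> list:
    """Turn a repeatable HL7 coded field into a list of XML elements."""
    elements = []
    for repetition in field.split("~"):   # "~" separates repetitions
        # Pad missing components so that unpacking always succeeds.
        code, text, system = (repetition.split("^") + ["", "", ""])[:3]
        element = ET.Element(tag, {"code": code, "codeSystem": system})
        element.text = text               # descriptive text becomes element content
        elements.append(element)
    return elements


parent = ET.Element("diagnoses")
parent.extend(
    coded_values_to_xml(
        "J45^Asthma^ICD-10~E11^Type 2 diabetes mellitus^ICD-10", "diagnose"
    )
)
xml_text = ET.tostring(parent, encoding="unicode")
print(xml_text)
```

In the XML form each repetition is a complete element of its own, so the
receiver no longer has to count separators to find the second diagnosis.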
Another problem we will be dealing with is to scrutinise the necessity
and application of structured models for messages which are composed of atomic
data items and complex types unequivocally defined in a semantic repository.
The most comprehensive approach is the HL7 Reference Information Model (RIM),
in which not only the data classes but also their relationships are distinctly
defined. One issue we are trying to investigate is to what extent such comprehensive
models, which take a tremendous amount of time to develop, will really be required
in the future. Since the position in a message no longer has any importance
for identifying information, less complex models might be sufficient. This is
still an open issue.
Other issues to be investigated concern the representation of tags. Should
we consider and allow tag synonyms for distinct semantic definitions, such as
tags in different languages or numeric tags? How can new tags be added
in a fast, efficient and reliable way? Who is going to run the repository?
There is no doubt that it has to be available on the Web, continuously accessible
to every authorised partner. Which software should be used? BSR, the Basic Semantic
Repository, is an interesting approach which already provides synonyms in different
languages. Topic Maps or the inheritable information architecture also have
to be considered.
The work is ongoing. We have decided to start with some
elementary communication structures that are most frequently used
in the data exchange between hospitals and physicians' offices: the admission
and the discharge reports. This also requires defining and structuring data
items from the medical record. Here the Patient Record Architecture
developed in HL7 will also be taken into consideration.
In the presentation more detailed results of the ongoing work will be
reported.
Bibliography
[1] ISIS European XML/EDI Project. Recommendations for standardisation of XML/EDI. www.tieke.fi/isis
[2] Alan Kotok (Ed.). White Paper on Global XML Repositories for XML/EDI. The XML/EDI Group, Feb. 1999. http://www.xmledi.com/repository/
[3] Steven R. Newcomb. XML Vocabularies; Opportunities for Efficiency and Reliability.