Impact of an XML repository in development of XML messages
Joachim Dudeck
Udo Altmann


Abstract
Without common semantic definitions, there is no guarantee that senders and receivers will interpret content in the same way. As such definitions are not provided by DTDs or schemas, they have to be defined in external databases, such as tag repositories. Without such repositories, no standardisation of messages or documents will really be possible. In the German project described here, the semantic content of a repository using the proprietary xDT standard is transferred into an XML repository and is used to establish XML messages for the communication between physicians' offices and hospital information systems. Problems of generalisation and specialisation will be described. The repository will be available on the Internet to improve and speed up its maintenance and to give users continuous access to the newest versions. First experiences with this approach will be described and demonstrated.

Contents
XML is going to be used not only for the management of textual documents but also as an interchange format for messages exchanged between application systems in healthcare. It has already been decided that XML should become the interchange format of version 3 messages in HL7, which was demonstrated at HIMSS 1999 and 2000. DTDs have already been developed for HL7 version 2.3 messages. In the EDIFACT world too, and in particular in CEN TC 251 in healthcare, a movement towards XML as the preferred interchange format can be seen (XML/EDI projects and workshops [1]). The message for the exchange of healthcare record data, developed by project team PT 29, has successfully been mapped directly from UML into XML. Rules for directing this mapping process were developed. One of the first work items established in the new ISO TC 215 on standards in healthcare deals with this standardisation process.
Using XML as an interchange format for messages has several specific implications. Messages are used to transfer structured information in such a manner that the receiver can directly store the transmitted information and integrate it into his own systems. This requires that sender and receiver have agreed unequivocally upon how each information item is identified and represented in the message. This is the main task of the standards development organisations (SDOs). In nearly all currently used communication standards, such as HL7, EDIFACT and X12, the information items are identified by their position in the message. The HL7 standard defines, for example, that the name of the patient is transferred in the 5th field of the PID segment, and that the family name is the first component of this field, the given name the second component, etc. The gender is transmitted in field 8 of the same segment and is represented by the characters f - female, m - male, o - other and u - unknown. Another specification defines that the codes often used in healthcare should be transmitted with the code value in the first position, the descriptive text in the second and the code system used in the third position. Nearly the complete HL7 standard consists of such definitions. Guided by the position in the message, the receiver identifies the transmitted information items, adapts the content to the representation in his database where necessary, and stores the data. Specific interface programs are required if an application system is to make use of the advantages of a communication standard.
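The positional identification described above can be illustrated with a minimal Python sketch. It assumes the usual HL7 version 2 delimiters ("|" between fields, "^" between components); the sample segment, the patient data and the function name parse_pid are invented for illustration and are not taken from the project.

```python
# Hypothetical PID segment; all patient data is invented for illustration.
SEGMENT = "PID|1||12345||Mustermann^Erika||19600101|F"

# Gender codes as listed in the text (compared case-insensitively).
GENDER_CODES = {"F": "female", "M": "male", "O": "other", "U": "unknown"}


def parse_pid(segment: str) -> dict:
    """Extract name and gender from a PID segment purely by position."""
    fields = segment.split("|")
    if fields[0] != "PID":
        raise ValueError("not a PID segment")
    if len(fields) < 9:
        raise ValueError("PID segment is too short to contain field 8")
    # Field 5 holds the patient name; its components are separated by '^'.
    name = fields[5].split("^")
    gender_code = fields[8].upper()
    return {
        "family_name": name[0],
        "given_name": name[1] if len(name) > 1 else "",
        "gender": GENDER_CODES.get(gender_code, "unknown"),
    }


record = parse_pid(SEGMENT)
```

Nothing in the segment itself says what field 5 or field 8 means; the receiver has to know this from the standard, which is why each application needs its own interface program.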
The situation is completely different when XML is used as an interchange format. The position in a message is no longer of importance, since the tag, which marks up the content with a start and an end tag, identifies the information item. For mutual understanding, however, it is also necessary to define in a generally available form what information content is to be transmitted with a specific tag. What is to be transmitted in a tag like <diagnose>: some textual information, a code, or a complex of other information items? DTDs define only the syntactical structure of the document. They do not provide any relationship to semantic information. One can see in the DTD, for example, that the content of <diagnose> consists of some other information items. Using XML Schema one can also define constraints, such as the representation codes for gender described above. But up to now it is nowhere defined which semantic content should be transmitted between the start and end tags <diagnose></diagnose>. With this tag, the content to be transmitted seems self-explanatory. But why should one not use the tag <diag> or <dgn>, or just a number, <t6121>? Which tag should be used to transmit the gender: <gender>, <sex>, or even a number, <t3105>? Clearly some agreements are required, and they have to be made in a generally applicable way. This is the task of an XML tag repository or, described more comprehensively, a semantic repository. The necessity of tag repositories has been described several times, among others in the white paper of the XML/EDI Group [2] and by Steven Newcomb [3]. The latter describes in particular that Namespaces, too, cannot fulfil the tasks of a tag repository.
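The role of such a repository can be sketched as follows. This is only an illustration under our own assumptions: the concept identifiers, the definitions and the structure of REPOSITORY are hypothetical and do not reflect the content or design of any existing repository. The tags are the examples used in the text above.

```python
import xml.etree.ElementTree as ET

# Hypothetical semantic repository: each concept has one definition and
# a set of tags that senders may use for it.
REPOSITORY = {
    "patient.gender": {
        "definition": "Gender of the patient, coded as f, m, o or u",
        "tags": {"gender", "sex", "t3105"},
    },
    "diagnosis": {
        "definition": "Diagnosis given as free text",
        "tags": {"diagnose", "t6121"},
    },
}

# Reverse index: tag name -> concept identifier.
TAG_INDEX = {
    tag: concept
    for concept, entry in REPOSITORY.items()
    for tag in entry["tags"]
}


def interpret(xml_text: str) -> dict:
    """Map the child elements of a message to repository concepts."""
    root = ET.fromstring(xml_text)
    result = {}
    for element in root:
        concept = TAG_INDEX.get(element.tag)
        if concept is None:
            # Without a repository entry the receiver cannot know the meaning.
            raise KeyError(f"tag <{element.tag}> is not defined in the repository")
        result[concept] = (element.text or "").strip()
    return result


# Different tags and a different element order lead to the same interpretation.
a = interpret("<patient><gender>f</gender><diagnose>Influenza</diagnose></patient>")
b = interpret("<patient><t6121>Influenza</t6121><t3105>f</t3105></patient>")
```

In this sketch a tag such as <dgn>, which is not registered, is rejected rather than guessed at. The tag name alone carries no agreed meaning.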
In Germany we already have some experience with using a tag repository as an integral part of a communication standard. For physicians' office systems (POS) in Germany, a special family of communication standards has developed historically and is implemented in nearly all available systems: the so-called xDT standards (ADT, BDT, GDT, LDT etc.). The xDT standard uses tags to identify transmitted information items. A field in xDT is defined by the field length, the descriptor (tag) and the content. The descriptors are four-digit numbers defined in a central repository that is run and maintained by a physicians' reimbursement association (ZI). Each item has to be defined in the repository before it can be used within the standard. POS vendors and users regularly receive updates of this repository. It works quite successfully and has shown that tags do not have to be self-explanatory. If a comprehensive repository is continuously available, purely numeric tags can be used as well.
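A field made up of length, descriptor and content might be handled as in the following sketch. The concrete layout is an assumption made for illustration: a three-digit length prefix that counts the whole field including a terminating CR LF, followed by the four-digit descriptor. The entries in DESCRIPTORS are likewise illustrative and are not copied from the ZI repository.

```python
from typing import Tuple

CRLF = "\r\n"  # assumed field terminator

# Illustrative excerpt of a descriptor repository (hypothetical entries).
DESCRIPTORS = {"3101": "family name", "3102": "given name"}


def encode_field(descriptor: str, content: str) -> str:
    """Build one field: 3-digit length + 4-digit descriptor + content + CR LF."""
    if len(descriptor) != 4 or not descriptor.isdigit():
        raise ValueError("descriptor must be a four-digit number")
    length = 3 + 4 + len(content) + len(CRLF)
    if length > 999:
        raise ValueError("content too long for a three-digit length prefix")
    return f"{length:03d}{descriptor}{content}{CRLF}"


def parse_field(field: str) -> Tuple[str, str]:
    """Split one field into (descriptor, content) and check its length."""
    if not field.endswith(CRLF):
        raise ValueError("field is not terminated by CR LF")
    if int(field[:3]) != len(field):
        raise ValueError("length prefix does not match the field")
    return field[3:7], field[7:-len(CRLF)]


def describe(field: str) -> str:
    """Look up the meaning of a field in the descriptor repository."""
    descriptor, content = parse_field(field)
    if descriptor not in DESCRIPTORS:
        raise KeyError(f"descriptor {descriptor} is not defined in the repository")
    return f"{DESCRIPTORS[descriptor]}: {content}"


line = encode_field("3101", "Mustermann")
```

The numeric descriptor says nothing by itself; the meaning comes only from the repository lookup in describe.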
In Germany we now have the situation that there is a complete gap between hospitals and physicians' office systems as far as communication standards are concerned. In hospitals, HL7 is the well-accepted standard, which is completely incompatible with the xDT standard of POS. Since the requirement for communication between hospitals and POS is increasing, we had to consider how to overcome this communication gap in a standardised way. A working group has been established, consisting of experts from both the HL7 and the xDT side and of vendors and users of hospital information systems and of POS. A European project has been launched within the framework of the ISIS program. The first decision, made both in the ISIS project proposal and in the working group, concerned the interchange format. There was no doubt that we have to move to XML as the new standard for harmonising the HL7 and xDT standards. It was also decided that we do not intend to develop a new German standard. We have to stay as close as possible to the efforts already under way in HL7, CEN TC 251 and ISO TC 215, while utilising the experience gained with the xDT standards. On the other hand, we cannot wait until international initiatives come up. We have to proceed immediately, since vendors, in particular those of POS, need the new standard as soon as possible, not only for establishing the communication between POS and hospital information systems but also for developing networks for the exchange of information between POS. The work described here is to become a pilot study analysing the impact and the requirements of applying semantic repositories in XML.
In the first steps we tried to map the available repositories onto each other, in particular the xDT repository, the HL7 data dictionary and the version 3 Reference Information Model (RIM). Several problems occurred. The xDT repository consists of atomic items only. A patient's family name, given name and title, for example, have distinct descriptors. It does not know any data aggregates like the datatypes in HL7. In the HL7 data dictionary, by contrast, only the complete field content is described. Atomic items within datatypes are not integrated. This is also true of the RIM. In the XML repository one has to start with the tags of the atomic data items. In our first step of development we are now assembling the required atomic data items by analysing the three sources already described: the xDT repository, the HL7 data dictionary with its datatype definitions, and the RIM.
The repository also has to contain data aggregates, like complex types in the new schema proposal. Besides the semantic content of the atomic data items, the structure of frequently used data aggregates, such as names, addresses or the representation of coded values like diagnoses and procedures, has to be defined in the repository; otherwise several variants of such constructs will come into use. This is not as serious as in HL7 as long as the standard tags are used. But applying the standard will be much more convenient if frequently used complex types are also unequivocally defined in the repository.
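One possible way to hold atomic items and aggregates side by side is sketched below. The class names, tags and definitions are hypothetical and chosen only to illustrate the principle that an aggregate may be composed solely of atomic items that are already defined.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AtomicItem:
    tag: str
    definition: str


@dataclass(frozen=True)
class ComplexType:
    tag: str
    definition: str
    components: Tuple[str, ...]  # tags of atomic items, in order


class Repository:
    """Minimal in-memory repository of atomic items and complex types."""

    def __init__(self) -> None:
        self.atomic: Dict[str, AtomicItem] = {}
        self.complex: Dict[str, ComplexType] = {}

    def _check_free(self, tag: str) -> None:
        if tag in self.atomic or tag in self.complex:
            raise ValueError(f"tag '{tag}' is already defined")

    def add_atomic(self, item: AtomicItem) -> None:
        self._check_free(item.tag)
        self.atomic[item.tag] = item

    def add_complex(self, ctype: ComplexType) -> None:
        self._check_free(ctype.tag)
        # An aggregate may only refer to atomic items that already exist.
        missing = [t for t in ctype.components if t not in self.atomic]
        if missing:
            raise ValueError(f"undefined components: {missing}")
        self.complex[ctype.tag] = ctype


repo = Repository()
repo.add_atomic(AtomicItem("family_name", "Family name of a person"))
repo.add_atomic(AtomicItem("given_name", "Given name of a person"))
repo.add_atomic(AtomicItem("title", "Academic title of a person"))
repo.add_complex(
    ComplexType("person_name", "Name of a person",
                ("title", "given_name", "family_name"))
)
```

Because person_name is defined once, every message that transmits a name uses the same structure instead of a locally invented one.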
Here, too, problems arise which have to be solved. Some components of HL7 datatypes should be represented as attributes in XML. Repeatable components have to be represented differently in XML than in the position-oriented HL7, etc. Our first experiences have shown that the mapping of these three data sources is a time-consuming but solvable problem. The attempts we have made have convinced the participating partners that the development of a semantic repository is absolutely necessary if XML is to be used successfully as an interchange format for exchanging messages.
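The following sketch shows one conceivable mapping of a positional coded value (code, text, code system) to XML. It assumes the HL7 version 2 separators "^" for components and "~" for repetitions. Which components become attributes is our own illustrative choice, not a decision of the project, and the attribute names, the sample codes and the code system identifier are invented.

```python
import xml.etree.ElementTree as ET
from typing import List


def coded_field_to_xml(tag: str, field: str) -> List[ET.Element]:
    """Turn a positional coded field into one XML element per repetition."""
    elements = []
    for repetition in field.split("~"):
        if not repetition:
            continue  # skip empty repetitions
        # Pad so that missing trailing components become empty strings.
        code, text, system = (repetition.split("^") + ["", "", ""])[:3]
        # Assumption: code and code system become attributes,
        # the descriptive text becomes the element content.
        element = ET.Element(tag, {"code": code, "codeSystem": system})
        element.text = text
        elements.append(element)
    return elements


diagnoses = coded_field_to_xml(
    "diagnose",
    "I10^Essential hypertension^ICD10~E11^Type 2 diabetes mellitus^ICD10",
)
xml_lines = [ET.tostring(d, encoding="unicode") for d in diagnoses]
```

A repetition that HL7 expresses with a separator character inside one field becomes a sequence of sibling elements in XML, so the receiver no longer needs to count positions.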
Another problem we will be dealing with is to scrutinise the necessity and application of structured models for messages which are composed of atomic data items and complex types unequivocally defined in a semantic repository. The most comprehensive approach is the HL7 Reference Information Model (RIM), in which not only the data classes but also their relationships are distinctly defined. One issue we are trying to investigate is to what extent such comprehensive models, which take a tremendous amount of time to develop, will really be required in the future. Since the position in a message no longer has any importance for identifying information, less complex models might be sufficient. This is still an open issue.
Another issue to be investigated is the representation of tags. Should we consider and allow tag synonyms for a given semantic definition, such as tags in different languages or numeric tags? How can new tags be added in a fast, efficient and reliable way? Who is going to run the repository? There is no doubt that it has to be available on the Web, continuously accessible to each authorised partner. Which software should be used? The Basic Semantic Repository (BSR) is an interesting approach which already provides synonyms in different languages. Topic Maps or the inheritable information architecture also have to be considered.
The work is ongoing. We have decided to start with some elementary communication structures that are most frequently used in the data exchange between hospitals and physicians' offices: the admission and the discharge reports. This also requires defining and structuring data items from the medical record. Here the Patient Record Architecture developed in HL7 will also be taken into consideration.
In the presentation more detailed results of the ongoing work will be reported.
Bibliography
[1] ISIS European XML/EDI Project. Recommendations for standardisation of XML/EDI. www.tieke.fi/isis
[2] Alan Kotok (Ed.). White Paper on Global XML Repositories for XML/EDI. The XML/EDI Group, Febr. 1999. http://www.xmledi.com/repository/
[3] Newcomb, Steven R. XML Vocabularies; Opportunities for Efficiency and Reliability.