XML Europe 2001, 21-25 May 2001
Internationales Congress Centrum (ICC)
Berlin, Germany

Using XML in a Component Based Mediation Architecture for the Integration of Applications

Yigang Xu <xu@hegp.bhdc.jussieu.fr>
Patrice Degoulet

ABSTRACT

Allowing exchange of information and cooperation among network-wide distributed and heterogeneous applications is a major need of current healthcare information systems. It calls for the development of open and modular integration architectures. Major issues in this development include defining a flexible and robust federation model, developing interaction and communication facilities, and providing mechanisms that ensure semantic interoperability.

Within the SynEx European project, we developed a mediation architecture composed of generic and reusable software components to ease the construction of any integration platform. The Pilot and the Mediator Service components facilitate the execution of services and the communication with heterogeneous source systems. The Semantic Model component ensures the meaningful transformation of information. These components have been tested within the SynEx project to construct integration platforms on different partners’ sites. The use of XML has enhanced the flexibility of the components and largely facilitated the development.


1. Introduction

Healthcare is a domain where huge volumes of data are generated every day by hospitals, primary care surgeries, clinics and laboratories. Information technology has been used to develop an increasing number of systems and applications that generate, maintain and reuse this information. The single doctor-patient relationship is being replaced by one in which the patient is managed by a team of healthcare professionals, each specialized in one aspect of care. To improve the quality of care and reduce cost, such shared care depends critically on the ability to share information easily between care providers. Co-operation among medical information systems at a regional, national or even international scale is therefore strongly required. However, these systems, whether for information management or decision support, have been developed progressively and in isolation, and they do not communicate with each other. Redeveloping all the applications according to a new standard would solve the problem, but it is an expensive process that wastes resources. Integration becomes the only flexible solution. Integrating existing systems into an open architecture provides a means to collect information from disparate and heterogeneous systems and gives users uniform access to that information, whatever its nature and location, by hiding the aspects of distribution and heterogeneity [1]. In other words, applications can then interact with users as a federation of autonomous systems in a seamless way.

The process of integrating distributed applications remains a technical challenge. Applications that manage data and/or knowledge may run on multiple operating systems, use different types of inter-process communication mechanisms or different communication protocols.

Beyond the communication problems, medical information is by nature complex, combining data and knowledge. Data located in the distributed systems can be coded in different ways or expressed in different units (e.g., the value "sex" can be coded as an integer [0 | 1] or as a symbol ["M" | "F"]); it can be represented by different data models; it can also refer to different dictionaries, nomenclatures or classifications (e.g., ICD9-CM, ICD10, MeSH, SNOMED International, UMLS, etc.) [2]. Medical knowledge, such as the semantic relations among medical concepts, can also be represented in different ways. Some of it, such as the representation of uncertainty or imprecision, is often poorly structured and mixed with the live data.

The design and development of an integration architecture should address the above issues. Technical interoperability should be achieved by ensuring seamless access to, and communication with, the distributed systems through a uniform user interface. At the information level, syntactic, structural and semantic interoperability should be handled in a flexible and extensible manner [3].

In order to construct an integration platform, an integration architecture needs to be defined first. Generic components managing the technical and semantic interoperability are needed. These components will facilitate the interaction and communication among different application components, will allow new and legacy applications to be integrated dynamically, and will provide the semantic coherence of information coming from different systems.

Within the context of the SynEx European project [4], we developed the Pilot component, which guarantees the identification, synchronization and execution of existing services on the platform. We also developed the Mediator Service component, which provides a generic solution that simplifies the communication and meaningful transformation of information. A Semantic Model can be used within the Mediator Service to ensure semantic coherence.

In this paper, general background on integration architectures is presented first. We then describe a multi-agent integration architecture, within which both the Pilot and the Mediator Service components have been tested on an integration platform for medical applications. The use of XML is then highlighted and discussed further. Future developments are outlined at the end of the paper.

2. Background

An integration architecture should support the integration of components that either manage information/data or provide services. Integration is considered at the interface, communication and data/knowledge levels. The integration of a component based on a given standard should be performed dynamically, complying with the "plug and play" paradigm. Existing projects provide experience in building integration architectures.

2.1. The evolution of architectures

At the interface and communication levels, the client/server paradigm has been extended to embed the distributed and heterogeneous clients and servers using the network technology. An example is the HERMES integration model [5].

Three-tier middleware architectures supplanted the original two-tier client/server architectures because of their shareability and connectivity. Two paradigms of exchange co-exist. The first is exchange through objects, so-called "object oriented middleware", of which CORBA compliant brokers are an example (e.g., used in the OpenLabs project [6]). The second is exchange through messages, so-called "message oriented middleware", which relies on a set of standards that allow different healthcare information systems to exchange messages carrying data (HL7, ASN.1, ASTM, DICOM, UN/EDIFACT, etc.) (e.g., used in the HELIOS project [7]). In the HELIOS project, EDIFACT messages are transported through a software bus connecting the basic components with an object-oriented database management system where the functions of the software components can be addressed [8]. XML, often termed "the language of e-commerce", is the latest technology for exchanging Web data. It has already been embraced by numerous business sectors for its efficiency in representing complex forms of data.

In the 1990s, mediation became a central issue of integration architectures. Wiederhold formalized this concept with the idea of facilitating the formulation of queries and the grouping of answers [9]. Different categories of mediators have been proposed to break a query into elementary sub-queries, then access and merge data from multiple databases. Various implementations of mediators can be found in the TSIMMIS [10], IDIMS [11] and InfoSleuth [12] projects. At the implementation level, agent technologies bring dynamicity to different categories of mediators. A multi-agent approach supports the decomposition of a service into elementary functions. Specific agents are defined for each function and may use associated mediators. The main advantage of this approach is the reusability of the agents in different situations. Examples are the RETSINA [13] and InfoSleuth projects.

2.2. Data/Knowledge Integration

At the data/knowledge level, the integration task consists of giving users a homogeneous view of the heterogeneous information coming from disparate systems. Data warehouses have been used in the WHIPS project [14] for data migration purposes. Meta-models such as the Object Exchange Model (OEM) of the TSIMMIS project [10] or the "Medical Concept Library" of the HELIOS project [8] facilitate the constitution of federated services. In the Synapses project [17], the approach is to base the sharing of data on a common data model, the Synapses Object Model (SynOM), which provides an aggregation mechanism to construct the Federated Healthcare Record (FHCR). It does not provide a single data repository bringing disparate data together, but provides a uniform view of the data at the federation layer. The underlying data models of the participating information systems can be mapped into the common data model.

A middleware can guarantee that the exchange is successfully conveyed to the right component, but it does not guarantee that the information sent is meaningful for its receiver [2]. To achieve semantic interoperability, one approach is the use of "wrappers" between heterogeneous data sources and mediators. Each wrapper provides the mapping from a mediator’s integrated view to its specific data source view (e.g., TSIMMIS [15]). Another approach is the use of a domain specific ontology on the integration platform. An ontology represents the vocabulary of the related domain and the relations among the terms of this vocabulary. An inference module is associated with the ontology to match the concepts of different systems (e.g., InfoSleuth [12]). Finally, the GALEN Common Reference Model [19] is a well known reusable, application independent and language independent ontology of medical concepts and terminology. It provides a set of building blocks and constraints, from which concepts can be composed.

2.3. The SynEx project

The SynEx project, initiated in July 1998, federates a group of researchers devoted to the development of an open and standard integration platform in which both new and legacy medical applications can easily exchange information. The platform makes possible the collaboration of distributed and heterogeneous healthcare records and services, and aims at providing access to Hospital Information Services, to remote sources of medical data and to medical knowledge.

The following schema (Figure 1) shows the expected result of the project: information located in distributed systems can be integrated on a SynEx integration platform according to a common federation model; this information can then be exchanged among different SynEx servers through XML messages; a user can retrieve medical information and access medical services through any SynEx server via the Internet, whatever their nature and location.

Figure 1: The SynEx project

The SynEx integration architecture integrates results of several European projects. Among these components, the Distributed Healthcare Environment (DHE) is a middleware implementation of the prENV 12967-1 European Standard Healthcare Information System Architecture (HISA) [16]. The Synapses FHCR model [17] is carried over into the SynEx project and is used as the federation model on the SynEx integration platform. The GALEN ontology is also available on the SynEx platform to provide terminology support.

To make all the components work together and perform their own functionality, to ease the meaningful communication among all related systems including middleware components and legacy applications, to make the user easily retrieve information through this platform, the SynEx platform requires additional components to complete the integration tasks.

The work presented in this paper has been done within this context and will complete the SynEx integration platform.

3. Methods

3.1. Architecture

The construction of an integration architecture requires the following developments:

The following schema (Figure 2) illustrates the mediation architecture where the generic components that we have developed are tested: the Pilot as an intelligent broker and the mediators generated by the Mediator Service as both the interpretation tool and the communication tool. Web pages are used as user interfaces.

Figure 2: A multi-agents mediation architecture

3.2. The SynEx federation model

The Synapses FHCR model has been chosen as the federation model on the SynEx platform. It is a further development of the European Standard ENV-12265 Electronic Health Care Records (EHCR). It defines an Object Model (the SynOM) and an Object Dictionary (the Synapses Object Dictionary, SynOD). Figure 3 below shows the architecture of the model.

Figure 3: The Synapses FHCR Object Model

XML has the power to become the independent data exchange format of the future. The use of XML to exchange data between heterogeneous systems provides support for hierarchically structured patient data, user defined tags and machine-understandable assertions for searching, reasoning and analysing healthcare information like federated healthcare record objects.

An XML Document Type Definition (DTD), called the SynEx Markup Language (SynExML) (see Figure 4), has been defined within the SynEx project to be used for inter-site exchange of FHCR information. It is based on the generic FHCR model.

Figure 4: The SynExML DTD

3.3. The Pilot component

The interaction between a user and the integration platform is controlled by the Pilot component. It proposes to a client a set of predefined high level services. Each service corresponds to a complex request that interrogates different information sources.

To perform this, a service is broken up into several elementary steps. Each step corresponds to a sub-query for a remote component. At runtime, the Pilot synchronizes the execution of different steps dynamically by using the description of the services and steps (see Figure 2) stored as an XML file (see Figure 5). This allows the Pilot to be highly generic and configurable for the construction of any integration platform.

Figure 5: Example of a configuration file used by the Pilot
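To make the idea of an XML-described service concrete, the following is a minimal sketch of how a broker could read the list of steps for one service. The element and attribute names (services, service, step, id, source, after) and the service name are assumptions made for illustration; they are not the actual format of the Pilot configuration file. The sketch uses the standard JAXP DOM parser rather than the parser used in the project.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ServiceDescriptionSketch {
    // Hypothetical service description; all names below are assumptions.
    static final String CONFIG =
        "<services>"
      + "<service name='cardiologyConsultations'>"
      + "<step id='authorize' source='Authorization'/>"
      + "<step id='findPatient' source='MasterPatientIndex' after='authorize'/>"
      + "<step id='labResults' source='LabResults' after='findPatient'/>"
      + "</service>"
      + "</services>";

    // Returns the step identifiers of the named service, in document order.
    static List<String> stepIds(String xml, String serviceName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse service description", e);
        }
        List<String> ids = new ArrayList<>();
        NodeList services = doc.getElementsByTagName("service");
        for (int i = 0; i < services.getLength(); i++) {
            Element service = (Element) services.item(i);
            if (!serviceName.equals(service.getAttribute("name"))) {
                continue;
            }
            NodeList steps = service.getElementsByTagName("step");
            for (int j = 0; j < steps.getLength(); j++) {
                ids.add(((Element) steps.item(j)).getAttribute("id"));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(stepIds(CONFIG, "cardiologyConsultations"));
    }
}
```

Because the description lives outside the program, adding a service or reordering its steps in such a scheme only requires editing the XML file, which is the property the Pilot relies on.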

The Pilot is multi-agent based. Each step is executed by an agent that gathers the parameters needed to complete its sub-query, sends them to the related data source and retrieves the response. In case of syntactic or semantic problems, a mediator can be associated with an agent (see Figure 2).

The services and steps are performed in a multi-threading mode. This allows several independent steps to be executed at the same time, which reduces the execution time of a service and increases the efficiency of the Pilot.
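The concurrent execution of independent steps can be sketched with the standard java.util.concurrent package. This is only an illustration under assumptions: the class and method names are hypothetical, each step is reduced to a Callable returning a string, and nothing here reflects the Pilot's actual threading code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelStepsSketch {
    // Runs steps that do not depend on each other concurrently and
    // returns their results in the order the steps were given.
    static List<String> runIndependentSteps(List<Callable<String>> steps) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, steps.size()));
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every step has completed.
            for (Future<String> future : pool.invokeAll(steps)) {
                results.add(future.get());
            }
            return results;
        } catch (Exception e) {
            throw new IllegalStateException("a step failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> steps = new ArrayList<>();
        steps.add(() -> "consultations (stub)"); // placeholder sub-query
        steps.add(() -> "lab results (stub)");   // placeholder sub-query
        System.out.println(runIndependentSteps(steps));
    }
}
```

With such a scheme, the duration of a group of independent steps approaches that of the slowest step rather than the sum of all steps, provided the remote sources can be queried in parallel.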

The response to the clients is in XML format. The federation model on the integration platform can be presented to the Pilot through an XML DTD. Thus one can use the Pilot on different integration platforms working with different federation models simply by giving it their DTDs (see Figure 2). The ability to federate information by using the DTD of the federation model is supported by the use of mediators.

3.4. The Mediator Service component

Mediators aim to make heterogeneous systems communicate. They can be used in a "standalone" mode or in a mediation architecture. In the underlying architecture, mediators are used by the agents to bring various legacy applications or systems into an open environment and to ensure interoperability among the systems. This global interoperability has technical and semantic aspects. Mediators thus encapsulate the communication mechanism and can translate data structures and terminology.

The Mediator Service is a component that proposes:

As shown in Figure 6, a mediator connects a Source system and a Target system through an Interface. The Intermediate Representation (IR) serves as an intermediate repository during the transformation of information inside the mediator.

Figure 6: The generic model of a mediator

From the developer’s point of view, specializing an interface consists of associating three managers (see Figure 7): a specialized CommunicationManager, which encapsulates the communication API of the system; a specialized SyntaxManager, which implements the syntax encoding and decoding processes between the Intermediate Representation and the connected system; and a SemanticManager, which is responsible for the meaningful transformation of information between two systems that use different nomenclatures and data formats.

Figure 7: The inside view of a Mediator
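The division of work among the three managers can be sketched as follows. Only the names CommunicationManager, SyntaxManager and SemanticManager come from the text; the method signatures, the "name=value;name=value" syntax, the use of a map as Intermediate Representation, and the recoding of "sex" from 0/1 to M/F (the direction of that correspondence is arbitrary here) are all assumptions for illustration. The sketch is also simplified to a one-way transfer from source to target.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MediatorSketch {
    // Wraps the communication API of a system (hypothetical signature).
    interface CommunicationManager { String receive(); }

    // Converts between a system's syntax and the Intermediate Representation.
    interface SyntaxManager {
        Map<String, String> decode(String message);
        String encode(Map<String, String> ir);
    }

    // Transforms the meaning of the values held in the Intermediate Representation.
    interface SemanticManager { Map<String, String> translate(Map<String, String> ir); }

    // A toy "name=value;name=value" syntax, invented for this sketch.
    static final SyntaxManager KEY_VALUE = new SyntaxManager() {
        public Map<String, String> decode(String message) {
            Map<String, String> ir = new LinkedHashMap<>();
            for (String pair : message.split(";")) {
                String[] nameAndValue = pair.split("=", 2);
                ir.put(nameAndValue[0], nameAndValue[1]);
            }
            return ir;
        }
        public String encode(Map<String, String> ir) {
            StringBuilder out = new StringBuilder();
            for (Map.Entry<String, String> entry : ir.entrySet()) {
                if (out.length() > 0) out.append(';');
                out.append(entry.getKey()).append('=').append(entry.getValue());
            }
            return out.toString();
        }
    };

    // Recodes "sex" from an integer code to a symbol (0 -> M, 1 -> F is an assumption).
    static final SemanticManager SEX_RECODER = ir -> {
        Map<String, String> out = new LinkedHashMap<>(ir);
        String code = out.get("sex");
        if ("0".equals(code)) out.put("sex", "M");
        else if ("1".equals(code)) out.put("sex", "F");
        return out;
    };

    // Source message -> IR -> semantic transformation -> target message.
    static String mediate(CommunicationManager source, SyntaxManager sourceSyntax,
                          SemanticManager semantics, SyntaxManager targetSyntax) {
        Map<String, String> ir = sourceSyntax.decode(source.receive());
        return targetSyntax.encode(semantics.translate(ir));
    }

    public static void main(String[] args) {
        System.out.println(mediate(() -> "name=Doe;sex=1", KEY_VALUE, SEX_RECODER, KEY_VALUE));
    }
}
```

Keeping the three concerns behind separate interfaces means that, in such a design, supporting a new protocol or syntax only requires a new specialization of one manager while the others are reused.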

Some ready-to-use specializations are provided by the Mediator Service for well-defined communication protocols (e.g., sockets, file system, ODBC) and well-defined syntax formats (e.g., EDIFACT, XML). The SemanticManager can also be specialized by associating it with different "mapping" processes that establish the correspondence between the terms used in the Source and the Target systems. For one to one mappings, the SemanticManager simply uses a mapping table. For more complex cases, it uses the Semantic Model developed within the project.

As an example of the ready-to-use specializations, the XMLManager is a specialized SyntaxManager for XML-compliant systems.

Mapping information is also needed during the generation of XML messages. This information, as well as the DTD of the receiver system, is stored externally to the program. This approach makes the XMLManager generic, reusable and independent of both systems. To address a new XML-compliant system, users only need to send the DTD of the system to the XMLManager and to define the new mapping information. In the underlying mediation architecture, this has been used to transform different source information into XML messages according to the SynExML DTD.
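The role of externally stored mapping information can be illustrated with a small sketch that turns an attribute-value Intermediate Representation into an XML document. The class name, the attribute names and the element names (record, patientName) are hypothetical and are not taken from the SynExML DTD; the mapping is passed in as a map, standing in for information that would be loaded from outside the program. The sketch does not validate the result against a DTD.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class XmlManagerSketch {
    // Builds an XML document from attribute-value pairs. The mapping gives,
    // for each source attribute, the name of the target element.
    static Document encode(Map<String, String> ir, Map<String, String> mapping, String rootName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement(rootName);
        doc.appendChild(root);
        for (Map.Entry<String, String> entry : ir.entrySet()) {
            String elementName = mapping.get(entry.getKey());
            if (elementName == null) {
                continue; // attribute has no counterpart in the target format
            }
            Element child = doc.createElement(elementName);
            child.setTextContent(entry.getValue());
            root.appendChild(child);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> ir = new LinkedHashMap<>();
        ir.put("nom", "Doe");        // hypothetical source attribute
        ir.put("internal_id", "42"); // not mapped, so it is dropped
        Map<String, String> mapping = new LinkedHashMap<>();
        mapping.put("nom", "patientName"); // hypothetical target element
        Document doc = encode(ir, mapping, "record");
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            System.out.println(children.item(i).getNodeName() + " = "
                    + children.item(i).getTextContent());
        }
    }
}
```

Since neither the source attribute names nor the target element names appear in the encoding logic, addressing another XML-compliant system in this sketch only means supplying a different mapping and root element.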

3.5. The Semantic Model

The Semantic Manager can be specialized by using different mapping mechanisms. A mapping table is enough for a one to one mapping. It makes a term of one system correspond to an intermediate term, and then makes the latter correspond to a term of another system. For a more complex example, such as mapping "Body Mass Index (BMI)" into "Weight/(Height*Height)", this method is not satisfactory, since a "mapping of terms" does not take into account the "sense" of the transformed information.
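The two-hop lookup through an intermediate term, and the reason it falls short for BMI, can be sketched as follows. The class name and the sample terms are hypothetical. The BMI helper only restates the formula quoted above, assuming weight in kilograms and height in metres.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class MappingTableSketch {
    private final Map<String, String> sourceToIntermediate = new HashMap<>();
    private final Map<String, String> intermediateToTarget = new HashMap<>();

    // Registers a one to one correspondence through an intermediate term.
    void link(String sourceTerm, String intermediateTerm, String targetTerm) {
        sourceToIntermediate.put(sourceTerm, intermediateTerm);
        intermediateToTarget.put(intermediateTerm, targetTerm);
    }

    // Source term -> intermediate term -> target term; empty if either hop is missing.
    Optional<String> map(String sourceTerm) {
        return Optional.ofNullable(sourceToIntermediate.get(sourceTerm))
                       .map(intermediateToTarget::get);
    }

    // What a term table cannot express: BMI is derived from two other values.
    static double bodyMassIndex(double weightKg, double heightM) {
        return weightKg / (heightM * heightM);
    }

    public static void main(String[] args) {
        MappingTableSketch table = new MappingTableSketch();
        table.link("poids", "body_weight", "Weight"); // hypothetical terms
        System.out.println(table.map("poids"));       // a target term is found
        System.out.println(table.map("BMI"));         // no single target term exists
        System.out.println(bodyMassIndex(70.0, 1.75));
    }
}
```

The table answers "which word corresponds to this word", whereas the BMI case needs to know that one term is composed of two others, which is what motivates the ontology-based mapping.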

An ontology can be used during the mapping process. A source term is mapped first into one or several concepts of the ontology, then into one or several target terms. The mapping through a shared ontology is a "mapping of sense". It allows the Semantic Manager to realize the "one to one", "one to many", "many to many" correspondences. It thus allows the mediators to perform the transformation by taking into account the following complicated situations: same term, different meanings; different terms, same meaning; different terms, different meanings, but close.

An ontology represents the Concepts of a domain and their Relations, which can be formalized in different representation languages. To integrate different ontologies into the Semantic Manager of the mediator model, we developed a generic Semantic Model representing the structure of an ontology. Terms used by a system are then organized in a Dictionary. An ontology can be linked to several dictionaries. A Term can be linked to an existing Concept or be described as a combination of several Concepts of the ontology. We also developed a Terminology Server whose API can be integrated by the Semantic Manager of the mediator to process the mapping through the Semantic Model.

XML is used as the representation language for both ontologies and terminologies. Thus the Semantic Model can be instantiated for the construction of any ontology.
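A "mapping of sense" through shared concepts can be sketched in a few lines. Dictionary, Term and Concept are notions from the text, but the class layout, the method names, the sample concept names and the matching rule (a target term is selected when all of its concepts are among those of the source term) are assumptions chosen for illustration; they are not the Terminology Server API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SemanticModelSketch {
    // A Dictionary links each Term of one system to one or several
    // Concepts of the shared ontology (concepts are plain strings here).
    static final class Dictionary {
        private final Map<String, Set<String>> conceptsByTerm = new HashMap<>();

        void describe(String term, String... concepts) {
            conceptsByTerm.put(term, new HashSet<>(Arrays.asList(concepts)));
        }
    }

    // Source term -> concepts -> target terms whose concepts are all covered.
    static Set<String> map(String sourceTerm, Dictionary source, Dictionary target) {
        Set<String> concepts =
                source.conceptsByTerm.getOrDefault(sourceTerm, Collections.emptySet());
        Set<String> targetTerms = new TreeSet<>();
        for (Map.Entry<String, Set<String>> entry : target.conceptsByTerm.entrySet()) {
            if (!entry.getValue().isEmpty() && concepts.containsAll(entry.getValue())) {
                targetTerms.add(entry.getKey());
            }
        }
        return targetTerms;
    }

    public static void main(String[] args) {
        Dictionary source = new Dictionary();
        source.describe("BMI", "BodyWeight", "BodyHeight"); // hypothetical concepts
        Dictionary target = new Dictionary();
        target.describe("Weight", "BodyWeight");
        target.describe("Height", "BodyHeight");
        System.out.println(map("BMI", source, target)); // a one to many correspondence
    }
}
```

This sketch only identifies which target terms are involved; expressing how they combine (the division and squaring in the BMI formula) would require the relations of the ontology, which are left out here.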

4. Results

The Pilot and the Mediator Service as well as the Semantic Model components have been developed in Java. The IBM XML parser has been integrated in all these components for the XML processing.

An integration platform has been developed with these components as a demonstration in the Broussais University Hospital of Paris, where medical sources (patient records, services, knowledge bases, …) have been integrated progressively. The following scenario, shown in Figure 8, describes the use of such a platform through the Pilot and the mediators. In this scenario, the related data sources are the Authorization module, the DHE-based [16] Master Patient Index and the Lab Result database.

Figure 8: Example of a scenario

Through an HTML page, a physician sends his/her request to the platform to retrieve information concerning a patient’s cardiology consultations. The only given parameter is the name of the patient.

The Pilot receives this request and finds a corresponding high-level service. Several steps are then instantiated dynamically to perform the sub tasks:

Figure 9: Example of federated information

Such a result can also be easily exchanged with other SynEx sites, since it conforms to the common exchange format, the SynExML DTD. Figure 10 below shows this result visualised and integrated on another SynEx site.

Figure 10: Integration of Broussais patient record on another SynEx site

5. Discussion

The Pilot and the Mediator Service components provide generic tools to construct integration platforms. They are designed and implemented in a flexible and extensible way, with the objectives of minimizing development effort and fostering re-use. Moreover, both of them can be used as standalone applications outside an integration context.

On our integration platform, the Pilot component serves as an intelligent broker. It mainly handles the aspects of process. The multi-agents approach gives it the ability to dynamically integrate new components. The mediators mainly manage the aspects of communication based on the message passing paradigm. The Mediator Service allows different mediators to be created in a flexible manner through specialization.

The meaningful transformation is ensured at a semantic level in each mediator through the Semantic Model component. This last component allows the mapping among different vocabularies through a shared ontology.

XML technology is used in all these components to enhance their independence from the federation model and from the integrated systems. The data retrieved from a data source can be structured in the Mediator according to any target schema thanks to the use of an XML DTD. Both the Pilot and the Semantic Model are generic and configurable through XML descriptions. Finally, multiple results coming from distributed systems can be federated into an XML message according to a common federation model.

However, some limitations and possible improvements should be pointed out. The Intermediate Representation used inside the Mediator model is a list of attribute-value pairs, which is neither sufficient nor flexible enough. Potential solutions to this structural problem can be inspired by the results of other projects. In the HELIOS project, a meta-model was used as an internal representation to explicitly declare the semantic correspondences [18]. In both the TSIMMIS and IDIMS projects, a tagged Object Exchange Model (OEM), in which objects have labels, types, values and an optional identifier, also acts as an intermediate representation [10]. We also argue that XML itself could be a candidate format.

Moreover, the SemanticManager may be extended by integrating the access to other terminology servers such as GALEN [19].

Besides semantic interoperability, short term improvements are planned. The next version of the Pilot component will integrate functionality to dynamically update the description of services and steps. The Mediator Service will be extended with additional tools, such as audit trail, monitoring, automation and configuration tools, to further ease the development process.

Acknowledgements

The authors would like to thank all the members of the SynEx Consortium for their contribution to this research. This work has received financial support from the Commission of the European Union (Telematics Applications Programme - project HC 4020).

Bibliography

[1] Ben-Shaul I, Gish JW, Robinson W, An Integrated Network Component Architecture, IEEE Software, September/October 1998: 79-87.
[2] Degoulet P, Sauquet D, Jaulent MC, Zapletal E, Lavril M, Rationale and Design Considerations for a Semantic Mediator in Health Information Systems, Meth Inform Med, 1998; 37: 518-526.
[3] Manola F, Interoperability issues in Large-Scale Distributed Object Systems, ACM Computing Surveys, Vol. 27, No. 2, June 1995: 268-270.
[4] The SynEx project: http://www.gesi.it/synex.html.
[5] van Mulligen EM, A Flexible Approach to Client-Server Computing, MEDINFO 95 Proc, 1995: 195-199.
[6] Grimson W, Grimson J, Groth T, O’Moore R, Wade V, The Openlabs Approach to Clinical Laboratory Computing, MEDINFO 95 Proc, 1995: 372-376.
[7] Jean FC, Engelmann U, Sauquet D, Lavril M, Schröter A, Degoulet P, The HELIOS Medical Connection Services, Comput. Methods Programs Biomed, 1994; 45, Suppl. : S117-S126.
[8] Lavril M, Doré L, Zapletal E, Jean FC, Degoulet P, A Reuse Oriented Development Database: The HELIOS Object Information System, Comput. Methods Programs Biomed, 1994; 45, Suppl. : S35-S45.
[9] Wiederhold G, Mediators in the architecture of future information systems, IEEE Computer, 1992; 25 (3): 38-49.
[10] Papakonstantinou Y, Garcia-Molina H, Object Exchange Across Heterogeneous Information Sources, in: Engineering Conference, Taipei, Taiwan, March, 1995.
[11] Panchapagesan B, Hui Joshua, Wiederhold G, Erickson S, Dean L, The INEEL Data Integration Mediation System, May 1997: http://www-db.stanford.edu/LIC/ERIS/erisii.htm.
[12] Fowler J, Perry B, Nodine MH, Agent-Based Semantic Interoperability in InfoSleuth, SIGMOD Record, 1999; 28-1: 60-67.
[13] Sycara K, Decker K, Pannu A, Williamson M, Zeng D, Distributed Intelligent Agents, IEEE Expert, Dec 1996: http://www.cs.cmu.edu/~softagents/papers/ieee-agents96.pdf.
[14] Labio WJ, Zhuge Y, Wiener JL, Gupta H, Garcia-Molina H, Widom J, The WHIPS Prototype for Data Warehouse Creation and Maintenance, ACM SIGMOD Record, 1997: 557-559.
[15] Hammer J, Garcia-Molina H, Nestorov S, Yerneni R, Breunig M, Vassalos, Template-Based Wrappers in the TSIMMIS System, http://WWW-DB.Stanford.EDU/tsimmis/publications.html.
[16] Ferrara FM. The middleware-based architectural approach for opening and evolving healthcare information systems, Medical Informatics Europe ’96, IOS Press, 1996: 264-270.
[17] Hurlen P, Skifjeld K, Design and functional specification of the Synapses Federated Healthcare Record Server, Medical Informatics Europe ’97, 1997: 334-338.
[18] Doré L, Lavril M, Jean FC, Degoulet P, An Object Oriented Computer-Based Patient Record Reference Model, Proc Annu Symp Comput Appl Med Care, 1995: 377-381.
[19] Galen: http://www.cs.man.ac.uk/mig/galen.

Glossary

BMI

Body Mass Index

DHE

Distributed Healthcare Environment

DTD

Document Type Definition

EHCR

Electronic Health Care Records

FHCR

Federated Healthcare Record

HISA

Healthcare Information System Architecture

OEM

Object Exchange Model

SynExML

SynEx Markup Language

SynOD

Synapses Object Dictionary

SynOM

Synapses Object Model

Biography

Yigang Xu
Medical Informatics Department, Broussais University Hospital
Paris
France
Email: xu@hegp.bhdc.jussieu.fr

Yigang Xu - She is now a PhD student in Medical Informatics at the University of Paris VI. She has worked in the Medical Informatics Department (MID) of Broussais University Hospital in Paris since 1996. Her research interests are mainly middleware technology, mediation architectures and semantic interoperability.

Patrice Degoulet
Medical Informatics Department, Broussais University Hospital
Paris
France

Patrice Degoulet - He received his M.D. from the Broussais-Hotel-Dieu Hospital in Paris, France in 1977 and his Ph.D. degree from the University of Paris in 1984. He is currently Professor at the Broussais-Hotel-Dieu School of Medicine and head of the MID of Broussais University Hospital in Paris. His current research interests are conceptual modelling in medicine, data and knowledge base management, hospital information systems and medical record management. He is in charge of the Integration and Communication System of the Georges Pompidou University Hospital (Paris).