Using XML in a Component Based Mediation Architecture for the Integration of Applications
ABSTRACT
Allowing network-wide distributed and heterogeneous applications to exchange information and cooperate is a major need of current healthcare information systems. This calls for open and modular integration architectures. The main issues in their development are defining a flexible and robust federation model, developing interaction and communication facilities, and providing mechanisms that ensure semantic interoperability.
Within the SynEx European project, we developed a mediation architecture composed of generic and reusable software components that ease the construction of any integration platform. The Pilot and the Mediator Service components facilitate the execution of services and the communication with heterogeneous source systems. The Semantic Model component ensures the meaningful transformation of information. These components have been tested within the SynEx project to build integration platforms at different partners’ sites. The use of XML has enhanced the flexibility of the components and greatly facilitated their development.
1. Introduction
Healthcare is a domain where huge volumes of data are generated every day by hospitals, primary care surgeries, clinics and laboratories. Information technology has been used to develop an increasing number of systems and applications that generate, maintain and reuse this information. The single doctor-patient relationship is being replaced by one in which the patient is managed by a team of healthcare professionals, each specialized in one aspect of care. To improve the quality of care and reduce costs, such shared care depends critically on the ability to share information easily between care providers. Co-operation among medical information systems on a regional, national or even international scale is therefore strongly required. However, these systems, whether for information management or decision support, have been developed progressively and in isolation, and they do not communicate with each other. Redeveloping all the applications in a new, standard way would seem to solve the problem, but it is an expensive process that wastes resources. Integration is therefore the only flexible solution. Integrating existing systems into an open architecture provides users with uniform access to heterogeneous and distributed information, whatever its nature and location. It provides a means to collect information from disparate and heterogeneous systems and gives users uniform access to it by hiding the aspects of distribution and heterogeneity [1]. In other words, applications can then interact with users seamlessly as a federation of autonomous systems.
The process of integrating distributed applications remains a technical challenge. Applications that manage data and/or knowledge may run on multiple operating systems, use different types of inter-process communication mechanisms or different communication protocols.
Beyond these communication problems, medical information is by nature complex, combining data and knowledge. Data located in the distributed systems can be coded in different ways and expressed in different units (e.g., the attribute "sex" can be coded as an integer [0 | 1] or as a symbol ["M" | "F"]); it can be represented by different data models; and it can refer to different dictionaries, nomenclatures or classifications (e.g., ICD9-CM, ICD10, MeSH, SNOMED International, UMLS) [2]. Medical knowledge, such as the semantic relations among medical concepts, can also be represented in different ways. Part of this knowledge, such as the representation of uncertainty or imprecision, is often poorly structured and always mixed with the live data.
The design and development of an integration architecture must address these issues. Technical interoperability should be achieved by ensuring seamless access to, and communication with, the distributed systems through a uniform user interface. At the information level, syntactic, structural and semantic interoperability should be handled in a flexible and extensible manner [3].
To construct an integration platform, an integration architecture needs to be defined first. Generic components managing technical and semantic interoperability are also needed. These components facilitate interaction and communication among the different application components, allow new and legacy applications to be integrated dynamically, and ensure the semantic coherence of information coming from different systems.
Within the context of the SynEx European project [4], we developed the Pilot component, which guarantees the identification, synchronization and execution of existing services on the platform. We also developed the Mediator Service component, which provides a generic solution that simplifies communication and the meaningful transformation of information. A Semantic Model, which can be used within the Mediator Service, ensures semantic coherence.
In this paper, general background on integration architectures is presented first. We then describe a multi-agent integration architecture within which both the Pilot and the Mediator Service components have been tested on an integration platform for medical applications. The use of XML is then highlighted and discussed further. Future developments are outlined at the end of the paper.
2. Background
An integration architecture should support the integration of components that either manage information/data or provide services. Integration is considered at the interface, communication and data/knowledge levels. The integration of a component based on a given standard should be performed dynamically, complying with the “plug and play” paradigm. Existing projects provide experience in building integration architectures.
2.1. The evolution of architectures
At the interface and communication levels, the client/server paradigm has been extended, using network technology, to encompass distributed and heterogeneous clients and servers. An example is the HERMES integration model [5].
Three-tier middleware architectures supplanted the original two-tier client/server architectures because of their shareability and connectivity. Two paradigms of exchange co-exist. The first is exchange through objects, so-called "object-oriented middleware", with CORBA-compliant brokers as an example (used, e.g., in the OpenLabs project [6]). The second is exchange through messages, so-called "message-oriented middleware", which relies on a set of standards allowing different healthcare information systems to exchange messages that carry data (HL7, ASN.1, ASTM, DICOM, UN/EDIFACT, etc.), as used, e.g., in the HELIOS project [7]. In the HELIOS project, EDIFACT messages are transported through a software bus that connects the basic components to an object-oriented database management system in which the functions of the software components can be addressed [8]. XML, often termed "the language of e-commerce", is the latest technology for exchanging Web data. It has already been embraced by numerous business sectors for its efficiency in representing complex forms of data.
In the 1990s, mediation became a central issue of integration architectures. Wiederhold formalized this concept with the aim of facilitating the formulation of queries and the grouping of answers [9]. Different categories of mediators have been proposed to break a query into elementary sub-queries, then access and merge data from multiple databases. Various implementations of mediators can be found in the TSIMMIS [10], IDIMS [11] and InfoSleuth [12] projects. At the implementation level, agent technologies bring dynamicity to the different categories of mediators. A multi-agent approach supports the decomposition of a service into elementary functions. Specific agents are defined for each function and may use associated mediators. The main advantage of this approach is that the agents can be reused in different situations. Examples are the RETSINA [13] and InfoSleuth projects.
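As a rough illustration of what such mediators do, the Python sketch below breaks a query over an integrated view into one sub-query per source and merges the partial answers on a shared patient key. The source names, attributes, sample values and the capability table are hypothetical, and the sketch does not reproduce the design of TSIMMIS, IDIMS or InfoSleuth.

```python
from typing import Dict, List

# Hypothetical sources, each holding part of the integrated view, keyed by patient id.
SOURCES: Dict[str, Dict[str, Dict[str, str]]] = {
    "admin_db": {"p1": {"name": "Martin", "birth_date": "1950-02-01"}},
    "lab_db": {"p1": {"glucose": "5.4 mmol/L"}},
}
# Assumed capability table: which source can answer which attribute.
CAPABILITIES: Dict[str, str] = {
    "name": "admin_db",
    "birth_date": "admin_db",
    "glucose": "lab_db",
}


def decompose(attributes: List[str]) -> Dict[str, List[str]]:
    """Break a query over the integrated view into one sub-query per source."""
    sub_queries: Dict[str, List[str]] = {}
    for attr in attributes:
        sub_queries.setdefault(CAPABILITIES[attr], []).append(attr)
    return sub_queries


def run_sub_query(source: str, key: str, attributes: List[str]) -> Dict[str, str]:
    """Evaluate one elementary sub-query against a single source."""
    record = SOURCES[source].get(key, {})
    return {a: record[a] for a in attributes if a in record}


def mediate(key: str, attributes: List[str]) -> Dict[str, str]:
    """Decompose the query, run every sub-query, and merge the partial answers."""
    answer: Dict[str, str] = {}
    for source, attrs in decompose(attributes).items():
        answer.update(run_sub_query(source, key, attrs))
    return answer


if __name__ == "__main__":
    print(mediate("p1", ["name", "glucose"]))
```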
2.2. Data/Knowledge Integration
At the data/knowledge level, the integration task consists of giving users a homogeneous view of the heterogeneous information coming from disparate systems. Data warehouses have been used in the WHIPS project [14] for data migration. Meta-models such as the Object Exchange Model (OEM) of the TSIMMIS project [10] or the "Medical Concept Library" of the HELIOS project [8] facilitate the constitution of federated services. In the Synapses project [17], data sharing is based on a common data model, the Synapses Object Model (SynOM), which provides an aggregation mechanism to construct the Federated Healthcare Record (FHCR). It does not provide a single data repository bringing disparate data together, but provides a uniform view of the data at the federation layer. The underlying data models of the participating information systems can be mapped into this common data model.
Although middleware can guarantee that an exchange is successfully conveyed to the right component, it does not guarantee that the information sent is meaningful to its receiver [2]. One approach to achieving semantic interoperability is the use of "wrappers" between heterogeneous data sources and mediators. Each wrapper provides the mapping from a mediator’s integrated view to the view of its specific data source (e.g., TSIMMIS [15]). Another approach is the use of a domain-specific ontology on the integration platform. An ontology represents the vocabulary of the related domain and the relations among the terms of this vocabulary. An inference module is associated with the ontology to match concepts from different systems (e.g., InfoSleuth [12]). Finally, the GALEN Common Reference Model [19] is a well-known reusable, application-independent and language-independent ontology of medical concepts and terminology. It provides a set of building blocks and constraints from which concepts can be composed.
2.3. The SynEx project
The SynEx project, initiated in July 1998, federates a group of researchers devoted to the development of an open and standard integration platform in which both new and legacy medical applications can easily exchange information. The platform enables collaboration between distributed and heterogeneous healthcare records and services, and aims to provide access to Hospital Information Services, to remote sources of medical data and to medical knowledge.
Figure 1 shows the expected result of the project: information located in distributed systems can be integrated on a SynEx integration platform according to a common federation model; this information can then be exchanged among different SynEx servers through XML messages; and a user can retrieve medical information and access medical services through any SynEx server via the Internet, whatever their nature and location.
The SynEx integration architecture integrates the results of several European projects. Among these results, the Distributed Healthcare Environment (DHE) is a middleware implementation of the prENV 12967-1 European Standard Healthcare Information System Architecture (HISA) [16]. The Synapses FHCR model [17] has been adopted by the SynEx project as the federation model of the SynEx integration platform. The GALEN ontology is also available on the SynEx platform to provide terminology support.
To complete the integration tasks, the SynEx platform requires additional components: components that make all the existing components work together and perform their own functionality, that ease meaningful communication among all related systems (including middleware components and legacy applications), and that let users retrieve information easily through the platform.
The work presented in this paper has been done within this context and will complete the SynEx integration platform.
3. Methods
3.1. Architecture
The construction of an integration architecture requires the following developments:
- User interface: supplies users with uniform access to the heterogeneous information integrated by the system.
- Federation model: provides an integrated data structure on top of the different data models used in heterogeneous sources.
- Intelligent broker: directs a user query to one or several components that perform the corresponding services.
- Interpretation tools: manage the syntactic and semantic interoperability of different representations of information.
- Communication tools: perform the connection and communication functions among components by using their proprietary communication protocols or APIs.
Figure 2 illustrates the mediation architecture in which the generic components we developed have been tested: the Pilot acts as an intelligent broker, and the mediators generated by the Mediator Service act as both interpretation and communication tools. Web pages are used as user interfaces.
3.2. The SynEx federation model
The Synapses FHCR model has been chosen as the federation model on the SynEx platform. It is a further development of the European Standard ENV-12265 Electronic Health Care Records (EHCR). It defines an Object Model (the SynOM) and an Object Dictionary (the Synapses Object Dictionary, SynOD). Figure 3 shows the architecture of the model.
XML has the potential to become the platform-independent data exchange format of the future. Using XML to exchange data between heterogeneous systems provides support for hierarchically structured patient data, user-defined tags, and machine-understandable assertions for searching, reasoning about and analysing healthcare information such as federated healthcare record objects.
An XML Document Type Definition (DTD), called the SynEx Markup Language (SynExML) (see Figure 4), has been defined within the SynEx project to be used for inter-site exchange of FHCR information. It is based on the generic FHCR model.
3.3. The Pilot component
The interaction between a user and the integration platform is controlled by the Pilot component. It offers clients a set of predefined high-level services. Each service corresponds to a complex request that interrogates different information sources.
To perform this, a service is broken up into several elementary steps. Each step corresponds to a sub-query for a remote component. At runtime, the Pilot synchronizes the execution of different steps dynamically by using the description of the services and steps (see Figure 2) stored as an XML file (see Figure 5). This allows the Pilot to be highly generic and configurable for the construction of any integration platform.
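The sketch below shows one possible way to load such a service description with Python's standard `xml.etree.ElementTree` module (the Pilot itself is written in Java). The `<services>`, `<service>` and `<step>` elements, the `depends` attribute and the step names are hypothetical; they do not reproduce the actual file format shown in Figure 5.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical service description; element and attribute names are illustrative.
SERVICE_XML = """
<services>
  <service name="cardiology_consultations">
    <step id="authorize" source="authorization"/>
    <step id="identify" source="master_patient_index"/>
    <step id="lab_results" source="lab_db" depends="identify"/>
  </service>
</services>
"""


@dataclass
class Step:
    step_id: str
    source: str  # the remote component interrogated by this step's sub-query
    depends_on: List[str] = field(default_factory=list)


def load_services(xml_text: str) -> Dict[str, List[Step]]:
    """Parse the description into a map: service name -> its elementary steps."""
    root = ET.fromstring(xml_text)
    services: Dict[str, List[Step]] = {}
    for service in root.findall("service"):
        steps: List[Step] = []
        for elem in service.findall("step"):
            deps = [d for d in elem.get("depends", "").split(",") if d]
            steps.append(Step(elem.get("id"), elem.get("source"), deps))
        services[service.get("name")] = steps
    return services
```

Keeping the description in a file rather than in code is what lets the same Pilot be reconfigured for another platform by editing XML only.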
The Pilot is multi-agent based. Each step is executed by an agent, which gathers the parameters needed to complete the sub-query, sends them to the related data source and retrieves the response. When syntactic or semantic problems arise, a mediator can be associated with the agent (see Figure 2).
Services and steps are executed in a multithreaded mode. This allows several independent steps to run at the same time, thereby reducing the execution time of a service and increasing the efficiency of the Pilot.
Responses to clients are in XML format. The federation model of the integration platform can be presented to the Pilot through an XML DTD. The Pilot can thus be used on different integration platforms working with different federation models simply by supplying their DTDs (see Figure 2). The ability to federate information using the DTD of the federation model is supported by the mediators.
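The following Python sketch illustrates multithreaded step execution under an assumed dependency model: each step lists the steps whose results it needs, and all steps whose dependencies are satisfied are submitted together to a `concurrent.futures.ThreadPoolExecutor`. The `run_service` function, the dependency format and the agents are hypothetical; the scenario data loosely follow the Results section and assume that user authorization and patient identification are independent.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Agent = Callable[[Dict[str, object]], object]


def run_service(deps: Dict[str, List[str]], agents: Dict[str, Agent],
                max_workers: int = 4) -> Dict[str, object]:
    """Run the steps of one service in waves. Every step whose dependencies
    have completed is handed to its agent; agents of one wave run in parallel."""
    results: Dict[str, object] = {}
    pending = set(deps)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while pending:
            ready = [s for s in pending if all(d in results for d in deps[s])]
            if not ready:
                raise ValueError(f"unsatisfiable dependencies: {sorted(pending)}")
            # Each agent sees a snapshot of the results produced so far.
            futures = {s: pool.submit(agents[s], dict(results)) for s in ready}
            for step, future in futures.items():
                results[step] = future.result()
            pending.difference_update(ready)
    return results


# Hypothetical agents: authorization and identification run concurrently,
# the lab query waits for the patient identity.
SCENARIO_DEPS = {"authorize": [], "identify": [], "lab_results": ["identify"]}
SCENARIO_AGENTS: Dict[str, Agent] = {
    "authorize": lambda r: True,
    "identify": lambda r: "patient-42",
    "lab_results": lambda r: f"lab results for {r['identify']}",
}

if __name__ == "__main__":
    print(run_service(SCENARIO_DEPS, SCENARIO_AGENTS))
```

Running steps in waves keeps the sketch short, but a step still waits for the slowest step of the previous wave; a more efficient scheduler would start each step as soon as its own dependencies complete.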
3.4. The Mediator Service component
Mediators aim to make heterogeneous systems communicate. They can be used in a standalone mode or within a mediation architecture. In the architecture described here, mediators are used by the agents to bring various legacy applications and systems into an open environment and to ensure interoperability among the systems. This global interoperability has both technical and semantic aspects. Mediators therefore encapsulate the communication mechanism and can translate data structures and terminology.
The Mediator Service is a component that proposes:
- a generic model allowing specific mediators to be generated by specialization;
- ready-to-use specializations for well-defined situations, in order to simplify the development process.
As shown in Figure 6, a mediator connects a Source system and a Target system through an Interface. The Intermediate Representation (IR) serves as an intermediate repository during the transformation of information inside the mediator.
From the developer’s point of view, specializing an interface consists of combining three managers (see Figure 7): a specialized CommunicationManager, which encapsulates the communication API of the system; a specialized SyntaxManager, which implements the syntax encoding and decoding between the Intermediate Representation and the connected system; and a SemanticManager, which is responsible for the meaningful transformation of information between two systems that use different nomenclatures and data formats.
The Mediator Service provides ready-to-use specializations for well-defined communication protocols (e.g., sockets, file system, ODBC) and well-defined syntax formats (e.g., EDIFACT, XML). The SemanticManager can also be specialized by associating it with different mapping processes that establish correspondences between the terms used in the Source and Target systems. For one-to-one mappings, the SemanticManager simply uses a mapping table. For more complex cases, it uses the Semantic Model developed within the project.
As an example of the ready-to-use specializations, the XMLManager is a specialized SyntaxManager for XML-compliant systems.
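To make the composition of an interface concrete, here is a minimal Python sketch of the three managers and of a mediator that moves one message from a Source to a Target system through an attribute-value Intermediate Representation (the actual components are written in Java). The class and method names are modelled on the description above but are assumptions, as are the in-memory communication manager, the `key=value` syntax and the renaming SemanticManager used to exercise them; in this sketch, the source interface's SemanticManager performs the transformation.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple

# Intermediate Representation: a list of attribute-value pairs.
IR = List[Tuple[str, str]]


class CommunicationManager(ABC):
    """Encapsulates the communication API of one connected system."""
    @abstractmethod
    def receive(self) -> str: ...

    @abstractmethod
    def send(self, message: str) -> None: ...


class SyntaxManager(ABC):
    """Decodes system messages into the IR and encodes the IR into messages."""
    @abstractmethod
    def decode(self, message: str) -> IR: ...

    @abstractmethod
    def encode(self, ir: IR) -> str: ...


class SemanticManager(ABC):
    """Transforms the meaning of IR content (terms, codes, formats)."""
    @abstractmethod
    def transform(self, ir: IR) -> IR: ...


class Interface:
    """Binds one system to the mediator through three specialized managers."""
    def __init__(self, comm: CommunicationManager, syntax: SyntaxManager,
                 semantic: SemanticManager) -> None:
        self.comm, self.syntax, self.semantic = comm, syntax, semantic


class Mediator:
    """Connects a Source and a Target system through the IR."""
    def __init__(self, source: Interface, target: Interface) -> None:
        self.source, self.target = source, target

    def transfer(self) -> None:
        ir = self.source.syntax.decode(self.source.comm.receive())
        ir = self.source.semantic.transform(ir)
        self.target.comm.send(self.target.syntax.encode(ir))


# --- Hypothetical specializations, for illustration only ---
class MemoryComm(CommunicationManager):
    def __init__(self, inbox: Optional[List[str]] = None) -> None:
        self.inbox: List[str] = list(inbox or [])
        self.outbox: List[str] = []

    def receive(self) -> str:
        return self.inbox.pop(0)

    def send(self, message: str) -> None:
        self.outbox.append(message)


class KeyValueSyntax(SyntaxManager):
    """Syntax 'a=1;b=2' <-> [('a', '1'), ('b', '2')]."""
    def decode(self, message: str) -> IR:
        pairs: IR = []
        for part in message.split(";"):
            if part:
                attr, _, value = part.partition("=")
                pairs.append((attr, value))
        return pairs

    def encode(self, ir: IR) -> str:
        return ";".join(f"{attr}={value}" for attr, value in ir)


class RenameSemantic(SemanticManager):
    """One-to-one renaming of attributes; unknown attributes pass through."""
    def __init__(self, names: Dict[str, str]) -> None:
        self.names = names

    def transform(self, ir: IR) -> IR:
        return [(self.names.get(attr, attr), value) for attr, value in ir]


class IdentitySemantic(SemanticManager):
    def transform(self, ir: IR) -> IR:
        return ir
```

Replacing `MemoryComm` with, for instance, a socket- or file-based manager would change how messages travel without touching `Mediator`, which is the point of specializing each manager separately.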
- To decode, an XML parser parses the XML messages received from the system. The results are stored in the Intermediate Representation of the mediator.
- To encode, an XML message generator produces XML messages for a receiving system. Message generation is based on both the Intermediate Representation and the DTD of the receiving system, so the generated XML message contains the data of the sending system (provided by the IR) and the structure of the receiving system (provided by the DTD).
Mapping information is also needed to generate XML messages. This information, as well as the DTD of the receiving system, is stored outside the program. This approach makes the XMLManager generic, reusable and independent of both systems. To address a new XML-compliant system, users only need to supply the XMLManager with the DTD of the system and define the new mapping information. In our mediation architecture, this has been used to transform information from different sources into XML messages conforming to the SynExML DTD.
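A minimal sketch of the XMLManager's two directions follows, using Python's `xml.etree.ElementTree`. Because that module does not process DTDs, the sketch substitutes a hypothetical mapping from source element paths to target element paths for the receiver's DTD and the external mapping information; all element names are illustrative.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

IR = List[Tuple[str, str]]  # attribute-value pairs


def decode(xml_text: str) -> IR:
    """Parse a received message and store each leaf as (element path, text)."""
    pairs: IR = []

    def walk(elem: ET.Element, path: str) -> None:
        children = list(elem)
        if not children:
            pairs.append((path, (elem.text or "").strip()))
        for child in children:
            walk(child, f"{path}/{child.tag}")

    root = ET.fromstring(xml_text)
    walk(root, root.tag)
    return pairs


def encode(ir: IR, mapping: Dict[str, str], root_tag: str) -> str:
    """Generate a message for the receiver: the data come from the IR, the
    structure from `mapping` (source path -> target path under root_tag),
    which stands in here for the receiver's DTD plus mapping information."""
    root = ET.Element(root_tag)
    for attr, value in ir:
        target_path = mapping.get(attr)
        if target_path is None:
            continue  # no correspondence defined for this attribute
        node = root
        for tag in target_path.split("/"):
            existing = node.find(tag)
            node = existing if existing is not None else ET.SubElement(node, tag)
        node.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical mapping information, kept outside the generic code.
MAPPING = {
    "patient/id": "identification/pid",
    "patient/name": "identification/surname",
}

if __name__ == "__main__":
    ir = decode("<patient><id>42</id><name>Martin</name></patient>")
    print(encode(ir, MAPPING, "record"))
    # prints: <record><identification><pid>42</pid><surname>Martin</surname></identification></record>
```

Supporting a new receiver in this sketch means supplying a new mapping, not new code, which mirrors the independence argued for above.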
3.5. The Semantic Model
The Semantic Manager can be specialized with different mapping mechanisms. A mapping table is enough for a one-to-one mapping: it maps a term of one system to an intermediate term, then maps the latter to a term of another system. For more complex cases, such as mapping "Body Mass Index (BMI)" to "Weight/(Height*Height)", this method is not satisfactory, since a "mapping of terms" does not take into account the "sense" of the transformed information.
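The one-to-one case can be sketched as two table lookups through an intermediate term, reusing the "sex" coding example from the Introduction (which integer denotes which sex is assumed here). The `body_mass_index` function shows why BMI escapes this mechanism: the target value has to be computed from two source values, which no term-to-term table can express. All names are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical mapping tables; the 0/1 coding of the source system is assumed.
SOURCE_TO_INTERMEDIATE: Dict[str, str] = {"0": "male", "1": "female"}
INTERMEDIATE_TO_TARGET: Dict[str, str] = {"male": "M", "female": "F"}


def map_term(term: str) -> Optional[str]:
    """One-to-one mapping: source term -> intermediate term -> target term."""
    intermediate = SOURCE_TO_INTERMEDIATE.get(term)
    if intermediate is None:
        return None
    return INTERMEDIATE_TO_TARGET.get(intermediate)


def body_mass_index(weight_kg: float, height_m: float) -> float:
    """BMI = Weight / (Height * Height): a value computed from two inputs,
    which a term-to-term table cannot produce."""
    return weight_kg / (height_m * height_m)
```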
An ontology can be used during the mapping process. A source term is mapped first into one or several concepts of the ontology, then into one or several target terms. The mapping through a shared ontology is a "mapping of sense". It allows the Semantic Manager to realize the "one to one", "one to many", "many to many" correspondences. It thus allows the mediators to perform the transformation by taking into account the following complicated situations: same term, different meanings; different terms, same meaning; different terms, different meanings, but close.
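The sketch below illustrates mapping through a shared ontology under simple assumptions: each dictionary links its terms to sets of concept identifiers, a source term is mapped to the target terms with the same meaning and, failing that, to the target terms that each carry part of its meaning. The concepts and terms are hypothetical and are not taken from GALEN or from the SynEx Semantic Model.

```python
from typing import Dict, FrozenSet, List

Meaning = FrozenSet[str]  # a set of concept identifiers from the shared ontology

# Hypothetical dictionaries of two systems, linked to the same ontology.
SOURCE_DICT: Dict[str, Meaning] = {
    "cold": frozenset({"CommonCold"}),
    "hypertensive heart disease": frozenset({"Hypertension", "HeartDisease"}),
}
TARGET_DICT: Dict[str, Meaning] = {
    "cold": frozenset({"LowBodyTemperature"}),      # same term, different meaning
    "rhinopharyngitis": frozenset({"CommonCold"}),  # different term, same meaning
    "hypertension": frozenset({"Hypertension"}),
    "heart disease": frozenset({"HeartDisease"}),
}


def map_via_ontology(term: str) -> List[str]:
    """Map a source term to target terms by comparing concepts, not strings."""
    meaning = SOURCE_DICT.get(term)
    if meaning is None:
        return []
    same_meaning = sorted(t for t, m in TARGET_DICT.items() if m == meaning)
    if same_meaning:
        return same_meaning  # one-to-one (or several synonyms)
    # One-to-many: target terms each carrying part of the source meaning.
    return sorted(t for t, m in TARGET_DICT.items() if m <= meaning)
```

Here "cold" is not mapped to the target's "cold", because the two denote different concepts, and "hypertensive heart disease" maps to two target terms. A fuller implementation would check that the returned terms cover the whole meaning and would use the ontology's relations to find close concepts, which this sketch omits.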
An ontology represents the Concepts of a domain and the Relations among them; ontologies are formalized in different representation languages. To integrate different ontologies into the Semantic Manager of the mediator model, we developed a generic Semantic Model representing the structure of an ontology. The terms used by a system are organized in a Dictionary, and an ontology can be linked to several dictionaries. A Term can be linked to an existing Concept or described as a combination of several Concepts of the ontology. We also developed a Terminology Server whose API can be integrated into the Semantic Manager of the mediator to perform the mapping through the Semantic Model.
XML is used as the representation language for both ontologies and terminologies. Thus the Semantic Model can be instantiated for the construction of any ontology.
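The following sketch shows how a Semantic Model of this kind might be instantiated from XML with `xml.etree.ElementTree`, rejecting terms that refer to unknown concepts. The XML vocabulary (`<concept>`, `<relation>`, `<dictionary>`, `<term>`, `<ref>`) and the sample content, including the French source terms, are hypothetical and do not reproduce the SynEx format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# Hypothetical XML instance of the Semantic Model.
MODEL_XML = """<semanticModel>
  <ontology>
    <concept id="Weight"/>
    <concept id="Height"/>
    <concept id="BodyMassIndex"/>
    <concept id="Hypertension"/>
    <concept id="HeartDisease"/>
    <relation from="BodyMassIndex" type="computedFrom" to="Weight"/>
    <relation from="BodyMassIndex" type="computedFrom" to="Height"/>
  </ontology>
  <dictionary system="source">
    <term label="poids"><ref concept="Weight"/></term>
    <term label="hypertensive heart disease">
      <ref concept="Hypertension"/><ref concept="HeartDisease"/>
    </term>
  </dictionary>
</semanticModel>"""


@dataclass
class SemanticModel:
    concepts: Set[str] = field(default_factory=set)
    relations: List[Tuple[str, str, str]] = field(default_factory=list)  # (from, type, to)
    # dictionary name -> term label -> concepts the term is linked to or composed of
    dictionaries: Dict[str, Dict[str, List[str]]] = field(default_factory=dict)


def load_model(xml_text: str) -> SemanticModel:
    """Instantiate the model from XML, rejecting terms that cite unknown concepts."""
    root = ET.fromstring(xml_text)
    model = SemanticModel()
    for concept in root.iter("concept"):
        model.concepts.add(concept.get("id"))
    for rel in root.iter("relation"):
        model.relations.append((rel.get("from"), rel.get("type"), rel.get("to")))
    for dictionary in root.iter("dictionary"):
        terms: Dict[str, List[str]] = {}
        for term in dictionary.findall("term"):
            refs = [ref.get("concept") for ref in term.findall("ref")]
            unknown = [c for c in refs if c not in model.concepts]
            if unknown:
                raise ValueError(f"term {term.get('label')!r} cites unknown concepts {unknown}")
            terms[term.get("label")] = refs
        model.dictionaries[dictionary.get("system")] = terms
    return model
```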
4. Results
The Pilot and the Mediator Service as well as the Semantic Model components have been developed in Java. The IBM XML parser has been integrated into all these components for XML processing.
An integration platform has been developed with these components as a demonstrator at the Broussais University Hospital in Paris, where medical sources (patient records, services, knowledge bases, …) have been integrated progressively. The following scenario, shown in Figure 8, describes how the Pilot and the mediators operate on such a platform. In this scenario, the related data sources are the Authorization module, the DHE-based [16] Master Patient Index and the Lab Result database.
Through an HTML page, a physician sends his/her request to the platform to retrieve information concerning a patient’s cardiology consultations. The only given parameter is the name of the patient.
The Pilot receives this request and finds the corresponding high-level service. Several steps are then instantiated dynamically to perform the subtasks:
- Step 1, Identification of the user: an agent is instantiated to check the authorization status of this user for the requested service.
- Step 2, Identification of the patient: an agent is instantiated to retrieve the patient identity and other administrative information stored in the DHE-based Master Patient Index. A DHE-specific mediator is instantiated dynamically.
- Step 3, Retrieval of the relevant lab results of the patient: an agent is instantiated, with the patient identity found in Step 2, to interrogate the Lab Result database.
- Step 4, Construction of the response to the user: the results (administrative information from the DHE Master Patient Index, exam results from the Lab Result database) are federated into the federation model used by the Pilot, according to the SynExML DTD. This result (see Figure 9) is then sent to the client.
Such a result can also be easily exchanged with other SynEx sites, since it conforms to the common exchange format, the SynExML DTD. Figure 10 shows this result visualised and integrated on another SynEx site.
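Step 4 can be sketched as building a single XML document from the outputs of the earlier steps. The element names below (`federatedRecord`, `patient`, `labResults`, `result`) are hypothetical stand-ins, not the actual SynExML DTD, and the sample values are invented for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def federate(patient: Dict[str, str], lab_results: List[Dict[str, str]]) -> str:
    """Combine administrative data and lab results into one XML response.
    Element names are hypothetical stand-ins for the SynExML structure."""
    record = ET.Element("federatedRecord")
    identity = ET.SubElement(record, "patient")
    for name, value in patient.items():
        ET.SubElement(identity, name).text = value
    results = ET.SubElement(record, "labResults")
    for result in lab_results:
        ET.SubElement(results, "result", attrib=result)
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    print(federate({"id": "42", "name": "Martin"},
                   [{"test": "glucose", "value": "5.4", "unit": "mmol/L"}]))
```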
5. Discussion
The Pilot and the Mediator Service components provide generic tools to construct integration platforms. They are designed and implemented in a flexible and extensible way, with the objectives of minimizing development effort and fostering re-use. Moreover, both of them can be used as standalone applications outside an integration context.
On our integration platform, the Pilot component serves as an intelligent broker and mainly handles process-related aspects. The multi-agent approach gives it the ability to integrate new components dynamically. The mediators mainly manage communication aspects, based on the message-passing paradigm. The Mediator Service allows different mediators to be created flexibly through specialization.
The meaningful transformation is ensured at a semantic level in each mediator through the Semantic Model component. This last component allows the mapping among different vocabularies through a shared ontology.
XML technology is used in all these components to increase their independence from the federation model and from the integrated systems. Thanks to XML DTDs, the data retrieved from a data source can be structured in the mediator according to any target schema. Both the Pilot and the Semantic Model are generic and configurable through XML descriptions. Finally, multiple results coming from distributed systems can be federated into a single XML message according to a common federation model.
However, some limitations and possible improvements should be pointed out. The Intermediate Representation used inside the mediator model is a list of attribute-value pairs, which is neither expressive nor flexible enough. Potential solutions to this structural problem can be inspired by the results of other projects. In the HELIOS project, a meta-model was used as an internal representation to declare the semantic correspondences explicitly [18]. In both the TSIMMIS and IDIMS projects, a tagged object exchange model, OEM, in which objects have labels, types, values and an optional identifier, also acts as an intermediate representation [10]. We also argue that XML itself could be a candidate format.
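A small example suggests one reason why a flat attribute-value list can be limiting. With two lab results, the pairing of each test with its value survives only through list order, whereas an OEM-style object, in which each object has a label, a type, a value and an optional identifier, keeps each result as a separately identifiable sub-object. The data and the `OEMObject` class are hypothetical illustrations, not the TSIMMIS implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

# Flat IR: two lab results collapse into four pairs; which value belongs to
# which test is recoverable only from the order of the list.
flat_ir: List[Tuple[str, str]] = [
    ("test", "glucose"), ("value", "5.4"),
    ("test", "potassium"), ("value", "4.1"),
]


@dataclass
class OEMObject:
    """OEM-style object: label, type ('kind'), value and optional identifier.
    A complex object ('set') holds a list of sub-objects as its value."""
    label: str
    kind: str
    value: Union[str, List["OEMObject"]]
    oid: Optional[str] = None

    def children(self, label: str) -> List["OEMObject"]:
        """Return the direct sub-objects that carry the given label."""
        if isinstance(self.value, str):
            return []
        return [obj for obj in self.value if obj.label == label]


def atom(label: str, value: str) -> OEMObject:
    return OEMObject(label, "string", value)


lab_results = OEMObject("labResults", "set", [
    OEMObject("result", "set", [atom("test", "glucose"), atom("value", "5.4")], oid="r1"),
    OEMObject("result", "set", [atom("test", "potassium"), atom("value", "4.1")], oid="r2"),
])

# Each result is addressable on its own, so test and value stay together.
potassium = next(r for r in lab_results.children("result")
                 if r.children("test")[0].value == "potassium")
print(potassium.oid, potassium.children("value")[0].value)  # prints: r2 4.1
```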
Moreover, the SemanticManager may be extended by integrating the access to other terminology servers such as GALEN [19].
Beyond semantic interoperability, short-term improvements are planned. The next version of the Pilot component will include functionality to update the description of services and steps dynamically. The Mediator Service will be extended with additional tools, such as audit trail, monitoring, automation or configuration tools, to further ease the development process.
Acknowledgements
The authors would like to thank all the members of the SynEx Consortium for their contribution to this research. This work has received financial support from the Commission of the European Union (Telematics Applications Programme - project HC 4020).
Bibliography
Glossary
- BMI: Body Mass Index
- DHE: Distributed Healthcare Environment
- DTD: Document Type Definition
- EHCR: Electronic Health Care Records
- FHCR: Federated Healthcare Record
- HISA: Healthcare Information System Architecture
- OEM: Object Exchange Model
- SynExML: SynEx Markup Language
- SynOD: Synapses Object Dictionary
- SynOM: Synapses Object Model


