Analysing XML health records
Andrew Roberts FRCS DM


Abstract
What started out as a quest for a way of auditing free text gradually turned into a full electronic patient record system. The XML information system in Oswestry has been live for the past two years and contains half a million documents covering 50,000 patients. As elsewhere in the world, the British National Health Service is under tremendous pressure from rising patient expectations and the inflationary pressures of technological medical advances. Failures of the system are often in the news, whether these are related to human fallibility, criminal activity or simply inappropriate resource allocation. Medical data structured in XML allows examination of clinical activity with a power and scope never previously possible. Whilst a static data set can be indexed and searched in context using one of the many available SGML/XML-aware systems, dynamic data presents a greater challenge.

Contents
  1. Introduction
  2. Static or dynamic data
  3. Source granularity
  4. Examples in practice
  5. Lessons from system failures
  6. Lessons from criminal activity
  7. Lessons from resource mis-allocation
  8. Conclusions
  9. Acknowledgements
  10. Bibliography

Introduction
The separation of content from rendition has many advantages which do not need reiteration here. In 1992, when we first started to develop the precursor to the Oswestry system, the major advantage was that SGML would allow us to perform complex searches of large quantities of information to produce efficient result sets. Gradually we were diverted from our original mission as clinicians saw electronic delivery of their data and said "yes, that's the way it should look". Firing small XML documents into a browser through the hospital network gives apparently instantaneous access to the records, which makes it possible to transfer some medical activities to the electronic medium.
As the document repository became a sizeable mass of clinical information we again started to examine the strengths of XML with respect to searching. When presenting the Oswestry system to groups of information technologists, a favourite pastime is to show the input tools and the browser with all its tricks and frills and then ask "could you do this with HTML?" Usually there are some nervous looks and then one hand, followed by a few more, goes up. The answer up to that point is that, to an extent, it is possible. Where, then, are the strengths of XML which represent unique advantages, making this technology irresistible for system implementers?
In Oswestry we have a number of legacy systems which we are not able to replace with open, HL7-aware products. XML's ability to capture the structured output of these systems and pass it to a repository, or even to another non-compliant system, is unique in its flexibility. With XML around, who would ever try EDI (Extremely Difficult Interfacing) as a method of linking two disparate systems? The specification for EDI runs to 2,7000 pages whilst that for XML runs to 26! The ability to transform data is not generally appropriate for health care text data but can be useful for laboratory results, where different units and scales need reconciling, and for drug doses, which may need turning into milligrams of drug per kilogram of bodyweight.
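The kind of dose transformation mentioned above can be sketched in a few lines. This is a minimal illustration only, not the Oswestry implementation: the `prescription`, `dose` and `weight` element names, the `unit` attribute and the conversion table are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical conversion table: factor that turns each unit into milligrams.
TO_MILLIGRAMS = {"g": 1000.0, "mg": 1.0, "microgram": 0.001}

# Hypothetical markup; the real DTDs are not shown in this paper.
SAMPLE = (
    '<prescription>'
    '<dose unit="g">1.5</dose>'
    '<weight unit="kg">60</weight>'
    '</prescription>'
)

def dose_mg_per_kg(prescription_xml):
    """Return the dose as milligrams of drug per kilogram of bodyweight."""
    root = ET.fromstring(prescription_xml)
    dose = root.find("dose")
    weight = root.find("weight")
    milligrams = float(dose.text) * TO_MILLIGRAMS[dose.get("unit")]
    kilograms = float(weight.text)
    if kilograms <= 0:
        raise ValueError("bodyweight must be positive")
    return milligrams / kilograms

result = dose_mg_per_kg(SAMPLE)  # 1.5 g for a 60 kg patient
```

Carrying the unit in the markup is what makes the reconciliation mechanical; a free-text dose would need interpreting before any arithmetic could be applied.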
In-context searching is the other killer app which XML brings to healthcare in a formulation which combines relative affordability with the power necessary to enable effective management of the medical process. It is not possible to manage the clinical process unless you have the ability to examine the medical record. The progress made so far in our efforts to develop an effective search capability in XML and the scope for future development are the subject of this paper.
Static or dynamic data
In 1996 we were fortunate to be funded by the Information Management Group of the NHS to conduct a study of live clinical information collection, with arrangements for static presentation of the resulting data as an anonymised document collection. We chose SoftQuad's Explorer product as the vehicle for delivery of the fully indexed and searchable collection. Partial records covering 700 patients were collected during a three-month live phase of the trial. We were able to demonstrate the principles of in-context searching of clinical data.
The following year Graphnet built a series of markup engines to apply markup to four years' worth of legacy Word documents consisting of clinic notes, operation notes and ward round notes. A Q&A database of 13,000 discharge summaries was marked up, as was information from the pharmacy and physiotherapy 'stand-alone' databases at the hospital. The legacy extraction produced a large quantity of data which could be queried to answer some preliminary questions about the scalability of searching. We stored the data as a pre-indexed repository within Inso's Dynatext environment. Even complex text searches could be undertaken with the Dynatext search engine. If combinations of numbers such as dates were searched for, however, the system slowed down very significantly because the indexing did not take these into account. The ability to produce complex transformations on numerical data will be important for the final system. For example, "find all patients aged over sixty who had an operation last year" requires not only date range functions but also some manipulation of the patient's date of birth and the operation date to give the correct result set. XML data types offer a significant advantage over SGML in handling dates in a robust fashion.
The current system, which has been operating live in Oswestry for the past two years, differs from the previous two search demonstrators in that documents are added continuously during the working day. Saving a copy every night and indexing that might be a solution, but two major difficulties led us away from that approach. Firstly, even if one has a repository with the 500,000 documents in it as a static data set, the registration process to turn the data into a fully indexed searchable repository is currently neither quick nor automatic. Secondly, some of the most interesting events are the most recent ones, so the frequency of the downloads would have to balance the need for up-to-date information against practical considerations. We eventually decided upon a relational solution with pre-indexing, building extra components to allow true cross-document searching.
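The "aged over sixty who had an operation last year" query can be sketched as follows to show why both a date range and some date arithmetic are needed. The record layout (`patient`, `dob`, an `opnote` with a `date` attribute) and the ISO date format are assumptions for illustration; the sketch also assumes that age is measured on the day of the operation and that "over sixty" means older than sixty whole years.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Hypothetical records; element and attribute names are illustrative only.
RECORDS = """<records>
  <patient id="A"><dob>1930-05-17</dob><opnote date="1999-03-02"/></patient>
  <patient id="B"><dob>1975-01-09</dob><opnote date="1999-07-21"/></patient>
  <patient id="C"><dob>1925-11-30</dob><opnote date="1997-10-04"/></patient>
</records>"""

def age_on(dob, day):
    """Whole years of age on the given day."""
    had_birthday = (day.month, day.day) >= (dob.month, dob.day)
    return day.year - dob.year - (0 if had_birthday else 1)

def over_sixty_operated_in(xml_text, year):
    """Ids of patients older than sixty at an operation dated within `year`."""
    hits = []
    for patient in ET.fromstring(xml_text).iter("patient"):
        dob = date.fromisoformat(patient.findtext("dob"))
        for op in patient.iter("opnote"):
            op_date = date.fromisoformat(op.get("date"))
            if op_date.year == year and age_on(dob, op_date) > 60:
                hits.append(patient.get("id"))
                break  # one qualifying operation is enough
    return hits

matches = over_sixty_operated_in(RECORDS, 1999)
```

Because the dates are typed values rather than strings of digits, the comparison does not depend on how a text index happens to tokenise numbers.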
Source granularity
We have followed a very traditional implementation model for SGML, starting with the development of an understanding of the structure of the information needed to support the clinical processes within the hospital. Secondly, we developed a system for capturing that structure efficiently from dictated text and marking it up in XML. The final step is to process and render the documents and the data they contain. We generally did not increase the granularity of the data beyond that which comes naturally to clinicians. A clinic note would have history, examination, radiographs, and opinion & plan elements, which follow the SOAP model of medical records. Where patients were to be admitted to hospital or operations were planned, we offered the option of adding extra granularity to the record by using elements to indicate various details of the planned admission. An example of extra granularity added because we had the ability to re-process our documents was an element for inpatient post-operative instructions and another for outpatient post-operative instructions, because these were aimed at different groups of carers.
Markup designed to enhance search capabilities consists of text fields for diagnoses, complications and co-morbidity. The code word elements also have attributes to enable codes to be attached from a thesaurus for subsequent processing, although the DTDs generally favour the use of elements rather than attributes where extra information is required. We have not made the diagnosis keywords mandatory but are planning to influence the dictation behaviour of clinicians by feeding back their coding rates relative to their anonymised peer group. Where an organisation must reliably collect information, making the relevant elements mandatory is necessary, but the cultural change to make clinicians understand and commit to entering the information must precede the change. Altering the data structures as a driver of change will simply alienate the users with respect to the information system.
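The coding-rate feedback described above might be computed along the following lines. This is a sketch under stated assumptions rather than a description of the live system: the `clinicnote` element, its `author` attribute and the `diagnosis` child are hypothetical names, and a note counts as coded when its diagnosis element contains any text.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical notes; authors are already anonymised as single letters.
NOTES = """<repository>
  <clinicnote author="a"><diagnosis>osteomyelitis</diagnosis></clinicnote>
  <clinicnote author="a"><history>Pain for six weeks.</history></clinicnote>
  <clinicnote author="b"><diagnosis>scoliosis</diagnosis></clinicnote>
  <clinicnote author="c"><diagnosis></diagnosis></clinicnote>
</repository>"""

def coding_rates(xml_text):
    """Fraction of each author's notes that carry a non-empty diagnosis."""
    total = defaultdict(int)
    coded = defaultdict(int)
    for note in ET.fromstring(xml_text).iter("clinicnote"):
        author = note.get("author")
        total[author] += 1
        if (note.findtext("diagnosis") or "").strip():
            coded[author] += 1
    return {author: coded[author] / total[author] for author in total}

def feedback(rates, author):
    """Return (own rate, mean rate of the other authors).

    Assumes at least one other author exists in `rates`.
    """
    peers = [rate for name, rate in rates.items() if name != author]
    return rates[author], sum(peers) / len(peers)

rates = coding_rates(NOTES)
own, peer_mean = feedback(rates, "a")
```

Only the clinician's own rate and the pooled peer figure are returned, so the feedback can be given without identifying colleagues.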
Examples in practice
Our current search ability is limited by a variety of factors which are being progressively resolved. The major limitation at present is the presence of some non-XML text entry systems; we are not able to undertake our final and definitive extract of legacy data until these systems have been replaced. Any search currently undertaken cannot provide a full examination of the hospital's clinical information. Even when we have a final legacy extract taking our text data back to 1994, the ability to apply markup to historical text is somewhat limited and generally allows the identification of the type of document, the author and the secretary as well as the date of the document. For operation notes the anaesthetist is also often identifiable in an automated legacy extract. Searches including data from the legacy source are thus less accurate, because they are less focussed than those conducted on prospectively marked up data.
An example of the use of the system so far was to search for operations in which the Lautenbach procedure was performed. This is a technique for dealing with severe bone and joint infections which uses large quantities of extremely expensive antibiotics. The pharmacy budget was being significantly stretched by the use of this technique, but we had no way of discovering the number of cases which had had Lautenbach operations because there is no specific code for the procedure in our hospital's coding system. A search on "Lautenbach inside <opnote>" revealed 27 cases. A further search for misspellings, with wildcards within and at each end of the word, failed to reveal any missed cases. We were able to identify the cases by surgeon so that the surgeon performing most of these cases could be brought into a discussion as to the best way to recoup costs through purchasing agreements with purchasing Health Authorities.
Sophisticated enquiries require an iterative search method to produce the correct levels of specificity and sensitivity for any given characteristic of interest. Once a search has been built, there needs to be the ability to run it on a regular basis to give continuous monitoring of the quality of care. A flagging system can be used in which limits are set for acceptable performance and any exceptions are reported by e-mail to the person responsible for monitoring clinical quality.
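The difference between an in-context search and a plain text search can be shown with a small sketch. It is not the search engine used in Oswestry: apart from `opnote`, which appears in the query quoted above, the `repository`, `document` and `clinicnote` names are assumptions, and the shell-style wildcard matching from the Python standard library stands in for whatever wildcard syntax the real engine offers.

```python
import fnmatch
import xml.etree.ElementTree as ET

# Hypothetical repository extract for illustration.
REPOSITORY = """<repository>
  <document id="1"><opnote>Lautenbach procedure performed on left femur.</opnote></document>
  <document id="2"><clinicnote>Discussed Lautenbach irrigation; patient declined.</clinicnote></document>
  <document id="3"><opnote>Lautenbac irrigation system inserted.</opnote></document>
  <document id="4"><opnote>Total hip replacement.</opnote></document>
</repository>"""

def find_in_context(xml_text, element, pattern):
    """Ids of documents where a word matching `pattern` occurs inside <element>."""
    pattern = pattern.lower()
    hits = []
    for doc in ET.fromstring(xml_text).iter("document"):
        for node in doc.iter(element):
            words = "".join(node.itertext()).lower().split()
            # Strip surrounding punctuation before comparing each word.
            if any(fnmatch.fnmatchcase(w.strip(".,;:()"), pattern) for w in words):
                hits.append(doc.get("id"))
                break
    return hits

exact = find_in_context(REPOSITORY, "opnote", "lautenbach")
fuzzy = find_in_context(REPOSITORY, "opnote", "la*t*nba*")
```

Document 2 mentions the procedure only in a clinic note, so it is excluded from both result sets; the wildcard pattern additionally picks up the misspelt operation note in document 3.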
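A flagging system of the kind described might be sketched as below. The measure names, limits and contact address are invented for the example, and the sketch only builds the exception messages; actually delivering them by e-mail is left out.

```python
from dataclasses import dataclass

@dataclass
class Limit:
    measure: str    # name of the monitored quantity (hypothetical)
    maximum: float  # highest acceptable value
    contact: str    # who is told about an exception

def exceptions(limits, observed):
    """Return (contact, message) pairs for every measure above its limit."""
    alerts = []
    for limit in limits:
        value = observed.get(limit.measure)
        if value is not None and value > limit.maximum:
            message = f"{limit.measure}: {value} exceeds limit {limit.maximum}"
            alerts.append((limit.contact, message))
    return alerts

# Illustrative limits and the latest results of the saved searches.
LIMITS = [
    Limit("wound infection rate %", 2.0, "quality@example.org"),
    Limit("readmission rate %", 5.0, "quality@example.org"),
]
OBSERVED = {"wound infection rate %": 3.1, "readmission rate %": 4.0}

alerts = exceptions(LIMITS, OBSERVED)
```

Running the saved searches on a schedule and passing their results through a check like this gives continuous monitoring without anyone having to read every result set.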
Lessons from system failures
Both in the United States and the United Kingdom there is extensive evidence of significant numbers of patients coming to harm as a result of their treatment. The National Academy of Sciences estimated that between 44,000 and 90,000 Americans die each year as a result of errors in treatment. Approximately 7,000 of these deaths are connected with drug administration errors. In the United Kingdom a recent crisis in public confidence in medical care arose because a children's cardiac surgical unit pushed on with a programme of treatment in spite of evidence that its results were less good than could be expected. Even with an effective internal clinical data monitoring system in place, the cardiac surgical tragedy might still have occurred; review of the results therefore needs to be conducted externally to the audited organisation.
Lessons from criminal activity
The recent criminal trial of Dr Harold Shipman revealed the lack of information available to assess the activities of self-employed family doctors. Dr Shipman was convicted of the murder of a number of his elderly patients, either at his surgery or at their homes. Although various people had raised their concerns over the previous few years, he was only caught when he forged the will of one of his victims. The Health Minister promised a thorough review of all aspects of the case, including why those who should have been reviewing Dr Shipman's clinical activity failed to act. The simple fact is that those responsible for Dr Shipman had no information on which to act.
The Electronic Health Record proposed by the NHS Management Executive would have allowed a regular review of death rates for each family doctor, so that the consistent excess mortality produced by Dr Shipman's criminal activities could have been investigated. The Electronic Health Record is a summary record which contains a birth-to-death log of events and needs to contain marked up death certificates so that the final outcome can be analysed in the context of the treatment received. Ultimately, catching the very occasional criminal will be much harder than identifying substandard practice because of the covert way in which a premeditated criminal will cover his tracks.
Lessons from resource mis-allocation
All healthcare systems are under intense funding pressure, and any inappropriate expenditure will lead to the overall benefit delivered to a community being less than optimal. Within our hospital's region there is a fourfold variation in the rates of total hip replacement. The disparity in hip replacement rates implies that in some areas patients who would benefit are going without surgery whilst in other areas patients are being operated on inappropriately. To address inequalities in resource allocation, disability or disease scale scores need to be recorded on a general basis so that data can be found on the relative levels of pre-operative disability in the areas of under- and over-provision. Simple reliance on records entered as a result of routine care will be insufficient to produce an effective answer concerning clinical activity. If the organisation requires the extra granularity brought about by the use of score systems, then the culture needs changing to one of acceptance before the markup is changed to mandate scoring.
Conclusions
Acknowledgements
I wish to thank the Electronic Patient Record Project Board for their funding of the two Oswestry pilot trials.
Graphnet Computer Services Limited built the search engine and provided legacy data extraction to give the large body of data which was needed to develop the searching system.
Bibliography
[1] Roberts A.P. The Standard Generalized Markup Language for electronic patient records. Health Informatics Journal, September 1998. Sheffield Academic Press.