Using Validating Stylesheets with Healthcare Messages
ABSTRACT
Validating messages at source with a DTD or Schema is familiar, but it can be extended with further validation tools. Practical examples show how Schematron and XSLT can be used to validate CEN 13606-4 healthcare messages.
1. Introduction
Clinical interfaces take a significant amount of time to implement. Most of this cost does not result from technical complexity. There is normally more than one supplier involved, each changing or configuring their software to implement the interface. There is a requirement to map the data models, and often to change the working practices of the users of the systems. The interfaces normally carry critical patient data, and so must be tested fully before running in a live environment.
2. Why Validate?
Validation as discussed in this paper is the process of identifying where messages do not comply with the specification for the interface. This can be done manually or automatically, and the validation tests may be written with the Specification document, or afterwards as part of a regression testing process.
Validation of the messages does not guarantee a working interface, since the applications must be generating and processing the messages correctly, but by identifying problems, or potential problems, it can help to reduce the cost and time taken to develop interfaces.
3. Validation with XML
The simplest way to specify an XML document is to write a DTD or Schema, the contents of which are further described in associated human-readable text.
One of the contributions of XML to such a messaging environment is that this message specification can be used to automatically validate documents.
A validating parser can be used to check that documents conform to the structure defined in the DTD or schema. This check can be done when the message is sent, and when it is received. The receiving system can validate the messages against the schema, and reject any that fail. This simplifies the receiving system code, since there is no need to re-code the checks that the schema does.
But since the sending system also has the schema, it makes sense to validate the message as soon as it is produced, so that the sender gets immediate feedback if there are problems identified. The message can then be corrected, or an issue raised, without having to get the other system (or its supplier) involved at all. This is validation at source, as practised with almost all XML messages.
By contrast, the specifications for other data formats are given either by example or in prose text. Neither of these provides the automated checks that an XML validating parser gives. So how is validation done without XML?
4. Validation without XML
Since the textual specifications do not provide any automated testing, a range of approaches is used, as listed below:
- Manual checking by tedious field or character counting. Few programmers are good at, or enjoy, such manual testing. Further, just because the output matches the developer's understanding of what should be output, there is no guarantee that it will match the expectations of the receiving system.
- Waiting for the receiving system to be available to do the testing. This can delay the initial testing of outputs unless the receiving side of the interface has already been developed. It also means that testing involves both systems, which can introduce delays and costs in the testing cycle. A further problem is that the validation logic is embedded in the application, and so is not available for review or re-use outside of the application.
- In-house testing tools. These need to be locally developed and maintained (at a cost), and while they remove the tedium of hand testing, they need to be tested themselves. Unless they are shared and trusted by the other communicating parties, they do not help to keep expectations in line.
- Third party tools. These are often expensive, either in actual price, or in the need to go through a procurement process. Where the third party validation tool is a part of an interface engine there is normally a run-time cost for each interface installed.
5. Validation at source
When the validation checks are separated from the receiving application, they can be made available to those responsible for the sending system. By doing this the receiving system developers are also exporting responsibility for identifying failing cases and stopping them entering the message flow.
This approach of pushing responsibility for testing the validity of messages back towards the application that sent them can be extended further.
6. DTD and Schema validation are not enough
The specification for an XML message comprises both the schema and a textual document, and there are a number of reasons why validation using the schema alone may not be as complete as possible.
- There are some constraints that cannot be expressed with a schema. DTDs are very limited in the constraints that they can express, but even W3C Schema have weaknesses in some areas. In particular there is no easy way to place constraints on the content of an element based on the value in a different element. Such constraints are needed to ensure that the order of dates in a message makes sense (the date of the message must be after the date of the events that it reports on), and to check that test results reported as normal are actually within the normal range.
- There are checks that could have been included in the schema but were not. These could include the permitted values for short code lists, which may be expected to change with a different frequency to the rest of the message, and so are kept out of the schema.
- There are failing cases that could be explicitly tested for, to avoid repetition of previous errors. An example here would be a particular date that the sending system uses to indicate a null date, but that is passed through unchanged. It may have been possible to add this to the schema, but by providing a separate test for it, there is no requirement to modify and re-check the full schema.
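The first two kinds of check can be sketched as rules in an XSLT 1.0 stylesheet. This is only an illustration under stated assumptions: the element names (Message, MessageDate, Event, EventDate, Result, Value, Low, High, Flag), the YYYY-MM-DD date format and the code list N/H/L are all hypothetical and are not taken from the CEN 13606-4 DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- All element names below are hypothetical stand-ins -->
  <xsl:template match="/">
    <xsl:apply-templates select="//Event | //Result"/>
  </xsl:template>

  <!-- Cross-element rule: an event may not be dated after the message.
       Dates are assumed to be YYYY-MM-DD, so stripping the hyphens
       gives numbers that compare in date order. -->
  <xsl:template match="Event">
    <xsl:if test="number(translate(EventDate, '-', '')) &gt;
                  number(translate(/Message/MessageDate, '-', ''))">
      <xsl:text>Event dated after message: </xsl:text>
      <xsl:value-of select="EventDate"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:if>
  </xsl:template>

  <xsl:template match="Result">
    <!-- Code-list rule kept outside the schema (assumed codes N, H, L) -->
    <xsl:if test="not(Flag = 'N' or Flag = 'H' or Flag = 'L')">
      <xsl:text>Unknown result flag: </xsl:text>
      <xsl:value-of select="Flag"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:if>
    <!-- Cross-element rule: a result flagged normal must lie in range -->
    <xsl:if test="Flag = 'N' and (Value &lt; Low or Value &gt; High)">
      <xsl:text>Result flagged normal but out of range: </xsl:text>
      <xsl:value-of select="Value"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

Each rule is a single xsl:if, so changing the code list means editing one test expression rather than the schema. A message that breaks no rule produces no output.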
7. Version control
As can be seen from the examples above, validation checks are not always known at the time that the message is designed, and they may change at different rates.
The rules that change more often can be checked for using an additional validation check that could be stored and versioned separately from the full schema.
Where there is "many to many" messaging, there is also the possibility that some validation tests are not relevant to all parties. If as a receiver I identify a test that would identify invalid messages that I am receiving from one vendor's system, I would want to give that vendor the test so that they can take responsibility for fixing the problem, without waiting for agreement to a global update to the schema.
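A test of this kind could be packaged as a small standalone stylesheet that is versioned and distributed on its own. The sketch below checks for a null date that has been passed through unchanged. The sentinel value 1900-01-01, the revision label in the comment, and the convention that date elements have names ending in "Date" are all assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Rule set: null-date check, revision 1 (hypothetical label).
     Kept in its own file so that it can be versioned and sent to one
     supplier without touching the shared schema. -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <!-- Assumed sentinel value; can be overridden when the stylesheet is run -->
  <xsl:param name="null-date" select="'1900-01-01'"/>

  <xsl:template match="/">
    <!-- Select every element whose name ends in "Date"
         (a naming convention assumed for this sketch) -->
    <xsl:for-each select="//*[substring(local-name(),
                              string-length(local-name()) - 3) = 'Date']">
      <xsl:if test="normalize-space(.) = $null-date">
        <xsl:text>Null date passed through in </xsl:text>
        <xsl:value-of select="local-name()"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Because the sentinel is a stylesheet parameter, the same file could serve a second sender that uses a different null value, without a new revision of the rule.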
8. Structure or Rule?
Schema define the structure of the message, and in doing so must define the whole message. Rather than amend this structural definition every time a new constraint is identified, it is more intuitive to add a rule that expresses the new constraint, and validating stylesheets provide a mechanism for defining such rules.
9. Virtues of a Validating Stylesheet
XSLT stylesheets are a part of every XML developer's toolkit, and so reading and writing stylesheets is something that XML developers will either already be able to do, or will be happy to learn.
XSLT processors are also almost as widely available as XML parsers, and indeed are bundled with parsers such as MSXML3 from Microsoft. There is therefore unlikely to be a requirement to install any new software to be able to use a validating stylesheet.
The size of the validating stylesheet is proportional to the number of rules that it contains. Thus the effort required to review the test is also relatively small where there are few new tests.
The stylesheet is a text document that can be reviewed by anyone who doubts whether it is correctly written to implement the test.
The validating stylesheets can also be used as example programs for implementors of the message to use. This is particularly the case if they intend to use XSLT to convert the message into an application specific format before processing it.
10. Validating Stylesheets in action
The following code is part of a validating stylesheet written to provide additional checks on a CEN 13606–4 message. The message is being used to collect information to provide a comparative analysis of prescribing behaviour. The message was initially constrained by a DTD derived from the informative DTD in the standard. A significant problem during implementation was that the DTD allowed mandatory elements to be empty, whereas many in the message must have content. Thus the following trivial example, which checks that each Cuid element holds a 36-character identifier, was initially developed:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/">
    <!-- Visit every Cuid element in the message -->
    <xsl:for-each select="//Cuid">
      <xsl:choose>
        <!-- Report any identifier that is not exactly 36 characters long -->
        <xsl:when test="string-length(.) != 36">invalid&#10;</xsl:when>
        <!-- Identifiers of the right length produce no output -->
        <xsl:otherwise/>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
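The more general problem described above, mandatory elements that the DTD allows to be empty, could be tested along similar lines. The following is a minimal sketch rather than the stylesheet used in the project: Cuid is taken from the example above, while Name and Dose are hypothetical stand-ins for elements that must carry content.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- List the elements that must not be empty.
         Name and Dose are hypothetical element names. -->
    <xsl:for-each select="//Cuid | //Name | //Dose">
      <!-- Treat an element as empty if it has no child elements and
           its text is blank once whitespace is normalised -->
      <xsl:if test="not(*) and normalize-space(.) = ''">
        <xsl:text>empty element: </xsl:text>
        <xsl:value-of select="name()"/>
        <xsl:text> in </xsl:text>
        <xsl:value-of select="name(..)"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Reporting the parent element name alongside the empty element gives the sender a starting point for locating the fault in a large message.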
11. What Schematron adds
Schematron is a rule-based validation mechanism that has been developed by Rick Jelliffe of the Academia Sinica Computing Centre. A Schematron file is an XML document that contains the XPath expressions that express the constraints, and the message, warning, or other output that should be generated when a constraint is breached. This allows the development of the rules to be separated from the development of the stylesheets that implement them.
A Schematron file for the above test would be as follows:
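<schema xmlns="http://www.ascc.net/xml/schematron">
  <pattern name="Check that the GUID has the expected length">
    <rule context="Cuid">
      <!-- An assert reports its message when the test is false -->
      <assert test="string-length(.) = 36">Invalid Guid;</assert>
    </rule>
  </pattern>
</schema>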
The Schematron file can then be converted with an XSLT stylesheet into either a message browser that highlights the errors in the message, or a simple pass/fail validation test.
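To illustrate the simple pass/fail form, the following hand-written sketch applies the same 36-character rule and reports a single verdict for the whole message. It is not the output of any Schematron tool; the PASS and FAIL wording is an assumption chosen for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" encoding="UTF-8"/>

  <xsl:template match="/">
    <!-- Count the Cuid elements that break the 36-character rule -->
    <xsl:variable name="failures"
                  select="count(//Cuid[string-length(.) != 36])"/>
    <xsl:choose>
      <xsl:when test="$failures = 0">PASS</xsl:when>
      <xsl:otherwise>
        <xsl:text>FAIL: </xsl:text>
        <xsl:value-of select="$failures"/>
        <xsl:text> invalid Cuid element(s)</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

A one-line verdict of this kind suits an automated gate at the sending system, while the message browser form is more useful to a developer diagnosing a failure.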
12. Conclusion
Validating stylesheets provide a valuable addition to conventional XML validation with schema and DTDs. They use a language that those using XML are familiar with, and they provide open checks that can be shared and adapted. This allows new validation tests to be delivered to the source of the messages, where they can be used to identify and solve problems in the most cost-effective and timely manner.


