NewsML
a news markup and management tool in XML
David Allen


Abstract
In autumn 1999 the IPTC embarked upon an ambitious new project to develop an XML-based standard called NewsML. NewsML is an XML encoding for news, intended to be used for the creation, transfer and delivery of news. It is media independent and allows equally for the representation of the evening TV news and a simple textual story. The need for this new standard was driven by IPTC members who had decided to adopt XML as a core technology for their future business needs. The requirements developed for NewsML converged with the emerging XML family of standards. The project was heavily time constrained and required decisions on how far to adopt standards that were not yet finalized. It also required compromises between organizations with different business priorities, as well as pragmatic decisions on the availability and usability of XML tools. This presentation explains the processes adopted to achieve a successful conclusion, describes some of the problems encountered along the way and reveals how successful the project has been to date.

Contents
  1. NewsML: a news industry standard for the new millennium
  2. Acknowledgements

NewsML: a news industry standard for the new millennium
Background. The IPTC has been developing standards for the interchange of news material since the late 1970s. Initial work focussed on text messages and was followed by a digital newsphoto format in the late 1980s. By the mid 1990s a new text format, initially based on SGML, was released. This is called the News Industry Text Format (NITF). Following the announcement of XML version 1.0 in early 1998, the IPTC converted NITF to an XML document type definition, and this now enjoys widespread usage.
The On-line Dimension. With the move towards the adoption of more open standards for the news industry, particularly as a result of the very rapid growth of online publishing on the World Wide Web, the IPTC decided to base all its future standards activity on XML. This decision was endorsed in mid 1999 and a new programme of work commenced in October 1999 with the introduction of the NewsML concept. Prior to NewsML, the IPTC standards had only allowed for one instance of a single media type to be exchanged in the containing envelope. This envelope carried routing, identification, descriptive and editorial metadata but required bespoke applications to open and parse the information. Publishing outside the traditional print medium called for news in many formats and also required the provider to declare explicitly the relationships between the different media objects related to a particular news event.
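The contrast with the single-object envelope can be sketched in a few lines of Python. This is an illustration only, written under stated assumptions: the element and attribute names (NewsPackage, MediaObject, Relation, role, name) are hypothetical and are not taken from the NewsML document type, which was still under development at the time of writing. The sketch shows one container holding several media objects for the same news event, each with a role, and the relationships between them declared explicitly.

```python
import xml.etree.ElementTree as ET


def build_package(event_id, items, relations):
    """Build one XML container for several media objects of a news event.

    All element and attribute names are hypothetical placeholders,
    not names from the NewsML DTD.
    """
    package = ET.Element("NewsPackage", {"event": event_id})
    for item in items:
        ET.SubElement(package, "MediaObject", {
            "id": item["id"],
            "mediaType": item["media_type"],
            "role": item["role"],
            "href": item["href"],
        })
    # Relationships are declared explicitly rather than left implicit.
    for source, name, target in relations:
        ET.SubElement(package, "Relation",
                      {"from": source, "name": name, "to": target})
    return package


def objects_with_role(package, role):
    """Return the ids of all media objects that play the given role."""
    return [obj.get("id") for obj in package.findall("MediaObject")
            if obj.get("role") == role]


# Example data: one event covered by a story, a photo and a video clip.
package = build_package(
    "event-001",
    [
        {"id": "story1", "media_type": "text", "role": "main",
         "href": "story1.xml"},
        {"id": "photo1", "media_type": "image", "role": "illustration",
         "href": "photo1.jpg"},
        {"id": "video1", "media_type": "video", "role": "supporting",
         "href": "video1.mpg"},
    ],
    [("photo1", "illustrates", "story1"),
     ("video1", "supplements", "story1")],
)

serialized = ET.tostring(package, encoding="unicode")
```

Because the container is ordinary XML, a generic parser can read it back without a bespoke envelope application, for example with ET.fromstring(serialized).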
Timescale. It was this pressure for multimedia, multi-object presentation of news that initiated the new work programme, IPTC2000, with the NewsML description at its heart. NewsML was seen as a news exchange and management tool that could be used throughout the life cycle of news, from first assignment to final archival. This was a challenging requirement and extended the boundaries of the new standard beyond any previous IPTC work. Above all, it was seen that market forces required a viable version 1 of the document type to be available by mid 2000, when it could be ratified at the next IPTC Annual General Assembly. Fortunately, the IPTC already had a significant intellectual investment in its previous envelope standard, the Information Interchange Model, and so an evolutionary approach could be adopted to develop NewsML.
Working Method. The normal method of working for IPTC members is to form specialized Working Parties to address matters in detail and then to have their work reviewed and endorsed by a plenary body known as the Standards Committee. This two-stage approach allows small groups to progress rapidly while allowing the membership as a whole to have an input into the emerging standard. Above all, as an international body, the IPTC has always aimed to build a consensus for any of its publishable standards. As it had been decided to put NewsML in the public domain as soon as possible, the development of such a consensus was more urgent than for previous standards. To gain further feedback from a larger audience it was decided, again for the first time, to establish a public mailing list for the exchange of ideas pertinent to the NewsML work. Within the IPTC, three Working Parties were established to look at News Structure and Management, News Text markup and News Metadata. Three Chairmen were appointed on the basis of their areas of expertise, and each of the Working Parties was able to draw, in part, on previous IPTC work. Relevant earlier activities included the Information Interchange Model (IIM), the Digital Newsphoto Parameter Record (DNPR), the Common Linking Implementation Procedure (CLIP) and the News Industry Text Format (NITF).
News Structure & Management. This activity has the widest remit and is the most challenging. Not only were new concepts involved, particularly the ideas of roles and named relationships between different news objects, but there were also management issues to be considered in the evolving news scenario, where relevance and accuracy are all important. In particular, the family of XML related standards is still not mature, and trade-offs need to be made between tool usability, requirements and the adoption of unratified standards that could still change before final release. It was felt that outside expertise would be of great value in making such judgements.
News Text. Written news can come in a variety of sizes, from the one-line announcement to a fully developed feature article or even a ready-made newspaper page. There was concern that markup overheads should not unduly burden the smaller items, but at the same time it was necessary to provide sufficient richness in markup to serve the needs of the more developed articles. This was an area where the NITF work proved valuable, allowing the IPTC to draw on the experience of NITF implementors.
News Metadata. For over 20 years the IPTC has been identifying and specifying what we now call metadata for the exchange of news. The initial focus was on text, followed by photographs and audio as the members' businesses evolved. As it is hoped that NewsML will be widely adopted throughout the (news) publishing industries, it was felt necessary to gain input from areas other than those represented by most of the IPTC members. The areas identified as being of specific concern are magazine publishing and video. There is also the all-encompassing dimension of copyright and rights, which concerns everyone who creates or publishes material in whatever medium. In order to gain the appropriate expertise we have decided to work cooperatively with both the Graphic Communications Association PRISM initiative and the ISO MPEG-7 committees. Both organizations will allow us to improve our metadata coverage and relevance in areas that we could not expect to cover from our own resources.
Progress Achieved. At the time of writing the IPTC has formally endorsed the NewsML Requirements documentation and is actively working on the associated Functions documentation. The three Working Parties have already made significant progress, and external consultants have been brought in to assist with some of the trade-offs and technical decisions. Contact has been made with PRISM and MPEG-7, and collaboration frameworks have been established. We remain confident that a version 1.0 DTD can be made available by July 2000.
Acknowledgements
The author wishes to thank the members and directors of IPTC for permission to publish this paper. NewsML is a trademark of the IPTC.