The role of standard metadata in a portal publishing system
Jay Rothschild


Abstract
This paper discusses the impact of the Publishing Requirements for Industry Standard Metadata (PRISM) on editorial content exchange and syndication.

Keywords

Contents
  1. Standard metadata in portal publishing

Standard metadata in portal publishing
What is “metadata”? When and how should you begin to capture it? How much metadata do you need for a particular piece of content? How do you attach that metadata to the content it describes? How much of the metadata needs to travel with the content, and for how long? Exactly what metadata should you be capturing? These are some of the questions PRISM is working on, and these are precisely the questions Cahners is trying to answer as it makes the leap from being a print-based publishing company to a “new economy” electronic information company.
For about a year now, Cahners has been using XML to facilitate the process of publishing its content electronically. Currently, this process begins with a QuarkXPress document (QuarkXPress is the layout and pagination tool Cahners uses for its print publications). Using a Quark extension developed for Cahners, a Web editor extracts the text portion of the document as well-formed XML, based on the “XPress Tags” markup that represents the named styles in the Quark document. The Web editor then uses an ASP script that employs the XML DOM to convert the well-formed XML to valid XML, conforming to a DTD Cahners wrote specifically for its magazine content.
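The second step above, promoting style-named well-formed XML into a structured document, can be sketched roughly as follows. This is a minimal illustration in Python, not the ASP script itself. The style names (`Headline`, `Byline`, `BodyText`), the target element names, and the `STYLE_TO_ELEMENT` mapping are all hypothetical. Validation against the DTD is not shown, because the parsers in Python's standard library are non-validating.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from Quark style names to structured element names.
STYLE_TO_ELEMENT = {
    "Headline": "title",
    "Byline": "author",
    "BodyText": "para",
}

def to_structured(well_formed_xml: str) -> ET.Element:
    """Rebuild a flat, style-named document as an <article> with a <body>."""
    source = ET.fromstring(well_formed_xml)
    article = ET.Element("article")
    body = ET.Element("body")
    for child in source:
        name = STYLE_TO_ELEMENT.get(child.tag)
        if name is None:
            continue  # unknown style: skipped here; a real converter would flag it
        # Paragraphs go inside <body>; everything else sits directly under <article>.
        parent = body if name == "para" else article
        target = ET.SubElement(parent, name)
        target.text = (child.text or "").strip()
    article.append(body)
    return article

sample = """<story>
  <Headline>Analog circuits rebound</Headline>
  <Byline>A. Writer</Byline>
  <BodyText>First paragraph.</BodyText>
  <BodyText>Second paragraph.</BodyText>
</story>"""

article = to_structured(sample)
print(ET.tostring(article, encoding="unicode"))
```

The point of the sketch is the separation of concerns: the extracted markup only reflects how the text was styled for print, while the target structure reflects what each piece of text is.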
Most of the metadata Cahners currently captures is stored as attributes of the root XML element in the valid document. The conversion process itself can capture some of this metadata from the Quark documents (as much as is present there). The Web editor then adds the rest of the required metadata by hand using an XML editor. Cahners then uses that metadata to route articles onto the Web, to sort articles by topic or article type, and to filter articles for re-use rights when we syndicate or otherwise re-purpose an article or a complete issue.
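As a rough illustration of how metadata held on the root element can drive sorting and filtering, consider the following Python sketch. The attribute names (`topic`, `type`, `reuse`) and the sample articles are invented for the example and are not taken from the Cahners DTD.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical article documents; attribute names are illustrative only.
DOCS = [
    '<article topic="semiconductors" type="news" reuse="yes">'
    '<title>Fab capacity grows</title></article>',
    '<article topic="semiconductors" type="feature" reuse="no">'
    '<title>Inside a new fab</title></article>',
    '<article topic="analog circuits" type="news" reuse="yes">'
    '<title>Op-amp roundup</title></article>',
]

def read_metadata(xml_text: str) -> dict:
    """Collect the root element's attributes plus the article title."""
    root = ET.fromstring(xml_text)
    meta = dict(root.attrib)
    meta["title"] = root.findtext("title", default="")
    return meta

def group_by(records, key):
    """Group article titles by one metadata field, e.g. topic or type."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get(key, "unknown")].append(record["title"])
    return dict(groups)

def syndicatable(records):
    """Keep only articles whose metadata says re-use rights are cleared."""
    return [r for r in records if r.get("reuse") == "yes"]

records = [read_metadata(doc) for doc in DOCS]
by_topic = group_by(records, "topic")
cleared_titles = [r["title"] for r in syndicatable(records)]
print(by_topic)
print(cleared_titles)
```

Because every decision reads only the root attributes, none of these operations needs to parse or understand the article body.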
While we are probably ahead of the industry in our use of XML and metadata to re-publish and syndicate our content, we are not yet where we’d like to be, and our XML metadata still costs more in production effort than it should. For example, when an article is first written or copy-edited, it would be a simple matter for the author or copy editor to add some basic metadata to the article: the author’s name, the date it was created, the publisher, perhaps the subject of the article. Later, as an article moves through its editing cycle, it would be useful to capture some of the other information that accrues about the article. Who owns the primary and secondary rights to the article text? Who created the illustrations and other graphics? Does Cahners own re-use rights for all images attached to the article? For what publication volume and issue was the article first written? Out of necessity, a Web editor manages to gather all of this information, but the process is much less efficient than it could be, because the Web editor doesn’t know the answers to many of those questions and must track them down.
Beyond the issue of basic information is the question of context for an article. For a piece of content to be truly valuable to Cahners in an electronic arena, we need to be able to sort and re-collate that content in many different ways. We need to be able to search our content for specific pieces of information, and insert electronic “hooks” to capture connections between pieces of content. For example, Cahners may want to create a Web “portal” for all of its electronics industry titles. Within this portal, we want to let our customers view articles (or maybe even just pieces of articles) by subject matter (e.g., “semiconductors” or “analog circuits”), by content type (e.g., industry news, feature articles, commentary), or both. We want to send email notification to our readers when an article appears on a subject they have identified as of particular interest, or when an article appears about a particular company or product.
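The notification case can be sketched as a simple overlap test between an article's metadata and each reader's stated interests. This is only an illustration under assumed data shapes: the `Article` and `Reader` classes, their fields, and the sample addresses are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Article:
    title: str
    subjects: set = field(default_factory=set)
    companies: set = field(default_factory=set)

@dataclass
class Reader:
    email: str
    subjects: set = field(default_factory=set)   # subjects the reader follows
    companies: set = field(default_factory=set)  # companies the reader follows

def readers_to_notify(article, readers):
    """Return emails of readers whose interests overlap the article's metadata."""
    return [
        reader.email
        for reader in readers
        if (reader.subjects & article.subjects)
        or (reader.companies & article.companies)
    ]

readers = [
    Reader("a@example.com", subjects={"semiconductors"}),
    Reader("b@example.com", companies={"Acme Semiconductor"}),
    Reader("c@example.com", subjects={"analog circuits"}),
]
new_article = Article(
    "Fab capacity grows",
    subjects={"semiconductors"},
    companies={"Acme Semiconductor"},
)

to_notify = readers_to_notify(new_article, readers)
print(to_notify)
```

The match is only as good as the vocabulary behind it: if one title tags an article “semiconductors” and another uses “chips,” the overlap test fails, which is one reason a shared set of metadata terms matters.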
Or, on a more granular level, we may want to link a company name within an article to a profile of that company, or to a list of products that company currently offers. We may want to link a product name to a profile of the company that sells that product. From there, we may want to take the next logical step, and facilitate some sort of transaction between the reader and the company. This is, after all, the ultimate direction of so-called business-to-business (“B2B”) e-commerce on the Web.
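One naive way to picture this kind of in-text linking is a lookup table of known company names applied to the article text. The sketch below assumes a hypothetical `PROFILES` table and URL paths, works on plain text only, and does no HTML escaping. A production system would more likely mark up company names as metadata at editing time than match strings after the fact.

```python
import re

# Hypothetical lookup from company name to profile URL path.
PROFILES = {
    "Acme Semiconductor": "/companies/acme-semiconductor",
    "Example Devices": "/companies/example-devices",
}

def link_companies(text: str, profiles: dict) -> str:
    """Wrap each known company name in a link to its profile page."""
    if not profiles:
        return text
    # Try longer names first so a long name wins over a shorter one it contains.
    names = sorted(profiles, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")

    def make_link(match):
        name = match.group(1)
        return f'<a href="{profiles[name]}">{name}</a>'

    return pattern.sub(make_link, text)

linked = link_companies("Acme Semiconductor announced a new part.", PROFILES)
print(linked)
```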
In order to do all of that, Cahners needs a format-neutral, centralized method for capturing metadata as content is created or acquired, and for storing that metadata in a way that makes it easy for people or programs to search and assemble our stored content. We need a workflow that allows content creators, editors and others to apply metadata to content efficiently, at the moment it first becomes available, or even to generate the metadata automatically. And finally, to accomplish everything outlined above, we also need a rich, comprehensive set of industry-standard metadata terms.
To meet the first need, we are building an XML-based content management system that will give our editors workflow tools for creating content, adding metadata, and storing both in a centralized, searchable repository. Another set of tools will facilitate the creation and publication of portal Web sites, targeted email, content syndication, and other electronic information products. The metadata framework that will help power this system will come, we hope, from the PRISM standard. Because we’re on the cutting edge of the industry right now, working with PRISM is giving us the opportunity to apply the findings of the Working Group to our current efforts, as well as to shape the direction of the industry standard as it develops.