PRISM
e-commerce through metadata
The PRISM Working Group is sponsored and hosted by IDEAlliance, a.k.a.
GCARI, the Research Institute of the Graphic Communications Association (GCA).
It was founded by the author in response to a critical need in the marketplace
for an extensible XML metadata standard for repurposing, aggregating, syndicating,
personalizing and post-processing magazine, catalog, book, news and mainstream
journal content. This paper describes the business drivers for the PRISM Working
Group members, some of the challenges that the group has encountered along
the way and the progress and achievements to date.
PRISM: metadata to improve the bottom line
Rationale for joining the Working Group: The Biz Problems
Content providers of all types are faced with continuous challenges
to create new revenue vehicles and save money with existing ones. They need
to capture, manipulate, combine, shred, protect, manage, personalize, post-process
and recombine content for multiple media without invoking the labor-intensive,
time-consuming processes of today. Consumers expect content to be delivered
on print media, hand-helds, mobile devices, screens, and kiosks, formatted
on the fly from one source, in a number of formats, and looking good, if not gorgeous!
Changing technology is driving the creation of a new business environment.
There are new kinds of business relationships, new technological requirements
and more advanced technologies, new methods for sharing content, and totally
new types of content such as hybrids from multiple sources for print, streaming
media and interactive media. For instance, unprecedented kinds of business
relationships like the alliance between MSNBC and the Washington Post require
fast, accurate automated sharing of large volumes of data in multiple formats
- in real time.
Today, in most shops, repurposing consists mostly of cutting and pasting
since there is no reliable way to automatically retrieve similar types of
content. Also, lack of agreement among publishers on how to describe various
pieces of content makes aggregation very difficult. For instance, when Getty
Images acquires a new company, all of the new images must be integrated into
the existing collection. This is an enormous task today because there is no
consistency of metadata! And what makes it even worse is that there is no
common language among software tools that create, store and manipulate these
pieces of content.
Syndication is also a challenge. Although a standard XML communications
protocol for syndication called ICE (Information and Content Exchange) has
been developed, there is no standard way to automatically describe the data
that is being syndicated.
Publishers, aggregators, syndicators and any kind of “re-publishers”
must also have access to the rights and permissions associated with each content
component - not just at the document level. And that information must be available mostly automatically. Some of the kinds of restrictions
that need to be considered are geographic, time, language, market, format,
alterations, and exclusive use. Knowledge is also required about use of freelancers’
work, for instance, and whether content can be used on a partner’s website
and so on.
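To make the idea of component-level rights concrete, the following is a minimal sketch of a rights record attached to a single content component and an automated check of a proposed use against it. The class names, field names and sample values are hypothetical illustrations of the restriction types listed above; they are not taken from the PRISM vocabulary.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional, Set


@dataclass
class UsageRights:
    """Hypothetical per-component rights record (not PRISM vocabulary)."""
    regions: Set[str] = field(default_factory=set)    # empty set = no geographic limit
    languages: Set[str] = field(default_factory=set)  # empty set = any language
    formats: Set[str] = field(default_factory=set)    # empty set = any format
    valid_from: Optional[date] = None                 # None = no start date
    valid_until: Optional[date] = None                # None = no end date
    alterations_allowed: bool = False


@dataclass
class ProposedUse:
    """A hypothetical description of how someone wants to reuse a component."""
    region: str
    language: str
    fmt: str
    on: date
    alters_content: bool = False


def violations(rights: UsageRights, use: ProposedUse) -> List[str]:
    """Return the names of the restrictions that the proposed use would break."""
    problems = []
    if rights.regions and use.region not in rights.regions:
        problems.append("geographic")
    if rights.languages and use.language not in rights.languages:
        problems.append("language")
    if rights.formats and use.fmt not in rights.formats:
        problems.append("format")
    if rights.valid_from and use.on < rights.valid_from:
        problems.append("time")
    elif rights.valid_until and use.on > rights.valid_until:
        problems.append("time")
    if use.alters_content and not rights.alterations_allowed:
        problems.append("alterations")
    return problems


# A photo licensed for print in the US and Canada until the end of 2000.
photo_rights = UsageRights(regions={"US", "CA"}, formats={"print"},
                           valid_until=date(2000, 12, 31))
web_use = ProposedUse(region="US", language="en", fmt="web", on=date(2000, 6, 1))
print(violations(photo_rights, web_use))  # prints ['format']
```

Because the record travels with the component rather than with the whole document, a re-publisher could run such a check on each photo, sidebar or article individually before reuse.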
Today, rights and permissions are dealt with mostly via phone calls, pencil
and paper, email, faxes and paper contracts via mail. These processes are
very labor-intensive and highly unreliable since, for instance, one person
may quote one set of permissions and prices based on his/her research and
another person may discover something quite different.
Another issue that frustrates publishers and consumers alike is that
there is no way to find separate pieces of content that were published together
at one point in time, such as a New York Times special supplement. The reason
is that content components carry no usage history information. Similarly, there
is no way to find letters of correction and subsequent articles today because
the relationships are not referenced. Nor are relationships within a particular
article tracked.
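As an illustration of what tracked relationships would make possible, the sketch below stores relationships as simple (subject, relation, object) triples and answers the two questions raised above: what was published together with a component, and which corrections or follow-ups refer to an article. All identifiers and relation names are hypothetical and are not drawn from any specification.

```python
from collections import defaultdict

# Each relationship is a (subject, relation, object) triple.
# Identifiers and relation names are made up for illustration.
relations = [
    ("article-17", "isPartOf", "supplement-2000-03"),
    ("photo-204", "isPartOf", "supplement-2000-03"),
    ("sidebar-9", "isPartOf", "article-17"),
    ("letter-88", "corrects", "article-17"),
    ("article-31", "followsUp", "article-17"),
]


def build_indexes(triples):
    """Index triples in both directions so lookups do not scan the whole list."""
    forward = defaultdict(list)   # (subject, relation) -> objects
    inverse = defaultdict(list)   # (object, relation) -> subjects
    for subject, relation, obj in triples:
        forward[(subject, relation)].append(obj)
        inverse[(obj, relation)].append(subject)
    return forward, inverse


forward, inverse = build_indexes(relations)


def published_with(component):
    """Other components that share a container with the given component."""
    found = set()
    for container in forward[(component, "isPartOf")]:
        found.update(inverse[(container, "isPartOf")])
    found.discard(component)
    return sorted(found)


def later_items(article):
    """Corrections and follow-up pieces that reference the given article."""
    return sorted(inverse[(article, "corrects")] + inverse[(article, "followsUp")])


print(published_with("article-17"))  # prints ['photo-204']
print(later_items("article-17"))     # prints ['article-31', 'letter-88']
```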
The hopes of resolving at least some of these barriers were the drivers
that brought the PRISM Working Group members together.
Goals
Initial Primary Goal: to develop an XML metadata vocabulary by mid-2000
specifically for the magazine, catalogue, mainstream journal, news and
book industries.
- Publishers must agree on a set of metadata descriptions so that
when they send content to each other over the web using standard communications
protocols (one of which is ICE), both the sender and receiver will know what
they’re getting.
- The metadata vocabulary will also make it possible for users of
aggregation sites to find information in a reliable way and for the aggregators
themselves to manage that content.
- Individual publishers will also have a standard-but-extensible vehicle
for managing their own re-use of information across their own content sources.
- Software tool vendors can then incorporate support for this standard
metadata vocabulary to provide off-the-shelf tools without complicated integration.
- The metadata vocabulary must also work for archiving and search
and retrieval.
- The vocabulary must be implementable when it is delivered.
- The metadata vocabulary will leverage and reference related standards
and technologies where possible and appropriate, such as RDF, the
Dublin Core, NITF, NewsML, DOI, INDECS, ICE, DPRL and so on.
- Alliances with other standards groups must be built to ensure a
flow of information and to encourage adoption of PRISM by other initiatives.
The Working Group
Modeled on the successful ICE Authoring Group, which is also hosted
by IDEAlliance, the PRISM Working Group is made up of a combination of content
providers, integrators and software tool vendors. The thought process goes
something like:
If there are no tools, there will be little adoption. If the vocabulary
is not the right vocabulary - if it does not work for diverse types of publishing
- there will be little adoption.
At its inception, content providers and software vendors could join
the Working Group by invitation only, because it was crucial to provide the
right mix of vendors and providers and it was also critical to cover a variety
of types of publishing and software tool functions. However, today, any interested
company may join as long as it makes the commitment to assign the appropriate
resources. Another level of membership, called the Network Member Level, is
also being established. Network members will be able to track progress,
provide and receive feedback, and get early access to specifications.
Working Group members currently include:
- Adobe Systems
- Artesia Technologies
- Banta Integrated Media
- Cahners Business Information
- Condé Nast Publications
- Getty Images
- iCopyright.com
- International Data Group/ITworld
- Kinecta
- KPMG
- MarketSoft
- Metacode Technologies
- Quark Inc
- Sothebys.com
- Time Inc.
- Vignette Corp
- Wavo Corp
Timelines
The aggressive milestones that the group established called for the
release of a specification by mid-2000. At the time of writing, the work is
on track to produce the first version of the specification in that timeframe.
PRISM’s Progress
Background
The Working Group began meeting in June 1999 and has continued to meet
every month since then - although some meetings involve only subcommittees.
Creating a new standard is an exciting but challenging activity, especially
in a space that is already inhabited by other specifications, all of which
need to be examined, understood, leveraged and/or referenced. Thus the Working
Group initially proceeded in two directions. The Group decided that the requirements
document would be a set of very specific “use case” scenarios
describing business problems that content providers need to solve. These scenarios
have proved extremely useful for determining scope and actual vocabulary requirements.
As new content providers join the Group, they provide additional scenarios
to ensure that the group is creating the “right” vocabulary.
The Group researched and analyzed existing standards and initiatives
to determine which aspects were not being addressed. The Group’s specific
goal was to avoid “re-inventing the wheel” - to make use of existing
specifications where possible. A secondary goal was to publish a catalogue describing
existing specifications and how they relate to PRISM.
During this research phase it became evident that some of PRISM’s
interests coincided with those of the IPTC, who had developed NITF (News Industry
Text Format) and were in the process of developing NewsML. Thus a collaborative
relationship was crafted and representatives from each group have been attending
the other group’s meetings since January. The IPTC has already developed
some aspects of metadata that are very valuable to PRISM. But the IPTC’s
work is not as focused on component and general relationships, rights and
permissions and other requirements that are more specific to magazines and
catalogues.
Getting the work done
PRISM is using a supply chain framework to delineate various aspects
of required metadata. Objects consisting of any/all media type(s) are captured
and possibly archived, and then “passed” to another process for manipulation
and aggregation into a compound object, such as a magazine article. The compound
objects are then delivered to multiple media in multiple formats. They may
then be re-stored as individual components or passed outside the enterprise
as compound objects for aggregation, syndication, re-use and retrieval by
another business and/or by consumers. Post processing may occur at any step.
Metadata to facilitate these processes, with the probable exception of delivery,
is the concern of PRISM, as is metadata to manage content objects and to describe
their rights and permissions and the relationships between and among them.
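The supply chain described above can be sketched as a small data model in which components carry their own metadata, are aggregated into a compound object, and are later split out again with a record of where they were used. The class names and the "usedIn" key are assumptions made for this illustration; they are not part of the PRISM vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Component:
    """A single captured object of any media type, with its own metadata."""
    identifier: str
    media_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class CompoundObject:
    """An aggregation of components, such as a magazine article."""
    identifier: str
    parts: List[Component] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


def aggregate(identifier, parts, **metadata):
    """Combine components into a compound object with its own metadata."""
    return CompoundObject(identifier, list(parts), dict(metadata))


def disaggregate(compound):
    """Split a compound object into components, noting where each was used.

    Copies are returned so the originals inside the compound are unchanged.
    """
    result = []
    for part in compound.parts:
        meta = dict(part.metadata)
        meta["usedIn"] = compound.identifier  # hypothetical usage-history key
        result.append(Component(part.identifier, part.media_type, meta))
    return result


article = aggregate(
    "article-17",
    [Component("text-1", "text/xml", {"creator": "staff"}),
     Component("photo-204", "image/jpeg")],
    title="Spring catalogue feature",
)
for part in disaggregate(article):
    print(part.identifier, part.metadata)
```

Keeping metadata on the component, rather than only on the compound object, is what allows a piece to be re-stored, syndicated or retrieved on its own later in the chain.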
The Working Group is developing a framework for the metadata vocabulary
and expects to use RDF to describe the relationships of components. The Group
has formed subcommittees devoted to each aspect described above. Subcommittee
work is reviewed by the whole Working Group. PRISM requires consensus on all
decisions.
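As a rough illustration of what an RDF description of a compound object and its components might look like, the sketch below builds an RDF/XML record using Dublin Core elements, two of the standards the Group intends to reference. The choice of elements, the resource identifiers and the sample values are assumptions for this example only; the actual PRISM element set is defined by the specification, not by this sketch.

```python
import xml.etree.ElementTree as ET

# Standard namespace URIs for RDF and the Dublin Core element set.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)


def describe(about, title, creator, rights, parts=()):
    """Build an rdf:RDF element describing one resource and its related parts."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": about})
    ET.SubElement(desc, f"{{{DC}}}title").text = title
    ET.SubElement(desc, f"{{{DC}}}creator").text = creator
    ET.SubElement(desc, f"{{{DC}}}rights").text = rights
    # Each component is linked by reference rather than embedded.
    for part in parts:
        ET.SubElement(desc, f"{{{DC}}}relation", {f"{{{RDF}}}resource": part})
    return root


# All identifiers and values below are hypothetical.
record = describe(
    about="http://example.com/articles/17",
    title="Spring catalogue feature",
    creator="Staff writer",
    rights="Print use only; North America",
    parts=["http://example.com/photos/204", "http://example.com/sidebars/9"],
)
print(ET.tostring(record, encoding="unicode"))
```

Linking components by reference lets each one carry its own description, including its own rights statement, independently of the article in which it appears.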
Achievements
In February 2000, at the Seybold Seminars Conference in
Boston, the Group presented an interoperability demo of a scenario involving
seven software tools exchanging and operating on content tagged with PRISM
metadata. The admiring audience was struck by the applicability of the technology
demonstration to the issues they struggle with on a daily basis. The demonstration
was repeated at XTech in San Jose later that month
- again receiving very positive responses.
At the time of writing, work has proceeded on both vocabulary descriptions
and framework definitions such that the Working Group expects to release version
1.0 of the specification in the June timeframe.
In the exhibition area at this conference (XML Europe 2000),
the PRISM Working Group members are demonstrating collaborations between content
providers and software tool makers using PRISM metadata.
Summary
In summary, it is clear that the industry needs a standard metadata
vocabulary to realize the potential of online publishing and e-commerce in
the publishing industry. PRISM provides a framework for the interchange and
preservation of content and metadata. PRISM also provides a set of controlled
vocabularies with which to describe the content being interchanged. Thus PRISM
will provide a common interchange that greatly expands the market for licensed
content.
Acknowledgements
The author wishes to thank the members of the PRISM Working
Group for their ongoing dedication to the specification and, in particular, Deren
Hansen of Wavo Corporation for his fine work as editor of the specification
and Ron Daniel of Metacode Technologies for his leadership as co-chair of
the Working Group.