A manager's guide to the latest hot topics
No abstract was provided for this paper.
Well-formed XML
XML - (Well-formed) Extensible Markup Language
A well-formed instance:
<?xml version="1.0"?>
<weather>
<current>
<temp scale="F">72</temp>
<pressure>1005</pressure>
<humidity>43</humidity>
</current>
<min>
<temp scale="F">65</temp>
<pressure>998</pressure>
<humidity>38</humidity>
</min>
<max>
<temp scale="F">78</temp>
<pressure>1010</pressure>
<humidity>43</humidity>
</max>
</weather>
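A minimal sketch of checking well-formedness, assuming Python's standard xml.etree.ElementTree parser as the tool; the helper name is_well_formed and the shortened sample are illustrative only. A non-validating parser either accepts the text or reports a syntax error, which is all that well-formedness means:

```python
import xml.etree.ElementTree as ET

GOOD = """<?xml version="1.0"?>
<weather>
  <current>
    <temp scale="F">72</temp>
    <pressure>1005</pressure>
    <humidity>43</humidity>
  </current>
</weather>"""

# Same content, but the </current> end tag is missing, so nesting is broken.
BAD = GOOD.replace("</current>", "")


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```

No document model is consulted here: the check passes for any element names as long as the tags nest properly.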
Valid XML
XML - (Valid) Extensible Markup Language
A valid instance:
<?xml version="1.0"?>
<!DOCTYPE weather [
<!ELEMENT weather ( current, ( min, max )? )>
<!ELEMENT current ( temp, pressure, humidity )>
<!ELEMENT min ( temp, pressure, humidity )>
<!ELEMENT max ( temp, pressure, humidity )>
<!ELEMENT temp ( #PCDATA )>
<!ATTLIST temp scale ( C | F ) #REQUIRED>
<!ELEMENT pressure ( #PCDATA )>
<!ELEMENT humidity ( #PCDATA )>
]>
<weather>
<current>
<temp scale="F">72</temp>
<pressure>1005</pressure>
<humidity>43</humidity>
</current>
<min>
<temp scale="F">65</temp>
<pressure>998</pressure>
<humidity>38</humidity>
</min>
<max>
<temp scale="F">78</temp>
<pressure>1010</pressure>
<humidity>43</humidity>
</max>
</weather>
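Python's standard library does not include a validating parser, so the following is only a hand-written sketch of what validation adds over well-formedness. It assumes each content model from the DTD above has been rewritten by hand as a regular expression over child element names; the CONTENT_MODELS table and the check function are illustrative, and attribute declarations such as the required scale are not checked:

```python
import re
import xml.etree.ElementTree as ET

# Each DTD content model rewritten by hand as a regular expression over
# space-terminated child element names (an illustrative encoding only).
CONTENT_MODELS = {
    "weather": r"current (min max )?",
    "current": r"temp pressure humidity ",
    "min": r"temp pressure humidity ",
    "max": r"temp pressure humidity ",
}


def check(element):
    """Return the names of elements whose children violate their model."""
    problems = []
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        children = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, children) is None:
            problems.append(element.tag)
    for child in element:
        problems.extend(check(child))
    return problems


valid = ET.fromstring(
    "<weather><current><temp scale='F'>72</temp><pressure>1005</pressure>"
    "<humidity>43</humidity></current></weather>")
# Well-formed, but <min> appears without <max>, which the model forbids.
invalid = ET.fromstring(
    "<weather><current><temp scale='F'>72</temp><pressure>1005</pressure>"
    "<humidity>43</humidity></current><min><temp scale='F'>65</temp>"
    "<pressure>998</pressure><humidity>38</humidity></min></weather>")

print(check(valid))
print(check(invalid))
```

Both inputs are well-formed; only the second breaks the grammar, because the model allows min and max only as a pair.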
XML Schema
- http://www.w3.org/TR/xmlschema-1/ - Structures
- http://www.w3.org/TR/xmlschema-2/ - Datatypes
- a vocabulary for specifying schemata for different vocabularies
- defines and describes classes of XML documents
- defines structural, syntactic and value constraints and relationships
- elements
- attributes
- contents
- datatypes
- documents meaning and usage
- supplies default values for attributes and elements
- XML instance syntax
- schema document contains representations of schema components
- hierarchical constructs
- can be embedded within a document instance
- need not be a standalone document
- can be created with standard XML instance editing tools
- unlike Document Type Definitions
Does not attempt to provide all facilities
- some applications must still provide customized semantic validation
Part 1: Structures
- nature of XML schemas and component parts
- application of XML schemas to XML documents
- defines hierarchical relationships of constituent members
- type definition hierarchies
Part 2: Datatypes
- nature of content of attributes and text
- e.g.: integer, date, string, boolean, float, double, decimal, etc.
- atomic and aggregate types
- primitive and generated types
- built-in and user-generated types
- a language for use with XML Schema and other future XML Recommendations
- perhaps including XSL and RDF
- high degree of type checking
- ensures robustness in information interchange
- extensible to custom types
- stand-alone
- derived from standardized types
- attribute content and element content
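As a rough illustration of datatype checking, the sketch below maps a few of the type names listed above to Python parsers. The PARSERS table and is_valid helper are hypothetical, and the Python parsers only approximate the lexical rules of XML Schema Datatypes (for example, Python's int and Decimal accept some forms the specification does not):

```python
from datetime import date
from decimal import Decimal, InvalidOperation


def parse_boolean(text):
    # XML Schema boolean accepts exactly these four lexical forms.
    if text in ("true", "1"):
        return True
    if text in ("false", "0"):
        return False
    raise ValueError("not a boolean: %r" % text)


# Hypothetical mapping from a few built-in type names to Python parsers.
PARSERS = {
    "integer": int,
    "decimal": Decimal,
    "boolean": parse_boolean,
    "date": date.fromisoformat,
    "string": str,
}


def is_valid(type_name, text):
    """Return True if the text parses as a value of the named type."""
    try:
        PARSERS[type_name](text)
        return True
    except (ValueError, InvalidOperation):
        return False


print(is_valid("integer", "1005"))
print(is_valid("integer", "10.5"))
print(is_valid("date", "2000-13-03"))
```

A user-defined type derived from a standardized one could be modelled the same way, by wrapping an existing parser with an extra constraint.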
Schematron
- http://www.ascc.net/xml/resource/schematron/schematron.html
- developed by Rick Jelliffe, Academia Sinica Computing Centre
- a data assertion validation language useful for validating information
against business rules
- an approach to validating an instance unlike using schemata
- a schema defines a grammar
- the validating process checks the instance against the grammar
- parent/child relationships specified in content models
- a set of assertions describes expected content of an instance
- the Schematron checks the instance against the set of assertions
- arbitrary relationships (e.g. cousin, grandparent, etc.) can be
specified
- encoding of business rules related to structural integrity of the
instance
- assertions are made using XPath expressions
- provides for coupling expressions
- assertions can rely on other assertions being true or false
- unable to be expressed in a grammar-based schema
A valid XML instance needing Schematron for business rule validation:
<?xml version="1.0"?>
<!DOCTYPE thing [
<!ELEMENT thing ( a | b )>
<!ATTLIST thing content ( a | b ) #REQUIRED>
<!ELEMENT a (#PCDATA)>
<!ELEMENT b (#PCDATA)>
]><thing content="a"><b>test</b></thing>

Figure 1. Using a Schematron:
Of note:
- there are a number of different Schematrons providing different
styles of reports
- the output of using Schematron is a validation/reporting stylesheet
- the stylesheet can be run on any number of instances without regenerating
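The thing instance shown earlier is valid against its DTD, yet its content attribute says "a" while the child element is b. The sketch below is not Schematron itself; it is a hand-written Python check that assumes the intended business rule is "the content attribute must name the child element present" (in Schematron this would be expressed as an XPath assertion). The function name check_thing is illustrative:

```python
import xml.etree.ElementTree as ET


def check_thing(root):
    """Report when thing/@content does not name the child element present."""
    messages = []
    declared = root.get("content")
    children = [child.tag for child in root]
    if children != [declared]:
        messages.append(
            "content attribute says %r but child elements are %r"
            % (declared, children))
    return messages


bad = ET.fromstring('<thing content="a"><b>test</b></thing>')
good = ET.fromstring('<thing content="b"><b>test</b></thing>')

print(check_thing(bad))
print(check_thing(good))
```

The rule relates an attribute value to an element name, a relationship that a DTD content model cannot state.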
Namespaces
Vocabulary distinction
- http://www.w3.org/TR/REC-xml-names
- http://www.megginson.com/docs/namespaces/namespace-questions.html
- specifies a simple method for qualifying element and attribute names
used in XML documents
- allows the same element type name to be used from different vocabularies
in a given document
- consider two vocabularies each defining the element type named "<set>",
each with very different semantics
- in SVG (Scalable Vector Graphics) the element <set>
refers to setting a value within the scope of contained markup
- in MathML (Math Markup Language) <set> refers to
a collection of constructs treated as a set
- any document needing to mix elements from the two vocabularies may
need to use the same element type name from each
- without namespaces an application cannot distinguish which construct
is being used
- a namespace prefix differentiates the element type name suffix in
an instance
- composite name lexically parses as an XML name
- the use of the colon is defined by the namespaces recommendation
- also used to uniquely distinguish identification labels in some
Recommendations
- e.g.: customized sort scheme label
URI value association
- associates element type name prefixes with Uniform Resource Identifier
(URI) references whether or not any kind of resource exists at the URI
- examples:
- xmlns:svg="http://www.w3.org/2000/svg-20000303-stylable"
- xmlns:math="http://www.w3.org/1998/Math/MathML"
- URI domain ownership under auspices of established organization
- URI conflicts avoided if rules followed
- the recommendation explicitly neither expects nor requires a processor to
de-reference any kind of information from the given URI
- note that the Resource Description Framework (RDF) recommendation
does have a convention of looking to the URI for information, though this
is outside the scope of the Namespaces recommendation
- according to the recommendation, the URI is only used to disambiguate
otherwise identical unqualified members of different vocabularies
The choice of the prefix is arbitrary and can be any lexically valid
name
- the name need not be consistent with the use (though this helps
legibility)
- the name is never a mandatory aspect of any Recommendation
- the prefix is discarded by the XML namespace-aware processor along
the lines of:
- <{http://www.w3.org/2000/svg-20000303-stylable}set>
- <{http://www.w3.org/1998/Math/MathML}set>
- the above use of "{" and "}" is for example
purposes only
- note how the "/" characters of the URI would be unacceptable
given the lexical rules of names, thus, the URI could never be used directly
in the tags of an XML document
- the prefix is merely a syntactic shortcut preventing the need to
specify long distinguishing strings
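A short sketch of this behaviour, assuming Python's standard xml.etree.ElementTree as the namespace-aware processor. It happens to report expanded names in the same "{URI}name" notation used above; the wrapper element doc and the prefix a are made up for the example:

```python
import xml.etree.ElementTree as ET

DOC = """<doc xmlns:svg="http://www.w3.org/2000/svg-20000303-stylable"
     xmlns:math="http://www.w3.org/1998/Math/MathML">
  <svg:set/>
  <math:set/>
</doc>"""

root = ET.fromstring(DOC)
tags = [child.tag for child in root]
for tag in tags:
    print(tag)

# A different, arbitrary prefix bound to the same URI yields the same name.
other = ET.fromstring(
    '<doc xmlns:a="http://www.w3.org/1998/Math/MathML"><a:set/></doc>')
print(other[0].tag == tags[1])
```

The two set elements end up with different names after parsing, and the prefix chosen by the author plays no part in the comparison.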
Extensible HyperText Markup Language (XHTML)
- http://www.w3.org/TR/xhtml1
- a reformulation of HTML using XML 1.0
- reproduces, subsets and extends HTML 4 vocabulary
- XML-conforming and operates in HTML 4 conforming user agents
- XHTML files are acceptable input to XML processing tools
Cascading Stylesheets (CSS)
- http://www.w3.org/TR/REC-CSS1
- http://www.w3.org/TR/REC-CSS2
- formatting property assignment for web documents (HTML and XML)
- no document manipulation capabilities
- only ornamentation of the document tree
- attaching stylistic information to nodes
- simple prefixing and suffixing of nodes with text
- control of whitespace around information
- overlapping and transparent rectangular regions
- multiple media type support
- character display presentation properties
- tabular presentation properties
- aural presentation properties for visually impaired browsing
- disabled users
- mobile users
- doesn't (shouldn't) interfere with legacy browsers not supporting
CSS
- working group producing a common formatting model for web documents
Example HTML Specification:
<html>
<head>
<title>Test</title>
<style type="text/css">
H1 { color: green; text-align: right }
.info { color: red }
</style>
</head>
<body>
<h1>Test File</h1>
<p class="info">This is a test</p>
</body>
</html>
Example XML Specification:
The file samp.css:
EMPH { color: red; display: inline; font-style:italic }
P, TITLE { font-family: arial, sans-serif; display: block }
TITLE { font-weight: bold }
TITLE EMPH { color: blue }
for the file samp.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="samp.css"?>
<INFO>
<TITLE>Title with <EMPH>emphasis</EMPH></TITLE>
<P>This is a paragraph.</P>
<P>This has <EMPH>emphasis</EMPH>.</P>
<P>Last paragraph</P>
</INFO>
Document Object Model (DOM)
- http://www.w3.org/TR/REC-DOM-Level-1
- http://www.w3.org/TR/DOM-Level-2
- a tree-oriented platform- and language-neutral interface that allows
programs and scripts to dynamically access and update the content, structure
and style of documents
- a programmatic interface for XML and HTML
- a model of objects for XML and HTML
- a standard interface for manipulating the objects
- built on three layers of functionality
- Core
- fundamental interfaces
- XML extended interfaces
- HTML extended interfaces
- an agreed-upon fence between a programmer and a vendor
- programmers write to the standard interface without knowledge of
vendor implementation
- vendors support the standard interface to their internal data structures
without needing to share proprietary information
Example Principles:
- the document is a hierarchy of nodes
- a node can be in an ordered node list
- a node can be named in an unordered set
- the interface implementation can build node trees, lists and sets
from documents
- a programmer can manipulate node trees, lists and sets from program
control
Example Interface Definition:
interface Attr : Node {
readonly attribute DOMString name;
readonly attribute boolean specified;
attribute DOMString value;
};
- the name of the attribute
- the indication of the attribute being specified (true) or defaulted
(false)
- the attribute's value (obtainable and settable)
Example Language Binding (Java):
public interface Attr extends Node {
public String getName();
public boolean getSpecified();
public String getValue();
public void setValue(String value);
}
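For comparison with the Java binding, here is a small sketch using Python's xml.dom.minidom, one DOM implementation among many. It reads the name and value of an Attr node and then sets the value; the sample element is taken from the weather example and the variable names are illustrative:

```python
from xml.dom import minidom

doc = minidom.parseString('<temp scale="F">72</temp>')
temp = doc.documentElement

# getAttributeNode returns the Attr node itself rather than its string value.
attr = temp.getAttributeNode("scale")
print(attr.name, attr.value)

# The value is settable, and the change is visible through the owning element.
attr.value = "C"
print(temp.getAttribute("scale"))
```

The program touches only the standard interface and never sees how the implementation stores the tree.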
Simple API for XML (SAX)
- http://www.megginson.com/SAX
- developed entirely within the user community under the direction
of David Megginson, using the XML-DEV mail list
- a stream-oriented platform- and language-independent specification
of functions that allows programs to process XML information
- classical event-handling scheme
- XML constructs trigger events defined by the interface
- program routines handle events to implement application-specific
processing
- no need to maintain the XML hierarchy in memory
- an application can use a SAX interface to build an XML hierarchy
from a stream
- SAX 2 currently under development
Multiple implementations freely available:
- Java
- Python
- COM
- Perl
- C++
- many XML processors include a SAX interface to the XML information
Example of events defined in SAX:
- start of document
- start of element
- characters
- end of element
- end of document
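The events listed above can be observed with a small handler. This sketch assumes Python's standard xml.sax module; the class name EventLogger and the one-element input are made up for illustration:

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Record each SAX event as it arrives from the parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("start of document")

    def startElement(self, name, attrs):
        self.events.append("start of element: " + name)

    def characters(self, content):
        # Skip whitespace-only text to keep the log short.
        if content.strip():
            self.events.append("characters: " + content)

    def endElement(self, name):
        self.events.append("end of element: " + name)

    def endDocument(self):
        self.events.append("end of document")


handler = EventLogger()
xml.sax.parseString(b"<temp scale='F'>72</temp>", handler)
for event in handler.events:
    print(event)
```

The handler keeps only what it chooses to keep, so no tree of the document is ever held in memory.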
Simple Object Access Protocol (SOAP)
- http://search.ietf.org/internet-drafts/draft-box-http-soap-01.txt
- defines a remote procedure call (RPC) mechanism using XML syntax
- implements client-server interaction across a network
- a standard object invocation protocol built on Internet standards
- HTTP transport layer
- XML for encoding
- invocation requests
- responses
- extensible protocol
- extensible payload format
- incorporates SOAP vocabulary for enveloping information
- incorporates user vocabularies for payload
- simple datatypes for values based on XML Schema Datatypes
- adds arrays, compound and other types to the simple types
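A rough sketch of the enveloping idea, built with Python's xml.etree.ElementTree. It assumes the SOAP 1.1 envelope namespace, which may differ from the namespace in the cited draft; the payload namespace urn:example:weather, the method GetTemperature and its parameters are hypothetical, and nothing is sent over HTTP:

```python
import xml.etree.ElementTree as ET

# Assumption: SOAP 1.1 envelope namespace, not necessarily that of the draft.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical user vocabulary for the payload.
APP_NS = "urn:example:weather"


def build_request(method, **params):
    """Wrap a method call and its parameters in an Envelope/Body pair."""
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}%s" % (APP_NS, method))
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return envelope


request = build_request("GetTemperature", station="CYOW", scale="F")
print(ET.tostring(request, encoding="unicode"))
```

The envelope elements come from the SOAP vocabulary while the call and its parameters come from the user's own vocabulary, kept apart by namespaces.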
XML Path language (XPath)
Addressing identifies a hierarchical position or positions
- common semantics and syntax for addressing
- functionality required by both XSLT and XPointer
- a compact non-XML syntax
- for use in attribute values of XML documents
- select="id('start')//question[@answer='y']"
- select all question elements whose answer attribute
is "y" that are descendants of the element in the current document
whose unique identifier is "start"
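The selection described above can be approximated in Python's xml.etree.ElementTree, which implements only a subset of XPath. It has no id() function, so this sketch assumes the unique identifier is carried in an attribute named id and uses an attribute test instead; the quiz document is invented for the example:

```python
import xml.etree.ElementTree as ET

DOC = """<quiz>
  <section id="start">
    <question answer="y">Q1</question>
    <group><question answer="n">Q2</question>
           <question answer="y">Q3</question></group>
  </section>
  <section id="other"><question answer="y">Q4</question></section>
</quiz>"""

root = ET.fromstring(DOC)

# Stand-in for id('start'): find the element whose id attribute is "start".
start = root.find(".//*[@id='start']")

# Descendant question elements whose answer attribute is "y".
selected = start.findall(".//question[@answer='y']")
print([q.text for q in selected])
```

Q4 also has answer "y" but lies outside the element identified as "start", so it is not addressed.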
A single W3C recommendation
- http://www.w3.org/TR/xpath
- a data model for representing an XML document as a node tree
- a mechanism for addressing information found in the document node
tree
- an expression language for the manipulation of boolean, numeric,
string and node values
- a core upon which extended functionality specific to each of XSLT
and XPointer is added
XPath is not a query language
- one aspect of querying is addressing information that needs to be
found
- other aspects of querying involve working with the information that
is addressed before returning a result to the requestor
- XPath is used only to address components of an XML instance, and
in and of itself does not provide any traditional query capabilities (though
hopefully would be considered as the addressing scheme by those defining such
capabilities)
XSL and XSLT
XSL - Extensible Stylesheet Language
XSLT - XSL Transformations
- http://www.w3.org/TR/xslt
- http://www.w3.org/TR/xsl
- two separate vocabularies of XML
- transformation vocabulary to manipulate input documents into desired
output structure
- has a processing model that makes it possible to script non-linear traversal
of information (identical to the DSSSL processing model and unlike the CSS
processing model)
- uses XPath as a base expression language
- added extensions specifically for transformation
- can transform XML into XML, HTML, HTML+CSS, HTML+DOM+CSS, simple
text
- formatting vocabulary to express desired semantics for presentation
- adopts style semantics from already-established standards
- DSSSL (Document Style and Semantics Specification Language - ISO/IEC-10179)
- CSS (Cascading Style Sheets - W3C)
Consider the source file being processed includes a <note>
element
- must be rendered in italics and boldface, between two horizontal
lines, as a paragraph on its own, prefixed by "Note: "
To produce formatting objects according to XML lexical and syntax rules:
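<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:import href="main.xsl"/>
<xsl:template match="note">
<!-- a horizontal rule above and below the note paragraph -->
<fo:block><fo:leader leader-pattern="rule" leader-length="100%"/></fo:block>
<fo:block font-style="italic" font-weight="bold">
<xsl:text>Note: </xsl:text>
<xsl:apply-templates/>
</fo:block>
<fo:block><fo:leader leader-pattern="rule" leader-length="100%"/></fo:block>
</xsl:template>
</xsl:stylesheet>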
To produce HTML according to SGML lexical and syntax conventions:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<!-- xsl:import must precede all other top-level elements -->
<xsl:import href="main.xsl"/>
<xsl:output method="html"/>
<xsl:template match="note">
<hr/>
<p><i><b>
<xsl:text>Note: </xsl:text>
<xsl:apply-templates/>
</b></i></p>
<hr/>
</xsl:template>
</xsl:stylesheet>
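Python's standard library has no XSLT processor, so the following is not XSLT. It is a hand-written imitation of the processing model behind the note template, offered only to show how a matching rule and the "apply templates" step cooperate. The function names and the sample source document are illustrative:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def apply_templates(element):
    """Process the text and child elements of an element in document order."""
    out = escape(element.text or "")
    for child in element:
        out += transform(child)
        out += escape(child.tail or "")
    return out


def transform(element):
    """Pick the rule matching the element, in the spirit of xsl:template."""
    if element.tag == "note":
        return ("<hr><p><i><b>Note: " + apply_templates(element)
                + "</b></i></p><hr>")
    # Fallback resembling the built-in rule: keep processing the children.
    return apply_templates(element)


source = ET.fromstring(
    "<doc><para>Text.</para><note>Save <em>often</em>.</note></doc>")
result = transform(source)
print(result)
```

Each rule decides what to emit around its content and then hands control back to the dispatcher, which is what lets a stylesheet visit the source in whatever order its rules dictate.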
Stylesheet association
Relating documents to their stylesheets
- http://www.w3.org/TR/xml-stylesheet
- associating one or more stylesheets with a given XML document
- same pseudo-attributes and semantics as in the HTML 4.0 recommendation
elements:
- <LINK REL="stylesheet">
- <LINK REL="alternate stylesheet">
Ancillary markup
- not part of the structural markup of an instance, thus it is marked
up using a processing instruction rather than first-class (declared or declarable
in a document model) markup
Typical examples of use:
<?xml-stylesheet href="fancy.xsl" type="text/xsl"?>
<?xml-stylesheet href="normal.css" type="text/css"?>
Less typical examples provided for by the design:
<?xml-stylesheet alternate="yes" title="small"
href="small.xsl" type="text/xsl"?>
- provide the processor with an alternate stylesheet if some external
stimulus triggers it by name
<?xml-stylesheet href="#style1" type="text/xsl"?>
- instruct the processor to find the stylesheet embedded in the source
document at the named location
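A sketch of how an application might read these associations, assuming Python's xml.dom.minidom. The processing instruction's data is an opaque string to the XML parser, so the pseudo-attributes are pulled out here with a simple regular expression that handles only double-quoted values; the function name stylesheets is illustrative:

```python
import re
from xml.dom import minidom

DOC = """<?xml version="1.0"?>
<?xml-stylesheet href="fancy.xsl" type="text/xsl"?>
<?xml-stylesheet alternate="yes" title="small" href="small.xsl" type="text/xsl"?>
<doc/>"""


def stylesheets(text):
    """Return one dict of pseudo-attributes per xml-stylesheet instruction."""
    found = []
    for node in minidom.parseString(text).childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            pairs = re.findall(r'([\w-]+)\s*=\s*"([^"]*)"', node.data)
            found.append(dict(pairs))
    return found


sheets = stylesheets(DOC)
for sheet in sheets:
    print(sheet)
```

The instructions sit beside the document element rather than inside it, which matches their role as ancillary markup.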
XML pointing and linking languages (XPointer and XLink)
Linking describes a relationship:
- the association of information as defined by two components
- the addresses of what are being associated
- the essence of the association
Two W3C recommendations:
- designed to work together to fulfill these two objectives
- XPointer
- http://www.w3.org/TR/xptr
- defines constructs that support addressing into the internal structure
of XML documents
- specific reference to elements, character strings and other parts
of XML documents
- address ranges or individual nodes
- extends what is provided in XPath as a core
- addressing expressions in URI (Uniform Resource Identifier) references
as fragment identifiers
- independent of use as either a link destination or some other application-specific
purpose
- XLink
- http://www.w3.org/TR/xlink
- defines constructs that support describing the links between addressed
information
- asserts relationships to exist between two or more data objects
or portions of data objects
- simple unidirectional hyperlinks
- more sophisticated multi-ended and typed links
- includes semantic and behavior attributes
- some believe these behaviors should be entirely layered upon generic
semantic information as happens in formatting
Rich heritage
- is based on concepts successfully developed and used in
- TEI (Text Encoding Initiative)
- HyTime (Hypermedia and Time-based Structuring Language - ISO/IEC-10744)
First-class markup
- constructs are not ancillary to the markup of the document
- linking information is marked up using elements and attributes
- unlike an ancillary construct such as stylesheet association (using
a processing instruction)
- the linking of two or more pieces of information is part of the
essence of the information being described
XML Topic Maps
- http://www.infoloom.com/topmap.htm
- http://www.TopicMaps.com
- http://www.TopicMap.com
- an architecture for defining notations for structuring information
resources by defining topics and relationships between topics
- occurrences
- groupings of addressable information objects
- associations
- relationships between topics
- facets
- property/value pairs to information objects and components
- scopes
- limitations on what can be associated (for consistency and appropriateness)
- based on ISO/IEC 13250:1999 Topic Maps
- itself based on ISO/IEC 10744:1997 HyTime
- being recast in XML
- will base locations on XPointer
Co-existing models of knowledge domains
- ideal for navigation by conceptual subjects and themes
- used to relate information found in a corpus of numerous resources
of numerous types
- oriented to a human as the end user of the information
- ideal for indexes, glossaries, thesauri, general finding aids
- typically ancillary to the corpus itself
- applied from "above" the information set rather than from "within"
the information set
- information set is untouched by the application of a topic map
Architecture based:
- information modelers define vocabularies derived from the Topic
Map architecture
- navigation tools can accommodate arbitrary vocabularies by acting
on Topic Map base architecture
A navigation description document:
Resource Description Framework (RDF)
- http://www.w3.org/RDF
- http://www.w3.org/TR/PR-rdf-schema
- http://www.w3.org/TR/REC-rdf-syntax
- general treatment of metadata (information about information)
- oriented to a computer as the end user of the information
- enables automated processing of web resources
- searching
- cataloguing
- hierarchical site maps
- an RDF data model resembles an entity-relationship diagram
- schema-specification vocabulary in XML syntax
- user defines resource description language as an application-specific
schema
- attributes and relationships of items on the web
- resources and their properties
XML Query
- http://www.w3.org/TR/xmlquery-req
- create a data model for XML documents
- based on the XML Information Set
- includes support for Namespaces
- create a set of query operators against the data model
- for a fixed collection of documents
- for a single document
- for a collection of simple and complex values
- independent of any evaluation strategy
- supports simple and complex types from XML Schema Datatypes
- supports references within and between XML documents
- supports real and virtual documents
- can act on transient result of transformation not present in a static
form
XML Signature
- http://www.w3.org/TR/xmldsig-requirements
- http://www.w3.org/TR/xmldsig-core/
- an XML syntax used for creating and representing signatures on digital
content
- procedures for computing and verifying such signatures
- methods of referencing collections of resources
- algorithms
- keying information and management
- flexible signing methods
- enveloped signatures for content from one or more resources in the
same or separate XML documents
- detached signatures for data outside of the document containing
the signature
- basic signature services
- data integrity ("this data is unchanged and whole")
- authentication ("this data came from me")
- non-repudiation ("this data came from you")
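The data integrity service rests on a digest of the signed content. The sketch below shows only that step, assuming a SHA-1 digest encoded in base64; a real XML Signature also canonicalizes the content and signs the digest with a key, none of which is attempted here. The function names and the sample order document are hypothetical:

```python
import base64
import hashlib


def digest_value(data):
    """Base64-encoded SHA-1 digest of the given bytes."""
    return base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def is_unchanged(data, recorded_digest):
    """Compare a freshly computed digest with the one recorded at signing."""
    return digest_value(data) == recorded_digest


original = b"<order><item>widget</item></order>"
recorded = digest_value(original)
tampered = original.replace(b"widget", b"gadget")

print(is_unchanged(original, recorded))
print(is_unchanged(tampered, recorded))
```

A digest alone shows that the data is unchanged and whole; authentication and non-repudiation depend on the keyed signature computed over it.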
Scalable Vector Graphics (SVG)
- http://www.w3.org/TR/SVG/
- http://www.svgcentral.com/
- a vocabulary for representing two-dimensional graphics
- common semantics with differing application syntax
- that can be exchanged
- final-use rendering properties embedded in the instance
- guarantees interoperability
- explicitly does not include facilities for embedding stylesheets
or style declaration blocks
- XML alternative for graphic interchange formats such as Encapsulated
PostScript (EPS)
- that can be styled
- final-use rendering properties are not embedded in the instance
- arbitrary styling languages can influence presentation
- external stylesheets
- internal stylesheets
- embedded style declaration blocks
- cascade of styling properties
- using DOM and scripting allows an existing instance of SVG to be
manipulated and dynamically refreshed
- can be the result vocabulary of the output of applying XSL Transformations
to XML sources
Basic types of constructs
- geometric vector
- image
- text
Unlimited application areas:
- charting
- information visualization
- cartography
Consider a simple example of polygons:
<?xml version="1.0"?>
<svg width="175" height="145" >
<g style="stroke:black; fill:black" >
<polygon points=" 5, 50, 5, 81, 12, 64" />
<polygon points=" 5, 45, 41,116, 41, 73" />
<polygon points=" 44, 76, 44,119, 61,115" />
<polygon points=" 46, 73, 75,140,105, 73" />
</g>
<g stroke="black" fill="black">
<polygon points="107, 76,106,119, 89,115" />
<polygon points="144, 45,109,116,109, 73" />
<polygon points="145, 41,167, 4,109, 71" />
<polygon points=" 66, 72, 75, 63, 84, 72" />
</g>
</svg>
Mathematical Markup Language (MathML)
- http://www.w3.org/TR/REC-MathML
- http://www.w3.org/TR/MathML2
- a vocabulary for describing mathematical notation
- captures structure/presentation
- 28 element types
- traditional math constructors
- basic kinds of symbols
- expression-building structures
- captures intended semantics
- 75 element types
- encoding of underlying mathematical constructs
- accommodates ambiguous presentation of differing concepts
- may be processed and evaluated
- supports (though not necessarily completely) arithmetic, algebra,
logic and relations, calculus, set theory, sequences and series, trigonometry,
statistics, linear algebra
- well-defined interaction between presentation and semantics
- catalogue of entities for extended characters
Wireless Application Protocol (WAP)
- http://www.wapforum.org/
- a vocabulary targeted for information delivery and presentation
for mobile applications
- Wireless Markup Language (WML)
- derived from HTML
- includes WML scripting language
- presentation controls optimized for small screens
- scalable from line displays to graphical displays
- navigation controls facilitate one-hand usage
- security facilities
- client-server architecture
- simple, thin user agent (micro-browser)
- WAP Gateway manages and directs information traffic