Meaning and Interpretation of Markup

not as simple as you think


C. M. Sperberg-McQueen
Claus Huitfeldt
Allen Renear

Extreme Markup Languages

Montréal

15 August 2000


 Overview

Function of markup

Relevance / applications

Why worry about this question?

Related work

Henry Laurens to Lord William Campbell, 1775

It was be For When we applied to Your Excellency for leave to adjourn it was because we foresaw that we were ↑should continue↓ wasting our own time ...

Papers of Henry Laurens

The markup


  <p><del>It was be</del> <del>For</del> 
  When we applied to Your Excellency 
  for leave to adjourn it was because 
  we foresaw that we <del>were</del> 
  <add>should continue</add> 
  wasting our own time ... </p>

Markup-related Inferences

How does markup mean?

I.e. markup licenses inferences.

The meaning of markup is the set of inferences it licenses.

Specifying the meaning of markup

So ...

A straw-man proposal

Consider the example:


  <p><del>It was be</del> <del>For</del> 
  When we applied ... </p>

I.e. elements provide information about properties of their contents.

A Prolog illustration

Elements and characters are represented as nodes: node(location,nodetype), where nodetype is either element(gi) or pcdata(char).


node([1,5,2],element(p)).
node([1,5,2,1],element(del)).
node([1,5,2,1,1],pcdata("I")).
node([1,5,2,1,2],pcdata("t")).
node([1,5,2,1,3],pcdata(" ")).
node([1,5,2,1,4],pcdata("w")).
node([1,5,2,1,5],pcdata("a")).
node([1,5,2,1,6],pcdata("s")).
node([1,5,2,1,7],pcdata(" ")).
node([1,5,2,1,8],pcdata("b")).
node([1,5,2,1,9],pcdata("e")).
node([1,5,2,2],pcdata(" ")).
node([1,5,2,3],element(del)).
node([1,5,2,3,1],pcdata("F")).
node([1,5,2,3,2],pcdata("o")).
node([1,5,2,3,3],pcdata("r")).
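One way to check that these node facts really encode the transcribed text is to collect the characters beneath an element in document order. This is a minimal sketch, assuming SWI-Prolog 7 or later (where "I" is a string and msort/2, pairs_values/2 and atomic_list_concat/2 are available); text_of/2 is a hypothetical helper, not part of the proposal.

```prolog
% Node facts for "It was be" and "For", as listed above.
node([1,5,2],element(p)).
node([1,5,2,1],element(del)).
node([1,5,2,1,1],pcdata("I")).
node([1,5,2,1,2],pcdata("t")).
node([1,5,2,1,3],pcdata(" ")).
node([1,5,2,1,4],pcdata("w")).
node([1,5,2,1,5],pcdata("a")).
node([1,5,2,1,6],pcdata("s")).
node([1,5,2,1,7],pcdata(" ")).
node([1,5,2,1,8],pcdata("b")).
node([1,5,2,1,9],pcdata("e")).
node([1,5,2,2],pcdata(" ")).
node([1,5,2,3],element(del)).
node([1,5,2,3,1],pcdata("F")).
node([1,5,2,3,2],pcdata("o")).
node([1,5,2,3,3],pcdata("r")).

% text_of(+Loc, -Text): hypothetical helper that concatenates the
% characters located at or below Loc.
text_of(Loc, Text) :-
    findall(Path-Char,
            ( node(Path, pcdata(Char)),
              append(Loc, _, Path) ),   % Path starts with Loc
            Pairs),
    msort(Pairs, Sorted),   % standard order on these paths = document order
    pairs_values(Sorted, Chars),
    atomic_list_concat(Chars, Text).
```

For example, text_of([1,5,2,1],T) gives T = 'It was be', and text_of([1,5,2],T) gives 'It was be For'.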

[Referring to nodes]

We refer to nodes with numeric path expressions: a location lists child positions from the root, so [1,5,2,1] is the first child of node [1,5,2].

Think of it as a kind of pointing ...

... note also that XML nodes are nodes in trees.

Attributes

Attributes are represented as triples of element location, attribute name, and value:


attr([1,5,2],id,implied).
attr([1,5,2],n,implied).
attr([1,5,2],lang,implied).
attr([1,5,2],rend,implied).
attr([1,5,2],teiform,"p").

Inferences from elements

E.g. for this paragraph:

p([1,5,2]).
del([1,5,2,1]).
del([1,5,2,3]).
add([1,5,2,97]).

Inferences from elements

property_applies(p,[1,5,2]).
property_applies(del,[1,5,2,1]).
property_applies(del,[1,5,2,3]).
property_applies(add,[1,5,2,97]).
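Rather than being written by hand, these facts can be derived from the node/2 facts. A minimal sketch, assuming the node/2 representation above; only the element nodes of the example are repeated, with the location of add taken from this slide.

```prolog
% Element nodes of the example paragraph.
node([1,5,2],element(p)).
node([1,5,2,1],element(del)).
node([1,5,2,3],element(del)).
node([1,5,2,97],element(add)).

% Each element licenses the inference that its element type,
% read as a property, applies at the element's own location.
property_applies(Property, Loc) :-
    node(Loc, element(Property)).
```

Enumerating property_applies(P,L) then yields exactly the four facts listed above, in the same order.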

What does property del mean?

Define properties using skeleton sentences such as "_____ is a paragraph" or "_____ has been deleted (or marked as deleted) in the source" ...

Fill the blank with a reference to the element.
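Filling a skeleton sentence is mechanical once each property is paired with its template. A sketch under assumptions: skeleton/2 and sentence/3 are hypothetical names, the element is referred to simply by its location path, and format/3 with an atom(...) sink is used as provided by SWI-Prolog.

```prolog
% Hypothetical table of skeleton sentences; ~w marks the blank.
skeleton(p,   '~w is a paragraph').
skeleton(del, '~w has been deleted (or marked as deleted) in the source').

% sentence(+Property, +Loc, -Sentence): fill the blank with a
% reference to the element, here "the element at <path>".
sentence(Property, Loc, Sentence) :-
    skeleton(Property, Template),
    format(atom(Ref), 'the element at ~w', [Loc]),
    format(atom(Sentence), Template, [Ref]).
```

With this table, sentence(del,[1,5,2,1],S) yields 'the element at [1,5,2,1] has been deleted (or marked as deleted) in the source'.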

Inferences from attributes


  <p lang="eng"><del>It was be</del> <del>For</del> When we applied
  to Your Excellency for leave to adjourn it was because 
  we foresaw that we <del>were</del> <add>should continue</add> 
  wasting our own time ... </p>

The <p> is in English. In Prolog:

property_applies(english,[1]).

or

english([1]).

Two-argument predicates

A simpler way in Prolog:

language([1],english).
property_applies(language,[1],english).

Our way:

property_applies(language(english),[1]).
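The encodings carry the same information, and Prolog's =.. (univ) operator converts between them mechanically. A minimal sketch; to_unary/2 and to_ternary/2 are hypothetical names chosen for illustration.

```prolog
% to_unary(+Fact, -Unary): rewrite a two-argument fact such as
% language([1],english) as property_applies(language(english),[1]).
to_unary(Fact, property_applies(Property, Loc)) :-
    Fact =.. [Name, Loc, Value],
    Property =.. [Name, Value].

% to_ternary(+Unary, -Ternary): rewrite the unary-property form as
% property_applies(language,[1],english).
to_ternary(property_applies(Property, Loc),
           property_applies(Name, Loc, Value)) :-
    Property =.. [Name, Value].
```

Because the conversion is lossless, choosing among these forms is a matter of convenience, not of content.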

Propagating inferences downwards (inherited properties)

Properties are inherited: ("It was be" was deleted) → ("It" was deleted) → (the letter "I" of that word was deleted).

Side question: how far does a property propagate?

Automating inheritance

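Inheritance is a general pattern, not specific to any element type. In Prolog:

% An element has its own type as a property ...
infer(Property,Loc) :- 
     node(Loc,element(Property)).
% ... and passes it on to everything it contains.
infer(Property,Loc) :- 
     node(Anc,element(Property)),
     descendant(Loc,Anc).
% Anc's location path is a proper prefix of Loc's.
descendant(Loc,Anc) :- append(Anc,[_|_],Loc).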

Inheritance for attributes

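% An attribute with an explicit (non-implied) value is a property
% of its element; =.. turns attr(L,lang,"eng") into lang("eng") ...
infer(Prop,Loc) :- 
     attr(Loc,Att,Val),
     Val \== implied,
     Prop =.. [Att,Val].
% ... and, like an element type, is inherited by its descendants.
infer(Prop,Loc) :-
     attr(Anc,Att,Val),
     Val \== implied,
     Prop =.. [Att,Val],
     descendant(Loc,Anc).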

Summary

Summarizing this first straw-man proposal, we can say:

Summary

Illustration

In Prolog, if the example paragraph is node [1,5,2]:

?- infer(Property,[1,5,2]).
Property = p ->;
Property = doc ->;
Property = docbody ->;
Property = teiform([112]) ->;
Property = id([72,76,49,48,51,48,53]) ->;
Property = lang([101,110,103]) ->;
no
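?-

Strings appear here as lists of character codes: [112] is "p", [72,76,49,48,51,48,53] is "HL10305", and [101,110,103] is "eng". The paragraph also inherits doc and docbody, and its id and lang values, from its ancestors.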

Properties of children

?- infer(Property,[1,5,2|Tail]).

Property = p
Tail = [] ->;

Property = del
Tail = [1] ->;

Property = del
Tail = [3] ->;

Property = del
Tail = [95] ->;

Property = add
Tail = [97] ->;

Property = person
Tail = [184] ->;

Property = del
Tail = [318] ->;

Property = add
Tail = [320]
->

Finding a property

What locations have the property del?

?- infer(del,Loc).
Loc = [1,5,2,1] ->;
Loc = [1,5,2,3] ->;
Loc = [1,5,2,95] ->;
Loc = [1,5,2,318] ->;
Loc = [1,5,2,348] ->;
Loc = [1,5,2,717] ->;
Loc = [1,5,2,719,57] ->;
Loc = [1,5,2,866] ->;
Loc = [1,5,2,917] ->
...

Problems with the straw-man proposal

Nice in places, but:

Distributed and non-distributed features

What is true of the whole is not always true of the parts.

"I have a dream" is a sentence;

the word "have" is not.

Distinguish distributed properties (del) from non-distributed properties (sentence-hood).

Non-distributed properties

Non-distributed properties are true of the element as a whole, but not of every part of its content. From


  <P>Reader, I married him.</P>

we can infer the existence of one paragraph, but not that the word "Reader" is itself a paragraph. It is, however, within a paragraph.
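The distinction can be built into the inheritance rule by letting only distributed properties reach the content, while still recording containment for the others. A sketch under assumptions: the distributed/1 classification (del and hi, following the slides) and the within/2 predicate are hypothetical, and only two characters of the example are listed.

```prolog
% <P>Reader, I married him.</P> as node [1]; two characters shown.
node([1], element(p)).
node([1,1], pcdata("R")).
node([1,2], pcdata("e")).

% Hypothetical classification of distributed properties.
distributed(del).
distributed(hi).

descendant(Loc, Anc) :- append(Anc, [_|_], Loc).

% A property always holds of the element that carries it ...
infer(Property, Loc) :-
    node(Loc, element(Property)).
% ... but is inherited by the content only if it is distributed.
infer(Property, Loc) :-
    node(Anc, element(Property)),
    distributed(Property),
    descendant(Loc, Anc).

% For any property we may still infer that Loc is inside such an element.
within(Property, Loc) :-
    node(Anc, element(Property)),
    descendant(Loc, Anc).
```

Here infer(p,[1]) succeeds and infer(p,[1,1]) fails, while within(p,[1,1]) succeeds: the "R" of "Reader" is within a paragraph but is not one.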

Distributed properties

Consider this (from Tristram Shandy).


<hi rend="gothic">And this Indenture 
further witnesseth</hi> that the said
<hi rend="italic">Walter Shandy</hi>,
merchant, in consideration of the said
intended marriage ...

A synonymous example:

Or equivalently*:


  <P><HI REND="gothic">And</HI> 
  <HI REND="gothic">this</HI> 
  <HI REND="gothic">Indenture</HI> 
  <HI REND="gothic">further</HI> 
  <HI REND="gothic">witnesseth</HI> that the said
  <HI REND="italic">Walter Shandy</HI>,
  merchant, in consideration of the said
  intended marriage ... </P>

These examples license the same* inferences.
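For a distributed property the claim can be checked mechanically by computing inferences character by character. A sketch under assumptions: the seg/2 format, version_a/1, version_b/1 and char_props/2 are hypothetical. Unlike the slide's version, version_b also marks the space between the words, so the two character-level lists are identical; with that space left unmarked they would differ only at the space.

```prolog
% Flattened inline markup: each segment pairs a text atom with the
% distributed properties marked on it.
version_a([seg('And this', [rend(gothic)])]).
version_b([seg('And', [rend(gothic)]),
           seg(' ', [rend(gothic)]),
           seg('this', [rend(gothic)])]).

% char_props(+Segments, -Pairs): one Char-Props pair per character,
% i.e. what a distributed property licenses about each character.
char_props([], []).
char_props([seg(Text, Props)|Segs], Pairs) :-
    atom_chars(Text, Chars),
    findall(C-Props, member(C, Chars), Here),
    char_props(Segs, Rest),
    append(Here, Rest, Pairs).
```

Comparing char_props of the two versions with ==/2 succeeds: splitting the element at any point leaves every character's properties unchanged.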

Distributed properties

In general: If x marks a distributed property, then

Overrides and incompatibilities

Consider:


<doc lang="en">
<p>Wittgenstein wrote:
<q lang="de"><ital>Die Welt ist alles,
was der Fall ist.</ital></q>
It is hard to escape, at first reading,
the suspicion that Wittgenstein is guilty
here of a gross platitude; it is only
after reading the rest of the
<title lang="la">Tractatus</title> that on returning
to its famous first sentence one appreciates
the depths of its intension.</p>
</doc>

For this example, the straw man leads to contradictions: some nodes are inferred to be in both German and English.

Inferences

We infer:

?- infer(lang("en"),[1]).
yes
?- infer(lang("de"),[1,1,22]).
yes
?- infer(lang("en"),[1,1,22]).
yes
?-
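The contradiction is easy to reproduce on a cut-down tree, and one possible remedy, offered here only as a hypothetical illustration and not as the authors' solution, is to let the value on the nearest element override values further up. The paths below are illustrative and do not match the slide's numbering; max_member/2 is assumed from SWI-Prolog's library(lists).

```prolog
% [1] = doc lang="en", [1,1] = p, [1,1,1] = a character of the p,
% [1,1,2] = q lang="de", [1,1,2,1] = a character inside the q.
attr([1], lang, "en").
attr([1,1,2], lang, "de").

descendant(Loc, Anc) :- append(Anc, [_|_], Loc).

% Straw-man rule: an attribute value holds at its element and at
% every descendant, so [1,1,2,1] gets both lang("en") and lang("de").
infer(Prop, Loc) :-
    attr(Anc, Att, Val),
    Val \== implied,
    Prop =.. [Att, Val],
    ( Loc = Anc ; descendant(Loc, Anc) ).

% Hypothetical override rule: the value on the nearest element that
% specifies Att (the node itself or its deepest ancestor) wins.
infer_nearest(Att, Loc, Val) :-
    findall(Depth-V,
            ( attr(Anc, Att, V),
              V \== implied,
              ( Loc = Anc ; descendant(Loc, Anc) ),
              length(Anc, Depth) ),
            Candidates),
    max_member(Best, Candidates),   % greatest depth = nearest element
    Best = _-Val.
```

With the override, infer_nearest(lang,[1,1,2,1],V) gives only "de", and infer_nearest(lang,[1,1,1],V) gives "en".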

N-ary predicates

Consider the TEI <title> element:

N-ary predicates

N.B. these vary in convenience, but all carry the same basic information.

Arguments of predicates

In the straw-man proposal, all arguments are the same:

In common markup languages, other arguments may be needed; e.g.

Such terms start from some reference point; we call them deictic expressions.
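Over the location paths used above, simple deictic expressions become small relations from a reference location to another location. A sketch under assumptions: the three relations (parent, preceding sibling, target of an IDREF attribute), the predicate names, and the sample attribute facts are chosen purely for illustration.

```prolog
% parent(+Loc, -Parent): the element immediately containing Loc.
parent(Loc, Parent) :-
    append(Parent, [_], Loc).

% preceding_sibling(+Loc, -Sib): the node immediately before Loc.
preceding_sibling(Loc, Sib) :-
    append(Parent, [N], Loc),
    N > 1,
    M is N - 1,
    append(Parent, [M], Sib).

% idref_target(+Loc, +Att, -Target): the element whose id equals
% the IDREF value of attribute Att on Loc.
idref_target(Loc, Att, Target) :-
    attr(Loc, Att, Id),
    attr(Target, id, Id).

% Hypothetical sample attributes.
attr([1,2], target, "n1").
attr([1,7], id, "n1").
```

For instance, parent([1,5,2,1],P) gives [1,5,2], preceding_sibling([1,5,2,3],S) gives [1,5,2,2], and idref_target([1,2],target,T) gives [1,7].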

Deictic Expressions

Markup languages vary in the forms of deixis they require.

Requirements for deixis

What do we need in a language for deixis? At least:

More ... ?

Languages for deictic expressions

Conjecture: only a subset required.

Open question: how big a subset?

Deixis as complexity measure

Note:

A framework (generic parts)

To describe the meaning of the markup in a document, we will need:

Framework (DTD-specific parts)

For each markup language:

Open questions / further work
