What we’ve been doing so far works well for simple markup. In any complex grammar, however, the notation will become far too dense to be understandable, and the indentation will march off the right side of the page. To avoid this problem, we’ll use named patterns: we give a name to part of the pattern, and then refer to that name from another section of the grammar. Here’s the current address book grammar, modularized.
grammar {
    start = element addressBook
    {
        cardContent*
    }
    cardContent =
        element card {
            nameContent,
            (
                element email {text} |
                element phone {text}
            ),
            element note { text } ?
        }
    nameContent =
        element name { text } |
        (
            element firstname { text },
            element lastname { text }
        )
}
This is not just a simple notational convenience; it’s an absolute necessity if we are to have recursive grammars. A recursive grammar is one where an item is defined in terms of itself. Here’s an example:
A document consists of one or more lists. A list consists of one or more items, each of which may contain either text or another nested list.
This specification requires definitions and references. We will show a sample valid document first, and follow it with the Relax NG grammar:
<document>
    <list>
        <item> First item outer </item>
        <item> Second item outer </item>
        <item>
            <list>
                <item> nested first </item>
                <item> nested second </item>
            </list>
        </item>
        <item> Third item outer </item>
    </list>
</document>
grammar {
    start =
        element document
        {
            list-defn+
        }
    list-defn =
        element list
        {
            element item
            {
                ( text | list-defn )
            }+
        }
}
Note: a
definition name can be the same as an element name, but it’s better
to give it a different name. We’ve added the suffix -defn
to our definitions for this purpose.
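To see why the recursion matters, here is a minimal Python sketch of the checks that the list-defn grammar above implies. It is not how a Relax NG validator is implemented; the function names (is_valid_document, is_valid_list, is_valid_item, is_blank) are invented for this illustration, and the sketch assumes that whitespace-only text around child elements can be ignored.

```python
import xml.etree.ElementTree as ET

def is_blank(s):
    """True for None, empty, or whitespace-only text."""
    return not (s or "").strip()

def is_valid_item(item):
    """Mirror ( text | list-defn ): text only, or exactly one nested list."""
    if item.tag != "item":
        return False
    children = list(item)
    if not children:
        return True                      # the text branch
    if len(children) != 1:
        return False
    nested = children[0]
    # The list branch: only whitespace may surround the nested list.
    return is_blank(item.text) and is_blank(nested.tail) and is_valid_list(nested)

def is_valid_list(lst):
    """Mirror list-defn: a <list> holding one or more <item> elements."""
    items = list(lst)
    return (lst.tag == "list" and len(items) >= 1 and is_blank(lst.text)
            and all(is_blank(i.tail) and is_valid_item(i) for i in items))

def is_valid_document(root):
    """Mirror start: a <document> holding one or more lists."""
    lists = list(root)
    return (root.tag == "document" and len(lists) >= 1 and is_blank(root.text)
            and all(is_blank(l.tail) and is_valid_list(l) for l in lists))

sample = ET.fromstring(
    "<document><list><item>outer</item>"
    "<item><list><item>nested</item></list></item></list></document>"
)
print(is_valid_document(sample))
```

Just as list-defn refers to itself in the grammar, is_valid_item calls is_valid_list, which in turn calls is_valid_item, so lists may nest to any depth.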
To create an empty element, specify empty as the
element content rather than text. Specifying
text permits, but does not require, text between the opening
and closing tags. Specifying empty forbids both text
and child elements between the opening and closing tags.
To specify an element’s attributes, you use
attribute specifications instead of
element specifications. Here’s how we’d specify
some of the attributes for HTML’s <img/>
element. This element requires a src and alt attribute,
and has optional width and height.
element img
{
    empty,
    attribute alt {text},
    attribute src {text},
    attribute width {text}?,
    attribute height {text}?
}
Note: the following text is copied directly from the Relax NG tutorial; it’s beautifully written and nearly impossible to improve upon.
Copyright © The Organization for the Advancement of Structured Information Standards [OASIS] 2001. All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.
The , and | connectors can be applied to attribute patterns in the same way they are applied to element patterns. For example, if we wanted to allow either a name attribute or both a givenName and a familyName attribute, we can specify this in the same way that we would if we were using elements:
element addressBook {
  element card {
    (attribute name { text }
     | (attribute givenName { text },
        attribute familyName { text })),
    attribute email { text }
  }*
}
The , and | connectors can combine element and attribute patterns without restriction. For example, the following pattern would allow a choice of elements and attributes independently for both the name and the email part of a card:
element addressBook {
  element card {
    (element name { text }
     | attribute name { text }),
    (element email { text }
     | attribute email { text })
  }*
}
As usual, the relative order of elements is significant, but the relative order of attributes is not. Thus the above would match any of:
<card name="John Smith" email="js@example.com"/>
<card email="js@example.com" name="John Smith"/>
<card email="js@example.com"><name>John Smith</name></card>
<card name="John Smith"><email>js@example.com</email></card>
<card><name>John Smith</name><email>js@example.com</email></card>
However, it would not match
<card><email>js@example.com</email><name>John Smith</name></card>
because the pattern for card requires any email child element to follow any name child element.
While the preceding examples show the power and flexibility of Relax NG, I don’t recommend them as an example of good design. If you give people too many options for data-oriented markup, they will wonder why that last example doesn’t work, given that “every other combination works great.” We’ll see a way to get around even this problem a bit later.
We can use the | in Relax NG compact to
specify that an attribute can have one of a specific set of values.
For example, if we want an align attribute to have
the possible values left, right, or
center, we’d specify:
attribute align
{
    "left" | "right" | "center"
}
Let’s take the description for the wrestling club database from the second lecture, and translate it into Relax NG.
All of our examples so far have been data-oriented; most of the meaning and structure is carried by the elements; the text simply fills in the blanks.
Let us now turn our attention to describing narrative-oriented markup. These are markup languages like HTML, where text is king, and elements are sprinkled throughout to add structure. Consider the following folksy version of a weather report:
<report> Here is your weather for <month>April</month> <day>3</day>,<year>2002</year>. Morning <cloud time="am">fog</cloud>, <cloud time="pm">clearing</cloud> in the afternoon. The high will be from <min-high>75</min-high> to <max-high>79</max-high> degrees, with an overnight low between <min-low>46</min-low> and <max-low>50</max-low> degrees. A total of <precip type="rain" units="in">1.5</precip> inches of rain fell, much to the delight of local farmers. </report>
This is called mixed content, since
it has text mixed with elements, and the elements may appear in any
order. Relax NG lets you specify mixed content with the
& character (the interleave connector). When you join patterns with
&, they may appear in any order.
Here’s the pattern for a weather report. You’ll notice that
our patterns become longer as we go along, since we are able to make
them more detailed and specific.
grammar {
    start =
        element report
        {
            text &
            element day {text} &
            element month {text} &
            element year {text} &
            element cloud
            {
                text,
                attribute time { "am" | "pm" }
            } * &
            element min-low { text } &
            element min-high { text } &
            element max-low { text } &
            element max-high { text } &
            element precip
            {
                text,
                attribute type { text },
                attribute units { text }
            } *
        }
}
Note that using interleaving does not
automatically allow an unlimited number of any of the child elements. In
the specification above, you must have exactly one each of
<month>, <day>, and
<year>. We had to use the * quantifier
to allow multiple <cloud> elements within the
weather report.
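The counting rule can be made concrete with a short Python sketch. It checks only how many times each child element occurs inside a report, ignoring their order, and it does not look at attributes or element content. The names EXACTLY_ONE, ANY_NUMBER, and report_children_ok are invented for this illustration; this is a rough model of the rule, not a Relax NG validator.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Elements joined with & and no quantifier: each must occur exactly once.
EXACTLY_ONE = {"day", "month", "year",
               "min-low", "min-high", "max-low", "max-high"}
# Elements followed by * in the grammar: zero or more occurrences.
ANY_NUMBER = {"cloud", "precip"}

def report_children_ok(report):
    """Check the child-element counts of a <report>, ignoring their order."""
    if report.tag != "report":
        return False
    counts = Counter(child.tag for child in report)
    if set(counts) - (EXACTLY_ONE | ANY_NUMBER):
        return False                     # an element the grammar does not mention
    return all(counts[tag] == 1 for tag in EXACTLY_ONE)

shuffled = ET.fromstring(
    "<report>It is <year>2002</year>, <day>3</day> <month>April</month>. "
    "<max-high>79</max-high> <min-high>75</min-high> "
    "<max-low>50</max-low> <min-low>46</min-low></report>"
)
print(report_children_ok(shuffled))
```

The sample puts the year before the day and the month, and has no cloud or precip elements at all, yet it passes: order is free, and only the elements marked with * may be absent or repeated.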
Finally, you may declare
mixed {some pattern}
as a shortcut for
text & some pattern
In the weather report, we specified the minimum and maximum
temperatures as text, but that’s really a bit
too broad a categorization. Content like 73 or
-12.5 is fine; it would be nice to say that content
like low 90's or twenty-two is invalid.
If you have a markup language that keeps track of people’s personal
information, you want to ensure that the person’s
age is an integer and that their bank balance is a floating point
number.
Relax NG is able to validate the data types of element content
and attribute values. It borrows its data typing language from
XML Schema. You tell Relax NG to use these datatypes by using a prefix
of xsd: when you are specifying the type of data that
an element or attribute should have.
The following fragment specifies that a bank
account’s acct-id must be an ID (i.e., unique and must
begin with a letter or underscore),
the <age> must contain an integer, and
the <balance> a decimal number.
element account
{
    attribute acct-id { xsd:ID },
    element owner { text },
    element age { xsd:integer },
    element balance { xsd:decimal }
}
Here’s a list of the most popular data types.
| Data Type | Notes |
|---|---|
| integer | Related types: positiveInteger, negativeInteger, nonPositiveInteger, and nonNegativeInteger. |
| decimal | |
| float and double | May be written in E notation. double allows a larger range of exponents than float. Examples: 3.5e12, 0.4e-2 |
| ID | |
| IDREF | |
| NMTOKEN | |
| string | Comparable to <text/>, but it is the specification you must use if you wish to have parameters. |

Dates and times are specified as per the ISO 8601 specification.
| Data Type | Example |
|---|---|
| date | 2002-05-27 |
| gYear | 2002 |
| gMonth | --05-- |
| gDay | ---21 |
| gYearMonth | 2002-05 |
| gMonthDay | --05-27 |
| time | 13:20:48 or 13:20:37-05:00 |
Things like positiveInteger include a lot of territory
(and work only with integers). What if you decide that you need a price
to be a positive decimal number? Or a quantity must be an integer
greater than or equal to 10 and less than or equal to 100? You can
attach parameters to data types to further restrict the valid values
between inclusive or exclusive minimum and maximum values.
element price
{
    xsd:decimal { minExclusive = "0" }
},
element qty
{
    xsd:integer { minInclusive = "10" maxInclusive = "100" }
}
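The difference between exclusive and inclusive bounds is easy to get wrong, so here is a small Python sketch of what the two declarations above accept. The helper names valid_price and valid_qty and the two lexical-form expressions are invented for this illustration; they approximate the xsd:decimal and xsd:integer rules rather than reproduce a real validator.

```python
import re
from decimal import Decimal

# Approximate lexical forms: optional sign, digits, optional fraction.
DECIMAL_FORM = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)")
INTEGER_FORM = re.compile(r"[+-]?[0-9]+")

def valid_price(value):
    """Mimic xsd:decimal { minExclusive = "0" }: strictly greater than zero."""
    value = value.strip()
    if not DECIMAL_FORM.fullmatch(value):
        return False                     # e.g. "1e3" is not a decimal literal
    return Decimal(value) > 0

def valid_qty(value):
    """Mimic xsd:integer { minInclusive = "10" maxInclusive = "100" }."""
    value = value.strip()
    if not INTEGER_FORM.fullmatch(value):
        return False                     # e.g. "10.0" is not an integer literal
    return 10 <= int(value) <= 100

for candidate in ["0", "0.01", "-3"]:
    print(candidate, valid_price(candidate))
for candidate in ["9", "10", "100", "101"]:
    print(candidate, valid_qty(candidate))
```

With minExclusive, the boundary value 0 itself is rejected; with minInclusive and maxInclusive, the boundary values 10 and 100 are accepted.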
You may restrict a text element or attribute’s length
with the length, minLength and
maxLength parameters. Here’s a fragment that
restricts a postal-code attribute to be exactly
seven characters long, and a city to
be at least four but no more than seventeen characters long:
attribute postal-code
{
    xsd:string { length="7" }
},
attribute city
{
    xsd:string
    {
        minLength="4" maxLength="17"
    }
}
Finally, the most important and powerful way to restrict a
string’s values: regular expressions.
The keyword for these
parameters, pattern, has been inherited from XML Schema,
and is not to be confused with the patterns of
elements and attributes that Relax NG sets up. Here are elements
for verifying a Canadian postal code (letter, digit, letter, one or more
whitespace characters, digit, letter, digit) and a US phone number in the form
408-555-1212:
element canada-post
{
    xsd:string { pattern="[A-Z]\d[A-Z]\s+\d[A-Z]\d" }
},
element us-phone
{
    xsd:string { pattern="\d{3}-\d{3}-\d{4}" }
}
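If you want to experiment with these two expressions before putting them into a grammar, a rough check in Python looks like the sketch below. An XML Schema pattern has to match the entire value, which corresponds to fullmatch in Python's re module. The names CANADA_POST, US_PHONE, and matches are invented for this illustration, and Python's regular expression dialect is close to, but not identical to, the XML Schema one (for instance, \s covers a few more whitespace characters in Python).

```python
import re

CANADA_POST = re.compile(r"[A-Z]\d[A-Z]\s+\d[A-Z]\d")
US_PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

def matches(pattern, value):
    """True only if the pattern accounts for the whole value."""
    return pattern.fullmatch(value) is not None

for code in ["K1A 0B1", "K1A0B1", "k1a 0b1"]:
    print(repr(code), matches(CANADA_POST, code))
for phone in ["408-555-1212", "(408) 555-1212", "408-555-12123"]:
    print(repr(phone), matches(US_PHONE, phone))
```

Note that K1A0B1 is rejected because the expression demands at least one whitespace character in the middle, and that lowercase letters are rejected because the character class is [A-Z] only.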
This, by the way, now lets us update the wrestling club database grammar so that we can check that the age groups consist of the letters K, C, J, and O, in that order, and at most one of each:
element age-groups
{
    empty,
    attribute type
    {
        xsd:string { pattern="K?C?J?O?" }
    }
}
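A quick way to convince yourself of what K?C?J?O? accepts is to try it on a few values. The Python sketch below does that; the names AGE_GROUPS and valid_age_groups are invented for this illustration, and for this simple expression Python's re module should behave the same way as the XML Schema pattern when fullmatch is used.

```python
import re

AGE_GROUPS = re.compile(r"K?C?J?O?")

def valid_age_groups(value):
    """True if value is some of K, C, J, O, in that order, at most once each."""
    return AGE_GROUPS.fullmatch(value) is not None

for value in ["KCJO", "KJ", "O", "JK", "KK", ""]:
    print(repr(value), valid_age_groups(value))
```

One thing this makes visible: because every letter is optional, the empty string also matches. If an empty type attribute should be rejected, adding a minLength="1" parameter alongside the pattern is one way to do it.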