Relax NG Compact Syntax (continued)

References and Definitions

What we’ve been doing so far works well for simple markup. In any complex grammar, however, the notation will become far too dense to be understandable, and the indentation will march off the right side of the paper. To avoid this problem, we’ll use named patterns to give a name to part of the pattern, and then refer to those names from another section of the grammar. Here’s the current address book grammar, modularized.

grammar {
    start = element addressBook
    {
        cardContent*
    }

    cardContent =
        element card {
            nameContent,
            (
                element email {text} |
                element phone {text}
            ),
            element note { text } ?
        }

    nameContent =
        element name { text } |
        (
            element firstname { text },
            element lastname { text }
        )

}

This is not just a notational convenience; it’s an absolute necessity if we are to have recursive grammars, that is, grammars in which a pattern is defined in terms of itself. Here’s an example:

A document consists of one or more lists. A list consists of one or more items, each of which may contain either text or another nested list.

This specification requires definitions and references. We will show a sample valid document first, and follow it with the Relax NG grammar:

<document>
  <list>
    <item> First item outer </item>
    <item> Second item outer </item>
    <item>
      <list>
          <item> nested first </item>
          <item> nested second </item>
      </list>
    </item>
    <item> Third item outer </item>
  </list>
</document>

grammar {

start =
    element document
    {
        list-defn+
    }

list-defn =
    element list
    {
        element item
        {
            ( text | list-defn )
        }+
    }

}

Note: a definition name can be the same as an element name, but it’s better to give it a different name. We’ve added the suffix -defn to our definitions for this purpose.

Empty Elements

To create an empty element, specify empty as the element content rather than text. Specifying text does not require any text: an element with nothing between its opening and closing tags is still valid. Specifying empty forbids both text and child elements between the opening and closing tags.
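
As a minimal sketch of the difference, the fragment below puts an element that must be empty next to one that takes text. The names entry, separator, and remark are invented for illustration; they are not part of the address book grammar.

```rnc
element entry
{
    # <separator/> and <separator></separator> are valid;
    # <separator>some text</separator> is not.
    element separator { empty },

    # <remark>hello</remark> and <remark/> are both valid,
    # because text also matches zero characters.
    element remark { text }
}
```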

Attributes

To specify an element’s attributes, you use attribute specifications instead of element specifications. Here’s how we’d specify some of the attributes for HTML’s <img/> element. This element requires src and alt attributes, and has optional width and height attributes.

element img
{
    empty,
    attribute alt {text},
    attribute src {text},
    attribute width {text}?,
    attribute height {text}?
}

Note: the following text is copied directly from the Relax NG tutorial; it’s beautifully written and nearly impossible to improve upon.

Copyright © The Organization for the Advancement of Structured Information Standards [OASIS] 2001. All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to OASIS, except as needed for the purpose of developing OASIS specifications, in which case the procedures for copyrights defined in the OASIS Intellectual Property Rights document must be followed, or as required to translate it into languages other than English.

The , and | connectors can be applied to attribute patterns in the same way they are applied to element patterns. For example, if we wanted to allow either a name attribute or both a givenName and a familyName attribute, we can specify this in the same way that we would if we were using elements:

element addressBook {
  element card {
    (attribute name { text }
     | (attribute givenName { text },
        attribute familyName { text })),
    attribute email { text }
  }*
}

The , and | connectors can combine element and attribute patterns without restriction. For example, the following pattern would allow a choice of elements and attributes independently for both the name and the email part of a card:

element addressBook {
  element card {
    (element name { text }
     | attribute name { text }),
    (element email { text }
     | attribute email { text })
  }*
}

As usual, the relative order of elements is significant, but the relative order of attributes is not. Thus the above would match any of:

<card name="John Smith" email="js@example.com"/>

<card email="js@example.com" name="John Smith"/>
<card email="js@example.com"><name>John Smith</name></card>
<card name="John Smith"><email>js@example.com</email></card>

<card><name>John Smith</name><email>js@example.com</email></card>

However, it would not match

<card><email>js@example.com</email><name>John Smith</name></card>

because the pattern for card requires any email child element to follow any name child element.


While the preceding examples show the power and flexibility of Relax NG, I don’t recommend them as an example of good design. If you give people too many options for data-oriented markup, they will wonder why that last example doesn’t work, given that “every other combination works great.” We’ll see a way to get around even this problem a bit later.

Choices for Attribute Values

We can use the | in Relax NG compact to specify that an attribute can have one of a specific set of values. For example, if we want an align attribute to have the possible values left, right, or center, we’d specify:

attribute align
{
    "left" | "right" | "center"
}
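
In context, such an attribute sits inside an element pattern like any other attribute. The sketch below assumes a hypothetical para element whose align attribute is optional; the element name is an assumption made for illustration.

```rnc
element para
{
    # align may be omitted, but if present it must be one of the three values
    attribute align { "left" | "right" | "center" }?,
    text
}
```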

Mini-Exercise

Let’s take the description for the wrestling club database from the second lecture, and translate it into Relax NG.

Mixed Content

All of our examples so far have been data-oriented; most of the meaning and structure is carried by the elements; the text simply fills in the blanks.

Let us now turn our attention to describing narrative-oriented markup. These are markup languages like HTML, where text is king, and elements are sprinkled throughout to add structure. Consider the following folksy version of a weather report:

<report>
Here is your weather for <month>April</month> <day>3</day>,<year>2002</year>.
Morning <cloud time="am">fog</cloud>, <cloud time="pm">clearing</cloud> in the afternoon.
The high will be from <min-high>75</min-high> to <max-high>79</max-high> degrees,
with an overnight low between <min-low>46</min-low> and <max-low>50</max-low> degrees.
A total of  <precip type="rain" units="in">1.5</precip> inches
of rain fell, much to the delight of local farmers.
</report>

This is called mixed content, since it has text mixed with elements, and the elements may appear in any order. Relax NG lets you specify mixed content with the & (interleave) connector: when you join patterns with &, they may appear in any order. Here’s the pattern for a weather report. You’ll notice that our patterns become longer as we go along, since we are able to make them more detailed and specific.

grammar {

start =
    element report
    {
        text &
        element day {text} &
        element month {text} &
        element year {text} &
        element cloud
        {
            text,
            attribute time { "am" | "pm" }
        } * &
        element min-low { text } &
        element min-high { text } &
        element max-low { text } &
        element max-high { text } &
        element precip
        {
            text,
            attribute type { text },
            attribute units { text }
        } *
    }
    
}

Note that using interleaving does not automatically allow an infinite number of any of the child elements. In the specification above, you can have exactly one <month>, <day>, and <year>. We had to use the * quantifier to allow multiple <cloud> elements within the weather report.

Finally, you may declare
mixed {some pattern}
as a shortcut for
text & some pattern

Data Types

In the weather report, we specified the minimum and maximum temperatures as text, but that’s really a bit too broad a categorization. Content like 73 or -12.5 is fine; it would be nice to say that content like low 90's or twenty-two is invalid. If you have a markup language that keeps track of people’s personal information, you want to ensure that the person’s age is an integer and that their bank balance is a floating point number.

Relax NG is able to validate the data types of element content and attribute values. It borrows its data typing language from XML Schema. You tell Relax NG to use these datatypes by using a prefix of xsd: when you are specifying the type of data that an element or attribute should have.

The following fragment specifies that a bank account’s acct-id must be an ID (i.e., unique and must begin with a letter or underscore), the <age> must contain an integer, and the <balance> a decimal number.

element account 
{
    attribute acct-id { xsd:ID },
    element owner { text },
    element age { xsd:integer },
    element balance { xsd:decimal }
}

Here’s a list of the most popular data types.

integer
Whole number (positive, negative, or zero) without a decimal point. Subtypes are positiveInteger, negativeInteger, nonPositiveInteger, and nonNegativeInteger.
decimal
Decimal number, uses period as decimal point.
float and double
Allows exponential E notation. double allows larger range of exponents than float. Examples: 3.5e12, 0.4e-2
ID
An ID must begin with a letter or underscore, and is followed by a series of letters, digits, dots, hyphens, or underscores. An ID must be unique within a document. Note that this datatype is capitalized.
IDREF
The value must be an ID that exists in the current document.
NMTOKEN
A name token; it is built from the same kinds of characters as an ID (letters, digits, dots, hyphens, underscores), but it may begin with any of them and it doesn’t have to be unique.
string
Any string. This is effectively the same as the text pattern, but it is the specification you must use if you wish to have parameters.
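
As a sketch of how several of these types might be combined, the fragment below describes a hypothetical transfer between accounts. All of the names in it (transfer, from, to, currency, amount, rate) are assumptions made up for illustration.

```rnc
element transfer
{
    # each value must match some xsd:ID value in the same document,
    # such as an account's acct-id
    attribute from { xsd:IDREF },
    attribute to { xsd:IDREF },
    # a single name token such as USD; no spaces allowed
    attribute currency { xsd:NMTOKEN },
    element amount { xsd:decimal },
    # exponential notation such as 1.08e0 is accepted
    element rate { xsd:double }?
}
```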

Dates and Times

Dates and times are specified as per the ISO 8601 specification.

Data Type     Example
date          2002-05-27
gYear         2002
gMonth        --05--
gDay          ---21
gYearMonth    2002-05
gMonthDay     --05-27
time          13:20:48
              13:20:37-05:00
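
A minimal sketch of these types in a pattern, assuming a hypothetical forecast element with date and issued attributes (none of these names come from the weather report grammar):

```rnc
element forecast
{
    # for example date="2002-04-03"
    attribute date { xsd:date },
    # for example issued="06:30:00" or issued="06:30:00-08:00"
    attribute issued { xsd:time }?,
    text
}
```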

Further Refinement of Values

Things like positiveInteger include a lot of territory (and work only with integers). What if you decide that you need a price to be a positive decimal number? Or a quantity must be an integer greater than or equal to 10 and less than or equal to 100? You can attach parameters to data types to restrict the valid values to a range bounded by inclusive or exclusive minimum and maximum values.

element price
{
    xsd:decimal { minExclusive = "0" }
},
 
element qty
{
    xsd:integer { minInclusive = "10" maxInclusive = "100" }
}

You may restrict a text element or attribute’s length with the length, minLength and maxLength parameters. Here’s a fragment that restricts a postal-code attribute to be exactly seven characters long, and a city to be at least four but no more than seventeen characters long:

attribute postal-code
{
    xsd:string { length="7" }
},

attribute city
{
    xsd:string
    {
        minLength="4" maxLength="17"
    }
}

Finally, the most important and powerful way to restrict a string’s values: regular expressions. The keyword for these parameters, pattern, has been inherited from XML Schema, and is not to be confused with the patterns of elements and attributes that Relax NG sets up. Here are elements for verifying a Canadian postal code (letter, digit, letter, one or more whitespace characters, digit, letter, digit) and a US phone number in the form 408-555-1212:

element canada-post
{
    xsd:string { pattern="[A-Z]\d[A-Z]\s+\d[A-Z]\d" }
},
element us-phone
{
    xsd:string { pattern="\d{3}-\d{3}-\d{4}" }
}

This, by the way, now lets us update the wrestling club database grammar so that we can check that the age groups consist of the letters K, C, J, and O, in that order, and at most one of each:

element age-groups
{
    empty,
    attribute type
    {
        xsd:string { pattern="K?C?J?O?" }
    }
}
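
One thing to be aware of: because every letter in that pattern is optional, the empty string also matches. If an empty type attribute should be rejected, one possible refinement (an assumption about the requirements, not part of the original exercise) is to add a minLength parameter alongside the pattern:

```rnc
element age-groups
{
    empty,
    attribute type
    {
        # letters K, C, J, O in that order, each at most once,
        # and at least one of them present
        xsd:string { pattern = "K?C?J?O?" minLength = "1" }
    }
}
```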