
Advanced Relax NG

Lists

Some XML markup languages allow whitespace-separated lists of items as attribute values or element content. For example, you might want to allow several stylesheet classes on an HTML element, or define a custom complex-number element that holds two floating-point values, as follows:

<div class="floatleft bordered emphatic">
<complex-number>3.2 4.5</complex-number>

We'd define these elements as follows:

<element name="div">
    <text/>
    <attribute name="class">
        <list>
            <oneOrMore>
                <data type="NMTOKEN"/>
            </oneOrMore>
        </list>
    </attribute>
</element>
<element name="complex-number">
    <list>
        <data type="double"/>
        <data type="double"/>
    </list>
</element>
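
The NMTOKEN and double types are not built into Relax NG; they come from the W3C XML Schema datatype library, which a schema has to name with a datatypeLibrary attribute on the <data> element or one of its ancestors. As a minimal sketch, assuming the complex-number definition is used as a stand-alone schema, the complete file might look like this:

```xml
<!-- Stand-alone schema for the complex-number element (illustrative sketch) -->
<element name="complex-number"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
    <!-- Exactly two whitespace-separated tokens, each a valid double -->
    <list>
        <data type="double"/>
        <data type="double"/>
    </list>
</element>
```

With this schema, <complex-number>3.2 4.5</complex-number> validates, while <complex-number>3.2</complex-number> and <complex-number>3.2 4.5 6.7</complex-number> do not, because the list pattern requires exactly two tokens.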

Referencing External Files

If you have a module that you want to reuse in several markup languages, use the <externalRef> element. Suppose you have elements and attributes for mathematical formulas and want to include them all as one package. Put them into a file called formula.rng and refer to it like this:

<element name="math">
    <externalRef href="formula.rng"/>
</element>
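
The referenced file must itself be a complete Relax NG pattern. The contents of formula.rng are not shown here, so the following is only a hypothetical sketch of what it could contain; the variable and operator element names are invented for illustration:

```xml
<!-- formula.rng: hypothetical contents, for illustration only -->
<grammar xmlns="http://relaxng.org/ns/structure/1.0">
    <start>
        <!-- One or more formula parts, in any order -->
        <oneOrMore>
            <choice>
                <element name="variable"> <text/> </element>
                <element name="operator"> <text/> </element>
            </choice>
        </oneOrMore>
    </start>
</grammar>
```

With a file like this, the <math> element above would accept one or more <variable> and <operator> children in any order.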

Namespaces

Please read pages 91-96 of XML: The Complete Reference.

DTDs are not inherently namespace-aware. You can make a direct declaration of an element with the prefix included, as in the following example, but that ties you down to a specific prefix.

<!ELEMENT eq:variable (#PCDATA)>

A more flexible approach uses parameter entities. The following is adapted from http://www.w3.org/2001/XMLSchema.dtd:

<!-- prefix can be overridden in the internal subset of a
     schema document to establish a different namespace prefix -->
<!ENTITY % prefix 'eq:'>

<!-- if %prefix is defined (e.g. as foo:) then you must also define %suffix
     as the suffix for the appropriate namespace declaration (e.g. :foo) -->
<!ENTITY % suffix ':eq'>

<!ENTITY % nds 'xmlns%suffix;'>

<!-- Define all the element names, with optional prefix -->
<!ENTITY % formula "%prefix;formula">
<!ENTITY % variable "%prefix;variable">

<!ELEMENT %formula; (#PCDATA | %variable;)*>
<!ATTLIST %formula;
    %nds;   CDATA   #FIXED 'http://www.mathstuff.org'>
<!ELEMENT %variable; (#PCDATA)>
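
As a sketch of the override mentioned in the first comment, assume the declarations above are saved in a file named formula.dtd (a hypothetical name). A document that prefers the prefix m: could redefine both parameter entities in its internal subset. The internal subset is read before the external DTD, and the first declaration of an entity is the one that counts, so the document's values take precedence:

```xml
<!-- formula.dtd is an assumed file name for the DTD shown above -->
<!DOCTYPE m:formula SYSTEM "formula.dtd" [
    <!-- Override the default 'eq:' prefix and ':eq' suffix -->
    <!ENTITY % prefix 'm:'>
    <!ENTITY % suffix ':m'>
]>
<m:formula xmlns:m="http://www.mathstuff.org">
    <m:variable>P</m:variable> =
    <m:variable>m</m:variable>
</m:formula>
```

The URI still has to be http://www.mathstuff.org, because the DTD declares the namespace attribute as #FIXED.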

While this technique does solve the problem for DTDs, it isn't truly namespace-aware; it works only by tacking on the appropriate prefixes. To be truly namespace-aware, a schema must match on the namespace URI itself rather than on the prefix. This is what Relax NG does with the ns attribute. Here's the preceding example, written in Relax NG:

<element name="formula" ns="http://www.mathstuff.org">
    <interleave>
        <text />
        <zeroOrMore>
            <element name="variable"> <text/> </element>
        </zeroOrMore>
    </interleave>
</element>

The URI declared in the outer ns attribute is “inherited” by the element definitions nested inside it; that's why we didn't have to specify an ns attribute on <element name="variable">.

Once this is set up, the RNG will validate an XML file using any prefix, so long as its xmlns specification points to the proper URI. This will validate:

<math:formula xmlns:math="http://www.mathstuff.org">
    <math:variable>P</math:variable> =
    <math:variable>m</math:variable>
</math:formula>

But this won't, since the URI doesn't match:

<eq:formula xmlns:eq="http://www.mathworld.org">
    <eq:variable>P</eq:variable> =
    <eq:variable>m</eq:variable>
</eq:formula>
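
Because matching is done on the URI and not on the prefix, a document that declares the same URI as its default namespace and uses no prefix at all should also validate against this schema. A small illustration:

```xml
<!-- No prefix: the default namespace supplies the URI -->
<formula xmlns="http://www.mathstuff.org">
    <variable>P</variable> =
    <variable>m</variable>
</formula>
```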

Applying Namespaces

We can now apply this knowledge to the wrestling club database. Often, a club will have its website URL in the <info> element, and may wish to use <b> and <i> elements. While we could add these as part of the definition of our club database markup language, they are really HTML elements, and it is appropriate to use namespaces to mark them as such:

<club-database xmlns:html="http://www.w3.org/1999/xhtml">
<association id="SCVWA">
<club id="H25">
    <charter>2000</charter>
    <name>Gilroy Hawks</name>
    <location>Gilroy</location>
    <!-- [snip] -->
    <info>
        USA Wrestling card
        <html:b>required - <html:i>No Exceptions</html:i></html:b>.
        See <html:a href="http://www.someclub.com/">our website</html:a>
        for further details. 
    </info>
</club>
<!-- remainder of document -->

Here's the RNG needed to make this happen:

<element name="info">
    <ref name="HTML"/>
</element>

<define name="HTML">
    <interleave>
        <text/>
        <zeroOrMore ns="http://www.w3.org/1999/xhtml">
            <choice>
                <element name="b">
                    <ref name="HTML"/>
                </element>
                <element name="i">
                    <ref name="HTML"/>
                </element>
                <element name="a">
                    <attribute name="href"/>
                    <ref name="HTML"/>
                </element>
            </choice>
        </zeroOrMore>
    </interleave>
</define>

Context Sensitivity

So far, so good. However, the current definition lets us nest an HTML <a> element within another <a> element. This is a meaningless construct, and should be invalid. Relax NG lets you provide several definitions for the same element name, and the context determines which one applies. We will change the definition of our HTML subset to say that an <a> element contains HTML without links; that new definition supplies its own versions of the <b> and <i> elements, which in turn allow only link-free content.

<define name="HTML">
    <interleave>
        <text/>
        <zeroOrMore ns="http://www.w3.org/1999/xhtml">
            <choice>
                <element name="b">
                    <ref name="HTML"/>
                </element>
                <element name="i">
                    <ref name="HTML"/>
                </element>
                <element name="a">
                    <attribute name="href"/>
                    <ref name="HTML_without_link"/>
                </element>
            </choice>
        </zeroOrMore>
    </interleave>
</define>

<define name="HTML_without_link">
    <interleave>
        <text/>
        <zeroOrMore ns="http://www.w3.org/1999/xhtml">
            <choice>
                <element name="b">
                    <ref name="HTML_without_link"/>
                </element>
                <element name="i">
                    <ref name="HTML_without_link"/>
                </element>
            </choice>
        </zeroOrMore>
    </interleave>
</define>
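
To see the effect, consider two hypothetical <info> fragments. The text and the second URL are invented for illustration, and both fragments assume the html prefix is bound to http://www.w3.org/1999/xhtml, as in the club database above. The first should validate against the revised definitions; the second should be rejected, because an <a> appears inside another <a>, even though a <b> sits between them:

```xml
<!-- Accepted: bold and italic text inside a link -->
<info>
    See <html:a href="http://www.someclub.com/"><html:b>our
    <html:i>new</html:i> website</html:b></html:a>.
</info>

<!-- Rejected: a link nested inside a link, even indirectly -->
<info>
    See <html:a href="http://www.someclub.com/"><html:b>our
    <html:a href="http://www.someclub.com/news">news</html:a>
    page</html:b></html:a>.
</info>
```

The second fragment fails because everything inside an <a> is matched against HTML_without_link, and that definition, including its own <b> and <i>, never offers <a> as a choice.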