Markup comes from the bad old days before word processors.
If you needed a brochure, you'd type it on a typewriter, and then
literally mark it up with a red pen to tell the typesetter what you
wanted it to look like. The typesetter would follow your instructions
and return a finished document to you:
How to Buy a Wrench
There are two kinds of wrenches: wrenches with fixed size, and adjustable wrenches.
In this instance, we're using markup not only to show how text should be presented (italic versus normal text), but also to tell how the document is structured: some of the words form a heading, the other words are just ordinary text.
The idea of using markup to impose structure on otherwise anonymous data is such a good one that people came up with a standardized way to create markup languages for general use. This method was called the Standard Generalized Markup Language, or SGML. SGML isn't really a language in and of itself; it is more of a “rulebook” that tells you how to develop markup languages. Any markup language that follows the SGML rulebook is called an application of SGML.
The most widely known application of SGML is a language used to mark up text for delivery and presentation on the World Wide Web. That language is HTML, the HyperText Markup Language. In HTML, we can mark up the example above to send to a web browser instead of a typesetter:
<h3>How to Buy a Wrench</h3> <p> There are two kinds of wrenches: wrenches with fixed size, and <i>adjustable</i> wrenches. </p>
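Because the structure is spelled out in the tags, a program can read it back without guessing. Here is a minimal sketch using Python's standard-library `html.parser` module; the class name `StructureCollector` is our own, purely for illustration. It records every start tag and pulls out the text of any heading.

```python
from html.parser import HTMLParser

# The wrench example, exactly as it would be sent to a browser.
MARKUP = (
    "<h3>How to Buy a Wrench</h3> <p> There are two kinds of wrenches: "
    "wrenches with fixed size, and <i>adjustable</i> wrenches. </p>"
)

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class StructureCollector(HTMLParser):
    """Records every start tag and collects the text inside heading elements."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.headings = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)
        if tag in HEADING_TAGS:
            self._in_heading = True
            self.headings.append("")

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._in_heading:
            self.headings[-1] += data


collector = StructureCollector()
collector.feed(MARKUP)
collector.close()
print(collector.start_tags)  # ['h3', 'p', 'i']
print(collector.headings)    # ['How to Buy a Wrench']
```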
There are many other applications of SGML, but they're mostly found in
large corporations and government agencies. That's because the SGML
rulebook is very complex, which makes it hard to learn.
For example, SGML allows optional opening and closing tags.
Quick: is
</li> required or not? How about
<body>?
Additionally, it's difficult (and expensive!) to develop tools
that can manage data that's marked up according to those rules.
While HTML is a good thing, it doesn't solve all our problems. Consider the following two tables. While the data is structured into rows and cells, there's nothing to tell you (other than your intuition) that the first table gives maximum and minimum temperatures, while the second table gives current and maximum capacities for water reservoirs.
<table border="1"> <tr> <td>Chicago</td><td>13</td><td>6</td> </tr> <tr> <td>Dallas</td><td>60</td><td>20</td> </tr> </table>
<table border="1"> <tr> <td>Calero</td><td>5538</td><td>10050</td> </tr> <tr> <td>Uvas</td><td>6095</td><td>9935</td> </tr> </table>
To solve the complexity issue, XML was designed as a subset of SGML. It eliminates the features that make SGML difficult to learn and parse while retaining 90% of the power of SGML. Tools that analyze and display XML are easier to write, and are widespread and inexpensive. Since XML is a subset of SGML, it lets you devise any set of tags you wish, thus solving the problem of differentiating what would otherwise be anonymous numbers:
<temperatures>
<city name="Chicago">
<max>13</max><min>6</min>
</city>
<city name="Dallas">
<max>60</max><min>20</min>
</city>
</temperatures>
<water-banks>
<reservoir name="Calero">
<current>5538</current><capacity>10050</capacity>
</reservoir>
<reservoir name="Uvas">
<current>6095</current><capacity>9935</capacity>
</reservoir>
</water-banks>
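With named tags, a program can ask for "Chicago's maximum" instead of "the second cell of the first row." As a small sketch, here is the temperature document read with Python's standard `xml.etree.ElementTree` module; the function name `read_temperatures` is hypothetical. The reservoir document could be read the same way by asking for `reservoir`, `current`, and `capacity`.

```python
import xml.etree.ElementTree as ET

TEMPERATURES = """<temperatures>
  <city name="Chicago">
    <max>13</max><min>6</min>
  </city>
  <city name="Dallas">
    <max>60</max><min>20</min>
  </city>
</temperatures>"""


def read_temperatures(xml_text):
    """Return {city name: (max, min)}, found by element name, not by position."""
    root = ET.fromstring(xml_text)
    return {
        city.get("name"): (int(city.findtext("max")), int(city.findtext("min")))
        for city in root.iter("city")
    }


print(read_temperatures(TEMPERATURES))
# {'Chicago': (13, 6), 'Dallas': (60, 20)}
```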
Consider the following example:
<p>Here is some <b>important</b> and <i>useful</i> information.</p>
The <p>
element is the parent of five children: the text “Here is some”,
the <b> element, the text “and”, the <i> element, and the text
“information.” Each of these children is a sibling of the other
children. Note that the <b> and
<i> elements also have children: each contains a single piece of text.
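You can see this parent/child/sibling tree directly by handing the paragraph to an XML parser. A short sketch using Python's standard `xml.dom.minidom`, which exposes each node's children as `childNodes`:

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<p>Here is some <b>important</b> and <i>useful</i> information.</p>"
)
p = doc.documentElement

# Describe each child of <p>: text nodes by content, elements by tag name.
children = [
    f"text {node.data!r}" if node.nodeType == node.TEXT_NODE else f"element <{node.tagName}>"
    for node in p.childNodes
]
for child in children:
    print(child)
# text 'Here is some '
# element <b>
# text ' and '
# element <i>
# text ' information.'

# The <b> element has one child of its own: the text "important".
print(p.childNodes[1].firstChild.data)  # important
```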
HTML didn't care whether you wrote your element names or attribute names in uppercase or lowercase. XHTML is case-sensitive; all element and attribute names must be lowercase.
Wrong:
<OL Type="A"> <li>item one</li> <li>item two</LI> </oL>
Right:
<ol type="A"> <li>item one</li> <li>item two</li> </ol>
Notice that the attribute value can be uppercase. Some people use uppercase element names because they stand out better from the surrounding text, but it turns out that all lowercase is easier to read. Hey, it's an imperfect world.
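Any XML parser will show you the case-sensitivity at work. In this sketch with Python's `xml.etree.ElementTree` (the helper name `parses` is our own), the mixed-case list is rejected because `</LI>` does not close `<li>`. Keep in mind that a plain XML parser only insists that start and end tags match; the requirement that names be *lowercase* comes from XHTML itself and is checked by a validator.

```python
import xml.etree.ElementTree as ET

MIXED_CASE = '<OL Type="A"> <li>item one</li> <li>item two</LI> </oL>'
LOWER_CASE = '<ol type="A"> <li>item one</li> <li>item two</li> </ol>'


def parses(markup):
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(parses(MIXED_CASE))  # False
print(parses(LOWER_CASE))  # True

root = ET.fromstring(LOWER_CASE)
print(root.get("type"))         # A     (attribute values keep their case)
print(root.get("Type"))         # None  (attribute names are case-sensitive)
print(len(root.findall("LI")))  # 0     (element names are case-sensitive too)
```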
If you have nested elements (one element inside another), you must end the inner element before the outer one. Older browsers do their best to display improperly nested HTML; XML tools will reject any XHTML document that has a nesting error.
Wrong:
<b>Outer and <i>inner</b> elements</i>
Right:
<b>Outer and <i>inner</i> elements</b>
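XML tools catch nesting errors with a stack: every start tag is pushed, and every end tag must match whatever is on top. The sketch below is a deliberately simplified toy version of that idea (the name `first_nesting_error` is hypothetical, and the regular expression ignores comments, CDATA sections, and `>` inside attribute values); real parsers tokenize far more carefully.

```python
import re

# Start tags, end tags, and self-closing tags. Simplified on purpose.
TAG = re.compile(r"<(/?)([A-Za-z][\w.-]*)[^>]*?(/?)>")


def first_nesting_error(markup):
    """Return a description of the first nesting error, or None if tags nest properly."""
    open_tags = []  # stack of element names that are still open
    for match in TAG.finditer(markup):
        is_end, name, is_empty = match.group(1), match.group(2), match.group(3)
        if is_empty:
            continue  # something like <br /> opens and closes in one step
        if not is_end:
            open_tags.append(name)
        elif not open_tags:
            return f"</{name}> has no matching start tag"
        elif open_tags[-1] != name:
            return f"expected </{open_tags[-1]}> but found </{name}>"
        else:
            open_tags.pop()
    if open_tags:
        return f"<{open_tags[-1]}> is never closed"
    return None


print(first_nesting_error("<b>Outer and <i>inner</b> elements</i>"))
# expected </i> but found </b>
print(first_nesting_error("<b>Outer and <i>inner</i> elements</b>"))
# None
```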
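Attribute values must always be quoted (with either double or single quotes), an element may not repeat an attribute, and there must be white space between one attribute and the next.
Wrong:
<a href=page2.html> <a href="page2.html" name="b" href="abc.html"> <a href="page2.html"name="b">
Right:
<a href="page2.html"> <a href="page2.html" name='b'>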
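To see these attribute rules enforced, this sketch feeds each start tag to Python's `xml.etree.ElementTree`. We have added `link</a>` to every tag so each one is a complete element; the names `parses`, `WRONG`, and `RIGHT` are our own. The unquoted value, the duplicated `href`, and the missing space are all rejected, while single and double quotes yield the same kind of attribute value.

```python
import xml.etree.ElementTree as ET

WRONG = [
    '<a href=page2.html>link</a>',                             # unquoted value
    '<a href="page2.html" name="b" href="abc.html">link</a>',  # duplicate attribute
    '<a href="page2.html"name="b">link</a>',                   # no space between attributes
]
RIGHT = [
    '<a href="page2.html">link</a>',
    "<a href=\"page2.html\" name='b'>link</a>",                # single quotes are fine
]


def parses(markup):
    """Return True if an XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


rejected = [markup for markup in WRONG if not parses(markup)]
attribs = [ET.fromstring(markup).attrib for markup in RIGHT]
print(len(rejected))  # 3
print(attribs)        # [{'href': 'page2.html'}, {'href': 'page2.html', 'name': 'b'}]
```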
Finally, all attributes must have both a name and a value. For HTML attributes that could be written without a value (so-called minimized attributes), you must repeat the attribute name as the value. Here are some examples:
Wrong:
<dl compact> <option selected> <td nowrap>
Right:
<dl compact="compact"> <option selected="selected"> <td nowrap="nowrap">
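If you are converting a lot of old HTML, the repetitive part of this rule can be automated. Below is a rough sketch of a hypothetical helper, `expand_minimized`, that uses regular expressions to expand a short, illustrative list of minimized attributes. It is only suitable for simple, predictable markup: it does not understand attribute values, so a title like `title="a compact list"` would be mangled.

```python
import re

# An illustrative (not exhaustive) list of HTML attributes that could be minimized.
BOOLEAN_ATTRIBUTES = ("compact", "selected", "nowrap", "checked", "disabled",
                      "readonly", "multiple", "noshade", "ismap", "defer")

START_TAG = re.compile(r"<[A-Za-z][^<>]*>")
# A listed attribute name preceded by white space and NOT followed by "=".
MINIMIZED = re.compile(r"(\s)(" + "|".join(BOOLEAN_ATTRIBUTES) + r")(?=[\s/>])")


def expand_minimized(html):
    """Rewrite <td nowrap> as <td nowrap="nowrap">, only inside start tags."""
    def fix_tag(tag_match):
        return MINIMIZED.sub(lambda m: f'{m.group(1)}{m.group(2)}="{m.group(2)}"',
                             tag_match.group(0))
    return START_TAG.sub(fix_tag, html)


print(expand_minimized("<dl compact> <option selected> <td nowrap>"))
# <dl compact="compact"> <option selected="selected"> <td nowrap="nowrap">
```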
Every element must be closed: unlike HTML, XHTML never lets you leave out an end tag such as </p>. This is a big one.
Wrong:
<p> Paragraph one <p> Paragraph two
Right:
<p> Paragraph one </p> <p> Paragraph two </p>
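The difference between a forgiving HTML reader and a strict XML tool is easy to demonstrate. In this sketch we wrap both versions in a `<body>` element (our addition, so each has a single root) and give them to Python's lenient `html.parser` and to the strict `xml.etree.ElementTree`. The class name `TagCounter` is hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

UNCLOSED = "<body> <p> Paragraph one <p> Paragraph two </body>"
CLOSED = "<body> <p> Paragraph one </p> <p> Paragraph two </p> </body>"


class TagCounter(HTMLParser):
    """Counts start tags; HTMLParser never complains about missing end tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.count += 1


lenient = TagCounter()
lenient.feed(UNCLOSED)
lenient.close()
print(lenient.count)  # 3: the unclosed paragraphs are accepted silently

try:
    ET.fromstring(UNCLOSED)
    xml_verdict = "accepted"
except ET.ParseError:
    xml_verdict = "rejected"
print(xml_verdict)  # rejected

body = ET.fromstring(CLOSED)
print([p.text for p in body.findall("p")])  # [' Paragraph one ', ' Paragraph two ']
```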
What, then, are we to do with elements like <br>
and <img>, which don't have any content, and thus
don't need any closing tags in HTML? We can do one of
two things: we can put in a closing tag, or we can use a
“shorthand form” by placing a / before the
> of the element, as in the following examples.
<br></br>
<br />
<img src="wsp.png" alt="WaSP logo"></img>
<img src="wsp.png" alt="WaSP logo" />
You'll note that we've put a space before the slash; this keeps older
browsers from freaking out when they encounter one of these shorthand
elements. You should use the shorthand form only for
empty elements (elements such as <br> and <img>, which can never have content). If you want a paragraph
with no text in it, use the opening-and-closing-tag form. This reminds
people who read your source that the <p> element
is still a container element—one that can have content, but doesn't happen to at this moment. This, too,
will prevent older browsers from freaking out.
Avoid:
<p />
Use instead:
<p></p>
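To an XML parser, the two spellings of an empty element mean exactly the same thing; the choice is purely for human readers and older browsers. This sketch with Python's `xml.etree.ElementTree` (the helper name `same_tree` is ours) checks that each long form and short form produce identical elements. It also shows that ElementTree writes the short form by default, and that its `short_empty_elements=False` option keeps `<p></p>` for container elements like the paragraph.

```python
import xml.etree.ElementTree as ET

PAIRS = [
    ("<br></br>", "<br />"),
    ('<img src="wsp.png" alt="WaSP logo"></img>', '<img src="wsp.png" alt="WaSP logo" />'),
    ("<p></p>", "<p />"),
]


def same_tree(a_markup, b_markup):
    """True if both strings parse to the same tag, attributes, text, and child count."""
    a, b = ET.fromstring(a_markup), ET.fromstring(b_markup)
    return (a.tag, a.attrib, a.text, len(a)) == (b.tag, b.attrib, b.text, len(b))


results = [same_tree(long_form, short_form) for long_form, short_form in PAIRS]
print(results)  # [True, True, True]

para = ET.Element("p")
print(ET.tostring(para, encoding="unicode"))                              # <p />
print(ET.tostring(para, encoding="unicode", short_empty_elements=False))  # <p></p>
```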
You cannot put two hyphens in a row inside an HTML comment, but you may use equal signs, underscores, or dashes with spaces between them.
Wrong:
<!-- Comments ------- like this -->
Right:
<!-- Comments - - - - like this --> <!-- Comments ======= like this --> <!-- Comments _______ like this -->
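If your comments are generated by a program, it is easy to build them safely. This sketch uses a hypothetical helper, `make_comment`, that spaces out every run of hyphens, then confirms the result with Python's `xml.etree.ElementTree`. It also adds a space when the text ends in a hyphen, because XML forbids a comment body ending in `-` (that would produce `--->`).

```python
import xml.etree.ElementTree as ET


def make_comment(text):
    """Build an XML comment, spacing out hyphens so that '--' never appears inside."""
    while "--" in text:
        text = text.replace("--", "- -")
    if text.endswith("-"):
        text += " "  # the body may not end with '-', since '--->' contains '--'
    return f"<!--{text}-->"


def is_well_formed(markup):
    """Return True if an XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


bad = "<p><!-- Comments ------- like this --></p>"
good = "<p>" + make_comment(" Comments ------- like this ") + "</p>"
print(is_well_formed(bad))   # False
print(good)                  # <p><!-- Comments - - - - - - - like this --></p>
print(is_well_formed(good))  # True
```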
You can't put a < or & directly into
the text of your XHTML. You must instead use &lt; and
&amp;. And yes, the semicolon at the end of
each of these entities is
required! You don't have to encode a greater than sign as
&gt;, as it almost never causes any ambiguity (the one exception is the sequence ]]> in text).
However, we recommend that you do so; this will keep your markup
looking symmetrical.
Wrong:
<p> He & I graphed the inequality x + 3 < y </p>
Right:
<p> He &amp; I graphed the inequality x + 3 &lt; y </p>
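You rarely need to do this escaping by hand. Python's standard `xml.sax.saxutils.escape` replaces `&`, `<`, and `>` with their entities (so it also follows the recommendation to encode `>`), and a parser hands back the original characters when it reads the result:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, unescape

raw = "He & I graphed the inequality x + 3 < y"
safe = escape(raw)
print(safe)  # He &amp; I graphed the inequality x + 3 &lt; y

# The escaped text parses, and the parser returns the original characters.
paragraph = ET.fromstring(f"<p> {safe} </p>")
print(paragraph.text == f" {raw} ")  # True
print(unescape(safe) == raw)         # True
```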
By the way, all XML processors also accept
&quot; as a synonym for the double quote character. This
lets you do things like:
<img src="hello.jpg" alt="Mrs. O'Hara says &quot;Hi&quot; to us!" />
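Python's standard library can pick the quoting for you. `xml.sax.saxutils.quoteattr` returns an attribute value with its surrounding quotes, and when the text contains both kinds of quote it wraps it in double quotes and encodes the inner double quotes as `&quot;`, exactly as in the example. The round trip through `xml.etree.ElementTree` confirms the parser recovers the original text.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

alt_text = "Mrs. O'Hara says \"Hi\" to us!"
quoted = quoteattr(alt_text)
print(quoted)  # "Mrs. O'Hara says &quot;Hi&quot; to us!"

img = ET.fromstring(f'<img src="hello.jpg" alt={quoted} />')
print(img.get("alt") == alt_text)  # True
```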