Validating Web Pages

The Problem

Up until now, you have been experimenting with HTML by putting tags into your documents, saving them, and viewing them in a browser. Once the page looks OK, you move on to the next page.

When you’re just starting out with HTML, this is a great approach. However, if you intend to do web design on a professional level, you can run into problems.

The Browser Forgives a Lot of Errors

Most browsers have been written on the assumption that the people who write web pages will make a lot of mistakes. If you write something that doesn’t follow the rules, the browser will do its best to make sense of it and display it for you.

For example, you really shouldn’t have a list item (<li>) outside of a list such as an unordered list (<ul>), but if you put one on your page, the browser will show you a bulleted item anyway. When the markup isn’t written correctly, the browser has to decide what it should look like.

But It Looks OK! So Who Cares?

Let’s say you have a flat tire. Let’s say that when you put on the spare tire, you attach it with only two of the four nuts that hold the wheel in place. After you put the hubcap back on, it will look OK, and it’s good enough to get you to the tire store to buy a new tire.

Of course, you don’t do that, because if you keep driving with the wheel held on that way, you can expect trouble in the long run. Certainly, no professional car mechanic would attach a tire that way.

Similarly, if you write “tag soup” (sloppy markup that breaks the rules), it may look OK in one browser but not work well in others. When new browsers are released, they may make different decisions about how to handle your bad tags, so your page might look different.

That’s why you want to write valid HTML—so that you make the decisions, not the browser.

What’s This “valid” Stuff Anyway?

The word “valid” is just a fancy way of saying that you are following all the rules set up by the people who designed HTML. These are rules like “if you want a list item, it had better be inside a list” or “every opening tag has to have a closing tag.” The folks at the World Wide Web Consortium (W3C) have set up a web page that will validate your pages: it tells you whether you’re following the rules correctly.

Setting Up Your Documents

In order for the validator to do its job correctly, you have to tell it three things:

  1. Which version of HTML you are using. (HTML has changed a lot over the years.)
  2. Which language your document is mainly written in.
  3. Which set of characters you are using on your page. (Is it the English alphabet, or Russian, or Chinese, or what?)

You tell the validator which version of HTML you are using by putting this line at the very beginning of your file, even before the opening <html> tag. This particular line means you are using HTML5.

<!DOCTYPE html>

You tell the validator what the document’s main language and XML “namespace” are by adding attributes to the opening <html> tag.

<html xmlns="http://www.w3.org/1999/xhtml"
    xml:lang="en" lang="en">

You tell the validator what “character set” (English, Russian, Vietnamese, etc.) your document uses by putting the following line right after the opening <head> tag:

<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
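
If you are curious how a program can check these three settings, here is a minimal sketch in Python, using only the standard library’s html.parser module. It is an illustration, not how the W3C validator actually works: the names SetupChecker and check_setup are made up for this example, and it only recognizes the exact forms shown above.

```python
from html.parser import HTMLParser


class SetupChecker(HTMLParser):
    """Records whether a page has the three setup items the validator needs."""

    def __init__(self):
        super().__init__()
        self.seen_markup = False    # True once any doctype or tag has been read
        self.doctype_first = False  # was <!DOCTYPE html> the first thing in the file?
        self.has_lang = False       # does the <html> tag carry a lang attribute?
        self.has_charset = False    # does a <meta> tag's content mention a charset?

    def handle_decl(self, decl):
        # html.parser passes "<!DOCTYPE html>" in as the string "DOCTYPE html".
        if not self.seen_markup and decl.lower() == "doctype html":
            self.doctype_first = True
        self.seen_markup = True

    def handle_starttag(self, tag, attrs):
        # Also called for shorthand tags such as <meta ... />.
        self.seen_markup = True
        attrs = dict(attrs)
        if tag == "html" and attrs.get("lang"):
            self.has_lang = True
        if tag == "meta" and "charset=" in (attrs.get("content") or "").lower():
            self.has_charset = True


def check_setup(markup):
    """Return a list of missing setup items; an empty list means all three are there."""
    checker = SetupChecker()
    checker.feed(markup)
    checker.close()
    problems = []
    if not checker.doctype_first:
        problems.append("<!DOCTYPE html> is not the first line")
    if not checker.has_lang:
        problems.append("<html> has no lang attribute")
    if not checker.has_charset:
        problems.append("no character set declared in a <meta> tag")
    return problems


page = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"
    xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>My Page</title>
</head>
<body>
<p>Hello!</p>
</body>
</html>"""

print(check_setup(page))                          # []
print(check_setup("<html><head></head></html>"))  # all three problems
```

The first call prints an empty list; the second reports all three missing items.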

Using the Validator Web Page

Once you have your document ready, go to http://validator.nu/ or http://validator.w3.org. The part of the page we are interested in lets you upload files from your computer. Choose File Upload, and set up your screen so it matches the following screenshot.

Screenshot of validator with buttons highlighted

First, click the Browse... button to select the file you want to validate. This will bring up the standard file chooser dialog. Once you locate the file, click the Validate button (on validator.w3.org it is labeled Check), and the validator will tell you whether your file is valid.

If your page is valid, you’ll see something like this. If you have warnings, do not ignore them—read the warnings and fix the problems!

Display from a valid file

If you have a mistake in your file, like this one:

<p>
This is a paragraph where we 
<i>turned on italics, but
forgot to turn it off!
</p>

You will get error messages from the validator.

Common Errors

Close Your Quote Marks and Angle Brackets

Make sure every opening quote has a closing quote, and every opening < has a closing >.

Incorrect
<img src="joe.jpg alt="Picture of Joe" />
<p Have a good one!</p>
Correct
<img src="joe.jpg" alt="Picture of Joe" />
<p>Have a good one!</p>
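
A validator spots these mistakes by keeping track of which bracket or quote mark is still waiting to be closed. The following Python sketch shows the idea; the function name find_unclosed is hypothetical, and it is much simpler than a real validator.

```python
def find_unclosed(markup):
    """Report tags whose closing > or closing quote mark never shows up.

    A simplified sketch: everything from < to > counts as a tag, and quote
    marks are tracked only inside tags (apostrophes in plain text are fine).
    """
    problems = []
    in_tag = False
    quote = None   # the quote character currently open inside a tag, if any
    tag_start = 0  # position of the < that began the current tag
    for i, ch in enumerate(markup):
        if not in_tag:
            if ch == "<":
                in_tag, tag_start = True, i
        elif quote:
            if ch == quote:  # the matching quote ends the attribute value
                quote = None
        elif ch in "\"'":
            quote = ch  # an attribute value begins
        elif ch == "<":  # a new tag started before the old one was closed
            problems.append(f"tag at position {tag_start} has no closing >")
            tag_start = i
        elif ch == ">":
            in_tag = False
    if quote:
        problems.append(f"tag at position {tag_start} has an unclosed {quote} quote")
    elif in_tag:
        problems.append(f"tag at position {tag_start} has no closing >")
    return problems


print(find_unclosed('<img src="joe.jpg alt="Picture of Joe" />'))
print(find_unclosed("<p Have a good one!</p>"))
print(find_unclosed('<img src="joe.jpg" alt="Picture of Joe" />'))  # []
```

The two incorrect lines each produce one problem, and the correct lines produce none. Notice that an unclosed quote swallows everything after it, so a checker like this can only notice the problem when it runs out of text, which may be far from the actual typo.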

Spelling Counts

If you misspell a tag name or an attribute name, the validator will say it’s wrong. (The browser will usually just ignore that element or attribute.) Beware especially of typing scr instead of src in your image elements! Remember, the src attribute gives the source file for the graphic!

Incorrect
<image src="joe.jpg" alt="Picture of Joe" />
<image scr="joe.jpg" alt="Picture of Joe" />
Correct
<img src="joe.jpg" alt="Picture of Joe" />
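
Here is a small Python sketch of how spelling mistakes can be caught by comparing names against a list of known ones. The KNOWN table, SpellingChecker and check_spelling are all hypothetical, and the table is deliberately tiny; a real validator checks against the full HTML specification.

```python
from html.parser import HTMLParser

# A deliberately tiny, hypothetical list of known elements and the
# attributes each one may have. A real validator uses the whole HTML spec.
KNOWN = {
    "img": {"src", "alt", "width", "height", "id", "class"},
    "p": {"id", "class"},
    "a": {"href", "id", "class"},
}


class SpellingChecker(HTMLParser):
    """Collects element and attribute names that are not in KNOWN."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # html.parser has already lowercased tag and attribute names.
        if tag not in KNOWN:
            self.problems.append(f"unknown element <{tag}>")
            return
        for name, _value in attrs:
            if name not in KNOWN[tag]:
                self.problems.append(f"unknown attribute {name} on <{tag}>")


def check_spelling(markup):
    checker = SpellingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(check_spelling('<image src="joe.jpg" alt="Picture of Joe" />'))  # ['unknown element <image>']
print(check_spelling('<img scr="joe.jpg" alt="Picture of Joe" />'))    # ['unknown attribute scr on <img>']
print(check_spelling('<img src="joe.jpg" alt="Picture of Joe" />'))    # []
```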

The rest of these errors are ones that the browser will overlook; it will just try to do the best it can. The validator, on the other hand, is a big meanie, and won’t let you get away with them!

Use Lowercase

XHTML is case-sensitive; all element and attribute names must be lowercase. HTML5 doesn’t care; you can mix upper and lowercase as much as you like. However, to be consistent, stick with the XHTML syntax, and use lowercase only. The letter E in Example can stay uppercase because it is the attribute value, not the attribute name.

Incorrect
<OL Class="Example">
<li>item one</li>
<li>item two</LI>
</oL>
Correct
<ol class="Example">
<li>item one</li>
<li>item two</li>
</ol>
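
A case check has to look at the raw text, because Python’s html.parser module lowercases every tag and attribute name before reporting it. This sketch uses regular expressions instead; the patterns and the function name find_uppercase_names are assumptions for illustration, and they skip tricky cases such as a > inside a quoted attribute value.

```python
import re

# A whole tag, from < to the next > (simplified: a > inside a quoted
# attribute value would end the match too early).
WHOLE_TAG = re.compile(r"<[^>]*>")
# The element name right after < or </ (must start with a letter).
TAG_NAME = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)")
# An attribute name: preceded by whitespace and followed by =.
ATTR_NAME = re.compile(r"\s([A-Za-z][-A-Za-z0-9:]*)\s*=")


def find_uppercase_names(markup):
    """Return element and attribute names that contain uppercase letters."""
    problems = []
    for match in WHOLE_TAG.finditer(markup):
        tag_text = match.group()
        names = TAG_NAME.findall(tag_text) + ATTR_NAME.findall(tag_text)
        problems.extend(name for name in names if name != name.lower())
    return problems


bad = """<OL Class="Example">
<li>item one</li>
<li>item two</LI>
</oL>"""
print(find_uppercase_names(bad))  # ['OL', 'Class', 'LI', 'oL']
```

It flags OL, Class, LI and oL, but not Example, which is a value rather than a name. The <!DOCTYPE html> line is not flagged either, because a name must start with a letter right after < or </.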

Nest Elements Properly

If you have nested elements (one element inside another), you must end the inner element before the outer one. Browsers do their best to display improperly nested HTML; the validator will reject any HTML document that has a nesting error.

Incorrect
<b>Outer and <i>inner</b> elements</i> nested incorrectly.
Correct
<b>Outer and <i>inner</i> elements</b> nested correctly.
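
A validator can catch nesting errors by keeping a stack of open elements: each opening tag is pushed onto the stack, and each closing tag must match the element on top. Here is a minimal Python sketch of that technique using the standard html.parser module; NestingChecker and check_nesting are hypothetical names. Like the rest of this page, it treats every element as needing a closing tag unless it uses the shorthand />.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Keeps a stack of open elements and reports closing-tag mistakes."""

    def __init__(self):
        super().__init__()
        self.stack = []     # names of elements opened but not yet closed
        self.problems = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # shorthand such as <br /> opens and closes in one step

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()  # closes the innermost open element: correct
        elif tag in self.stack:
            self.problems.append(f"</{tag}> found while <{self.stack[-1]}> is still open")
            # Recover: drop the inner elements, then <tag> itself.
            while self.stack.pop() != tag:
                pass
        else:
            self.problems.append(f"</{tag}> has no matching opening tag")


def check_nesting(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    # Anything still on the stack was never closed.
    return checker.problems + [f"<{tag}> is never closed" for tag in checker.stack]


print(check_nesting("<b>Outer and <i>inner</b> elements</i> nested incorrectly."))
print(check_nesting("<b>Outer and <i>inner</i> elements</b> nested correctly."))  # []
```

On the incorrect line it reports that </b> was found while <i> was still open, and then that </i> has no matching opening tag. The same check catches the forgotten italics example from earlier and paragraphs whose </p> is missing. Following the rules on this page, it also treats a <br> written without the shorthand /> as never closed.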

Rules for Attributes

  1. In XHTML syntax, attribute values must be enclosed in quote marks. You can use either double or single quotes to enclose the value of an attribute, but they must be there. In HTML syntax, you can sometimes leave them out. But rather than have you memorize rules about when you can or can’t leave out the quotes, just use them all the time. It’s never wrong.
  2. Attribute names must be unique: the same attribute can appear only once in a tag.
  3. Attributes must be separated by whitespace (spaces, tabs, or new lines).

Incorrect
<a href=page2.html>
<a href="page3.html" id="b" href="abc.html">
<a href="page4.html"id="c">
Correct
<a href="page2.html">
<a href="page3.html" id='b'>
<a href="page4.html" id="c">
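
The three rules can be checked one start tag at a time. This Python sketch splits a start tag into attributes and reports unquoted values, repeated names, and attributes jammed together without whitespace; the function check_attributes and its regular expressions are assumptions for illustration, not a real validator’s code.

```python
import re

# The tag name right after "<".
TAG_NAME = re.compile(r"<\s*([^\s/>]+)")
# One attribute: a name, optionally followed by = and a value that is
# double-quoted, single-quoted, or bare (unquoted).
ATTRIBUTE = re.compile(r"""([^\s"'=/>]+)(?:\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+))?""")


def check_attributes(start_tag):
    """Apply the three attribute rules to one start tag, e.g. '<a href="x">'."""
    name_match = TAG_NAME.match(start_tag)
    if name_match is None:
        raise ValueError("expected a start tag such as <a ...>")
    rest = start_tag[name_match.end():]  # everything after the tag name
    problems, seen = [], set()
    for m in ATTRIBUTE.finditer(rest):
        name, value = m.group(1).lower(), m.group(2)
        # Rule 3: the character before each attribute must be whitespace.
        if not rest[m.start() - 1].isspace():
            problems.append(f"no space before attribute {name}")
        # Rule 2: each attribute name may appear only once.
        if name in seen:
            problems.append(f"attribute {name} appears twice")
        seen.add(name)
        # Rule 1: a value must start with a quote mark.
        if value is not None and value[0] not in "\"'":
            problems.append(f"value of {name} is not quoted")
    return problems


for tag in ['<a href=page2.html>',
            '<a href="page3.html" id="b" href="abc.html">',
            '<a href="page4.html"id="c">']:
    print(tag, check_attributes(tag))
```

Each of the three incorrect tags above produces exactly one problem, and the three correct tags pass.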

All Opening Tags Must Have Closing Tags

In XHTML, any element that contains text between the opening and closing tag (like paragraphs, bold, italic, list items, etc.) has to have both tags. In HTML, many (but not all) opening tags have optional closing tags. Again, rather than have you memorize which ones are optional, always use closing tags. Then you don’t have to worry.

Incorrect
<p>
Paragraph one
<p>
Paragraph two
Correct
<p>
Paragraph one
</p>
<p>
Paragraph two
</p>

What, then, are we to do with elements like <br> and <img>, which don’t contain text? In XHTML syntax they still need closing tags, so we can do one of two things: we can put in a closing tag, or we can use a “shorthand form” by placing a / before the > of the element, as in the following examples. (An HTML5 validator will reject separate closing tags like </br> and </img>, so the shorthand form is the safer choice.)

<br></br>
<br />
<img src="joe.jpg" alt="Picture of Joe"></img>
<img src="joe.jpg" alt="Picture of Joe" />

You’ll note that we’ve put a space before the slash; this keeps really old browsers from freaking out when they encounter one of these shorthand elements. In HTML syntax, you can leave out the closing slash, but to be consistent, we will use the slash as if it were XHTML syntax.

You Must Encode < and & Symbols

The less than sign is special—it tells the browser that you are about to start a tag. The ampersand symbol (&) is also a special symbol for HTML. You can’t put a < or & directly into the text of your document when you are using XHTML syntax. You must instead use &lt; and &amp;. And yes, the semicolon at the end is required! You don’t have to write a greater than sign as &gt;, as it never causes any ambiguity. However, we recommend that you do so; this will keep your markup looking symmetrical.

The HTML syntax will sometimes let you put in an ampersand all by itself. Again, rather than having you memorize the conditions when you can or can’t do this, always use &amp;, which is guaranteed to work.

Incorrect
<p>
He & I graphed the
inequality x + 3 < y
</p>
Correct
<p>
He &amp; I graphed the
inequality x + 3 &lt; y
</p>
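
If you generate pages with a program, you don’t have to do this replacement by hand. Python’s standard library provides html.escape for exactly this job; here is a short sketch using the sentence from the example above.

```python
import html

text = "He & I graphed the inequality x + 3 < y"
# quote=False replaces only &, < and >; quote marks in ordinary
# paragraph text are left alone. & is replaced first, so the "&" in
# "&lt;" is not escaped a second time.
escaped = html.escape(text, quote=False)
print(escaped)                                # He &amp; I graphed the inequality x + 3 &lt; y
print(html.escape("y > x + 3", quote=False))  # y &gt; x + 3
```

Escape text only once: running html.escape on text that is already escaped turns &amp; into &amp;amp;.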