Attributes and Character Entities

Attributes

We have seen that tags are like verbs or commands: <em> means “start emphasis,” </p> means “end a paragraph,” and <br /> means “move to a new line.”

If you remember your high school English, you know that verbs are modified by adverbs, which tell a verb where, when, how, or to what extent it should perform its action. If HTML tags are like verbs, we should have something like adverbs to allow us to modify the way they work. These “adverbs” are called attributes.

Let’s start by adding an attribute to an element you already know: the <abbr> element.

You are learning <abbr title="Hypertext Markup Language">HTML</abbr>.

Here is what it looks like. Hover the mouse over the HTML and see what happens.

You are learning HTML.

In this course, you will always write the attribute name in lowercase to be compatible with the XHTML syntax, though HTML syntax accepts either upper- or lowercase. You will also always enclose the attribute value in double or single quote marks. In HTML syntax, the quote marks are sometimes optional. This has long been a source of confusion for beginning authors, who had to worry about whether they had hit one of the special cases that didn’t need them.

You may put whitespace around the equal sign if you think it improves readability. You must put whitespace between the tag name and the attribute name.
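
As a small sketch of these rules, the lines below write the same <abbr> element three ways. Each one uses a lowercase attribute name, a quoted value, and whitespace between the tag name and the attribute. The surrounding <p> tags and the sample sentence are only there for illustration.

```html
<!-- Double quotes around the attribute value -->
<p>You are learning <abbr title="Hypertext Markup Language">HTML</abbr>.</p>

<!-- Single quotes work the same way -->
<p>You are learning <abbr title='Hypertext Markup Language'>HTML</abbr>.</p>

<!-- Optional whitespace around the equal sign -->
<p>You are learning <abbr title = "Hypertext Markup Language">HTML</abbr>.</p>
```

All three should display identically in the browser; the differences are purely a matter of how the source reads.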

Multiple Attributes

It is possible to have more than one attribute on a tag, just as you can have more than one adverb modifying a verb. However, at this point, there are no really great examples of this. We will return to this subject at a later point.

For the moment, remember that your attribute names must be unique (you can’t put two title attributes on an <abbr>), and you must have whitespace between attributes. Important: Attributes go on the opening tag only. You never put attributes on a closing tag; it’s unnecessary, and it’s not valid HTML.
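
Here is a minimal sketch of what two attributes on one tag might look like. It assumes the lang attribute, which this course has not introduced yet; it is used here only as a second attribute to sit beside title. The commented-out lines show the two mistakes described above.

```html
<!-- Two different attributes, separated by whitespace, on the opening tag -->
<p>You are learning <abbr title="Hypertext Markup Language" lang="en">HTML</abbr>.</p>

<!-- Not valid: the same attribute name appears twice
     <abbr title="Hypertext" title="Markup Language">HTML</abbr> -->

<!-- Not valid: an attribute on the closing tag
     <abbr>HTML</abbr title="Hypertext Markup Language"> -->
```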

Character Entities

You’ve learned quite a bit of HTML so far, and you may be tempted to write something like this:

<p>Wow! Now I know about the <dfn> and <q> elements!</p>

Of course, we all know what you intend, but that’s certainly not what you’ll get. Try it and find out. The problem is that the <dfn> and <q> are real tags. You need some way to tell the browser, “No, no, no. That less than sign is not the beginning of a tag; I really want a plain old less than sign.”

Whenever you need a real less than sign, you must use &lt;. That’s an ampersand symbol (&) followed by the letters lt, which stand for less than, followed by a semicolon. In fact, this is what we’ve been doing all along to show you these examples of HTML.

An ampersand followed by an abbreviation or a numeric code and a semicolon is called a character entity. These are listed in Appendix D of the book and also at this link.

The other character entity which is absolutely required is the &amp;, which produces an ampersand. Thus, if you want to say “x & y are < 15”, you must do it this way:

x &amp; y are &lt; 15

Although browsers are very forgiving, and will almost always let you get away with using a normal & or < rather than the character entity, the key word here is almost. Even if you do get away with it, it’s still not valid HTML.

You never need a character entity for a greater than sign, since there’s no question whether a > belongs to a tag or not. However, for the sake of symmetry, HTML provides the &gt; for a greater than symbol.

You could write the first example above in either of these two ways. They both work (try it and find out) but, for someone who’s used to HTML, the second one looks better.

<p>Wow! Now I know about the &lt;dfn> and &lt;q> elements!</p>
<p>Wow! Now I know about the &lt;dfn&gt; and &lt;q&gt; elements!</p>

As the book says, you can also use numeric codes for character entities. A less than sign can also be written as &#60; and an ampersand as &#38;. The # is required in valid HTML.
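
To make the comparison concrete, this short sketch writes the earlier “x & y are < 15” example both ways. The <p> tags are added only so the lines can be pasted into a page and tried.

```html
<!-- Named entities -->
<p>x &amp; y are &lt; 15</p>

<!-- Numeric codes for the same two characters -->
<p>x &#38; y are &#60; 15</p>
```

Both paragraphs should display the same text. The named forms are usually easier to read when you come back to edit the source.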

“Foreign” Characters

These character entities also let you put “foreign” characters into your documents. The word foreign is in quotation marks because it’s only foreign to people who speak English. Let’s say we want to display this:

<p>España y México</p>

If you type it exactly that way, and you’re using a PC, it will look great on a PC. If you display the same page on a Macintosh, it won’t look right, because the character codes for ñ and é aren’t the same on the two systems. If you use the character entities (either abbreviations or numeric), however, the page will display correctly on any system:

<p>Espa&ntilde;a y M&eacute;xico</p>
<p>Espa&#241;a y M&#233;xico</p>

Typographically Correct Quotes

Did you see those quote marks around the word “Foreign” in the preceding section? Those look much better than using straight up-and-down quotes like this: "Foreign"—especially when they are in a heading. You should always use these curly quotes and a curly apostrophe to give your documents a more professional look.

To get    Type this    or this
“         &ldquo;      &#8220;
”         &rdquo;      &#8221;
’         &rsquo;      &#8217;

In these entity names, &ldquo; stands for left double quote, &rdquo; stands for right double quote, and &rsquo; stands for right single quote. Is there a &lsquo;? Try it and find out!
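
As an illustration, here is a made-up sentence (not from the book) written with the named entities and then with the numeric codes.

```html
<!-- Named entities for the curly quotes and the apostrophe -->
<p>&ldquo;Don&rsquo;t use straight quotes,&rdquo; she said.</p>

<!-- The same sentence with numeric codes -->
<p>&#8220;Don&#8217;t use straight quotes,&#8221; she said.</p>
```

Both lines should display as “Don’t use straight quotes,” she said.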

In older HTML documents, you will see curly quotes produced using the character entities &#147; and &#148;. Please don’t do this. Those character codes are a Windows-only standard. The world of HTML and other technologies from the World Wide Web Consortium (W3C) is based on the Unicode standard, which you may read about at http://www.unicode.org. Instead, always use the Unicode numbers &#8220; for opening curly quotes and &#8221; for closing curly quotes.

As a matter of record, all the character entities with numbers &#128; through &#159; are Windows-standard, not Unicode-standard.

The Non-Breaking Space

Sometimes there are phrases or sections of text that you don’t want word-wrapped. For example, you might want a company name like “Rodriguez and Sons” to always stay together on one line. Or, you might want a mathematical equation like “x = y + z” not to be broken across lines. In this paragraph, they will break if you resize the browser window. If you’re viewing this in the browser, try it.

To tell the browser that you want a space, but you don’t want it to word wrap, you use the character entity &nbsp; which stands for non-breaking space, or its numeric equivalent, &#160;.

This paragraph has the phrase “Rodriguez and Sons” and the mathematical equation “x = y + z” written with non-breaking spaces. If you resize the browser window, you will see that the phrase and the equation always stay together; they are never broken onto separate lines. They are written this way:

Rodriguez&nbsp;and&nbsp;Sons
x&nbsp;=&nbsp;y&nbsp;+&nbsp;z

Yes, this is incredibly ugly, and it will give you fits when you are editing your documents. Your readers, however, will see text that works the way it ought to, and they will thank you for it.