Fun with Character and Entity References and Why You Should Use Them

Character references? Entity references? Am I referencing some obscure geek joke? No, I’m actually talking about an often-overlooked aspect of the push toward a more semantic Web.

What are Character and Entity References?

First, let me explain what character and entity references are actually referencing.

Written languages the world over use punctuation marks to convey features of spoken communication, such as pauses and emphasis. Beyond punctuation, other symbols describe relationships between parts of a written phrase. One very common example is the dash used to denote a range of numbers. In the sentence “George Orwell (1903–1950) was a British author,” the dash between the years denotes the span of years in which George Orwell lived.

Often, this is written with a plain hyphen, which looks like this: -. A hyphen’s purpose is to break long words at the end of a line in print or to join two words into a single compound. The dash in the lifespan example above is not a hyphen. It is called an n-dash (or en dash), named for its length, which is roughly the width of the letter N.

The other common dash seen in prose is the one used to break the flow of a sentence. This is called an m-dash (or em dash), which looks like this: —. It is named for its width, approximately that of the letter M. I’ve often seen it written online as two hyphens, like this: --. You probably have, too.

Ordinarily, word processors like Microsoft Word take care of this formatting for you. If you type -- and then hit the spacebar in Word, the program automatically converts your hyphens into a dash. But if you copy and paste Word’s m-dash into an HTML document, the browser may misread the character (typically because the page’s declared character encoding doesn’t match the bytes that were pasted) and display garbage in its place. This is where character references come into play.
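To make the encoding mismatch concrete, here is a small Python sketch. It assumes the pasted text was stored as Windows-1252 (`cp1252`, a legacy Western encoding Word has commonly used) while the page is read as UTF-8, or the reverse; your own setup may differ. It shows the same m-dash as bytes in both encodings, what each looks like when misread as the other, and why an ASCII-only reference sidesteps the problem.

```python
# Sketch of an encoding mismatch, assuming text saved as Windows-1252 (cp1252)
# is read as UTF-8, or UTF-8 text is read as cp1252.
em_dash = "\u2014"  # the m-dash character

cp1252_bytes = em_dash.encode("cp1252")  # one byte: 0x97
utf8_bytes = em_dash.encode("utf-8")     # three bytes: E2 80 94

# cp1252 bytes read as UTF-8: 0x97 is invalid on its own, so the decoder
# substitutes the replacement character U+FFFD.
misread_as_utf8 = cp1252_bytes.decode("utf-8", errors="replace")

# UTF-8 bytes read as cp1252: each byte becomes its own character,
# producing three characters of mojibake instead of one dash.
misread_as_cp1252 = utf8_bytes.decode("cp1252")

# A numeric character reference is pure ASCII, so every common encoding
# stores it as the same bytes.
reference = "&#8212;"

print(cp1252_bytes, utf8_bytes)  # b'\x97' b'\xe2\x80\x94'
print(reference.isascii())       # True
```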

Okay, so how do I use them?

Character references are special codes that refer to a character by number, namely its position in Unicode. For example, the character reference for an m-dash is &#8212;, that is, an ampersand, the octothorpe (#, also known as the hash or number sign), the digits 8, 2, 1, and 2, and finally a semicolon to mark the end of the reference. All character and entity references begin with an ampersand (&) and end with a semicolon (;).
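Because the number in a character reference is just the character’s Unicode code point written in decimal, you can compute it yourself. This Python sketch uses only the standard library’s `ord()` and `html.unescape()`; the helper name `numeric_reference` is made up for illustration.

```python
import html


def numeric_reference(ch: str) -> str:
    """Return the decimal numeric character reference for a single character."""
    # ord() gives the character's Unicode code point as an integer.
    return f"&#{ord(ch)};"


m_dash = "\u2014"
ref = numeric_reference(m_dash)
print(ref)  # &#8212;

# html.unescape resolves the reference back to the character,
# much as a browser does when it renders the page.
print(html.unescape(ref) == m_dash)  # True
```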

Entity references are almost identical to character references, except that they refer to specific characters by name rather than by number. For example, the entity reference for an m-dash is &mdash;. Entity references were created to make character references easier to remember.
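Python’s standard library ships the HTML 4 table of entity names, which makes it easy to check that a name and a number point at the same character. A quick sketch:

```python
import html
from html.entities import name2codepoint

# name2codepoint maps HTML 4 entity names (without "&" and ";") to code points.
code_point = name2codepoint["mdash"]
print(f"&mdash; is the same character as &#{code_point};")
# &mdash; is the same character as &#8212;

# Both spellings unescape to the identical m-dash character.
print(html.unescape("&mdash;") == html.unescape("&#8212;"))  # True
```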

A word of warning! Many HTML entity references are not recognized by other XML-based formats (most notably SVG). For maximum document portability, prefer numeric character references over entity references. That said, I often use entity references simply because they’re easier to remember. If I want to port my document to some other format, though, I’ll need to replace the entity references with their numeric equivalents. This XML.com article is great if you’re looking for more information on typography in general or how it pertains to SVG specifically.
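If you write with entity references and later need to move the text into SVG or another XML format, the replacement can be automated. XML predefines only five entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`), so everything else needs converting. The Python sketch below is one way to do it, not a complete converter: the function name `to_numeric_references` is hypothetical, it only knows the HTML 4 names in `html.entities.name2codepoint`, and it leaves unknown names untouched rather than guessing.

```python
import re
from html.entities import name2codepoint

# XML (and therefore SVG) predefines only these five entities.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches a named reference such as &mdash; and captures the name.
ENTITY_PATTERN = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def to_numeric_references(text: str) -> str:
    """Replace HTML named entity references with numeric character references."""

    def replace(match: re.Match) -> str:
        name = match.group(1)
        # Keep names XML already understands, and leave unknown names alone.
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)
        return f"&#{name2codepoint[name]};"

    return ENTITY_PATTERN.sub(replace, text)


print(to_numeric_references("1903&ndash;1950 &amp; &ldquo;1984&rdquo;"))
# 1903&#8211;1950 &amp; &#8220;1984&#8221;
```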

Why should I go through all this trouble?

  1. Professionalism. If you have a business Web site, you should do it if for no other reason than that it’s the professional thing to do.
  2. Semantic accuracy. There is a big difference between three periods in a row and an ellipsis. The former is simply sloppy typography. (An ellipsis, by the way, is &hellip; or &#8230;.)
  3. For looks. Which line do you think looks better:
    1. "They used dumb quotes all over their site!"
    2. “I was impressed; they used smart quotes on their site!”

    Smart quotes, by the way, can be written with &ldquo; for opening quotes and &rdquo; for closing quotes. These names abbreviate “left double quote” and “right double quote”. The character references are &#8220; and &#8221;, respectively.
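Converting existing straight quotes by hand gets tedious. Here is a deliberately naive Python sketch (the function `smarten_double_quotes` is hypothetical). It assumes double quotes in plain text are balanced and never nested, so it simply alternates &ldquo; and &rdquo;. Run it on text content only, never on markup, because HTML attribute values need their straight quotes.

```python
def smarten_double_quotes(text: str) -> str:
    """Naively swap straight double quotes for &ldquo; / &rdquo; references.

    Assumes quotes are balanced and not nested: the 1st, 3rd, 5th... quote
    opens and the 2nd, 4th, 6th... closes. Inch marks, quotes spanning
    paragraphs, and markup all need smarter handling.
    """
    result = []
    opening = True
    for ch in text:
        if ch == '"':
            result.append("&ldquo;" if opening else "&rdquo;")
            opening = not opening
        else:
            result.append(ch)
    return "".join(result)


print(smarten_double_quotes('They used "smart quotes" on their site!'))
# They used &ldquo;smart quotes&rdquo; on their site!
```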

Great! I love my newfound semantic correctness. Now can I have some fun?

You bet! A friend of mine blogged the other day and needed exactly this kind of reference: she wanted to make a check mark appear. Here’s a table of fun symbols you can use to spice up the text on your page! Just be aware that some older browsers won’t render all of these properly, either because they don’t understand them or because no installed font contains the symbol. Some symbols also won’t survive being pasted directly into a page saved in an encoding that can’t represent them. For best results, encode your page as UTF-8.
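If you’d rather not look up numbers at all, Python’s built-in `xmlcharrefreplace` error handler can generate them: encoding text to ASCII with it replaces every non-ASCII character with a decimal numeric reference. This is a sketch of one way to produce ASCII-safe markup; a page served as UTF-8 can also contain the symbols directly.

```python
# "xmlcharrefreplace" turns every character the target encoding can't
# represent into a decimal numeric character reference.
text = "Check mark? \u2713 Star: \u2605 Peace: \u262e"

ascii_safe = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(ascii_safe)
# Check mark? &#10003; Star: &#9733; Peace: &#9774;
```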

Here is a very comprehensive list of HTML entity references.

Symbol | Meaning | HTML Entity Reference | Numeric Character Reference
✓ | Check mark? Check! | None. | &#10003;
✉ | Mail envelope. | None. | &#9993;
✌ | Victory! w00t! | None. | &#9996;
♬ | Musical notes (Unicode calls these beamed sixteenth notes). | None. | &#9836;
♟ | Black pawn. Now you can use your friends for evil. | None. | &#9823;
♋ | Cancer zodiac symbol. My sign. | None. | &#9803;
☺ | Smiley. Just like your word processor! | None. | &#9786;
☯ | Yin and yang. Gives you balance and inner peace. | None. | &#9775;
✡ | Star of David. Jewish religious symbol. | None. | &#10017;
☣ | Biohazard sign. Put this on your splash page to keep others out. Heh… | None. | &#9763;
☠ | Skull and bones. Yar. | None. | &#9760;
☎ | Telephone. | None. | &#9742;
★ | Solid star. Rate your friends. | None. | &#9733;
¶ | Paragraph mark. Use it to annoy Microsoft Word users who don’t turn on invisible characters. >:) | &para; | &#182;
· | Middle dot. Fun fact: a similar dot (the katakana middle dot) separates the parts of foreign names written in Japanese. | &middot; | &#183;
Þ | Capital thorn. Makes for a great smiley: :-Þ | &THORN; | &#222;
† | Dagger. Used for footnotes and the like. Also useful for stabbing. | &dagger; | &#8224;
ƒ | Latin small letter f with hook. Sometimes used on Apple Macintosh computers as a suffix for folder names. | &fnof; | &#402;
℘ | Script capital P, also used for the power set and the Weierstrass p function. | &weierp; | &#8472;
ℑ | Blackletter capital I, also the mathematical imaginary-part symbol. | &image; | &#8465;
ℜ | Blackletter capital R, also the mathematical real-part symbol. | &real; | &#8476;
ℵ | Hebrew letter alef, also the mathematical symbol for the first transfinite cardinal. | &alefsym; | &#8501;
⊗ | Circled times, also used for the mathematical vector product. | &otimes; | &#8855;
◊ | Lozenge, a geometric diamond. Also a great cough candy. | &loz; | &#9674;
№ | Numero symbol. The real number sign. | None. | &#8470;
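Want to add your own symbols to a table like this? Python’s `unicodedata.name()` reports a character’s official Unicode name and `ord()` its code point. This sketch (the helper `table_row` is hypothetical) prints a “symbol | Unicode name | numeric reference” line for each character you give it.

```python
import unicodedata


def table_row(symbol: str) -> str:
    """Build a 'symbol | Unicode name | numeric reference' line."""
    return f"{symbol} | {unicodedata.name(symbol)} | &#{ord(symbol)};"


for symbol in ["\u2713", "\u2605", "\u266c"]:
    print(table_row(symbol))
# ✓ | CHECK MARK | &#10003;
# ★ | BLACK STAR | &#9733;
# ♬ | BEAMED SIXTEENTH NOTES | &#9836;
```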

Hope you have fun with these. ‘Til next time, ☮-out!