Level: Introductory
Brett McLaughlin (brett@newInstance.com), Author and Editor, O'Reilly Media Inc.
20 Sep 2005
With everyone from your eight-year-old neighbor to your eighty-year-old grandmother building Web sites, the Internet has become a slow-moving, bogged-down beast. But with just a few tricks using XHTML, you can build classy, beautiful sites that still load in the blink of an eye.
For the past two years, the industry has inundated you with messages that you've reached a new plateau in high-speed connectivity. Around 25 percent of the world's computer users now have at least a cable modem or DSL connection; supposedly, this means that speed is no longer an issue, so throwing huge images or Flash movies onto a Web site is now fine! I mean, if everyone has all this bandwidth, why not use it?
Personally, I've always been able to use just about all the bandwidth I can get. With my mail application constantly polling my IMAP server while I download the latest version of Firefox, update my forum on IBM developerWorks, and surf, I still find myself waiting on pages to load. And, I still get annoyed by a site that seems slow, despite the amazing connectivity we're all supposed to have. Does any of this sound familiar? If it does, then it's time to be proactive: Assume people are just as annoyed when your site is slow, and then strive to fix it. Hopefully, if enough people start to write solid Web code, this waiting around will soon be a thing of the past (at least, as long as you don't land on that Flash newbie's site, complete with a 250 MB opening movie sequence!).
This article offers you a great start to building fast and slick Web sites. I picked a few things that the average site designer or developer usually doesn't use, but will find effective, as well as features that bring more benefits than you realize. Adjusting to these new techniques might take time, but your end result will almost certainly be a happier customer (which often turns into revenue, in one form or another).
Use XHTML for everything!
HTML is dead; XHTML thrives! This statement sounds extreme, but I wish I could shout it from the mountaintops. I expect some blank stares right about now. But don't worry if you haven't heard of XHTML, or just aren't sure why you should care; you're in the vast majority of Web designers. XHTML is the first step toward a blinding-fast Web site and a headache-free Internet.
Acronym soup
Unless you've been under a rock for the past several years, you've certainly heard of XML, the Extensible Markup Language. While XHTML is not the same as XML, it is, to a large degree, an XML-ized version of HTML (HyperText Markup Language). XML defines a set of rules for a document to follow in its data representation. XHTML (like XML Schema or XSLT) is an XML vocabulary that defines a specific set of tags and structures suitable for a particular application; in this case, defining Web pages. So XML and XHTML are closely related (and XHTML is dependent upon XML).
HTML to the X-treme!
HTML has been around for almost as long as Web pages. Designed to allow for easy content display on a Web browser, HTML has been the mainstay of Web design for well over a decade. However, HTML had (and still has) a wealth of serious problems:
- It never displays the same on different browsers (such as Mozilla, Firefox, Microsoft Internet Explorer, Safari, or Opera)
- It promotes sloppy coding because it accepts poorly formed markup. For instance, you can insert a <br> tag, and never worry about closing it. The same is true for many other tags, including <hr> and <p>.
- It defines both a structure for a Web page (using paragraphs, headings, lists, and the like) and the styling (colors, borders, and fonts) of that structure. When a new style was introduced, the entire HTML specification had to be changed to introduce that style, even though it never affected HTML's basic structure.
I can list more, but these are the most egregious offenders. Each version of HTML (2.0, 3.2, 4.0, and most recently 4.01) failed to completely or definitively address these problems; the nature of HTML makes it impossible to eliminate them completely. In other words, rather than trying to fix something that was fundamentally broken, the Web community needed a successor.
Enter XHTML. In its first version, XHTML 1.0, the World Wide Web Consortium (see W3C in Resources) attempted to produce a basic, well-formed analog to HTML. Although XHTML has some small problems, the W3C largely succeeded. Because XHTML is XML, it must be well-formed: every tag you open must be closed, and elements must nest logically and in an orderly fashion (for more on XML semantics, check out Resources). In its strict form, XHTML removes stylistic concerns from the markup and leaves them to CSS (Cascading Style Sheets), which I will discuss in detail shortly.
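As a rough illustration of what "well-formed" means in practice, the fragment below shows some loose HTML (inside a comment) followed by the same content rewritten so that every element is closed and properly nested. The text and the logo.png file name are made up for this sketch.

```html
<!-- Loose HTML that browsers accept but that is not well-formed:

     <P>First paragraph<BR>
     <P>Second <B><I>paragraph</B></I>
     <IMG SRC=logo.png>
-->

<!-- The same content as well-formed XHTML: lowercase names,
     quoted attribute values, every element closed, and inner
     elements closed before outer ones -->
<p>First paragraph<br /></p>
<p>Second <b><i>paragraph</i></b></p>
<p><img src="logo.png" alt="Company logo" /></p>
```

Note that empty elements such as br and img are closed with a trailing slash, and that the i element closes before the b element that contains it.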
I shall call him...mini-XHTML
XHTML Basic, an offshoot of XHTML, is essentially a stripped down version of XHTML. It's suitable for devices -- such as phones, PDAs, and pagers -- that cannot display as rich a content model as full XHTML would provide. This is a real plus for XHTML developers, as you're already using a language that you can easily translate to mobile devices.
Now in a 1.1 release, XHTML continues to slim down what most Web developers consider the bloated HTML specification. And by leaving style to CSS, the XHTML specification really becomes a manageable collection of tags. Of course, none of this means anything to you -- the Web designer and developer -- if you see no advantage to making the change.
XHTML and rendering modes
The truth of the matter is that a switch to XHTML will improve the speed at which your pages load. To understand why, though, you first need to understand the rendering modes of a browser. Before the latest crop of browsers (now in versions 6 or 7), browsers handled all the special cases in HTML, like tags that open and never close by design (including <br> and <img>), as well as tags that designers never close, often out of laziness (like <p>). But correctly formatted pages didn't render any faster; the same engine that handled all the special cases continued to check that perfectly formatted document, and ignored all the effort you put into closing your <p> tags.
With the latest round of browsers, though, this is no longer the case. Two different modes now exist:
- Quirks mode handles the older HTML, and works with and checks for unclosed tags, <br>, and all the rest. As you might expect, this slows down document parsing, and therefore, delays its browser display.
- Standards mode handles constrained HTML and XHTML, and expects the document to be well structured and organized. As a result, parsing is faster, and the page displays more quickly (and almost identically across browsers).
More XML, huh?
In XML, a DOCTYPE declaration allows a document to reference a set of constraints. Those constraints allow a parser (which, in this case, is part of a Web browser) to know which elements, attributes, tags, and structures the document allows. Usually, these constraints are in a file called a document type definition, or DTD, which is a format specifically for constraining structured markup (such as XML and HTML). A valid document follows those constraints or rules; an invalid document breaks the rules and generally causes an error in parsing.
As you might expect, the goal of any good Web designer now is to create pages that render in standards mode. Quicker results and display uniformity across browsers are a huge payoff for the relatively small cost of moving to XHTML. However, you can actually use standards mode on the earlier, looser versions of HTML, including 2.0. (For a complete list of HTML DTDs, see Resources.) So what gives?
First, the only trigger for a browser to parse in standards mode is a DOCTYPE as the first line of an HTML or XHTML document, as Listing 1 shows.
Listing 1. This HTML fragment's DOCTYPE declares it is HTML 4.0
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
<head>
<title>Head First Labs</title>
<link rel="stylesheet" type="text/css" href="hf.css">
</head>
<body>
<div id="content">
<div id="header">
<div id="logo">
<IMG src="Images/hfguy.png">
</div>
<div id="hfl">
<IMG src="Images/HeadFirstInstitute.png" width="584" height="53">
</div>
</div> <!-- header -->
<div id="menu">
<a href="index.html"
><IMG class="menuselected" src="Images/postit/home.png"
width="80" height="65"
></a>
<a href="books.html"
><IMG src="Images/postit/books-unselected.png"
width="80" height="65"
></a>
<a href="training.html"
><IMG src="Images/postit/training-unselected.png"
width="80" height="65"
></a>
<a href="forum.html"
><IMG src="Images/postit/forums-unselected.png"
width="80" height="65"
></a>
<a href="about.html"
><IMG src="Images/postit/about-unselected.png"
width="80" height="65"
></a>
<a href="writeforus.html"
><IMG src="Images/postit/write-unselected.png"
width="80" height="65"
></a>
</div> <!-- menu -->
But this document is HTML 4.0-compliant, not XHTML-compliant. While the DOCTYPE declaration tells a browser to use standards mode, the browser will still parse the HTML according to the rules in that DTD -- in this case, HTML 4.0. So the DTD, rather than the browser itself, defines the strictness of the rules. If you use a forgiving version of HTML, such as 2.0 or 3.2, then the rules are flexible, and the browser still has to handle special cases; the result is better than parsing in quirks mode -- where all bets are off and almost anything goes -- but still not as good as using XHTML.
So it's not enough to force a browser to render in standards mode; writing HTML 2.0 code and supplying a DOCTYPE will handle that, and you'll still have to contend with a fairly slow page-rendering process. Instead, you need to do two things: supply a DOCTYPE so that your browser uses standards mode, and choose a DTD that is restrictive enough to speed up parsing. Here's where XHTML comes in; it defines a strict set of rules, allowing for fast parsing and cross-platform rendering of your XHTML. The result? A better user experience for everyone, on any type of computer, with any type of browser.
Convert old HTML to XHTML
Now that you're aware of XHTML's usefulness, you need to convert your existing HTML into XHTML, so you can utilize the language. That sounds like a lot of trouble; and, to be honest, it's not a totally simple process. However, the W3C -- the group that defines all HTML- and XHTML-related specifications -- has made this transition less difficult.
While you're still learning about XHTML, DTDs, and CSS, it's hard to go straight from HTML to well-formed, valid XHTML, so take it in stages. First, you need to move to HTML 4.01, which is essentially the baseline for real Web design. Add the following DOCTYPE declaration to the first line of your HTML documents:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
This immediately renders all your documents in standards mode. This particular DTD, known as the Transitional version of HTML 4.01, will get you started in the standards-based world of HTML. It allows several attributes and constructs that the W3C has deprecated (meaning they are being phased out and will eventually no longer be available), which gives you a chance to use most of your HTML with minimal modification.
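If you want to check which mode a browser actually picked, one option is the compatMode property mentioned in Resources. The page below is only a sketch: the id value and the message text are arbitrary, and it assumes a browser that supports document.compatMode, which reports "CSS1Compat" in standards mode and "BackCompat" in quirks mode.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<title>Rendering mode check</title>
</head>
<body>
<p id="mode">Checking rendering mode...</p>
<script type="text/javascript">
  // "CSS1Compat" means standards mode; "BackCompat" means quirks mode.
  var mode = (document.compatMode === "CSS1Compat") ? "standards" : "quirks";
  // Replace the placeholder text in the paragraph above.
  document.getElementById("mode").firstChild.nodeValue =
    "This page is rendering in " + mode + " mode.";
</script>
</body>
</html>
```

Try removing the DOCTYPE and reloading the page; the reported mode should change.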
Once you take this first step, try to remove all those tags that are marked to be deprecated and phased out. You can do this in one of two ways:
- Learn the specification backwards and forwards, and make these changes manually.
- Change your document to use the HTML 4.01 Strict DOCTYPE, and then validate your document, fixing each error reported.
Clearly, the second option is preferred. Why bog yourself down in the intricacies of HTML and XHTML if you can use a convenient tool (covered in Validate your XHTML) to help you? The key is changing from the Transitional DTD (shown above) to this new DTD, called HTML 4.01 Strict:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
The Strict DTD actually removes the attributes and constructs that are being phased out (where HTML 4.01 Transitional still allows them). It's also a huge step toward XHTML. In fact, the XHTML 1.0 specification defines a Transitional DTD and a Strict DTD; both match up closely to their HTML cousins:
- HTML 4.01 Transitional + small XML-related changes = XHTML 1.0 Transitional
- HTML 4.01 Strict + small XML-related changes = XHTML 1.0 Strict
For practical purposes, aim for -- at a minimum -- HTML 4.01 Strict. If you can validate your markup against that DTD and your sites display correctly, you're essentially running a slightly stylized version of XHTML. You'll gain most of XHTML's performance benefits as well. For those of you comfortable in an XML world, push right on to XHTML 1.0 Strict and XHTML 1.1.
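For readers making that last jump, here is a minimal sketch of what the "small XML-related changes" look like on a complete page: the XHTML 1.0 Strict DOCTYPE, the XHTML namespace on the root element, lowercase element names, quoted attribute values, and a closing slash on empty elements. The page content and the file names (style.css, logo.png) are placeholders.

```html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>My first XHTML page</title>
<!-- Empty elements such as link end with a slash -->
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
<div id="content">
<p>Element and attribute names are lowercase.</p>
<p><img src="logo.png" alt="Site logo" width="80" height="65" /></p>
</div>
</body>
</html>
```

Apart from the DOCTYPE, the namespace, and the trailing slashes, this is the same markup you would write for HTML 4.01 Strict.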
Validate your XHTML
All this DTD and versioning isn't worth much unless you can ensure your documents are valid, and follow the rules in the DTD you specify. The easiest way to handle this is to use another tool from the W3C: the W3C Validator (see Resources). Figure 1 shows what this tool looks like.
Figure 1. The W3C Validator is an invaluable tool
Just give this tool a URL (such as http://www.headfirstlabs.com) or load a local file. Select Check to create a report, similar to the one in Figure 2, that says what's right and wrong in your HTML.
Figure 2. Sometimes the Validator's results aren't too pretty
Note that the report does more than check for errors. It will help you locate a problem (with both line and column numbers), show you the error, and even advise on how to fix the problem. Also notice that the heading gives you a link to the specification and DTD you refer to in your file. This is great as you quickly see what the W3C says about that specification, and clear up any confusion about which elements and attributes are legal.
As you begin, the task of correcting all the little problems with your HTML can be a tedious, time-consuming process. For example, HTML 4.0 Transitional requires that all your <img> tags specify alt attributes. When you validate a page with 50 or 100 images -- and none of them use the alt attribute -- this can be a huge headache. However, validation also reinforces the specification. You'll soon find that you make these additions -- which once seemed quirky -- naturally. Hundreds of errors are reduced to 10 or 15, then to just a few; soon you'll be writing error-free HTML without the Validator (although you should still check every page that you write with it!). A word to the wise: Correct your errors in the order that they appear. Often, one error near the beginning of your file will cause more to go wrong toward the end. Fix them early and you'll prevent problems later in your HTML.
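To make the alt requirement concrete, here is a sketch of how two of the images from Listing 1 might be corrected. The alt wording is an assumption for illustration; choose text that describes each image's purpose on your own pages.

```html
<!-- Before (fails validation, no alt attribute):
     <img src="Images/hfguy.png">
-->

<!-- After: each image carries alt text for users and tools
     that cannot display it -->
<img src="Images/hfguy.png" alt="Head First guy logo">
<img src="Images/postit/home.png" alt="Home" width="80" height="65">
```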
Once you fix all the problems in your document, the Validator will give you a virtual thumbs up. At that point, from a technical point of view, your page is ready to put online. From your user's point of view, the page serves quickly and looks almost the same on every browser, even across platforms. For most Web sites, that's worth more than a little bit of tedium and error-correction.
And the bad news is...
This all sounds great and will make a noticeable improvement in how your sites load. However, there's a rather significant caveat. Keep in mind that a Web browser's primary purpose is to display Web sites, not to validate HTML (or XHTML), and certainly not to report errors (in any sort of robust way, at least). As a result, if your HTML has errors in it -- even if you're using the HTML 4.01 Strict DTD -- a browser will simply drop into quirks mode and display it, with nobody the wiser. This means that you can't author HTML and just throw it up on a Web server, testing its validity by loading it in a browser. You'll need to explicitly test each page in the W3C Validator (yes, I can hear the groans from here). However, without this testing, you actually place yourself in the worst possible situation: you took the pains to (almost) code and design in a strict version of HTML, but you don't get the performance improvements because your documents are invalid. If you take the time to use XHTML 1.0 Strict or XHTML 1.1 (or the HTML equivalent), then you'll gain the benefit. Validation is a quick process, especially once you test pages in batch.
Before you get too upset over browser behavior, remember that even when the page shows -- in quirks mode -- your user base is happy, as your site isn't down. A site that actually loads when requested is, of course, even more important than a site that loads quickly.
The benefits of CSS
CSS (Cascading Style Sheets) is, in addition to being a W3C specification (see Resources), an integral part of HTML 4.01 Strict, as well as XHTML 1.0 and 1.1. In fact, without CSS, these later versions of markup are impossible to use effectively.
Before I continue, let me tell you up front: This article doesn't teach CSS. CSS is a language in itself with articles and entire books written on the subject (I list several good ones in Resources). Here I focus on the advantages of CSS, and why it's useful for more than just converting your old HTML to HTML 4.0 Strict, as well as XHTML 1.0 Strict and XHTML 1.1. Once you understand CSS's value, you should be able to learn CSS easily.
Organize by class rather than by element
With CSS in its simplest form, you can define a set of properties -- from font selection and size to background colors and text positioning -- and then attach that set of properties to elements in a document (like HTML and XHTML). That might not seem like a big deal at first; it even sounds a lot like what you can do in HTML without CSS. However, consider this simple HTML fragment in Listing 2.
Listing 2. HTML requires formatting on the element
<html>
<head> <title>Old HTML </title></head>
<body bgcolor="#FFFFFF" text="#000000">
<p align="center"><font face="Arial" size="14">This is my page.</font></p>
<!-- Other HTML -->
</body>
</html>
In this example, notice two things:
- The text is formatted inline, right where it appears, using the <font> tag.
- The page is formatted on the element, using attributes of the <body> tag.
Pretty obvious, right? However, neither is a good thing! First, consider the case of formatting the text. It's not a huge deal to align the paragraph or set the font face and size -- until you have multiple paragraphs (see Listing 3). Now you have font tags spread throughout the page; these tags clutter up your HTML, increase the time it takes to load and render the page, and generally make a mess of things.
Listing 3. The more elements, the more formatting
<html>
<head> <title>Old HTML </title></head>
<body bgcolor="#FFFFFF" text="#000000">
<p align="center"><font face="Arial" size="14">This is my page.</font></p>
<p align="center"><font face="Arial" size="14">This is some more text.</font></p>
<p align="center"><font face="Arial" size="14">Not very exciting, is it?</font></p>
<!-- Other HTML -->
</body>
</html>
No more Mr. Nice Guy
Realize that, when you compare Listing 3 to Listing 4, you really compare apples to oranges. Using the <center> tag, instead of p align="center", produces slightly different results (depending on the browser). On Firefox, for example, using <center> moved all my text slightly down on the page. So even in a simple case like this, you can't truly reproduce the effect of individual element tagging with group element tagging.
Clearly, you might cut this down if you move the various font statements into one <font> tag, and use the <center> tag to remove all the align="center" statements, as Listing 4 shows.
Listing 4. An apparent solution to formatting woes
<html>
<head> <title>Old HTML </title></head>
<body bgcolor="#FFFFFF" text="#000000">
<center><font face="Arial" size="14">
<p>This is my page.</p>
<p>This is some more text.</p>
<p>Not very exciting, is it?</p>
</font></center>
<!-- Other HTML -->
</body>
</html>
Problems multiply, though, when you interject other formatting into the midst of this page, as you can see in Listing 5.
Listing 5. Just a slight mistake can mess up an entire page
<html>
<head> <title>Old HTML</title></head>
<body bgcolor="#FFFFFF" text="#000000">
<center><font face="Arial" size="14">
<p>This is my page.</p>
<p>This is some more text.</p>
<p align="right"><font face="Times" size="10">Just to be different...</p>
<p>Hope you didn't forget to close any font tags!</p>
<p>Not very exciting, is it?</p>
</font></center>
<!-- Other HTML -->
</body>
</html>
Now you have the overall font and sizing, but a paragraph in the middle uses a different font and alignment. Browser inconsistencies will start to show up, and nesting quickly becomes a headache. All of this reflects the simple problem of per-element formatting. When you apply formatting specifically to an element, you have to either duplicate formatting code (consider that you might also underline, bold, italicize, or otherwise change the text beyond the alignment, font face, and size shown in these examples), or settle for unreliable grouping (as in Listings 4 and 5). Neither option is appealing, and that's where CSS really shines.
Look elsewhere, students of CSS
You can find plenty of tutorials on CSS in Resources. If this CSS looks unfamiliar or confusing to you, that's OK. Concentrate on how it simplifies and organizes your Web pages, and then learn about it using other resources. For now, you should simply see how valuable it is to even basic Web pages.
With CSS, you define rules, and then apply those rules to elements. While this might seem like just a twist on what you've already seen, realize that the browser applies the rule and doesn't insert the formatting from the rule into each element. So you've saved processing and rendering time immediately, as the browser isn't reading bytes and bytes of font instructions or alignment attributes. With the help of a simple style element, you can clean up these listings dramatically (see Listing 6).
Listing 6. Clean up formatting with CSS
<html>
<head> <title>Old HTML </title>
<style type="text/css">
<!--
body { color: black; background: white; }
p { font-family: Arial; font-size: 200%; text-align: center; }
p.right { font-family: Times; font-size: 75%; text-align: right; }
-->
</style>
</head>
<body>
<p>This is my page.</p>
<p>This is some more text.</p>
<p class="right">Just to be different...</p>
<p>Hope you didn't forget to close any font tags!</p>
<p>Not very exciting, is it?</p>
<!-- Other HTML -->
</body>
</html>
With just a few lines of CSS, you can eradicate all the font, alignment, and nesting issues. In fact, you even relegate absolute font sizes to the proverbial dust heap, and use percentages instead. This is great; no more absolute font sizes appearing at (strangely) different pixel heights on different browsers. Now you can ensure that your text is consistent and your headlines are twice as big (or three times as big, or 1.28 times as big) as the rest of your text.
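As a rough illustration of that last point, the sketch below sizes headings relative to the surrounding text. The specific percentages and the fineprint class name are arbitrary choices for the example.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head> <title>Relative font sizes</title>
<style type="text/css">
  body { font-family: Arial, sans-serif; }
  h1 { font-size: 200%; }          /* twice the size of the body text */
  h2 { font-size: 128%; }          /* 1.28 times the body text */
  p.fineprint { font-size: 75%; }  /* three quarters of the body text */
</style>
</head>
<body>
<h1>Main headline</h1>
<h2>Section headline</h2>
<p>Regular text at the reader's preferred size.</p>
<p class="fineprint">Smaller text that still scales with everything else.</p>
</body>
</html>
```

Because each percentage is relative to the enclosing element's font size, changing the size on body rescales every heading and paragraph in proportion.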
A consistent style
Careful readers might think that I left something out; in the previous section, Organize by class rather than by element, I mentioned two problems with inline formatting in HTML. I dealt with the first, which is per-element formatting, but not with the second, formatting on the body of a page itself. What's the big deal with that? It seems that you gain no advantage if you move formatting from directly on a body element to a CSS style element. You still only type the information in once, so you have no redundancy, as you did with alignment and font faces.
The answer is not as readily apparent, and requires a little more understanding of what CSS can do. First, you have to realize that you can write CSS in separate files, and refer to those external files in your HTML, as Listing 7 shows.
Listing 7. Move CSS out of the page
<html>
<head> <title>Old HTML </title>
<link type="text/css" rel="stylesheet"
href="/styles/style.css" />
</head>
<body>
<p>This is my page.</p>
<p>This is some more text.</p>
<p class="right">Just to be different...</p>
<p>Hope you didn't forget to close any font tags!</p>
<p>Not very exciting, is it?</p>
<!-- Other HTML -->
</body>
</html>
Now your CSS rules are available to this page as well as all pages on your site. And it's with this change that the advantage of moving body formatting into CSS becomes clear: you can create a consistent style across all your Web pages. Now, instead of setting the body formatting for each page, just use the body tag normally -- without formatting -- and be sure your page refers to the CSS style sheet. Your rule for <body> is applied automatically, and you create a site-wide look and feel. Add some marginal padding, create a border, and make any other changes to your CSS rules; all of your pages will update immediately, with no effort on your part. And, in fact, this is the real power of CSS: the ability to easily make changes that affect not just one page or several pages, but every page on your Web site, with minimal effort.
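To see the site-wide effect, picture a second page that links to the same style sheet. The sketch below assumes a hypothetical about.html, and assumes that /styles/style.css holds the three rules from Listing 6.

```html
<!-- about.html: a second, hypothetical page on the same site -->
<html>
<head> <title>About this site</title>
<!-- Same style sheet as Listing 7, so the body, p, and p.right
     rules apply here with no formatting in the markup -->
<link type="text/css" rel="stylesheet"
href="/styles/style.css" />
</head>
<body>
<p>This page looks like the rest of the site.</p>
<p class="right">And this paragraph picks up the p.right rule.</p>
</body>
</html>
```

Neither page contains any formatting of its own, so a change to the shared style sheet shows up on both.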
Wrap up
So what's the bottom line? Is the move to XHTML and CSS really worthwhile? I believe the answer to this question is a resounding, "Yes!" Not only do you get a more structured environment in which to mark up your Web pages, but your users get quicker load times. And, with a move to CSS, you gain a whole set of advantages beyond just simple load time reduction. You can use a more programmatic model in your design that enables you to easily establish a uniform and consistent experience for your users. No more do you need to remember what font size you used on the author bio page or tinker with the hex color of multiple HTML tables; and certainly no more <font> tags to clutter up your code. Your pages will indicate layout, and your CSS will define page design.
Further, you will have caught up with the Web community, rather than trailing along behind it, writing HTML 3.2 (or, heaven forbid, HTML 2.0!). This means that your pages can use new Web technologies like Ajax (something that essentially depends on the <div> and <span> tags to do anything spectacular), without requiring another site overhaul. So get with the program -- let's see some quick, slick, professional Web sites!
Resources
- For an introduction to Cascading Style Sheets, try these developerWorks tutorials:
- The specifications in this article -- HTML, XHTML, and CSS -- are all products of work done at the World Wide Web Consortium, usually called the W3C.
- The latest HTML specification (HTML 4.01) provides the "not quite next" generation vocabulary for creating Web pages.
- For a much snappier Web experience, write all your pages in XHTML, which is well-formed HTML with some additional restrictions and features.
- For those into the actual browser interpretation of XHTML, the W3C defines a set of guidelines for interpreting XHTML.
- The W3C also publishes a list of DTDs for use in your HTML and XHTML documents.
- The Mozilla team (for Mozilla and, to a large degree, Firefox) publishes information on quirks mode that will help you understand the difference between standards mode and quirks mode on Mozilla browsers.
- The compatMode property is closely related to how Microsoft Internet Explorer deals with standards and quirks mode.
- Read about the Opera browser's DOCTYPE switches to see how a DTD affects parsing and rendering.
- Validate your HTML and XHTML using the W3C Validator online tool.
- Pair your XHTML with Cascading Style Sheets and you can really create beautiful sites.
- The W3C provides a fairly exhaustive list of CSS editing and authoring tools.
- Cascading Style Sheets: The Definitive Guide is an invaluable reference on CSS, written by the CSS guru himself, Eric Meyer.
- The author likes to keep the CSS Pocket Reference around, as he constantly forgets the names of certain features and options. No shame in looking these things up!
- Find a large selection of books on XML and other related topics at the Safari book store.
- Go through the Web Architecture zone's library to find helpful articles and tutorials on all the subjects you need help with.
About the author
Brett McLaughlin has worked in the computer industry since the Logo days. (Remember the little triangle?) In recent years, he's become one of the most well-known authors and programmers in the Java and XML communities. He's worked for Nextel Communications, implementing complex enterprise systems; at Lutris Technologies, actually writing application servers; and most recently at O'Reilly Media, Inc., where he continues to write and edit books that matter. His most recent book, Java 1.5 Tiger: A Developer's Notebook, is the first book available on the newest version of Java technology, and his classic Java and XML remains one of the definitive works on using XML technologies in the Java language.