Level: Introductory
Molly Holzschlag, Author, Instructor, and Web designer
09 Feb 2005
Still writing your documents in HTML? If you are, you're not complying with current standards. On January 26, 2000, XHTML 1.0 became a recommendation by the World Wide Web Consortium (W3C). HTML, according to the W3C, is no longer the Web markup standard. Instead, XHTML 1.0 has replaced our old favorite, marking up the dawn of a new and exciting time in communications technology.
So what exactly is XHTML 1.0 and what does it mean to the Web developer?
I'll start with the W3C's description: XHTML
1.0 is a reformulation of HTML as an XML application. This means that if you're authoring a document in XHTML 1.0,
you are applying the rules and concepts inherent to XML to your Web markup.
The dangling question naturally is: Can XHTML 1.0 be used to mark up my
Web documents today? The answer is a resounding "yes!" All you
need to do is learn how to structure documents properly, choose
the correct document type definition (DTD) for your needs, and learn
a few new ways of managing your code development.
Just how does XHTML 1.0 manage to be so ready to go? Well, as you write
your documents, you'll see that it uses familiar HTML as its vocabulary.
With some minor shifts in approach, but major shifts in thinking, XHTML
1.0 enables Web authors to code to the standards and begin shifting
their perspectives in terms of future growth and change.
Why do we need another markup language?
HTML works pretty well. Granted, we've been challenged to come up with cross-browser, cross-platform solutions that really work.
And bringing the Web from its nascent form
in the early 90s to the vibrant, active Web we know today has meant straining, breaking, or even making up new HTML rules as we went along.
Developers who have studied HTML 4.0 principles know that a definitive goal of improving
HTML practices had been set forth by the time the HTML 4.0 standard came into being. Some of the primary concerns of HTML 4.0 involved:
- Cleaning up documents by separating basic formatting from
style
- Deprecating those elements that are arbitrary or problematic
- Requiring that document types be declared (and hoping that in that declaration,
authors would conform to the rules set out in HTML 4.0's three DTDs)
These principles all exist in XHTML 1.0, but they have been combined
with concepts from XML that help advance our markup beyond just strengthening
its basic syntax. The goals of XHTML 1.0 are many, but include the following:
- Offer the foundations for extensibility in Web markup
- Provide the same or better interoperability as HTML via past, current,
and future browsers
- Prepare authors for evolving opportunities via upcoming XHTML versions,
other XML applications, and emerging technologies such as wireless and
alternative device development
Perhaps the most compelling argument for adopting XHTML 1.0 is that developers -- especially those
who are self-taught in HTML or rely on visual design tools to achieve their
goals -- can easily move into other XML applications by studying the standard. They can then begin
to see the power of XML and extensibility. XHTML 1.0 makes the territory
of XML and its applications less daunting because the path is familiar:
HTML vocabulary with some new structural and syntactical
methods. By using familiar language with some new concepts, it is easier to transition
into less familiar territories. For example, knowledge of XHTML 1.0 can simplify the transition to upcoming XHTML versions and related XML technologies for wireless and other applications, such as WML (Wireless
Markup Language), SMIL (Synchronized Multimedia Integration Language), and SVG (Scalable
Vector Graphics).
You got to have roots
Looking at the roots of XHTML is helpful in understanding the rationale for XHTML and the rules that guide it. Both XML and HTML have common roots in SGML, the Standard Generalized Markup Language. It is important
to know that SGML is not a language per se. It is what is known as a metalanguage -- a language that contains rules from which other languages are developed. XML, like its parent SGML, is also a metalanguage. As such, its rules
are used to create XML applications. XHTML, then, is an XML application
that uses another SGML language, HTML, as its vocabulary. If the relationship seems complex, that's because in a way, it is. SGML
begat HTML first, then XML. When the concerns and limitations of HTML were examined, it became apparent
that XML's rules could help HTML mature into a markup language that would help transition developers out of those limitations.
First, the requirements
In order for an XHTML 1.0 document to be true to its metalanguage (XML),
there are several requirements and rules that you must consider. They
are as follows:
- It is recommended but not required that an XHTML 1.0 document be declared
as an XML document using an XML declaration.
- It is required that an XHTML 1.0 document contain a DOCTYPE that denotes
that it is an XHTML 1.0 document, and that also denotes the DTD being used by that document.
- An XHTML 1.0 document has a root element of <html>. The opening
tag of the HTML element should contain the XML namespace attribute xmlns
and the appropriate value for the namespace.
- The syntax and structure of the document must follow the syntactical rules
of XHTML.
The first step in achieving these goals is to structure XHTML 1.0 documents
properly. You'll begin by adding the proper declarations and document information.
Document declarations, types, and namespaces
An XHTML 1.0 document may contain several structural elements in order
to be considered correct: an XML declaration, a DOCTYPE declaration, and
the inclusion of a namespace. The XML declaration allows authors to declare
their documents as XML, and include the encoding that is being
employed by the document:
<?xml version="1.0" encoding="UTF-8"?>
Using this declaration is recommended but not required, as mentioned
earlier. Part of the reason it is not required is that some browsers,
including IE 4.5 for Mac and Netscape 4.0 for Windows, will display XHTML
pages inappropriately if it is used. So, most XHTML 1.0 authors interested
in the best interoperability leave it out. However, since the encoding
information is important in many instances -- particularly when working with
international documents -- if you don't use the XML declaration, you are encouraged
to add the encoding in a meta tag (shown later in Listing
2).
Beneath the XML document declaration -- or directly at the top of the document,
should you choose not to use it -- you must place the DOCTYPE declaration.
DOCTYPE allows an author to declare the type of document in use. In this
example, the document type is XHTML 1.0 and the specific XHTML 1.0 DTD
to which the document is to conform is strict.
There are only three DTDs available in XHTML 1.0. They carry over from
HTML 4.0, and are as follows:
- Strict: Strict follows the most stringent rules of XHTML. Only current
elements, attributes, and character entities are allowed in documents written
in this type. Elements such as font or center, that were deprecated in
HTML 4.0, are not allowed. Obsolete elements are also not allowed. The
Strict declaration appears as follows:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- Transitional: A transitional XHTML 1.0 document is more lenient,
allowing the author to use deprecated as well as current methods. You can
use font or center, or any other deprecated markup in a transitional document -- so long as the document itself is properly marked as such. No obsolete elements
should be used. If you want to write a transitional document in XHTML 1.0,
you'll include the following declaration:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
- Frameset: The frameset DTD is reserved only for frameset documents.
A frameset document conforming to this DTD can use either strict or transitional
markup. To create a frameset document in XHTML 1.0, include this DOCTYPE
at the top of your document:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
Once you've decided whether to use an XML declaration, and you've
added a DOCTYPE declaration defining the markup rules to which you're going
to conform, you'll need to add an HTML root to the document and place
the XHTML namespace accordingly:
<html xmlns="http://www.w3.org/1999/xhtml">
At this point, you'll want to add necessary structural elements such
as head, title, and body. Listing 1 shows an XHTML 1.0 transitional document shell with the XML declaration included.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Transitional Document with XML Declaration</title>
</head>
<body>
</body>
</html>
In Listing 2, you will see a transitional
document shell without the XML declaration, but with a meta tag declaring
the character set in use.
Listing 2: A Transitional DTD conforming XHTML 1.0 document without
the XML declaration
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Transitional Document without XML Declaration</title>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
</head>
<body>
</body>
</html>
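If you'd like to confirm that a document shell meets the structural requirements described so far, a few lines of Python can do it. This is only a sketch under stated assumptions: the function name check_requirements, the SHELL constant, and the regular expression are illustrative, the standard library parser checks well-formedness only, and nothing here validates against the DTD.

```python
import re
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Matches the public identifier of any of the three XHTML 1.0 DTDs.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html\s+PUBLIC\s+"-//W3C//DTD XHTML 1\.0 '
    r'(Strict|Transitional|Frameset)//EN"'
)

# The transitional shell without the XML declaration, copied from Listing 2.
SHELL = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Transitional Document without XML Declaration</title>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
</head>
<body>
</body>
</html>"""


def check_requirements(text):
    """Report which structural requirements a document meets.

    Raises xml.etree.ElementTree.ParseError if the text is not well-formed.
    """
    root = ET.fromstring(text)
    # ElementTree writes namespaced names as "{namespace}localname".
    local_name = root.tag.rsplit("}", 1)[-1]
    return {
        "doctype": DOCTYPE_RE.search(text) is not None,
        "root_is_html": local_name == "html",
        "namespace": root.tag == "{%s}html" % XHTML_NS,
    }


if __name__ == "__main__":
    print(check_requirements(SHELL))
```

A document that leaves out the DOCTYPE or the xmlns attribute still parses, but the corresponding entry in the result comes back False.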
Syntax concerns
Once an XHTML document contains the necessary declarations and structural information, you can examine the syntax changes resulting from XML's influences on Web markup. These syntax changes include case
awareness, well-formed tag elements, empty and non-empty elements, and
the use of quotation marks.
Case
As you know, HTML is not case sensitive. This means that HTML element
and attribute names can be in upper, lower, or mixed case. So, you can
have:
<body background="my.gif">
or
<BODY BACKGROUND="my.gif">
or even
<BoDy background="my.gif">
All of these examples mean the same thing. On the other hand, XML is case sensitive. Thus, XHTML is case-specific. In XHTML 1.0, all element and attribute
names must be written in lower case:
<body background="my.gif">
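The lower-case rule is easy to check mechanically. Here is a rough Python sketch that lists any element or attribute names that are not entirely lower case; the function name non_lowercase_names is an assumption made for illustration, and the fragment has to be well-formed for the standard library parser to accept it.

```python
import xml.etree.ElementTree as ET


def non_lowercase_names(fragment):
    """Return element and attribute names that are not entirely lower case."""
    offenders = []
    for element in ET.fromstring(fragment).iter():
        for name in (element.tag, *element.attrib):
            # Drop any "{namespace}" prefix that ElementTree adds to names.
            local = name.rsplit("}", 1)[-1]
            if local != local.lower():
                offenders.append(local)
    return offenders


if __name__ == "__main__":
    print(non_lowercase_names('<BODY BACKGROUND="my.gif"></BODY>'))
    print(non_lowercase_names('<body background="my.gif"></body>'))
```

Only names are inspected here; attribute values are left alone.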
Element and attribute names in any other case do not conform to XHTML 1.0. Note that attribute values,
such as "my.gif", can be in mixed case. This is especially important in instances
where the files are on servers with case-sensitive file systems, or you're
using mixed-case code in applications such as those written in Microsoft's
Active Server Pages (ASP), ASP+, or ColdFusion.
Well-Formedness
While many HTML browsers are quite forgiving, many HTML tools don't
conform to standards. As such, some authors have learned bad habits such
as improper nesting of tags. The following example may work in many browsers:
<b><i>Welcome to MySite.Com</b></i>
It will display as both bold and italic in a forgiving browser. But,
if you take a pencil and draw an arc from the opening bold tag to its closing
companion, and then from the opening italic tag to its closing companion,
you'll see that the lines of the arcs intersect. This demonstrates improper
nesting of tags, and is considered poorly formed.
In XHTML 1.0, such poorly formed markup is unacceptable. The concept
of well-formedness must be adhered to in that every element must nest appropriately.
The XHTML 1.0 equivalent of the prior sample is:
<b><i>Welcome to MySite.Com</i></b>
Draw the arcs now, and you'll see that they do not intersect. These
tags are placed in the proper sequence, and are considered to be well-formed.
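If you don't have a pencil handy, an XML parser will draw the arcs for you. The sketch below uses Python's standard xml.etree.ElementTree module, which rejects improperly nested tags; the helper name is_well_formed is an assumption made for illustration, and the check covers well-formedness only, not validity against a DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment):
    """Return True if the XML parser accepts the fragment, False otherwise."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        # Raised for mismatched or improperly nested tags, among other errors.
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<b><i>Welcome to MySite.Com</b></i>"))
    print(is_well_formed("<b><i>Welcome to MySite.Com</i></b>"))
```

The first sample fails because the bold element closes while the italic element inside it is still open; the second nests correctly and parses.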
Non-Empty and empty elements
A non-empty element is one that contains some content between its opening and closing tags:
<p>This is the content within a non-empty element.</p>
An empty element, by contrast, is one that has no content, just the tag
and its attributes, such as <hr>, <br>, and <img>.
XML rules indicate that empty and non-empty elements must be properly
closed. In HTML, you've seen that non-empty elements often
have optional closing tags. I could write the paragraph above as follows:
<p>This is the content within a non-empty element.
In HTML, this would be considered correct. XHTML 1.0 demands that
non-empty elements are properly closed. Another example of this would be
the <li> (list item) element. In HTML, you could have:
<li>The first item in my list.
<li>The second item in my list.
or
<li>The first item in my list. </li>
<li>The second item in my list. </li>
In XHTML 1.0, only the latter method is allowed.
Empty elements are terminated in XML with a slash. So <br> becomes
<br/>. Due to problems some browsers accustomed to interpreting HTML
have with this method, a workaround has been introduced, adding a space
before the slash: <br />. Here's an XHTML example of the image element, which is an empty element:
<img src="my.gif" height="55" width="25" border="0" alt="picture of me" />
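Closing empty elements by hand gets tedious in older documents, so here is a rough Python sketch that rewrites a handful of them into the space-before-slash form. The function name close_empty_elements, the tag list, and the regular expression are illustrative assumptions; the sketch handles lower-case tags only and would be confused by a > character inside an attribute value.

```python
import re

# Empty elements mentioned in the text; extend the tuple as needed.
EMPTY_ELEMENTS = ("br", "hr", "img", "meta", "link")

# Captures the tag name and its attributes, ignoring any existing slash.
_EMPTY_TAG = re.compile(r"<(%s)\b([^<>]*?)\s*/?>" % "|".join(EMPTY_ELEMENTS))


def close_empty_elements(markup):
    """Rewrite <br> or <br/> as <br />, keeping attributes unchanged."""
    return _EMPTY_TAG.sub(r"<\1\2 />", markup)


if __name__ == "__main__":
    print(close_empty_elements("<p>Line one<br>Line two</p>"))
    print(close_empty_elements('<img src="my.gif" alt="picture of me">'))
```

Tags that are already closed come out in the same form, so running the function twice does no harm.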
Other empty elements of note are meta and link.
Quotes
Quotation marks in HTML are arbitrary in that you can use or not use
them around attribute values without running into too much trouble. There's
no rule that says that leaving values unquoted is illegal. The following
is perfectly acceptable in HTML:
<table border=0 width="90%" cellpadding=10 cellspacing="10">
Despite the fact that some attribute values are quoted, and others are
not, browsers will render this markup just fine. However, if you want to
conform to XHTML 1.0, you'll have to quote all of your attribute values:
<table border="0" width="90%" cellpadding="10" cellspacing="10">
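Quoting can be automated, too. The following sketch leans on Python's forgiving html.parser module to read a start tag the way a browser would, then writes it back out with every value in double quotes. The class name QuoteAttributes and the function name quote_attributes are assumptions made for illustration, and the sketch only handles start tags.

```python
from html import escape
from html.parser import HTMLParser


class QuoteAttributes(HTMLParser):
    """Re-emit start tags with every attribute value in double quotes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases names and strips any original quotes.
        # Bare attributes (value None) are out of scope here and get "".
        quoted = "".join(
            ' %s="%s"' % (name, escape(value or "")) for name, value in attrs
        )
        self.tags.append("<%s%s>" % (tag, quoted))


def quote_attributes(start_tag):
    """Return the start tag with all attribute values quoted."""
    parser = QuoteAttributes()
    parser.feed(start_tag)
    parser.close()
    return "".join(parser.tags)


if __name__ == "__main__":
    print(quote_attributes('<table border=0 width="90%" cellpadding=10 cellspacing="10">'))
```

As a side effect, the parser also lower-cases element and attribute names, which takes care of the case rule at the same time.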
As you can see, none of these changes are monumental. A bit pesky, yes,
but if you begin to employ this approach, you'll find your markup is a
whole lot more consistent. That consistency is part of what makes XHTML
1.0 so attractive -- it provides a strong foundation upon which to build future
constructs.
Future of XHTML
If XHTML is so easy to use, then why is it taking so long to be adopted?
This is a question that many standards-oriented people are asking. Part
of the problem may be poor press -- not too many people know about XHTML 1.0. And even if they've heard about it, they may not realize how easily it
can be put to use today.
Add to this the fact that current software tools for HTML development
such as Adobe GoLive, Macromedia Dreamweaver, Microsoft FrontPage, and others
do not have support for XHTML, and you have a serious concern
for many Web authors who prefer these tools, or must use them in a work
environment.
But despite these difficulties, XHTML 1.0 is marching on. In fact, the next version, XHTML 1.1, has already been fairly well fleshed out and contains some new and different concepts for the Web
markup author. Modularization -- the act of breaking the language down into
discrete modules -- is a primary part of XHTML 1.1. Also, more XML-like advantages
are coming into play. For example, the ability to write your own DTD for an XHTML document or use a schema will truly bring extensibility to the game.
XHTML 1.0 is the current Web markup standard. Those who are not using
it should at the very least give it a good try. The growth that is occurring
in other areas of XML-related technologies -- particularly in the wireless
realm -- is strong and convincing proof that the more flexible you can become as a markup
author, the more prospects you will have. XHTML 1.0 is the perfect
way to start expanding your horizons. It is familiar enough to make
sense, and powerful enough to help you create stable, interoperable Web
sites that work today and are prepared for the exciting opportunities of
tomorrow.
Resources
- The World Wide Web Consortium XHTML
Recommendation. This is the standards document at the W3C that explains
XHTML 1.0 in detail.
- XHTML 1.1. Under discussion,
this next version of XHTML involves modularizing aspects of XHTML 1.0.
- Mozquito.Com is the Web site for
Mozquito Technologies, who make software products specifically for XHTML.
Good tutorials and plenty of resource links can be found at their site,
too.
About the author
An author, instructor, and designer, Molly E. Holzschlag brings her irrepressible enthusiasm to books, classrooms, training centers, and Web sites. Honored as one of the Top 25 Most Influential Women on the Web, Molly has spent an almost unprecedented decade working in the online world. She has written and contributed to more than ten books about the Internet and, in particular, the Web. She holds a B.A. in communications and writing, and an M.A. in Media Studies from the New School for Social Research. You can find out more about Molly's activities on her Web site.