Level: Introductory
David Mertz, Ph.D. (mertz@gnosis.cx), Old Schooler, Gnosis Software, Inc.
11 Sep 2007
Regardless of what sort of Content Management System or Web
application framework you might use to develop your Web site, there
are some basics you should cover. A sophisticated user interface and
rich content are great to have, but before you get to that, you
should provide the basic files that users anticipate finding and
that tell both humans and machines what your site does.
Introduction
There are a few standard files that every Web site should really have,
but that most neglect. Most of these are matters of convention, not of
technical requirement, but you do your site a disservice if you do not
provide them. Users who make a wild guess at the URL for what they want
to find should usually succeed. This tip discusses each of these
standard files briefly.
Exactly how a given resource is provided depends on the Web server and
Web application layers you use. In a "traditional," mostly static
server like Apache, these resources are likely to be literal files on
a server. But in a different configuration, they might actually be
entries in a database, lines in a configuration file, classes in a
server process, and so on. This tip focuses on what
a user ultimately sees, not what you need to do to make it happen.
404.html
When users use your Web site, they will inevitably seek resources that
do not exist. Probably this happens more often because of typos in
URLs than for any other reason, but link rot, back-end
misconfiguration, URL mangling at various points, and other causes
contribute. When resources are unavailable, it is nice to provide
some sort of fallback page that assists users in navigating to
something more useful. A generic "not found" is enough for
users to know a resource is unavailable, but it does not do anything
to help them figure out "what next."
A warning when you create a custom 404.html (or whatever mechanism
your Web server uses to deliver a custom "not found" message): Far too
many Web sites are misconfigured to deliver "soft 404" messages. In other
words, they deliver a page with a regular "200 OK" status that merely says
"not available" somewhere in the text, perhaps (but not always) mentioning
"404 Error" as well. You should not do this! Instead, give
your users (and their Web browsers and other tools) a break and send
an accurate 404 status code!
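As a rough illustration of the difference, here is a minimal sketch using Python's standard http.server module. The page table, the wording of the fallback page, and the port are all made up for the example; the only point is that the helpful "not found" page goes out with a genuine 404 status rather than 200.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page bodies; a real site would use files or templates.
PAGES = {"/": b"<h1>Home</h1>", "/about.html": b"<h1>About</h1>"}

# A friendly fallback page that suggests where to go next.
NOT_FOUND_BODY = (
    b"<h1>404 Not Found</h1>"
    b'<p>Try the <a href="/">home page</a> or <a href="/about.html">about</a>.</p>'
)


def respond(path):
    """Return (status_code, body) for a request path."""
    if path in PAGES:
        return 200, PAGES[path]
    # The helpful page still carries a real 404 status, never 200.
    return 404, NOT_FOUND_BODY


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = respond(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the demo server until interrupted (the port is an assumption)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

Calling serve() and requesting a nonexistent path should show the friendly page in a browser, while tools that inspect the status line still see 404.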
about.html
So why did you create your Web site, anyway? Yes, you have a front page
that may answer that question. More likely, though, it does not, and
rather serves to let users log in, to "sell" your site, to show something
splashy, and so on. Probably there is a way for users to navigate from the
home page to the "about" page. Even so, go ahead and make that information
available right from http://mysite.example.com/about.html. Someone
will look there for it.
A good about.html page provides a quick overview of what your site does,
maybe why you created it, why users might care, and probably has a few
links to navigate back to the core functions of your site. This page
need not, and usually should not, be extremely fancy. Just let it be
factual and concise so that users can proceed to take advantage of all
the neat things your site offers.
contact.html
So who are you? As with about.html, users can probably get to this
information after enough clicks from your
existing home page. Do not make users work too hard for this
information: Put it at http://mysite.example.com/contact.html.
Use contacts.html for the same page, too, and throw
in the .htm extensions while you are at it. Names are cheap. Of
course, you can also leave the information at the end of those clicks
in your whiz-bang navigation screens; a little redundancy in finding
resources is not bad.
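One way to keep track of all those cheap names is to generate them. The sketch below is only an illustration: the function names are invented here, and how you feed the resulting table to your server (symbolic links, rewrite rules, route definitions) depends entirely on your setup.

```python
def alias_paths(name, plural=True):
    """Return the URL paths that should all serve the same page."""
    stems = [name, name + "s"] if plural else [name]
    return ["/%s%s" % (stem, ext) for stem in stems for ext in (".html", ".htm")]


def build_alias_table(names):
    """Map every alias path to its canonical '/<name>.html' path."""
    table = {}
    for name in names:
        canonical = "/" + name + ".html"
        for path in alias_paths(name):
            table[path] = canonical
    return table


# Example: every spelling of the contact page resolves to one resource.
TABLE = build_alias_table(["contact"])
```

Here TABLE maps /contact.html, /contact.htm, /contacts.html, and /contacts.htm to /contact.html.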
copyright.html
To whom does this stuff belong? Probably the content belongs to
you. But who are you again? An individual? A corporation? A set of
collaborators? A government organization? If your content is in the
public domain or under a free content license of some sort, it is
probably even more important to let users know that. Nowadays,
everything is born privately copyrighted: If your material follows
different rules, let users know. Not enough Web sites bother with
this resource, but why not add it to yours? Someone will look for it.
Obviously, different pages or resources might have different copyright
information. Let this general page provide some information for users
on how to determine those individual differences, if that is relevant.
If there are trademark issues, mention those as well.
index.html (and index.htm)
Not every Web server uses an actual index.html file to describe its
home page. Depending on your setup, you might have URL rewriting,
dynamic generation by pathname, and so on. Users don't care! Just
make http://mysite.example.com/index.html point to your home page,
even if you have to use a simple HTML redirect to make that happen.
Oh, and while you are at it, you might as well make the old
Windows-crippled .htm extension work too. And if you are feeling
particularly generous, let index.cgi get to the same place
as well.
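If a literal file is the easiest thing to drop in place, a "simple HTML redirect" can be generated with a few lines. This is a sketch under assumptions: the function name is invented, and a meta refresh is just one common way to redirect from a static page. Where you can configure the server itself, a server-side redirect is often the cleaner choice.

```python
from html import escape


def redirect_page(target):
    """Return a small HTML page that forwards the browser to target."""
    safe = escape(target, quote=True)  # keep odd characters from breaking the markup
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        '<meta http-equiv="refresh" content="0; url=%s">\n'
        "<title>Redirecting</title>\n"
        "</head><body>\n"
        '<p>This page lives at <a href="%s">%s</a>.</p>\n'
        "</body></html>\n"
    ) % (safe, safe, safe)


# Example: the same text could be saved as index.html, index.htm, and so on.
INDEX_REDIRECT = redirect_page("/")
```

The visible link in the body gives users (and tools that ignore meta refresh) a way to follow the redirect by hand.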
index.rss
A lot of Web content is available through RSS. Doing that will not make
sense for every Web site, but it will for many of them. It is
perfectly reasonable to make RSS content depend on user-specific
configuration options, or logging in, or paying for particular
information. One size does not necessarily fit all with RSS.
Nonetheless, if there is something you can generically provide as
RSS, go ahead and do so. Maybe all you give out under index.rss is
"teaser" content, along with a recurring "story" about how to take
advantage of the full RSS feed(s). Or even just a story about why RSS
is not relevant to your Web site.
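A generic "teaser" feed does not take much. The following sketch builds a bare-bones RSS 2.0 document with Python's standard xml.etree.ElementTree module. The function name, titles, and URLs are placeholders, and a production feed would likely carry more elements (dates, GUIDs) than shown here.

```python
import xml.etree.ElementTree as ET


def build_feed(title, link, description, items):
    """Return an RSS 2.0 document as a string.

    items is a sequence of (item_title, item_link, summary) tuples.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    for tag, text in (("title", title), ("link", link), ("description", description)):
        ET.SubElement(channel, tag).text = text
    for item_title, item_link, summary in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        ET.SubElement(item, "description").text = summary
    return ET.tostring(rss, encoding="unicode")


# Placeholder content: one teaser plus the recurring "how to subscribe" story.
FEED = build_feed(
    "My Site",
    "http://mysite.example.com/",
    "Teaser items from My Site",
    [
        ("Latest article (teaser)", "http://mysite.example.com/latest", "First paragraph only."),
        ("Getting the full feed", "http://mysite.example.com/about.html", "How to subscribe."),
    ],
)
```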
privacy.html
If you intend to collect any information from users (even only
usernames or traffic logs), please let them know what you intend to do
with that information. The legal issues around the rights and
obligations of Web site creators and users are complex, and I am
not a lawyer, let alone your lawyer. Still, users will feel better
knowing you have thought about their privacy. And maybe this is a
good time to talk with your lawyer about exactly what you plan to do
with user data.
robots.txt
If you do not want all the resources on your Web site to be indexed by
automatic tools, say so in a robots.txt file. Heck, if you do
want everything to be indexed, say that too. A Robots Exclusion
Standard directive is advisory and a crawler can ignore it: If you really do not
want something to be visible, either do not put it on your Web site at
all, or make sure it lives behind adequate permission protection. But
all the major and legitimate Web crawling engines obey the requests in
robots.txt. Make your intentions clear.
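To see how a well-behaved crawler reads your intentions, you can run a candidate robots.txt through Python's standard urllib.robotparser module. The two sample files and the /private/ path below are invented for illustration; the helper function name is also just an assumption for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: keep crawlers out of /private/, allow the rest.
RESTRICTIVE = """\
User-agent: *
Disallow: /private/
"""

# Hypothetical file: an empty Disallow says "index everything."
PERMISSIVE = """\
User-agent: *
Disallow:
"""


def allowed(robots_text, user_agent, url):
    """Return True if robots_text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, url)


BASE = "http://mysite.example.com"
BLOCKED = allowed(RESTRICTIVE, "AnyBot", BASE + "/private/notes.html")
OPEN = allowed(RESTRICTIVE, "AnyBot", BASE + "/about.html")
EVERYTHING = allowed(PERMISSIVE, "AnyBot", BASE + "/private/notes.html")
```

A check like this only models a cooperative robot; it says nothing about clients that ignore the file.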
security.html
The use of a security.html resource is not uniform. However, if
your site can raise security concerns—for example, if you collect any
sort of sensitive information from users—documenting your security
procedures (at least in broad outline) is a good idea. Give some contact
information on this page in case users have questions, or perhaps have
useful improvements to suggest. Users should be able to find this
information through your site's regular
navigation options. But while you are at it, put a copy of that
resource at security.html too.
sitemap
Exactly how you will show a map of your overall Web site is not well
standardized. Providing something along these lines is always
useful, but exactly what level of detail is available depends on how
dynamic your site is (and in what ways). Moreover, what you want to
show users can depend on the purpose of the site. For example, it
might not be appropriate for all users to know that Resource X exists
at all if they do not have permission to use it. Use your judgment,
but think of providing something.
For many sites, a sitemap is simply a way of being friendly to robots
such as search engines. Google has published a convention that
piggybacks on the robots.txt convention. In brief, you can create
an XML file that documents all the resources that your site provides.
This acts as an "inclusion list" to complement the "exclusion list" of
robots.txt.
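As a sketch of what such an "inclusion list" can look like, the code below writes a minimal XML sitemap using the namespace from the Sitemaps protocol (sitemaps.org). The function name and the URL list are placeholders, and optional per-URL elements such as modification dates are left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a minimal XML sitemap listing the given absolute URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder URLs for the example.
SITEMAP = build_sitemap([
    "http://mysite.example.com/",
    "http://mysite.example.com/about.html",
    "http://mysite.example.com/contact.html",
])
```

Crawlers can be pointed at the result with a Sitemap: line in robots.txt that gives the file's full URL.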
E-mail addresses
Not everything happens on the Web. In fact, just in case the
navigation tools on your Web site do not quite live up to your hopes
(or maybe your users have a brain glitch in discerning your elegant
design), it is nice to let users reach you by e-mail too.
By all means, prominently publicize contact information at
contact.html and elsewhere on your Web site. But as a fallback, make
sure mail sent to a few general e-mail addresses gets to the right
person. These include, at least, postmaster@mysite.example.com,
webmaster@mysite.example.com, and security@mysite.example.com. For
the real old-timers, you might want to let root@mysite.example.com
go somewhere meaningful too (but probably not to "root" for security
reasons). While you are at it, throw in e-mail forwarding for a dozen
more words that seem obvious to the purpose of your site. E-mail
addresses are almost as cheap as symbolic links in your Web server
directory.
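How forwarding is configured depends on your mail system, so treat the following only as a sketch. It prints "name: target" lines in the style of a sendmail-type aliases file; the target mailbox "siteadmin" and the extra role names are hypothetical.

```python
# Role names mentioned in this tip.
ROLE_NAMES = ("postmaster", "webmaster", "security", "root")


def alias_lines(target, extra_roles=()):
    """Return one 'name: target' line per role address, without duplicates."""
    names = list(ROLE_NAMES)
    for role in extra_roles:
        if role not in names:
            names.append(role)
    return ["%s: %s" % (name, target) for name in names]


# "siteadmin" is a made-up mailbox that a real person actually reads.
LINES = alias_lines("siteadmin", ["info", "help", "root"])
```

Note that "root" here is only an address that forwards to the real mailbox, in keeping with the advice not to deliver to the root account itself.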