An introduction to RDF

Exploring the standard for Web-based metadata

Level: Intermediate

Uche Ogbuji (uche@ogbuji.net), Co-founder, Fourthought, Inc.

01 Dec 2000

This article introduces Resource Description Framework (RDF), developed by the W3C for Web-based metadata, using XML as an interchange syntax. RDF's essential aim is to make work easier for autonomous agents, which would refine the Web by improving search engines and service directories. Author Uche Ogbuji gives an overview of RDF aspects from schemas to usage scenarios. The article assumes that you are already familiar with XML.

By now it's a well-known and oversimplified bedtime story: In 1989 Tim Berners-Lee invented the Web, and casinos, pornographers, and, incidentally, businesspeople the world over found a medium of unprecedented power. Many limitations of the Web are widely accepted:

  • The predominance of HTML documents, which mix content with presentation
  • The difficulty of maintaining Web sites to reflect inevitable real-world changes
  • The difficulty of seamlessly presenting dynamic content
  • The seeming futility of finding precisely what one wants using a Web-crawler search engine.

The W3C, the consortium founded in 1994 by Berners-Lee and other industrial shapers of the Web, has been working hard to address these four limitations. The first two are supposed to give way to a future of an XML-driven Web, which would improve the maintainability and flexibility of Web data. The W3C takes aim at the latter two with the Resource Description Framework (RDF), claiming that RDF will make the management and navigation of Web data easier to automate by providing structured Web metadata as a counterpart to Web data. (See the sidebar for a note about the word metadata and other such elusive concepts.)

Thus far XML has garnered much of the world's attention, but as many XML specialists point out (and as many observers of XML's remarkable media coverage have probably thought), XML is not very interesting. XML is nothing more than a way to standardize data formats. In a way, it is just the next level of data above the character level, which has been standardized on such similarly unglamorous technologies as ASCII and Unicode.

This is not to underplay XML's importance. A data-format standard makes all of the more glamorous technologies possible, and RDF is the leading example of the benefit that comes once the data format has been standardized. Many proclaim that RDF is really XML's killer app, and with good reason. Despite all this, RDF remains somewhat obscure. This is mainly because at its core RDF is very abstract, very dry, and very academic. With this article I hope to illustrate why RDF is very important to anyone interested in XML.

While trying to implement initiatives for managing the Web, particularly the Platform for Internet Content Selection (PICS), a content rating system, the W3C kept running into the difficulty of how to uniformly express assertions about Web pages which could be used by automated content filters and selectors.

The power of simplicity

RDF is very simple. It is no more than a way to express and process a series of simple assertions. For example: This article is authored by Uche Ogbuji.

This is called a statement in RDF and has three structural parts: a subject ("this article"), a predicate ("is authored by"), and an object ("Uche Ogbuji"). This is a familiar breakdown of such assertions, whether in the field of formal logic or grammar (well, OK, as long as you don't make too fine a point of that intransitive verb). Indeed, RDF is nothing more than an application of long study in such fields aimed at describing resources, which consist of any item accessible through the Web.

In RDF, resources are represented by Uniform Resource Identifiers (URIs), of which URLs are a subset. The subject of RDF statements must actually be a resource, so the above English statement could be turned into an RDF statement illustrated in Figure 1.


Figure 1: An RDF statement

Figure 1 shows the common graph representation of RDF statements, introduced in the RDF Model and Syntax 1.0 Recommendation (RDF M&S). Note that the object is a string: "Uche Ogbuji". This is called a literal in RDF, but an object could also be a resource. Take a look at Figure 2.
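
As a minimal sketch of the structure just described, a statement can be modelled in Python as a small record with three fields. The class names `Resource`, `Literal`, and `Statement` are illustrative assumptions rather than part of any RDF library; the subject and predicate URIs are the ones used in this article's XML example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """A node identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A plain string value, such as a person's name."""
    value: str


@dataclass(frozen=True)
class Statement:
    """One RDF statement: the subject must be a resource,
    while the object may be a resource or a literal."""
    subject: Resource
    predicate: Resource
    object: Union[Resource, Literal]


# "This article is authored by Uche Ogbuji", with a literal as the object.
statement = Statement(
    subject=Resource("http://uche.ogbuji.net/thisarticle"),
    predicate=Resource("http://schemas.uche.ogbuji.net/rdfexample/authored-by"),
    object=Literal("Uche Ogbuji"),
)

print(statement.subject.uri, "->", statement.object.value)
```

Keeping `Resource` and `Literal` as separate types means a literal and a resource that happen to share the same text never compare as equal.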


Figure 2: A small RDF model

Figure 2 shows several RDF statements combined into a single diagram. All of RDF is pretty much an expansion of this basis. RDF defines a directed graph of statements that describe Web-based resources. As you can see, I have replaced the literal "Uche Ogbuji" in the original statement with a URI representing this person, which in turn is the subject of several more statements. Such a collection of RDF statements is called a model in RDF.

This might seem rather simple for such an important technology, but it is RDF's very simplicity that makes it so powerful. Computer science already has plenty to say about the effectiveness of graphs for representing information. RDF allows many simple statements to be aggregated so that machine agents can apply well-tested graph traversal techniques to glean data. These statements are called triples because each has three parts (subject, predicate, and object). Databases of such triples have been shown to scale to many millions of triples, mostly because of the simplicity of this information. Such scalability is the only hope if a technology is to make an attempt at taming the vast Web.
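
The graph traversal mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the triples mirror the statements in this article's example, the `PERSON` URI is a hypothetical stand-in for the URI representing the author, and plain strings stand for both resources and literals.

```python
from collections import defaultdict

EX = "http://schemas.uche.ogbuji.net/rdfexample/"
ARTICLE = "http://uche.ogbuji.net/thisarticle"
PERSON = "http://uche.ogbuji.net/"  # hypothetical URI for the person

# A small model: a list of (subject, predicate, object) triples.
triples = [
    (ARTICLE, EX + "authored-by", PERSON),
    (PERSON, EX + "name", "Uche Ogbuji"),
    (PERSON, EX + "nationality", "Nigerian"),
]

# Index the outgoing arcs of the directed graph by subject.
outgoing = defaultdict(list)
for s, p, o in triples:
    outgoing[s].append((p, o))


def reachable(start):
    """Depth-first walk yielding every statement reachable from start."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        for predicate, obj in outgoing.get(node, []):
            yield (node, predicate, obj)
            # Literals have no outgoing arcs, so following them is harmless here.
            stack.append(obj)


for s, p, o in reachable(ARTICLE):
    print(s, p, o)
```

Starting from the article, the walk follows the authored-by arc to the person and then picks up the statements about that person.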




What does it look like in XML?

The abstract representation we have discussed above is the basis of RDF, but it is quite impractical for exchanging RDF descriptions and placing such descriptions in HTML and XML content. To this end RDF M&S also provides a serialization format in XML for RDF. According to this format the model in Figure 2 might be rendered as in Listing 1.


Listing 1: XML serialization of the RDF model in Figure 2
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://schemas.uche.ogbuji.net/rdfexample/">
  <rdf:Description about="http://uche.ogbuji.net/thisarticle">
    <authored-by>
      <rdf:Description ID="uche.ogbuji.net">
        <name>Uche Ogbuji</name>
        <nationality>Nigerian</nationality>
      </rdf:Description>
    </authored-by>
  </rdf:Description>
</rdf:RDF>

Listing 1 shows just one of many forms, some more verbose and some more abbreviated, that are provided for XML expression. This flexibility of RDF syntax -- often an obstacle to learning and implementing RDF -- makes it much easier to apply RDF processing to existing XML. One constant in all RDF serializations is the use of the element rdf:RDF to wrap the RDF statements.

Note the use of XML namespaces in Listing 1. RDF relies heavily on XML namespaces for disambiguating names. There are several element and attribute names that must be in the namespace defined by RDF M&S (using the prefix rdf in this example and many others you'll see). All RDF predicates must use a namespace to clarify their meaning, as the examples show.

Inside the RDF wrapper element, a description element indicates the subject of the enclosed statements. This example uses the about attribute, which points to an external resource as the subject. There is one statement with this resource as subject, marked by the element <authored-by>, which forms the predicate. Note that this element has the namespace http://schemas.uche.ogbuji.net/rdfexample/. According to RDF M&S, this is translated to an abstract model in which the actual predicate is formed by joining the namespace URI and local name of the predicate element. So really the full predicate of this statement is http://schemas.uche.ogbuji.net/rdfexample/authored-by. In addition, the namespaces are supposed to provide schemas that have type and constraint information for RDF.
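
The joining of namespace URI and local name can be tried out with Python's standard `xml.etree.ElementTree` module, which exposes each element's tag as `{namespace}localname`. The following is a rough sketch rather than an RDF parser; the helper name `predicate_uri` is an assumption made for illustration, and the fragment is a trimmed copy of Listing 1.

```python
import xml.etree.ElementTree as ET

# A trimmed copy of Listing 1: one description with one predicate element.
FRAGMENT = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://schemas.uche.ogbuji.net/rdfexample/">
  <rdf:Description about="http://uche.ogbuji.net/thisarticle">
    <authored-by>Uche Ogbuji</authored-by>
  </rdf:Description>
</rdf:RDF>"""


def predicate_uri(element):
    """Join the namespace URI and local name of a predicate element."""
    if not element.tag.startswith("{"):
        raise ValueError("predicate elements must be in a namespace")
    namespace_uri, local_name = element.tag[1:].split("}", 1)
    return namespace_uri + local_name


root = ET.fromstring(FRAGMENT)
description = root[0]           # the rdf:Description element
predicate_element = description[0]  # the authored-by element

print(predicate_uri(predicate_element))
```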

The remaining part of the statement, you'll remember, is the object. But the object of the first statement is not very clear from the listing. RDF handles the case in which the object of a statement is a resource but doesn't really have an external URI. In the example, the resource representing the person named Uche Ogbuji is such a case, and is actually represented by the embedded Description element with an ID attribute. The URI of this resource becomes the joining of the URI of the RDF file as a whole, and the value of the ID attribute. Note that RDF takes this arcane concept (one of many) even further by allowing fully anonymous resources without even an ID.
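
To make the ID rule concrete, here is a small sketch of how such a URI could be computed. It assumes the ID value is attached to the document URI as a fragment identifier, separated by "#"; the document URI `http://example.org/listing1.rdf` and the function name are hypothetical, chosen only for illustration.

```python
from urllib.parse import urldefrag


def resource_uri_from_id(document_uri, id_value):
    """Build the URI of a resource declared with an ID attribute.

    Any fragment already present on the document URI is dropped first,
    then the ID is appended as the new fragment.
    """
    base = urldefrag(document_uri).url
    return base + "#" + id_value


# Hypothetical location of the file holding Listing 1.
print(resource_uri_from_id("http://example.org/listing1.rdf", "uche.ogbuji.net"))
```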

The resource with ID "uche.ogbuji.net" is itself the subject of two statements, with predicates represented by the child elements name and nationality. Note that these predicates are also in the http://schemas.uche.ogbuji.net/rdfexample/ namespace. The objects of these statements are literals: "Uche Ogbuji" and "Nigerian", respectively.
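
Putting the pieces of this walkthrough together, the sketch below pulls the three statements out of Listing 1 using only Python's standard library. It is deliberately narrow: it understands just the nested form used in Listing 1 (descriptions with an about or ID attribute), not the other serialization forms, and the `BASE` document URI is a hypothetical value needed to resolve the ID.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DESCRIPTION = "{" + RDF_NS + "}Description"
BASE = "http://example.org/listing1.rdf"  # hypothetical URI of the RDF file

LISTING_1 = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://schemas.uche.ogbuji.net/rdfexample/">
  <rdf:Description about="http://uche.ogbuji.net/thisarticle">
    <authored-by>
      <rdf:Description ID="uche.ogbuji.net">
        <name>Uche Ogbuji</name>
        <nationality>Nigerian</nationality>
      </rdf:Description>
    </authored-by>
  </rdf:Description>
</rdf:RDF>"""


def subject_of(description):
    """Return the subject URI from an about attribute, or from BASE plus ID."""
    about = description.get("about")
    if about is not None:
        return about
    return BASE + "#" + description.get("ID")


def statements(description):
    """Yield (subject, predicate, object) for a description and any nested ones."""
    subject = subject_of(description)
    for prop in description:
        # ElementTree tags look like "{namespace}local"; join the two parts.
        namespace_uri, local_name = prop.tag[1:].split("}", 1)
        predicate = namespace_uri + local_name
        nested = prop.find(DESCRIPTION)
        if nested is not None:
            # The object is a resource described in place.
            yield (subject, predicate, subject_of(nested))
            yield from statements(nested)
        else:
            # The object is a literal.
            yield (subject, predicate, prop.text)


root = ET.fromstring(LISTING_1)
triples = [t for d in root.findall(DESCRIPTION) for t in statements(d)]
for triple in triples:
    print(triple)
```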

That wraps up the introduction of RDF. See Resources for links to more detailed introductions to RDF, and more advanced topics such as statement containers, reification, and schemas.




Looks basic enough. So what?

As I said, RDF's power comes from its simplicity. The W3C suggests that webmasters begin the process of annotating existing Web data with RDF by embedding simple descriptions (such as in Listing 1) into the headers of their documents. Actually, rather than using the sample namespace for the schema I used in the listing, webmasters are encouraged to make use of the Dublin Core, a standard specification for library-like metadata (see Resources). Use of standard cataloguing metadata would assist search engine Web crawlers and other machine agents the way HTML meta tags help search engines index Web pages. The advantage of RDF is that it is readily extensible with schemas that are also machine readable, bringing about an unprecedented level of automation.
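
As one possible illustration of such an annotation, the following Python sketch generates a small RDF description that uses Dublin Core elements instead of the sample schema. The function name and the choice of the title and creator elements are assumptions for this example; the namespace URI is the commonly used Dublin Core element set namespace, and the attribute style follows Listing 1.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Ask ElementTree to serialize with the conventional prefixes.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dublin_core_description(page_uri, title, creator):
    """Return an RDF/XML string describing a page with two Dublin Core elements."""
    root = ET.Element("{%s}RDF" % RDF_NS)
    # Unprefixed "about" attribute, matching the form used in Listing 1.
    description = ET.SubElement(
        root, "{%s}Description" % RDF_NS, {"about": page_uri}
    )
    ET.SubElement(description, "{%s}title" % DC_NS).text = title
    ET.SubElement(description, "{%s}creator" % DC_NS).text = creator
    return ET.tostring(root, encoding="unicode")


xml_text = dublin_core_description(
    "http://uche.ogbuji.net/thisarticle",
    "An introduction to RDF",
    "Uche Ogbuji",
)
print(xml_text)
```

The resulting string could then be placed in a document header by whatever templating mechanism a site already uses.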

This automation of resource discovery, description, and schematics is the basis of what Berners-Lee and the W3C have been touting for some time as the next-generation Web, also known as the semantic Web. This term is rather controversial (see the sidebar "RDF wordplay" below), but it indicates the application of well-established artificial intelligence technologies, known as semantic networks, to the task of automating data processing on the Web. This evolution would allow Web crawlers to gather more than just plain keywords: through RDF schemas, they could get some sense of the meaning of the various parts of distributed RDF statements. What meaning would actually mean is a matter of continuing debate and discussion. But at minimum RDF schemas provide a mechanism for navigating established contracts for descriptions of Web resources.

Of course, there is no semantic Web yet, and there is no telling whether such a vision will ever survive the lethargy of webmasters, the test of scalability, or problems with shifting resources. The last concern arises because URLs are based on the domain-name system, which is constantly changing. RDF resources are actually identified by URIs, which are a superset of URLs, but the other URI formats are very obscure compared to URLs, and they have not been tested by the pervasive use that URLs endure.




Practical RDF for Y2K

Sidebar: RDF wordplay

Metadata is one of several slippery words that are unavoidable when discussing RDF. Because RDF and many of its applications are quite abstract, several words simply defy clear and consistent meaning. It's clear enough that metadata means data about data. The problem is exactly where the data ends and the metadata begins. Some fields have developed precise conventions for this distinction. For instance, in relational databases, metadata includes the name of each table and the type of each column in the table. The data is what is stored in those columns. RDF addresses XML at the content model, and so the data/metadata distinction comes down to user preference. For instance, is the author of a book data or metadata? It could be data because "by Uche Ogbuji" appears as content on the front cover. It could be metadata because many people wouldn't actually consider the author's name to be part of the text.

Other slippery RDF words exist, some marked by the so-called S curse. Wry observers of data-processing science have noticed that many words that start with the letter "S" seem to be cursed with blurry distinctions and abstract wooliness and thereby defy clear definition. Three notorious S words are syntax, semantics, and schema. The W3C proclaims RDF as a key building block for the "semantic Web," but since the "S" word is used, the XML community is having great difficulty agreeing on what exactly the semantic Web is. Also, RDF is designed to lean heavily on schemas, but "schema" is another "S" word, and again it's not clear what schemas are. There are hints in the RDF schema spec, but this is loose, open-ended, and incomplete by the editors' own admission. This spec could provide data-type restrictions for RDF objects, human or machine readable "meaning" (another slippery word) for RDF predicates, or just ways to link vocabularies together.

As RDF implementation matures and the W3C updates its RDF schema spec, there will likely be some more clarity in these areas. But for now, discussion of RDF can seem frustratingly academic because of all the words and concepts that remain fuzzy.

So if the ambitious goals of RDF are some time ahead of us and somewhat uncertain, why is RDF important?

I've already mentioned (and it is well discussed in the literature) how hard the Web has become to manage on a macro scale. This problem is the same even in limited domains. The well-known client/server revolution in application design brought about a paradigm in which forms-and-display code plugs into a server data store. This approach and the development techniques associated with it are really only meant to handle a fixed and highly controlled database environment. The extension to three-tier and n-tier systems hasn't changed this much. The problem is that as applications migrate to the Web, the rigidity gets in the way of maintainability.

I use the term Web applications to describe any location on the Web with dynamic or interactive content. This ranges from portals to e-commerce sites. Increasingly, to be competitive, Web applications must assemble data from diverse sources and services; furthermore, requirements for such applications tend to be far more fluid in "Internet time." This is the sort of environment in which the extensibility of both XML and RDF really pays dividends. XML allows great flexibility for adaptation of data formats, and RDF provides great flexibility for adaptation of data-processing rules.

We have discussed some of the problems with the idea that RDF can turn the entire Web into a semantic network, but many of these problems are more easily dealt with in the controlled environment of a single application. A central RDF database can be put in place covering triples that describe resources, which are combined to form the views of the Web application. In fact, some of the core application objects, particularly the ones most subject to change, can be directly referenced by the RDF model. This becomes a database index, but one that can be more easily extended.

Basically, RDF can provide Web-based applications an "escape hatch" from the strictures of traditional database design and application evolution. Some folks have been complaining for years that traditional database management tools are too highly structured, and therefore add hefty maintenance costs when the real world inevitably changes around the application. This faction (including the author) has long advocated a "semi-structured" approach to data management because it can drastically reduce maintenance costs. An RDF database working with a traditional database is one technology that goes a long way to addressing such concerns.
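
The "escape hatch" idea can be sketched with Python's built-in `sqlite3` module: a conventional, fixed-schema table sits next to a generic table of triples. Everything here is an illustrative assumption (the table names, the `reviewed-by` predicate, and the reviewer's name), not a description of any particular RDF tool. The point is that recording a new kind of fact needs no schema change.

```python
import sqlite3

EX = "http://schemas.uche.ogbuji.net/rdfexample/"
ARTICLE = "http://uche.ogbuji.net/thisarticle"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (uri TEXT PRIMARY KEY, title TEXT);      -- traditional, fixed schema
    CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT);  -- semi-structured
""")
conn.execute("INSERT INTO articles VALUES (?, ?)", (ARTICLE, "An introduction to RDF"))


def add_statement(subject, predicate, obj):
    """Store one triple."""
    conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (subject, predicate, obj))


def match(subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    query = "SELECT subject, predicate, object FROM triples WHERE 1=1"
    params = []
    for column, value in (("subject", subject), ("predicate", predicate), ("object", obj)):
        if value is not None:
            query += " AND " + column + " = ?"
            params.append(value)
    return conn.execute(query, params).fetchall()


add_statement(ARTICLE, EX + "authored-by", "Uche Ogbuji")
# A requirement that appears later: no ALTER TABLE needed, just a new predicate.
add_statement(ARTICLE, EX + "reviewed-by", "A. Reviewer")  # hypothetical predicate and name

# Combine the traditional table with the triples in one query.
rows = conn.execute(
    "SELECT a.title, t.object FROM articles a "
    "JOIN triples t ON t.subject = a.uri WHERE t.predicate = ?",
    (EX + "reviewed-by",),
).fetchall()
print(rows)
```

A production system would want indexes on the triple columns and some policy for literals versus resource URIs, both of which are left out of this sketch.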




Conclusion

As a consultant I have made significant use of RDF to augment traditional databases in controlled but evolving systems. I've seen it reduce maintenance costs for portals, Web-based searching, and message indexing applications. As a heavy user of the Web, I can easily envision much of the advantage that XML, RDF, and the proclaimed semantic Web would provide.

RDF is by no means a perfect technology. Its serialization is rather rough around certain edges, and the only available RDF schema specification is almost completely toothless. RDF does have two powerful features: It is well designed to work with XML, which is designed for the Web and is quickly becoming the pervasive standard for data-exchange. It is also simple enough that even the troublesome edge cases are manageable.

If you already have a body of XML data, it is not very difficult to build a pilot program that creates indexes and rules for handling your XML data using RDF gleaned therefrom. Many RDF tools have already emerged, so you will rarely have to do much invention, and this approach would allow you to explore some of the advantages of RDF in closed systems. Meanwhile, it's even easier to annotate your Web content with RDF descriptions alongside your HTML meta tags, which would give you early entry into the promised semantic Web.



Resources



About the author

Uche Ogbuji

Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a consulting firm specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, the open-source platform for XML middleware. Mr. Ogbuji is a computer engineer and writer born in Nigeria, living and working in Boulder, Colorado, USA. You can reach him at uche@ogbuji.net.



