
Add IBM OmniFind Yahoo! Edition to your Web site

A quick look at four options


Level: Introductory

Todd Leyba (tleyba@us.ibm.com), Search Architect, IBM

13 Dec 2006

Learn how you can quickly and easily integrate a freely downloadable search engine into your Web site. This article describes four methods to do this, using IBM® OmniFind™ Yahoo! Edition search functionality. The methods range from directly linking to the OmniFind search results page, to using XSLT to transform the XML returned by the OmniFind search API into the HTML of your design.

Introduction

IBM and Yahoo! have partnered to offer a free, downloadable search engine that is easy to set up and use. IBM's OmniFind Yahoo! Edition (referred to simply as OmniFind in this article) can crawl and index up to half a million Web pages or file system documents and make them available for search through a simple Web interface. You may have already downloaded OmniFind and discovered how quickly you can set up an index and start searching.

But perhaps you are past that point and are now investigating how to integrate OmniFind's search functionality into your own Web site. In this article I explore four ways to accomplish this:

  1. Link directly to the OmniFind search page
  2. Use your own search box and button
  3. Present search results as an HTML snippet
  4. Use XSLT to transform OmniFind XML results into HTML

The methods build on each other; each one adds flexibility at the cost of more effort.

Scenario

The scenario I will follow is to add OmniFind search functionality to a blog site. If you have ever read a blog, you may have noticed that not all of the blogger's postings are presented on the first page. Most blog hosting facilities list the most recent postings and provide an archive link to the older postings, typically organized by month. So if you want to view a posting not listed on the first page, you would need to click through the previous months to find the article.

In this integration scenario, I use OmniFind to crawl and index my personal blog site and then let users search my OmniFind blog index to find previous postings. I use Google's Blogger to host my real blog (Todd Leyba's Perspectives on Search and Discovery). Blogger already provides a search capability, but it cannot be customized. In this article I show how to replace Blogger's search facility with OmniFind and then how to customize the overall search experience.





Option 1: Link to OmniFind directly

The easiest integration option is to provide a link to the OmniFind search page in a conspicuous place in the blog. All changes to my blog site are made through a Blogger template, which defines the overall structure of the blog and is written in standard HTML. It is therefore simple to insert a link to the OmniFind search page, in this case right above the archive section. The link is similar to the following, with the host and port changed to those of your OmniFind installation.

http://omniFindhost:8080/search/

The overall look and feel of the OmniFind search interface (shown in Figure 1) can then be customized by selecting from several layout options in the OmniFind administrator's console. With no programming involved, you can replace the banners and images with those of your company, change the text of various labels and buttons, and even choose which features appear (such as summaries, footers, and so on).


Figure 1. OmniFind Search Results Page

But this direct-link approach is cumbersome to use. It forces the user to click twice to issue a search: once to reach the OmniFind search page and again to run the search. Ideally, the search box would always be present on your site, so that users can type a query whenever they need to and click once to see the results.





Option 2: Add your own search box

The first step is to add your own search box and button to the Web page. I will use standard HTML to add these components and a small amount of JavaScript to handle the onClick action when the button is pressed.

The following three lines add the search box and button right before the list of "Archives" in the right-hand panel of my blog.

<h2 class="sidebar-title">Search entire blog</h2>
    <input type="text" name="Query" value="" size="25">   
    <input type="button" value="Search"  onclick="runSearch()">

Now we need to provide a JavaScript function to handle the onClick action when the button is pressed. Right before the body tag in my Blogger template, I insert the following JavaScript:

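<script type="text/javascript">

  function runSearch()
  {
    var dest    = "http://OmniFindHost:8080/search?";
    var params  = "index=Default&start=0&results=10&query=";
    // Read the keywords typed into the "Query" text box
    var query   = document.getElementsByName("Query")[0].value;
    // URL-encode the keywords (encodeURIComponent also encodes "+" and "&")
    var request = dest + params + encodeURIComponent(query);

    // window.open(url, windowName, features): the window features are
    // passed together as one comma-separated string
    window.open(request,                    // complete search URL
                "OmniFindResults",          // name of the target window (not its title)
                "toolbar=1," +              // toolbar provides back/fwd
                "resizable=1," +            // allow them to resize window
                "scrollbars=1," +           // and to scroll as well
                "height=500,width=400," +   // I like smaller windows
                "left=80,top=80");          // of this size and position
  }

</script>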





Invoke search

The primary job of the JavaScript function is to collect the keywords entered in the search box and include them in an OmniFind search request. In this first example I invoke the OmniFind search page directly, with no modifications. The results page appears as shown in Figure 1 and is identical to the page you would see had you entered the search directly in OmniFind. The only difference is that we used our own search box to accept the search expression.

The URL is similar to the direct link described above but is further qualified with a few additional parameters. In the JavaScript, I broke the URL into its constituent parts for readability. Four parameters are used: "index" names the index to search (the one I created is named "Default"), "results" sets the number of results to return (10), "start" sets the zero-based position of the first result returned (0), and "query" carries the search terms. To return the second page of results for the same query, you would set "start" to 10.
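The arithmetic behind the "start" parameter can be sketched in a few lines of JavaScript. The function below is a hypothetical helper, not part of OmniFind; it assumes the OmniFindHost:8080 address and the "Default" index used above, and computes "start" as the zero-based page number multiplied by the number of results per page.

```javascript
// Hypothetical helper (not part of OmniFind): builds the URL of one page of
// OmniFind search results. Host, port and index name are assumptions.
function buildSearchPageUrl(query, page, resultsPerPage) {
  const base = "http://OmniFindHost:8080/search?";
  // Page 0 starts at result 0, page 1 at result resultsPerPage, and so on
  const start = page * resultsPerPage;
  return base +
    "index=Default" +
    "&start=" + start +
    "&results=" + resultsPerPage +
    // encodeURIComponent turns blanks into %20 and also encodes "&" and "+"
    "&query=" + encodeURIComponent(query);
}

// Second page (page index 1) of ten results for "search conferences"
console.log(buildSearchPageUrl("search conferences", 1, 10));
```

A "Next page" link would simply call the same helper with the page number incremented by one.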

The "request" variable concatenates the URL parts and appends the query terms entered by the user to the "query" parameter. Note that I URL-encode the query terms so that blanks and other special characters are converted to their escaped representation. The "request" variable, containing the fully built OmniFind URL, is then passed as the first parameter to window.open(), which submits the request and opens a new window with the search results. The remaining window.open() parameters name the new window and control its size, position, and options. Below is an example search from my blog.


Figure 2. Search box added to blog, and OmniFind search results page





Use the OmniFind REST API

Up to this point you have seen how to use your own search box to submit a search, rather than the search box on the OmniFind search page. Now I'll show you how to change the appearance of the search results beyond what the OmniFind layout editor offers. You can accomplish this with the OmniFind search API. OmniFind's API is REST-based, which means that you submit a search with a standard HTTP GET request and URL parameters. The search results are returned as XML, which we will then transform into our custom HTML using XSLT. Below is an example OmniFind search request:

http://OmnifindHost:8080/api/search?index=Default&results=10&start=0&query=conferences

You may have noticed that this URL is nearly identical to the one issued in the previous example, with one exception: the path "/api/search" is used instead of "/search". This instructs OmniFind to return the results as XML instead of the fully formatted HTML page shown in Figure 1. The returned XML is an Atom 1.0 feed. Consequently, you can test your API search requests using any feed reader that supports Atom 1.0 (I personally use the open source FeedReader program). The feed reader automatically issues the search and formats the results for you. You can also test your API searches with a standard browser, which displays the returned XML natively as shown below.


Figure 3. Search results displayed as XML
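If you issue API requests from your own scripts, a small URL builder keeps the parameters in one place. The sketch below is hypothetical: the function name, the OmniFindHost:8080 address, and the "Default" index are assumptions. It uses the /api/search path and can optionally append the "output" and "stylesheet" parameters discussed later in this article. Every value is percent-encoded, which a standard HTTP server decodes back to the original text.

```javascript
// Hypothetical helper (not part of OmniFind): builds a request URL for the
// REST search API. Host, port and default index name are assumptions.
function buildApiUrl(query, options = {}) {
  const params = [
    ["index", options.index ?? "Default"],
    ["results", options.results ?? 10],
    ["start", options.start ?? 0],
    ["query", query],
  ];
  // Optional parameters discussed in this article
  if (options.output) params.push(["output", options.output]);             // e.g. "snippet"
  if (options.stylesheet) params.push(["stylesheet", options.stylesheet]); // XSLT file URL
  const queryString = params
    .map(([name, value]) => name + "=" + encodeURIComponent(value))
    .join("&");
  return "http://OmniFindHost:8080/api/search?" + queryString;
}

console.log(buildApiUrl("conferences"));
console.log(buildApiUrl("conferences", { output: "snippet" }));
console.log(buildApiUrl("cameras", { stylesheet: "http://myserver.com/myStyleSheet.xsl" }));
```

The first call reproduces the example request above (apart from the host name's capitalization).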





Option 3: Results returned as HTML snippets

Before we discuss the XML output further, it is important to note that the search API can also return the results as a snippet of standard HTML. I call the output a snippet because it is not a complete HTML page (it has no <HTML> or <BODY> tags). HTML output is requested with the "output=snippet" parameter on the search request; its effect is shown below:

http://OmnifindHost:8080/api/search?index=Default&results=10&start=0&query=conferences&output=snippet


Figure 4. HTML snippets results page

Notice that the format of the results is similar to that of the OmniFind search results page, except that the search box and page controls are missing. This approach has value in certain applications but is somewhat inflexible: if you want to change the HTML formatting, you would need to parse the HTML yourself, which is not an easy task.





Option 4: Use XSLT to format the search results

Because the results are normally returned as XML, you can use an XSLT stylesheet to transform the XML into HTML, formatting the results as desired. You prepare the XSLT stylesheet yourself, with the XSL and XPath directives needed to process the elements of the Atom feed. In this case I want the results page to match the look of my blog, using the same color schemes and fonts. Below is the XSLT stylesheet I used, stored in a file named "myStyleSheet.xsl". Each line is numbered for easy reference.

01 <?xml version="1.0" encoding="utf-8"?>
02 <xsl:stylesheet version="1.0"
03    xmlns:atom="http://www.w3.org/2005/Atom"
04    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
05    xmlns:dc="http://purl.org/dc/elements/1.1/">
06 <xsl:output method="html"/>
07 <xsl:template match="/">
08    <style>
09        <xsl:comment>
10        .content-area {background-color: #dcedcb;}
11        .list-item-description {font-size: .9em; margin: 0 0 10px 0;}
12        /* Remaining styles omitted for readability */
13        </xsl:comment>
14    </style>
15    <xsl:apply-templates select="/atom:feed"/>
16 </xsl:template>
17 <xsl:template match="/atom:feed">
18    <div class="content-area">
19      <div class="title"><xsl:value-of select="atom:title"/></div>
20      <ol class="list"><xsl:apply-templates select="atom:entry"/></ol>
21    </div>
22 </xsl:template>
23 <xsl:template match="atom:entry">
24    <li>
25       <a href="{atom:link/@href}">
26          <xsl:value-of select="atom:title" disable-output-escaping="yes"/>
27       </a>
28       <div class="list-item-description">
29        <xsl:value-of select="atom:summary" disable-output-escaping="yes"/>
30       </div>
31    </li>
32 </xsl:template>
33 </xsl:stylesheet>

XSLT can be used to transform XML into a variety of formats. Line 6 indicates that the output of this transformation is HTML. I use XSL templates to match the various nodes in the XML stream. Line 7 begins the main template, which matches the root of the XML document. Within this template you specify your styles, placed between the xsl:comment directives (lines 9 and 13); for readability I omitted most of my style definitions. On line 15 the main template applies a subordinate template to each atom:feed element that XSLT encounters. In this case the OmniFind XML results contain only one atom:feed element.

The template for an atom:feed element begins on line 17. It creates an outer HTML "div" tag whose style class is "content-area". Note that a style for each "div" class attribute used here should be defined in the main template above; again, I omitted most of the style definitions for readability. The atom:feed template creates an inner "div" tag for the title of the results (line 19). The title is pulled from the XML by the xsl:value-of statement, which selects the element named "atom:title". If you want a different, hard-coded title, just replace line 19 with your own HTML statement (e.g., <h2>My Title</h2>). Line 20 inserts an HTML ordered list and applies another template to each "atom:entry" element found in the XML.

The last template definition (starting on line 23) provides the HTML transformation applied to each search result. In this template I create an HTML list item for the ordered list, a link to the document, and a brief description of the document. For the link URL I use the XPath expression atom:link/@href to extract the value of the href attribute of the atom:link element (line 25). For the anchor text I use the xsl:value-of directive to extract the contents of the atom:title element within the entry (line 26). The same technique is used for the result description (line 29).
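To make the structure of the two templates concrete, here is a rough JavaScript equivalent of the markup they emit. It is only an illustration, not an alternative supported by OmniFind: it assumes each result has already been pulled out of the Atom feed into an object with hypothetical href, title, and summary fields, and the example URL and "highlight" class name are made up. As in the stylesheet, the feed title and link URL are escaped, while the result title and summary are inserted as-is.

```javascript
// Illustration only: mirrors the atom:feed and atom:entry templates.
// Each entry is assumed to look like { href, title, summary }.

// Escape characters that are special in HTML text and attribute values
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderResults(feedTitle, entries) {
  const items = entries.map((entry) =>
    "<li>" +
    '<a href="' + escapeHtml(entry.href) + '">' +   // like line 25
    entry.title +             // unescaped, like disable-output-escaping="yes"
    "</a>" +
    '<div class="list-item-description">' +
    entry.summary +           // unescaped as well (line 29)
    "</div></li>"
  );
  return '<div class="content-area">' +
    '<div class="title">' + escapeHtml(feedTitle) + "</div>" +  // escaped, like line 19
    '<ol class="list">' + items.join("") + "</ol>" +
    "</div>";
}

console.log(renderResults("Search results", [
  { href: "http://example.com/post.html",
    title: 'Search <span class="highlight">conferences</span>',
    summary: "Notes from this year's search conferences" },
]));
```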

OmniFind conveniently highlights any search terms contained in the title and summary of each result. It does this by bracketing each search term with HTML <SPAN> tags that indicate the style to use for the highlighting. These HTML tags are embedded in the XML as text, so the XSLT processor normally escapes them, and the browser then displays the tags literally instead of rendering them (an effect we do not want). The XSLT processor does not know that these are valid HTML tags and dutifully escapes any special characters it encounters while processing the xsl:value-of directive. To prevent this, we instruct the XSLT processor not to escape special characters by adding the disable-output-escaping="yes" attribute on lines 26 and 29.
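The difference that disable-output-escaping makes can be seen by escaping a highlighted title by hand. The JavaScript below is an illustration only: the span's class name is a hypothetical stand-in for whatever OmniFind actually emits, and escapeText only approximates what an XSLT processor's HTML serializer does to text that contains markup characters.

```javascript
// Approximates how an XSLT processor serializes text content:
// markup characters are replaced by entity references.
function escapeText(text) {
  return text
    .replace(/&/g, "&amp;")   // must run first so the entities added below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical highlighted title, as it might appear inside atom:title
const highlighted = 'Digital <span class="highlight">cameras</span>';

// Without disable-output-escaping: the browser shows the tags literally
console.log(escapeText(highlighted));
// prints: Digital &lt;span class="highlight"&gt;cameras&lt;/span&gt;

// With disable-output-escaping="yes": the markup passes through unchanged
console.log(highlighted);
```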

It is important to note that the disable-output-escaping attribute is not honored by all browsers. Microsoft’s Internet Explorer honors it and produces the desired effect. Mozilla ignores it, which the W3C XSLT 1.0 specification permits, because XSLT processors are not required to support disabling output escaping. For Mozilla browsers you can achieve the same effect with different XSLT commands (not shown in the example).





Using the stylesheet parameter


The "stylesheet" parameter is used on the search request to indicate which XSLT stylesheet is to be applied as shown below.

http://OmnifindHost:8080/api/search?query=cameras&index=Default&results=10&start=0&stylesheet=http://myserver.com/myStyleSheet.xsl

The "stylesheet" parameter causes OmniFind to insert an xml-stylesheet processing instruction as the second line of the XML search results, as shown below.

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="http://myserver.com/myStyleSheet.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom">...</feed>

This will cause the browser to retrieve the stylesheet from the location specified in the HREF and apply it using XSLT. The results are shown below:


Figure 5. Search box added to my blog, and XSLT transformed search results page
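For a concrete picture of what the "stylesheet" parameter does to the response, the hypothetical function below performs the same insertion on an XML string: it places an xml-stylesheet processing instruction immediately after the XML declaration, or at the very top if there is none. It is a sketch of the effect described above under those assumptions, not OmniFind's own code.

```javascript
// Hypothetical sketch: add an xml-stylesheet processing instruction as the
// second line of an XML document, right after its <?xml ...?> declaration.
function addStylesheetInstruction(xml, stylesheetUrl) {
  // Escape characters that are not allowed raw inside the quoted href value
  const href = stylesheetUrl.replace(/&/g, "&amp;").replace(/"/g, "&quot;");
  const instruction = '<?xml-stylesheet type="text/xsl" href="' + href + '"?>';

  // "<?xml" followed by whitespace, so "<?xml-stylesheet" is not mistaken for it
  const declaration = xml.match(/^<\?xml\s[^?]*\?>\s*/);
  if (declaration) {
    const rest = xml.slice(declaration[0].length);
    return declaration[0].trimEnd() + "\n" + instruction + "\n" + rest;
  }
  return instruction + "\n" + xml;  // no declaration: the instruction goes first
}

const feed = '<?xml version="1.0" encoding="utf-8"?>\n' +
  '<feed xmlns="http://www.w3.org/2005/Atom">...</feed>';
console.log(addStylesheetInstruction(feed, "http://myserver.com/myStyleSheet.xsl"));
```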




Summary

In this article I presented four methods for integrating OmniFind search functionality into your Web site. The first and simplest approach was to link directly to the OmniFind search page. Next, I replaced the direct link with my own search box and button while keeping the OmniFind search results page unchanged. I then switched to the OmniFind search API to gain more control over the formatting of the returned search results. I first showed how the API can return the search results as a snippet of HTML, and then as XML. Finally, I demonstrated how an XSLT stylesheet can be applied to the XML to create completely customized search results. Download OmniFind Yahoo! Edition and try these techniques to enhance your own Web applications.





About the author


Todd Leyba is currently serving as an evangelist for Discovery and Search Analytics in IBM's Information Management Division. He is a key spokesperson, responsible for engaging with customers, partners, and developers to articulate IBM's Discovery and Search strategy. In this role, he is also responsible for incorporating customer and developer feedback, as well as market trends, into IBM's future product direction. Mr. Leyba's expertise lies in the architecture of full-text search and retrieval systems and their application in business. He has previously worked on a variety of search-related projects, including IBM's WebSphere Enterprise Search product (OmniFind), designed to provide superior performance, scale, and result quality with a broad range of data source support.



