Distribute search results as RSS feeds using WebSphere Information Integrator OmniFind Edition

Level: Intermediate

Srinivas Varma Chitiveli (schitive@us.ibm.com), Advisory Software Engineer, IBM

19 Apr 2006

WebSphere® Information Integrator OmniFind™ Edition is a full-text enterprise search product offered by IBM®, designed to provide superior performance, scale, and result quality with a broad range of data source support. Really Simple Syndication (RSS) feeds are catching on as one of the techniques most widely used by enterprises to distribute information to their employees. Explore how you can combine RSS with the reach of WebSphere Information Integrator OmniFind Edition to distribute search results as RSS feeds.

Introduction

RSS feeds are based on a standardized XML format for syndicating information. Using the RSS format, you can create a data feed that supplies headlines, links, and article summaries from your Web sites. Other sites can incorporate your information into their pages automatically. You can also use RSS feeds from other sites to provide your site with current news headlines or articles containing terms that interest you. These techniques let you draw more visitors to your site and also provide them with up-to-date information.

Employees produce an abundance of documents in their everyday work: customer e-mails, business reports, product manuals, in all shapes and sizes, and stored on various content back ends. To make sure that employees can use this information for making business decisions, companies often provide their employees integrated access to this information using text search. This is where WebSphere Information Integrator OmniFind Edition (referred to as OmniFind in the rest of this article) comes into play. IBM's search offering is designed to support enterprise-scale search requirements over a wide variety of content sources such as Domino® databases, relational databases, portal content, file systems, Web content, and enterprise content systems. OmniFind provides extensive capabilities for searching diverse collections of business information from a single point of access, delivering highly relevant search results within sub-second response time while scaling to millions of documents and thousands of users.

RSS feeds are widely embraced by Web sites that maintain constantly changing information (like news, blogs, stocks) to publish the latest information. Users configure RSS readers or create links on their Web sites or create RSS-based portlets to access Web URLs or physical files that represent the feeds. Instead of flooding users with information, the feeds are small chunks of very significant data. Users access the feeds when they reload or refresh the RSS readers or Web sites or portlet pages.

Configuring a search engine that indexes newsgroups, archives, or blogs to generate results as RSS feeds can provide relevant information to a large community of feed readers. While browsing feeds from blogs or news Web sites, users can continue to use their Internet clients (RSS readers, Web sites, or portal pages) to search for specific topics in the enterprise.

When a URL with a search term is configured in an RSS-based client, users can access search results over data that is indexed periodically. The client has to be refreshed or reloaded to get the latest results.

This article presents a J2EE™-based application that uses OmniFind to search and return results as RSS feeds. The returned feeds are RSS 2.0 compliant. The remaining sections describe how to install, configure, and use this application.

Following is a high-level diagram of the search components and clients that access the results:


Figure 1. Components for searching

Sample results and Web clients

Before delving into the details, here are possible use cases for search results and RSS feeds.

Web browsers

You can create a link on your Web site to access the RSS results on a specific topic. The topic could be mapped to some key terms on the page. This effort requires minor customization of your JavaServer™ page(s). The link on the Web site can be identified by a standard image and should represent a fully populated URL to the search server. Once the link is clicked, the search results are displayed on a new page. See Figures 2 and 3, below.


Figure 2. Sample Web page with link

Figure 3. Sample results page

RSS readers

To aggregate feeds from a number of sites, there are a number of RSS readers available, both commercially and for free. These readers let you specify URLs that return RSS feeds. The URLs are accessed when the readers are started or reloaded. Refer to Figures 4 and 5, below, where I have used an RSS reader to configure a URL that searches on a topic and displays the results.


Figure 4. Configure feed reader

Figure 5. Results rendered in the reader
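To make the reader side more concrete, here is a minimal sketch of what a client might do with a returned feed: parse the RSS 2.0 XML and list the title of each item. The class name FeedItemLister and the sample feed in main are hypothetical and are not taken from the sample application. A real reader would also fetch the XML from the configured URL instead of using an in-memory string.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FeedItemLister {

    /** Returns the <title> text of every <item> in an RSS 2.0 document, in order. */
    public static List<String> itemTitles(String rssXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
            List<String> titles = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                // Look for <title> only inside the item, so the channel title is skipped.
                NodeList titleNodes = ((Element) items.item(i)).getElementsByTagName("title");
                if (titleNodes.getLength() > 0) {
                    titles.add(titleNodes.item(0).getTextContent().trim());
                }
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a parseable RSS document", e);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample feed for illustration; not actual output of the application.
        String sample =
              "<rss version=\"2.0\"><channel>"
            + "<title>Search results</title>"
            + "<link>http://example.com/</link>"
            + "<description>Sample channel</description>"
            + "<item><title>First result</title><link>http://example.com/1</link></item>"
            + "<item><title>Second result</title><link>http://example.com/2</link></item>"
            + "</channel></rss>";
        for (String title : itemTitles(sample)) {
            System.out.println(title);
        }
    }
}
```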


Prerequisites

You need the following in order to use the application discussed in this article:

  • WebSphere Information Integrator OmniFind Edition (Version 8.3) should be installed on a dedicated server.
  • Configure WebSphere Information Integrator OmniFind Edition to index the enterprise sources related to Web sites or news groups or Domino databases.
  • Users can use RSS readers or create a link on their Web sites to access the feed or configure RSS feed portlets.


Install the J2EE application

Follow these steps to install the application discussed in this article:

  • Download the attached ESRSSResults.ear file. This is a standard J2EE enterprise application archive.
  • Use the WebSphere administration console to deploy the downloaded application onto your WebSphere Application Server, preferably on the "server1" instance of WebSphere Application Server.


Configuring the sample application

ESRSSResults.ear is a simple servlet-based J2EE application that invokes the implemented Search and Indexing API (SIAPI) interface to search Enterprise content indexed by OmniFind. The results are returned to the clients as RSS feeds. The .ear file is merely a sample to demonstrate search results as RSS feeds.

To learn more about SIAPI, see the Resources section, where I have identified useful links to other developerWorks articles.

Before publishing this application for use, the WebSphere administrator needs to modify default settings in a configuration file called config.properties. This file is deployed in the <WebSphere install directory>/ESRSSResults.ear/ESRSSResults.war/WEB-INF directory.

Update the following entries to access the searchable content hosted by OmniFind:

  • hostname=<Search server node for WebSphere Information Integrator OmniFind Edition>
  • port=<HTTP server port number>

If global security is enabled on the OmniFind search node, specify the following credentials so that the search requests are not prompted for user credentials:

  • username=<Valid user on the WebSphere Information Integrator OmniFind Edition search node>
  • password=<Associated password>
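As a rough illustration of how these four entries might be consumed, the following sketch loads them with java.util.Properties and applies basic validation. Only the key names (hostname, port, username, password) come from the list above. The class name SearchConfig, the validation rules, and the sample values are assumptions; the sample application's actual loading code may differ.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class SearchConfig {
    public final String hostname;
    public final int port;
    public final String username; // null when global security is not enabled
    public final String password; // null when global security is not enabled

    private SearchConfig(String hostname, int port, String username, String password) {
        this.hostname = hostname;
        this.port = port;
        this.username = username;
        this.password = password;
    }

    /** Parses the text of a properties file and validates the mandatory entries. */
    public static SearchConfig parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new IllegalArgumentException("Unreadable configuration", e);
        }
        String host = props.getProperty("hostname", "").trim();
        if (host.isEmpty()) {
            throw new IllegalArgumentException("hostname is required");
        }
        int port;
        try {
            port = Integer.parseInt(props.getProperty("port", "").trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("port must be a number", e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        // Credentials are optional; they are only needed when global security is enabled.
        return new SearchConfig(host, port,
                props.getProperty("username"), props.getProperty("password"));
    }

    public static void main(String[] args) {
        // Placeholder values for illustration only.
        SearchConfig config = parse("hostname=search.example.com\nport=80\n");
        System.out.println(config.hostname + ":" + config.port);
    }
}
```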

These are just a few basic configuration parameters to get started. The configuration file contains several fields that you can set to customize the content populated in the RSS feed, such as the title or description of your channel. You can also add more channel extensions honored by the RSS 2.0 specification that suit your organization, or identify URLs that provide information when the search site is under maintenance. Refer to config.properties, where the field names are self-explanatory.

RSS 2.0 specifies at least 18 possible channel elements, of which only three are mandatory (<title/>, <link/>, and <description/>). Based on your enterprise needs, the config.properties file provides a key for you to specify more channel elements (<webMaster/>, <textInput/>, and <skipDays/> are a few possible elements).
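For reference, the following sketch builds the smallest channel that satisfies the three mandatory elements. The class name MinimalRssChannel and the sample values are hypothetical. This is not the code used by ESRSSResults.ear; it only illustrates the shape of an RSS 2.0 document.

```java
public class MinimalRssChannel {

    /** Escapes the characters that are not allowed in XML element content. */
    public static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Builds an RSS 2.0 document containing only the three mandatory channel elements. */
    public static String channel(String title, String link, String description) {
        return "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
             + "<rss version=\"2.0\">\n"
             + "  <channel>\n"
             + "    <title>" + escape(title) + "</title>\n"
             + "    <link>" + escape(link) + "</link>\n"
             + "    <description>" + escape(description) + "</description>\n"
             + "  </channel>\n"
             + "</rss>\n";
    }

    public static void main(String[] args) {
        // Sample values for illustration only.
        System.out.print(channel("Enterprise search results",
                "http://search.example.com/", "Results for: IBM & Domino"));
    }
}
```

Optional channel elements and the individual <item/> entries for each search result would be added inside <channel/> in the same way.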



Sample searches

Once you have successfully configured the application, users can update their personal RSS feed readers with URLs that execute searches with specific query terms and return results compatible with their readers. In this section, I have identified a set of URL samples that can help your users create a URL of their preference.

In the samples that follow, I am assuming <yourserver> is the name of the server that will host the ESRSSResults.ear application.

Search for single term, say "IBM"

  • http://<yourserver>/ESRSSResults/rss.do?queryString=IBM

The queryString parameter is mandatory and identifies the query term. Multiple query terms must be URL-encoded so that they are processed safely.

Search for multiple terms, say "IBM Domino"

Since "space" is a special character, the following sample safely encodes the space in the URL request:

  • http://<yourserver>/ESRSSResults/rss.do?queryString=IBM%20Domino

Note: "%20" represents the encoded value for "space."

Refer to the Resources section, where I have identified a document that helps to understand special encoding for the URLs. Also in the resources, refer to the application programmers' guide that describes acceptable formats of the query strings.

Search for multiple terms, say "+IBM -Domino +WebSphere"

The following sample safely encodes the special characters (plus, minus):

  • http://<yourserver>/ESRSSResults/rss.do?queryString=%2BIBM%20-Domino%20%2BWebSphere

Note: "%2B" is the encoded value for "+"; "-" does not need any encoding.
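If you generate these URLs programmatically, the encoding rules in the notes above can be sketched as follows. The class and method names are hypothetical. java.net.URLEncoder encodes a space as "+", so the sketch replaces it with "%20" to match the samples in this article; any literal "+" in the query has already become "%2B" by that point, so the replacement is safe.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class QueryStringEncoder {

    /** Percent-encodes a query so that spaces become %20 and "+" becomes %2B. */
    public static String encode(String query) {
        // URLEncoder uses form encoding: space -> "+", "+" -> "%2B", "-" is left alone.
        return URLEncoder.encode(query, StandardCharsets.UTF_8).replace("+", "%20");
    }

    /** Builds a search URL for the given server name and raw query. */
    public static String searchUrl(String server, String query) {
        return "http://" + server + "/ESRSSResults/rss.do?queryString=" + encode(query);
    }

    public static void main(String[] args) {
        // "yourserver" is a placeholder for the host that runs ESRSSResults.ear.
        System.out.println(searchUrl("yourserver", "IBM Domino"));
        System.out.println(searchUrl("yourserver", "+IBM -Domino +WebSphere"));
    }
}
```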

Search with security tokens

OmniFind can be configured to associate documents with security tokens or access control lists. Users have to specify their security tokens or assigned access controls (for example, group names) to retrieve the documents that they are allowed to view. The following sample searches documents with security tokens or access controls:

  • http://<yourserver>/ESRSSResults/rss.do?queryString=business%20growth%20rate&acl=staff&acl=userid

The acl parameter is optional. If acl is not specified, only documents marked for public access are returned. If multiple acl values are specified, they are OR'ed for maximum results.

Because RSS is a standard format for distributing data to a general audience, make sure the content indexed by OmniFind is not confidential. Also, letting users specify arbitrary groups using this option can lead to security breaches. It is advisable to implement a filter that queries an enterprise user repository (LDAP) to determine the groups for the logged-in user. The discovered groups could then be appended as acl arguments to the search URL.
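One possible shape for the rewriting step of such a filter is sketched below. It drops any acl arguments supplied by the client and appends only the groups obtained from a trusted source. The LDAP lookup itself is out of scope here, so the groups are passed in as a list. The class name AclUrlRewriter and its method are hypothetical and are not part of the sample application.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class AclUrlRewriter {

    /**
     * Removes client-supplied acl arguments from the URL and appends one acl
     * argument per trusted group (for example, groups looked up in LDAP).
     */
    public static String withTrustedAcls(String url, List<String> trustedGroups) {
        int q = url.indexOf('?');
        String base = q < 0 ? url : url.substring(0, q);
        String query = q < 0 ? "" : url.substring(q + 1);

        List<String> params = new ArrayList<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            if (!name.equals("acl")) {
                params.add(pair); // keep every argument except user-supplied acl values
            }
        }
        for (String group : trustedGroups) {
            params.add("acl=" + URLEncoder.encode(group, StandardCharsets.UTF_8).replace("+", "%20"));
        }
        return params.isEmpty() ? base : base + "?" + String.join("&", params);
    }

    public static void main(String[] args) {
        // The client tried to claim membership in "admins"; only the trusted groups survive.
        String requested = "http://yourserver/ESRSSResults/rss.do?queryString=business&acl=admins";
        System.out.println(withTrustedAcls(requested, Arrays.asList("staff", "jdoe")));
    }
}
```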

Search a specific application

All the above examples return results from the default application that is specified in config.properties. Searchable collections are isolated by application names. If the OmniFind administrators configure an application called "Finance" with collections that contain your company's revenue statistics, and an application called "QuickHelp" with collections that contain documents about customer problems, you can specify the appropriate application name to get relevant information. Contact your administrator to get a list of searchable application names.

In the following sample, search an application titled "Finance" for "business growth +programs":

  • http://<yourserver>/ESRSSResults/rss.do?queryString=business%20growth%20%2Bprograms&appID=Finance

The appID parameter is optional. If appID is not specified, the default value specified in config.properties will be used.

Again, system administrators should be cautious about indexing confidential data and should segregate it under a separate application ID.

Search specific collections

The applications mentioned in the above section are made up of a group of collections. The sample queries described in the previous sections are based on application names. If the URL contains an application name but no collection ID, the query is submitted to all the collections and the results are federated. To make the search results more granular, you can also specify the collections of interest. In such a case, the results are not federated. The collections are represented by unique IDs. Contact your administrator to get a list of collection IDs associated with an application.

In the following sample, search the collections with IDs "col_2134" and "col_9876", associated with an application titled "Finance", for "business growth +programs":

  • http://<yourserver>/ESRSSResults/rss.do?queryString=business%20growth%20%2Bprograms&appID=Finance&colid=col_2134&colid=col_9876

The colid parameter is optional. If colid is not specified, the search query is invoked across all the collections mapped to the specified application. The colid parameter can be repeated to specify multiple collection IDs.
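The optional appID and colid arguments can be assembled with a small helper such as the following sketch. The parameter names queryString, appID, and colid come from the samples above; the class SearchUrlBuilder and its methods are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SearchUrlBuilder {
    private final StringBuilder url;

    /** Starts a search URL with the mandatory queryString argument. */
    public SearchUrlBuilder(String server, String query) {
        url = new StringBuilder("http://").append(server)
                .append("/ESRSSResults/rss.do?queryString=").append(encode(query));
    }

    /** Restricts the search to one application; omit to use the configured default. */
    public SearchUrlBuilder appId(String applicationName) {
        return add("appID", applicationName);
    }

    /** Adds one collection ID; call repeatedly to search several collections. */
    public SearchUrlBuilder collection(String collectionId) {
        return add("colid", collectionId);
    }

    public String build() {
        return url.toString();
    }

    private SearchUrlBuilder add(String name, String value) {
        url.append('&').append(name).append('=').append(encode(value));
        return this;
    }

    private static String encode(String value) {
        // URLEncoder emits "+" for a space; the samples in this article use %20.
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        // "yourserver" is a placeholder for the host that runs ESRSSResults.ear.
        System.out.println(new SearchUrlBuilder("yourserver", "business growth +programs")
                .appId("Finance")
                .collection("col_2134")
                .collection("col_9876")
                .build());
    }
}
```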

Sorting the results

By default, the search results are sorted by relevance. You can also configure the URL to sort the results by date.

In the following sample, sort the results by date:

  • http://<yourserver>/ESRSSResults/rss.do?queryString=business&sortKey=date

The sortKey parameter is optional. The possible values are relevance and date.

By default, search results are displayed in descending order. The results can also be presented in ascending order.

In the following sample, display the results in ascending order:

  • http://<yourserver>/ESRSSResults/rss.do?queryString=business&sortOrder=ascending&sortKey=date

The sortOrder parameter is optional. The possible values are ascending and descending. The search results are always in descending order when sortKey is set to relevance.
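The interaction between the two sorting parameters can be summarized in code. The sketch below resolves a pair of request values into the effective sort settings according to the rules stated above. How the sample application treats missing or unrecognized values is not documented here, so falling back to the defaults and ignoring letter case are assumptions, as is the class name SortOptions.

```java
public class SortOptions {
    public final String key;   // "relevance" or "date"
    public final String order; // "ascending" or "descending"

    private SortOptions(String key, String order) {
        this.key = key;
        this.order = order;
    }

    /**
     * Resolves the sortKey and sortOrder request values (either may be null).
     * Defaults: sort by relevance, in descending order.
     */
    public static SortOptions from(String sortKey, String sortOrder) {
        // Assumption: anything other than "date" falls back to the default key.
        String key = "date".equalsIgnoreCase(sortKey) ? "date" : "relevance";
        // Relevance sorting is always descending; ascending only applies to date.
        boolean ascending = key.equals("date") && "ascending".equalsIgnoreCase(sortOrder);
        return new SortOptions(key, ascending ? "ascending" : "descending");
    }

    public static void main(String[] args) {
        SortOptions byDate = from("date", "ascending");
        System.out.println(byDate.key + " " + byDate.order);
        SortOptions byRelevance = from("relevance", "ascending");
        System.out.println(byRelevance.key + " " + byRelevance.order);
    }
}
```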

When the changes take effect

If the administrator modifies the config.properties file, the changes take effect when the next search request is invoked with "refresh=true".

In the following sample, invoke the J2EE application to refresh its configuration with customized changes.

  • http://<yourserver>/ESRSSResults/rss.do?refresh=true

Note: This is an administration task.



Summary

The application described in this article demonstrates how the text search capabilities of WebSphere Information Integrator OmniFind Edition can address the need to distribute enterprise information to user communities who rely on RSS feeds for the latest information from intranet and Internet sites. A user who has configured an RSS feed reader to access the latest information from a plethora of Web sites can also add a feed (a URL to this application) to access public content in the enterprise that changes periodically.




Download

Description                      Name               Size      Download method
Enterprise application archive   ESRSSResults.ear   4917KB    HTTP


Resources



About the author


Srinivas Varma Chitiveli is a software engineer in the IBM software group. He has been involved with IBM products that deal with technologies related to issuing digital certificates for secure e-business transactions, content management, and searching information across distributed data sources.



