
Use IBM OmniFind Yahoo! Edition REST APIs

Level: Introductory

Arthur Choi (achoi@us.ibm.com), Technical Sales Consultant, IBM

19 Apr 2007

Learn how you can easily access a freely downloadable search engine, IBM® OmniFind™ Yahoo! Edition (referred to simply as OmniFind in this article), from your custom applications. OmniFind provides a Representational State Transfer (REST) Web service to expose its search and document push or delete APIs to other applications. Using these APIs, you can write custom search applications that provide your own search pages with a personalized look and feel. You can also write custom crawler applications that push and delete documents from other content repositories in addition to Web and file systems repositories that are currently supported by OmniFind.

Introduction

Like several other search engines, OmniFind exposes its functionality through REST APIs. REST is widely used as a service delivery mechanism for Web-based applications and services, for several reasons.

REST clients require no extra software beyond the socket interfaces and XML parsers that come with most computing platforms. In contrast, SOAP Remote Procedure Call (RPC) clients need special runtime libraries to handle an additional message layer and transports, so REST clients have fewer software dependencies. REST has other benefits as well; for details on the REST architectural style and its benefits, refer to the REST Wikipedia Web site.

The following discussion focuses on how to access each of the services provided by the OmniFind REST APIs, with short sample code segments. The samples were written to complement the "IBM OmniFind Yahoo! Edition Programming Guide and API reference." With the programming guide and the sample code, you can begin writing custom applications immediately. If you haven't already done so, download IBM OmniFind Yahoo! Edition and use the OmniFind REST APIs to write your own custom applications.





Summary of OmniFind REST APIs

The services exposed by the OmniFind APIs include searching the index, adding documents to the index, and deleting documents from the index. The following summary of the OmniFind REST APIs lists, for each service, the end-point URL, the underlying HTTP method, the required parameters, and a brief description.


End point URL | HTTP method | Required parameters | Description
http://host:port/api/search | GET | index=Default, query | Search and return a search result as an Atom feed or an HTML snippet
http://host:port/api/document | POST | index=Default, action=addDocument, docType, docId | Add a document to the default index
http://host:port/api/document | DELETE | index=Default, action=deleteDocument, docId | Delete a document from the default index

The end-point URL for the search API is http://host:port/api/search. The search API uses the HTTP GET method to send queries and to receive a search result, and the search parameters are passed in the HTTP GET URL. The end-point URL for the document API is http://host:port/api/document. The add document API uses the HTTP POST method: the parameters for the add document action are passed as HTTP request headers, and the actual document content is passed as the HTTP POST body. The delete document API uses the HTTP DELETE method, and its parameters are also passed as HTTP request headers.

Note that although the "index" parameter is required for all OmniFind APIs, OmniFind currently supports only one predefined index, Default. Therefore, the value of the "index" parameter should always be "Default". For the document push API, you need to push the document content as the HTTP POST body and set an appropriate content type (a MIME type) using the "docType" parameter.

The docId must uniquely identify a particular document in an index. The docIds are used for document push and delete operations to identify a particular document in the default index. The docId is also used for retrieving the original document after a search. In order to retrieve the original content correctly, the docId must be a valid URL. If the URLs are in the standard formats, which can be directly handled by browsers, you do not have to do anything to acquire the original content. Otherwise, you may need to create a custom document retrieval J2EE Web application and pass the docId to the application so that the original document can be retrieved. Although retrieving the original content is important, it is not discussed further in this article since it is not a part of OmniFind REST APIs.
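One way to catch an unsuitable docId before pushing a document is a small pre-check like the sketch below. It is only an illustration: the class name, the method name, and the list of schemes treated as "directly handled by browsers" are assumptions, not part of the OmniFind API.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of OmniFind: checks whether a docId looks
// like a URL that a browser could open without a custom retrieval application.
class DocIdCheck {

  // Assumption: the schemes a typical browser handles directly.
  private static final Set<String> BROWSER_SCHEMES =
      new HashSet<String>(Arrays.asList("http", "https", "ftp", "file"));

  public static boolean isDirectlyRetrievable(String docId) {
    try {
      URI uri = new URI(docId);
      String scheme = uri.getScheme();
      return scheme != null && BROWSER_SCHEMES.contains(scheme.toLowerCase());
    } catch (URISyntaxException e) {
      // not parseable as a URI at all
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isDirectlyRetrievable("http://example.com/docs/guide.pdf"));
    System.out.println(isDirectlyRetrievable("pdfrepository://oye_api_guide_v84.pdf"));
  }
}
```

A docId that fails this check can still be indexed; it simply signals that a custom document retrieval application would probably be needed to fetch the original.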





Typical usages of the OmniFind REST APIs

Custom search Web applications let you give your search pages a unique look and feel. As shown in Figure 1, a typical custom search Web application accepts query strings from clients and searches the OmniFind index by using the OmniFind search REST API. How to present the search input form and how to display the search results is entirely up to the custom application, which may incorporate AJAX and DHTML to improve the user experience and UI responsiveness. Custom Web-based search applications are typically deployed on a J2EE application server and call the OmniFind REST APIs, which are served by the J2EE application server embedded in the OmniFind server node.


Figure 1. Custom search application

Currently, the data sources that OmniFind supports are Web sites and file systems. To index data from other repositories, you need to write custom crawlers that retrieve documents from those repositories and push them into the OmniFind index. For example, as shown in Figure 2, to add e-mail to the OmniFind index, you write a custom e-mail crawler that extracts messages from e-mail servers and pushes them to the index by using the OmniFind document push API.


Figure 2. Custom crawler
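The overall shape of such a crawler can be sketched as a loop that pulls documents from a repository and hands each one to a pusher. Everything in this sketch (SourceDocument, DocumentSource, DocumentPusher, crawl) is a hypothetical name introduced for illustration; a real pusher would wrap an HTTP POST to the document API, as shown later in this article.

```java
import java.util.List;

// Hypothetical crawler skeleton; none of these types are part of OmniFind.
class CrawlerSketch {

  /** One document pulled from a repository. */
  static final class SourceDocument {
    final String docId;    // should be a URL that can later retrieve the original
    final String docType;  // MIME type of the content
    final byte[] content;

    SourceDocument(String docId, String docType, byte[] content) {
      this.docId = docId;
      this.docType = docType;
      this.content = content;
    }
  }

  /** Reads documents from some repository, for example an e-mail server. */
  interface DocumentSource {
    List<SourceDocument> fetchAll();
  }

  /** Sends one document to the index and reports whether it was accepted. */
  interface DocumentPusher {
    boolean push(SourceDocument doc);
  }

  /** Pushes every document from the source; returns how many were accepted. */
  public static int crawl(DocumentSource source, DocumentPusher pusher) {
    int pushed = 0;
    for (SourceDocument doc : source.fetchAll()) {
      if (pusher.push(doc)) {
        pushed++;
      }
    }
    return pushed;
  }
}
```

Keeping the repository access and the push step behind separate interfaces means the same loop can serve an e-mail crawler, a database crawler, or any other source.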





Use Apache Jakarta Commons HttpClient & ROME libraries

Since the OmniFind REST APIs are based on standard HTTP methods, the only requirement for the programming environment is the ability to issue HTTP GET, POST, and DELETE requests and to consume the corresponding HTTP responses. Virtually all platforms and programming languages support HTTP. However, this article concentrates on the Java™ platform.

The Java platform provides the java.net package, which lets you write a variety of networking applications. You can use the raw socket-level interfaces or the higher-level URLConnection and HttpURLConnection classes from java.net. However, to avoid reinventing the wheel and to simplify the sample code, this article uses the Apache Jakarta Commons HttpClient library to handle HTTP requests and responses. The following excerpt from the Apache HttpClient Web site states the goal of the library clearly.

"Although the java.net package provides basic functionality for accessing resources via HTTP, it doesn't provide the full flexibility or functionality needed by many applications. The Jakarta Commons HttpClient component seeks to fill this void by providing an efficient, up-to-date, and feature-rich package implementing the client side of the most recent HTTP standards and recommendations."

The other open source library used is ROME (RSS and Atom Utilities for Java). As discussed later, the OmniFind search API supports two search response formats: an HTML snippet and the default Atom 1.0 feed format. ROME simplifies the consumption of Atom 1.0 feeds, among several other syndicated feed formats, by mapping the raw XML stream to Java objects that can be easily manipulated within Java programs.

Armed with these open source libraries, the following discussion explores the OmniFind REST APIs with short sample programs.





Use the OmniFind Search API

The OmniFind Search API is accessed by the HTTP GET method. It supports eight search parameters: an index, a query text, a query language, an output format, a start offset, a result language filter, a number of results, and a fully qualified URL to the XSL style sheet that formats the output results. Only the index and the query text are required; the other parameters take default values when unspecified. For details on the supported search request parameters, refer to "IBM OmniFind Yahoo! Edition Programming Guide and API reference."
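As a rough sketch of how the required and optional parameters combine into a request URL, the helper below uses only the parameter names that appear in this article ("index", "query", "results", and "output"). The class and method names are hypothetical, and the names of the remaining parameters are left to the programming guide.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that assembles an OmniFind search URL.
class SearchUrlSketch {

  /**
   * @param endpoint search end point, for example http://localhost/api/search
   * @param query    required query text
   * @param results  optional number of results, or null for the server default
   * @param output   optional output format, or null for the server default
   */
  public static String build(String endpoint, String query, Integer results, String output) {
    try {
      StringBuilder url = new StringBuilder(endpoint);
      // required parameters
      url.append("?index=Default");
      url.append("&query=").append(URLEncoder.encode(query, "UTF-8"));
      // optional parameters are added only when the caller supplies them
      if (results != null) {
        url.append("&results=").append(results.intValue());
      }
      if (output != null) {
        url.append("&output=").append(URLEncoder.encode(output, "UTF-8"));
      }
      return url.toString();
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is always available on the Java platform
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(build("http://localhost/api/search", "web services", 5, "htmlsnippet"));
  }
}
```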

The following program is a simple OmniFind search application that sends a search request and retrieves a search response by utilizing the Apache HttpClient library:


Listing 1. Sample search program
                
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;

/**
 * This is a simple search example against the OmniFind default index.
 * 
 * Steps: (1) construct HttpClient so that we can issue an HTTP GET
 * request later (2) construct search URL with the OmniFind search REST endpoint
 * and any necessary parameters such as index and query parameters (3) construct
 * GetMethod with the search URL (4) execute the GetMethod and check HTTP status
 * (5) retrieve the search result
 */
public class SimpleSearch {
  
  // the REST end point URL for OmniFind search
  final static String targetSearchAPIEndpointURL = "http://localhost/api/search";
  
  public static void main(String[] args) throws UnsupportedEncodingException {
    // query string command line parameter
    String queryString = args[0];
    
    // construct HttpClient so that we issue HTTP GET request
    HttpClient client = new HttpClient();
    
    // construct search URL with the OmniFind search REST endpoint
    // and any necessary parameters such as index and query
    StringBuffer url = new StringBuffer();
    url.append(targetSearchAPIEndpointURL);
    url.append("?index=Default&");
    url.append("query=" + java.net.URLEncoder.encode(queryString, "UTF-8"));
    
    // construct GetMethod with the search URL
    HttpMethod method = new GetMethod(url.toString());
    try {
      // execute the GetMethod and check http response
      String responseBody = null;
      int status = client.executeMethod(method);
      if (status != HttpStatus.SC_OK) {
        System.err.println("SimpleSearch failed: " + method.getStatusLine());
        System.exit(status);
      }
      // retrieve the search result. By default it is in Atom 1.0 format
      responseBody = method.getResponseBodyAsString();
      // write out the response body
      System.out.println(responseBody);
    } catch (HttpException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    } finally {
      // clean up the connection resources
      method.releaseConnection();
    }
  }
}

This program constructs an HttpClient object and an OmniFind search URL. The search URL just contains the required "index" and "query" search parameters. Based on the constructed URL, a GetMethod object is instantiated and the method is executed. After the execution, the GetMethod class provides various ways to extract an HTTP response. In this case, the simple getResponseBodyAsString method is used to obtain the whole response string. However, the GetMethod class provides other methods that return an input stream instead of a string. For large and long responses, using an input stream from the response is recommended. For more details, refer to the Apache HTTP Client v3.1.Beta Javadocs.
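For the stream-based approach, a minimal sketch is to copy the response to its destination in fixed-size chunks so that the whole body never sits in memory at once. The helper below uses only the standard library; the class name, method name, and buffer size are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper: copies a response stream in chunks.
class StreamCopySketch {

  /** Copies everything from in to out and returns the number of bytes copied. */
  public static long copy(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = new byte[8192];
    long total = 0;
    int read;
    // read() returns -1 once the end of the stream is reached
    while ((read = in.read(buffer)) != -1) {
      out.write(buffer, 0, read);
      total += read;
    }
    out.flush();
    return total;
  }
}
```

In the search program above, the call would look like copy(method.getResponseBodyAsStream(), System.out), made before releaseConnection() is called in the finally block.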

The program's output is an XML string in the Atom 1.0 syndicated feed format, because the request did not specify the "output" parameter. OmniFind supports two search result formats, Atom 1.0 and HTML snippet, and the "output" search parameter selects between them. Unless it is specified, the result is in Atom 1.0 format, as shown below.


Listing 2. Sample search results in Atom format

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
	xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
	<title>Search results for query 'Axis' on index Default</title>
	<link
		href="http://localhost:8800/api/search?
		index=Default&amp;results=1&amp;query=Axis"
		rel="self" type="application/atom+xml" />
	<author>
		<name>IBM OmniFind Yahoo! Edition API Web Service</name>
	</author>
	<id>
		http://localhost:8800/api/search?index=Default&amp;
		results=1&amp;query=Axis
	</id>
	<category term="Default" label="Default" />
	<updated>2007-01-07T03:09:48Z</updated>
	<opensearch:totalResults>94</opensearch:totalResults>
	<opensearch:Query role="request" searchTerms="Axis" />
	<opensearch:startIndex>1</opensearch:startIndex>
	<opensearch:itemsPerPage>1</opensearch:itemsPerPage>
	<entry>
             <link
                href="http://www.javaworld.com/askTheExpert.jsp?
                pagename=http:%2F%2Fwww.javaworld.com%2Fjavaworld%2Fjw-02-2006%2F
                jw-0220-axis.html&amp;pagename=%2Fjavaworld%2Fjw-02-2006%2F
                jw-0220-axis.html&amp;pageurl=%2Fjavaworld%2Fjw-02-2006%2F
                jw-0220-axis.html&amp;pubsite=j"
                rel="alternate" type="text/html" 
                hreflang="en" />
             <link
                href="http://192.168.0.100:8800/search/?query=cache::http%3A%2F%2F
                www.javaworld.com%2FaskTheExpert.jsp%3Fpagename%3Dhttp%3A%252F%252F
                www.javaworld.com%252Fjavaworld%252Fjw-02-2006%252Fjw-0220-axis.html%26
                pagename%3D%252Fjavaworld%252Fjw-02-2006%252Fjw-0220-axis.html%26
                pageurl%3D%252Fjavaworld%252Fjw-02-2006%252F
                jw-0220-axis.html%26pubsite%3Dj&amp;output=binary"
                rel="via" type="text/html" 
                hreflang="en" />
             <opensearch:relevance>1.55</opensearch:relevance>
             <title type="html">Feedback Form</title>
             <updated>2007-01-06T14:56:04Z</updated>
             <id>
                http://www.javaworld.com/askTheExpert.jsp?pagename=http:%2F%2F
                www.javaworld.com%2Fjavaworld%2Fjw-02-2006%2F
                jw-0220-axis.html&amp;pagename=%2Fjavaworld%2Fjw-02-2006%2F
                jw-0220-axis.html&amp;pageurl=%2Fjavaworld%2Fjw-02-2006%2F
                jw-0220-axis.html&amp;pubsite=j
             </id>
             <summary type="html">
                &lt;SPAN class=&quot;ellipsis&quot;&gt;...
                &lt;/SPAN&gt;Feedback. Tell us your thoughts on this article
                or the issues raised in it. We&apos;ll cc: the author
                &lt;SPAN class=&quot;ellipsis&quot;&gt;
                ... &lt;/SPAN&gt;
             </summary>
       </entry>
</feed>

To programmatically consume the search response in Atom 1.0, you can use any XML parser, although doing so can be tedious. Fortunately, there is an open source class library, ROME, which facilitates the consumption of Atom 1.0 feeds among many other syndicated feed formats. ROME parses any RSS or Atom feed into a canonical bean interface, SyndFeed, so developers can deal with Java objects instead of a raw XML string.
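For comparison, here is a sketch of the plain XML parser route using the DOM API that ships with the Java platform. It pulls out the total result count and the "alternate" link of each entry, which are elements visible in Listing 2; the class and method names are made up for this example. Even for these two values, the namespace handling and the casts give a feel for why a feed library is more comfortable.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads an Atom search response with the standard DOM API.
class AtomDomSketch {

  static final String ATOM_NS = "http://www.w3.org/2005/Atom";
  static final String OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/";

  public static Document parse(String xml) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // the feed mixes Atom and OpenSearch elements, so namespaces must be on
    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();
    return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
  }

  /** Returns the opensearch:totalResults value, or 0 if the element is absent. */
  public static int totalResults(Document feed) {
    NodeList nodes = feed.getElementsByTagNameNS(OPENSEARCH_NS, "totalResults");
    if (nodes.getLength() == 0) {
      return 0;
    }
    return Integer.parseInt(nodes.item(0).getTextContent().trim());
  }

  /** Returns the href of every rel="alternate" link found inside an entry. */
  public static List<String> alternateLinks(Document feed) {
    List<String> links = new ArrayList<String>();
    NodeList entries = feed.getElementsByTagNameNS(ATOM_NS, "entry");
    for (int i = 0; i < entries.getLength(); i++) {
      Element entry = (Element) entries.item(i);
      NodeList entryLinks = entry.getElementsByTagNameNS(ATOM_NS, "link");
      for (int j = 0; j < entryLinks.getLength(); j++) {
        Element link = (Element) entryLinks.item(j);
        if ("alternate".equals(link.getAttribute("rel"))) {
          links.add(link.getAttribute("href"));
        }
      }
    }
    return links;
  }
}
```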

The following code snippet shows how to consume the search response stream by using ROME. In this case, you are extracting the first "/feed/entry/link" element from the search response stream and displaying it to the console. To do this, you create a SyndFeedInput instance and feed in the search response stream. The SyndFeedInput returns the SyndFeed object instance so that you can access necessary information. For details on OmniFind Atom 1.0 elements, refer to the "IBM OmniFind Yahoo! Edition Programming Guide and API reference."


Listing 3. Sample search with ROME parsing
                
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.Iterator;
import java.util.List;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.FeedException;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

/**
 * This is a simple search example against an OmniFind default index. For this
 * program, we are parsing the search results with ROME library to extract feed
 * entries.
 * 
 * Steps: (1) construct HttpClient so that we can issue an HTTP GET
 * request later (2) construct search URL with the OmniFind search REST endpoint
 * and any necessary parameters such as index and query parameters (3) construct
 * GetMethod with the search URL (4) execute the GetMethod and check HTTP status
 * (5) retrieve the search result (6) convert the search result in ATOM to a
 * SyndFeed using ROME library (7) extract feed entries
 */
public class SimpleSearchWithParsing {
  
  // the REST end point URL for OmniFind search
  final static String targetSearchAPIEndpointURL = "http://localhost/api/search";
  
  public static void main(String[] args) throws UnsupportedEncodingException {
    // query string command line parameter
    String queryString = args[0];
    
    // construct HttpClient so that we issue HTTP GET request
    HttpClient client = new HttpClient();
    
    // construct search URL with the OmniFind search REST endpoint and any
    // necessary parameters such as index and query parameters
    StringBuffer url = new StringBuffer();
    url.append(targetSearchAPIEndpointURL);
    url.append("?index=Default&");
    url.append("results=1&");
    url.append("query=" + java.net.URLEncoder.encode(queryString, "UTF-8"));
    
    // construct GetMethod with the search REST end point URL
    HttpMethod method = new GetMethod(url.toString());
    try {
      // execute the GetMethod and check http status
      int status = client.executeMethod(method);
      if (status != HttpStatus.SC_OK) {
        System.err.println("SimpleSearchWithParsing failed: "
            + method.getStatusLine());
        System.exit(status);
      }
      
      // retrieve the search result
      // convert the search result in ATOM to a SyndFeed using ROME
      // library
      SyndFeedInput input = new SyndFeedInput();
      SyndFeed feed = input.build(new XmlReader(method
          .getResponseBodyAsStream()));
      
      // extract feed entries
      List list = feed.getEntries();
      for (Iterator iter = list.iterator(); iter.hasNext();) {
        SyndEntry entry = (SyndEntry) iter.next();
        System.out.println(entry.getLink());
      }
    } catch (HttpException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    } catch (FeedException e) {
      e.printStackTrace();
    } finally {
      // clean up the connection resources
      method.releaseConnection();
    }
  }
}

As the two programs above show, issuing search requests and consuming responses is straightforward because the APIs are based on standard HTTP methods. The many available open source Java libraries further simplify access to the OmniFind search function.

As briefly mentioned before, OmniFind supports two search response formats. In addition to the default Atom 1.0 feed format, OmniFind can return a search response in HTML snippets. By simply adding the "output" search parameter and setting its value to "htmlsnippet", the search response is returned as an HTML snippet. Listing 4 is an example HTML snippet output. The HTML snippet contains cascading styles and HTML elements that represent the search response list.


Listing 4. Sample search response in HTML snippets
                <link rel="stylesheet" type="text/css" href="styles/nuvo.css">
   <style>
   body { 
  font-family: Verdana; 
  font-size: 12px; 
  background-repeat: no-repeat; 
  background-color: white; 
} 
.background { 
  background-repeat: no-repeat; 
} 
…
.resultFooterHtmlCachedLink { 
  font-family: Arial; 
  color: #8284cc; 
} 
.resultFooterOriginalCachedLink { 
  font-family: Arial; 
  color: #8284cc; 
} 
 
</style>
<div id=yschres>
    <div id=yschcont style="margin-left:0px;">
         <div id=yschpri style="margin-left:5px;">
            <div id=yschrel>
            </div>
            <div id=yschweb>
               <ol start=1>    
	             <li>            
		       <div>
		             <a class="yschttl" style="
		             font-family: Arial; color: #0000de; " 
		             href="http://www.javaworld.com/
		             askTheExpert.jsp?pagename=http:%2F%2F
		             www.javaworld.com%2F
		             javaworld%2Fjw-02-2006%2F
		             jw-0220-axis.html&amp;pagename=%2F
		             javaworld%2Fjw-02-2006%2F
		             jw-0220-axis.html&amp;pageurl=%2F
		             javaworld%2F
		             jw-02-2006%2Fjw-0220-axis.html&amp;pubsite=j">
		             Feedback Form</a>
		             </div>
		       <div class=yschabstr style="font-family: Arial; 
		            color: black; ">               
		             <SPAN class="ellipsis">... </SPAN>
		             Feedback.  Tell us your thoughts on this article or 
		             the issues raised in it. We'll cc: the 
		             author <SPAN class="
		             ellipsis">... </SPAN></div>
		       <em class=yschurl  style="font-family: 
		       Arial; color: #088000; " >
		          http://www.javaworld.com/askTheExpert.jsp?pagename=http:%2F%2F
		          www.javaworld.com%2Fjavaworld%2Fjw-02-2006%2F
		          jw-0220-<SPAN class="
		          highlight"><SPAN class="
		          hlTerm0">axis</SPAN></SPAN>.html&amp;
		          pagename=%2Fjavaworld%2Fjw-02-2006%2Fjw-0220-<SPAN class="
		          highlight"><SPAN class="
		          hlTerm0">axis</SPAN></SPAN>.html&amp;
		          pageurl=%2Fjavaworld%2Fjw-02-2006%2Fjw-0220-<SPAN 
		          class="highlight"><SPAN class="
		          hlTerm0">axis</SPAN></SPAN>.html&amp;
		          pubsite=j </em> 
	                      - 
		 </ol>   
             </div>   
         </div>
      </div>
   </div>

The HTML snippet output format is more suitable for J2EE Web applications that actually render the search result for end users. Using DHTML, the returned HTML snippet can be dynamically inserted into the search result page. For programmatic access, the default Atom feed format is more appropriate.

Related to the Atom feed format, there is another search parameter, "stylesheet". If specified, its value should be a fully qualified URL to the XSL stylesheet that formats the output results. This parameter takes effect only when the value of the "output" search parameter is "atomxml".

Note that OmniFind does not process the XSL stylesheet specified in this parameter. The actual transformation is done by the client application, which can be an XSL-compliant Web browser or a custom XSLT application. For details on XSLT transformation of OmniFind search results, refer to: "Add IBM OmniFind Yahoo! Edition to your Web site" (developerWorks, Dec 2006).
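A custom XSLT client can be sketched with the javax.xml.transform API included in the Java platform. In this sketch both the Atom response and the stylesheet are passed in as strings to keep the example self-contained; a real client would more likely read the response from the search request and load the stylesheet from its URL. The class and method names are assumptions made for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: applies an XSL stylesheet to an Atom response on the client side.
class ClientSideXsltSketch {

  public static String transform(String atomXml, String xsl) throws TransformerException {
    // compile the stylesheet
    Transformer transformer = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(new StringReader(xsl)));
    // run it against the search response and collect the output as a string
    StringWriter out = new StringWriter();
    transformer.transform(new StreamSource(new StringReader(atomXml)), new StreamResult(out));
    return out.toString();
  }
}
```

Because the feed elements live in the Atom namespace, the stylesheet has to bind a prefix to http://www.w3.org/2005/Atom and use it in its XPath expressions.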





Use the document push API to add a document

To add documents, use the OmniFind document API, which, like the search API, is based on standard HTTP methods. Unlike the search API, the document push API is accessed by the HTTP POST method and requires an API password from the OmniFind admin console (see Figure 3). You can get the API password by selecting "Manage System -> Manage Authentication" in the admin console.


Figure 3. API password

The following program is a simple OmniFind document push client that adds a PDF document to the default OmniFind index. It uses the Apache HttpClient library's PostMethod to issue an HTTP POST request. Because the document API requires an API password, the program uses HTTP basic authentication with the API password. The user ID of the basic authentication is ignored, so it can be any value.

The document push API supports seven parameters: an action, an index, a document type (docType), a document content fallback language, the document content known language, a document ID (docId), and the last modified date. Of these, the action, index, docType, and docId parameters are required. Other parameters, if unspecified, take the default values given in the "IBM OmniFind Yahoo! Edition Programming Guide and API reference."
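HttpClient builds the basic authentication header for you, but it can help to see what ends up on the wire: the word "Basic" followed by the Base64 encoding of "userid:password". The sketch below builds that header value with the standard library; the class name, the method name, and the placeholder user ID "notused" are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: builds the value of an HTTP basic authentication header.
class BasicAuthSketch {

  public static String authorizationHeader(String apiPassword) {
    // the user ID part is ignored by the document API, so any placeholder works
    String credentials = "notused:" + apiPassword;
    return "Basic " + Base64.getEncoder()
        .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
  }
}
```

A client written against java.net.HttpURLConnection instead of HttpClient could set this value as the "Authorization" request header.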


Listing 5. Sample document push program
                
import java.io.File;
import java.io.IOException;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.commons.httpclient.methods.FileRequestEntity;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.RequestEntity;

/**
 * This is a document push example.
 * 
 * Steps: (1) construct HttpClient so that we can issue an HTTP POST
 * request later (2) set the credential using OmniFind API password (3)
 * construct a PostMethod and set necessary request headers (4) set the request
 * entity with the file content to be indexed (5) execute the PostMethod and
 * check HTTP status (6) check for an error
 */
public class PushPDFDocument {
  
  // the REST end point URL for document push
  final static String targetDocumentAPIEndpointURL = "http://localhost/api/document";
  
  public static void main(String[] args) {
    
    // construct HttpClient so that we issue HTTP POST request
    HttpClient client = new HttpClient();
    
    // set the credential using OmniFind API password
    client.getState().setCredentials(new AuthScope("localhost", 80),
        new UsernamePasswordCredentials("notused", "+UrSlFg="));
    
    // construct a PostMethod and set necessary request headers
    PostMethod method = new PostMethod(targetDocumentAPIEndpointURL);
    method.setDoAuthentication(true);
    method.setRequestHeader("action", "addDocument");
    method.setRequestHeader("index", "Default");
    method.setRequestHeader("docId", "pdfrepository://oye_api_guide_v84.pdf");
    method.setRequestHeader("docType", "application/pdf");
    
    try {
      // set the request entity with the file content to be indexed
      File input = new File("oye_api_guide_v84.pdf");
      RequestEntity entity = new FileRequestEntity(input, "application/pdf");
      method.setRequestEntity(entity);
      
      // execute the post method and check HTTP status
      int status = client.executeMethod(method);
      if (status != HttpStatus.SC_OK) {
        System.err.println("PushPDFDocument failed: " + method.getStatusLine());
        System.exit(status);
      }
      
      // Note on the response from a document API call. The HTTP response
      // code is always 200, with the exception of 401 for failed
      // authentication. If an error occurs during the document insert
      // process, the HTTP response code is still 200 and the HTTP
      // response header contains a parameter of "hasError". 
      // Therefore, to do the proper handling of the document push result,
      // you must check for the existence of the "hasError" response
      // header or the HTTP 401 response code.
      Header hasErrorHeader = method.getResponseHeader("hasError");
      if (hasErrorHeader == null) {
        System.out.println("document pushed fine.");
      } else {
        String errorResponse = method.getResponseBodyAsString();
        System.out.println("error response: " + errorResponse);
      }
    } catch (HttpException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    } finally {
      // clean up the connection resources
      method.releaseConnection();
    }
  }
}

To add a PDF document, the program creates an HttpClient object and sets the HTTP basic authentication credentials (user ID and password) by calling the setCredentials() method. It then instantiates a PostMethod object and passes the required parameters for the add document action as HTTP request headers. Before executing the PostMethod, the program sets the actual PDF file as the HTTP POST body.

Note that the HTTP response code from a document push API call is always 200, with the exception of 401 for failed authentication. If an error occurs during the document insert process, the HTTP response code is still 200 and the HTTP response header contains a parameter of "hasError". Therefore, to handle the document push result properly, you must check for the existence of the "hasError" response header or the HTTP 401 response code.

The document ID in the previous program was set arbitrarily, which is not good practice because the original document cannot be fetched after a search. If the docId value is not a valid URL, the document is not a clickable result on the search result page. For instance, after running the above program, I searched for the document I had just added. The search page displayed the "pdfrepository://oye_api_guide_v84.pdf" link; because this docId is not a valid URL, clicking it brings up an error page. However, the original document may still be viewed by clicking the Cached link. In your own programs, set the docId to a valid URL so that the original document can be retrieved reliably.
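The decision logic described above can be captured in a small function that takes the HTTP status and whether the "hasError" header was present. The enum and method names are hypothetical; the extra UNEXPECTED_STATUS case is a defensive assumption for status codes the article does not mention.

```java
// Hypothetical helper: interprets the result of a document API call.
class PushResultSketch {

  enum Outcome { INDEXED, AUTH_FAILED, INDEXING_ERROR, UNEXPECTED_STATUS }

  public static Outcome classify(int httpStatus, boolean hasErrorHeaderPresent) {
    if (httpStatus == 401) {
      // wrong or missing API password
      return Outcome.AUTH_FAILED;
    }
    if (httpStatus != 200) {
      // not described by the article; treated separately as a precaution
      return Outcome.UNEXPECTED_STATUS;
    }
    // status 200 alone does not mean success; the hasError header decides
    return hasErrorHeaderPresent ? Outcome.INDEXING_ERROR : Outcome.INDEXED;
  }
}
```

With HttpClient, the second argument would be method.getResponseHeader("hasError") != null, as in Listing 5.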


Figure 4. Unclickable link

Note on adding documents through OmniFind document push APIs: The programming guide says "Documents that are added into the index by the addDocument API cannot be tracked in the Document Status window in the administration console."





Use the document API to delete a document

To delete documents, use the OmniFind document API, which, like the search API, is based on standard HTTP methods. Unlike the search API, the document delete API is accessed by the HTTP DELETE method and requires the API password from the OmniFind admin console, as shown in Figure 3.

The following is a simple program that deletes a PDF document from the Default OmniFind index. As in the search programs, it uses the Apache HttpClient library to issue an HTTP method, this time DeleteMethod instead of GetMethod. Because the document API requires an API password, the program uses HTTP basic authentication with the API password. The user ID of the basic authentication is ignored, so it can be any value.

The document delete API requires three parameters: an action, an index, and a document ID (docId). The value of the action parameter must be "deleteDocument". The value of the index parameter must be "Default". The docId should uniquely identify the document to be deleted from the Default index. If the given docId is not found in the index, it is treated as a warning and the delete API does not return an error.
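Since the add and delete actions differ only in their required headers, one possible way to keep them consistent is to build the header set in a single place and reject incomplete input early. This is a sketch under that assumption; the class and method names are invented for illustration, while the header names and values are the ones described in this article.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the required request headers for the document API.
class DocumentHeadersSketch {

  public static Map<String, String> addDocument(String docId, String docType) {
    Map<String, String> headers = base("addDocument", docId);
    headers.put("docType", requireNonEmpty("docType", docType));
    return headers;
  }

  public static Map<String, String> deleteDocument(String docId) {
    return base("deleteDocument", docId);
  }

  // headers shared by both actions
  private static Map<String, String> base(String action, String docId) {
    Map<String, String> headers = new LinkedHashMap<String, String>();
    headers.put("action", action);
    headers.put("index", "Default"); // the only index currently supported
    headers.put("docId", requireNonEmpty("docId", docId));
    return headers;
  }

  private static String requireNonEmpty(String name, String value) {
    if (value == null || value.length() == 0) {
      throw new IllegalArgumentException(name + " is required");
    }
    return value;
  }
}
```

Each entry of the returned map can then be applied with method.setRequestHeader(name, value) on a PostMethod or DeleteMethod.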


Listing 6. Sample document delete program
                
import java.io.IOException;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.commons.httpclient.methods.DeleteMethod;

/**
 * This is a document delete example.
 * 
 * The steps are as follows: (1) construct an HttpClient so that we can issue
 * an HTTP DELETE request later, (2) set the credentials using the OmniFind API
 * password, (3) construct a DeleteMethod and set the necessary request
 * headers, (4) execute the DeleteMethod and check the HTTP status, and (5)
 * check for an error.
 */

public class DeletePDFDocument {
  
  // the OmniFind REST endpoint URL for the document API
  final static String targetDocumentAPIEndpointURL = "http://localhost/api/document";
  
  public static void main(String[] args) throws HttpException, IOException {
    
    // construct an HttpClient so that we can issue an HTTP DELETE request
    HttpClient client = new HttpClient();
    
    // set the credential using OmniFind API password
    client.getState().setCredentials(new AuthScope("localhost", 80),
        new UsernamePasswordCredentials("notused", "+UrSlFg="));
    
    // construct a DeleteMethod and set necessary request headers
    DeleteMethod method = new DeleteMethod(targetDocumentAPIEndpointURL);
    method.setDoAuthentication(true);
    method.setRequestHeader("action", "deleteDocument");
    method.setRequestHeader("index", "Default");
    // the docId of the document to be deleted
    method.setRequestHeader("docId", "pdfrepository://oye_api_guide_v84.pdf");
    
    try {
      // execute the delete method
      int status = client.executeMethod(method);
      if (status != HttpStatus.SC_OK) {
        System.err.println("DeletePDFDocument failed: "
            + method.getStatusLine());
        System.exit(status);
      }
      
      // Note on the response from a document API call. The HTTP response
      // code is always 200, with the exception of 401 for failed
      // authentication. If an error occurs during the document delete
      // process, the HTTP response code is still 200 and the HTTP
      // response header contains a parameter of "hasError".
      // Therefore, to handle the result of the delete request properly,
      // you must check for the existence of the "hasError" response
      // header or the HTTP 401 response code.
      Header hasErrorHeader = method.getResponseHeader("hasError");
      if (hasErrorHeader == null) {
        System.out.println("document deleted.");
      } else {
        String errorResponse = method.getResponseBodyAsString();
        System.out.println("error response: " + errorResponse);
      }
    } finally {
      // clean up the connection resources
      method.releaseConnection();
    }
  }
}
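The response rules described in the comments of Listing 6 can be collected into one small decision function. The following is a minimal sketch, not code from the sample download: the DocumentApiResult class, the Outcome names, and the classify method are hypothetical and only restate the behavior described above (200 or 401, with a "hasError" response header signaling an error).

```java
// Hypothetical helper for illustration; names are not from the OmniFind API.
class DocumentApiResult {

  enum Outcome { SUCCESS, AUTH_FAILED, API_ERROR, UNEXPECTED_STATUS }

  // httpStatus: the HTTP response code of the document API call.
  // hasErrorHeader: true when the response carries a "hasError" header.
  static Outcome classify(int httpStatus, boolean hasErrorHeader) {
    if (httpStatus == 401) {
      return Outcome.AUTH_FAILED;        // wrong or missing API password
    }
    if (httpStatus != 200) {
      return Outcome.UNEXPECTED_STATUS;  // not expected per the article
    }
    // a 200 response can still report an error through the header
    return hasErrorHeader ? Outcome.API_ERROR : Outcome.SUCCESS;
  }

  public static void main(String[] args) {
    System.out.println(classify(200, false)); // SUCCESS
    System.out.println(classify(200, true));  // API_ERROR
    System.out.println(classify(401, false)); // AUTH_FAILED
  }
}
```

With the HttpClient objects from Listing 6, a call might look like classify(status, method.getResponseHeader("hasError") != null). Keeping the decision separate from the HTTP call makes it easy to reuse for both the add and delete actions.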

Note: A successful delete request does not guarantee that the document immediately stops being searchable. The programming guide says, "The time that is required for the document to be no longer searchable depends on the search server load at the time when the delete request was issued."





Summary

This article presented several simple programs that use the OmniFind search and document APIs. Because the OmniFind APIs are based on standard HTTP methods, and many open source libraries and tools are available, accessing OmniFind functions does not require much effort. The Apache HttpClient library handled the HTTP GET, POST, and DELETE methods. The search results in Atom feed format could be consumed with a couple of lines of code, thanks to the open source ROME library. Download OmniFind Yahoo! Edition and try these techniques to enhance your own custom applications.






Download

Description                       Name           Size   Download method
Sample programs for this article  ofsamples.zip  980KB  HTTP


Resources

Get products and technologies
  • Download Apache HttpClient v3.1 Beta.

  • Download ROME, an open source library that helps you to work in Java with most syndication formats: RSS 0.90, RSS 0.91 Netscape, RSS 0.91 Userland, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom 0.3, and Atom 1.0.

  • Download IBM OmniFind Yahoo! Edition.

  • Build your next development project with IBM trial software, available for download directly from developerWorks.




About the author

Arthur Choi is currently serving as a technical sales consultant for the Content Management & Discovery Center of Excellence in IBM's Information Management Division. He is responsible for providing services to IBM's customers by deploying IBM's enterprise search solutions into their enterprise computing environments. He has previously worked on a variety of enterprise search related projects including Lotus Extended Search and WebSphere Enterprise Search products.



