Level: Introductory
Arthur Choi (achoi@us.ibm.com), Technical Sales Consultant, IBM
19 Apr 2007
Learn how to access IBM® OmniFind™ Yahoo! Edition (referred to simply as OmniFind in this article), a freely downloadable search engine, from your custom applications. OmniFind provides a Representational State Transfer (REST) Web service that exposes its search, document push, and document delete APIs to other applications. Using these APIs, you can write custom search applications that give your search pages their own look and feel. You can also write custom crawler applications that push documents to, and delete documents from, the OmniFind index for content repositories beyond the Web sites and file systems that OmniFind currently supports.
Introduction
Like several other search engines, OmniFind exposes its functionality through REST APIs. REST is widely used as a service delivery mechanism for Web-based applications and services, for several reasons.
REST clients need no extra software beyond the socket interfaces and XML parsers that come with most computing platforms. In contrast, SOAP Remote Procedure Call (RPC) clients must install special run-time libraries to handle an additional message layer and transports. REST clients therefore have fewer software dependencies, and REST offers other benefits as well.
For details on the REST architectural style and its benefits, refer to the REST Wikipedia Web site.
The following discussion focuses on how to access each of the services provided by the OmniFind REST APIs, using short sample code segments. The samples in this article were written to complement the "IBM OmniFind Yahoo! Edition Programming Guide and API reference." Together with the programming guide and its sample code, they let you begin writing custom applications right away. If you haven't already done so, download IBM OmniFind Yahoo! Edition and use the OmniFind REST APIs to write your own custom applications.
Summary of OmniFind REST APIs
The services exposed by the OmniFind APIs include searching the index, adding documents to the index, and deleting documents from the index. The following table summarizes the OmniFind REST APIs: the service end point URL, the underlying HTTP method, the required parameters, and a brief description of each service.
| End point URL | HTTP method | Required parameters | Description |
|---|---|---|---|
| http://host:port/api/search | GET | index=Default, query | Search and return the result as an Atom feed or an HTML snippet |
| http://host:port/api/document | POST | index=Default, action=addDocument, docType, docId | Add a document to the default index |
| http://host:port/api/document | DELETE | index=Default, action=deleteDocument, docId | Delete a document from the default index |
The end-point URL for the search API is http://host:port/api/search.
The search API uses the HTTP GET method to send queries and to receive a
search result. Search parameters are passed in the query string of the HTTP GET URL.
The end-point URL for the document API is http://host:port/api/document.
The add document API uses the HTTP POST method. Various parameters for
the add document action are passed as HTTP request headers and the actual
document content is passed as the HTTP POST body. The delete document
API uses the HTTP DELETE method. Various parameters for the delete
document action are passed as HTTP request headers.
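If you call these APIs from several places in one application, it may help to capture the table in code. The following enum is a minimal sketch, not part of any OmniFind library: the names OmniFindOperation, endpoint(), and paramsInHeaders are hypothetical, and the values simply restate the table and the two paragraphs above.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical model of the three OmniFind REST operations summarized in the table. */
enum OmniFindOperation {
    SEARCH("/api/search", "GET", null, false, "index", "query"),
    ADD_DOCUMENT("/api/document", "POST", "addDocument", true,
            "index", "action", "docType", "docId"),
    DELETE_DOCUMENT("/api/document", "DELETE", "deleteDocument", true,
            "index", "action", "docId");

    final String path;
    final String httpMethod;
    final String action;            // value of the "action" parameter, or null for search
    final boolean paramsInHeaders;  // true: HTTP request headers; false: URL query string
    final List<String> requiredParams;

    OmniFindOperation(String path, String httpMethod, String action,
            boolean paramsInHeaders, String... requiredParams) {
        this.path = path;
        this.httpMethod = httpMethod;
        this.action = action;
        this.paramsInHeaders = paramsInHeaders;
        this.requiredParams = Collections.unmodifiableList(Arrays.asList(requiredParams));
    }

    /** Builds the end point URL for a given OmniFind host and port. */
    String endpoint(String host, int port) {
        return "http://" + host + ":" + port + path;
    }
}
```

For example, OmniFindOperation.SEARCH.endpoint("localhost", 8800) yields http://localhost:8800/api/search.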
Note that although every OmniFind API requires the "index" parameter,
OmniFind currently supports only one predefined index, Default. Therefore,
the value of the "index" parameter should always be "Default".
For the document push API, you send the document content as the HTTP POST
body and declare its content type (a MIME type such as application/pdf) in
the "docType" parameter.
The docId must uniquely identify a particular document in an index. Document
push and delete operations use the docId to identify a particular document in
the default index, and the docId is also used to retrieve the original document
after a search. To retrieve the original content correctly, the docId must be a
valid URL. If the URL is in a standard format that browsers can handle directly,
you do not need to do anything else to retrieve the original content.
Otherwise, you may need to create a custom document retrieval J2EE Web
application and pass the docId to it so that the original document can be
retrieved. Although retrieving the original content is important, it is not
discussed further in this article because it is not part of the OmniFind REST APIs.
Typical usages of the OmniFind REST APIs
Custom search Web applications are written to give your search pages a unique
look and feel. As shown in Figure 1, a typical custom search Web application
accepts query strings from clients and searches the OmniFind index by using the
OmniFind search REST API. How to present the search input forms and how to
display the search results is entirely up to the custom application, which may
also use AJAX and DHTML to improve the user experience and UI responsiveness.
Custom Web-based search applications are typically deployed to a J2EE
application server. They call the OmniFind REST APIs, which are served by a
J2EE application server embedded in the OmniFind server node.
Figure 1. Custom search application
Currently, OmniFind can crawl Web sites and file systems. To index data from
other repositories, you need to write custom crawlers that retrieve documents
from those repositories and push them into the OmniFind index. For example, as
shown in Figure 2, to add e-mail to the OmniFind index, you need to write a
custom e-mail crawler that extracts messages from e-mail servers and pushes
them to the OmniFind index by using the OmniFind document push API.
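The crawler in Figure 2 boils down to a loop: read documents from the source repository and push each one. The sketch below shows only that structure, under stated assumptions: CrawledDocument, DocumentSource, DocumentPusher, and CustomCrawler are hypothetical names, and the actual push call (for example, HttpClient code like Listing 5) would live inside a DocumentPusher implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical: one item extracted from a repository that OmniFind cannot crawl itself. */
final class CrawledDocument {
    final String docId;   // should be a URL from which the original can be retrieved
    final String docType; // MIME type, sent as the docType parameter
    final byte[] content; // sent as the HTTP POST body

    CrawledDocument(String docId, String docType, byte[] content) {
        this.docId = docId;
        this.docType = docType;
        this.content = content;
    }
}

/** Hypothetical source of documents, for example a reader for an e-mail server. */
interface DocumentSource {
    Iterator<CrawledDocument> documents();
}

/** Hypothetical wrapper around one document push API call; returns true on success. */
interface DocumentPusher {
    boolean push(CrawledDocument doc);
}

final class CustomCrawler {
    private final DocumentSource source;
    private final DocumentPusher pusher;

    CustomCrawler(DocumentSource source, DocumentPusher pusher) {
        this.source = source;
        this.pusher = pusher;
    }

    /** Pushes every document and returns the docIds that failed, so they can be retried. */
    List<String> crawl() {
        List<String> failed = new ArrayList<>();
        Iterator<CrawledDocument> it = source.documents();
        while (it.hasNext()) {
            CrawledDocument doc = it.next();
            if (!pusher.push(doc)) {
                failed.add(doc.docId);
            }
        }
        return failed;
    }
}
```

Separating the source from the pusher lets you test the crawl loop without a running OmniFind server and reuse the pusher for other repositories.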
Figure 2. Custom crawler
Use Apache Jakarta Commons HttpClient & ROME libraries
Because the OmniFind REST APIs are based on standard HTTP methods, the only
requirement for the programming environment is the ability to issue HTTP GET,
POST, and DELETE requests and to consume the corresponding HTTP responses.
Virtually every platform and programming language supports HTTP; this article,
however, concentrates on the Java™ platform.
The Java platform provides the java.net package for writing a variety of
networking applications. You can use its raw socket interfaces or the
higher-level URLConnection and HttpURLConnection classes. However, to avoid
reinventing the wheel and to keep the sample code simple, the examples in this
article use the Apache Jakarta Commons HttpClient library to handle HTTP
requests and responses. The following excerpt from the Apache HttpClient Web
site clearly states the goal of the Jakarta Commons HttpClient library.
"Although the java.net package provides basic functionality for
accessing resources via HTTP, it doesn't
provide the full flexibility or functionality needed by many applications.
The Jakarta Commons HttpClient component seeks to fill this void by providing
an efficient, up-to-date, and feature-rich package implementing the client
side of the most recent HTTP standards and recommendations."
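For comparison, a basic search can be written with only the java.net classes mentioned above. The following is a minimal sketch, assuming OmniFind listens at http://localhost/api/search as in Listing 1 (adjust host and port for your installation); JdkSearch and its methods are hypothetical names. It reads the response as a stream rather than asking for a single string, which suits large responses.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class JdkSearch {
    /** Builds the search URL with only the required index and query parameters. */
    static String buildSearchUrl(String endpoint, String query) throws IOException {
        // URLEncoder.encode throws UnsupportedEncodingException, a subclass of IOException
        return endpoint + "?index=Default&query=" + URLEncoder.encode(query, "UTF-8");
    }

    /** Reads a stream to the end; assumes UTF-8, the encoding declared in the Atom feed. */
    static String readBody(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Issues the HTTP GET and returns the response body (Atom 1.0 by default). */
    static String search(String endpoint, String query) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(buildSearchUrl(endpoint, query)).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("search failed: HTTP " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                return readBody(in);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java JdkSearch <query>");
            return;
        }
        System.out.println(search("http://localhost/api/search", args[0]));
    }
}
```

This works, but you handle encoding, connection cleanup, and status checks yourself, which is exactly the bookkeeping HttpClient takes over in the listings that follow.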
The other open source library used in this article is ROME
(RSS and Atom Utilities for Java).
As discussed later, the OmniFind search API supports two search response
formats: an HTML snippet and the default Atom 1.0 feed format. ROME simplifies
the consumption of Atom 1.0 and several other syndication feed formats by
mapping the raw XML stream to Java objects that can be easily manipulated
within Java programs.
With these open source libraries in hand, the following sections explore the OmniFind REST APIs through short sample programs.
Use the OmniFind Search API
The OmniFind search API is accessed with the HTTP GET method. It supports eight
search parameters: an index, a query text, a query language, an output format,
a start offset, a result language filter, a number of results, and a fully
qualified URL to an XSL style sheet that formats the output results. Only the
index and the query text are required; unspecified parameters take default
values. For details on the supported search request parameters, refer to the "IBM OmniFind Yahoo! Edition Programming Guide and API reference."
The following program is a simple OmniFind search application that sends a search
request and retrieves a search response by utilizing the Apache HttpClient library:
Listing 1. Sample search program
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
/**
* This is a simple search example against the OmniFind default index.
*
* Steps are as follows: (1) construct an HttpClient so that we can issue an HTTP
* GET request later; (2) construct the search URL from the OmniFind search REST
* endpoint and the necessary parameters, such as index and query; (3) construct
* a GetMethod with the search URL; (4) execute the GetMethod and check the HTTP
* status; (5) retrieve the search result.
*/
public class SimpleSearch {
// the REST end point URL for OmniFind search (adjust host and port for your installation)
final static String targetSearchAPIEndpointURL = "http://localhost/api/search";
public static void main(String[] args) throws UnsupportedEncodingException {
// query string command line parameter
if (args.length < 1) {
System.err.println("usage: java SimpleSearch <query>");
return;
}
String queryString = args[0];
// construct HttpClient so that we issue HTTP GET request
HttpClient client = new HttpClient();
// construct search URL with the OmniFind search REST endpoint
// and any necessary parameters such as index and query
StringBuffer url = new StringBuffer();
url.append(targetSearchAPIEndpointURL);
url.append("?index=Default&");
url.append("query=" + java.net.URLEncoder.encode(queryString, "UTF-8"));
// construct GetMethod with the search URL
HttpMethod method = new GetMethod(url.toString());
try {
// execute the GetMethod and check http response
String responseBody = null;
int status = client.executeMethod(method);
if (status != HttpStatus.SC_OK) {
System.err.println("SimpleSearch failed: " + method.getStatusLine());
System.exit(status);
}
// retrieve the search result. By default it is in Atom 1.0 format
responseBody = method.getResponseBodyAsString();
// write out the response body
System.out.println(responseBody);
} catch (HttpException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
// clean up the connection resources
method.releaseConnection();
}
}
}
This program constructs an HttpClient object and an OmniFind search URL. The search
URL just contains the required "index" and "query" search parameters.
Based on the constructed URL, a GetMethod object is instantiated and executed.
After execution, the GetMethod object provides several ways to read the HTTP
response. This program uses the simple getResponseBodyAsString method to obtain
the whole response as a string. However, GetMethod also provides
getResponseBodyAsStream, which returns an input stream instead of a string; for
large responses, reading from the stream is recommended. For more details, refer
to the Apache HttpClient v3.1 Beta Javadocs.
The program outputs an XML string in the Atom 1.0 syndication feed format,
because the "output" parameter was not specified. OmniFind supports two search
result formats, Atom 1.0 and HTML snippet, selected by the "output" search
parameter. Unless otherwise specified, results are returned in Atom 1.0 format,
as shown in Listing 2.
Listing 2. Sample search results in Atom format
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
<title>Search results for query 'Axis' on index Default</title>
<link
href="http://localhost:8800/api/search?
index=Default&results=1&query=Axis"
rel="self" type="application/atom+xml" />
<author>
<name>IBM OmniFind Yahoo! Edition API Web Service</name>
</author>
<id>
http://localhost:8800/api/search?index=Default&
results=1&query=Axis
</id>
<category term="Default" label="Default" />
<updated>2007-01-07T03:09:48Z</updated>
<opensearch:totalResults>94</opensearch:totalResults>
<opensearch:Query role="request" searchTerms="Axis" />
<opensearch:startIndex>1</opensearch:startIndex>
<opensearch:itemsPerPage>1</opensearch:itemsPerPage>
<entry>
<link
href="http://www.javaworld.com/askTheExpert.jsp?
pagename=http:%2F%2Fwww.javaworld.com%2Fjavaworld%2Fjw-02-2006%2F
jw-0220-axis.html&pagename=%2Fjavaworld%2Fjw-02-2006%2F
jw-0220-axis.html&pageurl=%2Fjavaworld%2Fjw-02-2006%2F
jw-0220-axis.html&pubsite=j"
rel="alternate" type="text/html"
hreflang="en" />
<link
href="http://192.168.0.100:8800/search/?query=cache::http%3A%2F%2F
www.javaworld.com%2FaskTheExpert.jsp%3Fpagename%3Dhttp%3A%252F%252F
www.javaworld.com%252Fjavaworld%252Fjw-02-2006%252Fjw-0220-axis.html%26
pagename%3D%252Fjavaworld%252Fjw-02-2006%252Fjw-0220-axis.html%26
pageurl%3D%252Fjavaworld%252Fjw-02-2006%252F
jw-0220-axis.html%26pubsite%3Dj&output=binary"
rel="via" type="text/html"
hreflang="en" />
<opensearch:relevance>1.55</opensearch:relevance>
<title type="html">Feedback Form</title>
<updated>2007-01-06T14:56:04Z</updated>
<id>
http://www.javaworld.com/askTheExpert.jsp?pagename=http:%2F%2F
www.javaworld.com%2Fjavaworld%2Fjw-02-2006%2F
jw-0220-axis.html&pagename=%2Fjavaworld%2Fjw-02-2006%2F
jw-0220-axis.html&pageurl=%2Fjavaworld%2Fjw-02-2006%2F
jw-0220-axis.html&pubsite=j
</id>
<summary type="html">
<SPAN class="ellipsis">...
</SPAN>Feedback. Tell us your thoughts on this article
or the issues raised in it. We'll cc: the author
<SPAN class="ellipsis">
... </SPAN>
</summary>
</entry>
</feed>
To consume the Atom 1.0 search response programmatically, you can use any XML
parser, but doing so can be tedious. Fortunately, the open source ROME library
simplifies the consumption of Atom 1.0 and many other syndication feed formats.
ROME parses any RSS or Atom feed into a canonical bean interface, SyndFeed, so
that developers can work with Java objects instead of a raw XML string.
The following program shows how to consume the search response stream with ROME.
It prints the link of each /feed/entry element to the console; because the
request sets results=1, there is only one entry. To do this, the program creates
a SyndFeedInput instance and feeds it the search response stream. SyndFeedInput
returns a SyndFeed object through which you can access the necessary
information. For details on the OmniFind Atom 1.0 elements, refer to
the "IBM OmniFind Yahoo! Edition Programming Guide and API reference."
Listing 3. Sample search with ROME parsing
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.util.Iterator;
import java.util.List;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.methods.GetMethod;
import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;
import com.sun.syndication.io.FeedException;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;
/**
* This is a simple search example against an OmniFind default index. For this
* program, we are parsing the search results with ROME library to extract feed
* entries.
*
* Steps are as follows: (1) construct an HttpClient so that we can issue an HTTP
* GET request later; (2) construct the search URL from the OmniFind search REST
* endpoint and the necessary parameters, such as index and query; (3) construct
* a GetMethod with the search URL; (4) execute the GetMethod and check the HTTP
* status; (5) retrieve the search result; (6) convert the Atom search result to
* a SyndFeed using the ROME library; (7) extract the feed entries.
*/
public class SimpleSearchWithParsing {
// the REST end point URL for OmniFind search (adjust host and port for your installation)
final static String targetSearchAPIEndpointURL = "http://localhost/api/search";
public static void main(String[] args) throws UnsupportedEncodingException {
// query string command line parameter
if (args.length < 1) {
System.err.println("usage: java SimpleSearchWithParsing <query>");
return;
}
String queryString = args[0];
// construct HttpClient so that we issue HTTP GET request
HttpClient client = new HttpClient();
// construct search URL with the OmniFind search REST endpoint and any
// necessary parameters such as index and query parameters
StringBuffer url = new StringBuffer();
url.append(targetSearchAPIEndpointURL);
url.append("?index=Default&");
url.append("results=1&");
url.append("query=" + java.net.URLEncoder.encode(queryString, "UTF-8"));
// construct GetMethod with the search REST end point URL
HttpMethod method = new GetMethod(url.toString());
try {
// execute the GetMethod and check http status
int status = client.executeMethod(method);
if (status != HttpStatus.SC_OK) {
System.err.println("SimpleSearchWithParsing failed: "
+ method.getStatusLine());
System.exit(status);
}
// retrieve the search result
// convert the search result in ATOM to a SyndFeed using ROME
// library
SyndFeedInput input = new SyndFeedInput();
SyndFeed feed = input.build(new XmlReader(method
.getResponseBodyAsStream()));
// extract feed entries
List list = feed.getEntries();
for (Iterator iter = list.iterator(); iter.hasNext();) {
SyndEntry entry = (SyndEntry) iter.next();
System.out.println(entry.getLink());
}
} catch (HttpException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (FeedException e) {
e.printStackTrace();
} finally {
// clean up the connection resources
method.releaseConnection();
}
}
}
As the two programs show, issuing search requests and consuming responses is
straightforward because the APIs are based on standard HTTP methods, and the
many available open source Java libraries further simplify access to the
OmniFind search function.
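If you prefer not to add a dependency, the JDK's own DOM parser can extract the same information; it just takes more code than ROME. The following sketch assumes the Atom and OpenSearch namespace URIs shown in Listing 2 and reads opensearch:totalResults plus the href of each entry link whose rel attribute is "alternate". AtomSearchResult is a hypothetical class name.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical value class holding what we extract from an OmniFind Atom response. */
final class AtomSearchResult {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";
    static final String OPENSEARCH_NS = "http://a9.com/-/spec/opensearch/1.1/";

    final int totalResults;
    final List<String> links; // href of each entry's rel="alternate" link

    private AtomSearchResult(int totalResults, List<String> links) {
        this.totalResults = totalResults;
        this.links = Collections.unmodifiableList(links);
    }

    static AtomSearchResult parse(InputStream in)
            throws IOException, SAXException, ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match the Atom and OpenSearch namespaces
        Document doc = factory.newDocumentBuilder().parse(in);

        NodeList total = doc.getElementsByTagNameNS(OPENSEARCH_NS, "totalResults");
        int count = total.getLength() > 0
                ? Integer.parseInt(total.item(0).getTextContent().trim()) : 0;

        List<String> links = new ArrayList<>();
        NodeList entries = doc.getElementsByTagNameNS(ATOM_NS, "entry");
        for (int i = 0; i < entries.getLength(); i++) {
            NodeList entryLinks = ((Element) entries.item(i)).getElementsByTagNameNS(ATOM_NS, "link");
            for (int j = 0; j < entryLinks.getLength(); j++) {
                Element link = (Element) entryLinks.item(j);
                // skip the rel="via" cached-copy link and keep the original document link
                if ("alternate".equals(link.getAttribute("rel"))) {
                    links.add(link.getAttribute("href"));
                }
            }
        }
        return new AtomSearchResult(count, links);
    }
}
```

You could call AtomSearchResult.parse(method.getResponseBodyAsStream()) in place of the ROME calls in Listing 3.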
As briefly mentioned before, OmniFind supports two search response formats. In
addition to the default Atom 1.0 feed format, OmniFind can return a search
response as an HTML snippet: simply add the "output" search parameter and set
its value to "htmlsnippet". Listing 4 shows an example HTML snippet. It contains
CSS style rules and HTML elements that represent the list of search results.
Listing 4. Sample search response in HTML snippets
<link rel="stylesheet" type="text/css" href="styles/nuvo.css">
<style>
body {
font-family: Verdana;
font-size: 12px;
background-repeat: no-repeat;
background-color: white;
}
.background {
background-repeat: no-repeat;
}
…
.resultFooterHtmlCachedLink {
font-family: Arial;
color: #8284cc;
}
.resultFooterOriginalCachedLink {
font-family: Arial;
color: #8284cc;
}
</style>
<div id=yschres>
<div id=yschcont style="margin-left:0px;">
<div id=yschpri style="margin-left:5px;">
<div id=yschrel>
</div>
<div id=yschweb>
<ol start=1>
<li>
<div>
<a class="yschttl" style="
font-family: Arial; color: #0000de; "
href="http://www.javaworld.com/
askTheExpert.jsp?pagename=http:%2F%2F
www.javaworld.com%2F
javaworld%2Fjw-02-2006%2F
jw-0220-axis.html&pagename=%2F
javaworld%2Fjw-02-2006%2F
jw-0220-axis.html&pageurl=%2F
javaworld%2F
jw-02-2006%2Fjw-0220-axis.html&pubsite=j">
Feedback Form</a>
</div>
<div class=yschabstr style="font-family: Arial;
color: black; ">
<SPAN class="ellipsis">... </SPAN>
Feedback. Tell us your thoughts on this article or
the issues raised in it. We'll cc: the
author <SPAN class="
ellipsis">... </SPAN></div>
<em class=yschurl style="font-family:
Arial; color: #088000; " >
http://www.javaworld.com/askTheExpert.jsp?pagename=http:%2F%2F
www.javaworld.com%2Fjavaworld%2Fjw-02-2006%2F
jw-0220-<SPAN class="
highlight"><SPAN class="
hlTerm0">axis</SPAN></SPAN>.html&
pagename=%2Fjavaworld%2Fjw-02-2006%2Fjw-0220-<SPAN class="
highlight"><SPAN class="
hlTerm0">axis</SPAN></SPAN>.html&
pageurl=%2Fjavaworld%2Fjw-02-2006%2Fjw-0220-<SPAN
class="highlight"><SPAN class="
hlTerm0">axis</SPAN></SPAN>.html&
pubsite=j </em>
</li>
</ol>
</div>
</div>
</div>
</div>
The HTML snippet output format is better suited to J2EE Web applications that
render the search results for end users. Using DHTML, the returned HTML snippet
can be inserted dynamically into the search result page. For programmatic
access, the default Atom feed format is more appropriate.
Related to the Atom feed format is another search parameter, "stylesheet". If
specified, its value should be a fully qualified URL to an XSL style sheet that
formats the output results. This parameter is effective only when the value of
the "output" search parameter is "atomxml".
Note that OmniFind does not itself apply the XSL style sheet specified in this
parameter. The transformation is done by the client, which can be an
XSL-compliant Web browser or a custom XSLT application. For details on XSLT
transformation of OmniFind search results, refer to:
"Add IBM OmniFind
Yahoo! Edition to your Web site" (developerWorks, Dec 2006).
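A custom XSLT application can be as small as the following sketch, which uses the JDK's javax.xml.transform API (not an OmniFind API) to apply a style sheet to an Atom response that has already been fetched. ClientSideXslt is a hypothetical name. To load the style sheet from the URL you passed in the "stylesheet" parameter, you could construct the StreamSource from that URL string instead of from a Reader.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical client-side transformer for OmniFind Atom search results. */
final class ClientSideXslt {
    /** Applies the given XSL style sheet to the Atom XML and returns the transformed output. */
    static String transform(String atomXml, String xsl) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(atomXml)), new StreamResult(out));
        return out.toString();
    }
}
```

Remember that Atom elements live in the http://www.w3.org/2005/Atom namespace, so the style sheet must declare a prefix for it (for example, xmlns:a) and use it in its XPath expressions.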
Use the document push API to add a document
To add documents, use the OmniFind document API, which, like the search API, is
based on a standard HTTP method. Unlike the search API, the document push API is
accessed with the HTTP POST method, and it requires an API password from the
OmniFind admin console (see Figure 3). You can find the API password by
selecting the "Manage System -> Manage Authentication" menus in the admin console.
Figure 3. API password
The following program is a simple OmniFind document push client that adds a PDF
document to the default OmniFind index. It uses the PostMethod class of the
Apache HttpClient library to issue an HTTP POST request. Because the document
API requires an API password, the program uses HTTP basic authentication with
the API password. OmniFind ignores the basic authentication user ID, so it can
be any value.
The document push API supports seven parameters: action, index, document type
(docType), document content fallback language, document content known language,
document ID (docId), and last modified date. Of these, action, index, docType,
and docId are required. Unspecified parameters take the default values
documented in the "IBM OmniFind Yahoo! Edition Programming Guide and API reference."
Listing 5. Sample document push program
import java.io.File;
import java.io.IOException;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.commons.httpclient.methods.FileRequestEntity;
import org.apache.commons.httpclient.methods.PostMethod;
import org.apache.commons.httpclient.methods.RequestEntity;
/**
* This is a document push example.
*
* Steps are as follows: (1) construct an HttpClient so that we can issue an HTTP
* POST request later; (2) set the credential using the OmniFind API password;
* (3) construct a PostMethod and set the necessary request headers; (4) set the
* request entity with the file content to be indexed; (5) execute the PostMethod
* and check the HTTP status; (6) check for an error.
*/
public class PushPDFDocument {
// the REST end point URL for document push (adjust host and port for your installation)
final static String targetDocumentAPIEndpointURL = "http://localhost/api/document";
public static void main(String[] args) {
// construct HttpClient so that we issue HTTP POST request
HttpClient client = new HttpClient();
// set the credential using the OmniFind API password (replace the sample value
// with your own API password; OmniFind ignores the user ID)
client.getState().setCredentials(new AuthScope("localhost", 80),
new UsernamePasswordCredentials("notused", "+UrSlFg="));
// construct a PostMethod and set necessary request headers
PostMethod method = new PostMethod(targetDocumentAPIEndpointURL);
method.setDoAuthentication(true);
method.setRequestHeader("action", "addDocument");
method.setRequestHeader("index", "Default");
method.setRequestHeader("docId", "pdfrepository://oye_api_guide_v84.pdf");
method.setRequestHeader("docType", "application/pdf");
try {
// set the request entity with the file content to be indexed
File input = new File("oye_api_guide_v84.pdf");
RequestEntity entity = new FileRequestEntity(input, "application/pdf");
method.setRequestEntity(entity);
// execute the post method and check HTTP status
int status = client.executeMethod(method);
if (status != HttpStatus.SC_OK) {
System.err.println("PushPDFDocument failed: " + method.getStatusLine());
System.exit(status);
}
// Note on the response from a document API call. The HTTP response
// code is always 200, with the exception of 401 for failed
// authentication. If an error occurs during the document insert
// process, the HTTP response code is still 200 and the HTTP
// response header contains a parameter of "hasError".
// Therefore, to do the proper handling of the document push result,
// you must check for the existence of the "hasError" response
// header or the HTTP 401 response code.
Header hasErrorHeader = method.getResponseHeader("hasError");
if (hasErrorHeader == null) {
System.out.println("document pushed fine.");
} else {
String errorResponse = method.getResponseBodyAsString();
System.out.println("error response: " + errorResponse);
}
} catch (HttpException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} finally {
// clean up the connection resources
method.releaseConnection();
}
}
}
To add a PDF document, the program creates an HttpClient object and sets the
HTTP basic authentication user ID and password credential by calling the
setCredentials() method. It then creates a PostMethod object, passes the
required add document parameters as HTTP request headers, and sets the PDF file
as the HTTP POST body through a FileRequestEntity. Finally, it executes the PostMethod.
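The same push can also be written with java.net.HttpURLConnection and java.util.Base64 if you would rather not depend on HttpClient. This is a minimal sketch under assumptions: JdkDocumentPush and its methods are hypothetical names, it sends the Authorization header preemptively rather than waiting for the server's authentication challenge, and it reports success using the status code and "hasError" header rules described in this article.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

final class JdkDocumentPush {
    /** Basic authentication header value; OmniFind ignores the user ID, only the API password matters. */
    static String basicAuth(String apiPassword) {
        String credentials = "notused:" + apiPassword;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    /** Pushes one file; returns true if the server reported no error. */
    static boolean push(String endpoint, String apiPassword, String docId,
            String docType, Path file) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Authorization", basicAuth(apiPassword));
            // the add document parameters travel as HTTP request headers
            conn.setRequestProperty("action", "addDocument");
            conn.setRequestProperty("index", "Default");
            conn.setRequestProperty("docId", docId);
            conn.setRequestProperty("docType", docType);
            conn.setRequestProperty("Content-Type", docType);
            try (OutputStream body = conn.getOutputStream()) {
                Files.copy(file, body); // the document content is the POST body
            }
            int status = conn.getResponseCode(); // 401 means authentication failed
            return status == HttpURLConnection.HTTP_OK && conn.getHeaderField("hasError") == null;
        } finally {
            conn.disconnect();
        }
    }
}
```

A call might look like push("http://localhost/api/document", apiPassword, "http://www.example.com/docs/guide.pdf", "application/pdf", Paths.get("guide.pdf")), where the docId URL and file name are placeholders.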
Note that the HTTP response code from a document push API call is always 200,
except for 401 when authentication fails. If an error occurs while the document
is being inserted, the response code is still 200, and the response contains a
"hasError" HTTP header. Therefore, to handle the document push result properly,
you must check for the HTTP 401 response code and for the existence of the
"hasError" response header.
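Because the status code alone is not enough, it can help to keep this rule in one place. The enum below is a hypothetical helper that encodes the rule just stated; treating any other status code as unexpected is an added assumption, since this article describes only 200 and 401.

```java
/** Hypothetical interpretation of a document API response (push or delete). */
enum DocumentApiOutcome {
    OK, AUTHENTICATION_FAILED, DOCUMENT_ERROR, UNEXPECTED_STATUS;

    /**
     * @param status         HTTP status code of the response
     * @param hasErrorHeader true if the response carries a "hasError" header
     */
    static DocumentApiOutcome classify(int status, boolean hasErrorHeader) {
        if (status == 401) {
            return AUTHENTICATION_FAILED;
        }
        if (status != 200) {
            return UNEXPECTED_STATUS; // not described in this article; handled defensively
        }
        // 200 alone does not mean success: the error is signalled by the header
        return hasErrorHeader ? DOCUMENT_ERROR : OK;
    }
}
```

With HttpClient, as in Listing 5, you might call DocumentApiOutcome.classify(status, method.getResponseHeader("hasError") != null).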
The previous program sets the docId to an arbitrary value, which is not a good
practice: if the docId is not a valid URL, the original document cannot be
fetched, and the document is not a clickable result on the search result page.
For instance, after running the program above, I searched for the document I
had just added. The search page displayed the
"pdfrepository://oye_api_guide_v84.pdf" link. Because this docId is not a valid
URL, clicking it shows an error page. However, the original document can still
be viewed by clicking the Cached link. In your own programs, set the docId to a
valid URL so that the original document can be retrieved reliably.
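A simple safeguard is to check each docId before pushing it. The following sketch is hypothetical (DocIdCheck is not part of OmniFind), and the accepted schemes (http, https, file) are an assumption based on what browsers commonly open; adjust the set to match your environment or your custom retrieval application.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical pre-push sanity check for docIds. */
final class DocIdCheck {
    // assumption: schemes a browser can usually open directly
    private static final Set<String> BROWSABLE_SCHEMES =
            new HashSet<>(Arrays.asList("http", "https", "file"));

    /** Returns true if the docId is an absolute URI with a browsable scheme. */
    static boolean isBrowsable(String docId) {
        if (docId == null) {
            return false;
        }
        try {
            URI uri = new URI(docId);
            String scheme = uri.getScheme();
            return uri.isAbsolute() && scheme != null
                    && BROWSABLE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            return false; // not even syntactically a URI
        }
    }
}
```

With this check, "pdfrepository://oye_api_guide_v84.pdf" is rejected because its scheme is not one a browser understands, even though it is syntactically a URI.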
Figure 4. Unclickable link
Note on adding documents through OmniFind document push APIs: The
programming guide says "Documents that are added into the
index by the addDocument API cannot be tracked in the Document Status
window in the administration console."
Use the document API to delete a document
To delete documents, use the OmniFind document API. Unlike the search API, the
document delete API is accessed with the HTTP DELETE method, and, as with
document push, it requires the API password from the OmniFind admin console, as shown in Figure 3.
The following is a simple program that deletes a PDF document from the Default
OmniFind index. As in the search programs, it uses the Apache HttpClient library
to issue the HTTP request, this time with the DeleteMethod class instead of
GetMethod. Because the document API requires an API password, the program uses
HTTP basic authentication with the API password. OmniFind ignores the user ID,
so it can be any value.
The document delete API requires three parameters: action, index, and document
ID (docId). The value of the action parameter must be "deleteDocument", and the
value of the index parameter must be "Default". The docId should uniquely
identify the document to be deleted from the Default index. If the given docId
is not found in the index, the delete API treats it as a warning and does not
return an error.
Listing 6. Sample document delete program
import java.io.IOException;
import org.apache.commons.httpclient.Header;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.HttpStatus;
import org.apache.commons.httpclient.UsernamePasswordCredentials;
import org.apache.commons.httpclient.auth.AuthScope;
import org.apache.commons.httpclient.methods.DeleteMethod;
/**
* This is a document delete example.
*
* Steps are as follows: (1) construct an HttpClient so that we can issue an HTTP
* DELETE request later; (2) set the credential using the OmniFind API password;
* (3) construct a DeleteMethod and set the necessary request headers; (4) execute
* the DeleteMethod and check the HTTP status; (5) check for an error.
*/
public class DeletePDFDocument {
// the REST end point URL for the document API (adjust host and port for your installation)
final static String targetDocumentAPIEndpointURL = "http://localhost/api/document";
public static void main(String[] args) throws HttpException, IOException {
// construct HttpClient so that we can issue an HTTP DELETE request
HttpClient client = new HttpClient();
// set the credential using OmniFind API password
client.getState().setCredentials(new AuthScope("localhost", 80),
new UsernamePasswordCredentials("notused", "+UrSlFg="));
// construct a DeleteMethod and set necessary request headers
DeleteMethod method = new DeleteMethod(targetDocumentAPIEndpointURL);
method.setDoAuthentication(true);
method.setRequestHeader("action", "deleteDocument");
method.setRequestHeader("index", "Default");
// the docId of the document to be deleted
method.setRequestHeader("docId", "pdfrepository://oye_api_guide_v84.pdf");
try {
// execute the delete method
int status = client.executeMethod(method);
if (status != HttpStatus.SC_OK) {
System.err.println("DeletePDFDocument failed: "
+ method.getStatusLine());
System.exit(status);
}
// Note on the response from a document API call: the HTTP response
// code is always 200, except for 401 when authentication fails. If an
// error occurs while the document is being deleted, the response code
// is still 200 and the response contains a "hasError" header.
// Therefore, to handle the document delete result properly, check for
// the existence of the "hasError" response header or the HTTP 401
// response code.
Header hasErrorHeader = method.getResponseHeader("hasError");
if (hasErrorHeader == null) {
System.out.println("document deleted.");
} else {
String errorResponse = method.getResponseBodyAsString();
System.out.println("error response: " + errorResponse);
}
} finally {
// clean up the connection resources
method.releaseConnection();
}
}
}
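For completeness, here is a sketch of the same delete request using only the JDK. HttpURLConnection accepts "DELETE" as a request method, so no extra library is needed. JdkDocumentDelete and its methods are hypothetical names; the Authorization header is sent preemptively, and its placeholder user ID is ignored by OmniFind.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

final class JdkDocumentDelete {
    /** The three required delete parameters, sent as HTTP request headers. */
    static Map<String, String> deleteHeaders(String docId) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("action", "deleteDocument");
        headers.put("index", "Default");
        headers.put("docId", docId);
        return headers;
    }

    /** Returns true if the server reported no error; an unknown docId is only a warning. */
    static boolean delete(String endpoint, String apiPassword, String docId) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        try {
            conn.setRequestMethod("DELETE");
            String credentials = "notused:" + apiPassword; // the user ID is ignored by OmniFind
            conn.setRequestProperty("Authorization", "Basic " + Base64.getEncoder()
                    .encodeToString(credentials.getBytes(StandardCharsets.UTF_8)));
            for (Map.Entry<String, String> e : deleteHeaders(docId).entrySet()) {
                conn.setRequestProperty(e.getKey(), e.getValue());
            }
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK
                    && conn.getHeaderField("hasError") == null;
        } finally {
            conn.disconnect();
        }
    }
}
```

Keeping the header construction in its own method makes it easy to verify the request parameters without contacting a server.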
Note: Deleting a document does not make it unsearchable immediately. The programming guide says, "The time that is required for the document to be no
longer searchable depends on the search server load at the time when the delete request
was issued."
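If your application must confirm that a deleted document is gone, one approach is to poll with an increasing delay. The following sketch is hypothetical: stillSearchable is a check you supply yourself (for example, a search on a distinctive term followed by a test of whether the docId still appears among the entry links), and the attempt count and delays are arbitrary choices, not values from the programming guide.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper that waits for a deleted document to drop out of search results. */
final class DeletionWaiter {
    /**
     * Polls stillSearchable, doubling the delay after each miss.
     *
     * @return true once the document is no longer searchable, false if attempts run out
     */
    static boolean waitUntilGone(BooleanSupplier stillSearchable, int maxAttempts,
            long initialDelayMillis) throws InterruptedException {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (!stillSearchable.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(delay); // back off: server load affects how long deletion takes
                delay *= 2;
            }
        }
        return false;
    }
}
```

Exponential backoff keeps the number of extra search requests small when the server is busy, which is exactly when deletion is likely to take longer.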
Summary
This article presented several simple programs that use the OmniFind search and
document APIs. Because the OmniFind APIs are based on standard HTTP methods, and
many open source libraries and tools are available, accessing OmniFind functions
requires little effort. The Apache HttpClient library handled the HTTP GET, POST,
and DELETE methods, and, thanks to the open source ROME library, the Atom search
results could be consumed with a few lines of code. Download OmniFind Yahoo! Edition and try these techniques to enhance your own custom applications.
Download
| Description | Name | Size | Download method |
|---|---|---|---|
| Sample programs for this article | ofsamples.zip | 980KB | HTTP |
Resources
Get products and technologies
- Download Apache Jakarta Commons HttpClient v3.1 Beta.
- Download ROME, an open source library that helps you work in Java with most syndication formats: RSS 0.90, RSS 0.91 Netscape, RSS 0.91 Userland, RSS 0.92, RSS 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom 0.3, and Atom 1.0.
- Download IBM OmniFind Yahoo! Edition.
- Build your next development project with IBM trial software, available for download directly from developerWorks.
About the author
Arthur Choi is currently serving as a technical sales consultant for the
Content Management & Discovery Center of Excellence in IBM's
Information Management Division. He is
responsible for providing services to IBM's customers by
deploying IBM's enterprise search solutions
into their enterprise computing environments. He has previously
worked on a variety of enterprise search related projects, including the
Lotus Extended Search and WebSphere Enterprise Search products.