Level: Intermediate
M. Tim Jones (mtj@mtjones.com), Consultant Engineer, Emulex
14 Nov 2006
Web spiders are software agents that traverse the Internet
gathering, filtering, and potentially aggregating information for a user. Using
common scripting languages and their collection of Web modules, you can easily
develop Web spiders. This article shows you how to build spiders and scrapers for
Linux® to crawl a Web site and gather information (in this case, stock data).
A spider is a program that crawls the Internet in a specific way for a
specific purpose. The purpose could be to gather information or to understand the
structure and validity of a Web site. Spiders are the basis for modern search
engines, such as Google and AltaVista. These spiders automatically retrieve data
from the Web and pass it on to other applications that index the contents of the
Web site for the best set of search terms.
Web spiders as agents
Web spiders and scrapers are simply another form of software robot or
agent (as coined by Alan Kay in the early 1980s). Kay's idea of an agent
was a proxy for the user in the computer's world. The agent could be given a
goal and work towards that goal in its domain. If it got stuck, it could request
advice from the user and continue on to fulfill its goal.
Today, agents are classified with attributes such as autonomy, adaptiveness,
communication, and collaboration with other agents. Other attributes, such as
agent mobility and even personality, are goals of agent research today. The Web
spiders in this article are classified as Task-Specific Agents in the
agent taxonomy.
Similar to a spider, but with more interesting legal questions, is the Web
scraper. A scraper is a type of spider that targets specific content from
the Web, such as the cost of products or services. One use of the scraper is for
competitive pricing, to identify the price of a given product to tailor your price
or advertise it accordingly. A scraper can also aggregate data from a number of
Web sources and provide that information to a user.
Biological motivation
When you think of a spider in nature, you think of it in its interactions with an
environment, not in isolation. The spider sees and feels its way around, moving
from one place to another in a meaningful way. Web spiders operate in a similar
way. A Web spider is a program written in a high-level language. It interacts with
its environment through the use of networking protocols, such as the Hypertext
Transfer Protocol (HTTP) for the Web. If your spider wants to communicate with
you, it can use the Simple Mail Transfer Protocol (SMTP) to send an e-mail
message.
Spiders aren't limited to HTTP or SMTP, though. Some spiders use Web services,
such as SOAP or the Extensible Markup Language Remote Procedure Call (XML-RPC)
protocol. Other spiders scour newsgroups with the Network News Transfer Protocol
(NNTP) or look for interesting news items in Really Simple Syndication (RSS)
feeds. While most spiders in nature can see only light-dark intensity and movement
changes, Web spiders can see and feel using many types of protocols.
Applications of spiders and scrapers
Spider's eyes and legs
The Web spider's primary means of looking and moving around the Internet is
HTTP. HTTP is a message-oriented protocol in which a client connects to a server
and issues requests, and the server provides a response. Each request and response
is made up of a header and an optional body, with the header providing status information
and a description of the contents of the body.
HTTP defines several request types (called methods), but three matter most to a spider. The first is
HEAD, which requests information about an asset at
the server. The second is GET, which requests an
asset, such as a file or an image. Finally, the POST
request allows the client to interact with the server through a Web page
(commonly through a Web form).
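To make the header/body structure concrete, here is a minimal Python sketch that builds a raw HEAD request and splits a raw response into its status code, header fields, and body. It works on strings only (no network), the function names are hypothetical, and the sample response is made up (its Server value is borrowed from the Apache.org result shown later in Listing 2). A real spider would rely on a library such as Python's http.client instead of hand-rolled parsing.

```python
def build_request(method, host, path="/"):
    """Build a minimal HTTP/1.1 request message (header only, no body)."""
    lines = [
        f"{method} {path} HTTP/1.1",
        f"Host: {host}",          # the Host header is required by HTTP/1.1
        "Connection: close",      # ask the server to close after responding
    ]
    # The header ends with an empty line (CRLF CRLF)
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(raw):
    """Split a raw HTTP response into (status code, headers dict, body)."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # The status line looks like: "HTTP/1.1 200 OK"
    code = int(status_line.split(" ", 2)[1])
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalize them to lowercase
        headers[name.strip().lower()] = value.strip()
    return code, headers, body


raw = ("HTTP/1.1 200 OK\r\n"
       "Server: Apache/2.2.3 (Unix)\r\n"
       "Content-Type: text/html\r\n"
       "\r\n")
code, headers, body = parse_response(raw)
print(code, headers["server"])   # prints: 200 Apache/2.2.3 (Unix)
print(repr(build_request("HEAD", "www.example.com")))
```

Because a HEAD response carries no body, the body returned here is an empty string; the Server header is exactly what Example 1 below looks for.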
Web spiders and scrapers are useful applications, and, therefore, you can find a
variety of different types in use for both good and evil. Let's look at some of
the applications that use these technologies.
Search engine Web crawlers
Web spiders make searching the Internet easy and efficient. A search engine uses
many Web spiders to crawl the Web pages on the Internet, return their content, and
index it. After this is done, the search engine can quickly search the local index
to identify the most applicable results for your search. Google also uses the
PageRank algorithm, where a Web page's rank in the search results is
based on how many other pages link to it and how highly ranked those pages are.
Each link serves as a vote, and pages with the most (and most highly ranked) votes
get the highest rank in the results.
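The voting idea can be sketched with the classic power-iteration formulation of PageRank. This is a simplified textbook version, not Google's production algorithm; the damping factor of 0.85 is the value commonly quoted in the literature, and both the function name and the three-page link graph are made up for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}          # start with equal ranks
    for _ in range(iterations):
        # Every page receives the "random jump" share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


web = {
    "a.html": ["c.html"],
    "b.html": ["c.html"],
    "c.html": ["a.html"],
}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

Here c.html collects votes from two pages and ranks highest, a.html inherits a large share of c.html's rank through a single link, and b.html, which nothing links to, keeps only the random-jump share.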
Searching the Internet like this can be expensive, both in terms of the
bandwidth required to communicate Web content to the indexer and also the
computational expense of indexing the results. Lots of storage is required for
this, but apparently it isn't a problem when you consider Google offers 1,000
megabytes of storage for Gmail users.
Web spiders minimize the drain on the Internet by following a set of behavior policies. To give
you some idea of the scope of the challenge, Google indexes more than eight
billion Web pages. The behavior policies define which pages the crawler will bring
down to the indexer, how often to go back to a Web site to check it again, and
how to avoid overloading a site (called a politeness policy). Web servers can also exclude crawlers
using a file called robots.txt that tells the crawler what can and can't be
crawled.
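Python's standard library ships a robots.txt parser, so a spider can honor exclusions with a few lines. In this sketch the robots.txt content, the user-agent name "minispider", and the www.example.com URLs are all made up; a real spider would download the file from the site itself (RobotFileParser.set_url() followed by read() does that).

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/index.html", "/private/report.html"]:
    allowed = parser.can_fetch("minispider", "http://www.example.com" + path)
    print(path, "allowed" if allowed else "disallowed")

# Seconds to wait between requests, if the site asks for a delay
print("crawl delay:", parser.crawl_delay("minispider"))
```

The Crawl-delay value is a natural input to a politeness policy: the spider checks can_fetch() before each request and waits at least the requested delay between requests to the same site.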
Corporate Web crawlers
Like the standard search engine spider, the corporate Web spider indexes content, but the content it
indexes is not available to the general public. For example, companies commonly have
internal Web sites that are used by employees. This type of spider is constrained
to the local environment. Because its search is restricted, there is usually more
computing power available, and specialized and more complete indexes are possible.
Google has taken this one step further by providing a desktop search engine to
index the content of your personal computer.
Specialized crawlers
There are also a number of non-traditional uses for crawlers, such as archiving
content or generating statistics. An archiving crawler simply crawls a Web site
pulling content locally to be stored on a long-term storage medium. This can be
used for backup or, more ambitiously, to take a snapshot of the content of the
Internet. Statistics can be useful in understanding the content of the Internet or
the lack thereof. Crawlers can be used to identify how many Web servers are
running, how many Web servers of a given type are running, the number of Web pages
that are available, and even the number of broken links (those that return the
HTTP 404 error, page not found).
Other useful specialized crawlers include Web site checkers. These crawlers look
for missing content, validate all links, and ensure that your Hypertext Markup
Language (HTML) is valid.
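Full HTML validation means checking a page against the specification, as the W3C validator does. A much smaller check a site checker might run is to confirm that tags are properly nested and closed. The sketch below does that with Python's html.parser; the class name is hypothetical, and the check is deliberately strict (it treats optional end tags such as a missing </p> as errors).

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class NestingChecker(HTMLParser):
    """Report start/end tags that are not properly nested (a rough check only)."""

    def __init__(self):
        super().__init__()
        self.stack = []      # currently open elements
        self.problems = []   # human-readable error messages

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            expected = self.stack[-1] if self.stack else "nothing"
            self.problems.append(f"</{tag}> found, expected </{expected}>")

    def check(self, html):
        self.feed(html)
        self.close()
        # Anything still open at the end was never closed
        self.problems.extend(f"<{tag}> never closed" for tag in self.stack)
        return self.problems


print(NestingChecker().check("<p>Hello <b>world</b><br></p>"))
print(NestingChecker().check("<p><b>bold <i>both</b></i></p>"))
```

The first call reports no problems; the second flags the overlapping <b> and <i> elements.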
E-mail harvesting crawlers
Now to the dark side. It's unfortunate, but a few bad apples can ruin the
Internet for the rest of us. E-mail harvesting crawlers search Web sites for
e-mail addresses that are then used to generate the mass of spam that we all deal
with each day. Postini reports that, as of August 2005, 70% of all e-mail messages
processed for Postini users are unwanted spam.
E-mail harvesting can be one of the easiest crawling activities; the crawler in the
final example of this article already detects mailto: links with a simple text search
(and deliberately ignores them).
Now that we've looked at some of the basics of Web spiders and scrapers, the
next four examples show how easily you can build spiders and scrapers for Linux
with modern scripting languages, such as Ruby and Python.
Example 1: Simple scraper
This example shows you how to figure out what kind of Web server is being run for
a given Web site. This can be interesting and, if done on a large enough sample,
can provide some intriguing statistics on the penetration of Web servers in
government, academia, and industry.
Listing 1 shows a Ruby script that scrapes a Web site to identify the HTTP
server. The Net::HTTP class implements an HTTP client
and the GET, HEAD, and
POST HTTP methods. Whenever you make a request to an
HTTP server, part of the HTTP message response indicates the server from which the
content is served. Rather than download a page from the site, I simply use the
HEAD method to get information about the root page
('/'). As long as the HTTP server responds with success (indicated by a "200"
response code), I iterate through each header field of the response, searching for the
server key, and, if found, I print the value. The value
for this key is a string representing the HTTP server.
Listing 1. Ruby script for simple metadata scraping (srvrinfo.rb)
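#!/usr/local/bin/ruby
require 'net/http'
# Get the first argument from the command line (the host name)
url = ARGV[0]
if url.nil? then
  print "usage: ./srvrinfo.rb <host>\n"
  exit 1
end
begin
  # Create a new HTTP connection to port 80
  httpCon = Net::HTTP.new( url, 80 )
  # Perform a HEAD request for the root page (headers only, no body)
  resp = httpCon.head( "/" )
  # If it succeeded (200 is success)
  if resp.code == "200" then
    # Iterate through the response header fields (keys are lowercase)
    resp.each {|key,val|
      # If the key is the server, print the value
      if key == "server" then
        print " The server at "+url+" is "+val+"\n"
      end
    }
  end
rescue StandardError => e
  # Report DNS failures, refused connections, and timeouts
  print " Could not query "+url+": "+e.message+"\n"
end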
In addition to showing how to use the srvrinfo script, Listing 2 shows some
results from a number of government, academic, and business Web sites. There is
quite a bit of diversity, from Apache (68% penetration) to Sun and
Microsoft® Internet Information Services (IIS). You can also see a case
where the server is not reported. It's fun to note that the Federated States of
Micronesia is running an old version of Apache (time to update), and Apache.org is
on the bleeding edge.
Listing 2. Example usage of the server scraper
[mtj@camus]$ ./srvrinfo.rb www.whitehouse.gov
The server at www.whitehouse.gov is Apache
[mtj@camus]$ ./srvrinfo.rb www.cisco.com
The server at www.cisco.com is Apache/2.0 (Unix)
[mtj@camus]$ ./srvrinfo.rb www.gov.ru
The server at www.gov.ru is Apache/1.3.29 (Unix)
[mtj@camus]$ ./srvrinfo.rb www.gov.cn
[mtj@camus]$ ./srvrinfo.rb www.kantei.go.jp
The server at www.kantei.go.jp is Apache
[mtj@camus]$ ./srvrinfo.rb www.pmo.gov.to
The server at www.pmo.gov.to is Apache/2.0.46 (Red Hat Linux)
[mtj@camus]$ ./srvrinfo.rb www.mozambique.mz
The server at www.mozambique.mz is Apache/1.3.27 (Unix) PHP/3.0.18 PHP/4.2.3
[mtj@camus]$ ./srvrinfo.rb www.cisco.com
The server at www.cisco.com is Apache/1.0 (Unix)
[mtj@camus]$ ./srvrinfo.rb www.mit.edu
The server at www.mit.edu is MIT Web Server Apache/1.3.26 Mark/1.5 (Unix) mod_ssl/2.8.9 OpenSSL/0.9.7c
[mtj@camus]$ ./srvrinfo.rb www.stanford.edu
The server at www.stanford.edu is Apache/2.0.54 (Debian GNU/Linux) mod_fastcgi/2.4.2 mod_ssl/2.0.54 OpenSSL/0.9.7e WebAuth/3.2.8
[mtj@camus]$ ./srvrinfo.rb www.fsmgov.org
The server at www.fsmgov.org is Apache/1.3.27 (Unix) PHP/4.3.1
[mtj@camus]$ ./srvrinfo.rb www.csuchico.edu
The server at www.csuchico.edu is Sun-ONE-Web-Server/6.1
[mtj@camus]$ ./srvrinfo.rb www.sun.com
The server at www.sun.com is Sun Java System Web Server 6.1
[mtj@camus]$ ./srvrinfo.rb www.microsoft.com
The server at www.microsoft.com is Microsoft-IIS/6.0
[mtj@camus]$ ./srvrinfo.rb www.apache.org
The server at www.apache.org is Apache/2.2.3 (Unix) mod_ssl/2.2.3 OpenSSL/0.9.7g
That's useful data, and it's interesting to see what governments and academic
institutions use for their Web servers. The next example shows something a little
more useful, a stock quote scraper.
Example 2: Stock quote scraper
In this example, I build a simple Web scraper (also called a screen
scraper) to collect stock quote information. I do this in a brute-force way by
exploiting a pattern in the response Web page, like so:
Listing 3. A simple Web scraper for stock quotes
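#!/usr/local/bin/ruby
require 'net/http'
host = "www.smartmoney.com"
link = "/eqsnaps/index.cfm?story=snapshot&symbol="+ARGV[0]
begin
  # Create a new HTTP connection
  httpCon = Net::HTTP.new( host, 80 )
  # Perform a GET request to retrieve the full page
  resp = httpCon.get( link )
  # Find the marker that immediately precedes the price
  stroffset = resp.body =~ /class="price">/
  if stroffset.nil? then
    print "No price found for "+ARGV[0]+"\n"
    exit 1
  end
  # Skip the 14-character marker and grab the next 10 characters
  subset = resp.body.slice(stroffset+14, 10)
  # The price ends at the next tag
  limit = subset.index('<') || subset.length
  print ARGV[0] + " current stock price " + subset[0..limit-1] +
        " (from smartmoney.com)\n"
end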
In this Ruby script, I open an HTTP client connection to a server (in this case,
www.smartmoney.com) and build a link that specifically requests a stock quote as
passed in by the user (via
&symbol=<symbol>). I request this
link using the HTTP GET method (to retrieve the full
response page) and then search for
class="price">, which is immediately followed by
the stock's current price. This is cut out of the Web page and then displayed for
the user.
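The same marker-based technique can be sketched in Python. Unlike Listing 3, which always copies 10 characters after the marker, this hypothetical helper searches forward to the next '<' and returns None when the marker is missing. The HTML fragments are invented (the 79.28 value is borrowed from Listing 4); SmartMoney's real markup may differ, and, like any screen scraper, this breaks as soon as the page layout changes.

```python
def extract_after_marker(page, marker='class="price">'):
    """Return the text between marker and the next '<', or None.

    Mirrors the brute-force approach of Listing 3: find a fixed marker
    in the page and cut out whatever follows it, up to the next tag.
    """
    start = page.find(marker)
    if start == -1:
        return None          # layout changed, or the symbol is unknown
    start += len(marker)     # skip past the marker itself
    end = page.find("<", start)
    if end == -1:
        return None
    return page[start:end].strip()


# Hypothetical fragment of a quote page
sample = '<td><span class="price">79.28</span></td>'
print(extract_after_marker(sample))   # prints: 79.28
```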
To use the stock quote scraper, I simply invoke the script with the stock symbol
of interest, as shown in Listing 4.
Listing 4. Example usage of the stock quote scraper
[mtj@camus]$ ./stockprice.rb ibm
ibm current stock price 79.28 (from smartmoney.com)
[mtj@camus]$ ./stockprice.rb intl
intl current stock price 21.69 (from smartmoney.com)
[mtj@camus]$ ./stockprice.rb nt
nt current stock price 2.07 (from smartmoney.com)
[mtj@camus]$
Example 3: Communicating stock quote scraper
The Web scraper for stock quotes shown in Example 2 was engaging, but it would be
really useful to have this scraper routinely monitor the stock price and let you
know if your favorite stock has risen above a certain value or dropped below
another. Your wait is over. In Listing 5, I update the simple Web scraper to
routinely monitor the stock and send an e-mail message when the stock has moved
outside of a defined price range.
Listing 5. Stock scraper that can send an e-mail alert
#!/usr/local/bin/ruby
require 'net/http'
require 'net/smtp'
#
# Given a web-site and link, return the stock price
#
def getStockQuote(host, link)
  # Create a new HTTP connection
  httpCon = Net::HTTP.new( host, 80 )
  # Perform a GET request to retrieve the full page
  resp = httpCon.get( link )
  # Find the marker that immediately precedes the price
  stroffset = resp.body =~ /class="price">/
  # Stop if the marker is missing (unknown symbol or changed page layout)
  raise "price not found at "+host+link if stroffset.nil?
  # Skip the 14-character marker and grab the next 10 characters
  subset = resp.body.slice(stroffset+14, 10)
  # The price ends at the next tag
  limit = subset.index('<') || subset.length
  return subset[0..limit-1].to_f
end
#
# Send a message (msg) to a user.
# Note: assumes the SMTP server is on the same host.
#
def sendStockAlert( user, msg )
  # A Subject header, a blank line, then the message body
  lmsg = "Subject: Stock Alert\n\n" + msg
  Net::SMTP.start('localhost') do |smtp|
    smtp.send_message( lmsg, "rubystockmonitor@localhost.localdomain", user )
  end
end
#
# Our main program, checks the stock within the price band every two
# minutes, emails and exits if the stock price strays from the band.
#
# Usage: ./monitor_sp.rb <symbol> <high> <low> <email_address>
#
begin
  host = "www.smartmoney.com"
  link = "/eqsnaps/index.cfm?story=snapshot&symbol="+ARGV[0]
  user = ARGV[3]
  high = ARGV[1].to_f
  low = ARGV[2].to_f
  loop do
    price = getStockQuote(host, link)
    print "current price ", price, "\n"
    if (price > high) || (price < low) then
      if price > high then
        msg = "Stock "+ARGV[0]+" has exceeded the price of "+high.to_s+
              "\n"+host+link+"\n"
      else
        msg = "Stock "+ARGV[0]+" has fallen below the price of "+low.to_s+
              "\n"+host+link+"\n"
      end
      sendStockAlert( user, msg )
      exit
    end
    # Be polite: wait two minutes before polling the server again
    sleep 120
  end
end
This Ruby script is a bit longer, but it builds on the existing stock scraping
script from Listing 3. A new function,
getStockQuote, encapsulates the stock scraping
function. Another function, sendStockAlert, sends a
message to an e-mail address (both are user-defined). The main program is nothing
more than a loop to get the current stock price, check whether it's within the price band,
and, if not, send an e-mail alert to the user. I also wait two minutes between checks of the
stock price because I'm polite and don't want to overload the server.
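For comparison, here is a Python sketch of the two decisions the main loop makes: whether the price has left the band, and what alert to send. Rather than building the message string by hand as Listing 5 does, it uses the standard library's EmailMessage class so the headers are formatted correctly. The function names and the recipient address user@example.com are hypothetical; the From address and the sample prices come from Listings 5 and 6. Actually sending the message assumes an SMTP server on localhost, just as Listing 5 does, so that part is left as a comment.

```python
from email.message import EmailMessage


def check_band(symbol, price, low, high):
    """Return an alert string if price is outside [low, high], else None."""
    if price > high:
        return f"Stock {symbol} has exceeded the price of {high}"
    if price < low:
        return f"Stock {symbol} has fallen below the price of {low}"
    return None


def build_alert(to_addr, body):
    """Build an e-mail with proper headers instead of a hand-made string."""
    msg = EmailMessage()
    msg["Subject"] = "Stock Alert"
    msg["From"] = "rubystockmonitor@localhost.localdomain"
    msg["To"] = to_addr
    msg.set_content(body)
    return msg


alert = check_band("ibm", 83.36, 75.00, 83.00)
if alert is not None:
    message = build_alert("user@example.com", alert)
    print(message)
    # To send it (assumes an SMTP server on localhost, as Listing 5 does):
    #   import smtplib
    #   with smtplib.SMTP("localhost") as smtp:
    #       smtp.send_message(message)
```

Keeping the band check in its own function makes it easy to test without touching the network or the mail server.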
Listing 6 is a sample invocation of the stock monitor with a popular technology
stock. Every two minutes, the stock is checked and printed out. When the stock
exceeds the high limit, an e-mail alert is sent and the script exits.
Listing 6. Stock monitor script demonstration
[mtj@camus]$ ./monitor_sp.rb ibm 83.00 75.00 mtj@mtjones.com
current price 82.06
current price 82.32
current price 82.75
current price 83.36
The resulting e-mail is shown in Figure 1, complete with a link to the source of
the scraped data.
Figure 1. E-mail alert sent by the Ruby script in Listing 5
Now I'll leave scrapers and dig into the construction of a Web spider.
Example 4: Web site crawler
In this final example, I explore a Web spider that crawls a Web site. For
safety, I avoid straying outside of the site and instead simply dig down from a
single starting Web page.
To crawl a Web site and follow the links that are provided within it, you must
parse HTML pages. If you can successfully parse a Web page, you can identify links
to other resources. Some links point to local resources (other files on the same site),
while others point to non-local resources (such as pages on other Web sites).
To crawl the Web, you start with a given Web page, identify all of the links
that are on that page, add them to a to-visit queue, and then repeat this
process using the first item from the to-visit queue. This results in
breadth-first traversal (compared to digging down into the first link found, which
would result in depth-first behavior).
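The traversal order can be demonstrated without any networking by crawling an in-memory link graph. In this sketch the dictionary stands in for fetching and parsing real pages, and the function name and page names are hypothetical. Taking links from the front of the queue (FIFO) gives breadth-first order; taking them from the back (LIFO) turns the same loop into a depth-first traversal.

```python
from collections import deque


def crawl_order(links, start, breadth_first=True):
    """Return the order in which pages are visited.

    links maps each page to the pages it links to (an in-memory
    stand-in for fetching and parsing real pages).
    """
    to_visit = deque([start])
    seen = {start}
    order = []
    while to_visit:
        # FIFO (popleft) gives breadth-first; LIFO (pop) gives depth-first
        page = to_visit.popleft() if breadth_first else to_visit.pop()
        order.append(page)
        for link in links.get(page, []):
            if link not in seen:      # never queue the same page twice
                seen.add(link)
                to_visit.append(link)
    return order


site = {
    "/": ["/a.html", "/b.html"],
    "/a.html": ["/a1.html"],
    "/b.html": ["/b1.html"],
}
print(crawl_order(site, "/"))
print(crawl_order(site, "/", breadth_first=False))
```

The breadth-first call visits both pages linked from the root before either of their children; the depth-first call follows one branch to the bottom before backing up.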
If you avoid non-local links and dig down only into local Web pages, you provide
a Web crawler for a single Web site, as shown in Listing 7. In this case, I switch
from Ruby to Python to take advantage of Python's useful
HTMLParser class.
Listing 7. Simple Python Web site crawler (minispider.py)
#!/usr/bin/env python3
import http.client
import sys
from html.parser import HTMLParser
class miniHTMLParser( HTMLParser ):
    def __init__( self ):
        HTMLParser.__init__( self )
        # Links already seen, so that no link is queued twice
        self.viewedQueue = []
        # Links still to be interrogated (first in, first out)
        self.instQueue = []
    def get_next_link( self ):
        if self.instQueue == []:
            return ''
        else:
            return self.instQueue.pop(0)
    def gethtmlfile( self, site, page ):
        # Request paths must start with '/', so fix up relative links
        if not page.startswith('/'):
            page = '/' + page
        try:
            httpconn = http.client.HTTPConnection(site)
            httpconn.request("GET", page)
            resp = httpconn.getresponse()
            resppage = resp.read().decode('utf-8', 'replace')
        except (OSError, http.client.HTTPException):
            # Treat an unreachable page as an empty one
            resppage = ""
        return resppage
    def handle_starttag( self, tag, attrs ):
        if tag == 'a':
            # Use the href attribute, wherever it appears in the tag
            newstr = dict(attrs).get('href')
            if newstr is None:
                return
            # Keep only local (no 'http'), non-e-mail links to Web pages
            if ('http' not in newstr and 'mailto' not in newstr
                    and 'htm' in newstr and newstr not in self.viewedQueue):
                print(" adding", newstr)
                self.instQueue.append( newstr )
                self.viewedQueue.append( newstr )
            else:
                print(" ignoring", newstr)
def main():
    if len(sys.argv) < 3:
        print("usage is ./minispider.py site link")
        sys.exit(2)
    mySpider = miniHTMLParser()
    link = sys.argv[2]
    while link != '':
        print("\nChecking link ", link)
        # Get the file from the site and link
        retfile = mySpider.gethtmlfile( sys.argv[1], link )
        # Feed the file into the HTML parser
        mySpider.feed(retfile)
        # Search the retfile here
        # Get the next link in level traversal order
        link = mySpider.get_next_link()
    mySpider.close()
    print("\ndone\n")
if __name__ == "__main__":
    main()
The basic design of this crawler is to load the first link to check onto a
queue. This queue serves as the next-to-interrogate queue. As a link is checked,
any new links that are found are loaded onto the same queue. This provides a
breadth-first search. I also maintain an already-viewed queue and avoid digging
into any link that I've seen in the past. That's pretty much it, with much of the
real work being done by the HTML parser.
First, I derive a new class, called miniHTMLParser,
from Python's HTMLParser class. The class does a few
things. Primarily, it's my HTML parser, with a callback method
(handle_starttag) that is invoked whenever a start HTML tag is
encountered. I also use the class to access links encountered in the crawl
(get_next_link) and to retrieve the file represented by
the link (in this case, an HTML file).
Two instance variables are contained within the class,
viewedQueue, which contains the links that have been
investigated thus far, and instQueue, which represents
the links that are yet to be interrogated.
As you can see, the class methods are simple. The
get_next_link method checks whether the
instQueue is empty and, if so, returns ''. Otherwise, the next
item is removed from the front of the queue with the pop method and returned. The
gethtmlfile method uses
HTTPConnection to connect to a site and return the
contents of the defined page. Finally, handle_starttag
is called for every start tag in a Web page that is fed into the HTML parser via
the feed method. For each anchor (a) tag, I check whether
the link is a non-local link (it contains http) or an e-mail address
(it contains mailto), and whether it contains 'htm', indicating (with high
probability) that it's a Web page. I also check to make sure that I haven't
visited it before, and, if not, the link is loaded into my interrogate and viewed
queues.
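The substring tests are simple but fragile: an absolute link back to the same site contains http and is ignored, and a relative link found deeper in the site (such as broadcast-flag.html on a page under /campaigns/) must be resolved against the page it came from. A sturdier approach, swapped in here in place of the article's substring tests, is to resolve each href with urllib.parse.urljoin and compare host names. The function name local_page_links is hypothetical; the sample links are taken from Listing 8.

```python
from urllib.parse import urljoin, urlparse


def local_page_links(base_url, hrefs):
    """Resolve hrefs against base_url and keep only links on the same site."""
    site = urlparse(base_url).netloc
    pages = []
    for href in hrefs:
        url = urljoin(base_url, href)          # resolve relative links
        parts = urlparse(url)
        if parts.scheme not in ("http", "https"):
            continue                           # skip mailto:, javascript:, ...
        if parts.netloc != site:
            continue                           # stay on the same site
        url = url.split("#", 1)[0]             # drop in-page fragments
        if url not in pages:
            pages.append(url)
    return pages


print(local_page_links("http://www.fsf.org/campaigns/priority.html", [
    "broadcast-flag.html",
    "http://www.fsf.org/news",
    "http://www.gnu.org",
    "mailto:webmaster@fsf.org",
    "javascript:this.print();",
]))
```

With this filter, http://www.fsf.org/news (ignored by Listing 7 because it contains http) is kept as a local link, while the gnu.org, mailto:, and javascript: links are still dropped.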
The main function is simple. I create a new
miniHTMLParser instance and start with the user-defined
site (argv[1]) and link
(argv[2]). I grab the contents of the link, feed it
into the HTML parser, and grab the next link to visit, if one exists. The loop
then continues while there are links remaining to visit.
To invoke the Web spider, you provide a Web site address and a link:
./minispider.py www.fsf.org /
In this case, I'm requesting the root file from the Free Software Foundation.
This command results in Listing 8. You can see the new links that are added to the
interrogation queue and those that are ignored, such as non-local links. At the
bottom of the listing, you can see the interrogation of the links found in the
root.
Listing 8. Output from the minispider script
[mtj@camus]$ ./minispider.py www.fsf.org /
Checking link /
ignoring hiddenStructure
ignoring http://www.fsf.org
ignoring http://www.fsf.org
ignoring http://www.fsf.org/news
ignoring http://www.fsf.org/events
ignoring http://www.fsf.org/campaigns
ignoring http://www.fsf.org/resources
ignoring http://www.fsf.org/donate
ignoring http://www.fsf.org/associate
ignoring http://www.fsf.org/licensing
ignoring http://www.fsf.org/blogs
ignoring http://www.fsf.org/about
ignoring https://www.fsf.org/login_form
ignoring http://www.fsf.org/join_form
ignoring http://www.fsf.org/news/fs-award-2005.html
ignoring http://www.fsf.org/news/fsfsysadmin.html
ignoring http://www.fsf.org/news/digital-communities.html
ignoring http://www.fsf.org/news/patents-defeated.html
ignoring /news/RSS
ignoring http://www.fsf.org/news
ignoring http://www.fsf.org/blogs/rms/entry-20050802.html
ignoring http://www.fsf.org/blogs/rms/entry-20050712.html
ignoring http://www.fsf.org/blogs/rms/entry-20050601.html
ignoring http://www.fsf.org/blogs/rms/entry-20050526.html
ignoring http://www.fsf.org/blogs/rms/entry-20050513.html
ignoring http://www.fsf.org/index_html/SimpleBlogFullSearch
ignoring documentContent
ignoring http://www.fsf.org/index_html/sendto_form
ignoring javascript:this.print();
adding licensing/essays/free-sw.html
ignoring /licensing/essays
ignoring http://www.gnu.org/philosophy
ignoring http://www.freesoftwaremagazine.com
ignoring donate
ignoring join_form
adding associate/index_html
ignoring http://order.fsf.org
adding donate/patron/index_html
adding campaigns/priority.html
ignoring http://r300.sf.net/
ignoring http://developer.classpath.org/mediation/OpenOffice2GCJ4
ignoring http://gcc.gnu.org/java/index.html
ignoring http://www.gnu.org/software/classpath/
ignoring http://gplflash.sourceforge.net/
ignoring campaigns
adding campaigns/broadcast-flag.html
ignoring http://www.gnu.org
ignoring /fsf/licensing
ignoring http://directory.fsf.org
ignoring http://savannah.gnu.org
ignoring mailto:webmaster@fsf.org
ignoring http://www.fsf.org/Members/root
ignoring http://www.plonesolutions.com
ignoring http://www.enfoldtechnology.com
ignoring http://blacktar.com
ignoring http://plone.org
ignoring http://www.section508.gov
ignoring http://www.w3.org/WAI/WCAG1AA-Conformance
ignoring http://validator.w3.org/check/referer
ignoring http://jigsaw.w3.org/css-validator/check/referer
ignoring http://plone.org/browsersupport
Checking link licensing/essays/free-sw.html
ignoring mailto:webmaster
Checking link associate/index_html
ignoring mailto:webmaster
Checking link donate/patron/index_html
ignoring mailto:webmaster
Checking link campaigns/priority.html
ignoring mailto:webmaster
Checking link campaigns/broadcast-flag.html
ignoring mailto:webmaster
done
[mtj@camus]$
This example demonstrates the crawling phase of a Web spider. After a file is
read by the client, the page could also be scanned for content, as in the case of
an indexer.
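The indexing step can be sketched as an inverted index that maps each word to the set of pages containing it; answering a query then means intersecting those sets. This is a minimal sketch, assuming the spider has already reduced each page to plain text. The page text is made up, the tokenizer is deliberately crude, and the function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase words made of letters and digits (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of page links that contain it."""
    index = defaultdict(set)
    for link, text in pages.items():
        for word in tokenize(text):
            index[word].add(link)
    return index


def search(index, query):
    """Return the links that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Made-up page text keyed by the links the spider visited
pages = {
    "/": "Free software is a matter of liberty",
    "campaigns/priority.html": "High priority free software projects",
    "campaigns/broadcast-flag.html": "Stop the broadcast flag",
}
index = build_index(pages)
print(sorted(search(index, "free software")))
```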
Linux spidering tools
You've now seen how to implement a couple of scrapers and a spider. Linux also offers tools
that can provide this functionality for you.
The wget command, which stands for Web get, is a
useful command for recursively working through a Web site and grabbing content of
interest. You can specify a Web site, content that you're interested in, and some
other administrative options. The command then sucks down the files to your local
host. For example, the following command will connect to your defined URL and
recursively walk down no more than three levels and grab any file with an
extension of mp3, mpg, mpeg, or avi.
wget -A mp3,mpg,mpeg,avi -r -l 3 http://<some URL>
The curl command also retrieves content from URLs, although it doesn't
recursively follow links the way wget does. Its
advantage is that it's actively developed. Other similar commands that you can use
are snarf, fget, and
fetch.
Legal issues
There have been lawsuits over data mining on the Internet using Web spiders, and
they haven't gone well for the companies doing the spidering. Farechase, Inc. was
recently sued by American Airlines for screen scraping (done in real time). The
lawsuit first claimed that gathering the data violated American Airlines' user agreement (found under Terms and
Conditions). When that wasn't successful, American Airlines claimed a form of
trespass, which was successful. Other lawsuits claim that the bandwidth taken by
the spiders and scrapers detracts from legitimate users. All are valid claims and
make politeness policies all the more important. See the Resources section for more information.
Going further
Crawling and scraping the Web can be fun and, for some, extremely profitable.
But, as previously discussed, there are legal issues. When spidering or scraping,
always obey the robots.txt file available on the server and incorporate it into
your politeness policy. Newer protocols, such as SOAP, make spidering much easier
and less intrusive to normal Web operations. Future endeavors, such as the
semantic Web, will make spidering even simpler, so the solutions and methods of
spidering will continue to grow.
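Obeying robots.txt decides what a spider fetches; a politeness policy also decides how often. A common approach is to enforce a minimum delay between requests to the same host. The sketch below is one way to do that, assuming a single-threaded spider; the class names and the 2-second default are hypothetical (Example 3, for instance, waits two minutes between polls). The clock and sleep functions are injectable so the demo runs instantly with a fake clock.

```python
import time


class PolitenessPolicy:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_delay=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self.clock = clock
        self.sleep = sleep
        self.last_request = {}   # host -> time of the most recent request

    def wait_for(self, host):
        """Block until it is polite to contact host again; return seconds waited."""
        now = self.clock()
        last = self.last_request.get(host)
        waited = 0.0
        if last is not None and now - last < self.min_delay:
            waited = self.min_delay - (now - last)
            self.sleep(waited)
        self.last_request[host] = now + waited
        return waited


class FakeClock:
    """Hypothetical stand-in clock for the demo: sleeping just advances time."""

    def __init__(self):
        self.now = 0.0

    def time(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


fake = FakeClock()
policy = PolitenessPolicy(min_delay=2.0, clock=fake.time, sleep=fake.sleep)
print(policy.wait_for("www.fsf.org"))   # first request to this host: no wait
print(policy.wait_for("www.fsf.org"))   # same host again: waits min_delay
print(policy.wait_for("www.gnu.org"))   # different host: no wait
```

In a real spider, you would call wait_for() just before each HTTP request, and could raise min_delay to any Crawl-delay value the site's robots.txt specifies.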
Resources
Get products and technologies
- Searchtools.com's Source Code for Web Robot Spiders provides source code for free
open source robots in several languages for a number of tasks.
- Order the SEK for Linux, a two-DVD set containing the latest IBM trial
software for Linux from DB2®, Lotus®, Rational®, Tivoli®,
and WebSphere®.
- With IBM trial software, available for download directly from developerWorks,
build your next development project on Linux.
About the author
M. Tim Jones is an embedded software engineer and the author of GNU/Linux Application Programming, AI Application Programming (now in its second edition), and BSD Sockets Programming from a Multilanguage Perspective.
His engineering background ranges from the development of kernels for geosynchronous spacecraft to embedded systems architecture and networking protocols development. Tim is a Consultant Engineer for Emulex Corp. in Longmont, Colorado.