



Volume Number: 15 (1999)
Issue Number: 12
Column Tag: Web Site Design
by Avi Rappoport
When you want to add search to your site, you may have some technical difficulties. Perhaps your site is hosted on a large server somewhere, or you have an uncooperative web administrator, or the challenges of adding a CGI are too daunting. Never fear! You can outsource your search to a remote site search service and let someone else worry about the gory details.
The indexer and search engine run on the remote server: they use a web indexing robot, or spider, to follow links on your site and read the pages, then store every word in an index file on that server. When it comes time to search, the form on your local Web page sends a message to the remote search engine. Although the request goes through the Web, the process doesn't change; it just has to travel a little farther. The remote search engine takes the search terms, matches them against the words in the index, sorts the matching pages according to relevance, and creates an HTML page with the results. When a searcher clicks on a result link, they see the page from your site, just as though the search had run there. It's easy and painless for practically everyone.
This review covers the range of remote search services, their features and their drawbacks. It will teach you to prepare your site, try indexing it, test the search, customize the results, keep the search up to date, and choose the right program for your long-term needs.
The Tradeoffs
The following services are covered in this review, and also have pages and examples on this site.
Atomz <http://www.atomz.com/>
FreeFind <http://www.freefind.com/>
intraSearch (WhatUSeek) <http://www.whatUseek.com/intraSearch/>
MondoSearch (remote version) <http://www.mondosearch.com/>
PicoSearch <http://www.picosearch.com/>
PinPoint <http://pinpoint.netcreations.com/>
SearchButton <http://www.searchbutton.com/>
SiteMiner <http://www.siteminer.com/>
Webinator (remote version) <http://www.thunderstone.com/texis/indexsite>
Before you install any search engine with an indexing spider, you must make sure it can find the pages on your site. The good news is that cleaning up your links will also make your site more accessible to the large public search engines (such as AltaVista, Google, HotBot and Infoseek), and make it easier for you to run an automated site mapper.
Robot Spider Compatibility
The indexing spiders follow links from a starting page, so use a home page if you have good text links, or a site map page.
Whole sites: Robots.txt
The first thing to do is check the "robots.txt" file. This is a standard file that sits at the root of a web server and tells robots which ones are not welcome on the site, or in certain specific directories (compliance is voluntary on the robot's part). If you run your own server, you control this file: otherwise your host server administrator controls it.
You want to make sure that this file exists, and that it allows at least your indexing spider to access your directories. You may need to negotiate with your web hosting provider on this point, as this file must be stored in the root folder of the web host.
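As a rough illustration of the check described above, the following Python sketch uses the standard library's `urllib.robotparser` to ask whether a given spider may read a page. The robots.txt contents, the spider name `ExampleSiteSearchBot` and the example.com URLs are all made up for the example; substitute the user-agent name your search service documents.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the site search spider may read everything,
# while all other robots are kept out of /cgi-bin/ and /private/.
ROBOTS_TXT = """\
User-agent: ExampleSiteSearchBot
Disallow:

User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
"""


def spider_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/private/notes.html"
    for agent in ("ExampleSiteSearchBot", "SomeOtherBot"):
        print(agent, spider_may_fetch(ROBOTS_TXT, agent, page))
```

An empty `Disallow:` line means "nothing is off limits" for the robot named above it, which is why the first record opens the whole site to the hypothetical search spider.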
For more information on this topic, see Search Indexing Robots and Robots.txt: <http://www.searchtools.com/info/robots/robots-txt.html> and the WebMasters Guide to the Robots Exclusion Protocol at <http://info.webcrawler.com/mak/projects/robots/exclusion-admin.html>
Individual Pages: META ROBOTS tag
The other way that page designers can control robots and spiders is by using the META ROBOTS tags. These are particularly useful if you have a hosted site and don't want to bother your server administrator.
For example, if you have a directory listing or site map page, you can tell the spiders to follow the links but not index the text on the page by placing the following information into the HTML header: <meta name="robots" content="noindex,follow">. If you have pages with useful data but inappropriate links, such as a web calendar page with duplicate links to other calendar pages, use <meta name="robots" content="index,nofollow">.
For more information, see Search Indexing Robots and the Robots Meta Tags <http://www.searchtools.com/info/robots/robots-meta.html> and the Webmaster guide above.
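To see how a spider might read these tags, here is a minimal Python sketch built on the standard `html.parser` module. It is an illustration only, not the code of any service reviewed here; the class and function names and the sample page are invented for the example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def may_index(html: str) -> bool:
    """A page may be indexed unless it says noindex."""
    return "noindex" not in robots_directives(html)


def may_follow(html: str) -> bool:
    """Links may be followed unless the page says nofollow."""
    return "nofollow" not in robots_directives(html)


# Hypothetical site map page: follow the links, but do not index the text.
SITE_MAP = (
    '<html><head><meta name="robots" content="noindex,follow"></head>'
    '<body><a href="a.html">A</a></body></html>'
)
```

For the sample site map page, `may_index(SITE_MAP)` is false and `may_follow(SITE_MAP)` is true, which matches the first example in the text.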
Good Links and Bad Links
Indexing spiders tend to be pretty dumb. They know about the simple HREF links, but just get lost on anything more complex. Spiders and robots may not follow links in:
Check Your Links
To give yourself a spider-eye view, try a text browser such as Lynx, or a graphical browser with images and JavaScript turned off, and no Plug-Ins: this will give you a good view of what the spiders see.
Don't rely on your content-management system to check local links: it knows too much about the structure of your site and the special formats you use!
To make sure all your local links work, run a link-checking robot such as Big Brother for Mac & Unix <http://pauillac.inria.fr/%7Efpottier/bb.html.en>, or use a service such as NetMechanic <http://www.netmechanic.com/>. If these services can follow the links, there's a good chance that your search indexing robot can do the same.
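For a quick spider-eye check of a single page, a sketch like the one below lists only the plain HREF links, which is roughly what a simple spider would find. It assumes a spider that reads nothing but `<a href="...">` tags and ignores `javascript:` pseudo-links; real spiders vary, and the sample page and example.com URLs are invented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PlainLinkParser(HTMLParser):
    """Collects the links a simple HREF-only spider could follow."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors with no target and script-driven pseudo-links.
        if not href or href.lower().startswith("javascript:"):
            return
        self.links.append(urljoin(self.base_url, href))


def plain_links(html: str, base_url: str) -> list:
    """Return absolute URLs for every plain HREF link in the page."""
    parser = PlainLinkParser(base_url)
    parser.feed(html)
    return parser.links


# Hypothetical page mixing plain links with script-driven navigation.
SAMPLE_PAGE = """
<a href="products.html">Products</a>
<a href="javascript:openMenu()">Menu</a>
<a href="/about/">About</a>
<select onchange="location=this.value">
  <option value="hidden.html">Hidden page</option>
</select>
"""

if __name__ == "__main__":
    for link in plain_links(SAMPLE_PAGE, "http://www.example.com/index.html"):
        print(link)
```

Only the Products and About pages turn up; the page reachable through the pop-up menu is invisible to this kind of spider unless it is linked somewhere else.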
Solution: Supplement Complex Links
If you find you have problems, there are two ways around bad links: both require work, but they will make the indexing spiders happy.
Five for the Price of One
The good news is that all this work will pay off in five ways:
Many of the search services require minimal commitment on your part. All you have to do is go to the service's Web site, register with a user ID or email address and password, then give them the home page URL. The search service will send its indexing spider to follow links on your site very quickly, so try to do this during a quiet time.
Once you have signed up, you'll see all the setup and configuration options in the browser interface. Some are more elaborate than others: Atomz has a bunch of tabs and subpanes within the tabs, while FreeFind has a nice Wizard interface. Webinator has a fairly elaborate mail-back access control: you must have an email address on the server to index that server.
If your server is slow, you are charged by the byte, or you have long files, choose a service that will do smart updating, and only get the contents of pages if they have changed.
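The article does not say how each service implements smart updating. One standard HTTP mechanism for it is the conditional GET: the spider sends the date of its last visit in an `If-Modified-Since` header, and the server answers `304 Not Modified` with no page body if nothing has changed. The sketch below only builds such a request and interprets the status code; the function names and the example.com URL are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def conditional_request(url: str, last_indexed: datetime) -> Request:
    """Build a GET request that asks for the page only if it has changed.

    last_indexed must be a timezone-aware UTC datetime.
    """
    http_date = format_datetime(last_indexed, usegmt=True)
    return Request(url, headers={"If-Modified-Since": http_date})


def unchanged_since_last_index(status_code: int) -> bool:
    """304 Not Modified means the indexed copy is still current."""
    return status_code == 304


if __name__ == "__main__":
    last_visit = datetime(1999, 12, 1, tzinfo=timezone.utc)
    request = conditional_request("http://www.example.com/index.html", last_visit)
    print(request.get_header("If-modified-since"))
```

Passing the request to `urllib.request.urlopen` would perform the actual fetch. Note that `urlopen` reports a 304 response by raising `urllib.error.HTTPError`, so a real updater would catch that exception and read its `code` attribute.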
If you have access to your web site log or monitor window, you can watch the spider as it follows links throughout your site. Otherwise, or in addition, choose a service that provides reports on the indexing process.
Remote search services provide almost-instant gratification: you can test them as soon as they're done with the indexing. Most of them have a test search form on their site: if not, copy their form to your local page and try it out.
Searching
There are two basic kinds of search queries: those that match only the pages on your site containing every search term, and those that match pages containing any search term, though they may not show you every matching page. A few services will let searchers choose the best approach.

Figure 1. PicoSearch result showing all pages which have any of the search terms.

Figure 2. SiteMiner result for the same search, showing all three pages with all search terms.
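The difference between the two kinds of query can be shown with a few lines of Python. This is a toy model with invented page texts, not the matching logic of PicoSearch, SiteMiner or any other service.

```python
import re


def words(text: str) -> set:
    """Split text into a set of lowercase words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_all(pages: dict, query: str) -> list:
    """Pages that contain every search term."""
    terms = words(query)
    return [url for url, text in pages.items() if terms <= words(text)]


def match_any(pages: dict, query: str) -> list:
    """Pages that contain at least one search term."""
    terms = words(query)
    return [url for url, text in pages.items() if terms & words(text)]


# Hypothetical three-page site.
PAGES = {
    "/index.html": "Remote site search services reviewed",
    "/spiders.html": "How an indexing spider follows links on a site",
    "/forms.html": "Adding a search form to your page",
}

if __name__ == "__main__":
    print(match_all(PAGES, "site search"))
    print(match_any(PAGES, "site search"))
```

For the query "site search", the all-terms match returns only the home page, while the any-term match returns all three pages, so the same words can produce very different result lists.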
If your site contains text from other languages, you need to watch out for letter matching issues. Some search engines can only match the 26 English characters, while others can match diacritical characters (such as î and á) and special characters (ø and ß). PicoSearch and MondoSearch also offer multilingual interfaces. Non-Roman scripts such as Arabic, Russian and Japanese are even harder, although PicoSearch offers results in Chinese.
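One way an engine could let a plain-ASCII query find accented words is to fold diacritics before matching. The sketch below does this with Python's standard `unicodedata` module; it is an assumption about technique, not a description of how any reviewed service works.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining accents so that 'île' can match 'ile'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for word in ("île", "más", "søk", "straße"):
        print(word, "->", fold_diacritics(word))
```

Accented letters such as î and á reduce to their base letters, but ø and ß come through unchanged because Unicode does not treat them as a base letter plus an accent. That is one reason special characters need separate handling.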
Relevance Ranking
When you do a search, and the engine locates a set of pages that match your search, it has to sort them as best it can. This is particularly difficult with one and two word searches: it's hard to tell which is the most relevant page (the best match).
Like hairstyles and music, success in relevance ranking is a matter of taste. You should do a number of searches to see what you think of any search engine you choose. Try searches with just one word, others with two, and still others with four or five. This should give you a feeling for the kinds of relevance ranking that a search engine will do.
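To give a feel for what relevance ranking means, here is a deliberately naive sketch that scores each page by how often the search terms occur in it. The services reviewed here do not publish their ranking methods in this article, and real engines weigh many more factors; the page texts are invented.

```python
import re


def tokens(text: str) -> list:
    """Split text into a list of lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(text: str, query: str) -> int:
    """Count how many words in the page are search terms."""
    terms = set(tokens(query))
    return sum(1 for word in tokens(text) if word in terms)


def rank(pages: dict, query: str) -> list:
    """Return matching URLs, best score first, ties broken by URL."""
    scored = [(score(text, query), url) for url, text in pages.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [url for page_score, url in ordered if page_score > 0]


# Hypothetical page texts.
SAMPLE_PAGES = {
    "/a.html": "search search engine",
    "/b.html": "search form",
    "/c.html": "contact us",
}

if __name__ == "__main__":
    print(rank(SAMPLE_PAGES, "search engine"))
```

With a one-word query, many pages end up with nearly the same score, which is why short searches are the hardest to rank well.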
Search forms are the user interfaces to the search engine, so you can have several different forms, for your various needs.

Figure 3. MondoSearch Simple Search Form.
Each of the site search services provides an HTML or JavaScript search form for you to copy and paste to a page on your site. All you have to do is put the form into a page (you don't even have to post the page on your site at first). When you, or a searcher, type text into the field and click the Search button, the browser reads the ACTION attribute of the FORM tag, connects to the search server, and sends the form items, including the hidden site ID, so the server can tell which site you mean to search.
When the remote search server gets the form command, it looks in the index, matches the search words, and organizes the results. The URL of the results page is that of the search service, not of your server, because that's where the results page is coming from, but the URLs for the found pages themselves include your server name.
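The request that the browser sends for a simple GET form can be sketched as follows. The server address and the field names `site` and `q` are hypothetical; each service defines its own ACTION URL and form fields, so copy those from the form the service gives you.

```python
from urllib.parse import urlencode

# Hypothetical ACTION URL of a remote search service.
SEARCH_ACTION = "http://search.example.com/query"


def search_url(site_id: str, terms: str) -> str:
    """Build the URL a browser requests when the form is submitted.

    site_id stands in for the hidden form field that tells the service
    which site's index to search; terms is what the searcher typed.
    """
    return SEARCH_ACTION + "?" + urlencode({"site": site_id, "q": terms})


if __name__ == "__main__":
    print(search_url("12345", "remote search"))
```

The resulting URL points at the search service, not at your own server, which is why the results page appears under the service's address.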
Note: SiteMiner only has a JavaScript search box: site visitors without JavaScript must follow a link to their site for searching. This limits your audience and makes searching hard for people with old browsers, PDAs and other new client hardware, and for those with impaired vision who use speaking browsers.
Everyone is familiar with webwide public search engines and their lists of results. A local search results page is very similar, although for the best user experience, the search results page should look and feel like the rest of your site. If you are using a remote search service as a permanent part of your site, be sure that you choose a service that lets you customize the page design enough for your comfort level.
Simple Customization features

Figure 4. FreeFind Results Page Options Wizard.
Page Design
Some services let you lay out your results page, including the page sections above, to the left, to the right, and below the results list. This allows you to include your normal navigation and site structure links, showing searchers more about the scope of your site. The service usually provides fields where you paste in your HTML code, and you will probably have to try this a couple of times to get the right relationship with the results list, so this option is only accessible to those who have some HTML tag experience.

Figure 5. Webinator field to insert page header in HTML.
Advertising
Several of the free site search services will display banner advertising on the search results, although none of the paid versions will do so. For many sites, it's a fair trade for searching services, but for others, such as libraries and public schools, advertising is inappropriate, so they should choose a version without advertising.

Figure 6. PinPoint default result page, showing banner advertising.
As with the results page, the list of pages which match the search is familiar. Some search engines let you customize the elements of the items on this list, which lets you match the layout to the data you have. For example, some sites have useful URLs which give some context to the page, while others are just confusing.
Other features may include

Figure 7. IntraSearch result showing items with URL, size and update date.
If you have carefully written META DESCRIPTION tag contents for each of your pages, so they'll rank well and look great in webwide search engines, you will probably want your site search to display them as well. Be sure to choose a remote search service that will show these.
Otherwise, some services do a good job of extracting useful text, while others just grab the text from the top of the page.

Figure 8. SearchButton result showing selected text extracted from pages.
Some services extract lines containing the search terms and/or highlight the words which match the search terms.

Figure 9. Atomz result showing items with top text and matching text.
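Extracting the matching lines and highlighting the search terms could look roughly like this. It is a sketch only, with an invented function name and sample text, and it uses bold tags for the highlight where a real service might use its own markup.

```python
import re


def matching_lines(text: str, terms: list) -> list:
    """Return lines that contain a search term, with the terms in bold."""
    if not terms:
        return []
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    results = []
    for line in text.splitlines():
        if pattern.search(line):
            highlighted = pattern.sub(
                lambda match: "<b>" + match.group(0) + "</b>", line.strip()
            )
            results.append(highlighted)
    return results


# Hypothetical page text.
SAMPLE_TEXT = "Welcome to our site.\nThe spider follows links.\nContact us."

if __name__ == "__main__":
    for line in matching_lines(SAMPLE_TEXT, ["spider", "links"]):
        print(line)
```

Showing the matching line rather than the top of the page lets searchers see why a page was returned before they click on it.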
Although the remote search service takes care of the server side of things, you still have to keep track of its status, even if it's just to make sure it's still running (these services have been fairly reliable so far). You should also perform test searches: some that you repeat every time, and others that check for new information on your site. And, as you change the layout and design of your site, make sure that the search form and results page reflect these changes.
Updating the Index
To keep your search index synchronized with the content on your site, you'll need to set up some kind of update schedule. If your site changes rarely, you can tell the service when to re-index. However, if your site changes more often, you will want to set up a scheduled update.
Watching the Searches
Analyzing your search log or report can teach you what your visitors are looking for - it's like having a free, automated market research survey. For example, if you have a movie site and everyone starts searching for the Blair Witch Project, you know it's hot, and can make sure you have good information so they don't go somewhere else.

Figure 10. SearchButton Report Options.
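If your service lets you download the raw search log, a few lines of Python will show the most popular queries. The log format assumed here, one query per line, is hypothetical; the reports the services actually provide differ.

```python
from collections import Counter


def top_queries(log_lines: list, count: int = 3) -> list:
    """Return the most frequent queries, ignoring case and blank lines."""
    queries = (line.strip().lower() for line in log_lines)
    return Counter(query for query in queries if query).most_common(count)


# Hypothetical log excerpt from a movie site.
SAMPLE_LOG = [
    "Blair Witch Project",
    "blair witch project",
    "showtimes",
    "",
    "Blair Witch Project ",
    "showtimes",
]

if __name__ == "__main__":
    for query, hits in top_queries(SAMPLE_LOG):
        print(hits, query)
```

Folding case and trimming spaces keeps variations of the same search from being counted as different queries.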
Read through the listings above, and try out the search engines in the SearchTools search page <http://www.searchtools.com/search/>. Think about which of the features we describe is vital, and which you can live without (it's like buying a car). Then try out two or three that have the most important elements, and see how well they fit with your site.
Atomz
FreeFind
intraSearch (WhatUSeek)
MondoSearch

Figure 11. MondoSearch Category Results.
PicoSearch
PinPoint
SearchButton
SiteMiner
Webinator
As you can see, there's no one search engine that has all the advantages. Which one you should choose depends on your site, and your particular needs. You won't know what you like until you take a couple of test drives!
Avi Rappoport is the Principal Consultant for Search Tools Consulting, specializing in Web Site, Intranet and Portal search engines. She reports and analyzes the industry for SearchTools.com (which runs on a PowerMac 6100). You can contact her at consult@searchtools.com.
Disclaimer: Search Tools Consulting has consulting relationships with MondoSearch and SearchButton, but we do not allow our customers to influence our reviews.




