With a half-dozen or so large general Web search engines, it is far from easy to remember which one provides which features and how those features are implemented. Add to that the fact that several search engines have both simple and advanced modes (often significantly different) and the searcher's memory becomes more heavily taxed.
This chart will lay out cues to help a searcher use most of the various features without having to refer to the sometimes weak online documentation provided by the search engines. It will cover the more significant features that are common to at least a couple of the search engines and identify the more outstanding search-related features that are unique to particular engines. To keep the chart to a manageable size, not all nuances will be mentioned. Enough detail should be covered so that a typical search can be structured using the features provided in the chart.
Since Web search engines tend to succumb to the bandwagon effect, most of them now have both a simple, or "home page," version and an advanced version. Because the capabilities and implementation of features for the two versions often differ considerably, each version is given its own column in this chart.
To make the chart easier to use, and to avoid long entries in the cells, a general explanation of the entries and some special notes are provided later. These notes do not attempt to provide behind-the-scenes details; rather, the intent is to provide a practical quick guide.
The chart includes the larger search engines (AltaVista, HotBot, Northern Light, Excite, Infoseek, and Lycos), which the serious searcher is likely to use at least occasionally. WebCrawler is included partly out of respect for its age and its early contribution. It is also included because it is still widely used by casual searchers who may come to the more frequent searcher with questions about it.
Most of the entries in the top half of the chart indicate the operator, syntax, or prefix that the searcher is required to enter (e.g., AND, +term, " ", title:) in order to perform a search. An entry, such as title:term, indicates that the searcher should enter the prefix "title:" followed by the term (word or phrase) to be searched for, e.g., title:andromeda.
In many cases, on a search engine's search page, searchers select among the available options by means of a pull-down window (or radio buttons, etc.) rather than by typing their choice in a text box. This is indicated on the chart by the "(window)" designation or the appropriate variant. In some cases, parentheses are used on the chart where a clarification seemed advisable.
In the Boolean portion of the chart, the term "(default)" is used to designate which operator takes effect when the searcher does not designate otherwise.
Where a place on the chart merely needs to indicate whether a particular option is available (as in the case of parentheses), a "yes" is used if the feature is available. A blank implies a "no." This convention has been found to make it easier to glance at the chart and see the availability while at the same time presenting a less cluttered chart.
Search Engine Size

The "size" stated by search engine producers conventionally refers to the number of unique Web pages (unique URLs), rather than "sites" (which may contain numerous "pages"). The numbers shown here either have been published or were obtained directly from the producer. The kinds of pages that are counted or not counted vary, and the numbers alone do not necessarily reflect the whole "size" picture. In Lycos, for example, the 35 million does not include the personal home pages or its pictures and sounds databases. The Northern Light figure does not count the "Special Collection" documents. (For an excellent discussion of sizes of Web search engines, see: S. R. Lawrence and C. L. Giles, "Searching the World Wide Web." Science 280 (April 1998): pp. 98-100.)
In general, there are two levels of Boolean capability among these search engines. The "simplified" form uses a plus sign in front of a term to indicate that records should only be retrieved if that term is present. At this simplified level, a "NOT" is achieved by means of a minus sign in front of a term. Usually the use of this simplified form of Boolean does not override the relevance ranking algorithm as does the use of AND, OR, and NOT (or AND NOT).
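The plus/minus convention can be illustrated with a short Python sketch. This is not any engine's actual implementation: the function name `matches_simple` and the sample pages are hypothetical, and the relevance ranking that a real engine would apply to the unprefixed terms is left out.

```python
def matches_simple(query, text):
    """Return True if text satisfies a +term/-term style query.

    Illustrative assumptions: terms prefixed with '+' must be present,
    terms prefixed with '-' must be absent, and unprefixed terms are
    optional (a real engine would use them only for ranking).
    """
    words = set(text.lower().split())
    for token in query.lower().split():
        if token.startswith("+") and token[1:] not in words:
            return False
        if token.startswith("-") and token[1:] in words:
            return False
    return True


# Hypothetical sample "pages" for demonstration.
pages = [
    "andromeda galaxy images",
    "andromeda strain novel",
    "spiral galaxy catalog",
]
hits = [p for p in pages if matches_simple("+andromeda -novel galaxy", p)]
```

Here only the first page survives: the second contains the excluded term and the third lacks the required one.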
To enable searchers to use the full Boolean capabilities familiar in traditional online services, the engine must provide the equivalents of AND, OR, and NOT, plus the capability of nesting (the use of parentheses). Either out of perversity or for reasons not clear, some engines don't use the plain NOT to exclude a term, but insist on using AND NOT. I am sure some programmer or theoretician somewhere will be glad to explain this, but it seems that in these cases, a programmer prevailed over the "user-friendly" advocate.
Some engines require that capital letters be used for Boolean operators; some do not. In all engines that use these Boolean connectors, the capitalized form will work. For simplicity, therefore, the capitalized form is shown on the chart. It seems easier to always stick to caps rather than to try to remember which engines require caps. Also, if you use "copy and paste" to move between engines, use of caps makes for greater cross-engine compatibility.
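For searchers who keep queries in a file and paste them into several engines, the "always use caps" habit could be automated along the following lines. This is only a sketch under stated assumptions: the helper name `capitalize_operators` is hypothetical, only AND, OR, and NOT are handled, and any bare word "and," "or," or "not" outside quotation marks is treated as an operator, which will be wrong if it was meant as a search term.

```python
import re

# Boolean connectors discussed in the text; "AND NOT" is covered because
# AND and NOT are each capitalized separately.
_OPERATORS = re.compile(r"\b(and|or|not)\b", re.IGNORECASE)


def capitalize_operators(query):
    """Upper-case Boolean connectors that appear outside quoted phrases."""
    # Splitting on a captured group keeps the quoted phrases in the result.
    parts = re.split(r'("[^"]*")', query)
    return "".join(
        part if part.startswith('"')
        else _OPERATORS.sub(lambda m: m.group(1).upper(), part)
        for part in parts
    )


converted = capitalize_operators('cats and "dogs or birds" not fish')
```

The quoted phrase is left untouched so that a phrase search for "dogs or birds" is not altered.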
In almost all cases, a phrase can be indicated by putting the phrase in quotes ("") in the query box. In some cases, a phrase can be designated by choosing the phrase option from a pull-down window.
Phrase searching is, of course, one form of proximity searching. The next most common proximity option is NEAR, which specifies "within 10 words" in AltaVista and "within 25 words" in Lycos PRO. The latter also allows NEAR/n, where "n" is a user-specified maximum distance, e.g., NEAR/5. (This is comparable to the (N) and (N/n) connectors on Dialog.) Lycos PRO also provides BEFORE and BEFORE/n, as well as other variations. For more detail on these, see the online documentation.
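What a NEAR/n test amounts to can be sketched in a few lines of Python. The function name `near` is hypothetical, and exactly how each engine counts the intervening words is an assumption here; the sketch simply compares word positions.

```python
def near(text, term_a, term_b, max_distance=10):
    """Return True if term_a and term_b occur within max_distance words.

    Assumption: distance is the difference in word positions, in either
    order. The default of 10 mirrors the AltaVista figure quoted above.
    """
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    return any(
        abs(i - j) <= max_distance for i in positions_a for j in positions_b
    )


# Hypothetical sample sentence: "andromeda" and "spiral" are 6 positions apart.
sample = "the andromeda galaxy is the nearest large spiral to our own milky way"
within_default = near(sample, "andromeda", "spiral")
within_five = near(sample, "andromeda", "spiral", max_distance=5)
```

With the default distance the two terms qualify; tightening the limit to 5 (the equivalent of NEAR/5) excludes the match.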
If a truncation or stemming feature is available, the appropriate symbol is shown.
Where these fields are searchable, the appropriate prefix is shown, or an indication is given that the searcher uses a pull-down window or text boxes. For prefixes, the searcher should enter the prefix shown followed by the term to be searched--for example, title:lupus.
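The prefix convention can be pictured as splitting the query at the first colon and looking only in the named field. The Python below is an illustrative sketch, not a description of how any engine parses queries; the function names, the dictionary layout of a "page," and the use of simple substring matching are all assumptions.

```python
def parse_field_query(query):
    """Split 'prefix:term' into (field, term); field is None without a prefix."""
    field, sep, term = query.partition(":")
    if sep and field and term:
        return field.lower(), term
    return None, query


def matches_field(query, page):
    """Check a query against one page, honoring an optional field prefix."""
    field, term = parse_field_query(query)
    if field is None:
        haystack = " ".join(page.values())  # no prefix: search every field
    else:
        haystack = page.get(field, "")      # prefix: search that field only
    return term.lower() in haystack.lower()


# Hypothetical page record with two searchable fields.
page = {"title": "Living with Lupus", "text": "Support resources and news"}
```

For this record, `title:lupus` matches, `title:support` does not, and the unprefixed query `support` matches because every field is searched.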
This refers to the capability of identifying which pages in the search engine's database contain a link to a particular URL. This is somewhat analogous to "citation searching," and enables the searcher to identify sites that have some interest in the site referred to.
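Conceptually, link searching requires the engine to invert its record of which pages link where. The following Python sketch shows the idea under stated assumptions: the function name `build_backlink_index` and the example URLs are hypothetical, and nothing here reflects how a particular engine stores its data.

```python
from collections import defaultdict


def build_backlink_index(pages):
    """Map each target URL to the sorted list of pages that link to it.

    `pages` maps a page's URL to the list of URLs it links to.
    """
    index = defaultdict(set)
    for source_url, outgoing_links in pages.items():
        for target in outgoing_links:
            index[target].add(source_url)
    return {target: sorted(sources) for target, sources in index.items()}


# Hypothetical crawl data: page URL -> URLs that page links to.
crawl = {
    "http://a.example/": ["http://c.example/"],
    "http://b.example/": ["http://c.example/", "http://a.example/"],
}
backlinks = build_backlink_index(crawl)
```

Looking up a URL in `backlinks` then returns the "citing" pages, much as a citation index returns the articles that cite a given paper.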
Entries here indicate whether one can search by the language in which the Web page is written.
This refers to the capability for searching by type of media--images, audio files, and video files. The implementation is quite different among the search engines that provide this. In AltaVista, you can search for a word in an image file's name (or use the special "Photo Finder" database). In HotBot, you perform a subject search, but specify that you want only records that also contain an image, sound, or video file. With Infoseek, the "alt:" prefix allows searching an image's "alternate text" tag. Lycos' home page version provides a separate "Pictures and Sounds" database to search, but in Lycos Pro Search, radio buttons are used to specify image or sound files.
The entries here refer to whether the search engine claims to be able to identify proper names--persons or otherwise. This actually boils down to limiting retrieval to instances in a page where each word appears in its capitalized form, automatically allowing for the inverted form of a word pair, or both.
Some search engines can identify upper and lower case letters. This is important in instances when "AIDS" needs to be distinguished from "aids." In general, when a query is entered using all lower case, the search engine will retrieve both lower and upper case. When upper case is entered by the searcher, the engines will return only those records with an exact case match. For example, "next" will retrieve "next" and "neXt," whereas "neXt" will only retrieve "neXt."
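The general rule just described (lower case matches any case, anything else must match exactly) can be written out as a short Python sketch. The function name `case_match` is hypothetical, and individual engines may depart from this rule.

```python
def case_match(query_term, indexed_word):
    """Apply the general case rule described in the text.

    An all-lower-case query term matches any capitalization of the word;
    a term containing upper-case letters must match exactly.
    """
    if query_term.islower():
        return query_term == indexed_word.lower()
    return query_term == indexed_word


indexed = ["next", "neXt", "NEXT"]
lower_hits = [w for w in indexed if case_match("next", w)]
mixed_hits = [w for w in indexed if case_match("neXt", w)]
```

Entering "next" retrieves all three forms, while entering "neXt" retrieves only the exact match.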
This refers to whether literally all words are indexed and searchable. This is critical not only when one wants to search for "The Who," but also when one needs to search for any phrase containing a very common word.
This is an indication of whether a Web directory is included as a part of the search engine's search page. (In some cases, the directory may be embedded in a "channel" option.)
Being told the number of items retrieved might seem to be something we could take for granted, but not so for Lycos.
Some engines report not only the overall count but also the retrieval count for the individual terms searched (as with traditional online services).
This line indicates which format options are available, whether the user can specify the number of records on each results page, and also if results can be "grouped or ungrouped" by Web site.
For some search engines, you are given the option, when you see a record you like, of having the engine find other records that are similar to that record.
Listed here are additional features provided by the search engine that should be of interest to the serious searcher. The choice of what is included in this line is admittedly somewhat subjective. Features listed here are ones directly related to performing a search on the search engine's Web database. Additional features (or "add-ons"), such as company directories, free email, weather reports, etc., are ignored.
As we all know, Web search engines are changing constantly. The changes are often superficial and cosmetic, but in the last year, we have also seen a number of substantive and welcome additions to functionality. With luck, in a year or so we'll be able to fill in even more of the blocks on the chart and maybe even add a line or two.
Communications to the author should be addressed to Randolph Hock, Online Strategies, 9919 Corsica Street, Vienna, VA 22181; 703/242-6078; ran@onstrat.com.
By Randolph Hock