
Web Search Engines: Features and Commands.

Hock, Randolph
In: Online, vol. 23 (1999), no. 3, pp. 24-28

WEB SEARCH ENGINES  features and commands

With a half-dozen or so large general Web search engines, it is far from easy to remember which one provides which features and how those features are implemented. Add to that the fact that several search engines have both simple and advanced modes (often significantly different) and the searcher's memory becomes more heavily taxed.

This chart will lay out cues to help a searcher use most of the various features without having to refer to the sometimes weak online documentation provided by the search engines. It will cover the more significant features that are common to at least a couple of the search engines and identify the more outstanding search-related features that are unique to particular engines. To keep the chart to a manageable size, not all nuances will be mentioned. Enough detail should be covered so that a typical search can be structured using the features provided in the chart.

Since Web search engines tend to succumb to the bandwagon effect, most of them now have both a simple, or "home page," version and an advanced version. Because the capabilities and implementation of features for the two versions often differ considerably, each version is given its own column in this chart.

To make the chart easier to use, and to avoid long entries in the cells, a general explanation of the entries and some special notes follow. These notes do not attempt to provide behind-the-scenes details; rather, the intent is to provide a practical quick guide.

The chart includes the larger search engines (AltaVista, HotBot, Northern Light, Excite, Infoseek, and Lycos), which the serious searcher is likely to use at least occasionally. WebCrawler is included partly out of respect for its age and its early contribution. It is also included because it is still widely used by casual searchers who may come to the more frequent searcher with questions about it.

CONVENTIONS USED

Most of the entries in the top half of the chart indicate the operator, syntax, or prefix that the searcher is required to enter (e.g., AND, +term, " ", title:) in order to perform a search. An entry, such as title:term, indicates that the searcher should enter the prefix "title:" followed by the term (word or phrase) to be searched for, e.g., title:andromeda.

In many cases, on a search engine's search page, searchers select an option from a pull-down window (or radio buttons, etc.) rather than typing their choice in a text box. This is indicated on the chart by the "(window)" designation or the appropriate variant. In some cases, parentheses are used on the chart where a clarification seemed advisable.

In the Boolean portion of the chart, the term "(default)" is used to designate which operator takes effect when the searcher does not designate otherwise.

Where a place on the chart merely needs to indicate whether a particular option is available (as in the case of parentheses), a "yes" is used if the feature is available. A blank implies a "no." This convention has been found to make it easier to glance at the chart and see the availability while at the same time presenting a less cluttered chart.

FEATURES COVERED IN THE CHART

Search Engine Size

The "size" stated by search engine producers conventionally refers to the number of unique Web pages (unique URLs), rather than "sites" (which may contain numerous "pages"). The numbers shown here have either been published or were obtained directly from the producer. The kinds of pages that are counted or not counted vary, and the numbers alone do not necessarily reflect the whole "size" picture. In Lycos, for example, the 35 million does not include the personal home pages or its pictures and sounds databases. The Northern Light figure does not count the "Special Collection" documents. (For an excellent discussion of sizes of Web search engines, see: S. R. Lawrence and C. L. Giles, "Searching the World Wide Web." Science 280 (April 1998): pp. 98-100.)

Boolean Operators (AND, OR, NOT) and Parentheses

In general, there are two levels of Boolean capability among these search engines. The "simplified" form uses a plus sign in front of a term to indicate that records should only be retrieved if that term is present. At this simplified level, a "NOT" is achieved by means of a minus sign in front of a term. Usually the use of this simplified form of Boolean does not override the relevance ranking algorithm as does the use of AND, OR, and NOT (or AND NOT).
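The simplified plus/minus form described above can be illustrated with a short Python sketch. It is not how any of these engines is implemented. The function names are hypothetical, pages are treated as plain whitespace-separated words, and the rule that terms without a sign are optional (an OR-style default) is an assumption made for the example.

```python
def parse_simple_query(query):
    """Split a query into required (+term), excluded (-term), and optional terms."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, excluded, optional


def matches(page_text, query):
    """Return True if the page satisfies the +term and -term constraints."""
    words = set(page_text.lower().split())
    required, excluded, optional = parse_simple_query(query)
    if any(term not in words for term in required):
        return False
    if any(term in words for term in excluded):
        return False
    # Assumption: with no required terms, at least one optional term must appear.
    return bool(required) or any(term in words for term in optional)


print(matches("the andromeda galaxy", "+andromeda -strain galaxy"))
print(matches("the andromeda strain", "+andromeda -strain"))
```

The first call accepts the page because the required term is present and the excluded term is absent. The second rejects it because the excluded term appears.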

To enable searchers to use the full Boolean capabilities familiar in traditional online services, the engine must provide the equivalents of AND, OR, and NOT, plus the capability of nesting (the use of parentheses). Either out of perversity or for reasons not clear, some engines don't use the plain NOT to exclude a term, but insist on using AND NOT. I am sure some programmer or theoretician somewhere will be glad to explain this, but it seems that in these cases, a programmer prevailed over the "user-friendly" advocate.

Some engines require that capital letters be used for Boolean operators; some do not. In all engines that use these Boolean connectors, the capitalized form will work. For simplicity, therefore, the capitalized form is shown on the chart. It seems easier to always stick to caps rather than to try to remember which engines require caps. Also, if you use "copy and paste" to move between engines, use of caps makes for greater cross-engine compatibility.
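For searchers who keep queries in a file and paste them into several engines, the "always use caps" habit can be automated. The following Python sketch is a hypothetical helper, not part of any engine. It assumes that the words and, or, and not are always meant as operators unless they sit inside a quoted phrase.

```python
import re

# Operators named in the article, matched as whole words in any case.
_OPERATOR = re.compile(r"\b(and|or|not)\b", re.IGNORECASE)
# Quoted phrases are captured so they can be left untouched.
_QUOTED = re.compile(r'("[^"]*")')


def capitalize_operators(query):
    """Upper-case AND, OR, and NOT everywhere except inside quoted phrases."""
    parts = _QUOTED.split(query)
    # Because the pattern has a capturing group, quoted phrases land at odd indexes.
    for i in range(0, len(parts), 2):
        parts[i] = _OPERATOR.sub(lambda m: m.group(1).upper(), parts[i])
    return "".join(parts)


print(capitalize_operators('(lupus or sle) and not "salt and pepper"'))
```

A query in which "or" or "not" is itself a search term would need to be quoted to escape the conversion.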

Phrase Searching

In almost all cases, a phrase can be indicated by putting the phrase in quotes ("") in the query box. In some cases, a phrase can be designated by choosing the phrase option from a pull-down window.

Proximity

Phrase searching is, of course, one form of proximity searching. The next most common proximity option is NEAR, which specifies "within 10 words" in AltaVista and "within 25 words" in Lycos PRO. The latter also allows NEAR/n, where "n" is a user-specified maximum distance, e.g., NEAR/5. (This is comparable to the (N) and (N/n) connectors on Dialog.) Lycos PRO also provides BEFORE and BEFORE/n, as well as other variations. For more detail on these, see the online documentation.
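The idea behind NEAR and NEAR/n can be sketched in a few lines of Python. This is an illustration only. The function name is hypothetical, words are split on whitespace without stripping punctuation, and distance is assumed to be the difference between word positions, which may not be how AltaVista or Lycos counts it.

```python
def near(text, term_a, term_b, max_distance=10):
    """Return True if term_a and term_b occur within max_distance words of each other."""
    words = text.lower().split()
    positions_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    positions_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Order does not matter for NEAR, so compare absolute distances.
    return any(abs(a - b) <= max_distance for a in positions_a for b in positions_b)


sentence = "web search engines change constantly"
print(near(sentence, "web", "constantly", max_distance=5))
print(near(sentence, "web", "constantly", max_distance=3))
```

An ordered variant such as BEFORE would test `0 < b - a <= max_distance` instead of the absolute difference.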

Truncation

If a truncation or stemming feature is available, the appropriate symbol is shown.

Title, Date, and URL fields

Where these fields are searchable, the appropriate prefix is shown, or an indication is given that the searcher uses a pull-down window or text boxes. For prefixes, the searcher should enter the prefix shown followed by the term to be searched--for example, title:lupus.
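As a rough illustration of the prefix convention, the Python sketch below separates a prefix from its term. The function name is hypothetical, and the list of prefixes is simply the set that appears in the chart; which prefixes a given engine accepts varies by engine.

```python
# Prefixes that appear in the chart; support differs from engine to engine.
FIELD_PREFIXES = ("title", "url", "domain", "host", "site", "link", "image", "alt")


def split_field_prefix(token):
    """Return (field, term) for 'prefix:term'; field is None when no known prefix is present."""
    prefix, sep, term = token.partition(":")
    if sep and term and prefix.lower() in FIELD_PREFIXES:
        return prefix.lower(), term
    return None, token


print(split_field_prefix("title:lupus"))
print(split_field_prefix("lupus"))
```

Tokens with an unrecognized prefix, such as a full http:// address, are returned unchanged as ordinary terms.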

"Links to" a URL

This refers to the capability of identifying which pages in the search engine's database contain a link to a particular URL. This is somewhat analogous to "citation searching," and enables the searcher to identify sites that have some interest in the site referred to.

Language

Entries here indicate whether one can search by the language in which the Web page is written.

Media Searching

This refers to the capability for searching by type of media--images, audio files, and video files. The implementation is quite different among the search engines that provide this. In AltaVista, you can search for a word in an image file's name (or use the special "Photo Finder" database). In HotBot, you perform a subject search, but specify that you want only records that also contain an image, sound, or video file. With Infoseek, the "alt:" prefix allows searching an image's "alternate text" tag. Lycos' home page version provides a separate "Pictures and Sounds" database to search, but in Lycos Pro Search, radio buttons are used to specify image or sound files.

Name

The entries here refer to whether the search engine claims to be able to identify proper names--persons or otherwise. This actually boils down to limiting retrieval to instances in a page where each word appears in its capitalized form, automatically allowing for the inverted form of a word pair, or both.

Case Sensitivity

Some search engines can identify upper and lower case letters. This is important in instances when "AIDS" needs to be distinguished from "aids." In general, when a query is entered using all lower case, the search engine will retrieve both lower and upper case. When upper case is entered by the searcher, the engines will return only those records with an exact case match. For example, "next" will retrieve "next" and "neXt," whereas "neXt" will only retrieve "neXt."
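The general rule described above can be written as a small Python sketch. The function name is hypothetical, and the rule is only the general behavior stated here, not a description of any particular engine.

```python
def case_matches(query_term, page_word):
    """Lower-case queries match any capitalization; queries with capitals must match exactly."""
    if query_term.islower():
        return query_term == page_word.lower()
    return query_term == page_word


print(case_matches("next", "neXt"))
print(case_matches("neXt", "next"))
```

The first call matches because the query is all lower case. The second does not, because a query containing a capital letter requires an exact case match.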

Searches All Common Words

This refers to whether literally all words are indexed and searchable. This is critical not only when one wants to search for "The Who," but also when one needs to search for any phrase containing a very common word.

Directory Attached

This is an indication of whether a Web directory is included as a part of the search engine's search page. (In some cases, the directory may be embedded in a "channel" option.)

Gives Count for Answer

Being told the number of items retrieved might seem to be something we could take for granted, but not so for Lycos.

Gives Term Count

Some engines report not only the overall answer count, but also the retrieval count for each individual term searched (as with traditional online services).

Output Options

This line indicates which format options are available, whether the user can specify the number of records on each results page, and also if results can be "grouped or ungrouped" by Web site.

"More Like This"

For some search engines, you are given the option, when you see a record you like, of having the engine find other records that are similar to that record.

Special Features

What is listed here are additional features provided by the search engine that should be of interest to the serious searcher. The choice of what is included in this line is admittedly somewhat subjective. Features listed here are ones directly related to performing a search on the search engine's Web database. Additional features (or "add-ons"), such as company directories, free email, weather reports, etc., are ignored.

FINAL COMMENTS

As we all know, Web search engines are changing constantly. The changes are often superficial and cosmetic, but in the last year, we have also seen a number of substantive and welcome additions to functionality. With luck, in a year or so we'll be able to fill in even more of the blocks on the chart and maybe even add a line or two.

Communications to the author should be addressed to Randolph Hock, Online Strategies, 9919 Corsica Street, Vienna, VA 22181; 703/242-6078; ran@onstrat.com; http://www.onstrat.com.

FEATURES AND COMMANDS COMPARISON CHART

Legend for Chart:

B - AltaVista Home www.altavista.com
C - AltaVista Advanced
D - Excite Home www.excite.com
E - Excite Power Search
F - HotBot Home www.hotbot.com
G - HotBot "More Search Options"
H - Infoseek Home www.infoseek.com
I - Infoseek Advanced
J - Lycos Home www.lycos.com
K - Lycos Pro Search
L - Northern Light Home www.northernlight.com
M - Northern Light Power Search
N - WebCrawler www.webcrawler.com

A B C D E F G H I J K L M N

Size (pages): B 150 million; C 150 million; D 50 million; E 50 million; F 110 million; G 110 million; H 50 million; I 50 million; J 35 million; K 35 million; L 110 million; M 110 million; N 2 million

OR: (default) (default) OR (default) (window) OR (window) (window) OR OR (default) (window) (window) OR OR OR default OR

AND: +term AND +term (window) AND (default) +term (window) (window) +term +term AND AND +term (window) (default) (window) +term AND +term (default) (default) +term +term AND AND +term AND

NOT: -term AND NOT -term (window) AND NOT -term (window) NOT -term NOT -term (window) -term NOT -term -term -term NOT NOT -term NOT

Parentheses: yes yes yes yes yes yes yes yes

Phrase: " " " " " " (window) (window) (window) " " " " " " (window) " " (window) " " " " " " " "

Proximity: NEAR (within 10 words) (window) NEAR, NEAR/n, ADJ, BEFORE, FAR, OADJ, ONEAR, etc.

Truncation: (*)(asterisk) (*)(asterisk) (*)(asterisk) (*)(asterisk) automatic automatic automatic automatic plural/sing, plural/sing. (*)(for multiple (*)(for multiple chars.) chars.) % (for single char.) % (single char.)

Title Field: title:term title:term (window) (window) title:term title:term title:term (window) (radio button, box) title:term (window) title:term

Date Field: (date range boxes) (window) (window)

URL Field: url:term url:term domain:term domain:term host:term host:term domain:term (window) domain:term url:term (window) site:term (radio button, box) url:term (window) url:term

"Links To" a URL: link:term link:term (window) (window) link:term (window)

Language: (window) (window) (window) (window) (window) (window)

Media Searching: image:term (also image:term (also "Photo Finder") "Photo Finder") yes (checkboxes -- for (checkboxes -- for image, audio, image, audio, video) video) alt:term alt:term (Use "Pictures & (radio buttons for Sounds" pictures and database) sounds)

Name: (window) (window) automatic (window)

Case Sensitive: yes yes yes yes yes yes

Searches All Common Words: yes yes yes yes yes yes

Web Directory Attached: yes yes yes yes yes yes yes

Gives Count for Answer: yes yes yes yes yes yes yes yes yes yes yes

Gives Term Count: yes

Output Options: Detailed Detailed Count Only Titles 10, 20, 30, 40, 50 Titles & results summaries Titles Grouped/ Titles & summaries ungrouped by Grouped/ Web site ungrouped by Web site Full, brief, URLs Full, brief, URLs only only 10, 25, 50, 100 10, 25, 50, 100 results results Hide summaries. Hide summaries. Show summaries Show summaries Grouped/ 10, 20, 25, 50 results ungrouped by Grouped/ Web site ungrouped by Sort by date Web site Sort by date Grouped by 10, 20, 30, or 40 Web site results Grouped by Web site Detailed Detailed Sort by date Titles Titles and summaries

"More Like This": yes yes yes yes yes yes yes

Special Features: "Refine" "Refine" (co-occurring (co-occurring terms) terms) Translations Translations Adult content Adult content filter filter Concept Concept searching searching Suggested terms Suggested terms Search by page depth Can NARROW a Can NARROW search a search Adult content filter Can NARROW Can NARROW a search a search User-controlled relevance factors Special Collection Special Collection Custom Search Custom Search Folders Folders Can NARROW a Can NARROW a search by means search by means of folders, of folders. Publications Publication search search Industry search, Industry search, etc. etc. Related sites

By Randolph Hock

ISSN: 0146-5422 (print)
Descriptors: Comparative Analysis; Expert Systems; Information Retrieval; Information Sources; Internet; Online Searching; Online Systems; Search Strategies; World Wide Web
Indexed in: ERIC
Language: English
Peer Reviewed: N
Page Count: 5
Document Type: Journal Articles; Reports - Descriptive
Entry Date: 2000
