August 21, 2003
Fishing for Information? Try Better Bait
THE notion of a user's manual for search engines might seem counterintuitive. Give people an empty search box and a button to click on and somehow they know exactly what to do.
But as the Web gets larger and more complicated, encompassing PDF documents, movies and audio files, product databases and ever-changing pages, it can help to know a few tricks that are not so obvious.
A new book, "Google Hacks: 100 Industrial-Strength Tips and Tools," by Tara Calishain and Rael Dornfest (O'Reilly & Associates), is the latest resource in a growing industry to help people become better online searchers. It catalogs ways to uncover nuggets of information. Although a large part of the book is intended for programmers who adapt Google's search services for their own Web sites, there is much in it for everyday users.
Other rich sources include Gary Price's resourceshelf.freepint.com (a favorite Web site among reference librarians), Greg R. Notess's searchengineshowdown.com (chock full of reviews and comparisons) and Danny Sullivan's searchenginewatch.com (the place to be mentioned if you are affiliated with the search-engine market). Ms. Calishain maintains a site with updated trade tricks at www.researchbuzz.com.
In a perfect world, people would have time to keep up with these masters of information retrieval. But those who are simply looking for a way to avoid the tedium of trying one keyword after another can benefit from a few of their basic tips.
If you are looking for a phrase or other words that always appear together, you probably know you should enclose the words within quotation marks. Search for "Death by Chocolate" at www.google.com and every entry on the first page of results includes that phrase, allowing you to steer clear of unappetizing pages with the words "death" and "chocolate" somewhere on them.
But many people do not know that the same search without quotation marks will turn up the same set of results. That is because Google considers a page on which the search words occur as a phrase to be of higher relevance. AltaVista (www.altavista.com), AlltheWeb (www.alltheweb.com), Teoma (www.teoma.com) and other search engines follow the same rule.
That does not mean that quotation marks are useless. Run a search for "the ultimate chocolate cake" and you will get a different result than you would by simply plugging in the words without quotation marks. In this case, the difference is the word "the," which is so common that Google will typically ignore it unless it is part of a phrase that you have delimited.
All major search engines allow you to limit searches by ruling out pages that might contain specific words. To do so, put a minus sign directly in front of the word you do not want to see. A search for "chocolate fudge recipe -marshmallows" will enable you to dodge the Rocky Road.
If you want to widen your search instead, you have the OR command at your disposal. Be sure to type it in capital letters. A search for "fudgy (icing OR frosting)" on Google doubles the caloric options. You can also use the tilde shortcut that Google unveiled earlier this month. If you put the tilde symbol directly in front of your keyword, with no space between them (try "fudge ~icing"), Google will search for icing and its common synonyms.
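For readers who assemble such queries in a script, the operators described above (quotation marks, the minus sign, OR and the tilde) can be combined mechanically. What follows is a minimal sketch in Python, not anything taken from the search engines themselves; the function name build_query and its parameters are invented for illustration.

```python
def build_query(terms, phrase=None, exclude=(), any_of=(), synonyms=()):
    """Assemble a search string from the operators described in the article.

    All names here are hypothetical; the function only builds text and
    does not contact any search engine.
    """
    parts = []
    if phrase:
        # Quotation marks ask for the words as an exact phrase.
        parts.append(f'"{phrase}"')
    parts.extend(terms)
    if any_of:
        # OR must be typed in capital letters.
        parts.append("(" + " OR ".join(any_of) + ")")
    # The tilde goes directly in front of the keyword.
    parts.extend(f"~{word}" for word in synonyms)
    # The minus sign goes directly in front of the unwanted word.
    parts.extend(f"-{word}" for word in exclude)
    return " ".join(parts)


print(build_query(["chocolate", "fudge", "recipe"], exclude=["marshmallows"]))
print(build_query(["fudgy"], any_of=["icing", "frosting"]))
print(build_query(["fudge"], synonyms=["icing"]))
```

The three calls reproduce the sample searches from the text, which can then be pasted into a search box as usual.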
As popular as Google is, it does not measure up when it comes to two other strategies beloved by expert information retrievers. One is truncation - the ability to chop off a word and put an asterisk in place of whatever was chopped, thereby searching for all variations of that word with one search query.
To many experts, AltaVista wins at this game. Plug "fudg* brownie recipe" into the search box and you will find fudge brownies, fudgy brownies and fudge-nut brownie cake.
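To see what truncation does, it can help to imitate it on a scrap of text. The Python sketch below is only an approximation of the idea, assuming that the asterisk stands for any run of letters or digits at the end of a word; it is not AltaVista's actual matching logic, and the function name truncation_to_regex is made up for this example.

```python
import re


def truncation_to_regex(pattern):
    """Turn a truncated term such as 'fudg*' into a regular expression.

    Assumption: '*' means "any run of word characters", which only
    approximates how a real search engine expands a truncated term.
    """
    # Escape the term, then swap the escaped asterisk for \w*.
    escaped = re.escape(pattern).replace(r"\*", r"\w*")
    return re.compile(rf"\b{escaped}\b", re.IGNORECASE)


matcher = truncation_to_regex("fudg*")
sample = "Fudge brownies, fudgy brownies and fudge-nut brownie cake"
print(matcher.findall(sample))  # ['Fudge', 'fudgy', 'fudge']
```

One truncated term picks up every variation in the sample, which is the point of the technique.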
The other trick is called proximity searching, in which you can search for two words in close proximity, instead of simply on the same page or within the same phrase. AltaVista has this licked, using the NEAR command. Type in "substitution NEAR chocolate" or better yet, "substitut* NEAR chocolate" and you get advice on substituting bars of unsweetened chocolate with semi-sweet, or how to use chocolate substitutes like cocoa powder.
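Proximity searching can be imitated in a few lines as well. The sketch below checks whether two terms fall within a set number of words of each other; the 10-word window is an assumption chosen for illustration, and the function names near and term_matches are hypothetical rather than part of any search engine.

```python
import re


def term_matches(word, term):
    """Match a word against a term, honoring a trailing '*' for truncation."""
    if term.endswith("*"):
        return word.startswith(term[:-1])
    return word == term


def near(text, first, second, window=10):
    """Return True if the two terms occur within `window` words of each other.

    The default window of 10 words is an assumption for this sketch.
    """
    words = re.findall(r"\w+", text.lower())
    first_hits = [i for i, w in enumerate(words) if term_matches(w, first)]
    second_hits = [i for i, w in enumerate(words) if term_matches(w, second)]
    return any(abs(i - j) <= window for i in first_hits for j in second_hits)


advice = "You can substitute cocoa powder for unsweetened chocolate"
print(near(advice, "substitut*", "chocolate"))    # True
print(near(advice, "substitution", "chocolate"))  # False
```

The second call fails because the sample says "substitute," not "substitution," which is why combining NEAR with truncation casts the wider net.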
Most search engines give you a break when you cannot remember every single word in a phrase or name that you are seeking. They allow you to use a wild card, an asterisk in place of the word that escapes you. Type "Nestle * cookies" as a phrase and Tollhouse appears (along with Rolo, Quik and Raisinet goodies, too).
Domain Limits and Links
Sometimes it seems like overkill to search the entire Web when all you really want is an academic or noncommercial take on a topic. Say you only want results from the .edu domain. Try using the syntax tool called "site:" and restrict the results to those in the .edu domain. (This works with Google, AlltheWeb.com and Teoma, among other engines.)
A search like "Chocolate addiction treatment site:edu" pulls up only those pages posted at university sites. (Of course, there are still plenty of .com options in the advertisements on the right-hand side of the results page.)
When using a syntax like "site:" be sure that there is no space between the colon and the next word. If you accidentally put a space there, the search engines will think that "site:" is a word you're looking for.
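The no-space rule is easy to enforce if the query is put together by a program. The following Python sketch joins each syntax term to its value with nothing but a colon and then encodes the result for a Web address. The helper names restricted_query and search_url are invented here, and the address https://www.google.com/search with a q parameter is an assumption about how the search page is reached, not something the engines document in this article.

```python
from urllib.parse import urlencode


def restricted_query(keywords, **operators):
    """Append syntax terms such as site:edu to a keyword search.

    Each operator is written as name:value with no space after the colon.
    """
    parts = [keywords]
    for name, value in operators.items():
        parts.append(f"{name}:{value}")
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """Build a search address; the base URL and 'q' parameter are assumptions."""
    return base + "?" + urlencode({"q": query})


query = restricted_query("chocolate addiction treatment", site="edu")
print(query)              # chocolate addiction treatment site:edu
print(search_url(query))
```

Because the colon and the value are glued together in one step, a stray space cannot slip in between them.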
If you are digging deeply into a topic, it may help to know which sites are linking to the page that you are reading. Knowing those connections can bring you a step closer to understanding the community that has coalesced around your subject.
For example, by typing "link:www.chocoholic.com" into Google's search box, you'll find other sites like the Chocolate Corner and a list of "Choco-Links."
But Google is not the easiest means for conducting searches for links (technically known as Uniform Resource Locators, or URL's), and many search experts avoid it, preferring the AlltheWeb engine. At AlltheWeb, you do not need to remember to use "link:" syntax. Simply plug the URL into the search window and you will get a link to a list of the 391 pages that link to Chocoholics. But you will also be pointed to sites that contain the URL in their text, pages that are indexed under that URL, information on who owns the URL and an image of how the page used to look. That last option is a link that takes you straight to the WayBack Machine, a service of the Internet Archive, where you can view pages as they were rendered as far back as 1996.
News, Numbers or File Types
AltaVista and Google News (news.google.com) offer the ability to search for news articles by time and date, whether your range is the past hour, the past day, the past week or the past month. Google News has more sources (4,000-plus), but while AltaVista has fewer (3,000-plus), its database is deeper, with more than a year's worth of material available. Other options are AlltheWeb, which carries multilingual newspapers; DayPop (www.daypop.com), which logs blogs; and NewsNow (www.newsnow.com), which offers a live feed - the closest thing to watching the wires free.
News articles can be searched with the same tricks that apply to Web pages. (And if you think chocolate does not rise to the level of news, think again. How else would you locate the latest commentary on the chocolate-chip cookie industry from the financial site The Motley Fool?)
If you are simply looking for a phone number, Google offers a shortcut, but you need to understand its quirks. To find the number (or address) for a Bread & Chocolate bakery in Virginia, type "phonebook:bread & chocolate va" (note the lack of space between "phonebook" and "bread"). Remember to put a state abbreviation after the query; otherwise, Google will give up.
Now that Google and other engines have started indexing PDF files, PowerPoint presentations and Word documents (among other file types), it can be useful to narrow a search to those alone. This requires use of another syntax term, "filetype:" (again, no space after the colon).
A search for "stomach ache remedies filetype:pdf" retrieves some timely advice.