Slide 5 of 16

We all know that one of the biggest problems on the Internet is finding what you want. I think of Internet searching as dumpster-diving for information. There just might be something good in there, but you have to dig through a lot of garbage to find it. It's also hard to know just what you might find and what you are missing.

I have a talk that I have given to computer science classes in which I point out some of the problems with the kind of keyword searching that we do on the Internet. I start out by asking the students if they find many foreign language materials when they search. Invariably they say they don't, and some are even convinced that nearly everything on the Internet is in English. So I show them a search using the term "fiber optics." Naturally, the bulk of the results are in English. Then I do the same search using the German term, "fiberoptik," and then the French term, "fibre optique." Lo and behold, when we search on the German term we get items in German, and when we search on the French term we get items in French. So logically, when we are searching on the English term, we are mainly getting items in English. And by the looks on the faces of the students I know that this obvious connection between the language of the search term and the language of the results has escaped them. What else have they missed? So now I pull out my coup de grâce and do a search on "fibre optics," retrieving thousands of hits for Web pages written by people who use British spelling rules. At this point it becomes clear to my audience that they are not even retrieving all of the relevant works in English.
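For readers who want to see the effect in miniature, here is a rough sketch of exact keyword matching. The four page texts and the function names are invented for illustration, and no real search engine is this simple, but the behavior is the same in kind: each spelling of the term retrieves only the pages that use that spelling.

```python
# Toy collection; the page ids and texts are invented for illustration.
PAGES = {
    "page_us": "an introduction to fiber optics for network engineers",
    "page_uk": "an introduction to fibre optics for network engineers",
    "page_de": "eine einführung in die fiberoptik",
    "page_fr": "une introduction à la fibre optique",
}


def keyword_search(query, pages=PAGES):
    """Return ids of pages that contain every query word as an exact word."""
    terms = query.lower().split()
    return sorted(
        page_id
        for page_id, text in pages.items()
        if all(term in text.lower().split() for term in terms)
    )


def search_variants(queries, pages=PAGES):
    """Run several spellings of the same concept and merge the hits."""
    hits = set()
    for query in queries:
        hits.update(keyword_search(query, pages))
    return sorted(hits)


if __name__ == "__main__":
    for q in ["fiber optics", "fibre optics", "fiberoptik", "fibre optique"]:
        print(q, "->", keyword_search(q))
    print("both English spellings ->",
          search_variants(["fiber optics", "fibre optics"]))
```

The only way this searcher finds both English-language pages is by asking for both spellings, which is exactly the burden that keyword searching places on the user.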

There are many other deceptions lurking in Internet retrievals. One with which we librarians are familiar is the spelling problem. We all know that many users can't spell, and even those who can spell make keyboarding mistakes. When you are doing keyword retrieval, this can make a huge difference. And although library catalogs aren't perfect, an additional problem on the Internet is that the providers of information can't spell either. Go onto one of the Internet search engines and do a search on your favorite misspelling. I tried "recieve" and got 39,394 hits.
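As a small illustration of why the spelling problem matters for keyword retrieval, the sketch below contrasts an exact lookup with approximate matching from Python's standard difflib module. The vocabulary list and the function name are hypothetical, and this is not a description of how any particular search engine handles misspellings.

```python
import difflib

# Hypothetical list of correctly spelled index terms.
VOCABULARY = ["receive", "recipe", "record", "retrieve"]


def suggest_spellings(term, vocabulary=VOCABULARY):
    """Return up to three vocabulary terms that resemble the given term."""
    return difflib.get_close_matches(term.lower(), vocabulary, n=3, cutoff=0.6)


if __name__ == "__main__":
    term = "recieve"
    # An exact keyword lookup treats the misspelling as a different word.
    print("exact match found:", term in VOCABULARY)
    # Approximate matching can at least propose likely intended spellings.
    print("suggestions:", suggest_spellings(term))
```

An exact-match index keeps "recieve" and "receive" as two unrelated terms, so pages using one are invisible to a search on the other unless something like the suggestion step intervenes.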