The Importance of the Invisible Web for Librarians
Preface
I had never heard of the term "Invisible Web." One day this semester I was browsing the Dewey 025 stacks at the Chicopee Public Library, looking for a book recommended by one of my LSC548 classmates: "The Extreme Searcher's Guide to Web Search Engines" by Randolph Hock. The book was supposed to be on the shelf, but it was not. What I found instead was "The Invisible Web: Uncovering Information Sources Search Engines Can’t See" by Chris Sherman and Gary Price. It looked interesting, and I imagined it would be important reading for a future librarian who would be using the Web for reference work.
I had gotten into the habit of using only Google.com to search the Web! With the search engine comparison assignment in LSC548, I began to learn the limitations of the search engine. This serendipitous find has given me a brand new understanding of the valuable information available to people on the Internet and the Web.
I also worked on a Western Massachusetts Master Gardener Virtual Library, and some of the examples in this paper come from how I found the Invisible Web databases I used for that Web site.
Introduction
It is important for a prospective librarian to realize the limitations of the search engine and to be aware of the different formats of Web-accessible information on the Invisible Web. Most people automatically go first to their favorite search engine, whether Google.com, Search.com, HotBot.com or Lycos.com. After reading "The Invisible Web: Uncovering Information Sources Search Engines Can’t See" by Chris Sherman and Gary Price, one realizes how shortsighted this is. Searchers might not realize how much information they are missing.
Sherman defines the Invisible Web this way: "It consists of content that’s been excluded from general-purpose search engines and Web directories such as Lycos and LookSmart." (Sherman 55). Another definition is "The Invisible Web is text pages, files, or other often high-quality authoritative information available via the World Wide Web that general-purpose search engines cannot, due to technical limitations, or will not, due to deliberate choice, add to their indices of Web pages. Sometimes this is called "Deep Web" or "dark matter."" (Sherman 57).
One way data goes missing is that the author of an HTML Web page can put a tag into the page telling search engines and Web crawlers not to add the page to their databases. Here is the code.
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
This is a courtesy that the search engines and Web crawlers honor. The following fact about search engines may not be so well understood. Search engines do not look through the Web itself each time someone searches. They search their own databases, which are created by a crawler that looks through the HTML code on different servers and puts keywords and URLs into the indices of a database. When one keys a keyword into the Google.com search box, it is the Google.com database that is being searched, not the Web. This is most obvious in Google.com, where one can click on "cached" to see the copy of a page that Google stored when it last crawled it. That copy may be older than the live page, so anyone who wants the latest version has to follow the link to the page itself.
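The two ideas above, a crawler honoring the ROBOTS tag and a search running against a stored index rather than the live Web, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the pages, the example.org URLs, and the function names are made up for the example, and real search engines are far more elaborate.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of any robots META tag and the words on the page."""

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.words = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def build_index(pages):
    """Map each word to the set of URLs containing it, skipping NOINDEX pages."""
    index = {}
    for url, html in pages.items():
        parser = RobotsMetaParser()
        parser.feed(html)
        if "noindex" in parser.directives:
            continue  # the author asked not to be indexed
        for word in parser.words:
            index.setdefault(word, set()).add(url)
    return index


def search(index, keyword):
    """Look the keyword up in the stored index, never in the pages themselves."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical pages standing in for the live Web.
PAGES = {
    "http://example.org/public": "<html><body>Growing tomatoes in New England</body></html>",
    "http://example.org/private": (
        '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"></head>'
        "<body>Growing tomatoes in raised beds</body></html>"
    ),
}

INDEX = build_index(PAGES)
print(search(INDEX, "tomatoes"))
```

Only the public page is listed for "tomatoes", because the private page asked not to be indexed. A page added to PAGES after the index was built will not be found until build_index runs again, which is the same lag that separates a search engine's database from the live Web.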
The Invisible Web is the part of the Web whose resources the search engines miss. These missing pages and databases are what mainly make up the Invisible Web. By some estimates, search engines cover only 20 percent of what is actually available on the Web. One example is looking up an address and phone number. One can look up an address in Google.com by keying in a person's name and the city and state where the person lives. There is a good chance that it will be found, but on some occasions one has to click through several names to get to the right one. In the database at www.anywho.com, phone numbers can be found quickly. Sometimes the search works in Google, but it is more efficient to use the database at www.anywho.com. Knowing which databases have the right information is what a librarian must do. Chris Sherman suggests keeping a Web-based list on Back Flip (www.backflip.com) or HotLinks (www.hotlinks.com) to house the URLs of the librarian’s favorite Invisible Web databases. Having them online enables the librarian to reach this page of searchable databases from any machine on the Web.
Sherman gives an example of data on a page that a crawler misses. The home page of the Library of Congress, www.loc.gov/, has many links on it. If the cursor is moved over the American Memory link, one will see that it gives access to more than 80 collections. A crawler's database will have picked up the single link for the Library of Congress home page but will miss the 80 links deep within the American Memory link. If another Web page (or Virtual Library) happens to link to one of the 80 American Memory collections, that one link will have been picked up by the Web crawler and put into its database. It is a hit-or-miss proposition. The present home page of the Library of Congress reaches American Memory through an indirect link, and the pages behind it are created dynamically. Situations like this help create the Invisible Web. It is not done on purpose. It is a limit of Web crawler technology.
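The hit-or-miss effect in the Library of Congress example can be imitated with a very small crawler. The following Python sketch is a simplification: it uses a depth limit to stand in for whatever keeps a real crawler from following the deeper links (the essay's example involves indirect links and dynamic pages), and the page names and link structure are hypothetical.

```python
from collections import deque


def crawl(links, seeds, max_depth):
    """Breadth-first crawl that stops following links once max_depth is reached."""
    seen = set(seeds)
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue  # this page is recorded, but its own links are not followed
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return seen


# Hypothetical link structure: page -> pages it links to.
LINKS = {
    "loc-home": ["american-memory", "catalog"],
    "american-memory": ["collection-1", "collection-2", "collection-3"],
    "virtual-library": ["collection-2"],
}

found = crawl(LINKS, seeds=["loc-home", "virtual-library"], max_depth=1)
print(sorted(found))
```

With these made-up pages, the crawler records the home page, the American Memory link, and collection-2 (because the virtual library happens to link to it), but it never sees collection-1 or collection-3.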
To give another example of the Invisible Web, I will relate my personal experience. While searching www.google.com for information about the Invisible Web, I keyed in "invisible web" both with and without double quotes, and I found several Web pages with definitions of the Invisible Web. They were excellent definitions and furthered my understanding. But then I needed to do more research for my paper. I went to INFOTRAC, a database on my hometown library's Web site. When I keyed in "Invisible web," again both with and without double quotes, the wealth of high-quality information rose. I found chapters 4 and 5 of the serendipitously found book in full text, and I was able to print them out and read them. There were several recent articles about the Invisible Web. One was in Choice Magazine. Several others were only abstracts. The Choice Magazine article turned out to be a book review that highly recommended the book for any academic library.
The Invisible Web consists of databases like INFOTRAC, whose contents are not available as HTML pages until someone goes into the database, keys in a logon ID, and then keys in search words such as "Invisible Web." Google.com is limited to HTML pages and does not have the capacity to fill in a database's search box and run the search, so it cannot find this very valuable 50% to 80% of information that is Web accessible but not indexed or obtainable with present search engine technology. I believe that as search engines evolve, this type of information will become available.
The purpose of this paper is to urge the reader not to overlook this important part of Web searching and to give tips on how to search for and find these reliable and information-packed databases.
Invisible Web Web Sites
Sherman writes a great section about keeping current with the Web, and he gives some URLs that are pathfinders for the Invisible Web. Some of the Web sites are The Scout Report, Librarians' Index to the Internet, ResearchBuzz.com, FreePint.com and Internet Resources Newsletter.
He also gives a list of Invisible Web sites. The site he created for his book is www.invisible-web.net. The links on the home page are actually Chapters 9 through 27 of Sherman’s book. He can update this site more easily than he can update the book, which keeps the site timely. Open the site in a browser and choose Science for the Category. This brings up another Web page that shows the Category and Sub-Category. Choose Botany and double click on the Show button. Scroll down the page to see the four categories of Botany databases. Sometimes there is more than one link associated with a topic. Plants Database is the third entry down.
Another Invisible Web site is www.invisibleweb.com. This is a commercial site trying to do what Sherman did. Choose Science, Life Science, and Botany. Some of the links on this site were dead. The Plants Database, http://plants.usda.gov/plants/index.html, showed up on this Web site as well as on Sherman’s Web site.
One last site that is most useful is the Librarians’ Index to the Internet, www.lii.org. There are several places to go. Under Science, Computers and Technology, choose Plants. It gives general resource lists, and below them is a "see also" link to Gardening. Choose the "see also" Gardening link, then choose Children’s Gardens. There are three available databases, all with .edu suffixes on their URLs. The site highlights in yellow plants, gardening and Children’s Gardens, or whatever your search keyword is. If you double click on the highlighted term of your choice, it will give you all the sites for that topic. I chose gardening and got twenty-five gardening sites to choose from.
Future of Libraries
Libraries face a challenge at this time in history. The public perception is "Why do we need a library when everything anyone needs is on the Internet?" Teachers are seeing the results as students depend more and more on the Internet for their research and learning. According to Laura Sessions Stepp, a Washington Post staff writer, in her article about students relying on the Internet for research:
"On the good side, Net thinkers are said to generate work quickly and make connections easily."
"'They are more in control of facts than we were 40 years ago,' says Bernard Cooperman, a history professor at the University of Maryland."
"But they also value information-gathering over deliberation, breadth over depth, and other people’s arguments over their own."
"This has educators worried. After receiving teacher approval of their articles, Cooperman's students summarized and evaluated the articles' arguments and then used the Web to find further sources. Cooperman told them to evaluate the usefulness of the Web sources compared with the scholarly material."
"Their Web work turned up contradictions, errors and extraneous material." (Stepp 7).
It is going to be up to librarians to point these scholars in the right direction to find useful and factual information. This is going to be quite a challenge for the future of libraries, since they are not getting financial or psychological support from politicians and the public. Libraries are going to have to work hard to overcome the perception that the Internet has made them unnecessary.
What does a library student make of all this? I think the Invisible Web is a nebulous entity that can’t clearly be defined. Estimates of its size range from 20% to 80% of Web resources, and it is not organized very well. I think it is pretty cool to find this information, but why hasn’t there been more mention of it before? When I ask people about it, some do know about it. A lot of it has already been discovered and collected on Virtual Library home pages. The UMass library and other academic libraries have a databases section on their sites. These databases are available on the Web but will not be searched by a Web crawler and therefore will not show up in a Google.com search. Even so, the databases seem hard to get to, and a high school or college student might not know how to reach this page on the Web or how to choose which of the online databases would be best for the research at hand. The best choice would change with each assignment.
I think librarians will have to get this message out time and time again. It will be frustrating, because students want information in three seconds and don’t want to take the time to find and research different aspects of a topic. But the librarian is going to have to be persistent and somehow get the message out that the Internet search engine is a fine tool, but it has a time and a place, and it has its limitations. Librarians will have to direct patrons to these valuable databases and make them easier to find and to use. Having INFOTRAC so widely available is definitely a step in the right direction. I was just going to look up the APA bibliography format and I had to stop myself from using Google. I will use www.lii.org instead.
Conclusion
In conclusion, it is up to individual librarians to decide which part of the Invisible Web they want to uncover, and to document their own way around it. I think I will build a library toolkit organized in Dewey Decimal order.
I think that library science personnel should take the bull by the horns and tame it. Chris Sherman and Gary Price have made great strides in doing this, and so have www.lii.org and www.invisibleweb.com. I think it would be nice to get a group of librarians together to sort through all the possible databases. (It still seems like a hunt-and-peck way to find these databases.) As Stielow said in the book "Creating a Virtual Library," why re-invent the wheel? It would be nice to have a reference, since these databases are like fine reference materials. It would be nice to create a Reference Virtual Library. It could be like the model library in his book.
One could create the foyer, the Invisible Web Reference Virtual Library. Under that would be the 10 major categories of the Dewey Decimal System. Each of these 10 links would lead to a sub-page that links to the databases. For example, 635, plants, would be under the 600s box in the foyer and would hold the URLs for the plants databases. A page would be created for each sub-level, and this would bring flexibility to the design of the Web site.
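As a rough sketch of how such a foyer might be organized, the following Python fragment files database URLs under their Dewey numbers and prints them beneath the ten main classes. The function names and layout are my own assumptions for illustration, not a design from Stielow's book; the class labels are the standard Dewey main classes, and the one sample entry is the Plants Database mentioned earlier in this paper.

```python
# The ten Dewey Decimal main classes that would form the foyer.
DEWEY_CLASSES = {
    0: "Computer science, information and general works",
    100: "Philosophy and psychology",
    200: "Religion",
    300: "Social sciences",
    400: "Language",
    500: "Science",
    600: "Technology",
    700: "Arts and recreation",
    800: "Literature",
    900: "History and geography",
}


def main_class(dewey_number):
    """Return the foyer box (000, 100, ... 900) that a Dewey number falls under."""
    return (int(dewey_number) // 100) * 100


def add_database(library, dewey_number, title, url):
    """File a database under its Dewey number (hypothetical helper)."""
    library.setdefault(dewey_number, []).append((title, url))


def render_foyer(library):
    """List every main class, with any filed databases indented beneath it."""
    lines = []
    for hundred, label in DEWEY_CLASSES.items():
        lines.append(f"{hundred:03d} {label}")
        for number in sorted(n for n in library if main_class(n) == hundred):
            for title, url in library[number]:
                lines.append(f"    {number} {title}: {url}")
    return lines


library = {}
add_database(library, 635, "Plants Database",
             "http://plants.usda.gov/plants/index.html")

for line in render_foyer(library):
    print(line)
```

Each sub-level could later become its own Web page; keeping the entries in one simple structure like this would make it easy to add or move databases without redesigning the foyer.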
It would be similar to the reference section of a library, but it would go a step further and arrange the databases of the Invisible Web in Dewey Decimal order. I may do it for my next Virtual Library project or independent study. I find it fascinating. This would be fun! It would save me from bookmarking everything in www.backflip.com or www.hotlinks.com.
BIBLIOGRAPHY
Buczynski, J.A. (2002). (Review of the book Using the World Wide Web and Creating Home Pages: A How-to-do-it Manual). Choice: Current Reviews for Academic Libraries, 39(5), 846.
Clyde, Anne (April 2002). The Invisible Web. (InfoTech). Teacher Librarian, 29(4), 47(3).
Conhaim, Wallys W. (2002). (Review of the book The Invisible Web: Uncovering Information Sources Search Engines Can't See). Link-Up, 19(1), 11(1).
Hock, Randolph (2001). The Extreme Searcher's Guide to Web Search Engines. Medford, NJ: CyberAge Books.
Metz, Ray E., & Junion-Metz, Gail. (1996). Using the world wide web and creating home pages: A how-to-do-it manual. New York: Neal-Schuman Publishers.
Sherman, Chris & Price, Gary (2001). The Invisible Web: Uncovering Information Sources Search Engines Can't See. Medford, New Jersey: Information Today, Inc.
Sherman, Chris & Price, Gary (2001). The Invisible Web. (Internet/Web/Online Service Information). Searcher, 9(1), 62.
Stielow, Frederick (1999). Creating a Virtual Library. New York: Neal-Schuman Publishers, Inc.
Williams, R., & Tollett, J. (2000). The Non-Designer's Web Book. Berkeley, CA: Peachpit Press.
Williams, R. (1994). The Non-Designer's Design Book. Berkeley, CA: Peachpit Press.
Williams, R. (1992). The PC is Not a Typewriter: A Style Manual for Creating Professional-level Type on Your Personal Computer. Berkeley, CA: Peachpit Press.