What is the invisible web?
The “visible web” is what you can find using general web search engines. It’s also what you see in almost all subject directories. The “invisible web” is what you cannot find using these types of tools. It is also known as the “deep web”.
These types of pages used to be invisible but can now be found in most search engine results:
– Pages in non-HTML formats (PDF, Word, Excel, PowerPoint), which search engines now convert into HTML.
– Script-based pages, whose URLs contain a ? or other script coding.
– Pages generated dynamically by other types of database software (e.g., Active Server Pages, ColdFusion). These can be indexed if a stable URL exists somewhere that search engine crawlers can find.
There are still some hurdles that search engine crawlers cannot leap. When you search a library catalog, article database, statistical database, or similar resource, the results are generated “on the fly” in response to your query. Because crawler programs cannot type or think, they cannot enter passwords on a login screen or keywords in a search box. These databases must therefore be searched separately.
Google Scholar is part of the public, or visible, web. It contains citations to journal articles and other publications, with links to publishers or other sources where you can try to access the full text. This is convenient, but Google Scholar covers only a small fraction of the scholarly publications that exist online. Much more, including most of the full text, is available through article databases that are part of the invisible web. The UC Berkeley Library subscribes to over 200 of these, accessible to our students, faculty, staff, and on-campus visitors through our Find Articles page.
Search engine companies also exclude some types of pages by policy, to avoid cluttering their databases with unwanted content. Think of the billions of possible web pages generated by searches for books in library catalogs, public-record databases, and so on. Each of these is created in response to a specific need, and search engines do not want them in their web databases because they are generally not of broad interest. Finally, a web page creator who does not want a page to show up in search engines can insert special “meta tags” that do not display on the screen but instruct most search engines’ crawlers not to index the page.
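To illustrate how such a meta tag can keep a page out of search results, here is a minimal sketch of the check a crawler might perform. It assumes the page uses the common robots meta tag, written as `<meta name="robots" content="noindex">`, and it uses only Python's standard `html.parser` module. The names `RobotsMetaParser` and `has_noindex` are hypothetical, chosen for this example; the sketch does not model how any particular search engine actually behaves.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        # Attribute values may be None (e.g. <meta name>), so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        # Only the generic "robots" name is checked; crawler-specific
        # names are ignored in this sketch.
        if attr_map.get("name", "").lower() == "robots":
            for directive in attr_map.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html_text):
    """Return True if the page's robots meta tag asks crawlers not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    parser.close()
    return "noindex" in parser.directives


page = (
    '<html><head><meta name="robots" content="noindex, nofollow">'
    "</head><body>Catalog result</body></html>"
)
print(has_noindex(page))  # True
```

A page without such a tag returns `False` and, under this simplified model, would remain eligible for indexing.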
IPL2 is a tool that can help you search the “invisible web”.
IPL2 (IPL stands for Internet Public Library) was formed in January 2010 by merging the collections of the IPL and LII (Librarians’ Internet Index) websites. The site is hosted by Drexel University’s College of Information Science & Technology, and a consortium of colleges and universities with programs in information science is involved in developing and maintaining IPL2.
IPL2 is a public service organization and a learning/teaching environment. To date, thousands of students and volunteer library and information science professionals have been involved in answering reference questions for its Ask an IPL2 Librarian service and in designing, building, creating, and maintaining IPL2’s collections. IPL2 continues to thrive through the efforts of these students and volunteers.
IPL2 has the following sections: