Searching the “Archives” of the Internet

“The Internet Archive, a 501(c)(3) non-profit, is building a digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, the print disabled, and the general public.”

This site can be very helpful in your research process because it offers two areas that are the focus of today’s blog post: Web and Texts. While there are others (Video, Audio, and Projects), I’ve chosen the two that I think are the most beneficial for genealogists.

WEB

It is truly amazing that only a short 13 years ago, the idea to store webpages in a digital archive came to be. On Internet Archive, you can browse through over 240 billion web pages archived from 1996 to just a few months ago.

So, when a favorite website of yours disappears before you’ve printed the items of interest, or you’re revisiting online research from a few years ago and find the website “gone”, you can visit the Wayback Machine. Enter the URL of the site or page you want to “bring back”, and your search will take you to the most recent version that Internet Archive has archived; if you pick a date, you’ll see the archived page captured as close as possible to that date. Unfortunately, keyword searching is not currently supported.
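
For readers comfortable with a little scripting, the same “closest to my date” lookup can be done from a program. The sketch below assumes Internet Archive’s Wayback Availability API at https://archive.org/wayback/available, which takes a `url` and an optional `timestamp` (YYYYMMDD) and reports the closest archived snapshot. The helper names are hypothetical, and the response fields (`archived_snapshots`, `closest`, `available`, `timestamp`, `url`) reflect my understanding of that API, so double-check them before relying on this.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: Internet Archive's Wayback Availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url: str, timestamp: str = "") -> str:
    """Build the lookup URL; timestamp is an optional YYYYMMDD date."""
    params = {"url": page_url}
    if timestamp:
        # The API is expected to return the snapshot closest to this date.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response: dict):
    """Return (timestamp, archived_url) from a parsed response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None  # nothing archived for this URL
    return closest["timestamp"], closest["url"]


def lookup(page_url: str, timestamp: str = ""):
    """Query the live API (needs an internet connection)."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=30) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `lookup("example.com", "20060101")` would ask for the capture of example.com closest to January 1, 2006, and return `None` if the page was never archived.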

TEXTS

Are you interested in what a particular repository might have digitally available? In the Texts (Ebooks and Text Archive) section, you can search over 4.7 million free digitized items, including academic books and historical texts. The full list of the generous sponsoring libraries can be found at the bottom of the Welcome page.

Here’s a highlight of one of my favorite research libraries, Rutgers University, whose collection I can “surf” for hours every time I visit. Its digital offerings include 246 items scanned in their entirety for your perusal and research. That’s a lot of pages, folks!

Excerpt from the online description:

“As Rutgers has evolved over the past two centuries from a small colonial era college into a major research university, the university library system has grown to become one of the top academic libraries in the country.”

Items You Can Find

A quick review of the page showed us the Most Downloaded Items Last Week, which included:

  1. The early Germans of New Jersey : their history, churches, and genealogies.
  2. List of Regular Lodges F. & A.M. (Free & Accepted Masons), March 1920
  3. History of Sussex and Warren counties, New Jersey
  4. History of the First Presbyterian Church, Morristown, N.J.
  5. N.Y. & N.J. Tel. Co. Telephone directory (Volume 1888)

Wait, the 1888 telephone book made the Top 5?  Truly amazing!

You can also search the 246 entries to narrow them down to a specific research topic or location. Remember, this is not everything the library holds, just what’s been shared with Internet Archive. To broaden the search, enter your search terms on the Internet Archive main page and click GO.
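
If you’d rather run this kind of search from a script, Internet Archive also offers an advanced search page (archive.org/advancedsearch.php) that can return results as JSON. The sketch below assumes that endpoint and its `q`, `fl[]`, `rows`, `page` and `output` parameters, plus the `mediatype` and `collection` search fields; the helper names are hypothetical. The collection identifier in the usage example is a placeholder too, since I haven’t confirmed the ID used for the Rutgers collection (a collection’s ID is the last part of its archive.org/details/ address).

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: Internet Archive's advanced search, returning JSON.
SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_query(terms: str, collection: str = "") -> str:
    """Limit the search to Texts and, optionally, one library's collection."""
    query = f"({terms}) AND mediatype:texts"
    if collection:
        # Hypothetical collection identifier supplied by the caller.
        query += f" AND collection:{collection}"
    return query


def build_search_url(terms: str, collection: str = "", rows: int = 50, page: int = 1) -> str:
    """Build the full request URL for one page of results."""
    params = [
        ("q", build_query(terms, collection)),
        ("fl[]", "identifier"),  # fields to return for each hit
        ("fl[]", "title"),
        ("rows", str(rows)),
        ("page", str(page)),
        ("output", "json"),
    ]
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def hits(response: dict) -> list:
    """Extract (identifier, title) pairs from a parsed search response."""
    docs = response.get("response", {}).get("docs", [])
    return [(doc.get("identifier"), doc.get("title")) for doc in docs]


def search(terms: str, collection: str = "") -> list:
    """Run the search against the live site (needs an internet connection)."""
    with urlopen(build_search_url(terms, collection), timeout=30) as resp:
        return hits(json.load(resp))
```

For example, `search("Sussex", "rutgers_collection_id")` would look for Sussex-related texts in one collection, while leaving out the collection searches all of Texts, much like typing terms on the main page.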

While I’ve only picked a few items to write about today, there are many more treasures waiting for you to find them at Internet Archive. Also, be sure to visit the FAQ webpage for answers to questions about material not covered in today’s post. Have fun searching!


2 thoughts on “Searching the ‘Archives’ of the Internet”

  1. It seems nothing has been added to the Wayback Machine part of Internet Archive since 2008. I also put in some of the subjects listed above and got no results.

    • Mike,
      You can read more about how the websites are “crawled” to be included in the Wayback Machine in the FAQ:

      https://archive.org/about/faqs.php#The_Wayback_Machine

      It is important to note that there is a time lag of 6-14 months before websites/pages show up in the WM. Also, they respect “robots.txt” instructions, and JavaScript can make a webpage harder to archive.

      As for replicating the search results in my post, be sure to switch from the Wayback Machine (WEB) to the TEXTS section of Archive.org. The Wayback Machine is for websites/webpages only. If you’re seeking digitized texts/books/pamphlets/etc., you need to select the correct section to conduct the search.

      TEXTS:

      https://archive.org/details/texts

      Wayback Machine:

      https://archive.org/web/
