Canadian Social Research Links

How the Internet Archive
(www.archive.org)
Can Help You Beat
404 Fury


Last updated November 22, 2013




What's 404 Fury, you ask?

That's when you click on a content link (say, to a report title or to a specific directorate in a government department) that you've saved or found on Canadian Social Research Links or a similar website, and the next thing you see is a "404 Error - Page not found" page. "ARGH", you might say, "I sure wish I'd saved that report before it got vapourized." I say that all the time, along with a few choice colourful phrases. Governments keep changing their sites, almost as if they're deliberately trying to confuse or frustrate people who are looking for information.

How to beat 404 Fury?
Wayback Machine to the Rescue!

The Wayback Machine - Archive.org (Internet Archive)
"Browse through billions and billions of web pages archived from 1996 to a few months ago. To start surfing the Wayback Machine, type in the web address of a site or page where you would like to start, and press enter. Then select from the archived dates available. "

The Wayback Machine is a tool that lets you revisit bygone versions of websites and web pages and browse their archived content. Paste a URL in the box on the home page, and the Wayback Machine will retrieve as many copies of that page as it has archived. If you paste http://www.canadiansocialresearch.net/ into the Wayback Machine, for example, you'll see (as at November 5, 2013) links to 325 separate versions of this entire website (not just the home page) going right back to December 2000, two months after I purchased my own domain name ("canadiansocialresearch.net").
Play with it - you don't have to register or anything, and you can't break it.
And don't miss the special collections of historical links!
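
If you'd rather ask the Wayback Machine from a script than from the text box, here is a minimal Python sketch. It assumes the Internet Archive's public "availability" endpoint (https://archive.org/wayback/available) and the JSON reply shape shown in the comments; neither is described on this page, so treat both as assumptions to check against the Archive's own documentation. The function names are mine, purely for illustration.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint of the Internet Archive's availability service.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(page_url, timestamp=None):
    """Build the address that asks for the snapshot closest to `timestamp`.

    `timestamp` is an optional string such as "20131105" (YYYYMMDD).
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response_text):
    """Pull the archived address out of a JSON reply.

    Assumed reply shape:
      {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    Returns None when no snapshot is reported.
    """
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


if __name__ == "__main__":
    # Only prints the query address; fetching it is left to the reader.
    print(availability_query("http://www.canadiansocialresearch.net/", "20131105"))
```

The sketch deliberately stops short of fetching anything: paste the printed address into a browser (or open it with urllib.request) and pass the reply text to closest_snapshot().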

Wayback Bookmarklet - Click this link, then scroll halfway down the page to "Take The Wayback Machine With You". Drag the Wayback link to your browser's toolbar (also called a Links bar). Now, when you click a link to a website OR to website content and the link is dead, just click your Wayback toolbar link and you'll be transported to a calendar with links to any historical versions of the website or the file at the Wayback Machine. Select the latest version before the link went dead.
[You'll just have to play with this tool to learn how useful it can be to help you retrieve lost files and sites!]

What does the Wayback Machine look like? (photo)
The Wayback Machine contains (as of April 2009) 150 billion archived pages in a 20' by 8' by 8' box that sits in Santa Clara, courtesy of Sun Microsystems.
(Click the link to see the actual physical Wayback Machine - it looks like one of those transatlantic shipping containers.)
It serves about 500 queries per second from the approximately 4.5 Petabytes (4.5 million gigabytes) of archived web data.

Where does the name Wayback Machine originate?
From the Rocky and Bullwinkle Show (a Saturday morning cartoon show from the 1960s) - it's the name of Mister Peabody's time-travel machine.

Here's a practical example of how Archive.org works.

In 2005, the Ontario Ministry of Community and Social Services created a page to celebrate its 75th anniversary. The page, which included some very interesting historical articles on welfare, was summarily deleted a year or so later, because, well, because the 75th anniversary had come and gone, and who cares about how welfare operated in Ontario in 1915 or 1920? Not the Ontario government webmaster, apparently.

Ministry of Community and Social Services:
Supporting Ontario's communities since 1930

The year 2005 was the 75th anniversary of the Ontario Ministry of Community and Social Services. To mark the occasion, the Ministry posted to its website a collection of six historical factoids and vignettes about welfare as it existed in the first quarter of the 20th century and even before. When I checked the link in the summer of 2007, not only had the page disappeared from the MCSS website, but the above URL now (still in 2009) takes the cyber-visitor to "Thriving Communities", the ministry's framework for a contemporary approach to supporting Ontarians. That's all well and good, but six historical accounts of welfare in Ontario were simply discarded like yesterday's trash, without so much as a "does-anybody-even-care-about-history-out-there" warning.

Solution:
I went to Archive.org and copied the URL of the Ministry into the Wayback Machine (the text box near the top of the page). Then, on the Archive.org results page, I selected the link to the October 2004 site snapshot. Then, on the archived MCSS home page that appeared, I simply clicked on the 75th anniversary button and found the "missing" page and all its secondary links, all live.

Here's the URL of the archived copy of this page from Archive.org:
http://web.archive.org/web/20050518172022/www.mcss.gov.on.ca/CFCS/en/Celebrating75Years/default.htm
TIP: scroll down to "Stories from our Past" for links [you have to click on the word "more" in each case] to the following six short historical bits about welfare and social services in Ontario in the last century:
* Origins of the welfare department (1930)
* Breaking 650 lbs. of rocks to qualify for welfare in 1915
* Houses of refuge
* The Mothers' Allowance Act (1920)
* The first foray into the field of day care in the mid-40s
* The Soldiers' Aid Commission (est. 1915)
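
The archived address above packs two things into one line: a 14-digit capture stamp (year, month, day, hour, minute, second) and the original address of the page. Here is a small Python sketch that takes such an address apart and puts it back together. The function names are hypothetical, and I'm assuming the stamp is always the full 14 digits, as it is in the example.

```python
from datetime import datetime

# Both schemes are accepted; the example on this page uses http.
WAYBACK_PREFIXES = ("http://web.archive.org/web/", "https://web.archive.org/web/")
STAMP_FORMAT = "%Y%m%d%H%M%S"  # assumed: full 14-digit capture stamp


def split_wayback_url(archived_url):
    """Return (capture time, original address) for an archived address."""
    for prefix in WAYBACK_PREFIXES:
        if archived_url.startswith(prefix):
            rest = archived_url[len(prefix):]
            stamp, _, original = rest.partition("/")
            return datetime.strptime(stamp, STAMP_FORMAT), original
    raise ValueError("not a web.archive.org/web/ address")


def build_wayback_url(original, captured):
    """Rebuild an archived address from its two parts."""
    return WAYBACK_PREFIXES[0] + captured.strftime(STAMP_FORMAT) + "/" + original


if __name__ == "__main__":
    url = ("http://web.archive.org/web/20050518172022/"
           "www.mcss.gov.on.ca/CFCS/en/Celebrating75Years/default.htm")
    when, page = split_wayback_url(url)
    print(when.isoformat(), page)
```

For the MCSS page, the stamp 20050518172022 reads as May 18, 2005 at 17:20:22.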

TIP: you can use this same technique to retrieve many (but sadly, not all) "404" pages that have disappeared from the Web.
Sites that are database-driven, generate dynamic web pages, or have robots.txt exclusions often can't be archived.
(Long Live HTML!!)
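
To see what a robots.txt exclusion looks like from a crawler's side, here is a minimal Python sketch using the standard library's urllib.robotparser. The sample rules, the example.com addresses, and the crawler name "ia_archiver" are all assumptions for illustration; check the Internet Archive's own documentation for how its crawler actually identifies itself and which rules it currently follows.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shuts one crawler out entirely and
# keeps every other crawler out of /private/ only.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_crawl(robots_txt, user_agent, page_url):
    """Report whether the given rules let `user_agent` fetch `page_url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    page = "http://www.example.com/reports/index.html"
    for agent in ("ia_archiver", "SomeOtherBot"):
        print(agent, may_crawl(SAMPLE_ROBOTS_TXT, agent, page))
```

With these sample rules, the crawler named in the first block is refused every page, while other crawlers are refused only what sits under /private/.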

Put the Wayback Machine right in your browser:
The Wayback Machine Bookmarklet

Drag this link up to your browser's Links or Bookmarks bar:
Wayback

When you land on a 404 web page and you want to find an earlier version of that page,
just click the toolbar link - you'll be transported to any existing archived versions of the page in the Wayback Machine.
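
The bookmarklet's own code isn't reproduced on this page, but the idea can be sketched in a few lines of Python: glue the dead address onto the Wayback Machine's calendar listing address and open the result. The "web/*/" listing pattern and the function names here are my assumptions, not a copy of the real bookmarklet.

```python
import webbrowser

# Assumed pattern for the calendar page listing every capture of an address.
CALENDAR_PREFIX = "https://web.archive.org/web/*/"


def wayback_calendar_url(dead_url):
    """Return the address of the capture calendar for `dead_url`."""
    return CALENDAR_PREFIX + dead_url


def open_in_wayback(dead_url):
    """Open the capture calendar in the default browser."""
    webbrowser.open(wayback_calendar_url(dead_url))


if __name__ == "__main__":
    # Prints the address only; call open_in_wayback() to launch a browser.
    print(wayback_calendar_url("http://www.mcss.gov.on.ca/"))
```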

More info about The Internet Archive from Wikipedia

The Internet Archive (IA) - from Wikipedia:
"The Internet Archive (IA) [also called the "Wayback Machine"] is a nonprofit organization dedicated to building and maintaining a free and openly accessible online digital library, including an archive of the Web. With offices and data centers located in California, the archive includes snapshots of the World Wide Web - archived copies of pages, taken at various points in time, along with software, movies, books, and audio recordings. To ensure the stability and endurance of the Internet Archive, its collection is mirrored at the Bibliotheca Alexandrina in Egypt. The IA makes its collections available at no cost to researchers, historians, scholars, and the general public. It is a member of the American Library Association and is officially recognized by the State of California as a library."

The Government of Canada Web Archive

Government of Canada Web Archive:
http://www.collectionscanada.gc.ca/webarchives/index-e.html
Since the Fall of 2007, Library and Archives Canada has been harvesting the web domain of the Federal Government of Canada (starting in December 2005). Client access to the content of the Government of Canada Web Archive is provided through searching by keyword, by department name, and by URL. At the time of its launch in Fall 2007, approximately 100 million digital objects (over 4 terabytes) of archived Federal Government website data was made accessible via the LAC website. The GC WA currently contains over 170 million digital objects and more than 7 terabytes of data.
Source:
Library and Archives Canada

Comments:

1. This site is definitely worth closer examination if you're looking for a federal government report or other resource that has disappeared from the Internet since early 2006. As the blurb above states, you can search through superseded versions of federal websites by keyword, department name or URL. I highly recommend that you consider using both the Government of Canada Web Archive and the Internet Archive as complementary tools; the former contains only three years' worth of digital objects (reports, tables, etc.), whereas the Internet Archive's "Wayback Machine" contains digital objects going right back to 1996.

2. The Canadian government archive is a spring chicken and a lightweight compared to the Internet Archive.
To put everything into perspective, the government archive only goes back to the end of 2005, and it includes *only* sites that belong to the Government of Canada. As per the above blurb, it currently (in 2009) contains "over 170 million digital objects and more than 7 terabytes of data". According to Wikipedia (see the article above), "As of April 2009, the Wayback Machine contained about 4.5 Petabytes (4.5 million gigabytes) of archived web data, and it was growing at a rate of 100 terabytes per month."
[Snarky factoid:
The Canadian Government site boasts of "more than 7 terabytes of data", which is about the average size of the home collection of real audiophiles and video collectors.]
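
For anyone who wants the numbers side by side, here is a quick back-of-the-envelope calculation in Python using the 2009 figures quoted above. It uses decimal units (1 petabyte = 1 million gigabytes, as in the Wikipedia quote) and assumes a 30-day month; the function names are just for illustration.

```python
GB_PER_TB = 1_000        # decimal units, matching "4.5 million gigabytes"
GB_PER_PB = 1_000_000

WAYBACK_GB = 4.5 * GB_PER_PB          # Wayback Machine, April 2009
GC_ARCHIVE_GB = 7 * GB_PER_TB         # Government of Canada Web Archive, 2009
WAYBACK_GROWTH_GB_PER_MONTH = 100 * GB_PER_TB


def size_ratio(big_gb, small_gb):
    """How many times larger the first collection is than the second."""
    return big_gb / small_gb


def days_to_add(amount_gb, growth_gb_per_month, days_per_month=30):
    """Days needed to add `amount_gb` at the given monthly growth rate.

    The 30-day month is an assumption made for round numbers.
    """
    return amount_gb / growth_gb_per_month * days_per_month


if __name__ == "__main__":
    print(round(size_ratio(WAYBACK_GB, GC_ARCHIVE_GB)))
    print(round(days_to_add(GC_ARCHIVE_GB, WAYBACK_GROWTH_GB_PER_MONTH), 1))
```

On those figures, the Wayback Machine was roughly 643 times the size of the government archive, and at 100 terabytes per month it was adding the government archive's entire 7 terabytes about every two days.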


TIP:
How to Search for a Word or Expression on a Single Web Page 

Open any web page in your browser, then hold down the Control ("Ctrl") key on your keyboard and type the letter F to open a "Find" window. Type or paste in a key word or expression and hit Enter - your browser will go directly to the first occurrence of that word (or those exact words, as the case may be). To continue searching using the same keyword(s) throughout the rest of the page, keep clicking on the FIND NEXT button.
Try it. It's a great time-saver!
 

 

Site created and maintained by:
Gilles Séguin (This link takes you to my personal page)
E-MAIL: gilseg@rogers.com