FIRST AID FOR BROKEN LINKS

How to track down original source material for broken reference links on web sites

Please feel free to link to this page from your own site
to help your readers help themselves in such matters

A Presentation of jrm&aFLUX


by J.R. Mooneyham
_______This page last updated on or about 9-13-03_______



Many reference links to original articles and scientific papers, on this web site and others, will eventually break and produce error messages when clicked. There's little I or other web authors can do about that, other than avoid posting live links we know to be temporary, as many Yahoo! news links were in early 2000.

9-13-03 UPDATE: I now use even URLs I know to be temporary, because of the extra overhead required to exclude them. Leaving out the temporary URLs often causes me to lose track of some very promising information, since I've come to depend on the Favorites lists compiled from years of surfing as the main index to my research collection. Hopefully major resources like Yahoo! will eventually realize that links need some sort of web permanence for the good of humanity, and correct this flaw on their end (even if only through fee-based archives). END UPDATE

So far as I know, as of 2003 no robust, automated system exists for correcting such broken links in real time; or if one does, it's far too expensive for ordinary webmasters to use. The manual methods described here will often work, but they can be time consuming even for a few links. Doing such work for many links on a regular basis could quickly overwhelm even an entire corporate or government department of dozens or hundreds of people. So comprehensive link correction is almost never practical for individual webmasters.

Combine the above with the fact that LOTS of web links get purposely or accidentally broken for many reasons, and that's how you end up with the likely hundreds of thousands or more broken links on the web today.

We webmasters can go through our pages from time to time and disable or completely remove the broken links on our sites. In a small percentage of cases we may even be able to relocate an item and thereby repair the link. But that's about all we can do. I personally tend never to remove a broken link completely, but simply to disable it. Clicking on an errant link can be not only annoying for the user but also risky: some domains get hijacked and no longer take visitors where the link was meant to go, and worse, some such visits might even cause trouble with your PC. By merely disabling such links, while leaving intact any original publication information posted with them (such as article title, publisher, author, and datestamp), I protect readers from several possible problems while still leaving them an accountability trail to follow if and when they wish to verify my sources or consult them for entirely different research purposes.
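For webmasters comfortable with a little scripting, disabling a link while keeping its citation text might be sketched as follows. This is only an illustration, not the method used on this site: the function name `disable_link`, the example URL, and the choice to leave the old address behind as plain text are all assumptions, and the simple pattern matching will only cope with plainly written anchor tags that use double-quoted `href` values.

```python
import re

def disable_link(html, broken_url):
    """Replace the anchor pointing at broken_url with its visible text.

    Hypothetical helper: the old address is kept as plain text so readers
    still have a trail to follow. Only handles simple <a href="..."> tags.
    """
    pattern = re.compile(
        r'<a\b[^>]*\bhref="' + re.escape(broken_url) + r'"[^>]*>(.*?)</a>',
        re.IGNORECASE | re.DOTALL,
    )
    # Keep the link text (group 1) and append the dead URL as plain text.
    return pattern.sub(
        lambda m: f"{m.group(1)} [link disabled: {broken_url}]", html
    )

# Made-up citation for illustration only.
page = ('<p><a href="http://example.com/gone.html">Some Article Title</a>, '
        'Example News, 1-1-03</p>')
print(disable_link(page, "http://example.com/gone.html"))
```

The title, publisher, and date around the link are untouched; only the clickable part goes away.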

Unfortunately, even the half-measures described immediately above can be a lot of work in themselves-- at least for those of us without lots of heavy duty corporate or government resources or expensive software, consultants, and/or programmers to help us out. So even the half-measures don't get performed very often, or across-the-board, by most webmasters. Me included.

But there's no need to despair; there are still ways to track down many of the original items in their entirety even when the original link has broken.

Using a relatively uncommon word or phrase taken from the original citation's reference information and running a Google search with it may bring up alternative locations for the item on the web, either on a live page or within the Google cache.

Uncommon is the key concept here. If the item you're looking for was authored by someone with a most unusual name, like "Gedhferdang", you'd want to try that in a search first. The more unusual the term, the better, because Google's database includes literally millions of items matching more common strings of characters.
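If you find yourself running many such searches, the query address can be assembled by a short script. The sketch below is only one way to do it: the function name `build_search_url` is made up for illustration, and it assumes Google's usual `https://www.google.com/search?q=...` address form. Putting a phrase inside quote marks asks the search engine to match it exactly.

```python
from urllib.parse import urlencode

def build_search_url(terms, exact_phrase=None):
    """Build a Google search URL from uncommon terms and an optional exact phrase."""
    parts = list(terms)
    if exact_phrase:
        # Quote marks request an exact-phrase match.
        parts.append(f'"{exact_phrase}"')
    return "https://www.google.com/search?" + urlencode({"q": " ".join(parts)})

print(build_search_url(["Gedhferdang"],
                       exact_phrase="Web embraces language translation"))
# https://www.google.com/search?q=Gedhferdang+%22Web+embraces+language+translation%22
```

Paste the printed address into your browser's address box to run the search.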

Now that Google also has a news site, if you're really lucky your Google search may not point you to the original item itself, but instead to a fresh update of that item, perhaps even by the same author or publication! This updated information may serve your purposes even better than the source you were originally looking for, as new scientific findings either verify or disprove or expand upon relevant elements of the original reference article.

Keep in mind that the more time you're willing to spend looking, the better your chances of success. I've seen cases myself where I had to check out perhaps hundreds of promising links in search engine results before I found what I was looking for. When this happens to you, be sure to save the source code or HTML of such a found page to your local hard disk, as well as bookmark or add its URL to your Favorites menu-- since you don't want to have to perform such an exhaustive search for the same item again later!

You could also try the search via other search engines. You'll find a list of such engines on this page.

So what happens if you cannot find the item in the freely available, public portion of the web?

In some cases, such as books, or newspaper and magazine articles, the original may be found in hard copy or other forms (like microfilm) at public libraries, using the related clues presented on this site such as publisher, dates, titles, and authorship. And, of course, sometimes a referenced book may still be in print and available for purchase at retailers, if it cannot be found at the library. If no longer in print, copies might still be found via book sellers specializing in out-of-print materials, as well as on auction sites like eBay.

There's also the fact that some web sites routinely move content elsewhere onsite after a certain period of time, which means although the original link is broken, the content may still be there somewhere if that specific site can be searched. In many cases such searches and content retrieval may be free for readers, entailing little more than a bit of extra effort. In other cases however, the reader may be charged to access such archives. In all cases, the first step to such searches is often to extract the primary domain name from a broken link so that the main or Home page of the site may be reached, rather than a 'Page not found' message which may or may not offer such a search function.

How may you extract the primary domain name from a broken link? There are a couple of ways. First, in many browsers, when no page downloads are underway, you may move your mouse pointer over the broken link (no click necessary), and the hidden link address, or URL, will appear in the lower border of the browser window. The leftmost part of this address will usually be the portion needed to reach the Home page or main site of the organization in question. That portion will often end with a suffix such as ".com", ".org", or ".net". Here's an example: let's say "Web embraces language translation" is the broken link. If you move your mouse pointer over the link you should see the URL "http://www.zdnet.com/zdnn/stories/zdnn_smgraph_display/0,3441,2121254,00.html" appear in the bottom border of your browser window. Believe it or not, many URLs are far messier than this one! Of this long string of gobbledygook, you only want the "http://www.zdnet.com/" part. Write it down to minimize spelling errors, then carefully type it into your browser's web site address box near the top of its window (do NOT type in any quote marks), and press your Return key. This should take you to the main page of the site, where a search function of some sort should exist.
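For readers who would rather let a script do the trimming, the same extraction can be sketched in a few lines of Python using the standard `urllib.parse` module. The function name `site_home` is hypothetical; the URL is the example used above.

```python
from urllib.parse import urlsplit

def site_home(url):
    """Return just the scheme and host of a URL, e.g. 'http://www.zdnet.com/'."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not a full web address: {url!r}")
    return f"{parts.scheme}://{parts.netloc}/"

broken = ("http://www.zdnet.com/zdnn/stories/zdnn_smgraph_display/"
          "0,3441,2121254,00.html")
print(site_home(broken))  # http://www.zdnet.com/
```

Everything after the host name is simply dropped, which is exactly what the manual method does.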

Secondly, if you'd rather copy and paste the address, you can do that too, but you'll have to wade through far more gobbledygook to do it. In most browsers there's a menu option somewhere to view the "source code" or "HTML" of the web page you're viewing. Select this option and you'll get a page filled with stuff most people won't understand. You'll have to locate the broken URL you want in this mess, extract the domain name as described above, and copy/paste it into the same browser address box described above. Your browser may or may not allow you to do a search or find within this HTML page for a piece of the URL, which could help a lot.
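If you have saved the page's HTML to disk, a small script can do the wading for you. The following is a minimal sketch using Python's standard `html.parser` module; the names `LinkCollector` and `find_links` and the sample HTML are invented for illustration. It lists every link address on the page that contains a given fragment, such as part of a site name.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def find_links(html, fragment):
    """Return all link addresses in html that contain fragment."""
    collector = LinkCollector()
    collector.feed(html)
    return [link for link in collector.links if fragment in link]

# Sample page source; the second link is made up for illustration.
source = '''<p><a href="http://www.zdnet.com/zdnn/stories/zdnn_smgraph_display/0,3441,2121254,00.html">Web embraces language translation</a>
<a href="http://example.org/other.html">Another item</a></p>'''
for link in find_links(source, "zdnet"):
    print(link)
```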

Lastly, if all else fails, you might email someone a request for help in the matter. If possible, first try any email contact listed in the original web reference, such as the author, or a public relations person affiliated with the work. Scientific reports, such as those from the EurekAlert! service, often include email contacts like this. If such specific contacts aren't available, try emailing a general editor of the publication which originally posted the item. Using the same example as above, you'd look for an editor's email address at http://www.zdnet.com/ and ask for help locating the 'Web embraces language translation' article.

Well, that's it! I hope it helps you find that missing link in your research! -- J.R.

7-3-03 UPDATE: One fact of modern day life that can really give you fits in tracking down an original citation is that today's mainstream internet news reports can be fluid. They are often subject to ongoing changes in response to additional information received, editorial changes to the 'balance' of the story, and more, and such changes are not always noted for readers. So anything and everything about a given online news story, including the title and byline, may change between two separate viewings of the item. Such revisions are called "writethroughs". Keep this in mind in your investigations, for it could render fruitless any search for exact quotes, phrases, titles, author/reporter names, or keywords used in a referring article. If you think you may be caught up in such a 'fluid' reference search, tailor your search to avoid exact terms and names from the original, looking instead for words and phrases which might describe or represent the 'gist' or general meaning or theme of the original article. Substituting synonyms for words in the original item may also help. If the original article content was 'localized' with a particular region or city named, use that to your advantage in a search too. END UPDATE.
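When titles may have been rewritten, it can help to judge search results by how many words they share with the original title rather than by exact match. The sketch below is one rough way to do that, not an established tool: it scores each candidate headline by the fraction of words it has in common with the original (the Jaccard overlap of the two word sets). The function names and the candidate headlines are made up for illustration.

```python
import re

def title_words(title):
    """Lowercase a title and return its set of words."""
    return set(re.findall(r"[a-z0-9]+", title.lower()))

def overlap(a, b):
    """Jaccard overlap of the word sets of two titles (0.0 to 1.0)."""
    wa, wb = title_words(a), title_words(b)
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def rank_candidates(original, candidates):
    """Return (score, title) pairs, best match first."""
    scored = [(overlap(original, c), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

original = "Web embraces language translation"
candidates = [  # hypothetical search-result headlines
    "Stock markets close higher",
    "Language translation tools spread across the Web",
    "Web embraces language translation services",
]
for score, title in rank_candidates(original, candidates):
    print(f"{score:.2f}  {title}")
```

A score near 1.0 suggests a lightly retitled copy of the original; a middling score may still be a rewritten version worth opening. Synonyms are not handled here, so a heavily reworded headline could score low even when it is the right article.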

-- Slashdot | Online News Stories that Change Behind Your Back by Roblimo; May 09, 2002


The above article(s) come from and make references to a collection copyright © 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003 by J.R. Mooneyham (except where otherwise noted in the text). Text here explicitly authored by J.R. Mooneyham may be freely copied and distributed for non-commercial purposes in paper and electronic form without charge if this copyright paragraph and link to jmooneyham.com or jrmooneyham.com are included.



