How do we fix the blogrot problem?

I’m concerned about linkrot in blogs. Blog entries tend to mention interesting stuff by hyperlinking to news articles, websites, and other blogs. Since it’s so easy to create a link to something (rather than excerpting a relevant paragraph), you’re pretty much guaranteed to get a 404 when you try to visit that link at a later date. This might not look like a problem on the surface, because blogs are an ephemeral, fresh, up-to-the-minute medium, and linkrot usually takes a few weeks or months to set in.

But blogs are also supposed to serve as a diary or journal, so you should be able to come back six months from now and revisit all of the cool stuff you used to think about. That’s when linkrot is going to burn you the worst, because you’ll want to re-read an article or another person’s blog, and most likely it won’t be there anymore.

I think it would be cool if Movable Type or some other popular blogging software could provide a PermaLink feature for external content. I’m thinking of something like the Google cache, which would mirror the content locally and add a header that would say something like:

This is Michael J. Radwin’s blog’s cache of http://www.newsfactor.com/perl/story/19912.html. This cache is the snapshot that I took of the page as I wrote my blog.

The page may have changed since that time. Click here for the current page.
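To make the idea concrete, here is a minimal sketch of how blogging software might stamp such a header onto a locally mirrored page. Nothing here is a real Movable Type feature: the function names `build_cache_banner` and `add_banner` and the `blog-cache-banner` class are made up for illustration, and the sketch assumes the page's HTML has already been downloaded.

```python
import re
from html import escape

# Matches the opening <body> tag, with or without attributes.
_BODY_OPEN = re.compile(r"<body\b[^>]*>", re.IGNORECASE)


def build_cache_banner(blog_owner: str, original_url: str) -> str:
    """Return an HTML banner saying whose snapshot this is and where it came from."""
    safe_url = escape(original_url, quote=True)
    return (
        '<div class="blog-cache-banner">'
        f"This is {escape(blog_owner)}'s blog's cache of "
        f'<a href="{safe_url}">{safe_url}</a>. '
        "This cache is the snapshot that I took of the page as I wrote my blog. "
        "The page may have changed since that time. "
        f'<a href="{safe_url}">Click here for the current page.</a>'
        "</div>"
    )


def add_banner(page_html: str, banner: str) -> str:
    """Insert the banner right after <body>, or at the very top if there is no <body>."""
    match = _BODY_OPEN.search(page_html)
    if match is None:
        return banner + page_html
    return page_html[: match.end()] + banner + page_html[match.end():]
```

The snapshot with its banner would then be saved somewhere in the blog's own archives, so its lifetime is tied to the blog rather than to the site being quoted.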

It would work even better if there were some clever integration with your browser that said: visit this HREF first, but if you get a 404, try this alternative HREF (which happens to point to a snapshot of the page in your blog archives). I’m sure XLink has something like this when you go beyond xlink:type="simple", but I doubt browsers do anything intelligent with it.
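Since browsers don't do this on their own, the same "try the original, fall back to the snapshot" logic could live in a script on the blog's side. The following is only a rough sketch under that assumption: `http_status` and `resolve_link` are hypothetical names, and the check is a plain HEAD request made with Python's standard library.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status of a HEAD request, or 0 if the server can't be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 404, 410, 500 and friends arrive here as exceptions.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and so on.
        return 0


def resolve_link(
    primary: str,
    snapshot: str,
    status_of: Callable[[str], int] = http_status,
) -> str:
    """Prefer the original URL; fall back to the local snapshot if it looks dead."""
    status = status_of(primary)
    return primary if 200 <= status < 400 else snapshot
```

A link checker could run `resolve_link` over every external link at rebuild time and rewrite the dead ones to point at the snapshot. The `status_of` parameter is there so the check can be swapped out (or faked) without touching the network.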

Heck, even blogs themselves are prone to linkrot. I recently decided to switch my Movable Type settings to use Date-based archives instead of Individual Item archives because I rarely write more than one blog entry per day. Clicking on that convenient “Rebuild Site” button caused everything to get rebuilt. But what if someone had already linked to one of my old Individual archives that’s no longer there? Apparently that PermaLink feature is not so “perma”.

I’m encouraged that people are working on solving the linkrot problem in a generalized way, but not everyone is going to care enough to do it right.

2 thoughts on “How do we fix the blogrot problem?”

  1. True. I think I manually removed archive/*.html and then rebuilt. So I guess Movable Type isn’t to blame.

    But other blog systems are less resilient to such changes. I was reading

    http://www.microcontentnews.com/articles/googleblogs.htm

    and I tried to click on the “Critical IP sucks” link:

    http://a.wholelottanothing.org/archived.blah/2/01/2002/#795

    But I got a 404. After visiting the site’s top page and navigating around, it turns out that the correct URL is this one:

    http://a.wholelottanothing.org/archived.blah/02/01/02#795

    Kinda frustrating. It’s these sorts of changes that sneak up on you. I’m sure Matthew Haughey didn’t even think about the fact that external links would break when he changed his blog archive format.
