Bad Links
"I won't complain. I just won't come back." - Brown & Williamson advertisement

This site has been through several incarnations. For the earliest, Alan Francis guest-hosted it off his old site (twelve71). After getting a domain name, I hosted it for a time on the ThoughtWorks gigaton server (a sorta catch-all server with free space for employees). That was great (read: free), but the fact that it got used for all kinds of things made it less reliable than I wanted, so I went hunting for a permanent home. Again, Alan was there to help; this time in the form of recommending Bytemark. I've been there ever since. Great service, great price. Just the right balance of control and reliability that I was looking for.

With all that moving around and rebuilding, some links have rotted away, and there are some links out there that don't go where they should. I wanted to fix that, so I tackled the first batch this week: links to URLs that point to Alan's old site. In other words, once again, I looked to Alan for help.

Here's an example of the problem:

Keith Ray links to an early posting here. That link points to: http://www.twelve71.com/caputo/archive/000236.html. From the context of Keith's posting, it's clear that the link is meant for my take on exceptions (a rebuttal of an article from Joel Spolsky). The real page is actually named archives/000009.html. What to do?

There are two challenges: getting traffic that goes to the folder on Alan's machine to come to mine, and redirecting archive links to their new locations. Alan dealt with the former, and I the latter. Here's how we did it:

Alan was kind enough to add a permanent redirect from the folder to my site. He did this by adding the following to his httpd.conf (Apache, of course):

# requests for the /caputo folder are redirected to the new site
Redirect permanent /caputo http://www.williamcaputo.com
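
To make the effect of that directive concrete, here is a rough Python sketch of the prefix mapping it performs: anything after the matched /caputo prefix is carried over onto the target URL. This is only an illustration, not Apache's own code, and the function name redirect_target is made up for the example.

```python
def redirect_target(request_path,
                    prefix="/caputo",
                    target="http://www.williamcaputo.com"):
    """Return the URL a prefix redirect would send the client to.

    Illustrative only: mimics how a prefix redirect appends the rest of
    the path to the target. Returns None when the prefix does not match.
    """
    # Match the prefix itself or anything below it, but not "/caputoxyz".
    if request_path == prefix or request_path.startswith(prefix + "/"):
        return target + request_path[len(prefix):]
    return None


print(redirect_target("/caputo/archive/000236.html"))
# prints http://www.williamcaputo.com/archive/000236.html
```

So Keith's old link ends up asking my server for /archive/000236.html, which is why the rest of the work had to happen on my side.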

If the directory and file structure on my site were the same, this would just work, but I've changed some things since then, so I had more work to do. The biggest difference is that my permanent links used to live under a directory called '/archive/' but are now located in '/archives/', so I added the old '/archive/' folder back. Next, the articles under the old folder had different names than they do now: the version of MovableType that I use annoyingly uses the index number of the database entry for the file name. (This has changed in newer versions.) Since I didn't want to copy the files to both folders, and the numbers have changed, I decided to handle this at the OS level via symlinks. However, Apache does not follow symlinks unless the FollowSymLinks option is enabled for the directory, so I had a change to make to my httpd.conf too:

<Directory /path/to/wwwroot/archive/>
    Options FollowSymLinks
</Directory>

This allows symlinks only in the archive folder. Now I can create a symlink at archive/000236.html that points to archives/000009.html. The only downside is that they actually show up as two different pages in my traffic stats, but I consider that an advantage: it will be easier for me to track how much traffic actually goes through the older "sites" (and possibly notify other websites to update their links).
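
For more than a handful of pages, creating the links by hand gets tedious. Below is a minimal Python sketch of one way to script it, assuming a table of old-to-new file names. Only the 000236/000009 pair comes from this post; the names OLD_TO_NEW and link_old_to_new are hypothetical, and the web root is whatever path your server uses.

```python
import os

# Old file name -> new file name. Only this one pair comes from the post;
# a real table would list every entry whose number changed.
OLD_TO_NEW = {"000236.html": "000009.html"}


def link_old_to_new(wwwroot, mapping=OLD_TO_NEW):
    """Create archive/<old> symlinks pointing at archives/<new>.

    Returns the list of link paths that were created.
    """
    old_dir = os.path.join(wwwroot, "archive")
    os.makedirs(old_dir, exist_ok=True)
    created = []
    for old_name, new_name in mapping.items():
        link_path = os.path.join(old_dir, old_name)
        if os.path.lexists(link_path):
            continue  # leave any existing file or link alone
        # Relative target, so the links survive moving the whole web root.
        os.symlink(os.path.join("..", "archives", new_name), link_path)
        created.append(link_path)
    return created
```

Calling link_old_to_new("/path/to/wwwroot") would then create archive/000236.html as a link to ../archives/000009.html. Running it a second time is harmless, since existing links are skipped.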

With that one out of the way, I still need to detect and handle dead links (i.e. 404 errors). I need to investigate log readers for that purpose (time is at a premium, suggestions welcome). I also need to deal with incorrect links (i.e. links that point to real pages on my site, but not the ones they are supposed to). Some of these are easier than others (more on those as I fix them).
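
Short of a full log reader, a few lines of Python can give a first list of dead links. The sketch below assumes the access log is in Apache's common or combined log format, where the quoted request line is followed by the status code. The function name count_404s and the sample log lines are invented for illustration.

```python
import re
from collections import Counter

# Matches the quoted request line and the status code that follows it,
# e.g. "GET /archive/000999.html HTTP/1.0" 404 209
LOG_LINE = re.compile(r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def count_404s(lines):
    """Count requests that returned 404, keyed by requested path."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            hits[match.group("path")] += 1
    return hits


# Made-up sample lines, not taken from a real log.
sample = [
    '10.0.0.1 - - [10/Oct/2004:13:55:36 -0700] "GET /archive/000999.html HTTP/1.0" 404 209',
    '10.0.0.2 - - [10/Oct/2004:13:56:01 -0700] "GET /archives/000009.html HTTP/1.0" 200 5120',
    '10.0.0.3 - - [10/Oct/2004:13:57:12 -0700] "GET /archive/000999.html HTTP/1.0" 404 209',
]
for path, count in count_404s(sample).most_common():
    print(count, path)
# prints: 2 /archive/000999.html
```

Since count_404s accepts any iterable of lines, an open log file can be passed in directly. The most frequently missed paths are the ones most worth fixing first.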

This might seem like a lot of work. Granted, I could simply say: "That's the problem of those who linked to me -- they should keep links up to date" but that doesn't really help those trying to find their way here -- nor does it help me get more readers. In the end, that's just bad customer service... not something I believe in.

As a bonus, I got to learn a bit more about Apache. I just hope things don't change more; keeping these pages working would then be quite a challenge.