Sunday, November 28, 2010

Redirect All (Broken) Links from any Domain via HTAccess

Here’s the scene: you have been noticing a large number of 404 requests coming from a particular domain. You check it out and realize that the domain in question has a number of misdirected links to your site. The links may resemble legitimate URLs, but because of typographical errors, markup errors, or outdated references, they are broken, leading to nowhere on your site and producing a nice 404 error for every request. Ugh. Or, another painful scenario would be a single broken link on a highly popular site. For example, you may have one of your best posts mentioned in the SitePoint forums, but the person leaving the link completely botched the job:
(Read it here: http://domain.tld/path/popular-post/)
Ugh. Thanks for the hundred-thousand 404 errors, moron.
Fortunately, fixing either of these scenarios is relatively easy using a little HTAccess magic. All you need is an Apache-powered server with the powerful mod_rewrite module installed, and of course the ability to edit either your server configuration file or the root HTAccess file for your domain. Once you have that, here’s the code you need to redirect those pathetic broken links to the target of your choice:
# REDIRECT BROKEN LINKS FROM SPECIFIC DOMAIN
<IfModule mod_rewrite.c>
 # enable the rewrite engine (harmless if it is already on elsewhere in the file)
 RewriteEngine On
 RewriteCond %{REQUEST_FILENAME} .*
 RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)?problem-domain\. [NC]
 RewriteRule (.*) http://redirected-domain.tld/target.html [R=301,L]
 # RewriteRule (.*) - [F,L]
</IfModule>
Place this code into your root HTAccess file and replace “problem-domain” and “http://redirected-domain.tld/target.html” with the problem domain and the redirect resource, respectively. The redirect resource may be anything: a web page, your home page, a different domain, a script, whatever. To deliver a “403 Forbidden” error for all such requests (instead of redirecting to an alternate resource), comment out the first RewriteRule line and uncomment the second one. No other editing is required. Upload to your server and verify the results by clicking on one (or more) of the broken links on the problem domain. For the record, this method functions as follows:
  1. check for the required Apache module
  2. match every requested file (this condition is always true)
  3. limit the rewrite to requests referred by the problem domain
  4. perform the rewrite by redirecting all matching requests to the specified resource
  5. alternate rewrite rule for delivering a forbidden error message
  6. close the module-check container
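One caveat about step 4: if the redirect target lives on the same host that serves this HTAccess file, the browser's follow-up request for the target may carry the same referrer and match the rule again, which can produce a redirect loop. Here is a minimal sketch of a guard against that, assuming the target is a file named “/target.html” on your own domain (both that path and “problem-domain” are placeholders, not values from a real setup):

```apache
# SKETCH: REDIRECT BROKEN LINKS, BUT NEVER REDIRECT THE TARGET ITSELF
# "problem-domain" and "/target.html" are placeholder values
<IfModule mod_rewrite.c>
 RewriteEngine On
 # skip requests for the redirect target to avoid a loop
 RewriteCond %{REQUEST_URI} !^/target\.html$
 # match requests referred by the problem domain
 RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)?problem-domain\. [NC]
 RewriteRule (.*) http://redirected-domain.tld/target.html [R=301,L]
</IfModule>
```

If the target sits on a different domain, the extra condition should not be needed.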
I use this method all the time. Here is a working example that I recently removed because the site was taken offline:
# REDIRECT BROKEN LINKS FROM spinfeed.com
<IfModule mod_rewrite.c>
 RewriteCond %{REQUEST_FILENAME} .*
 RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)?spinfeed\. [NC]
 RewriteRule (.*) http://perishablepress.com/note.txt [R=301,L]
 # RewriteRule (.*) - [F,L]
</IfModule>
Apparently, the now-defunct site spinfeed.com had installed one of my free WordPress themes but didn’t bother editing the required code. My mistake was using URL examples from my own domain. Long story short, I was seeing frequent 404 errors resulting from the spinfeed.com site, and I wanted to resolve the issue. After several unsuccessful attempts at contacting the webmaster of the site, I eventually turned to HTAccess to deliver the message that dude needed to fix his links. So I uploaded the previous example and then created a file called “note.txt” with the following message:
Hello!
The Webmaster at spinfeed.com needs to update the current WordPress theme in order to resolve this issue.
Please contact me via http://perishablepress.com/press/contact/ for further assistance.
Thanks and regards,
Jeff
This seemed like a good idea, although I never did hear anything from the spinfeed webmaster. Oh well, the site no longer exists, so the problem was solved nonetheless. :)
And that’s a wrap for this post. Keep in mind that the method described in this article will redirect all URL requests from a specific domain, not just the 404 requests. To redirect only a few broken links instead of everything, use Apache’s excellent Redirect directive instead:
Redirect 301 /blog/old-post-01/ http://domain.tld/new-post-01/
Redirect 301 /blog/old-post-02/ http://domain.tld/new-post-02/
Redirect 301 /blog/old-post-03/ http://domain.tld/new-post-03/
That will suit you much better when dealing with only a handful of broken links. Otherwise, if the problems are severe, just nuke ‘em with teh heavy stuff! :)
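There is also a middle ground: keep the referrer check, but only redirect requests that do not map to a real file or directory on disk. The following is a rough sketch of that idea rather than a tested recipe; “problem-domain” and the target URL are placeholders as before. It assumes your valid URLs correspond to actual files, so on a site with rewritten permalinks (WordPress, for example) it would still catch valid posts.

```apache
# SKETCH: REDIRECT ONLY NONEXISTENT RESOURCES REQUESTED VIA THE PROBLEM DOMAIN
# "problem-domain" and the target URL are placeholder values
<IfModule mod_rewrite.c>
 RewriteEngine On
 # continue only if the request is not an existing file
 RewriteCond %{REQUEST_FILENAME} !-f
 # continue only if the request is not an existing directory
 RewriteCond %{REQUEST_FILENAME} !-d
 # match requests referred by the problem domain
 RewriteCond %{HTTP_REFERER} ^https?://([^.]+\.)?problem-domain\. [NC]
 RewriteRule (.*) http://redirected-domain.tld/target.html [R=301,L]
</IfModule>
```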
