.htaccess Hacking for Fun and Profit
The Apache server provides directory-level configuration via .htaccess files. This file can override Apache default configuration and change it for the local directory.
If you are not a lazy blogger, you may be interested in some tips I recently discovered for optimizing your .htaccess file to improve your search engine position, avoid spam comments and protect your content.
Redirection
Search engines see http://www.mapelli.info and http://mapelli.info as two different sites. This is bad for two reasons:
- search engines penalize sites with duplicated content, removing some (if not all) of the duplicated pages
- some sites will link to you as www.yoursite.com and others as yoursite.com, so your PageRank and your link popularity are split between the two addresses
To avoid this, you can simply redirect all requests from http://www.yoursite.com to http://yoursite.com, or vice versa, by adding some directives to the .htaccess file in your web root.
Use the following code:
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
Explanation:
RewriteEngine On
This activates the rewrite engine (that is, the ability to change the requested URL to something else).
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
This says that the rewrite action (specified in the RewriteRule line) should be applied only if the host name in the request is not (that’s the !) exactly www.domain.com. The ^ and $ anchor the start and the end of the string, and the backslashes make the dots match literally. The [NC] flag makes the check case-insensitive.
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
This says that each request matching the RewriteCond should be rewritten as follows: capture the whole requested path, from its start (that’s the ^) to its end (that’s the $), whatever it contains (that’s the .*), in the first variable (called $1); rewrite it as http://www.domain.com/$1; redirect using 301 Moved Permanently (that’s the R=301); and stop applying further rewrite rules (L for Last). This means a request for http://domain.com/foo will be redirected to http://www.domain.com/foo.
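If you prefer the bare domain as the canonical address, a minimal sketch of the www-to-bare-domain direction could look like the following. It assumes the same placeholder domain.com and a .htaccess file in the web root; adapt the host name to your own site.

```apache
RewriteEngine On
# Apply the rule only when the request came in on the www host.
# The dots are escaped so that they match literally.
RewriteCond %{HTTP_HOST} ^www\.domain\.com$ [NC]
# Send the same path to the bare domain with a permanent redirect.
RewriteRule ^(.*)$ http://domain.com/$1 [R=301,L]
```

Pick one direction only: enabling both sets of rules at once would bounce visitors back and forth between the two hosts.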
Spam Blocking
wp-comments-post.php protection
In WordPress, when a user posts a comment, the file wp-comments-post.php is accessed.
A normal user posts the comment from one of your blog’s pages, so the request carries an internal referrer (i.e. the page that took the user to wp-comments-post.php).
A spammer accesses wp-comments-post.php directly, with no referrer or with an outside (not from your domain) referrer. You can use this difference to block spam comments via .htaccess. If you don’t use WordPress, change the file name to the one that fits your software; the technique can still be used.
Here’s the code
RewriteEngine On
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} wp-comments-post\.php
RewriteCond %{HTTP_REFERER} !yourdomain\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^$
RewriteRule .* http://%{REMOTE_ADDR}/ [R=301,L]
Here’s the explanation:
RewriteEngine On
Activate the rewrite engine.
RewriteCond %{REQUEST_METHOD} POST
If the request method is POST,
RewriteCond %{REQUEST_URI} wp-comments-post\.php
and the request URI (the page requested) contains wp-comments-post.php,
RewriteCond %{HTTP_REFERER} !yourdomain\.com [OR]
and either the referrer does not contain your domain (the [OR] flag combines this condition with the next one using a logical or),
RewriteCond %{HTTP_USER_AGENT} ^$
or the user agent is empty,
RewriteRule .* http://%{REMOTE_ADDR}/ [R=301,L]
then redirect the request back to the client’s own IP address. You can use any other URL as the target instead.
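Redirecting is not the only possible response. As a hedged variant, assuming you would rather refuse the request outright, the rule can answer with 403 Forbidden by using the F flag. As before, yourdomain.com is a placeholder.

```apache
RewriteEngine On
# Only look at comment submissions
RewriteCond %{REQUEST_METHOD} POST
RewriteCond %{REQUEST_URI} wp-comments-post\.php
# Referrer from another site, or no user agent at all
RewriteCond %{HTTP_REFERER} !yourdomain\.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^$
# "-" means "no substitution"; F makes Apache reply with 403 Forbidden
RewriteRule .* - [F,L]
```

Keep in mind that either version also turns away legitimate visitors whose browser or firewall strips the referrer.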
Tor servers blocking
The Tor network is a noble thing, but it’s often used by spammers to run spambots.
I would not recommend this, but if you really need to, you can block the entire network of Tor proxies using the tor blacklist (just copy the content of the file into your .htaccess file).
Protection
IP banning
Say you want to block a spammer that always uses the same IP address:
deny from 192.168.0.1
This denies access to 192.168.0.1. Note that you can also use a partial address such as 192.168.0 to ban a whole range of addresses, or network/mask notation such as 192.168.0.0/20 to ban a subnet.
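Put together as a complete block, the variants might look like the sketch below. It assumes the older Order/Allow/Deny access-control syntax used throughout this post (Apache 2.2 style), and all addresses are placeholders.

```apache
# Evaluate Allow first, then Deny; a matching Deny wins
order allow,deny
allow from all
# A single address
deny from 192.168.0.1
# A partial address: everything from 192.168.0.0 to 192.168.0.255
deny from 192.168.0
# A subnet in network/prefix notation
deny from 192.168.0.0/20
```

In practice you would keep only the deny lines you actually need.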
Deny .htaccess access
This can be used to prevent access to the .htaccess file itself:
<Files .htaccess>
order allow,deny
deny from all
</Files>
This way all requests for the .htaccess file will return a 403 error code (Forbidden).
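The order/deny directives above belong to the older access-control syntax. On Apache 2.4 and later, the same idea can be sketched with the Require directive; this assumes your server runs 2.4 and does not load the compatibility module for the old syntax.

```apache
# Assumes Apache 2.4 or later
<Files ".htaccess">
    # Refuse every request for this file
    Require all denied
</Files>
```

Either way, this only affects requests made over HTTP; it does not change who can edit the file on the server.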
Stop hotlinking
If you don’t want other sites to link directly to your images on your server, you can redirect the png/jpg request to a particular image (saying something like “this site is trying to steal my images”) with code like this:
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/.*$ [NC]
RewriteRule .*\.(jpg|png)$ http://www.yourdomain.com/thief.jpg [R,NC,L]
Here’s the explanation:
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/.*$ [NC]
This says that the rule should be applied if the referrer does not start with http://www.yourdomain.com/ or http://yourdomain.com/ (case-insensitive).
RewriteRule .*\.(jpg|png)$ http://www.yourdomain.com/thief.jpg [R,NC,L]
This says that requests ending with .jpg or .png (not case-sensitive) should be redirected to http://www.yourdomain.com/thief.jpg, and that this will be the last rule to be applied (the L flag).
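Some browsers and firewalls send no referrer at all, and the replacement image is itself a .jpg that is usually requested with the foreign referrer still attached. A slightly more defensive sketch, assuming you want to let empty referrers through and to exclude thief.jpg from the rule, might look like this. The two extra conditions are additions for illustration, not part of the rule above.

```apache
RewriteEngine On
# Let requests without any referrer through
RewriteCond %{HTTP_REFERER} !^$
# Foreign referrers only
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/ [NC]
# Do not redirect the replacement image to itself
RewriteCond %{REQUEST_URI} !/thief\.jpg$
RewriteRule \.(jpg|png)$ http://www.yourdomain.com/thief.jpg [R,NC,L]
```

Allowing empty referrers lets some hotlinkers slip through, but it avoids showing the replacement image to your own visitors.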
Resources
Some useful resources:
- Apache mod_rewrite documentation
- Apache .htaccess tutorial
- regular expression tutorial
- regular expression online tool
- Wikipedia .htaccess page
This entry was posted by francesco mapelli on 2006/12/07 at 4:56 pm, and is filed under Uncategorized.
about 6 years ago
Your “Spam Blocking” hint worked beautifully; I went down from a thousand spammy post attempts a day down to zero now that the junk is blocked at Apache level.
There is one slight omission on the first code block, however (the complete code excerpt before the explanation.)
This line:
RewriteCond %{HTTP_REFERER} !.*yourdomain.com.*
Should have OR at the end:
RewriteCond %{HTTP_REFERER} !.*yourdomain.com.* [OR]
about 6 years ago
Thanks bfd, I fixed it.
I’m glad you find these hints useful :)
about 6 years ago
Actually, this can also block visitors whose firewalls clear or change the referrer field (as is possible in Outpost) from commenting on your site.
about 6 years ago
Yes, that’s an issue one should consider… but looking at the stats, they’re not so frequent. :)
One should deny access only if one really needs to, as most situations can be handled with spam filters.
about 6 years ago
Some fixes to your awesome article! from: http://www.askapache.com/2006/htaccess/htaccesselite-ultimate-htaccess-article.html
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www.domain.com [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [R=301]
change to
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_HOST} !^www\.domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/.*$ [NC]
RewriteRule .*\.(jpg|png)$ http://www.yourdomain.com/thief.jpg [R,NC,L]
Change to
RewriteEngine On
RewriteBase /
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/.*$ [NC]
RewriteRule \.(gif|jpg|swf|flv|png)$ http://www.yourdomain.com/legal.gif [R=302,L]
about 6 years ago
Really Usable tips.. ^_*
about 6 years ago
I am going to implement the spam blocking technique as soon as I find a chance.
Wonderful writeup. Thanks
about 5 years ago
Your redirection tip is a very good one that is definitely worth putting in. If, however, you do not have access to the server or do not feel like fiddling with .htaccess files, you can solve the search engine issue for the biggest one (Google) in an alternative way as well.
All you need to do is claim the URL as yours via your personal Google account, via “Webmaster tools” under “My services”. Once you have added your site and verified it, you can click on “Manage http://www.yoururl.com” and then, in the sidebar, click on “Preferred domain”. Now you can select whether you want to have http://www.yoururl.com as your domain or just yoururl.com.
There are more interesting tools to use as well, so it is definitely worth claiming your site and making use of the tools.
about 5 years ago
That’s cool. Combined with Akismet and Spam Karma, I don’t think there will be much spam left, if any. But then again, you could always add a “what’s the sum of x + y” question, and hopefully your visitors can count or use a calculator. Cheers! Sameer from Canada
about 5 years ago
This is really great! I’m busy building my first website, and this was the sort of stuff I have been looking for everywhere!
Thanks Guys!
about 5 years ago
That’s cool… I will write soon about URL routing using mod_rewrite….
Thanks
about 5 years ago
Just want to add for Donovan van der Roest: “Preferred domain” can be found under Manage>Tools>Set Preferred Domain.
about 5 years ago
To make sure I understand the section above, in the “Deny .htaccess access” section… are you saying that copying and pasting the following into a blank .htaccess file will lock the file and prevent anyone from changing it other than via FTP?
order allow,deny
deny from all
about 5 years ago
Good article. Very useful for my forum-communities
Thank you
about 4 years ago
Great tips! I think that I’ll place the anti-hotlinking code in asap!
about 4 years ago
Never seen an .htaccess anti-hotlinking technique used before – good stuff, gonna try this out
about 4 years ago
They are really great tips. I knew all these, but never knew how to achieve these. That’s a good tutorial… Thanks :)
Stumbled :)
about 4 years ago
I use all of these rules on my sites. They’re quite simple when you’ve done them a few times.
about 4 years ago
Good post. Thanx for advise :)
about 4 years ago
Thanks for the good post. I have been trying to combine part of the case-insensitive matching with bits of the anti-hotlinking code in order to redirect requests for JPG and GIF files to the same names in lowercase. I recently batch processed a large number of images, and the output files all had lower-case .jpg and .gif extensions, even though some of the original images had upper or mixed case extensions (.JPG, .Jpg, .GIF, .Gif). Now, when the old links point to a URL containing an upper or mixed case extension, I get a 404 not found error. It would be nice to use .htaccess as a quick fix to resolve this. Any ideas?
about 2 years ago
Thanks for this. This will help me a lot. I had been looking for this for many days. Now this will fix my problems.
about 2 years ago
You definitely want to 301 your website to one common URL using .htaccess. The other codes are interesting too :D
Till then,
Jean