Blocking bad guys with htaccess

While the combination of a CAPTCHA and the suspicious behavior identification of Bad Behavior proves to be a very effective spam-prevention solution, spam is not the only problem that a webmaster has to worry about. There are several other issues such as bandwidth theft, email harvesters, and hackers.

The .htaccess file lets you give instructions to the server so that it knows how to handle each request. It's a very powerful file, but editing it can be slightly scary because a single typing error can prevent your entire site from loading.

Always back up your .htaccess file

The .htaccess file is one of the most important files on a web site. A single typographical error in that file can bring down your entire web site, and the errors that appear when your site is down may not make it obvious that the .htaccess file is the one causing the problem. Changes to the .htaccess file may also cause your site to behave in unusual ways, without immediately obvious error messages. To save yourself lots of troubleshooting time, always back up your .htaccess file, and be very careful when making changes to it!

Time for action - .htaccess settings to stop bad guys

1. Download your .htaccess file and make a copy of it called htaccess.txt—keep the original as a backup.

2. Open up htaccess.txt.

3. You should see a line that says RewriteEngine On near the top of the file. Read through the file to find where the lines beginning with RewriteCond or RewriteRule end.

4. After those lines, add the following:

RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(.*\.)?yourdomain\.com/.*$ [NC]
RewriteRule .*\.(jpe?g|gif|bmp|png)$ http://www.someimagehostingsite.domain/antihotlink.jpg

5. Also add the following lines:

RewriteCond %{REQUEST_METHOD} HEAD
RewriteRule .* - [F]

6. If you would like to block email harvesters, add the following:

RewriteCond %{HTTP_USER_AGENT} CherryPickerSE [OR]
RewriteCond %{HTTP_USER_AGENT} CherryPickerElite [OR]
RewriteCond %{HTTP_USER_AGENT} EmailCollector [OR]
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro
RewriteRule .* - [F]

7. If you would like to add some rules to stop spam bots and referrer spam, take a look at the .htaccess file created by AaronLogan.com (http://www.aaronlogan.com/downloads/htaccess.php). The list of URLs and IP addresses is slightly old, but it is a good starting point.

8. Upload the file you have created and rename it back to .htaccess. Then check that everything on the site works as normal. If you experience problems, restore the old file while you double check the changes you made.

What just happened?

The .htaccess file tells the server how it should respond to requests for web pages.

In step four, we blocked image hotlinking. This stops people from embedding an image hosted on our site in their own web pages. The main reason for preventing this is bandwidth: if one of our images is posted to a popular site, or many people post our images on less popular sites, every view of those pages is served by our server and costs us bandwidth.

Image hotlinking is very bad etiquette. Some webmasters choose to redirect stolen images to a nasty or repulsive image with a message saying "Stop stealing bandwidth". Other options include simply serving up a 1x1 pixel GIF, or telling the server to refuse the request. I recommend one of the latter two approaches.
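As a minimal sketch of the "refuse the request" option, the variant below answers hotlinked image requests with a 403 Forbidden response instead of redirecting them. It assumes mod_rewrite is enabled and reuses the yourdomain.com placeholder from the steps above. The https? in the referrer pattern and the [NC] flag on the rule are additions for illustration, not part of the original rules.

```apache
# Sketch only: refuse hotlinked images rather than redirecting them.
# Place these lines after the existing RewriteEngine On line.
# yourdomain.com is a placeholder; replace it with your own domain.

# Allow requests that send no referrer at all (direct visits, some proxies).
RewriteCond %{HTTP_REFERER} !^$
# Allow requests referred from our own domain and its subdomains.
RewriteCond %{HTTP_REFERER} !^https?://(.*\.)?yourdomain\.com/ [NC]
# Any other request for an image file gets a 403 Forbidden response.
RewriteRule \.(jpe?g|gif|bmp|png)$ - [F,NC]
```

Refusing the request costs almost no bandwidth, because the server sends only a short error response rather than an image.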

In step five we block HEAD requests, which are used by scanners that do not want to fetch the whole page. While there are a number of legitimate reasons for this, in most cases such applications are used by hackers wanting to scan for sites with certain vulnerabilities. They are also used by some denial of service tools. Blocking HEAD requests should not affect most legitimate visitors.

In steps six and seven, we blocked requests from suspicious user agents.

The User Agent is the name that a web browser, spam bot, or harvester sends to the server so that the server knows what it is talking to. Some programmers have made bots that can lie about their identity, but many bots do identify themselves correctly, so it is possible to ban them by name.
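Banning by name does not have to take one condition per bot. The sketch below, which assumes mod_rewrite is enabled, folds the harvester names from the steps above into a single condition using regular expression alternation. The [NC] flag (case-insensitive matching) is an addition for illustration.

```apache
# Sketch only: one condition covering several harvester user agents.
# The names are the ones listed in the email harvester step.
# [NC] makes the match case-insensitive.
RewriteCond %{HTTP_USER_AGENT} (CherryPickerSE|CherryPickerElite|EmailCollector|EmailSiphon|EmailWolf|ExtractorPro) [NC]
# Refuse matching requests with a 403 Forbidden response.
RewriteRule .* - [F]
```

A single condition is easier to keep tidy as the list grows, although one condition per bot can be simpler to comment out when troubleshooting.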

You may be wondering why you would want to use .htaccess files for this purpose when Bad Behavior already stops a lot of these attacks. There are two main reasons:

♦ Firstly, an extra layer of protection is very useful. You may see some spam attacks or hacking attempts coming in from an address that Bad Behavior does not block. You can respond to this and update your .htaccess file to block the attacker as soon as you notice it, while you may have to wait a while for Bad Behavior to catch up.

♦ Secondly, the addresses you block in the .htaccess file won't even get to the stage of loading your site, which will mean that they take up less processor time, keeping the site nice and fast for legitimate visitors.

Have a go hero - build your own list

Now that you have an idea of what a .htaccess file could contain, why not try building your own list? If your web host offers AWStats, Webalizer, or any other stats tracking software, take a look at the logs. If you see something that doesn't look right, build a rule that will block it. Referrer spam, for example, is easy to spot—after all, why is an online pharmacy linking to a blog network for Vampire Slayers? I don't think they're selling treatments for neck wounds!

It's worth trying to keep your .htaccess file reasonably small. For example, rather than blocking lots of pharmacy sites individually, consider the following rule:

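RewriteCond %{HTTP_REFERER} ^(http://)?(www\.)?.*(-|.)pharmacy(-|.).*$ [NC,OR]

This condition matches any referrer that contains the word "pharmacy". The [OR] flag lets you chain it with further conditions ahead of the RewriteRule that does the blocking.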
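A condition only takes effect once a RewriteRule follows it. As a hedged sketch, a complete keyword-based referrer block might look like the following. The second keyword, "casino", is a hypothetical example and not taken from the original list.

```apache
# Sketch only: block referrers by keyword.
# "casino" is a hypothetical second keyword, used purely for illustration.
RewriteCond %{HTTP_REFERER} pharmacy [NC,OR]
RewriteCond %{HTTP_REFERER} casino [NC]
# The last condition carries no [OR]; the rule fires if either one matched.
RewriteRule .* - [F]
```

Because the pattern is not anchored, a plain keyword matches anywhere in the referrer, so check that it cannot also match the address of a legitimate site that links to you.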

Be careful with blocking IP addresses or IP ranges. If that IP address is dynamically assigned, you may end up blocking a legitimate visitor by accident. If you find that you are getting a lot of spam or hacking attempts from a certain IP range, block it for a short time, but leave a comment (lines that are comments start with a # ) noting why it is blocked along with the date; remove that block after a week or two and see if the attempts have stopped.
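A temporary, commented block of the kind described above could be sketched as follows. The address range 192.0.2.x is reserved for documentation and the date is made up; both are placeholders, as is the reason given in the comment.

```apache
# Blocked 2024-01-15: repeated comment spam from this range.
# Review and remove after two weeks if the attempts have stopped.
# 192.0.2.x is a placeholder range; substitute the range seen in your logs.
RewriteCond %{REMOTE_ADDR} ^192\.0\.2\.
RewriteRule .* - [F]
```

Keeping the date and the reason next to the rule makes it easy to find and remove stale blocks later.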

Once again, I would recommend you take a backup of your existing .htaccess file before making any changes to it.

Pop quiz - spam blocking

1. A spam blog is called a:

d) Splog

2. http:BL prevents spam by:

a) Blocking all http requests because only spammers use http.

b) Using your site as a honey pot to prevent spam.

c) Reading the contents of new comments and blocking ones that contain links.

d) Noting the IP address of the user and blocking it if it is on a blacklist of known spammers, hackers, and proxies.

3. A .htaccess file is:

a) A method of accessing a server.

b) A file containing configuration information for your Apache server, including how to respond to certain requests.

c) A file containing lots of email addresses—used as a honey pot to bait spammers into giving themselves away.

d) A Linux shell command that can be used to access a web site.

Answers: (1) d, (2) d, (3) b

More ways to secure your server

Server management is beyond the scope of this book. However, if you are running your own server or even a VPS, then learning about Apache security is a good idea. Some useful tools include mod_evasive (formerly known as mod_dosevasive), fail2ban, and mod_security. You can read more about mod_security at http://www.howtoforge.com/apache_mod_security.
