The internet can be an awful place sometimes. Let’s say you created a website or blog with WordPress that recently became popular and are now getting a ton of traffic. If it hasn’t happened already, you’re going to get a lot of bad traffic mixed in. By bad traffic I mean bots and crawlers that probably don’t benefit your WordPress site because they aren’t real human visitors or well-established search engine crawlers.
There are a few good ways to blacklist these bad and potentially malicious bots and crawlers from even accessing your WordPress site. We’re going to take a look at how to do this through the Apache .htaccess file.
You might be asking yourself, “Well, why should I even care?” The simple answer is that you should care because any traffic, good or bad, puts stress on your web server and database. I recently published an article on my WordPress statistics for my first year blogging. In it I explain how I had to upgrade my server due to excessive traffic. Had I chosen not to blacklist various bots and crawlers, I would have had to upgrade my server at least one or two more times, putting more stress on my wallet.
There are many ways to block traffic to your WordPress site. I’m going to demonstrate the .htaccess approach because it puts the least stress on your server, short of a hardware- or operating-system-level firewall rule.
Going forward, I’m going to assume you have access to your WordPress .htaccess file. If you cannot edit this file, the rest of this tutorial will not help you.
Open your .htaccess file, whether it be through a text editor on your WordPress file system or through the WordPress dashboard, and include the following lines at the bottom:
# BEGIN User Agent Blacklist
SetEnvIfNoCase User-Agent (\<|\>|\'|\$x0|\%0A|\%0D|\%27|\%3C|\%3E|\%00|\+select|\+union) keep_out
SetEnvIfNoCase User-Agent (binlar|casper|checkprivacy|cmsworldmap|comodo|curious|diavol|doco) keep_out
SetEnvIfNoCase User-Agent (dotbot|feedfinder|flicky|ia_archiver|jakarta|kmccrew|httrack|nutch) keep_out
SetEnvIfNoCase User-Agent (planetwork|purebot|pycurl|skygrid|sucker|turnit|vikspid|zmeu|zune) keep_out
<Limit GET POST PUT>
Order Allow,Deny
Allow from all
Deny from env=keep_out
</Limit>
# END User Agent Blacklist
I cannot take credit for what you see above. I actually took it from the Perishable Press website, which releases an updated blacklist every so often. Also note that Perishable Press has many different blacklists and that I’ve chosen a smaller one because I don’t want to accidentally block positive traffic.
The code above requires Apache’s setenvif module to be enabled on your server. If it is not enabled already, you can typically enable it with the following command (on Debian and Ubuntu systems):
sudo a2enmod setenvif
Restart your Apache server and make sure you can still access your WordPress site.
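Beyond simply loading your site in a browser, you can confirm the blacklist is actually working by sending a request that spoofs one of the blocked user agents and checking that Apache rejects it. The sketch below assumes a Debian/Ubuntu-style Apache install and uses example.com as a stand-in for your own domain; a blocked user agent should come back with a 403 response.

```shell
# Restart Apache so the new .htaccess rules take effect (Debian/Ubuntu)
sudo systemctl restart apache2

# A normal request should still succeed (prints the HTTP status, e.g. 200)
curl -s -o /dev/null -w "%{http_code}\n" https://example.com/

# The same request spoofing a blacklisted user agent should be denied (403)
curl -s -o /dev/null -w "%{http_code}\n" -A "httrack" https://example.com/
```

If the second request still returns 200, double-check that your .htaccess file is being read at all (the `AllowOverride` setting in your Apache virtual host must permit it).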
The blacklist I chose to implement works based on the user agent of the system accessing your site. There are plenty of other ways to block access through the .htaccess file, for example, by IP addresses. It is up to you how deep you want to get into it.
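As a quick illustration of the IP-based alternative, a block like the following in the same .htaccess file denies specific addresses or whole ranges. The addresses shown here are placeholders from documentation-reserved ranges, not real offenders; substitute the IPs you actually want to keep out.

```
# BEGIN IP Blacklist
Order Allow,Deny
Allow from all
Deny from 203.0.113.42
Deny from 198.51.100.0/24
# END IP Blacklist
```

This works well for a handful of persistent offenders, but user-agent matching tends to scale better against bots that rotate through many IP addresses.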
I had other readers of my blog recommend plugins such as WordFence and Stop Spammers. I haven’t personally tried them, so I can’t speak to their quality. Be careful when using plugins to block traffic, though. Your server is only protected if you block access before WordPress starts serving content. Many plugins only block a request after WordPress has already begun processing it, whereas the .htaccess file does the blocking before the request ever reaches WordPress.
As your WordPress site grows, so will the negative traffic to it. To reduce costs and server crashes, you can create a blacklist in your Apache .htaccess file to reject traffic from user agents that are known to be bad or malicious. Keeping up with the Perishable Press bot blacklist will keep your WordPress site running smoothly as time progresses.
If you have other ways to deter bad traffic, share your experiences in the comment section. Mine is by no means the best way, but it has met my needs so far.