Correct way to block bad bots in httpd.conf?

kokopelli

Member
Jan 5, 2005
6
0
151
I want to block some bad bots and hosts via httpd.conf and have tried adding the following to httpd.conf via WHM > Apache Configuration > Include Editor (first in "Pre Main Include", then in "Pre VirtualHost Include", then in "Post VirtualHost Include"), to no avail. (I restarted Apache after each attempt.)
Code:
# START BLOCK BAD BOTS

SetEnvIfNoCase User-Agent "^BaiDuSpider" UnwantedRobot
SetEnvIfNoCase User-Agent "^Exabot" UnwantedRobot
SetEnvIfNoCase User-Agent "^HTTrack" UnwantedRobot
SetEnvIfNoCase Host "^clearpath\.in" UnwantedRobot

<Directory />
    Order Allow,Deny
    Allow from all
    Deny from env=UnwantedRobot
</Directory>

# END BLOCK BAD BOTS
The test I ran with HTTrack didn't work; it still got through. What am I doing wrong? Any help would be appreciated.

BTW I also tried the following variations of the <Directory /> directive:

Code:
<Directory "/home/">
    Order Allow,Deny
    Allow from all
    Deny from env=UnwantedRobot
</Directory>
<Directory "/var/www/">
    Order Allow,Deny
    Allow from all
    Deny from env=UnwantedRobot
</Directory>
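For reference, here is the same block without the ^ anchors, in case those strings don't appear at the very start of the User-Agent header (I have not confirmed that this is the actual problem):
Code:
# match the agent name anywhere in the header rather than only at the start
SetEnvIfNoCase User-Agent "BaiDuSpider" UnwantedRobot
SetEnvIfNoCase User-Agent "Exabot" UnwantedRobot
SetEnvIfNoCase User-Agent "HTTrack" UnwantedRobot
SetEnvIfNoCase Host "clearpath\.in" UnwantedRobot

<Directory "/home">
    Order Allow,Deny
    Allow from all
    Deny from env=UnwantedRobot
</Directory>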
 

newtoallthis

Member
Dec 1, 2008
22
0
51
I want to block some bad bots and hosts via httpd.conf ... The test I ran with HTTrack didn't work, it still got through. What am I doing wrong?
Try blocking them with mod_rewrite instead; this is the rule set that works for me:
Code:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^aipbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Anarchie [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:[email protected] [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Custo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^iblog [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Linkwalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^nameprotect [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pavuk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^searchestate [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Xenu [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^curl/ [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^HTMLParser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Jakarta\ Commons [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Java [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^libcurl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^LWP::Simple [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^lwp-request [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ Data\ Access [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^MS\ Web\ Services\ Client\ Protocol [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PECL::HTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^POE-Component-Client-HTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^PycURL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Snoopy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^VB\ Project [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^WWW::Mechanize [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RPT-HTTPClient [NC]
RewriteRule .* http://cannot.find.the.damn.server/ [R,L]
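A quick way to check that a block like this is actually taking effect is to fake one of the listed User-Agents with curl and look at the status code (example.com stands in for your own domain):
Code:
# should be caught by the rules (redirect or error rather than 200)
curl -I -A "HTTrack" http://example.com/
# a normal browser agent should still get 200 OK
curl -I -A "Mozilla/5.0" http://example.com/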
 

newtoallthis

Member
Dec 1, 2008
22
0
51
BTW, since you are running cPanel, why are you not using an include file in the directory specified as the include path in your httpd.conf? I always found that way easier than editing via WHM.
 

kokopelli

Member
Jan 5, 2005
6
0
151
Thanks for the reply, but I was under the impression that using SetEnvIfNoCase instead of mod_rewrite would be more efficient?

btw since you are running cpanel why are you not using an include in the directory specified as the include path in your httpd.conf? i always found this way easier than manually editing via whm.
Not sure how to do that ... please enlighten me. :)

If I add it via WHM, which of these in WHM > Apache Configuration > Include Editor would be the correct place to include the directives: "Pre Main Include", "Pre VirtualHost Include", or "Post VirtualHost Include"?

Thanks for your help!
 

newtoallthis

Member
Dec 1, 2008
22
0
51
Thanks for the reply, but I was under the impression that using SetEnvIfNoCase instead of mod_rewrite would be more efficient?


Not sure how to do that ... please enlighten me. :)

If I add it via WHM, which of these in WHM > Apache Configuration > Include Editor would be the correct place to include the directives: "Pre Main Include", "Pre VirtualHost Include", or "Post VirtualHost Include"?

Thanks for your help!
You may be right; I'm only sharing what works for me. After trying the different routes, including yours, I found this one by far the easiest, and it lets you see exactly where everything ends up.
Open /usr/local/apache/conf/httpd.conf and read the comment under each virtual host that tells you where that virtual host's include should live. That would be:
/usr/local/apache/conf/userdata/std/2/(username)/(domain name)/*.conf
In that file, just add all your code.
To make sure the file gets included the first time, run /scripts/rebuildhttpdconf. Then, for each later update to the include file, it's
/scripts/ensure_vhost_includes --all-users
but test first with /scripts/verify_vhost_includes --show-test-output
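As a rough sketch of those steps (the username, domain, and version directory below are placeholders; use whatever path your own httpd.conf names for the vhost):
Code:
# hypothetical account "exampleuser" with domain "example.com"
mkdir -p /usr/local/apache/conf/userdata/std/2/exampleuser/example.com
# put the SetEnvIf/Deny or RewriteCond directives in a .conf file there
vi /usr/local/apache/conf/userdata/std/2/exampleuser/example.com/blockbots.conf

# check that the include parses, then pull it into httpd.conf and restart
/scripts/verify_vhost_includes --show-test-output
/scripts/rebuildhttpdconf
/scripts/restartsrv_httpd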
 

jols

Well-Known Member
Mar 13, 2004
1,107
3
168
Thanks, but in the end I opted for the mod_security route, which is working like a charm.
Hi, would you care to share? I am having a tough time finding a mod_security rule to block Baiduspider. Currently nothing seems to work. This is what I have tried so far:

SecRule HTTP_User-Agent "Baiduspider"
SecRule HTTP_User-Agent "Baiduspider.*"
SecRule HTTP_User-Agent "^Baiduspider*"
SecRule REQUEST_HEADERS:User-Agent "$Baiduspider*"

Thanks for any help with this.
 

kokopelli

Member
Jan 5, 2005
6
0
151
I used the example from http://www.puntapirata.com/ModSec-Rules.php:

  1. Create this mod_security rule as one of your first rules:
    Code:
    # block bad bots - see http://puntapirata.com/On-House-ModSec-Rules.php
    SecRule REQUEST_HEADERS:User-Agent "@pmFromFile PuntaPirata-blackbots.txt" "id:980001,rev:1,severity:2,log,msg:'PuntaPirata Bot Rule: Black Bot detected.'"
  2. Create a text file called "PuntaPirata-blackbots.txt" and enter the bad-bot user agents one per line (see the zipped example at the above URL, or the short sketch after this list).
  3. Upload PuntaPirata-blackbots.txt to the same location as your mod_security configuration (in my case, /usr/local/apache/conf).
  4. Restart Apache.
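For reference, the blacklist file is just a plain-text list with one user-agent substring per line; @pmFromFile then matches those substrings anywhere in the User-Agent header. A tiny, made-up version would look like this (the real file from the URL above is far longer):
Code:
Baiduspider
HTTrack
EmailSiphon
WebZIP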

Hope that helps.
 