The Community Forums

Interact with an entire community of cPanel & WHM users!

Correct way to block bad bots in httpd.conf?

Discussion in 'Security' started by kokopelli, Sep 29, 2011.

  1. kokopelli

    kokopelli Member

    Joined:
    Jan 5, 2005
    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    1
    I want to block some bad bots and hosts via httpd.conf. I have tried adding the following via WHM > Apache Configuration > Include Editor (first in "Pre Main Include", then "Pre VirtualHost Include", then "Post VirtualHost Include"), to no avail. (I restarted Apache after each change.)
    Code:
    # START BLOCK BAD BOTS
    
    SetEnvIfNoCase User-Agent "^BaiDuSpider" UnwantedRobot
    SetEnvIfNoCase User-Agent "^Exabot" UnwantedRobot
    SetEnvIfNoCase User-Agent "^HTTrack" UnwantedRobot
    SetEnvIfNoCase Host "^clearpath.in" UnwantedRobot
    
    <Directory />
        Order Allow,Deny
        Allow from all
        Deny from env=UnwantedRobot
    </Directory>
    
    # END BLOCK BAD BOTS
    The test I ran with HTTrack didn't work; it still got through. What am I doing wrong? Any help would be appreciated.

    BTW I also tried the following variations of the <Directory /> directive:

    Code:
    <Directory "/home/">
        Order Allow,Deny
        Allow from all
        Deny from env=UnwantedRobot
    </Directory>
    <Directory "/var/www/">
        Order Allow,Deny
        Allow from all
        Deny from env=UnwantedRobot
    </Directory>
    
     
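    A side note on why the HTTrack test likely failed: HTTrack's default User-Agent typically begins with "Mozilla/4.5 (compatible; HTTrack ...", so the anchored pattern "^HTTrack" never matches it. Also, SetEnvIfNoCase Host tests the HTTP Host request header (the domain being requested), not the connecting client, so the clearpath.in line cannot match a visitor; Remote_Host is the attribute for that. A minimal unanchored sketch along those lines:
    Code:
    # START BLOCK BAD BOTS
    # Match the substring anywhere in the User-Agent, not only at the start
    SetEnvIfNoCase User-Agent "Baiduspider" UnwantedRobot
    SetEnvIfNoCase User-Agent "Exabot" UnwantedRobot
    SetEnvIfNoCase User-Agent "HTTrack" UnwantedRobot
    # Remote_Host tests the client's reverse-DNS hostname (needs HostnameLookups On)
    SetEnvIfNoCase Remote_Host "clearpath\.in$" UnwantedRobot
    
    <Directory "/home">
        Order Allow,Deny
        Allow from all
        Deny from env=UnwantedRobot
    </Directory>
    # END BLOCK BAD BOTS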
  2. newtoallthis

    newtoallthis Member

    Joined:
    Dec 1, 2008
    Messages:
    22
    Likes Received:
    0
    Trophy Points:
    1
    Code:
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} ^aipbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Anarchie [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ASPSeek [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^attach [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^autoemailspider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Custo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^DISCo [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^eCatch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^FlashGet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetRight [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GetWeb! [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^GrabNet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Grafula [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^HMView [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^HTTrack [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^iblog [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Indy\ Library [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^InterGET [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^JetCar [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^larbin [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Linkwalker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^nameprotect [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Navroad [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NearSite [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetAnts [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetSpider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^NetZIP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Octopus [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^pavuk [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^RealDownload [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^ReGet [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^searchestate [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Surfbot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^TurnitinBot [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebAuto [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebCopier [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebFetch [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebGo\ IS [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebLeacher [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebReaper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebSauger [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Website\ Quester [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebStripper [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WebZIP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Wget [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Widow [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WWWOFFLE [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Xenu [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Zeus [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^curl/ [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^HTMLParser [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Jakarta\ Commons [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Java [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^libcurl [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^LWP::Simple [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^lwp-request [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ Data\ Access [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Microsoft\ URL\ Control [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^MS\ Web\ Services\ Client\ Protocol [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^PECL::HTTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^POE-Component-Client-HTTP [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^PycURL [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^Snoopy [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^VB\ Project [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} ^WWW::Mechanize [NC,OR]
    RewriteCond %{HTTP_USER_AGENT} RPT-HTTPClient [NC]
    # Redirect any matched user agent to a host that does not resolve
    RewriteRule .* http://cannot.find.the.damn.server/ [R,L]
     
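    For reference, a more conventional ending than the redirect above is to answer matched agents with 403 Forbidden; a minimal sketch of that variant for a single bot:
    Code:
    RewriteEngine On
    # Match "HTTrack" anywhere in the User-Agent, case-insensitively
    RewriteCond %{HTTP_USER_AGENT} HTTrack [NC]
    # "-" keeps the URL unchanged; [F] answers 403 Forbidden
    RewriteRule .* - [F,L]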
  3. newtoallthis

    newtoallthis Member

    Joined:
    Dec 1, 2008
    Messages:
    22
    Likes Received:
    0
    Trophy Points:
    1
    BTW, since you're running cPanel, why aren't you using an include in the directory specified as the include path in your httpd.conf? I've always found that easier than manually editing via WHM.
     
  4. kokopelli

    kokopelli Member

    Joined:
    Jan 5, 2005
    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    1
    Thanks for the reply, but I was under the impression that using SetEnvIfNoCase would be more efficient than mod_rewrite?

    Not sure how to do that ... please enlighten me. :)

    If I add it via WHM, which of these in WHM > Apache Configuration > Include Editor would be the correct place for the directives: "Pre Main Include", "Pre VirtualHost Include", or "Post VirtualHost Include"?

    Thanks for your help!
     
  5. newtoallthis

    newtoallthis Member

    Joined:
    Dec 1, 2008
    Messages:
    22
    Likes Received:
    0
    Trophy Points:
    1
    You may be right; I only share what works for me. After trying the different routes, including yours, I found this one by far the easiest, and it lets you see exactly where everything ends up.
    Open /usr/local/apache/conf/httpd.conf and read the line under each virtual host that says where that virtual host's include should go. That would be:
    /usr/local/apache/conf/userdata/std/2/(username)/(domain name)/*.conf
    Add all your code to that file.
    To make sure the files are included, run /scripts/rebuildhttpdconf the first time only. Then, for each update to the include file, it's
    /scripts/ensure_vhost_includes --all-users
    but test first with /scripts/verify_vhost_includes --show-test-output
     
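    To illustrate the include-file route from the post above, a minimal sketch of what such a file might contain; the username "bob" and domain "example.com" are hypothetical placeholders:
    Code:
    # Hypothetical path:
    # /usr/local/apache/conf/userdata/std/2/bob/example.com/blockbots.conf
    SetEnvIfNoCase User-Agent "HTTrack" UnwantedRobot
    <Directory "/home/bob/public_html">
        Order Allow,Deny
        Allow from all
        Deny from env=UnwantedRobot
    </Directory>
    After creating it, test with /scripts/verify_vhost_includes --show-test-output, then apply with /scripts/ensure_vhost_includes --all-users as described above.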
  6. kokopelli

    kokopelli Member

    Joined:
    Jan 5, 2005
    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    1
    Thanks, but in the end I opted for the mod_security route, which is working like a charm.
     
  7. jols

    jols Well-Known Member

    Joined:
    Mar 13, 2004
    Messages:
    1,111
    Likes Received:
    2
    Trophy Points:
    38
    Hi, would you care to share? I am having a tough time finding a mod_security rule to block Baiduspider. Currently nothing seems to work. This is what I have tried so far:

    Code:
    SecRule HTTP_User-Agent "Baiduspider"
    SecRule HTTP_User-Agent "Baiduspider.*"
    SecRule HTTP_User-Agent "^Baiduspider*"
    SecRule REQUEST_HEADERS:User-Agent "$Baiduspider*"

    Thanks for any help with this.
     
  8. kokopelli

    kokopelli Member

    Joined:
    Jan 5, 2005
    Messages:
    6
    Likes Received:
    0
    Trophy Points:
    1
    I used the example from http://www.puntapirata.com/ModSec-Rules.php:

    1. Create this mod_security rule as one of your first rules:
      Code:
      # Block bad bots - see http://puntapirata.com/On-House-ModSec-Rules.php
      SecRule REQUEST_HEADERS:User-Agent "@pmFromFile PuntaPirata-blackbots.txt" "id:980001,rev:1,severity:2,deny,status:403,log,msg:'PuntaPirata Bot Rule: Black Bot detected.'"
      
    2. Create a text file called "PuntaPirata-blackbots.txt" and enter the bad bot user agents, one per line. See the zipped example at the above URL.
    3. Upload PuntaPirata-blackbots.txt to the same directory as your mod_security configuration (in my case, /usr/local/apache/conf).
    4. Restart Apache.

    Hope that helps.
     
    #8 kokopelli, Nov 9, 2011
    Last edited: Nov 9, 2011
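    To round out the recipe above, a sketch of what the companion PuntaPirata-blackbots.txt might contain; these entries are illustrative, not the actual list from the linked ZIP:
    Code:
    # PuntaPirata-blackbots.txt - one user-agent fragment per line
    # (@pmFromFile does a case-insensitive substring match on each)
    HTTrack
    Baiduspider
    Exabot
    EmailSiphon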