The Community Forums

Interact with an entire community of cPanel & WHM users!

mod_security - how to allow bots like googlebot? It was blocked.

Discussion in 'Security' started by morrow95, Oct 21, 2012.

  1. morrow95

    morrow95 Well-Known Member

    Joined:
    Oct 8, 2006
    Messages:
    83
    Likes Received:
    0
    Trophy Points:
    6
    Just got the following in an email :

    Time: Sun Oct 21 10:31:52 2012 -0400
    IP: 66.249.73.70 (US/United States/crawl-66-249-73-70.googlebot.com)
    Failures: 5 (mod_security)
    Interval: 300 seconds
    Blocked: Permanent Block

    [Sun Oct 21 10:28:28 2012] [error] [client 66.249.73.70] ModSecurity: Access denied with code 501 (phase 2). Match of "rx ^((?:(?:POS|GE)T|OPTIONS|HEAD))$" against "REQUEST_METHOD" required. [file "/usr/local/apache/conf/modsec2.user.conf"] [line "38"] [id "960032"] [msg "Method is not allowed by policy"] [severity "CRITICAL"] [tag "POLICY/METHOD_NOT_ALLOWED"] [hostname "server.example.com"] [uri "/"] [unique_id "UIQGjGB-guIAAAwRPyQAAAAB"]

    Granted this came to me because I have CSF installed on my server. I removed the block on the ip in CSF. Now, I have two questions :

    1 - Since I removed the block in CSF, there is nothing I need to unblock in mod_security, right? It is my understanding that while it blocks based on the rules, it does not implement a 'permanent' block per se.

    2 - I don't want this to happen again and would like to allow all 'bots' without them being blocked in any form. I found the following on another site :

    # Allow GoogleBot by user-agent 10-21-2012
    SecRule HTTP_USER_AGENT "Google" nolog,allow
    SecRule HTTP_USER_AGENT "Googlebot" nolog,allow
    SecRule HTTP_USER_AGENT "GoogleBot" nolog,allow
    SecRule HTTP_USER_AGENT "googlebot" nolog,allow
    SecRule HTTP_USER_AGENT "Googlebot-Image" nolog,allow
    SecRule HTTP_USER_AGENT "AdsBot-Google" nolog,allow
    SecRule HTTP_USER_AGENT "Googlebot-Image/1.0? nolog,allow
    SecRule HTTP_USER_AGENT "Googlebot/2.1? nolog,allow
    SecRule HTTP_USER_AGENT "Googlebot/Test" nolog,allow
    SecRule HTTP_USER_AGENT "Mediapartners-Google/2.1? nolog,allow
    SecRule HTTP_USER_AGENT "Mediapartners-Google*" nolog,allow
    SecRule HTTP_USER_AGENT "msnbot" nolog,allow

    Should I add this at the top of my config through WHM? I also read that using the user-agent method is not good since it can be faked. With that said, what is the best way to do this? What are the lines for other popular bots so they are not blocked as well?

    I also found information on something called gotroot, but apparently it was not meant for WHM/cPanel? I would like something I can set and forget that gets updated automatically, similar to the default rules.
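
    For reference, the rules above use the old ModSecurity 1.x variable name (HTTP_USER_AGENT). A ModSecurity 2.x equivalent would look roughly like the sketch below (still spoofable, as noted above); the rule id is an arbitrary placeholder:

    # Sketch only: ModSecurity 2.x syntax for a (spoofable) user-agent allow.
    # The id 1009001 is a placeholder and must not clash with existing rule ids.
    SecRule REQUEST_HEADERS:User-Agent "@contains Googlebot" \
        "id:1009001,phase:1,t:none,nolog,allow"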
     
  2. morrow95

    morrow95 Well-Known Member

    Joined:
    Oct 8, 2006
    Messages:
    83
    Likes Received:
    0
    Trophy Points:
    6
    I should add that this is the typical line reported in the error log :

    [Sun Oct 21 22:55:58 2012] [error] [client 66.249.73.70] File does not exist: /usr/local/apache/htdocs/501.shtml

    Why is this error not being handled properly by Apache? This appears to be why mod_security is causing a problem... any thoughts?
     
  3. morrow95

    morrow95 Well-Known Member

    Joined:
    Oct 8, 2006
    Messages:
    83
    Likes Received:
    0
    Trophy Points:
    6
    I did some testing and I believe this is related to https somehow... if I visit any page of any of my sites over https, I get a connection error in the browser - no error page is shown. If I then view my Apache error log, I get something along the lines of this :

    ModSecurity: Access denied with code 501 (phase 2). Match of "rx ^((?:(?:POS|GE)T|OPTIONS|HEAD))$" against "REQUEST_METHOD" required. [file "/usr/local/apache/conf/modsec2.user.conf"] [line "38"] [id "960032"]

    If I do this a few times, my IP gets blocked by CSF. So, how do I fix the above problem? I believe this is why Googlebot is being blocked - it is trying to crawl https pages.

    Secondly, shouldn't a non-existent https page be showing a regular error page like a 501 or something rather than a connection error?
     
  4. Infopro

    Infopro cPanel Sr. Product Evangelist
    Staff Member

    Joined:
    May 20, 2003
    Messages:
    14,468
    Likes Received:
    196
    Trophy Points:
    63
    Location:
    Pennsylvania
    cPanel Access Level:
    Root Administrator
    You should do some more homework on mod_security.

    If I set my browser's user agent to appear as Googlebot and then come to try and hack your site, you've allowed me in by using the piece of code you posted above.

    Does that file exist? Assuming no. You should do some more homework on CSF. There are options for this. Example:

    Code:
    # This option will keep track of the number of "File does not exist" errors in
    # HTACCESS_LOG. If the number of hits is more than LF_APACHE_404 in LF_INTERVAL
    # seconds then the IP address will be blocked
    #
    # Care should be used with this option as it could generate many
    # false-positives, especially Search Bots (use csf.rignore to ignore such bots)
    # so only use this option if you know you are under this type of attack
    #
    # A sensible setting for this would be quite high, perhaps 200
    #
    # To disable set to "0"
    LF_APACHE_404 = "0"
    
    Searchbots are scanning your site; if they hit old links to pages that are no longer there, they need to be fed an error page. If you have no error pages, they will keep hitting the link, looking for the file again.

    You are able to have more control over your mod_sec blocking with a tool like this:
    ConfigServer ModSecurity Control
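
    If you do enable LF_APACHE_404, the csf.rignore file mentioned in that comment is where you list the reverse-DNS hostnames lfd should ignore. Something along these lines (a sketch; the entries are examples only, and the comments at the top of csf.rignore describe the exact accepted syntax):

    Code:
    # /etc/csf/csf.rignore - hostnames/partial domains that lfd ignores after
    # checking the connecting IP's reverse (and forward) DNS
    .googlebot.com
    .crawl.yahoo.net
    .search.msn.com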
     
  5. Igal Incapsula

    Igal Incapsula Registered

    Joined:
    Oct 22, 2012
    Messages:
    1
    Likes Received:
    0
    Trophy Points:
    1
    cPanel Access Level:
    DataCenter Provider
    A recently conducted "security research of Googlebot impersonation phenomena" (http://www.incapsula.com/the-incapsula-blog/item/369-was-that-really-a-google-bot-crawling-my-site) showed that 16% of all "Googlebot" visits were fake and, of those, 21% were malicious.

    (Googlebot impersonation is also commonly used by SEO crawling tools that try to assess competition and want to "see" the site, just as Googlebot does)

    To filter out fake Googlebot access attempts you should cross-verify IP ranges with user-agent data.

    To do this, you can use the http://www.Botopedia.org IP validation tool to perform a reverse DNS lookup and "weed out" all irrelevant IPs.

    Also, you should always check the IP before setting any restrictions.
    One common mistake is to ban all Chinese IPs, by default.
    This is a bad idea because Googlebot will sometimes use Chinese IPs, and banning all access from China may lead to crawling errors.
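
    As a concrete illustration of that cross-check, using the IP from the log earlier in this thread (a sketch; the output lines shown as comments are illustrative):

    # Reverse lookup of the visiting IP, then a forward lookup of the name it
    # returns; the visit is genuine Googlebot only if the forward lookup points
    # back at the same IP.
    host 66.249.73.70
    # ... domain name pointer crawl-66-249-73-70.googlebot.com.
    host crawl-66-249-73-70.googlebot.com
    # crawl-66-249-73-70.googlebot.com has address 66.249.73.70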

    GL
     
    #5 Igal Incapsula, Oct 22, 2012
    Last edited: Oct 22, 2012
  6. PlotHost

    PlotHost Well-Known Member

    Joined:
    Apr 29, 2011
    Messages:
    253
    Likes Received:
    1
    Trophy Points:
    18
    Location:
    US
    cPanel Access Level:
    Root Administrator
    You can add the Google IPs to modsec2.whitelist.conf.
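
    For example, a rule like the following in that file would skip blocking for a known Google crawl range (a sketch: it assumes ModSecurity 2.7+ for @ipMatch, the id is a placeholder, and you should verify Google's current ranges before relying on it):

    # Allow requests from a known Googlebot address range without logging.
    # 66.249.64.0/19 covers the 66.249.73.70 address seen in the log above.
    SecRule REMOTE_ADDR "@ipMatch 66.249.64.0/19" \
        "id:1009002,phase:1,t:none,nolog,allow"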
     
  7. morrow95

    morrow95 Well-Known Member

    Joined:
    Oct 8, 2006
    Messages:
    83
    Likes Received:
    0
    Trophy Points:
    6
    @InfoPro

    Thanks for the detailed response. Yes, I saw the danger of the user-agent approach in the config and was looking for another answer. I do have a few other questions though.

    Regarding your comment about '/usr/local/apache/htdocs/501.shtml' existing or not... in this location I only have 400, 401, 403, 404, 500, and now 501. I was under the assumption that if an error occurred for a code without a page, it would fall back to a 'default' of some type. Should I make files for all error codes in this location?
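
    From what I can tell, Apache only serves one of those files when an ErrorDocument directive maps a status code to it; the "File does not exist" line in the log above suggests a mapping like the sketch below, with the target file missing from the docroot:

    # If a request is denied with status 501, serve /501.shtml; when that file
    # is missing from the docroot, Apache logs "File does not exist" instead.
    ErrorDocument 501 /501.shtml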

    On top of that, why is no error page shown to me when trying to view a non-existent https page? It simply says there is a connection error instead. Is this normal... I would think an actual error page would be thrown.

    Regarding CSF and LF_APACHE_404... that option has always been disabled and set to 0, so that is not the problem. The problem goes back to the above. If I try to visit a non-existent https page on any of my sites, it says there was a connection problem, no error page is shown, mod_security records the error, and then CSF blocks after it happens x times.
     
  8. Infopro

    Infopro cPanel Sr. Product Evangelist
    Staff Member

    Joined:
    May 20, 2003
    Messages:
    14,468
    Likes Received:
    196
    Trophy Points:
    63
    Location:
    Pennsylvania
    cPanel Access Level:
    Root Administrator
    I think the answer here is that there is no site at https:// somedomainwithoutdedicatedipandcert.com, so that's not a "valid" URL that would generate an error. You might go over your CSF settings to make them work more like you want; allow the error a few more times before blocking, for example. If this is a problem that keeps happening, I would think there's a reason for it. Why is someone going to that https:// domain anyway? A random spider, sure, but if users are visiting that URL often, what's sending them there?

    If you want, you can modify mod_sec rules per domain easily enough using this tool:
    ConfigServer ModSecurity Control

    In your errors above we see the rule is: 960032

    So you'd add that rule ID in the config using that tool, for the domain(s) affected by this issue.
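
    Behind the scenes that amounts to a per-domain include along these lines (a sketch; the tool manages the actual file and its location for you):

    Code:
    # Stop rule 960032 ("Method is not allowed by policy" from the log above)
    # from being applied to this domain's requests.
    <IfModule mod_security2.c>
        SecRuleRemoveById 960032
    </IfModule>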

    HTH somehow. :)
     
  9. morrow95

    morrow95 Well-Known Member

    Joined:
    Oct 8, 2006
    Messages:
    83
    Likes Received:
    0
    Trophy Points:
    6
    Will look over everything and see what I can do. I am still curious as to why the server's default 501 page was being requested... especially from Googlebot and one of its IPs... unless that was faked somehow. Either way, if it was faked, they succeeded in getting Googlebot's IP banned so my sites could not be crawled.

    I think the easiest solution right now is for CSF not to block when mod_security denies a request to an invalid https URL.

    To answer your question, I did host https pages at one point, but I have now removed all files from that site. Either way, that does not prevent this from happening on any site of mine... I'm not concerned with 'people' trying https, as that would probably rarely happen, but I am concerned with spiders being blocked.
     
  10. morrow95

    morrow95 Well-Known Member

    Joined:
    Oct 8, 2006
    Messages:
    83
    Likes Received:
    0
    Trophy Points:
    6
    Okay, for the moment I have changed CSF so it no longer blocks IPs that have triggered mod_sec x times in y timeframe; however, I am still having an issue with https being used.

    As an example, using Firefox as the browser, if I go to any of my websites using https: I get :

    Server Connection Failed. An error occurred during a connection to IANA — Example domains. SSL received a record that exceeded the maximum permissible length. (Error code: ssl_error_rx_record_too_long)

    This then triggers mod_security and gives :

    [Tue Oct 23 00:47:21 2012] [error] [client 99.30.160.94] ModSecurity: Access denied with code 501 (phase 2). Match of "rx ^((?:(?:POS|GE)T|OPTIONS|HEAD))$" against "REQUEST_METHOD" required. [file "/usr/local/apache/conf/modsec2.user.conf"] [line "38"] [id "960032"] [msg "Method is not allowed by policy"] [severity "CRITICAL"] [tag "POLICY/METHOD_NOT_ALLOWED"] [hostname "exampleserver.com"] [uri "/"] [unique_id "UIYhWWB-guIAAHl3FKcAAAAA"]

    While I do not have SSL on these sites, this certainly cannot be normal... I read up on ssl_error_rx_record_too_long here on the forum, but most of the posts were from people who actually had SSL on the site. I also read that transferring accounts to a new server using pkgacct can cause this. I did have my accounts transferred to a new server and that was used.

    All in all, this can't be the normal response in this situation, and I would think mod_sec wouldn't be triggered if things were configured 'correctly'? Anyone have any ideas? I would like to turn my CSF filter back on for mod_sec, but since all it takes at the moment is visiting any of my sites over https: to trigger it, that isn't going to work unless something changes.
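
    One possible reading of the two errors together (an assumption on my part, not something confirmed here): if port 443 is answered by a plain-HTTP vhost rather than an SSL one, the browser's TLS handshake bytes arrive as a garbage request method (hence rule 960032 firing), and the plain-HTTP reply looks like an oversized TLS record to Firefox (hence ssl_error_rx_record_too_long). A quick way to see what is actually answering on port 443 for a domain:

    # Attempt a TLS handshake against port 443: a certificate dump means SSL is
    # configured there; a handshake error suggests something else is answering.
    openssl s_client -connect exampleserver.com:443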
     
  11. GIANT_CRAB

    GIANT_CRAB Well-Known Member

    Joined:
    Mar 23, 2012
    Messages:
    89
    Likes Received:
    0
    Trophy Points:
    6
    cPanel Access Level:
    Root Administrator
    >If I set my browser's user agent to appear as Googlebot and then come to try and hack your site, you've allowed me in by using the piece of code you posted above.

    I've been attacked by bots that do that too.

    I wouldn't trade security for SEO.
     
  12. d'argo

    d'argo Active Member

    Joined:
    Jul 4, 2012
    Messages:
    36
    Likes Received:
    0
    Trophy Points:
    6
    cPanel Access Level:
    Root Administrator
    The gotroot rules have a secure crawler detector; they check the IP ranges and don't block Google.
     