The Community Forums


HUGE Googlebot problem. Crashing MySQL!!!

Discussion in 'General Discussion' started by jols, Oct 21, 2006.

  1. jols

    jols, Well-Known Member (joined Mar 13, 2004; 1,111 messages)

    Two different sites on two different servers, both MySQL-based (one a blog, the other a large osCommerce site), had Googlebot hammer them, opening too many MySQL processes and, as a result, shutting down MySQL for everyone on the server with a "too many processes" error.

    Short of putting all the Googlebot IPs in the firewall, what can be done about this?
     
  2. rikgarner

    rikgarner, Well-Known Member (joined Mar 31, 2006; 75 messages; location: /dev/null)

    All well-behaved robots are expected to check for a robots.txt file at the top level of the domain:

    http://www.robotstxt.org/

    Setting up an exclusion may stop Googlebot from overwhelming MySQL without getting the sites dropped from Google's results altogether.

    Rich
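As a minimal sketch of such an exclusion (the osCommerce script names below are hypothetical, chosen only for illustration), a robots.txt can keep Googlebot out of the most database-heavy pages while leaving the rest of the catalog crawlable. Googlebot does not honor the `Crawl-delay` line that some other crawlers accept, so `Disallow` is the practical lever. The rules can be checked offline with Python's standard `urllib.robotparser`:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the script names are illustrative assumptions.
# Disallow only the pages that hit MySQL hardest; everything else stays crawlable.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /advanced_search_result.php
Disallow: /product_reviews.php

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in memory, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    base = "http://www.example.com"
    for path in ("/index.php", "/advanced_search_result.php?keywords=x", "/cgi-bin/test"):
        # Print whether Googlebot may fetch each URL under these rules.
        print(path, rp.can_fetch("Googlebot", base + path))
```

Note the `/cgi-bin/` case: a crawler that matches a named group (here `Googlebot`) ignores the `User-agent: *` group entirely, so any rule Googlebot should obey has to be repeated under its own group.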
     
  3. Spiral

    Spiral, BANNED (joined Jun 24, 2005; 2,023 messages)

    It should also be noted that there are a number of bad bots out there
    fraudulently identifying themselves as Google.

    I would definitely check the IP addresses of the bots.

    If the traffic is coming from one IP or just a couple of IPs,
    you could block them with a simple firewall rule like this:


    iptables -A INPUT -s x.x.x.x -j DROP

    (where x.x.x.x would be replaced with the IP you want to block)
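The two suggestions above (check the IPs, then firewall the offenders) can be combined. A minimal sketch in Python, assuming Apache's "combined" log format; the threshold and all function names are hypothetical. It counts requests per IP whose User-Agent mentions Googlebot, applies Google's recommended check (the reverse DNS name must end in googlebot.com or google.com, and a forward lookup of that name must return the same IP), and builds the same iptables rule for each address, without running it:

```python
import re
import socket
from collections import Counter
from typing import Callable, Iterable, List

# Apache "combined" log format (an assumption; adjust for your LogFormat):
# IP ident user [time] "request" status bytes "referer" "user-agent"
QUOTED = r'"(?:[^"\\]|\\.)*"'  # a quoted field that may contain \" escapes
LOG_RE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]*\] ' + QUOTED + r' \d{3} \S+ ' + QUOTED + r' "((?:[^"\\]|\\.)*)"'
)


def googlebot_claimants(lines: Iterable[str], threshold: int) -> List[str]:
    """IPs whose User-Agent mentions Googlebot at least `threshold` times, busiest first."""
    counts: Counter = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if m and "googlebot" in m.group(2).lower():
            counts[m.group(1)] += 1
    return [ip for ip, n in counts.most_common() if n >= threshold]


def is_real_googlebot(
    ip: str,
    reverse_lookup: Callable[[str], str] = lambda a: socket.gethostbyaddr(a)[0],
    forward_lookup: Callable[[str], List[str]] = lambda h: socket.gethostbyname_ex(h)[2],
) -> bool:
    """Reverse DNS must be under googlebot.com/google.com and resolve back to `ip`."""
    try:
        host = reverse_lookup(ip).rstrip(".").lower()
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in forward_lookup(host)
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False


def block_commands(ips: Iterable[str]) -> List[str]:
    """Build (but do not run) the firewall rule from the post for each IP."""
    return [f"iptables -A INPUT -s {ip} -j DROP" for ip in ips]
```

For example, `[ip for ip in googlebot_claimants(open(log_path), threshold=1000) if not is_real_googlebot(ip)]` would leave only the impostors to pass to `block_commands`. The lookups are parameters so the check can be tested without network access; the default `gethostbyname_ex` returns IPv4 addresses only.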
     
  4. hostmedic

    hostmedic, Well-Known Member (joined Apr 30, 2003; 559 messages; location: Washington Court House, Ohio, United States; cPanel access level: DataCenter Provider)

    Old thread, but I noticed this as well.

    I found the IP to be:

    222.72.108.124

    Interestingly, I found that it cross-references to

    http://www.doubleseek.com/cgi/ip.cgi

    which shows:

    ProxyJudge V2.35

    REMOTE_HOST=c-68-44-93-223.hsd1.nj.comcast.net
    REMOTE_ADDR=68.44.93.223

    HTTP_ACCEPT=text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
    HTTP_ACCEPT_CHARSET=ISO-8859-1,utf-8;q=0.7,*;q=0.7
    HTTP_ACCEPT_ENCODING=gzip,deflate
    HTTP_ACCEPT_LANGUAGE=en-us,en;q=0.5
    HTTP_CONNECTION=keep-alive
    HTTP_HOST=www.doubleseek.com
    ? - HTTP_KEEP_ALIVE=300
    HTTP_USER_AGENT=Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.8) Gecko/20061025 Firefox/1.5.0.8

    * REMOTE_HOST
      Comment: Maybe no problem.

    * HTTP Env. Value
      Result: Via a Proxy
      Comment: Dubious valuable is detected.


    * AnonyLevel : 4
    If it is not slow, it is useful.

    prxjdg - created by PRX4EVER
    thanx to Team Cr[y]ackerz
     
  5. jols

    jols, Well-Known Member (joined Mar 13, 2004; 1,111 messages)

    I've had to resort to putting this in the .htaccess files of the affected accounts:

    SetEnvIfNoCase User-Agent "Googlebot" bad_bot
    <Limit GET POST>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
    </Limit>
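To see exactly what that rule does, here is a rough Python model of how Apache 2.2 would evaluate it. It is an approximation written for illustration, not Apache's code, and the function names are hypothetical. `SetEnvIfNoCase` treats "Googlebot" as a case-insensitive regular expression matched anywhere in the User-Agent, so the rule blocks genuine Googlebot, variants such as Googlebot-Image, and the impostors mentioned earlier in the thread alike:

```python
import re

# SetEnvIfNoCase User-Agent "Googlebot" bad_bot: an unanchored,
# case-insensitive regular-expression match on the User-Agent header.
BAD_BOT = re.compile("Googlebot", re.IGNORECASE)

# <Limit GET POST> restricts only these methods; Apache applies a GET limit to HEAD too.
LIMITED_METHODS = {"GET", "HEAD", "POST"}


def sets_bad_bot(user_agent: str) -> bool:
    """True when the SetEnvIfNoCase line would set the bad_bot variable."""
    return BAD_BOT.search(user_agent) is not None


def request_allowed(user_agent: str, method: str = "GET") -> bool:
    """Approximate Apache 2.2 access decision for the .htaccess rule above."""
    if method.upper() not in LIMITED_METHODS:
        return True  # methods outside <Limit> are not restricted by this block
    matches_allow = True                     # "Allow from all"
    matches_deny = sets_bad_bot(user_agent)  # "Deny from env=bad_bot"
    # Order Allow,Deny: the request must match an Allow and must not match any Deny.
    return matches_allow and not matches_deny
```

`Order`/`Allow`/`Deny` is Apache 2.2 syntax; on Apache 2.4 it only works when `mod_access_compat` is loaded, and the native equivalent uses `Require`.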
     