HUGE Googlebot problem. Crashing MySQL!!!

jols

Well-Known Member
Mar 13, 2004
Two different sites on two different MySQL-based servers (one a blog, the other a large osCommerce site) had Googlebot hammer them, opening so many MySQL processes that MySQL went down for everyone on the server with a "too many processes" error.

Short of putting all the Googlebot IPs in the firewall, what can be done about this?
 

rikgarner

Well-Known Member
Mar 31, 2006
/dev/null
All well-behaved robots check for a robots.txt file at the top level of the domain:

http://www.robotstxt.org/

Setting up an exclusion there might keep Googlebot from overwhelming MySQL without getting the sites dropped from Google's results altogether.
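A minimal sketch of such an exclusion, assuming the load comes from a few script-heavy osCommerce pages. The robots.txt text is hypothetical (the page names are examples, not taken from either site), and Python's standard `urllib.robotparser` is used only to check offline which URLs Googlebot would still be allowed to fetch. Crawl-delay is left out because Googlebot does not honour it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for the osCommerce store. The disallowed page names
# are illustrative assumptions; substitute the site's own heavy pages.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /advanced_search_result.php
Disallow: /shopping_cart.php
Disallow: /login.php

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(text):
    """Parse robots.txt rules from a string, without any network access."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser, agent, path):
    """Return True if the parsed rules let `agent` fetch `path`."""
    # can_fetch() expects a full URL; the (hypothetical) host is irrelevant here.
    return parser.can_fetch(agent, "http://www.example.com" + path)


if __name__ == "__main__":
    rules = build_parser(ROBOTS_TXT)
    for path in ("/index.php", "/advanced_search_result.php?keywords=x"):
        print(path, "allowed" if allowed(rules, "Googlebot", path) else "blocked")
```

Product and category pages stay crawlable, so the site keeps its listings while the expensive pages are skipped.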

Rich
 

Spiral

BANNED
Jun 24, 2005
It should also be noted that there are a number of bad bots out there
fraudulently identifying themselves as Google.

I would definitely check the IP addresses of the bots to confirm they really belong to Google.
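One way to do that check is the forward-confirmed reverse DNS test Google recommends for verifying Googlebot: reverse-resolve the IP, make sure the name ends in googlebot.com or google.com, then resolve that name forward and confirm it maps back to the same IP. A faked User-Agent fails because the impostor cannot make Google's DNS point at their address. This is a minimal Python sketch; the function name is hypothetical, and the resolver functions are parameters only so the logic can be tested without network access.

```python
import socket

# Host name suffixes Google uses for its crawlers' reverse DNS records.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a client claiming to be Googlebot.

    `reverse` and `forward` default to the real resolver calls; they are
    parameters only so the logic can be exercised offline.
    """
    # Step 1: reverse lookup (PTR record) of the client IP.
    try:
        hostname, _aliases, _addrs = reverse(ip)
    except OSError:  # socket.herror / socket.gaierror: no usable PTR record
        return False
    hostname = hostname.rstrip(".").lower()

    # Step 2: the name must be inside one of Google's crawler domains.
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False

    # Step 3: forward lookup of that name must include the original IP,
    # otherwise anyone could point their own PTR record at googlebot.com.
    try:
        _name, _aliases, addresses = forward(hostname)
    except OSError:
        return False
    return ip in addresses
```

On a live server you would call `is_real_googlebot("x.x.x.x")` with the defaults. Each call does two DNS lookups, so it is worth caching the result per IP rather than checking on every request.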

If the traffic is coming from just one IP or a couple of IPs,
you could block them with a simple firewall rule like this:


iptables -A INPUT -s x.x.x.x -j DROP

(where x.x.x.x would be replaced with the IP you want to block)
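To see which IPs are actually doing the hammering before writing DROP rules, you can count requests per client IP in the site's access log. A minimal Python sketch, assuming Apache's "combined" log format (client IP first, User-Agent as the last quoted field); the function name is hypothetical. Because the match is on the User-Agent text alone, the list includes anything merely claiming to be Googlebot.

```python
import re
from collections import Counter

# Apache "combined" format: client IP is the first field and the
# User-Agent is the last double-quoted field on the line.
LOG_LINE = re.compile(r'^(\S+) .*"([^"]*)"\s*$')


def top_claimed_googlebot_ips(lines, limit=10):
    """Return (ip, request_count) pairs for clients whose User-Agent
    mentions Googlebot, busiest first."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "googlebot" in match.group(2).lower():
            counts[match.group(1)] += 1
    return counts.most_common(limit)
```

Passing it an open log file (for example `with open(path) as log: top_claimed_googlebot_ips(log)`, where the log path depends on the server setup) gives the candidates for the iptables rule above.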
 

hostmedic

Well-Known Member
Apr 30, 2003
Washington Court House, Ohio, United States
cPanel Access Level
DataCenter Provider
Old thread, but

I noticed this as well.

I found the IP to be

222.72.108.124

Interestingly, it cross-references to

http://www.doubleseek.com/cgi/ip.cgi

which then shows:

ProxyJudge V2.35

REMOTE_HOST=c-68-44-93-223.hsd1.nj.comcast.net
REMOTE_ADDR=68.44.93.223

HTTP_ACCEPT=text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
HTTP_ACCEPT_CHARSET=ISO-8859-1,utf-8;q=0.7,*;q=0.7
HTTP_ACCEPT_ENCODING=gzip,deflate
HTTP_ACCEPT_LANGUAGE=en-us,en;q=0.5
HTTP_CONNECTION=keep-alive
HTTP_HOST=www.doubleseek.com
? - HTTP_KEEP_ALIVE=300
HTTP_USER_AGENT=Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.8) Gecko/20061025 Firefox/1.5.0.8

* REMOTE_HOST

Result
Comment
Maybe no problem.


* HTTP Env. Value

Result
Via a Proxy
Comment
Dubious valuable is detected.


* AnonyLevel : 4
If it is not slow, it is useful.

prxjdg - created by PRX4EVER
thanx to Team Cr[y]ackerz
 

jols

Well-Known Member
Mar 13, 2004
I've had to resort to putting this in the .htaccess files of the affected accounts:

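# Flag any request whose User-Agent contains "Googlebot" (case-insensitive)
SetEnvIfNoCase User-Agent "Googlebot" bad_bot
# Apache 2.2-style access control: allow everyone, then deny flagged requests
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>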