Phylum

Active Member
Apr 20, 2010
I'm reviewing the bandwidth statistics for a site over the last few months and I'm a little bit confused about what I'm seeing. For instance, for the month of January:
  • awstats 'days of the month' chart reports nearly 11GB of bandwidth; two surges around 1.5GB and one at 2GB, the rest are mostly between 100-300MB
  • the bandwidth 'http' chart shows a spike around 1/20 with roughly 1.4GB transferred; nothing else comes close
  • the bandwidth 'by the day' chart shows 30GB for the entire month with 2 spikes over 2000MB and several spikes over 1000MB (note: they're not reflected in http chart above)
  • webalizer's 'monthly statistics' and 'daily statistics' charts both show 60669910 total KBytes; that's roughly 57GB?
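
As a quick sanity check on the Webalizer figure, here is a small Python sketch of the conversion. It assumes Webalizer's "KBytes" are 1024-byte units (an assumption on my part; the 1000-based result is printed as well), and the function name is only illustrative.

```python
def kbytes_to_gb(kbytes, binary=True):
    """Convert a KByte total to gigabytes.

    binary=True treats 1 KB as 1024 bytes and 1 GB as 1024**3 bytes;
    binary=False uses powers of 1000 instead.
    """
    base = 1024 if binary else 1000
    return kbytes * base / base**3


total_kbytes = 60669910  # Webalizer total for January, from the list above

print(f"{kbytes_to_gb(total_kbytes):.2f} GB (1024-based)")
print(f"{kbytes_to_gb(total_kbytes, binary=False):.2f} GB (1000-based)")
```

Either way the Webalizer total comes out at roughly twice the 30GB shown in the 'by the day' chart, so the gap is not a unit-conversion problem.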

Unless I'm misinterpreting what I'm reading, it's not making sense. Each report seems to be incomplete on its own, so I'd have to review all of them to get a proper picture.

I went back to awstats and jumped down to the 'Robots/Spiders visitors' section and, again unless I'm not reading this correctly, it looks like a few robots are crushing my bandwidth:

December 2012:
Code:
Robots                                  Hits        Bandwidth   Last visit
Googlebot                               8,743+121   24.32 GB    31 Dec 2012 - 23:55
Unknown robot (identified by 'robot')   4,844+80    8.41 GB     31 Dec 2012 - 23:58
Unknown robot (identified by 'bot*')    4,028+598   2.20 GB     31 Dec 2012 - 23:44
Unknown robot (identified by '*bot')    3,555+601   7.58 GB     31 Dec 2012 - 23:38
Unknown robot (identified by 'spider')  1,501+134   1.89 GB     31 Dec 2012 - 22:34

January 2013:
Code:
Robots                                  Hits        Bandwidth   Last visit
Googlebot                               8,171+126   18.28 GB    31 Jan 2013 - 23:51
Unknown robot (identified by 'bot*')    5,289+638   6.79 GB     31 Jan 2013 - 23:58
Unknown robot (identified by '*bot')    4,098+684   11.98 GB    31 Jan 2013 - 17:59
Unknown robot (identified by 'robot')   2,559+67    5.83 GB     31 Jan 2013 - 14:27
Unknown robot (identified by 'crawl')   2,605+19    3.77 GB     30 Jan 2013 - 18:08

February 2013:
Code:
Robots                                  Hits        Bandwidth   Last visit
Unknown robot (identified by 'bot*')    3,580+277   7.72 GB     11 Feb 2013 - 06:52
Googlebot                               2,878+62    5.03 GB     11 Feb 2013 - 07:05
Unknown robot (identified by 'robot')   2,026+29    5.78 GB     11 Feb 2013 - 01:43
Unknown robot (identified by '*bot')    1,482+244   3.95 GB     11 Feb 2013 - 05:57
Unknown robot (identified by 'spider')  1,133+52    1.25 GB     11 Feb 2013 - 06:57
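
To put those tables in perspective, the robot rows can be added up per month. The sketch below uses only the figures listed above (the five rows shown per month; February only runs to the 11th). One hedged note: as far as I understand it, AWStats counts robot hits as "not viewed" traffic and leaves them out of the 'days of the month' chart, which might explain why that chart shows about 11GB while the robots alone account for far more.

```python
# Bandwidth in GB per robot row, copied from the AWStats tables above.
robot_gb = {
    "December 2012": {
        "Googlebot": 24.32, "robot": 8.41, "bot*": 2.20, "*bot": 7.58, "spider": 1.89,
    },
    "January 2013": {
        "Googlebot": 18.28, "bot*": 6.79, "*bot": 11.98, "robot": 5.83, "crawl": 3.77,
    },
    "February 2013 (to the 11th)": {
        "bot*": 7.72, "Googlebot": 5.03, "robot": 5.78, "*bot": 3.95, "spider": 1.25,
    },
}


def monthly_totals(data):
    """Sum the listed robot bandwidth for each month, rounded to 2 decimals."""
    return {month: round(sum(rows.values()), 2) for month, rows in data.items()}


for month, total in monthly_totals(robot_gb).items():
    google_share = robot_gb[month]["Googlebot"] / total
    print(f"{month}: {total:.2f} GB from listed robots, "
          f"Googlebot share {google_share:.0%}")
```

For January the listed robots add up to about 46.65GB, which is much closer to the Webalizer total than to either of the smaller charts.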


I know that there are a number of options to prevent Googlebot from crawling the site but that's not necessarily ideal. Short of blocking Google, is there anything I can do?
As for those other unknown robots, any suggestions on how to block them?
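
Since AWStats only says these robots matched 'bot', 'robot', 'spider' or 'crawl', the actual user-agent strings have to come from the raw access log before anything can be blocked. Below is a minimal Python sketch of that tally, assuming the log is in Apache's combined format. The sample lines, IP addresses and the "ExampleBot" agent are made up for illustration; in practice the lines would be read from the domain's raw access log.

```python
import re
from collections import Counter

# Apache "combined" format: capture the response size and the user agent.
LOG_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (?P<bytes>\S+) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Same substrings AWStats reports for unidentified robots.
BOT_HINTS = ("bot", "spider", "crawl")


def bot_bandwidth(lines):
    """Return a Counter of bytes sent per bot-like user agent."""
    totals = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        agent = match.group("agent")
        if not any(hint in agent.lower() for hint in BOT_HINTS):
            continue
        size = match.group("bytes")
        totals[agent] += int(size) if size.isdigit() else 0  # "-" means no body
    return totals


# Hypothetical sample lines, not taken from the real log.
sample = [
    '192.0.2.10 - - [20/Jan/2013:10:00:00 -0500] "GET /a.html HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.20 - - [20/Jan/2013:10:00:05 -0500] "GET /big.zip HTTP/1.1" 200 1048576 '
    '"-" "ExampleBot/1.0"',
    '192.0.2.20 - - [20/Jan/2013:10:00:09 -0500] "GET /big.zip HTTP/1.1" 200 1048576 '
    '"-" "ExampleBot/1.0"',
    '192.0.2.30 - - [20/Jan/2013:10:01:00 -0500] "GET / HTTP/1.1" 304 - '
    '"-" "Mozilla/5.0 (Windows NT 6.1) Firefox/18.0"',
]

for agent, total in bot_bandwidth(sample).most_common():
    print(f"{total:>10} bytes  {agent}")
```

The agents at the top of that list are the ones worth targeting with a robots.txt entry or a server-side rule.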
 

ruzbehraja

Well-Known Member
May 19, 2011
cPanel Access Level
Root Administrator
Robots.txt and mod_security rules should prevent unwanted bots from crawling.
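
For the robots.txt half, here is a rough sketch of what such a file might look like, checked with Python's standard `urllib.robotparser` so the rules can be tested before uploading. "ExampleBot" and the /downloads/ path are placeholders, not names taken from the logs. Keep in mind that robots.txt is only advisory: well-behaved crawlers follow it, badly behaved ones ignore it, which is where server-side rules come in. Also, as far as I know, Googlebot does not honor Crawl-delay; its crawl rate is adjusted from Google's webmaster tools instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ban one bot outright, slow everyone else down
# and keep all crawlers away from a directory of large files.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Crawl-delay: 10
Disallow: /downloads/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# ExampleBot is blocked everywhere.
print(parser.can_fetch("ExampleBot", "/index.html"))
# Other crawlers may fetch normal pages but not the large-file directory.
print(parser.can_fetch("Googlebot", "/index.html"))
print(parser.can_fetch("Googlebot", "/downloads/big.zip"))
# Delay (in seconds) requested from crawlers that fall under the * entry.
print(parser.crawl_delay("Googlebot"))
```

This only verifies how a compliant parser reads the file; it says nothing about whether a particular bot will actually obey it.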

See: http://frankmash.blogspot.in/2006/02/banning-abusing-bots-using-modrewrite.html

- - - Updated - - -

See this too:

http://forums.cpanel.net/f185/mod_security-how-allow-bots-like-googlebot-blocked-300411.html