Blocking web crawlers. ModSecurity or in vhost?

DennisMidjord

Well-Known Member
Sep 27, 2016
286
47
78
Denmark
cPanel Access Level
Root Administrator
Lately, a lot of our customers' websites have been crawled by a lot of bots. Yesterday, a single website was crawled by 4 different bots at the same time. All of them were bad bots.
We want to block these bots, but I'm wondering which method is best performance-wise, or if it really doesn't matter.

So, does anyone have any recommendations for blocking bad bots?
 

Handssler Lopez

Well-Known Member
Apr 30, 2019
84
24
8
Guatemala
cPanel Access Level
Root Administrator
The best recommendation would be robots.txt, so you block the bad ones and allow the good ones.

* Even if you configure it, the bot decides whether or not to follow the instructions. Blocking robots through Apache is not very convenient; there may be problems with advertising campaigns, site validators, etc.
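For reference, a minimal robots.txt along those lines might look like this (the user-agent names below are just examples; use the ones you actually see in your logs):

Code:
User-agent: Googlebot
Disallow:

User-agent: MJ12bot
Disallow: /

User-agent: AhrefsBot
Disallow: /

An empty Disallow allows everything for that crawler; "Disallow: /" asks it to stay out entirely. As noted, a bot is free to ignore this file.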
 

DennisMidjord

Well-Known Member
Sep 27, 2016
286
47
78
Denmark
cPanel Access Level
Root Administrator
@Handssler Lopez I'm only talking about bad bots - not bots in general. Blocking access by configuring robots.txt is not a viable solution because a) we need to do it for every website on all of our servers, and b) it makes no difference if the bot doesn't respect robots.txt. A lot of them don't.
 

ffeingol

Well-Known Member
PartnerNOC
Nov 9, 2001
662
231
343
cPanel Access Level
DataCenter Provider
We just use mod_security.

Example rules (that we picked up somewhere):

Code:
SecRule REQUEST_HEADERS:User-Agent "@rx (?:AhrefsBot)" "msg:'AhrefsBot Spiderbot blocked',phase:1,log,id:7777771,t:none,block,status:403"
SecRule REQUEST_HEADERS:User-Agent "@rx (?:MJ12bot)" "msg:'MJ12bot Spiderbot blocked',phase:1,log,id:7777772,t:none,block,status:403"
SecRule REQUEST_HEADERS:User-Agent "@rx (?:Yandex)" "msg:'Yandex Spiderbot blocked',phase:1,log,id:7777773,t:none,block,status:403"
SecRule REQUEST_HEADERS:User-Agent "@rx (?:SeznamBot)" "msg:'SeznamBot Spiderbot blocked',phase:1,log,id:7777774,t:none,block,status:403"
We grab the User-Agent from the Apache logs and then just plug it in. When you add another, you just need to increment the ID so you don't have duplicates.
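If you'd prefer to maintain a single rule instead of one per bot, the same user-agents can be combined into one regex alternation (one rule ID to manage; the bot list and the ID below are just examples following the pattern above):

Code:
SecRule REQUEST_HEADERS:User-Agent "@rx (?:AhrefsBot|MJ12bot|Yandex|SeznamBot)" "msg:'Bad bot blocked',phase:1,log,id:7777775,t:none,block,status:403"

The trade-off is that the per-bot msg is lost, so the audit log won't tell you at a glance which bot matched without looking at the logged User-Agent.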
 

nootkan

Well-Known Member
Oct 25, 2006
152
9
168
There is a pretty good plugin that handles bad bots very well too. Just google "stopbadbots". The developer makes a WordPress plugin and a stand-alone version; I use both effectively.