[SOLVED] Blocking bad bots

nunoleite

Well-Known Member
Jun 4, 2007
65
3
158
Hi!

I have seen lots of bots accessing the websites on my VPS. For now I just block their IPs temporarily with CSF, but I would like a better, server-wide solution.

So I'm considering two options.
First:
Apache Configuration -> Include Editor -> “Pre Main Include”
Code:
<Directory "/home">
   SetEnvIfNoCase User-Agent "MJ12bot" bad_bots
   SetEnvIfNoCase User-Agent "AhrefsBot" bad_bots
   SetEnvIfNoCase User-Agent "SemrushBot" bad_bots
   SetEnvIfNoCase User-Agent "Baiduspider" bad_bots
   ...
  <RequireAll>
     Require all granted
     Require not env bad_bots
  </RequireAll>
</Directory>
or
Code:
<Directory "/home">
  BrowserMatchNoCase "Baiduspider" bots
  BrowserMatchNoCase "HTTrack" bots
  BrowserMatchNoCase "Yandex" bots
  ...
  Order Allow,Deny
  Allow from ALL
  Deny from env=bots
</Directory>
Second:
Using ModSecurity rules:
Code:
SecRule REQUEST_HEADERS:User-Agent "CareerBot" "deny,log,noauditlog,severity:2,msg:'Spiderbot blocked',status:403"
I don't know if these snippets are 100% correct, as I found them on the internet and haven't tested them.

Can I have some advice on these two options (plain Apache configuration vs. ModSecurity), and on whether these snippets would work?

Thanks
Nuno Leite
 

fuzzylogic

Well-Known Member
Nov 8, 2014
149
89
78
cPanel Access Level
Root Administrator
Your ModSecurity rule would not work: it has no id, which is mandatory.
Below is a copy of OWASP CRS rule 913102 (a Paranoia Level 2 rule), edited to block all the bots you listed in your examples.
I am not recommending that all these bots be blocked; I am just offering working syntax for whichever bots you decide to block.
The digit 1 has been appended to the id so it never causes a duplicate-id error.
Code:
SecRule REQUEST_HEADERS:User-Agent "@rx ^(?:MJ12bot|AhrefsBot|SemrushBot|Baiduspider|HTTrack|Yandex|CareerBot)$" \
 "msg:'Found User-Agent associated with web crawler/bot',\
  severity:'CRITICAL',\
  id:9131021,\
  rev:'1',\
  phase:request,\
  block,\
  t:none,\
  ver:'OWASP_CRS/3.0.0',\
  maturity:'9',\
  accuracy:'9',\
  capture,\
  logdata:'Matched Data: %{TX.0} found within %{MATCHED_VAR_NAME}: %{MATCHED_VAR}',\
  tag:'application-multi',\
  tag:'language-multi',\
  tag:'platform-multi',\
  tag:'attack-reputation-crawler',\
  tag:'OWASP_CRS/AUTOMATION/CRAWLER',\
  tag:'WASCTC/WASC-21',\
  tag:'OWASP_TOP_10/A7',\
  tag:'PCI/6.5.10',\
  tag:'paranoia-level/2',\
  setvar:'tx.msg=%{rule.msg}',\
  setvar:tx.anomaly_score=+%{tx.critical_anomaly_score},\
  setvar:tx.%{rule.id}-OWASP_CRS/AUTOMATION/CRAWLER-%{matched_var_name}=%{matched_var},\
  setvar:ip.reput_block_flag=1,\
  expirevar:ip.reput_block_flag=%{tx.reput_block_duration},\
  setvar:'ip.reput_block_reason=%{rule.msg}'"
 

nunoleite

Hi!

The bots listed in my examples are not necessarily the ones I need to block; there are only 3 or 4 that I see frequently and that have a big impact on server load.

So with this code I can just use ModSecurity Tools to add this custom rule, and changing the bot list on the first line would block all the bots I need, right?

With this approach I can easily add and remove the bots I need to block across the whole server, right?

Thanks
 

cPanelMichael

Administrator
Staff member
Apr 11, 2011
47,910
2,211
363
So with this code I can just use ModSecurity Tools to add this custom rule, and changing the bot list on the first line would block all the bots I need, right?

With this approach I can easily add and remove the bots I need to block across the whole server, right?
Hello,

You can use the rule as an example, but note that some of its entries are designed for use with the OWASP rule set:

OWASP ModSecurity CRS - cPanel Knowledge Base - cPanel Documentation

Thank you.
 

fuzzylogic

cPanelMichael is correct: the rule I posted relies on other OWASP CRS rules to do the blocking.
It was wrong of me to assume you would have those in your environment.

Here is another rule example, based on the faulty rule you posted originally.
It will work as a standalone rule, or alongside any rule set I know of.

Code:
SecRule REQUEST_HEADERS:User-Agent "@rx ^(?:MJ12bot|AhrefsBot)$" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:none,block,status:403"
I tested this rule and it returned a 403 status for requests with either of the following headers:
User-Agent: MJ12bot
User-Agent: AhrefsBot

So with this code I can just use ModSecurity Tools to add this custom rule, and changing the bot list on the first line would block all the bots I need, right?
That is correct.
Add bots so that the regex has this form
^(?:bot1|bot2|bot3)$
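Note that the `^` and `$` anchors mean the entire User-Agent header must equal one of the listed names, which is why the test above used bare `MJ12bot` / `AhrefsBot` headers. A quick sketch of that behavior in Python, with `re` standing in for the PCRE engine ModSecurity uses:

```python
import re

# Anchored alternation: the whole header value must equal one of the names.
pattern = re.compile(r"^(?:MJ12bot|AhrefsBot)$")

print(bool(pattern.search("MJ12bot")))  # True: header is exactly the bot name
print(bool(pattern.search("Mozilla/5.0 (compatible; MJ12bot/v1.4.8)")))  # False: name is only a fragment
```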

With this approach I can easily add and remove the bots I need to block across the whole server, right?
ModSecurity rules added through
Home » Security Center » ModSecurity™ Tools » Rules List » Add Rule
are applied to all HTTP requests to the cPanel server.
 

nunoleite

Thanks fuzzylogic.
That's what I was looking for: a simple rule that can block these bad bots.

I don't have the OWASP rules installed because I tried them some time ago and they caused lots of problems with some CMSes I run on the server, and I never investigated which rules to enable or disable for compatibility.

I will try this new SecRule. Thanks.

What about the other option, using the Apache configuration? Is it valid? Or is using ModSecurity better?

Thanks
 

cPanelMichael

What about the other option, using the Apache configuration? Is it valid? Or is using ModSecurity better?
ModSecurity rules are the better option in my opinion. They will make it easier for you to exclude rules for specific accounts if necessary.

Thank you.
 

nunoleite

Hi!
I have published this rule:
Code:
SecRule REQUEST_HEADERS:User-Agent "@rx ^(?:AhrefsBot|MJ12bot|Yandex)$" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:none,block,status:403"
But I still see visitors with this user agent:
Mozilla/5.0 (compatible; MJ12bot/v1.4.8; http://mj12bot.com/)

This isn't being blocked.
 

fuzzylogic

If you want to match a fragment of the User-Agent, you need a looser regular expression.
Code:
SecRule REQUEST_HEADERS:User-Agent "@rx (?:MJ12bot|AhrefsBot)" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:none,block,status:403"
You may also want to cover uppercase and lowercase variations.
This can be achieved by having ModSecurity transform the User-Agent value to lowercase and then entering the bot names in the regex in all lowercase.
Code:
SecRule REQUEST_HEADERS:User-Agent "@rx (?:ahrefsbot|mj12bot|yandex)" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:lowercase,block,status:403"
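The two changes can be sketched in Python (`re` standing in for ModSecurity's regex engine; the UA string below is an assumption modeled on the one quoted earlier in the thread): dropping the anchors turns the rule into a substring match, and lowercasing the input first mirrors what `t:lowercase` does.

```python
import re

ua = "Mozilla/5.0 (compatible; MJ12bot/v1.4.8)"

# Unanchored pattern: matches the bot name anywhere inside the header value.
loose = re.compile(r"(?:MJ12bot|AhrefsBot)")
print(bool(loose.search(ua)))  # True: "MJ12bot" appears as a fragment

# Mirror of t:lowercase: lowercase the value first, then match lowercase names.
lowered = ua.lower()
print(bool(re.search(r"(?:ahrefsbot|mj12bot|yandex)", lowered)))  # True
print(bool(re.search(r"(?:ahrefsbot|mj12bot|yandex)", ua)))       # False without the transform
```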
 

nunoleite

Hi!
With this rule:
Code:
SecRule REQUEST_HEADERS:User-Agent "@rx (?:AhrefsBot|MJ12bot|Yandex)" "msg:'Spiderbot blocked',phase:1,log,id:777777,t:none,block,status:403"
I think MJ12bot is being blocked, but I still see: user-agent: Mozilla/5.0 (compatible; AhrefsBot/5.2; +ahrefs.com/robot/)

Hmm... strange...
 

nunoleite

Hi!
I have made some changes...
I have added these 3 rules:
Code:
SecRule REQUEST_HEADERS:User-Agent "@rx (?:AhrefsBot)" "msg:'AhrefsBot Spiderbot blocked',phase:1,log,id:7777771,t:none,block,status:403"
SecRule REQUEST_HEADERS:User-Agent "@rx (?:MJ12bot)" "msg:'MJ12bot Spiderbot blocked',phase:1,log,id:7777772,t:none,block,status:403"
SecRule REQUEST_HEADERS:User-Agent "@rx (?:Yandex)" "msg:'Yandex Spiderbot blocked',phase:1,log,id:7777773,t:none,block,status:403"
With these 3 rules I can see in the "hits list" what is going on with each bot and what is hitting each rule, because they are logged separately.
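If the per-bot list keeps growing, the rules can also be generated rather than maintained by hand. A sketch (the base id 7777770 and the message wording are arbitrary choices here; pick an id range that no other rule set on your server uses):

```python
# Emit one standalone ModSecurity rule per bot, each with a unique id
# so each bot's hits are logged separately.
BOTS = ["AhrefsBot", "MJ12bot", "Yandex"]
BASE_ID = 7777770  # arbitrary; must not collide with ids from other rule sets

TEMPLATE = ('SecRule REQUEST_HEADERS:User-Agent "@rx (?:{bot})" '
            '"msg:\'{bot} Spiderbot blocked\',phase:1,log,id:{rid},'
            't:none,block,status:403"')

for offset, bot in enumerate(BOTS, start=1):
    print(TEMPLATE.format(bot=bot, rid=BASE_ID + offset))
```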

If I understand correctly, the bots still reach the sites, but they receive 0 bytes and a 403 HTTP error.
After some number of hits the CSF firewall blocks the IP permanently. Is this right? And is this the intended behavior?
 

nunoleite

Hi!
Thanks.
Now I think this is working fine, as I can see lots of hits on the rules and some requests being blocked.

But analyzing the visitor logs, it seems that AhrefsBot is still being served.

[attached screenshot: upload_2018-6-13_16-31-35.png]

Is this possible?
 

WebJIVE

Well-Known Member
Sep 30, 2007
69
5
58
Here is a good article on this subject, and one we're going to look into:

linuxadmin.io/blocking-bad-useragents-modsecurity-fail2ban/

And here is another one that we're testing:

geekytuts.net/cpanel/block-bad-bots-cpanel-globally-apache/
 