Is mod_security blocking search engines???

DReade83

Well-Known Member
Oct 20, 2006
196
0
166
Cheshire, UK
I'm trying to use Lynx to view my client sites. The Google Webmaster Guidelines recommend this, because a text browser shows you what your site looks like from the spider's point of view.

However, when I try to view pages with this text browser, I get an Error 406 page, and an entry is added to the mod_security log.

Does this mean search engine spiders are getting the same treatment?

** This is potentially causing marketing problems, which can lead to revenue issues, and I need an accurate answer **

For the record, I'm using CURRENT build with Apache 2.2, PHP 5.2.5 and MySQL 5.0.45. I'm also using the default mod_security rules that come as part of the EA3 build.
 

Bailey

Well-Known Member
Aug 12, 2001
120
1
318
Wisconsin
What do your logs show? According to your logs, are SEs getting 406s, and if so, what is the reason cited for the 406?

:D Bailey
 

DReade83

Error as follows:

Access denied with code 406 (phase 2). Match of "rx ^apache.*perl" against "REQUEST_HEADERS:User-Agent" required. [id "990011"] [msg "Request Indicates an automated program explored the site"] [severity "NOTICE"] 406
2007-11-26 12:39:51 <IP> <URL> HTTP/1.0 <SITE>
 

yapluka

Well-Known Member
Dec 24, 2003
301
1
168
France
cPanel Access Level
Root Administrator
The key here is "against "REQUEST_HEADERS:User-Agent" required". Spiders do send a User-Agent header, so they are not blocked by mod_security, unless a rule specifically blocks that user agent, that is :)
 

Bailey

*nods* The error cited means nothing without an IP address.

What IP generated that error? When you run a whois on the IP, who does it belong to? Is it a search engine, or a regular user?

If it's a search engine, then you have just answered your own question. :)

:D Bailey
 

Todd Mitchell

Well-Known Member
Staff member
Nov 13, 2006
301
1
243
Houston, TX
I've tested the default mod_security rules installed by WHM and I am able to browse to sites using lynx. This error could be occurring for a couple of different reasons, including the site itself, the version and configuration of lynx that you're using, etc. If you would like, please submit a ticket with this information along with the site in question and I'd be happy to dig a little deeper to find the exact cause of the error.
 

gmagana

Active Member
May 18, 2005
41
0
156
Is there any way to disable this rule? This is causing major problems for us because we host files that are being accessed by a (stand-alone, non-browser) program that apparently gets blocked by this rule... This seems to be the block of mod_security config that makes up the rule:
Code:
SecRule REQUEST_HEADERS:User-Agent "(?:\b(?:(?:indy librar|snoop)y|microsoft url control|lynx)\b|d(?:ownload demon|isco)|w(?:3mirror|get)|l(?:ibwww|wp)|p(?:avuk|erl)|cu(?:sto|rl)|big brother|autohttp|netants|eCatch)" \
        "chain,log,auditlog,msg:'Request Indicates an automated program explored the site',id:'990011',severity:'5'"
SecRule REQUEST_HEADERS:User-Agent "!^apache.*perl"
I have tried commenting out the lines, but no go: if I do, all HTTP requests get blocked with 406...
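
One likely explanation, assuming only the first two lines were commented out: the third line then stops being the tail of the chain and becomes a standalone rule. It inherits the default action (the log quoted earlier in the thread shows a deny with 406 in phase 2) and matches every request whose User-Agent does not match "^apache.*perl", which is nearly all of them. A minimal sketch of switching the check off without editing the stock rules, assuming ModSecurity 2.x; where this line lives relative to the default rules depends on how the config files are included on the server:
Code:
# Add this AFTER the default rules have been loaded, in a local config file.
# Removing the ID carried by the first rule of a chain drops the whole chain,
# so the "!^apache.*perl" line is disabled together with it.
SecRuleRemoveById 990011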

For the time being I cleared the mod_security config file and now it's all working, but I'd rather keep the rest of the checks in place...

I cannot modify the program that requests these files so that it sends the correct headers in its HTTP requests, nor can I require the maker to change their program to suit, so the only option is to remove this check.

Any suggestions?

EDIT: I should mention that in the access logs, the requests show up as:
Code:
201.211.96.234 - - [24/Jan/2008:07:43:47 -0800] "GET /server/etpub/etpub_client-20070801.pk3 HTTP/1.1" 406 496 "ET://72.5.249.131:27960" "ID_DOWNLOAD/2.0 libcurl/7.12.2"
so maybe there is a way to re-write the rule so that it allows requests with "ET://xxxx" as the referrer that would otherwise be blocked under this rule...
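
A rough sketch of such an exception, assuming a ModSecurity 2.x release that supports the ctl:ruleRemoveById action. The id value 10001 is a made-up local ID, and the exception has to be loaded so that it runs before rule 990011 is evaluated (phase 1 is used here for that reason):
Code:
# Hypothetical local exception: when the Referer header starts with "ET://",
# switch off rule 990011 for this request only. All other rules still apply.
# t:none clears any inherited transformations so the match stays literal.
SecRule REQUEST_HEADERS:Referer "^ET://" \
        "phase:1,t:none,pass,nolog,id:'10001',ctl:ruleRemoveById=990011"

The Referer header is supplied by the client, so this narrows the exception to requests that look like they come from the game client, but it does not authenticate them.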
 