jeffschips

Well-Known Member
Jun 5, 2016
82
7
8
new york
cPanel Access Level
Root Administrator
I am seeing the following log entries on my cPanel server. I run CSF and am wondering what the best way to block this bot is. The requests are filling up my logs.
Code:
msnbot-40-77-193-242.search.msn.com - - [23/May/2019:15:56:38 -0400] "GET /templates/xxxxxxxx/css/user.css?4ac4b28fxxxxxxxxx24a660c0da HTTP/1.1" 200 275 "http://example.com/" "Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 BingPreview/1.0b"
The IP address traces back to the American Registry for Internet Numbers (ARIN)... in Hong Kong. Can someone confirm whether this is legitimate (which I doubt) and, if not, the best way to block only this bot using CSF or similar? csf.deny is for blocklists, but this is one particular domain, and if I block only the actual IP address, I would be blocking ARIN (which is maybe what the crawler wants me to do).

Thanks.
 

fuzzylogic

Well-Known Member
Nov 8, 2014
136
78
28
cPanel Access Level
Root Administrator
A lookup of the host msnbot-40-77-193-242.search.msn.com on the website
hostip.info/index.html
confirms that its IP address is 40.77.193.242.

A whois lookup for that IP says it is owned by Microsoft.
40.77.193.242 whois lookup information - who.is
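
The same check can be scripted instead of done through lookup websites. Below is a minimal Python sketch, assuming the logged hostname comes from a reverse DNS lookup and that a genuine Bing crawler name ends in .search.msn.com and resolves forward to the connecting IP. The function names and the injectable `resolve` parameter are made up for illustration; only the standard-library `socket.gethostbyname_ex` call is real.

```python
import socket


def hostname_to_embedded_ip(hostname):
    """Return the IP embedded in a name like
    msnbot-40-77-193-242.search.msn.com, or None if the pattern does not fit."""
    label = hostname.split(".", 1)[0]
    parts = label.split("-")
    if len(parts) != 5 or parts[0] != "msnbot":
        return None
    octets = parts[1:]
    if all(p.isdigit() and 0 <= int(p) <= 255 for p in octets):
        return ".".join(octets)
    return None


def forward_confirms(hostname, expected_ip, resolve=socket.gethostbyname_ex):
    """True if hostname is under search.msn.com AND its forward DNS lookup
    includes expected_ip. `resolve` is injectable so the check can be
    exercised without network access (an assumption of this sketch)."""
    if not hostname.lower().endswith(".search.msn.com"):
        return False
    try:
        _name, _aliases, addresses = resolve(hostname)
    except OSError:  # covers socket.gaierror and socket.herror
        return False
    return expected_ip in addresses


if __name__ == "__main__":
    host = "msnbot-40-77-193-242.search.msn.com"
    ip = hostname_to_embedded_ip(host)
    print(host, "embeds", ip)
    # This line performs a real DNS query when run.
    print("forward DNS confirms:", forward_confirms(host, ip))
```

A name that merely looks like a Bing host but fails the forward lookup would be a reason for suspicion. A name that passes suggests the request really did come from Microsoft's crawler range.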

The User-Agent header for the request is:
Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) AppleWebKit/537.51.1 (KHTML, like Gecko) Version/7.0 Mobile/11A465 Safari/9537.53 BingPreview/1.0b

I would conclude that search.msn.com is fetching web pages to generate the screenshot preview images shown in Bing search results.

Most web site owners would be displeased if they thought you were blocking requests from legitimate search engines.
 

jeffschips

In csf.rignore I have .search.msn.com, which is supposed to allow this bot access.

What I'm finding is that this bot is attempting to grab content that doesn't exist, producing many 400 responses.
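
One way to quantify this before deciding on a block is to tally response codes for requests whose logged host ends in .search.msn.com. The following is a rough Python sketch, assuming the access log uses the same combined format as the entry quoted earlier in the thread. The `status_counts` name and the `access_log` path in the usage comment are hypothetical.

```python
import re
from collections import Counter

# Start of a combined-format line:
# host ident user [timestamp] "request" status ...
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) ')


def status_counts(lines, host_suffix=".search.msn.com"):
    """Count HTTP status codes for log lines whose client host ends with
    host_suffix. Lines that do not match the expected format are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group(1).lower().endswith(host_suffix):
            counts[m.group(2)] += 1
    return counts


# Usage (the path is a placeholder, not a real cPanel location):
# with open("access_log", encoding="utf-8", errors="replace") as fh:
#     for status, n in status_counts(fh).most_common():
#         print(status, n)
```

If most of the tallied responses turn out to be successful fetches of real assets, the crawler is probably behaving normally and only the error entries need a closer look.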

Also, when I did the IP address lookup, it did not return Microsoft but rather ARIN, with an address in Hong Kong.

Another IP address lookup using a different service does, indeed, indicate it is Microsoft.