GoWilkes

Well-Known Member
Sep 26, 2006
703
34
178
cPanel Access Level
Root Administrator
I'm having a problem that I can't figure out, and I'm wondering if it's cPanel related? If not, maybe you guys will have an idea of how to narrow it down.

Yesterday from around 5:30am until 7am, I had a huge increase in Apache processes that was causing my server to freeze up. I normally don't have more than 50 or so processes during my peak time, but this period was hitting the Server Limit that I had set in Apache configuration of 100.

By the time I saw it, though, it had ended.

Then at around 4pm, it started again. This time I was there to see it, but couldn't find any reason for it. I checked the number of connections using:

Code:
netstat -plan | grep :80 | awk '{print $5}' | cut -d : -f 1 | sort | uniq -c | sort -nr | head
but didn't see anything unexpected. I restarted Apache, then MySQL, then rebooted the entire server, but none of that had any impact.

I was able to stop the server from freezing up by increasing Server Limit in Apache configuration to 256, but that's just a Band-aid. My number of Apache processes has stayed between 100 and 150 all night and all day, even when netstat showed that I only had 4 or 5 connections.

It's also notable that "Individual Interrupts" and "Disk Latency" in Munin went crazy at the same time.

I'm not sure what "Individual Interrupts" means, but an orange graph that's usually near 1e+02 dropped down below 1e-04.

And under "Disk Latency", /dev/xvdb/ has a green graph that's usually at around 1e-02 that dropped down to 1e-04. That made me suspect hardware failure, but I messaged Softlayer (who has the worst service now) and they said that since it's a virtual server, I wouldn't see hardware errors like that.

So I'm not sure if the change in Interrupts and Latency is relevant, or just a symptom of another problem.

I'm running CentOS 6.10 xen hvm, and WHM is v 76.0.20. I'm still running EasyApache 3, so WHM/cPanel hasn't updated to 78.

Any suggestions you guys can give would be greatly appreciated!! Thanks in advance!
 

GOT

Get Proactive!
PartnerNOC
Apr 8, 2003
1,780
331
363
Chesapeake, VA
cPanel Access Level
DataCenter Provider
Well, it sounds like you are getting some kind of DoS attack. This command:

/usr/bin/lynx -dump -width 500 http://127.0.0.1/whm-server-status | grep GET | awk '{print $12}' | sort | uniq -c | sort -rn | head

Will show you the number of connections each domain has active at that moment and this command:

netstat -ntu | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -n

Will show you the number of connections to your server per connecting IP.

The problem with the command you used above is that it only reports port 80 traffic whereas if they are attacking on SSL it would not show those connections.
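
A minimal sketch of the same per-IP count, restricted to web traffic on both ports 80 and 443. The function name count_web_conns is made up for illustration, and it assumes the usual `netstat -ntu` column layout (local address in column 4, remote address in column 5):

```shell
# Count connections per remote IP, limited to local ports 80 and 443.
# Reads `netstat -ntu`-style lines on stdin, so it also works on saved output.
count_web_conns() {
  awk '$4 ~ /:(80|443)$/ {
         ip = $5
         sub(/:[0-9]+$/, "", ip)   # strip the trailing :port, keep the address
         print ip
       }' | sort | uniq -c | sort -rn
}

# Live usage (assumes net-tools netstat is installed):
# netstat -ntu | count_web_conns | head
```

Stripping only the final :port, rather than cutting on the first colon, keeps IPv6-style addresses in one piece.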
 

GoWilkes

Well-Known Member
Sep 26, 2006
703
34
178
cPanel Access Level
Root Administrator
Thanks for the commands, those are very helpful! I didn't think about changing it from :80 after I moved everything to HTTPS.

But I'm still not seeing a high number of connections. From the first command, I have 28 connections right now, but Munin is showing about 100 Apache processes; roughly double the number that I had at this time on May 30.

And using the second command, the IP with the highest number of connections is a local IP, with 13 connections (pretty much what I would expect).

I even blocked all non-US IP addresses in CSF (firewall) using CC_ALLOW_FILTER (only allowing US), but it had no noticeable impact on the problem.
 

GOT

Get Proactive!
PartnerNOC
Apr 8, 2003
1,780
331
363
Chesapeake, VA
cPanel Access Level
DataCenter Provider
You might want to read the docs on that csf filter. If my memory serves me, I don't think it works like you're expecting.

As for munin I would not necessarily use that for real time diagnostics.

ps axf|grep httpd|wc will give you a live count of Apache processes. From your numbers it doesn't sound like an attack, but you should look at your general Apache settings. I believe max children/servers is set to 150 by default, and if you are exceeding that then pages won't load.

I would also look at Apache status in whm because sometimes your eyes can show you things that just getting numbers from commands doesn't reveal.
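
For reference, plain wc prints three numbers (lines, words, bytes), and only the first one is the process count; the grep process itself can also show up in the list. A hedged sketch that prints a single number instead, with count_httpd as an illustrative helper name:

```shell
# Count httpd lines in `ps`-style output read from stdin.
# The [h] bracket stops this grep from matching its own command line,
# and -c prints only the number of matching lines.
count_httpd() {
  grep -c '[h]ttpd'
}

# Live usage:
# ps axf | count_httpd
```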
 

GoWilkes

Well-Known Member
Sep 26, 2006
703
34
178
cPanel Access Level
Root Administrator
This is what I was going by on CC_ALLOW_FILTER:

An alternative to CC_ALLOW is to only allow access from the following
countries but still filter based on the port and packets rules. All other
connections are dropped
And this:

crybit.com/block-whole-countries-csf/

I used the command you posted (ps axf|grep httpd|wc ) and this was the result:

46 366 3106

There wasn't a column header, though, so I'm not sure what I'm looking at here. It's 1:30am here right now, and the 46 matches what Munin shows for the current number of processes, but I'm not sure what the 366 or 3106 represent. Regardless, I would usually have 46 processes at peak time, not at 1:30. It should be more like 15-20 right now.


From your numbers it doesn't sound like an attack, but you should look at your general Apache settings. I believe max children/servers is set to 150 by default, and if you are exceeding that then pages won't load.
You're right, and that turned out to be why my site was freezing up. Raising the number stopped it from freezing, but I have no clue why it increased in the first place :-(

I would also look at Apache status in whm because sometimes your eyes can show you things that just getting numbers from commands doesn't reveal.
Possibly because of the increase I made on Max Clients and Server Limit, but I do have about 100 of these:

::1 myservername.com OPTIONS * HTTP/1.0

I'm guessing that's normal, though... 100+/- free slots?
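
Requests like "OPTIONS * HTTP/1.0" from ::1 are normally Apache's own internal dummy connections, which the parent process uses to wake idle children, so they are not outside traffic. To see how the slots break down by state, here is a rough sketch that tallies the mod_status scoreboard. It assumes the status page also answers in the machine-readable ?auto format at the whm-server-status URL used earlier in the thread, and that curl is installed; tally_scoreboard is an illustrative name:

```shell
# Tally scoreboard characters from mod_status "?auto" output read on stdin.
# Key (from mod_status): "_" waiting, "R" reading request, "W" sending reply,
# "K" keepalive, "." open slot with no current process.
tally_scoreboard() {
  awk -F': ' '/^Scoreboard: / {
      n = length($2)
      for (i = 1; i <= n; i++) count[substr($2, i, 1)]++
    }
    END { for (c in count) printf "%s %d\n", c, count[c] }' | sort
}

# Live usage (URL and ?auto support are assumptions for this server):
# curl -s 'http://127.0.0.1/whm-server-status?auto' | tally_scoreboard
```

A large "_" or "." count next to a small "W" count would suggest mostly idle slots rather than real load.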
 

MaFt

Registered
Jun 3, 2019
3
1
1
UK
cPanel Access Level
Reseller Owner
I'm following this as I've seen exactly the same. For years my sites averaged 4-6 Entry Processes and suddenly on Friday around 4-5pm UK time I was hitting "resource limit is reached" errors as these were limited to 20 on this server.

I'm a reseller though and have no control over the limits. I've managed to minimise this by shutting down 1 site completely and using Cloudflare's "I'm under attack" to reduce the number of visitors. Not ideal though as it's meant a 40% loss of income over the weekend compared to normal - but at least the sites are online.

The hosts are being painfully slow and keep saying they'll increase the limits. They still haven't. However, they've still not actually responded to my main query as to why the sites in question, with no changes at my end, are suddenly being reported as using a lot more processes than previously. Looking at the cPanel "concurrent usage" logs for 30 days you can see the sudden spike from Friday.

It seems very weird that the only similar report I can find is this thread, and that the same issue also started on Friday, around the same time (assuming the original poster is in the US).

I'm hopeful my hosts can find out what's going on and I'll certainly report back here if they find anything out.
 

GoWilkes

Well-Known Member
Sep 26, 2006
703
34
178
cPanel Access Level
Root Administrator
You're right, MaFt, I'm in eastern US. That's too much to be a coincidence, I think.

I ran ClamAV and rkhunter, and neither found anything, so I'm ruling out a virus on my end.

Right now (roughly 2pm EST) I have 101 busy Apache servers, but only 46 connections. The IP with the highest connection has 13 connections, which is reasonable, so I think that I can rule out a DDoS attack.

My RAM is high, too; I'm usually at around 3G at this time of day, but it's currently over 4G (I have 4G of RAM, so it's maxing out). My CPU load is fine, though: 0.87, and since I have 2 CPUs a load of 2 would be a normal-high.

MaFt, there's no excuse for your host to be dragging their feet on increasing the limits. It literally takes 30 seconds, and the restart of Apache might have a downtime of less than 1 second. It doesn't solve the problem, but it definitely helps with the symptom (and should bring your revenue back on track).
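
Since RAM is the resource that is maxing out, one way to put a number on it is to total the resident memory of the httpd processes. A small sketch, assuming procps ps (where `-o rss=` prints RSS in KiB with no header); httpd_mem_summary is a hypothetical helper name. Shared pages are counted once per process, so the total overstates real usage:

```shell
# Summarise RSS values (KiB, one per line on stdin) as a process count,
# a total in MiB, and an average per process in MiB.
httpd_mem_summary() {
  awk '{ total += $1; n++ }
       END {
         if (n) printf "procs=%d total_mb=%.0f avg_mb=%.1f\n",
                       n, total / 1024, total / n / 1024
       }'
}

# Live usage:
# ps -C httpd -o rss= | httpd_mem_summary
```

Multiplying the average by MaxClients gives a rough upper bound to compare against the 4G in the box.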
 

cPanelMichael

Administrator
Staff member
Apr 11, 2011
47,880
2,268
463
Hello Everyone,

Can anyone affected by this issue verify if the Prefork MPM is enabled? You can execute the following command to check:

Code:
rpm -qa|grep mpm
If so, verify if any recent entries like the one below exist in /usr/local/apache/logs/error_log:

Code:
AH00144: couldn't grab the accept mutex
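
A quick way to count such entries, as a sketch: it matches on the message text rather than the AH code, and count_mutex_errors is an illustrative name, not a cPanel script.

```shell
# Print how many lines in the given log mention the accept-mutex failure.
count_mutex_errors() {
  if [ ! -r "$1" ]; then
    echo "cannot read $1" >&2
    return 1
  fi
  # grep -c prints 0 but exits non-zero when nothing matches; ignore that status
  grep -c "couldn't grab the accept mutex" "$1" || true
}

# Live usage:
# count_mutex_errors /usr/local/apache/logs/error_log
```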
Thank you.
 

GoWilkes

Well-Known Member
Sep 26, 2006
703
34
178
cPanel Access Level
Root Administrator
I SSH'ed in to my server as root via Putty, ran rpm -qa|grep mpm, and basically nothing happened. It ran for about 2 seconds, then just gave me the prompt again.

In /usr/local/apache/conf/httpd.conf, though, the only reference to prefork is here:

Code:
Timeout 60
TraceEnable Off
ServerSignature Off
ServerTokens ProductOnly
FileETag None
StartServers 15

<IfModule prefork.c>
  MinSpareServers 10
  MaxSpareServers 20
</IfModule>

<IfModule itk.c>
  MinSpareServers 10
  MaxSpareServers 20
</IfModule>

ServerLimit 256
MaxClients 150
MaxRequestsPerChild 10000
KeepAlive On
KeepAliveTimeout 5
MaxKeepAliveRequests 100
I checked my error_log, anyway, but didn't find any reference to "mutex". The oldest entry was May 31, about 12 hours before this problem began the first time. I looked through, and don't see any errors other than attempts for pages that don't exist, and a handful of errors that I see all the time that I don't understand, but I doubt that they're related to this:

Code:
RewriteOptions: MaxRedirects option has been removed in favor of the global LimitInternalRecursion directive and will be ignored.
Hostname X provided via SNI and hostname example.com provided via HTTP are different
Thanks, Michael!
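
An empty result from rpm is not conclusive on an EasyApache 3 system, where Apache is compiled from source rather than installed from per-module RPMs. A sketch of asking the binary itself instead: it assumes the binary lives at /usr/local/apache/bin/httpd and that `httpd -V` prints a "Server MPM:" line; mpm_from_httpd_v is a made-up helper name.

```shell
# Extract the MPM name from `httpd -V` output read on stdin.
mpm_from_httpd_v() {
  awk '/^Server MPM:/ { print $3 }'
}

# Live usage (binary path is an assumption for an EA3 layout):
# /usr/local/apache/bin/httpd -V | mpm_from_httpd_v
```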
 

dalem

Well-Known Member
PartnerNOC
Oct 24, 2003
2,983
159
368
SLC
cPanel Access Level
DataCenter Provider
Are you running a lot of WordPress sites?
What you are describing sounds like just your run-of-the-mill Layer 7 attack, which happens 24/7, 365 days a year, non-stop from bots: the typical wp-login and xmlrpc attacks.

I have noticed that some of the bots have a new plan: instead of rapid-fire brute force, they connect and reconnect, or go one in, one out and then switch to a new IP, which lets them avoid getting banned as easily. So we did not notice right away what was going on.
A good custom ModSecurity rule stops them in their tracks.


One of our servers has been acting up as you described a couple of times a day, and we realized that one of our clients' multiple Magento installs was getting hammered. We added a ModSecurity rule and all is well now (well, almost; it will be as soon as all of the botnet IPs get banned).


We also realized that, for some reason, our WordPress ModSecurity rule was not working, which did not help.
 

MaFt

Registered
Jun 3, 2019
3
1
1
UK
cPanel Access Level
Reseller Owner
I have 2 wordpress installs on the hosting I mentioned in my reply.

Can you expand on what the "good mod security rule" would be?
 

dalem

Well-Known Member
PartnerNOC
Oct 24, 2003
2,983
159
368
SLC
cPanel Access Level
DataCenter Provider
One that blocks bots, i.e. "connections with no referrer".

Like this one
wp-login.php and mod security

The xmlrpc rule is the same; just change wp-login.php to xmlrpc and change the ModSecurity rule ID.


Also set up your firewall to ban them. The ban time is entirely up to you and is server specific.
We have ours set to block permanently after one hit, as the more WordPress sites on a server, the more connections there will be.


Make sure it works as expected, as different server setups seem to behave differently.
 

cPanelMichael

Administrator
Staff member
Apr 11, 2011
47,880
2,268
463
Hello @GoWilkes,

Thank you for sharing the additional information. The issue reported on this thread does not appear related to the case quoted below, but feel free to test out the temporary workaround if the affected system uses the Prefork MPM to see if it has any impact on the reported issue:

Internal case EA-8508 was recently opened to address an issue where an update to the ea-apr RPM led to instability on some systems using the Prefork MPM. The temporary workaround for affected systems is to execute the following command:

Code:
echo "Mutex sysvsem" >> /etc/apache2/conf.modules.d/000_mod_mpm_prefork.conf; /scripts/rebuildhttpdconf; /scripts/restartsrv_httpd --hard
Note the above command includes a restart of the Apache service. We're tentatively planning to publish a fix for this case in the next EasyApache 4 release (you can follow the EA4 Change Log here).
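
A more defensive sketch of the first step of that workaround, which only appends the directive when the Prefork config file actually exists and does not already contain it. The function name add_mutex_directive is hypothetical, and the rebuild and restart commands are left as comments so nothing restarts by accident:

```shell
# Append "Mutex sysvsem" to the given config file, but only if the file
# exists and the directive is not already there.
add_mutex_directive() {
  if [ ! -f "$1" ]; then
    echo "skip: $1 not found" >&2
    return 1
  fi
  # -x matches the whole line, -q keeps grep silent
  if grep -qx 'Mutex sysvsem' "$1"; then
    echo "already set"
  else
    echo 'Mutex sysvsem' >> "$1"
    echo "added"
  fi
}

# Live usage, followed by the same rebuild and hard restart as the one-liner:
# add_mutex_directive /etc/apache2/conf.modules.d/000_mod_mpm_prefork.conf &&
#   /scripts/rebuildhttpdconf && /scripts/restartsrv_httpd --hard
```

On a system without that file, this stops before the rebuild and restart instead of running them anyway.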
If the workaround doesn't help, could you open a support ticket so we can rule out any issues with cPanel & WHM? Post the ticket number here and I'll link this thread to it.

Thank you.
 

dalem

Well-Known Member
PartnerNOC
Oct 24, 2003
2,983
159
368
SLC
cPanel Access Level
DataCenter Provider
PS: this was just a guess as to what your issue is; on our server it was definitely the issue.
You can do a quick check to see how many foreign IPs are brute forcing:


grep -ir wp-login.php /var/log/apache2/domlogs
grep -ir wp-admin /var/log/apache2/domlogs
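
To turn those matches into a per-IP tally, here is a rough sketch that reads access-log lines on stdin. It assumes Apache's combined log format (client IP in field 1, request path in field 7), and top_wp_login_ips is just an illustrative name:

```shell
# Count requests for wp-login.php or xmlrpc.php per client IP,
# highest first, from combined-format access-log lines on stdin.
top_wp_login_ips() {
  awk '$7 ~ /wp-login\.php|xmlrpc\.php/ { print $1 }' |
    sort | uniq -c | sort -rn | head
}

# Live usage (-h drops the file name prefix so field numbers stay correct):
# grep -rhE 'wp-login\.php|xmlrpc\.php' /var/log/apache2/domlogs | top_wp_login_ips
```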
 

GoWilkes

Well-Known Member
Sep 26, 2006
703
34
178
cPanel Access Level
Root Administrator
Michael, it turns out that I don't have Prefork, after all. This was the result when I ran the commands you gave:

Code:
-bash: /etc/apache2/conf.modules.d/000_mod_mpm_prefork.conf: No such file or directory
Built /usr/local/apache/conf/httpd.conf OK
Waiting for “httpd” …

Service Status
        httpd (/usr/local/apache/bin/httpd -k start) is running as root with PID 4923 (pidfile+/proc check method).

Startup Log
        [Wed Jun 05 02:50:50 2019] [error] VirtualHost *:443 -- mixing * ports and non-* ports with a NameVirtualHost address is not supported, proceeding with undefined results

Log Messages
        [Wed Jun 05 02:50:51 2019] [notice] ModSecurity for Apache/2.9.0 (http://www.modsecurity.org/) configured.
        [Wed Jun 05 02:50:51 2019] [notice] suEXEC mechanism enabled (wrapper: /usr/local/apache/bin/suexec)

httpd restarted successfully.
I'm a tad concerned about the error message, considering that all of the accounts on the server were created with WHM and I haven't manually edited httpd.conf in years... probably not since I got this server, honestly. All of my sites seem to be running so I don't think it's a fatal error, but I definitely wasn't expecting it!


@dalem, that was a great thought, but unfortunately not my issue :-( My log files were at:

/usr/local/apache/domlogs/[USERNAME]/[DOMAIN.COM]

I already test for references to wp-admin and wp-login via PHP and block IPs, but not at the firewall so it was an idea! But I only had 5 references to wp-login, and 2 to wp-admin. So that wasn't the culprit, either.

@GOT, just FYI, it looks like CC_ALLOW_FILTER isn't blocking non-US IPs the way I'd hoped, so you could be right on that one. I was manually adding RIPE, APNIC, and LACNIC IP ranges but removed them in favor of CC_ALLOW_FILTER a few days ago. I didn't notice an increase in processes or anything, but I just now looked and saw that I have 7 RIPE connections.

But anyway... no change on my end, I still have almost double the number of processes, my RAM usage is off the charts, etc. I'm at a complete loss.
 

GoWilkes

Well-Known Member
Sep 26, 2006
703
34
178
cPanel Access Level
Root Administrator
I am... I'm procrastinating for 2 reasons:

1. I always wait until the last minute for software updates, to let everyone else figure out the bugs before I deal with them; and

2. Nothing in the documentation has commented on potential down time while waiting for it to update, so I'm waiting for a time when I have a few hours to possibly wait, and then another few hours to sort out bugs before the next business day.