Hello,
I'm having a problem with "fake" CPU usage during cPanel backups. It does not always happen, only about every 3-4 months and for no apparent reason, but when it does I need to reboot the server and I lose that day's backups.
Hard to explain, but... let's go.
The server is a VPS on RamNode (VDS package) with only 1 dedicated 3.4 GHz CPU and 4 GB RAM. The VPS is KVM (host).
Backup is set to run at 01:30. It runs every day.
At 03:00 AM local time, that is 00:00 UTC, I start to receive alerts from my Nagios saying that the CPU is high. When I run "top", CPU usage is 100% idle but the load averages are crazy, like: 1.02 (1 min), 1.03 (5 min), 1.02 (15 min). In fact, nothing is using CPU except the basic 2-3% for services and OS.
But at 23:55 UTC, the load is like 0.02 0.03 0.02, so the 1.00 for the 15 min average cannot be real. I believe the counters just jump to 1.0 1.0 1.0 immediately at 00:00 UTC.
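To make the mismatch concrete, here is a rough Python sketch of the comparison described above: load averages in the /proc/loadavg text format on one side, idle CPU percentage on the other. The function names, the 0.5 and 95% thresholds, the 98% idle figure, and the trailing "1/180 12345" fields in the sample strings are made up for illustration; only the load numbers come from the values I quoted.

```python
def parse_loadavg(text):
    """Return the 1, 5 and 15 minute load averages from /proc/loadavg-style text."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    return one, five, fifteen


def looks_fake(loadavg_text, cpu_idle_percent, load_limit=0.5, idle_limit=95.0):
    """True when all three load averages are high although the CPU is almost idle.

    load_limit and idle_limit are arbitrary illustration values (assumptions).
    """
    one, five, fifteen = parse_loadavg(loadavg_text)
    return min(one, five, fifteen) >= load_limit and cpu_idle_percent >= idle_limit


if __name__ == "__main__":
    # Sample strings built from the numbers quoted above, not read from a live system.
    print(looks_fake("1.02 1.03 1.02 1/180 12345", 98.0))  # state after 00:00 UTC
    print(looks_fake("0.02 0.03 0.02 1/180 12345", 98.0))  # state at 23:55 UTC
```

On a real server the text would come from reading /proc/loadavg, and the idle percentage from whatever "top" or the monitoring plugin reports.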
Logs...
[2018-10-05 01:35:47 -0300] Performing “Integration” component....
[2018-10-05 01:35:47 -0300] Completed “Integration” component.
[2018-10-05 01:35:47 -0300] Performing “AuthnLinks” component....
(...)
[2018-10-05 01:35:47 -0300] Completed “MailLimits” component.
[2018-10-05 01:35:47 -0300] Creating Archive ....Load watching resumed due to SIGUSR2
cpuwatch (Fri Oct 5 01:35:47 2018): System load is currently 0.88; waiting for it to go down below 0.88 to continue …
The "System load is currently 0.88" message is normal and occurs every day, but the backup normally proceeds after some seconds. This time it got stuck... then, after a reboot, the next line is:
[2018-10-05 03:10:02 -0300] info [backup] Final state is Backup::Failure (HUP)
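My guess at why the backup hangs, based only on the cpuwatch message in the log: it waits until the load drops below the threshold, so if the load average is pinned at about 1.0 it would wait forever. The Python sketch below only illustrates that kind of wait loop. It is not cPanel's cpuwatch code; wait_for_load, its parameters, and the sample load values other than 0.88 and 1.02 are hypothetical.

```python
import time


def wait_for_load(threshold, read_load, sleep=time.sleep, interval=5.0, max_checks=None):
    """Block until read_load() returns a value below threshold.

    Returns the number of checks made, or None if max_checks ran out first.
    max_checks=None means wait indefinitely.
    """
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if read_load() < threshold:
            return checks
        sleep(interval)
    return None


if __name__ == "__main__":
    # Normal night: the load drops after a few checks, so the backup continues.
    normal = iter([0.88, 0.90, 0.40])
    print(wait_for_load(0.88, lambda: next(normal), sleep=lambda s: None))

    # Bad night: the load stays at 1.02 and never drops below 0.88.
    # max_checks is only set so that this example terminates.
    print(wait_for_load(0.88, lambda: 1.02, sleep=lambda s: None, max_checks=100))
```

If the real behaviour is similar, the stuck backup would be a consequence of the stuck load average rather than its cause.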
Ideas: something related to the daily cron? The server is 95% a basic cPanel install on a minimal CentOS install, with everything as recommended by the docs. I only installed a few other packages afterwards, such as mrtg, sysstat, iotop, and nrpe (Nagios).
Note: at first it appears to be related to CentOS 7 or KVM, but the problem only happens on cPanel servers AND during the backup AND at 00:00 UTC (always). It is also not specific to RamNode, because the problem happened once on another cPanel server (but I don't remember if it was KVM, Xen or OpenVZ).
So... any ideas?