Server crash during backup, high CPU usage

Denis Gomes Franco

Well-Known Member
Sep 3, 2018
Tupã, São Paulo, Brazil
cPanel Access Level
Root Administrator
Hello.

Something weird happened today at 3 am while my backups were running. Ever since installing cPanel I've had full daily backups enabled, transported to Amazon S3 (no local backups, and S3 is configured via the official options, not mounted to the server), and cPanel has excelled at it. But today it crashed mid-backup and took the whole server down. I woke up to lots of notifications from Freshping and one customer asking why their site was down.

Restarted the server and all's fine. So I checked the logs and the last line was:

[2019-03-03 03:52:23 -0300] Creating Archive .................................................................................... ..............................cpuwatch (Sun Mar 3 03:58:15 2019): System load is currently 3.75; waiting for it to go down below 3.50 to continue …

After that everything was down, all websites and the panel, until I restarted the server.

This was related to an account that has around 9 GB of files, but I have other accounts as large as this one that were backed up successfully on that run. Based on what was saved to S3, the server backed up about one fifth of the accounts.

The server has 4 cores and plenty of free space. Vultr's graphs show that at that moment the CPU usage didn't go over 200%; however, that graph (last 24h) doesn't have much resolution, so I can't confirm it didn't go over 375%.

So my questions are:
1. Does anyone have any idea what happened? Plenty of full backups completed just fine on the previous days.
2. Is there any way to set up an automatic server restart if it becomes unresponsive like this or the backup never finishes?
3. I know cPanel can't restore backups stored on S3, but why does the restore option in WHM only show some accounts, even though I can see them all in the S3 interface? Just curious about this one, though.
4. There is an option "Extra CPUs for server load" that is currently set to zero. What does this do? Also, I learned that server load can't go over 4 as this is a 4-core server. Seeing that the backup stopped at a load of 3.75, it seems that it correctly identified the number of CPUs/cores.

Additionally, if there's a better way to configure these backups, I'd like to hear it.
 

GOT

Get Proactive!
PartnerNOC
Apr 8, 2003
Chesapeake, VA
cPanel Access Level
DataCenter Provider
The most likely scenario is that something else caused load to go high and crash the server. The cPanel logs actually show that it detected the high load and paused the backup process.
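If you want to pull those pause messages out of the backup log rather than eyeball them, here is a rough Python sketch. It only assumes the message looks like the cpuwatch line quoted in the first post; the function name and the regex are my own, not anything cPanel ships.

```python
import re

# Matches the cpuwatch message format seen in the log line quoted in this thread.
# The pattern is an assumption based on that single sample line.
CPUWATCH_RE = re.compile(
    r"cpuwatch \((?P<when>[^)]+)\): System load is currently "
    r"(?P<load>\d+(?:\.\d+)?); waiting for it to go down below "
    r"(?P<limit>\d+(?:\.\d+)?) to continue"
)


def parse_cpuwatch(line):
    """Return the timestamp, load and threshold from a cpuwatch message, or None."""
    match = CPUWATCH_RE.search(line)
    if match is None:
        return None
    return {
        "when": match.group("when"),
        "load": float(match.group("load")),
        "limit": float(match.group("limit")),
    }


sample = (
    "Creating Archive ....cpuwatch (Sun Mar 3 03:58:15 2019): "
    "System load is currently 3.75; waiting for it to go down below 3.50 to continue"
)
info = parse_cpuwatch(sample)
print(info)
```

A line like this means the backup was waiting at that moment, not that the backup itself was driving the load at that moment.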

It could have been anything causing the server load to go high, though usually it's some form of web attack, like a WP xmlrpc attack or brute-force logins.

You might want to look at your domlogs for that period and see whether any attacking IPs show up.
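As a starting point for that domlog check, here is a minimal Python sketch that counts hits per IP against xmlrpc.php and wp-login.php inside a time window. It assumes the domlogs are in Apache's combined log format; the function name, the path list and the sample lines are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumes Apache combined log format: IP, identd, user, [timestamp], "METHOD path ..."
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)'
)
TS_FORMAT = "%d/%b/%Y:%H:%M:%S %z"
# Endpoints commonly hit by WordPress attacks; adjust to taste.
SUSPECT_PATHS = ("/xmlrpc.php", "/wp-login.php")


def top_suspect_ips(lines, start, end, limit=10):
    """Count requests per IP to the suspect paths between start and end."""
    hits = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines that do not look like access-log entries
        when = datetime.strptime(match.group("ts"), TS_FORMAT)
        path = match.group("path").split("?")[0]  # drop any query string
        if start <= when <= end and path.endswith(SUSPECT_PATHS):
            hits[match.group("ip")] += 1
    return hits.most_common(limit)


# Invented sample lines using documentation-only IP ranges.
sample_lines = [
    '203.0.113.5 - - [03/Mar/2019:03:55:01 -0300] "POST /xmlrpc.php HTTP/1.1" 200 370 "-" "-"',
    '203.0.113.5 - - [03/Mar/2019:03:55:02 -0300] "POST /xmlrpc.php HTTP/1.1" 200 370 "-" "-"',
    '198.51.100.7 - - [03/Mar/2019:03:56:10 -0300] "POST /wp-login.php HTTP/1.1" 200 1200 "-" "-"',
    '192.0.2.9 - - [03/Mar/2019:03:56:30 -0300] "GET /index.php HTTP/1.1" 200 5000 "-" "-"',
    '203.0.113.5 - - [03/Mar/2019:05:00:00 -0300] "POST /xmlrpc.php HTTP/1.1" 200 370 "-" "-"',
]
window_start = datetime.strptime("03/Mar/2019:03:45:00 -0300", TS_FORMAT)
window_end = datetime.strptime("03/Mar/2019:04:05:00 -0300", TS_FORMAT)
result = top_suspect_ips(sample_lines, window_start, window_end)
print(result)
```

To run it against a real log, pass an open file handle as `lines`. On cPanel servers the per-domain logs usually live under /usr/local/apache/domlogs/, but check where yours are before relying on that path.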