airaid

Member
Feb 18, 2014
cPanel Access Level
Root Administrator
CentOS 6.7 x86_64 xenpv, WHM 54.0 (build 1), 4GB RAM, 4 CPUs

Recently, sites have started going offline temporarily during the nightly backup to S3, and the chkservd process now becomes unresponsive as well. Backups used to take about 40 minutes but now take 6.5 hours, and sites are offline for up to an hour.

I run 12 low-traffic sites on this server, with backup sizes ranging from 10MB to 4GB. For several months, one site with a large database (2.5GB) would go offline for a minute or two during the backup. I put this down to the database size and, since no other sites were affected, assumed it was safe to ignore. Now, however, the problem has grown to affect the entire backup process.

My limited troubleshooting suggests that memory may be running out during the backup:

total: 3904496
used: 3886328
free: 18168
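The figures above satisfy used = total - free exactly (3904496 - 18168 = 3886328), so, assuming they are in kB, "used" here probably includes buffers and page cache, which the kernel can reclaim. Low free memory on its own is therefore not proof of exhaustion. The sketch below assumes Python 3 and /proc/meminfo-style input, and subtracts Buffers and Cached the way older versions of `free` did on their "-/+ buffers/cache" line. The Buffers and Cached values in SAMPLE are hypothetical placeholders, not data from this server.

```python
"""Estimate memory in use once reclaimable buffers/page cache are excluded.

A minimal sketch assuming /proc/meminfo-style input ("Key:   value kB").
Buffers and Cached in SAMPLE are hypothetical placeholder values.
"""


def parse_meminfo(text):
    """Map each /proc/meminfo key to its numeric value (kB for memory fields)."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Return (used_kb, used_excluding_cache_kb).

    Mirrors the "-/+ buffers/cache" line of older `free` versions:
    buffers and page cache can be reclaimed by the kernel under pressure,
    so they are subtracted from the raw "used" figure.
    """
    used = info["MemTotal"] - info["MemFree"]
    reclaimable = info.get("Buffers", 0) + info.get("Cached", 0)
    return used, used - reclaimable


SAMPLE = """\
MemTotal:        3904496 kB
MemFree:           18168 kB
Buffers:          120000 kB
Cached:          2500000 kB
"""

if __name__ == "__main__":
    # On the server itself, pass open("/proc/meminfo").read() instead.
    info = parse_meminfo(SAMPLE)
    used, real_used = memory_summary(info)
    total = info["MemTotal"]
    print(f"used incl. cache: {used} kB ({100 * used / total:.1f}%)")
    print(f"used excl. cache: {real_used} kB ({100 * real_used / total:.1f}%)")
```

If the second figure stays well below the total during the backup, the memory readings alone don't show that RAM is the bottleneck.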

- Removed please attach images to your posts -

I could add more memory, but I don't want to waste money if it won't solve the issue. How can I properly troubleshoot this?

I'm happy to provide more information, but I'm not sure what would be useful.

Many thanks
 

24x7server

Well-Known Member
Apr 17, 2013
India
cPanel Access Level
Root Administrator
Hello,

I suggest monitoring your server's processes with the top and ps aufx commands during the backup, so that you can find the root cause of the sites going down.
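As a starting point for reading that output, here is a minimal Python 3 sketch that parses `ps aux`/`ps aufx` text and ranks processes by %CPU or %MEM. The rows in SAMPLE are invented for illustration. Note that `ps` reports %CPU as CPU time divided by the time the process has been running, so a long-running process that only spikes during the backup can look quieter in `ps` than in `top`.

```python
"""Rank processes from `ps aux` / `ps aufx` output by CPU or memory use.

A minimal sketch; the rows in SAMPLE are invented for illustration.
"""


def parse_ps_aux(text):
    """Parse BSD-style `ps aux` output into a list of dicts.

    COMMAND is the 11th column and may contain spaces (and, with the `f`
    flag, tree-drawing characters such as "\\_"), so each line is split
    at most 10 times and the remainder is kept as the command.
    """
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        parts = line.split(None, 10)
        if len(parts) < 11:
            continue
        rows.append({
            "user": parts[0],
            "pid": int(parts[1]),
            "cpu": float(parts[2]),
            "mem": float(parts[3]),
            "rss_kb": int(parts[5]),
            "command": parts[10].strip(),
        })
    return rows


def top_consumers(rows, key="cpu", n=5):
    """Return the n rows with the highest value for key ("cpu" or "mem")."""
    return sorted(rows, key=lambda r: r[key], reverse=True)[:n]


SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  19232  1004 ?        Ss   Jan18   0:01 /sbin/init
mysql     1234 85.0 40.2 999999 1570000 ?      Sl   Jan18  50:00 /usr/sbin/mysqld --basedir=/usr
root      5678 60.5  2.1 200000 80000 ?        R    02:00   5:00  \\_ gzip -c
"""

if __name__ == "__main__":
    # On the server, feed it live output instead, e.g.
    # subprocess.run(["ps", "aux"], stdout=subprocess.PIPE,
    #                universal_newlines=True).stdout
    for r in top_consumers(parse_ps_aux(SAMPLE)):
        print(f"{r['cpu']:5.1f}% CPU {r['mem']:5.1f}% MEM pid {r['pid']}: {r['command']}")
```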
 

airaid

Thank you. Forgive me for asking, but is there a way to run these automatically and record the output to a log file for later review? The backup starts at 3am, so running them manually isn't practical.
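One way to do this, sketched under assumptions: a small Python 3 script that appends timestamped `top -b -n 1` (batch mode, one iteration) and `ps aufx` snapshots to a log file, run from cron every few minutes across the backup window. The script name, log path and schedule are hypothetical choices, not cPanel defaults, and a stock CentOS 6 install may not ship a Python 3 interpreter.

```python
#!/usr/bin/env python3
"""Append timestamped `top` and `ps aufx` snapshots to a log file.

A minimal sketch intended to be run from cron during the backup window.
Usage (hypothetical paths): backup_snapshot.py /var/log/backup-snapshots.log
"""
import datetime
import subprocess
import sys

COMMANDS = [
    ["top", "-b", "-n", "1"],  # batch mode, a single iteration
    ["ps", "aufx"],            # full process tree with %CPU and %MEM
]


def snapshot(commands, log_path, timeout=60):
    """Run each command and append its output, with a header, to log_path."""
    stamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        for cmd in commands:
            log.write(f"===== {stamp} $ {' '.join(cmd)}\n")
            try:
                result = subprocess.run(
                    cmd,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,
                    universal_newlines=True,
                    timeout=timeout,
                )
                log.write(result.stdout)
            except (OSError, subprocess.TimeoutExpired) as exc:
                # A missing binary or a hung command must not stop the others.
                log.write(f"command failed: {exc}\n")
            log.write("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    snapshot(COMMANDS, sys.argv[1])
```

A root crontab entry (added with `crontab -e`) such as `*/5 2-10 * * * /usr/bin/python3 /root/backup_snapshot.py /var/log/backup-snapshots.log` would take a snapshot every five minutes from 02:00 to 10:55, bracketing the 3am start and the 6.5-hour run; the interpreter path is an assumption. The log grows quickly, so rotate or delete it once you have what you need.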
 

cPanelMichael

Administrator
Staff member
Apr 11, 2011
Hello :)

Please review the backup log files stored in:

/usr/local/cpanel/logs/cpbackup/

Also, since you are transporting the backup archives to a remote destination, the transport log may offer some information:

/usr/local/cpanel/logs/cpbackup_transporter.log

Do you notice any output in these log files during the backup process that suggests a reason for the CPU/memory increase?
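To skim the newest backup log for lines worth a closer look, a minimal Python 3 sketch follows. It picks the most recently modified file in /usr/local/cpanel/logs/cpbackup/ (it makes no assumption about how the files are named) and prints lines matching a keyword list. The keywords are a hypothetical starting set, not a list of known cPanel messages.

```python
"""Print lines of interest from the newest cPanel backup log.

A minimal sketch; the keyword list is a hypothetical starting point.
"""
import os
import re

LOG_DIR = "/usr/local/cpanel/logs/cpbackup"
PATTERN = re.compile(r"cpuwatch|error|warn|fail|timeout|timed out|killed",
                     re.IGNORECASE)


def newest_log(log_dir):
    """Return the most recently modified regular file in log_dir, or None."""
    paths = (os.path.join(log_dir, name) for name in os.listdir(log_dir))
    files = [p for p in paths if os.path.isfile(p)]
    return max(files, key=os.path.getmtime) if files else None


def notable_lines(path, pattern=PATTERN):
    """Return (line_number, text) for every line in path matching pattern."""
    with open(path, errors="replace") as fh:
        return [(n, line.rstrip("\n")) for n, line in enumerate(fh, 1)
                if pattern.search(line)]


if __name__ == "__main__" and os.path.isdir(LOG_DIR):
    latest = newest_log(LOG_DIR)
    if latest:
        for n, text in notable_lines(latest):
            print(f"{latest}:{n}: {text}")
```

The same `notable_lines` call works on a single file such as /usr/local/cpanel/logs/cpbackup_transporter.log.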

Thank you.
 

airaid

Thank you, very helpful. There's nothing notable in the transport log, but the backup logs are littered with lines like the one below, including right at the top of the file, which suggests the CPUs are already under heavy load when the backup starts:

cpuwatch (Tue Jan 19 02:04:34 2016): System load is currently 3.54; waiting for it to go down to 3.50 to continue …

These lines appear mostly, but not exclusively, during archive creation.
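Each cpuwatch line means the backup paused until the load dropped below the threshold, so how often and when they occur may account for part of the jump from 40 minutes to 6.5 hours. A minimal Python 3 sketch that counts them per hour, assuming the line format matches the example above and the timestamps use English (C locale) day and month names. The second sample line is invented for illustration.

```python
"""Summarise cpuwatch throttling lines from a cPanel backup log.

A minimal sketch; assumes the line format shown in the post and
English (C locale) day/month names in the timestamp.
"""
import re
from collections import Counter
from datetime import datetime

CPUWATCH_RE = re.compile(
    r"cpuwatch \((?P<ts>[^)]+)\): System load is currently "
    r"(?P<load>\d+(?:\.\d+)?); waiting for it to go down to "
    r"(?P<limit>\d+(?:\.\d+)?)"
)


def parse_cpuwatch(lines):
    """Return (timestamp, load, limit) for each cpuwatch line."""
    events = []
    for line in lines:
        m = CPUWATCH_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
            events.append((ts, float(m.group("load")), float(m.group("limit"))))
    return events


def summarize(events):
    """Count, time span, peak load, and number of waits per hour of day."""
    if not events:
        return None
    times = [e[0] for e in events]
    return {
        "count": len(events),
        "first": min(times),
        "last": max(times),
        "max_load": max(e[1] for e in events),
        "per_hour": Counter(t.strftime("%H:00") for t in times),
    }


SAMPLE = [
    "cpuwatch (Tue Jan 19 02:04:34 2016): System load is currently 3.54; "
    "waiting for it to go down to 3.50 to continue ...",
    "some unrelated log line",  # ignored by the regex
    "cpuwatch (Tue Jan 19 03:15:00 2016): System load is currently 5.10; "
    "waiting for it to go down to 3.50 to continue ...",
]

if __name__ == "__main__":
    # On the server: with open(path) as fh: print(summarize(parse_cpuwatch(fh)))
    print(summarize(parse_cpuwatch(SAMPLE)))
```

Hours with many waits are the ones worth cross-checking against process snapshots taken at the same time.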

Which raises the question: what's the best way to find out what's hogging the CPU? Is there a log I can check, or something I can set up to track which processes might be causing this? I found /var/log/dcpumon, which looks like it could help, but it only seems to store the last 30 minutes of data.
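Because `ps` averages %CPU over each process's lifetime, a burst during the backup can be diluted. A minimal, Linux-only Python 3 sketch of the interval approach, broadly what `top` does: read utime and stime (fields 14 and 15 of /proc/[pid]/stat, in clock ticks) twice, a few seconds apart, and rank processes by the difference. It could be run from cron during the backup window with its output appended to a file; the interval and result count are arbitrary choices.

```python
"""Rank processes by CPU used over a short sampling interval (Linux only).

A minimal sketch: reads utime + stime (fields 14 and 15 of /proc/[pid]/stat,
in clock ticks) twice and reports the difference as a percentage of one CPU,
so a multi-threaded process on a 4-CPU machine can exceed 100%.
"""
import os
import time

CLK_TCK = os.sysconf("SC_CLK_TCK")  # clock ticks per second, typically 100


def parse_stat(text):
    """Return (comm, utime + stime) from the contents of /proc/[pid]/stat."""
    # comm (field 2) is wrapped in parentheses and may contain spaces or ')',
    # so locate it by the first '(' and the LAST ')'.
    comm = text[text.index("(") + 1:text.rindex(")")]
    rest = text[text.rindex(")") + 1:].split()
    # rest[0] is field 3 (state), so field N is rest[N - 3].
    return comm, int(rest[11]) + int(rest[12])


def cpu_ticks():
    """Map pid -> (comm, cumulative CPU ticks) for every current process."""
    ticks = {}
    for name in os.listdir("/proc"):
        if name.isdigit():
            try:
                with open(f"/proc/{name}/stat") as fh:
                    ticks[int(name)] = parse_stat(fh.read())
            except (OSError, ValueError):
                pass  # the process exited between listdir() and open()
    return ticks


def top_cpu(interval=5.0, n=10):
    """Return [(percent_of_one_cpu, pid, comm), ...] for the n busiest pids."""
    before = cpu_ticks()
    time.sleep(interval)
    after = cpu_ticks()
    usage = []
    for pid, (comm, ticks) in after.items():
        # Skip processes that started mid-interval or whose pid was reused.
        if pid in before and before[pid][0] == comm:
            pct = 100.0 * (ticks - before[pid][1]) / (interval * CLK_TCK)
            usage.append((pct, pid, comm))
    usage.sort(reverse=True)
    return usage[:n]


if __name__ == "__main__" and os.path.isdir("/proc"):
    for pct, pid, comm in top_cpu(interval=2.0):
        print(f"{pct:6.1f}%  {pid:>7}  {comm}")
```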

Thanks again.