Server crashes daily after kernel update

WhiteDog

Well-Known Member
Feb 19, 2008
142
6
68
Hello,

After performing a kernel upgrade a few days ago, my server has "crashed" every day at around the same time. This is always when creating the backups. I've seen the load go up from 10 to 20 to 40 and beyond. cPanel can then no longer be reached and I'm forced to do a hardware reboot.

I don't want to blame the kernel right away. I'm running CentOS 5.9 with the latest kernel they offer (2.6.18-348.3.1.el5). The server has otherwise always run fine with 200+ days uptime.

Is there anything else I can try first, like:
- certain cron jobs
- recompile Apache / PHP / Exim / ...
- check certain log files to pinpoint where it goes wrong

I've never really had problems like these so I don't really know where to start looking... but I'm sure some of you have faced a similar problem in the past. Any help is greatly appreciated!

Things I've done in the meantime:
- Removed munin
- Recompiled Apache and PHP
 

ThinIce

Well-Known Member
Apr 27, 2006
352
9
168
Disillusioned in England
cPanel Access Level
Root Administrator
Is the backup getting as far as the same account on every run, or does the problem rear its head at different points?

A quick test might be to boot back into the old kernel if it's still installed just to rule that out. cpbackup is generally pretty good at backing off if high load is detected (you could look for evidence of this in the cpbackup log and check the related cpu/nice values in tweak settings).

If load on the machine allows you - is the same problem created if you manually force run the backup at a different time of day?

What you report sounds like it could be an issue with user scripting or a user database. Do you have any output from lfd, Apache, process listings, sys-snap etc. from when the problem is occurring?
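If nothing like sys-snap is set up yet, a rough stand-in for capturing load and process listings could look like the sketch below. It assumes Python 3.7+ and a Linux `ps` that accepts `-eo pcpu,pid,user,comm`; the function names are made up for illustration and this is not how sys-snap itself works.

```python
import subprocess


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def top_processes(ps_output, limit=10):
    """Return the busiest rows from `ps -eo pcpu,pid,user,comm` output.

    Each row is (cpu_percent, pid, user, command), highest CPU first.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 3)
        if len(parts) == 4:
            rows.append((float(parts[0]), int(parts[1]), parts[2], parts[3]))
    rows.sort(reverse=True)
    return rows[:limit]


def snapshot(loadavg_path="/proc/loadavg"):
    """Take one snapshot of load averages and the busiest processes."""
    with open(loadavg_path) as fh:
        load = parse_loadavg(fh.read())
    ps = subprocess.run(
        ["ps", "-eo", "pcpu,pid,user,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return load, top_processes(ps)
```

Calling `snapshot()` once a minute during the backup window and appending the result to a file should leave something to read after the next forced reboot.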
 

WhiteDog

Well-Known Member
Feb 19, 2008
142
6
68
Hi ThinIce, thanks already for your feedback.

cpbackup runs for quite a long time on this server (from 1 AM to 10 AM). It normally does this at a steady pace, and the server load during this time stays below 4.0. It's configured to use CPU priority 19 and ionice at 3. This has worked well for the last year or so.

Now, at the time the problem starts, cpbackup has been running for hours and is at about 80%. What I noticed at this moment was the following:
- gzip at about 30% CPU, compressing a very large account (the backup is 9 GB).
- munin cron jobs running
- cpanellogd running (although it's explicitly configured NOT to run at the same time).
- a LOT of Apache threads hanging around.
- Server Status script (port check) reports port 80 as down (and only port 80).

I also just checked the Apache graphs in LFD. This is the Apache CPU graph for the last week:
http://i50.tinypic.com/2qvqebt.gif

So I'm fairly certain my problem is caused by Apache not handling things as it did in the past. I'm only hoping there were some kernel optimizations in there that have been "corrected" after a recompile :)

For the record, my Apache config:
fileetag: None
keepalive: On
keepalivetimeout: 5
maxclients: 150
maxkeepaliverequests: 100
maxrequestsperchild: 10000
maxspareservers: 20
minspareservers: 10
root_options: ExecCGI, FollowSymLinks, Includes, IncludesNOEXEC, Indexes, SymLinksIfOwnerMatch
serverlimit: 256
serversignature: Off
servertokens: ProductOnly
sslciphersuite: ALL:!ADH:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP
startservers: 5
timeout: 300
traceenable: Off
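
To put numbers on the "a LOT of Apache threads" observation, something like the sketch below could compare the httpd process count against maxclients. It assumes the prefork MPM (one process per connection), that the processes are named `httpd`, and that the listing above is pasted in as text; the helper names are hypothetical.

```python
def parse_settings(text):
    """Turn 'key: value' lines into a dict with lower-case keys."""
    settings = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as the
        # sslciphersuite string keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            settings[key.strip().lower()] = value.strip()
    return settings


def count_httpd(ps_comm_output, name="httpd"):
    """Count processes named `name` in the output of `ps -eo comm`.

    The count includes the parent process and idle spare servers,
    so it is only an approximation of busy workers.
    """
    return sum(1 for line in ps_comm_output.splitlines() if line.strip() == name)


def slot_report(settings, processes):
    """Compare the httpd process count with maxclients."""
    limit = int(settings["maxclients"])
    return {
        "processes": processes,
        "limit": limit,
        "saturated": processes >= limit,
        # Upper bound on how long one stalled request can hold a slot.
        "timeout_seconds": int(settings["timeout"]),
    }
```

If the report shows the count sitting at the limit while the backup runs, that would fit port 80 looking down: with all 150 slots taken and a timeout of 300, stalled requests can each hold a slot for up to five minutes.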
 

Dante78

Well-Known Member
May 1, 2010
59
0
56
I had the same issue. Check whether cpanellogd is running at the same time as the backup, then check whether any MySQL database (exim's or that of any other server service) is broken, and repair it if so.
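
For the database check, a minimal sketch along these lines could list tables worth a closer look before repairing anything. It assumes Python 3.7+, that `mysqlcheck` can log in without prompting (for example through root's `.my.cnf`), and that its output uses the usual `db.table   OK` layout; the function names are made up for illustration.

```python
import subprocess


def tables_needing_attention(mysqlcheck_output):
    """Return table names that mysqlcheck did not report as plain OK."""
    flagged = []
    for line in mysqlcheck_output.splitlines():
        parts = line.split()
        # Skip blank lines and detail lines such as "error : ...",
        # whose first word is a keyword rather than a db.table name.
        if not parts or "." not in parts[0]:
            continue
        if parts[-1] != "OK":
            flagged.append(parts[0])
    return flagged


def run_check():
    """Run a read-only check of all databases and return flagged tables."""
    result = subprocess.run(
        ["mysqlcheck", "--all-databases", "--check"],
        capture_output=True, text=True,
    )
    return tables_needing_attention(result.stdout)
```

A flagged table is not necessarily corrupt (some storage engines simply don't support the check), so it is worth reading the full mysqlcheck message for each one before running a repair.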