urgent: Server crash / high load

Simplesi

Registered
Mar 30, 2005
3
0
151
Hi,

I'm having huge problems with a cpanel server I'm running.

On a near daily basis, it becomes sluggish, and needs restarting. Sometimes, the mysql process crashes and isn't restarted by cpanel - and the first I hear is users complaining that mysql is down.

It's running FC1, as that's what our host supports (it's a VPS box). However, I don't know how to work out the problem. I'm running STABLE, but am considering using RELEASE instead, just in case there's a bugfix there.

Here's an example of top when it's misbehaving:

Code:
22:19:45  up 17:49,  2 users,  load average: 49.16, 43.18, 23.32
98 processes: 93 sleeping, 4 running, 1 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total    1.0%    0.0%    6.2%   0.3%     0.0%   92.3%    0.0%
Mem:   126664k av,  124680k used,    1984k free,       0k shrd,     264k buff
        63724k active,              52132k inactive
Swap:  130040k av,  130020k used,      20k free                    3440k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 1346 root      16   0 13552  368    56 S     2.9  0.2  10:03   0 httpd
   55 root      15   0     0    0     0 DW    0.4  0.0   1:06   0 kswapd0
13831 mailman   18   0  5604 1956   940 D     0.4  1.5   0:03   0 python
 1598 root      18   0 10120  740   152 D     0.2  0.5   0:53   0 cppop
13803 mailman   18   0  6020 2020   716 D     0.2  1.5   0:05   0 python
13813 mailman   18   0  6044 1868   624 R     0.2  1.4   0:04   0 python
13826 mailman   18   0  5620 2060   960 D     0.2  1.6   0:03   0 python
13827 root      17   0  2244  552   288 R     0.2  0.4   0:05   0 top
13829 mailman   18   0  5812 2124   940 D     0.2  1.6   0:03   0 python
13837 root      18   0  7580 3768   356 D     0.2  2.9   0:02   0 mrtg
13766 root      18   0  7996 2692   404 D     0.1  2.1   0:06   0 mrtg
13778 root      18   0  4760 1240   480 D     0.1  0.9   0:04   0 dcpumon
13793 root      18   0 11492  844   368 D     0.1  0.6   0:05   0 php
    1 root      17   0  1596   60    48 S     0.0  0.0   0:06   0 init
    2 root      34  19     0    0     0 SWN   0.0  0.0   0:00   0 ksoftirqd/0
    3 root      10  -5     0    0     0 SW<   0.0  0.0   0:00   0 events/0
    4 root      11  -5     0    0     0 SW<   0.0  0.0   0:00   0 khelper
    9 root      10  -5     0    0     0 SW<   0.0  0.0   0:00   0 kthread
   12 root      10  -5     0    0     0 SW<   0.0  0.0   0:00   0 kblockd/0
   56 root      20  -5     0    0     0 SW<   0.0  0.0   0:00   0 aio/0
  737 root      15   0     0    0     0 SW    0.0  0.0   0:03   0 kjournald
 1106 root      17   0  1648  104    64 S     0.0  0.0   0:09   0 syslogd
 1110 root      15   0  1600   88    48 S     0.0  0.0   0:03   0 klogd
 1205 root      17   0  3852   32     0 S     0.0  0.0   0:01   0 sshd
 1218 root      16   0  2204    4     0 S     0.0  0.0   0:00   0 xinetd
 1304 mailnull  16   0  7128    4     0 S     0.0  0.0   0:01   0 exim
 1308 mailnull  18   0  7128    4     0 S     0.0  0.0   0:00   0 exim
 1315 root      16   0  3840  268   188 S     0.0  0.2   1:05   0 antirelayd
 1354 root      17   0  1640   68    36 S     0.0  0.0   0:02   0 crond
 1428 root       0 -20     0    0     0 SW<   0.0  0.0   0:00   0 loop0
 1429 root      15   0     0    0     0 SW    0.0  0.0   0:00   0 kjournald
 1511 root      16   0 11648    0     0 SW    0.0  0.0   0:00   0 cpsrvd
 1557 root      16   0  6056    4     0 S     0.0  0.0   0:00   0 pure-ftpd
 1560 root      18   0  3756    4     0 S     0.0  0.0   0:00   0 pure-authd
 1577 cpanel    15   0  4136    4     0 S     0.0  0.0   0:00   0 stunnel-4.04loc
 1695 mailman   17   0  7588  276     0 S     0.0  0.2   0:04   0 mailmanctl
 1697 root      16   0  1596    4     0 S     0.0  0.0   0:00   0 agetty
As you can see, it's nearly out of memory, both swap and RAM, and I believe a lot of the CPU time is going on paging. In fact, in that extract the CPU doesn't look too bad (92.3% of it is iowait), but check out the load average for a single-CPU machine. At other times, the system CPU state reaches up to 90%.
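
To put rough numbers on that, here is a minimal Python sketch that pulls the used fraction of RAM and swap out of the Mem: and Swap: lines above. It assumes the old procps header layout shown in this output ("<n>k av, <n>k used"); the function name parse_top_memory is made up for illustration.

```python
import re

def parse_top_memory(header: str) -> dict:
    """Return the used fraction of RAM and swap from an old-style top header."""
    usage = {}
    for label in ("Mem", "Swap"):
        # Matches e.g. "Mem:   126664k av,  124680k used"
        m = re.search(rf"{label}:\s*(\d+)k av,\s*(\d+)k used", header)
        if m:
            total, used = int(m.group(1)), int(m.group(2))
            usage[label] = used / total if total else 0.0
    return usage

# The two summary lines copied from the top output in this post.
header = """Mem:   126664k av,  124680k used,    1984k free,       0k shrd,     264k buff
Swap:  130040k av,  130020k used,      20k free                    3440k cached"""

usage = parse_top_memory(header)
for label, frac in usage.items():
    print(f"{label}: {frac:.1%} used")
```

On the figures above this comes out at roughly 98% of RAM and practically all of the swap in use, which fits the high iowait: the box looks like it is thrashing rather than short of CPU.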

How can I work out what's consuming the resources and crashing the machine? I'm fairly used to linux, but haven't had to deal with something like this before.

All my packages are up to date, according to cpanel.

Thanks,
Simon
 

AndyReed

Well-Known Member
PartnerNOC
May 29, 2004
2,217
4
193
Minneapolis, MN
This issue has been discussed hundreds of times in these forums, but the first step is always to find out who or what is causing the problem. Most likely it is a bad PHP and/or CGI/Perl script. If that's the case, you'll need to clean up your server and install security patches and applications, including mod_security, APF, and BFD.
 

Simplesi

At the moment, the server hosts just one website with an up-to-date ezpublish install - there are no other users to upload dodgy scripts with vulnerabilities.

I'd appreciate some pointers on identifying the problem - I certainly agree that I need to deal with it!

Simon

PS. I've just moved from STABLE to RELEASE, but it seems there's no actual difference in the packages, so unfortunately that hasn't fixed things.
 

codek

Active Member
Jul 26, 2004
25
0
151
VERY interesting. I have exactly this problem on my FC2 server; it developed today. Load goes through the roof, but nothing is using CPU; it's all in wait state.
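
One likely reason the two can diverge: on Linux the load average counts processes in uninterruptible sleep (state D, usually waiting on disk or swap) as well as runnable ones, so a box stuck on I/O can show a huge load with a nearly idle CPU. A minimal sketch for counting processes per state, assuming a Linux /proc filesystem with a State: line in /proc/<pid>/status; count_states is a hypothetical helper name.

```python
import os
from collections import Counter

def count_states(proc_root="/proc"):
    """Count processes by state letter (R, S, D, Z, ...) from /proc/<pid>/status."""
    counts = Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                for line in fh:
                    if line.startswith("State:"):
                        # Line looks like "State:\tD (disk sleep)"
                        counts[line.split()[1]] += 1
                        break
        except OSError:
            continue  # process exited while we were reading
    return counts

# Only run against the live system where /proc actually exists.
if os.path.isdir("/proc"):
    for state, n in count_states().most_common():
        print(state, n)
```

A large D count alongside a small R count points at disk or swap waits rather than a runaway CPU hog.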

Can anyone help? I've tried everything my host and I can think of to debug this, but no joy!

Thanks!
Dan
 

Simplesi

It seems that for me the problem was spamd, the SpamAssassin daemon. When I looked closely, it was consuming 90 MB of memory, and considering I have only 128 MB of RAM and 128 MB of swap, that soon meant the server keeled over.
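
For anyone wanting to run the same kind of check, here is a minimal sketch that ranks processes by resident memory. It assumes a Linux /proc filesystem where /proc/<pid>/status carries Name: and VmRSS: lines; rank_by_rss and its parameters are hypothetical names for illustration, not a cPanel tool.

```python
import os

def rank_by_rss(proc_root="/proc", limit=10):
    """Return [(rss_kb, pid, name), ...] sorted by resident memory, largest first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        fields = {}
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                for line in fh:
                    key, _, value = line.partition(":")
                    fields[key] = value.strip()
        except OSError:
            continue  # process exited while we were reading
        rss = fields.get("VmRSS")
        if rss is None:
            continue  # kernel threads have no VmRSS line
        # VmRSS looks like "92160 kB"; keep the number only.
        rows.append((int(rss.split()[0]), int(entry), fields.get("Name", "?")))
    rows.sort(reverse=True)
    return rows[:limit]

# Only run against the live system where /proc actually exists.
if os.path.isdir("/proc"):
    for rss_kb, pid, name in rank_by_rss():
        print(f"{rss_kb:>8} kB  {pid:>6}  {name}")
```

On a box that is already deep into swap, resident size understates the real footprint because swapped-out pages are not counted, so the VmSize line in the same file is worth checking as well.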

My temporary fix was to disable spamd, both in the email settings and (I missed this the first time round) in the service configuration, so the service definitely doesn't run.

That was yesterday, and it hasn't got anywhere near a crash since. There are quite a few posts in these forums talking about spamd consuming loads of memory, with some tips for dealing with it. In due course I'll need spamd back up and running, so I guess I'll start working through those.

Good luck!

Simon