cPanel/WHM - causes an incredible load while running on an almost empty box!

ispro

Well-Known Member
Verified Vendor
Apr 8, 2004
628
2
168
cPanel/WHM (cppop is the killer!) - incredible load while running on an almost empty box!

Today, in the middle of the day, one of our servers running 9.9.9 S15 became heavily loaded.

After a lot of research we found that whenever cPanel/WHM is running, the load goes up to 30-40% and the server becomes almost unresponsive, while common services like http/smtp/named go virtually down.

The interesting thing is that the box runs 10 domains with no actual load on them. Even with Apache/ftp/Exim stopped (and removed from chkservd to prevent restarts), the load stays high.
The only remedy is to issue:
service cpanel stop
several times, after which all the remaining services work like a charm. But that is not a solution...

We have a firewall installed and have checked its logs and the server's logs - nothing wrong.
/scripts/upcp --force
does not help.

Can anyone help us?
We need to find a solution urgently!
 

philb

Well-Known Member
Jan 28, 2004
118
4
168
This typically only happens if people hammer cpanel/whm a bit and get the load to go up. If you run top on the command line while you're having the problem, which processes are actually using cpu time?
 

ispro

If only it were that easy...
There are no customers on the server. No one is requesting cPanel/WHM services apart from outside monitoring services.

top shows nothing useful; moreover, according to top there is just 3% load from user and 1% from system - about 95% idle...
iowait is 0.0%

But the 1/5/15-minute load averages are 30-40%!

We are really frustrated and have no more ideas about what's going on!
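When top reports a mostly idle CPU while the load average is high, one thing worth checking is how many processes sit in the runnable (R) or uninterruptible-sleep (D) state, since on Linux both count toward the load average. Below is a minimal Python sketch, assuming the text comes from `ps -eo stat,pid,comm`. The function name `tally_states` and the sample listing are illustrative only and were not captured from this server.

```python
from collections import Counter


def tally_states(ps_output):
    """Count processes per state letter from `ps -eo stat,pid,comm` output.

    Returns (counts, suspects): counts maps the first letter of STAT to a
    number of processes; suspects lists (pid, command) for the R and D
    states, the two states that contribute to the Linux load average.
    """
    counts = Counter()
    suspects = []
    for line in ps_output.strip().splitlines():
        fields = line.split(None, 2)
        # Skip the header row and anything that is not "STAT PID COMMAND".
        if len(fields) < 3 or not fields[1].isdigit():
            continue
        state = fields[0][0]
        counts[state] += 1
        if state in ("R", "D"):
            suspects.append((int(fields[1]), fields[2]))
    return counts, suspects


# Hypothetical listing, for illustration only (not taken from the thread).
SAMPLE = """\
STAT   PID COMMAND
Ss       1 init
D     2101 some_daemon
D     2102 some_daemon
R     2250 ps
S     1980 exim
"""

if __name__ == "__main__":
    counts, suspects = tally_states(SAMPLE)
    print(dict(counts))
    for pid, command in suspects:
        print(pid, command)
```

A long list of D-state entries would point at something blocked on I/O or in the kernel rather than at CPU usage, which top's CPU percentages do not show.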
 

ispro

Addition.

...killing python2 (for mailman) finally helps to stop cPanel:
killall -9 -g python2

After cPanel stopped, the load went down immediately.

But it is not a solution...

Will try to disable mailman - perhaps it is what upsets the server?..
 

philb

wait, the load averages show 30-40%? do you mean they're 0.30 or 30.0 ?
 

ispro

Not 0.3!
30%
Let's see:

[email protected] [~]# w
04:04:43 up 13 days, 12:25, 3 users, load average: 32.98, 17.23, 8.78
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
root     pts/2    dial_up_dinamic_ 11:04pm  2:02m  0.14s  0.12s  -bash
root     pts/3    dial_up_dinamic_ 11:04pm 24.00s  0.16s  0.12s  -bash
root     pts/5    dial_up_dinamic_ 11:04pm  20:25  4.45s  0.30s  -bash

[email protected] [~]# w
04:06:41 up 13 days, 12:27, 3 users, load average: 33.57, 22.17, 11.57
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
root     pts/2    dial_up_dinamic_ 11:04pm  2:04m  0.14s  0.12s  -bash
root     pts/3    dial_up_dinamic_ 11:04pm 22.00s  0.17s  0.13s  -bash
root     pts/5    dial_up_dinamic_ 11:04pm  22:24  4.59s  0.30s  -bash

[email protected] [~]# w
04:20:59 up 13 days, 12:41, 3 users, load average: 59.48, 48.26, 32.46
USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU  WHAT
root     pts/2    dial_up_dinamic_ 11:04pm 57.00s  0.12s  0.12s  -bash
root     pts/3    dial_up_dinamic_ 11:04pm 56.00s  0.18s  0.15s  -bash
root     pts/5    dial_up_dinamic_ 11:04pm  36:44   1:03  0.30s  -bash

Got any ideas?

Btw, disabling mailman helps. What is even more interesting - we have NO mailman lists...
Very strange.
 

philb

Ok, problem is - load averages are not actually a percentage. Load average indicates how many processes are waiting at any given time for cpu time, on average, over the time period it applies to (1/5/15 mins).

A single process running at 100% cpu will give a load average of around 1.00 if not much else is going on. Two processes running at 100% cpu (or attempting to) will usually yield a load of 2.00, etc.
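To put numbers on that, here is a minimal Python sketch that pulls the three load averages out of a `w`/`uptime` header line and divides them by the CPU count. The helper names are illustrative. Note that on Linux the figure also includes processes blocked in uninterruptible (usually disk) wait, which is one way a box can show a high load with a mostly idle CPU.

```python
import re


def parse_load_averages(header_line):
    """Extract the 1-, 5- and 15-minute load averages from uptime/w output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", header_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % header_line)
    return tuple(float(value) for value in match.groups())


def load_per_cpu(loads, cpu_count):
    """Scale load averages by CPU count; values above 1.0 mean a backlog."""
    return tuple(round(load / cpu_count, 2) for load in loads)


if __name__ == "__main__":
    # Header line as posted earlier in this thread.
    line = "04:04:43 up 13 days, 12:25, 3 users, load average: 32.98, 17.23, 8.78"
    loads = parse_load_averages(line)
    print(loads)
    # cpu_count=1 is an assumption to adjust for the machine at hand.
    print(load_per_cpu(loads, cpu_count=1))
```

Read this way, a load of 32.98 on one CPU means roughly 33 processes queued or blocked on average over the last minute, not 33% utilisation.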

There were some issues with some Xeon CPU based systems and RHEL that I recall from a while ago, with load averages that flew up for no apparent reason at all. Are you using RHEL on a Xeon / Dual Xeon server?
 

ispro

No. It is a single-CPU P-IV 2.4 with RH9...

Btw, the issue was resolved only temporarily. Now cPanel has launched python2/mailman again and things have become weird again...
 

philb

When it's under high load, can you show me the output of:

free -m
df -h
 

ispro

Will try... the 5-min load is 55% and it takes about five (!) minutes to execute any single command like killing/restarting cPanel...

[email protected] [~]# free -m
             total       used       free     shared    buffers     cached
Mem:           502        493          8          0          2         12
-/+ buffers/cache:        478         23
Swap:         2000        182       1817

[email protected] [~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/hda3              35G  8.8G   25G  27% /
/dev/hda1              99M  3.3M   91M   4% /boot
none                  252M     0  252M   0% /dev/shm
/usr/tmpDSK           243M   18M  213M   8% /tmp
/tmp                  243M   18M  213M   8% /var/tmp
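A note on reading the free -m figures: the "-/+ buffers/cache" row is what applications really have to work with. Below is a minimal Python sketch (the function name `summarize_free` is illustrative) that parses the old-style layout shown here. Newer versions of free print an "available" column instead and would need different parsing.

```python
def summarize_free(free_output):
    """Summarize old-style `free -m` output (layout with a -/+ buffers/cache row)."""
    summary = {}
    for line in free_output.strip().splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Mem:":
            summary["mem_total_mb"] = int(fields[1])
        elif fields[0] == "-/+":
            # Row looks like: -/+ buffers/cache: <used> <free>
            summary["app_used_mb"] = int(fields[2])
            summary["app_free_mb"] = int(fields[3])
        elif fields[0] == "Swap:":
            summary["swap_used_mb"] = int(fields[2])
    summary["app_used_pct"] = round(
        100.0 * summary["app_used_mb"] / summary["mem_total_mb"], 1
    )
    return summary


# The figures posted above.
POSTED = """\
total used free shared buffers cached
Mem: 502 493 8 0 2 12
-/+ buffers/cache: 478 23
Swap: 2000 182 1817
"""

if __name__ == "__main__":
    print(summarize_free(POSTED))
```

On the posted figures this works out to about 95% of RAM used by applications, with 182 MB already in swap, so memory pressure may be worth ruling out alongside the other suspects.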
 

ispro

I JUST managed to kill cPanel and stop its services incl. mailman...

The load went down and at least exim is receiving and delivering emails... ;(

How can we debug this issue?..
 

philb

Things like this are weird ones, that's for sure. Has this machine always done this or is this a recent development? Are you running the latest kernel available?
 

ispro

This machine has NEVER behaved like this!

Btw, with cPanel off the machine was stable for the whole night...

We do not run the latest kernel (2.4.20-28.9), but I suppose that is not the cause.
Actually, you needn't run the latest kernel, or your uptime will not be as good ;)

Now I have changed skipmailman=0 to skipmailman=1 in /var/cpanel/cpanel.config and am giving it a try for now...
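For reference, the same change can be scripted. Below is a minimal Python sketch, assuming cpanel.config is a plain key=value file as the line above suggests. `set_config_value` is a hypothetical helper, not a cPanel tool, and it works on text only, so the file itself is touched only if you choose to write the result back.

```python
def set_config_value(config_text, key, value):
    """Return config_text with `key` set to `value`, appending it if absent."""
    lines = config_text.splitlines()
    replaced = False
    for index, line in enumerate(lines):
        name, separator, _ = line.partition("=")
        if separator and name.strip() == key:
            lines[index] = "%s=%s" % (key, value)
            replaced = True
    if not replaced:
        lines.append("%s=%s" % (key, value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file contents; only skipmailman comes from the thread.
    before = "skipmailman=0\nexample_other_key=1\n"
    print(set_config_value(before, "skipmailman", "1"), end="")
```

Keeping a copy of the original file before writing anything back makes it easy to revert if the setting turns out not to be the cause.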
 

ispro

Noticed that when Mailman is off the server works as in the past - load averages around 1%.

However, the Mailman logs show nothing interesting besides our kill/stop attempts.
The cPanel and qrunner logs also contain no important information.

I'm asking the Mailman specialists: "How to debug the problem?"

It is a very weird problem, and what if it appears on another of our servers, a production one whose clients actually use Mailman?..
 

ispro

Ouch... Mailman was not the cause... when the cPanel services are started, the server's load skyrockets, while there are actually no offending processes in top...

Stopping the cPanel services right now...
 

ispro

Note: I'm continuing to post not for the post count! I guess it may be interesting for anyone having similar problems.

Well, as I had found that the cPanel services caused the high load, I did another test.

At the firewall I have blocked 2082,2083,2086,2087,2095,2096 and 110,965 for POP, 143 for IMAP.

This way I was pretty sure that no one could use them from outside.

Then I started cPanel - the load went up like crazy...
I killed it and stopped it.

Then I tried to launch cppop individually... well, with NO cpsrvd/webmaild/whostmgr running the load still goes up... No local connections to 110,965 are being made!
Killed cppop - the load goes down...

So, at this point cppop is the offender - will look further...

Any comments are appreciated.
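To confirm which of these services are actually listening after each start/stop step, the ports can be probed from the server itself. Below is a minimal Python sketch; `port_is_open` is an illustrative helper and the port list simply repeats the numbers mentioned above. It only tells you whether something accepts TCP connections on a port, not who (if anyone) is connecting to it.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0
    finally:
        sock.close()


if __name__ == "__main__":
    # Ports as listed in the post above.
    for port in (2082, 2083, 2086, 2087, 2095, 2096, 110, 965, 143):
        state = "listening" if port_is_open("127.0.0.1", port) else "closed"
        print(port, state)
```

Running it once with only cppop started and once with it killed would show whether the load follows the listener itself, independent of any client traffic.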
 

ispro

cppop is killing the server!

Well, finally: after starting all the cPanel services and instantly killing cppop (& removing cppop from chkservd to prevent cppop restarts), the load is ok....

What should we do?..
 

philb

ispro said:
We do not run the latest kernel (2.4.20-28.9), but I suppose that is not the cause.
Actually, you needn't run the latest kernel, or your uptime will not be as good ;)
Uptime is nothing. Security is everything. Update your kernel when updates are available or risk being rooted every time someone finds a hole in a php/cgi script that allows them to spawn a shell on your system. Not to mention that some kernels even have remotely exploitable holes in them.
 

ispro

When a security hole is found we update kernels.
However, this kernel has no security holes.
Thank you for the suggestion, anyway.

However, cppop is actually the problem and we are trying to work out what can be done, given that cppop is killing the server even with no inbound connections!

P.S. I have updated the kernel to the latest available, just to make sure. However, as the server was restarted in the process, even if the problem is solved (I do hope!) it will not be a clear confirmation, but...
 