The Community Forums

Interact with an entire community of cPanel & WHM users!

cPanel/WHM causes an incredible load on an almost empty box!

Discussion in 'General Discussion' started by ispro, Jan 8, 2005.

  1. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    cPanel/WHM (cppop is the killer!): incredible load on an almost empty box!

    Today, in the middle of the day, one of our servers running 9.9.9 S15 became heavily loaded.

    After a lot of research we found that while cPanel/WHM is running, the load goes up to 30-40% and the server becomes almost unresponsive; common services like http/smtp/named go virtually down.

    The interesting thing is that the box runs only 10 domains with no actual load on them. Even with Apache/ftp/Exim stopped (and chkservd prevented from restarting them), the load stays high.
    The only solution is to issue:
    service cpanel stop
    several times; after that all remaining services work like a charm. But that is not a real solution...

    We have a firewall installed and checked its logs and the server's logs: nothing wrong.
    /scripts/upcp --force
    does not help.

    Can anyone help us?
    We need to find a solution urgently!
     
    #1 ispro, Jan 8, 2005
    Last edited: Jan 9, 2005
  2. philb

    philb Well-Known Member

    Joined:
    Jan 28, 2004
    Messages:
    116
    Likes Received:
    0
    Trophy Points:
    16
    This typically only happens if people hammer cPanel/WHM a bit and drive the load up. If you run top on the command line while you're having the problem, which processes are actually using CPU time?
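philb's question can also be answered from a script rather than by watching top. A minimal sketch, assuming a Linux box with the procps `ps` command: it sorts the output of `ps -eo pid,pcpu,comm` by the %CPU column. The function names are hypothetical. Note that `ps` reports %CPU as CPU time divided by the process's total run time, so a short burst can look smaller here than it does in top.

```python
from __future__ import annotations

import subprocess


def top_cpu(ps_output: str, n: int = 5) -> list[tuple[int, float, str]]:
    """Return the n busiest processes from `ps -eo pid,pcpu,comm` output."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the PID %CPU COMMAND header
        pid, pcpu, comm = line.split(None, 2)        # comm is last, so it may contain spaces
        rows.append((int(pid), float(pcpu), comm))
    rows.sort(key=lambda row: row[1], reverse=True)  # highest %CPU first
    return rows[:n]


def live_top_cpu(n: int = 5) -> list[tuple[int, float, str]]:
    """Run ps on the local machine and return its n busiest processes."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,comm"], capture_output=True, text=True, check=True
    )
    return top_cpu(result.stdout, n)
```

On the affected server, `for row in live_top_cpu(): print(row)` would list the five processes with the highest %CPU.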
     
  3. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    If only it were that easy...
    There are no customers on the server. No one requests cPanel/WHM services except outside monitoring services.

    top shows nothing useful; in fact, according to top there is just 3% load from user and 1% from system, about 95% idle...
    iowait is 0.0%

    But the 1/5/15-minute averages are 30-40%!

    We are really frustrated and have no more ideas about what's going on!
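The combination described here (top showing about 95% idle, yet a high load average) is possible on Linux because the load average counts tasks in uninterruptible sleep (state D) as well as runnable ones (state R). A minimal sketch for checking that, assuming a Linux `/proc` filesystem; the helper names are hypothetical, and it only looks at each process's main thread, not at the per-thread entries under `/proc/<pid>/task`.

```python
import os
from collections import Counter


def proc_state(stat_text: str) -> str:
    """Return the one-letter state from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces or ')',
    # so split after the *last* ')' and take the first field that follows it.
    return stat_text[stat_text.rindex(")") + 1:].split()[0]


def load_contributors(stat_texts) -> dict:
    """Count the tasks Linux folds into the load average: R (runnable) and D (uninterruptible)."""
    states = Counter(proc_state(text) for text in stat_texts)
    return {"R": states["R"], "D": states["D"]}


def read_stat_files():
    """Yield /proc/<pid>/stat for every live process (Linux only)."""
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                yield f.read()
        except OSError:
            continue  # the process exited while we were scanning
```

Calling `load_contributors(read_stat_files())` while the load is high would show whether the count is dominated by D-state tasks (blocked in the kernel) rather than by tasks actually burning CPU.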
     
  4. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    Addendum.

    ...killing python2 (for Mailman) finally lets cPanel stop:
    killall -9 -g python2

    After cPanel stopped, the load went down immediately.

    But it is not a solution...

    I will try to disable Mailman; perhaps it is what is hurting the server?..
     
  5. philb

    philb Well-Known Member

    Joined:
    Jan 28, 2004
    Messages:
    116
    Likes Received:
    0
    Trophy Points:
    16
    Wait, the load averages show 30-40%? Do you mean they're 0.30 or 30.0?
     
  6. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    Not 0.3!
    30%
    Let's see:

    root@secure [~]# w
    04:04:43 up 13 days, 12:25, 3 users, load average: 32.98, 17.23, 8.78
    USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU   WHAT
    root     pts/2    dial_up_dinamic_ 11:04pm  2:02m  0.14s  0.12s  -bash
    root     pts/3    dial_up_dinamic_ 11:04pm  24.00s 0.16s  0.12s  -bash
    root     pts/5    dial_up_dinamic_ 11:04pm  20:25  4.45s  0.30s  -bash

    root@secure [~]# w
    04:06:41 up 13 days, 12:27, 3 users, load average: 33.57, 22.17, 11.57
    USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU   WHAT
    root     pts/2    dial_up_dinamic_ 11:04pm  2:04m  0.14s  0.12s  -bash
    root     pts/3    dial_up_dinamic_ 11:04pm  22.00s 0.17s  0.13s  -bash
    root     pts/5    dial_up_dinamic_ 11:04pm  22:24  4.59s  0.30s  -bash

    root@secure [~]# w
    04:20:59 up 13 days, 12:41, 3 users, load average: 59.48, 48.26, 32.46
    USER     TTY      FROM             LOGIN@   IDLE   JCPU   PCPU   WHAT
    root     pts/2    dial_up_dinamic_ 11:04pm  57.00s 0.12s  0.12s  -bash
    root     pts/3    dial_up_dinamic_ 11:04pm  56.00s 0.18s  0.15s  -bash
    root     pts/5    dial_up_dinamic_ 11:04pm  36:44  1:03   0.30s  -bash

    Any ideas?

    Btw, disabling Mailman helps. Even more interesting: we have NO Mailman lists...
    Very strange.
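The trend across the three `w` snapshots in this post is easier to see once the numbers are pulled out. A small sketch with a hypothetical helper, `parse_load`, that extracts the 1-, 5- and 15-minute load averages from a `w` or `uptime` header line; applied to the snapshots, it shows the 1-minute figure rising from 32.98 to 59.48 in about sixteen minutes.

```python
import re

# Matches "load average: 32.98, 17.23, 8.78" and the plural, comma-less
# variant ("load averages: 1.2 1.3 1.4") that some systems print.
_LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_load(line: str) -> tuple:
    """Return (1-min, 5-min, 15-min) load averages from a `w`/`uptime` header line."""
    match = _LOAD_RE.search(line)
    if match is None:
        raise ValueError(f"no load average in: {line!r}")
    return tuple(float(value) for value in match.groups())


# Header lines copied from the three `w` runs in this post.
snapshots = [
    "04:04:43 up 13 days, 12:25, 3 users, load average: 32.98, 17.23, 8.78",
    "04:06:41 up 13 days, 12:27, 3 users, load average: 33.57, 22.17, 11.57",
    "04:20:59 up 13 days, 12:41, 3 users, load average: 59.48, 48.26, 32.46",
]
one_minute = [parse_load(line)[0] for line in snapshots]
```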
     
  7. philb

    philb Well-Known Member

    Joined:
    Jan 28, 2004
    Messages:
    116
    Likes Received:
    0
    Trophy Points:
    16
    OK, the problem is that load averages are not actually a percentage. The load average indicates how many processes, on average over the period it covers (1/5/15 mins), are running or waiting for CPU time.

    A single process running at 100% CPU will give a load average of around 1.00 if not much else is going on. Two processes running at 100% CPU (or trying to) will usually yield a load of 2.00, and so on.

    I recall some issues a while ago with certain Xeon-based systems running RHEL, where load averages flew up for no apparent reason at all. Are you using RHEL on a Xeon / dual-Xeon server?
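philb's rule of thumb (about 1.00 per fully busy CPU) suggests dividing the load by the number of CPUs before judging it. A minimal sketch assuming Linux, where `/proc/loadavg` holds the three averages followed by a runnable/total task count and the most recent PID; the helper names are hypothetical and the sample string in the docstring is illustrative, not taken from ispro's server. `os.cpu_count()` counts logical CPUs, so a hyper-threaded CPU counts as two.

```python
import os


def parse_proc_loadavg(text: str) -> dict:
    """Parse /proc/loadavg, e.g. '0.42 0.37 0.30 2/180 4321'."""
    one, five, fifteen, tasks, last_pid = text.split()
    runnable, total = tasks.split("/")
    return {
        "load": (float(one), float(five), float(fifteen)),
        "runnable": int(runnable),  # scheduling entities runnable right now
        "total": int(total),        # all scheduling entities (processes and threads)
        "last_pid": int(last_pid),
    }


def per_cpu(load: float, ncpus: int) -> float:
    """About 1.0 means every CPU is busy; well above 1.0 means tasks are queuing or blocked."""
    return load / ncpus


def current_per_cpu_load() -> float:
    """1-minute load divided by the logical CPU count of this machine."""
    with open("/proc/loadavg") as f:
        one_minute = parse_proc_loadavg(f.read())["load"][0]
    return per_cpu(one_minute, os.cpu_count() or 1)
```

On a single-CPU box, a load of 32.98 therefore means roughly 33 tasks competing for (or blocked waiting on) one CPU, not 33% utilisation.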
     
  8. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    No. It is a single-CPU P-IV 2.4 with RH9...

    Btw, the issue was resolved temporarily. Now cPanel has launched python2/Mailman again and things have gone weird again...
     
  9. philb

    philb Well-Known Member

    Joined:
    Jan 28, 2004
    Messages:
    116
    Likes Received:
    0
    Trophy Points:
    16
    When it's under high load, can you show me the output of:

    free -m
    df -h
     
  10. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    I'll try... The 5-min load is 55% and it takes about five (!) minutes to execute any single command, such as killing/restarting cPanel...

    root@secure [~]# free -m
                 total       used       free     shared    buffers     cached
    Mem:           502        493          8          0          2         12
    -/+ buffers/cache:        478         23
    Swap:         2000        182       1817

    root@secure [~]# df -h
    Filesystem           Size  Used Avail Use% Mounted on
    /dev/hda3             35G  8.8G   25G  27% /
    /dev/hda1             99M  3.3M   91M   4% /boot
    none                 252M     0  252M   0% /dev/shm
    /usr/tmpDSK          243M   18M  213M   8% /tmp
    /tmp                 243M   18M  213M   8% /var/tmp
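The `free -m` figures in this post leave very little headroom. A minimal sketch of the arithmetic, using a hypothetical helper: memory the kernel can still hand to programs is roughly free + buffers + cache, here 8 + 2 + 12 = 22 MB out of 502 MB. `free` prints 23 on the `-/+ buffers/cache` line most likely because each column is rounded to whole megabytes separately. The sketch only summarises the numbers; it does not explain why memory ran out.

```python
def memory_headroom(total_mb: int, free_mb: int, buffers_mb: int,
                    cached_mb: int, swap_used_mb: int) -> dict:
    """Summarise `free -m` output: how much memory could still be given to programs."""
    available = free_mb + buffers_mb + cached_mb  # buffers and page cache can largely be reclaimed
    return {
        "available_mb": available,
        "available_pct": round(100 * available / total_mb, 1),
        "swap_in_use": swap_used_mb > 0,  # pages were pushed out to swap at some point
    }


# Values copied from the `free -m` output in this post.
report = memory_headroom(total_mb=502, free_mb=8, buffers_mb=2, cached_mb=12, swap_used_mb=182)
```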
     
  11. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    I JUST managed to kill cPanel and stop its services, including Mailman...

    The load went down and at least Exim is receiving and delivering email... ;(

    How do I debug this issue?..
     
  12. philb

    philb Well-Known Member

    Joined:
    Jan 28, 2004
    Messages:
    116
    Likes Received:
    0
    Trophy Points:
    16
    Things like this are weird ones, that's for sure. Has this machine always done this, or is it a recent development? Are you running the latest kernel available?
     
  13. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    This machine NEVER behaved like this before!

    Btw, with cPanel off, the machine was stable for the whole night...

    We are not running the latest kernel (2.4.20-28.9), but I suppose that is not the cause.
    Actually, you don't need to run the latest kernel, or your uptime won't be as good ;)

    For now I have changed skipmailman=0 to skipmailman=1 in /var/cpanel/cpanel.config and will give it a try...
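The same edit can be scripted. A minimal sketch, assuming (as the post suggests) that `/var/cpanel/cpanel.config` is a plain `key=value` file with one setting per line; `set_config_value` and `set_config_file` are hypothetical helpers, so back up the real file before writing to it. The sketch does not restart any cPanel service.

```python
def set_config_value(text: str, key: str, value: str) -> str:
    """Return config text with `key` set to `value`, appending the key if it is missing."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.split("=", 1)[0].strip() == key:  # match the key exactly, not as a prefix
            lines[i] = f"{key}={value}"
            break
    else:  # no existing line for this key
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


def set_config_file(path: str, key: str, value: str) -> None:
    """Rewrite the config file at `path` in place."""
    with open(path) as f:
        updated = set_config_value(f.read(), key, value)
    with open(path, "w") as f:
        f.write(updated)
```

Usage would be `set_config_file("/var/cpanel/cpanel.config", "skipmailman", "1")`, run as root.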
     
  14. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    I noticed that when Mailman is off, the server works as it used to: the load average is about 1%.

    However, the Mailman logs show nothing interesting besides our kill/stop attempts.
    The cPanel and qrunner logs also contain nothing important.

    I'm asking Mailman specialists: "How do I debug this problem?"

    It is a very weird problem, and what if it appears on our other, production server, whose clients actually use Mailman?..
     
  15. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    Ouch... Mailman was not the cause... when cPanel services start, the server's load goes sky-high, while there are actually no offending processes in top...

    Stopping cPanel services right now...
     
  16. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    Note: I'm not continuing to post for the post count! I guess it may be interesting for anyone having similar problems.

    Well, since I found that cPanel services were causing the high load, I did another test.

    At the firewall I blocked 2082, 2083, 2086, 2087, 2095 and 2096, plus 110 and 995 for POP and 143 for IMAP.

    This way I was pretty sure that no one outside could use them.

    Then I started cPanel: the load went up like crazy...
    I killed it and stopped it.

    Then I tried to launch cppop on its own... well, with NO cpsrvd/webmaild/whostmgr running, the load goes up... and no local connections to 110/995 are being made!
    Killed cppop: the load goes down...

    So, at this point cppop is the offender; I will dig further...

    Any comments are appreciated.
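The check that nothing local was connecting to the POP ports can also be done by reading the kernel's socket table. A minimal sketch assuming Linux `/proc/net/tcp` (IPv4 only; IPv6 sockets live in `/proc/net/tcp6`, which this sketch ignores): the local address column is `HEXIP:HEXPORT` and the `st` column is a hex TCP state, where `0A` is LISTEN and `01` is ESTABLISHED. The helper names are hypothetical.

```python
# A few of the hex state codes used in /proc/net/tcp.
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "08": "CLOSE_WAIT", "0A": "LISTEN"}


def sockets_on_ports(tcp_text: str, ports) -> list:
    """Return (local_port, state) for each IPv4 socket whose local port is in `ports`."""
    found = []
    for line in tcp_text.strip().splitlines()[1:]:  # first line is the column header
        fields = line.split()
        local_port = int(fields[1].split(":")[1], 16)  # "00000000:006E" -> 110
        if local_port in ports:
            found.append((local_port, TCP_STATES.get(fields[3], fields[3])))
    return found


def live_sockets_on_ports(ports) -> list:
    """Read the live IPv4 socket table and filter it by local port."""
    with open("/proc/net/tcp") as f:
        return sockets_on_ports(f.read(), ports)
```

If `live_sockets_on_ports({110, 995})` returns only LISTEN entries while the load climbs, no client is connected, which is consistent with cppop loading the server on its own.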
     
  17. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    cppop is killing the server!

    Well, finally: after starting all of cPanel's services and immediately killing cppop (and removing cppop from chkservd to prevent restarts), the load is OK...

    What should we do?..
     
  18. philb

    philb Well-Known Member

    Joined:
    Jan 28, 2004
    Messages:
    116
    Likes Received:
    0
    Trophy Points:
    16
    Uptime is nothing. Security is everything. Update your kernel when updates are available, or risk being rooted every time someone finds a hole in a PHP/CGI script that lets them spawn a shell on your system. Not to mention that some kernels even have remotely exploitable holes in them.
     
  19. ispro

    ispro Well-Known Member

    Joined:
    Apr 8, 2004
    Messages:
    628
    Likes Received:
    1
    Trophy Points:
    18
    When a security hole is found, we update our kernels.
    However, this kernel has no known security holes.
    Thank you for the suggestion, anyway.

    Actually, though, cppop is the problem, and we are trying to figure out what can be done, given that cppop is killing the server even with no inbound connections!

    P.S. I have updated the kernel to the latest available, just to make sure. However, since the server was restarted, if the problem is solved (I do hope!) that will not be a clear confirmation, but...
     
    #19 ispro, Jan 9, 2005
    Last edited: Jan 9, 2005