The Community Forums

Strange problem resulting in hung server.

Discussion in 'General Discussion' started by dc2447, Aug 14, 2005.

  1. dc2447

    dc2447 Well-Known Member

    Joined:
    Apr 18, 2003
    Messages:
    49
    Likes Received:
    0
    Trophy Points:
    6
    I have a dual Xeon server with 4 GB of RAM. The server runs fine pretty much all the time, but this morning it became unresponsive and I could not access it over HTTP or SSH.

    The server was rebooted and I managed to get an SSH session just after the reboot for a few moments, until I couldn't continue as the server had apparently run out of RAM. After another reboot I got on again via SSH, but when I tried to su, the session couldn't fork as the server was already out of RAM (less than 1 minute after coming up). Something is obviously wrong, as the server rarely uses more than about 1.2 GB of RAM at any time.

    I would normally put this behaviour down to high traffic, but the server is really quiet today and there is nothing to suggest that we are being DoS'd or that there are any large traffic spikes. MRTG on the switch shows no large bursts of traffic.

    My second suspicion is a hardware fault in the server. However, we had the exact same experience with the server hanging a few months ago, which went away as quickly as it came, and we have been fine since then.

    No new scripts have been uploaded to the server recently either.

    Has anyone seen anything like this before on a server running cPanel? The forums don't suggest much, mostly people running unoptimised MySQL (mine is optimised).

    Any thoughts?
     
  2. chirpy

    chirpy Well-Known Member

    Joined:
    Jun 15, 2002
    Messages:
    13,475
    Likes Received:
    20
    Trophy Points:
    38
    Location:
    Go on, have a guess
    A few things that are simple to check:

    1. What OS and kernel are you running? If it's RHEL or CentOS with an old kernel, make sure you upgrade it to their latest release.

    2. Do you have the laus rpm installed? If so, uninstall it (search the forum on how to best do that)

    As you say, such problems can happen with hardware problems, especially bad memory sticks.
     
  3. dc2447

    dc2447 Well-Known Member

    Joined:
    Apr 18, 2003
    Messages:
    49
    Likes Received:
    0
    Trophy Points:
    6
    Rather pathetically I am on Red Hat 7.2, albeit running the latest 2.4 kernel.

    Nope

    The memory was actually swapped out before but it made no difference.

    I really need to get off this server as soon as possible - it's just that the thought of migrating everything puts me off.

    The server has now been up for about 15 minutes, is serving 3 times the traffic it was when it last crashed, and is using less than a gig of RAM.

    Weird.
     
  4. AndyReed

    AndyReed Well-Known Member
    PartnerNOC

    Joined:
    May 29, 2004
    Messages:
    2,222
    Likes Received:
    3
    Trophy Points:
    38
    Location:
    Minneapolis, MN
    Did you check the system log, /var/log/messages?
    Do any errors there point to a software and/or hardware fault?
     
  5. dc2447

    dc2447 Well-Known Member

    Joined:
    Apr 18, 2003
    Messages:
    49
    Likes Received:
    0
    Trophy Points:
    6
    For sure - nothing in syslog at all.

    Not sure - it seems the OS just starts believing that all the RAM is gone and starts swapping, when in fact there is nothing actually using any RAM at all.
     
  6. dc2447

    dc2447 Well-Known Member

    Joined:
    Apr 18, 2003
    Messages:
    49
    Likes Received:
    0
    Trophy Points:
    6
    The server just stopped responding again for no apparent reason, other than, I assume, suddenly having no free memory.

    I actually got a shell on the box:

    Code:
    5:09pm  up  5:41,  1 user,  load average: 123.55, 237.06, 211.77
    331 processes: 300 sleeping, 26 running, 5 zombie, 0 stopped
    CPU0 states: 83.4% user, 16.0% system,  0.0% nice,  0.0% idle
    CPU1 states: 85.1% user, 14.3% system,  0.0% nice,  0.0% idle
    CPU2 states: 83.3% user, 16.1% system,  0.0% nice,  0.0% idle
    CPU3 states: 86.0% user, 14.0% system,  0.0% nice,  0.0% idle
    Mem:  4068524K av, 3793132K used,  275392K free,       0K shrd,   13504K buff
    Swap: 2048276K av,  119036K used, 1929240K free                  212960K cached
    
     
    #6 dc2447, Aug 14, 2005
    Last edited: Aug 14, 2005
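For readers following along, the header of the top output above can be reduced to two figures: load per logical CPU, and memory in use once buffers and page cache are subtracted. The following is a minimal Python sketch, not part of the original post. The function names are hypothetical, the CPU count of 4 is taken from the CPU0 to CPU3 lines, and it assumes the older procps layout in which the `cached` figure is printed on the Swap line but counts page cache held in RAM.

```python
import re

# Header lines copied from the top output posted above.
TOP_HEADER = """\
5:09pm  up  5:41,  1 user,  load average: 123.55, 237.06, 211.77
Mem:  4068524K av, 3793132K used,  275392K free,       0K shrd,   13504K buff
Swap: 2048276K av,  119036K used, 1929240K free                  212960K cached
"""

LOGICAL_CPUS = 4  # assumption: CPU0..CPU3 are listed in the posted output


def kb_field(text, line_prefix, label):
    """Return the integer K value that precedes `label` on the matching line."""
    for line in text.splitlines():
        if line.startswith(line_prefix):
            match = re.search(r"(\d+)K\s+" + re.escape(label), line)
            if match:
                return int(match.group(1))
    raise ValueError(f"{label!r} not found on {line_prefix!r} line")


def summarize(text, cpus):
    """Summarise load per CPU and memory use excluding buffers and cache."""
    load_1min = float(re.search(r"load average:\s*([\d.]+)", text).group(1))
    used = kb_field(text, "Mem:", "used")
    buffers = kb_field(text, "Mem:", "buff")
    # Assumption: "cached" on the Swap line is page cache, as in old procps top.
    cached = kb_field(text, "Swap:", "cached")
    return {
        "load_per_cpu": load_1min / cpus,
        "used_excl_cache_kb": used - buffers - cached,
        "swap_used_kb": kb_field(text, "Swap:", "used"),
    }
```

Called as `summarize(TOP_HEADER, LOGICAL_CPUS)`, this gives a one-minute load of roughly 31 per logical CPU and 3566668K in use after buffers and cache are excluded, with 119036K of swap touched.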
  7. chirpy

    chirpy Well-Known Member

    Joined:
    Jun 15, 2002
    Messages:
    13,475
    Likes Received:
    20
    Trophy Points:
    38
    Location:
    Go on, have a guess
    That load average seems to suggest a looping process rather than a memory thrashing problem to me. TBH, since you're running RH7.2, I'd go with getting off the server and onto a new one, if that's something you want to do anyway. In the long run, it'll definitely be worth it. Unless you've done a load of OS-level customisations, moving servers using cPanel full backups is reasonably painless these days.

    Have you checked things like /tmp, /var/tmp, /usr/local/apache/proxy and /dev/shm for exploits, and run chkrootkit and rkhunter, just in case there are exploits running? The processes at the top of a CPU-sorted top listing might help (press P and then c in top).
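
As a rough illustration of the directory check suggested here, the Python sketch below walks the four directories named in the post and reports regular files that have any execute bit set. It is only a starting point under stated assumptions: the function name is hypothetical, an executable file in a temp directory is merely a hint worth a closer look, and this is not a substitute for chkrootkit or rkhunter.

```python
import os
import stat

# Directories named in the post; adjust for the system at hand.
WATCH_DIRS = ["/tmp", "/var/tmp", "/usr/local/apache/proxy", "/dev/shm"]


def find_executables(roots):
    """Yield (path, owner_uid, mode_string) for regular files with an execute bit."""
    for root in roots:
        # Missing or unreadable directories are skipped silently.
        for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    info = os.lstat(path)  # lstat: do not follow symlinks
                except OSError:
                    continue  # file vanished or is unreadable
                if stat.S_ISREG(info.st_mode) and info.st_mode & 0o111:
                    yield path, info.st_uid, stat.filemode(info.st_mode)


# Example use:
#   for path, uid, mode in find_executables(WATCH_DIRS):
#       print(mode, uid, path)
```

Files owned by the web server's user are usually the ones to inspect first, since uploaded exploits tend to be written by the Apache process.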
     
  8. dc2447

    dc2447 Well-Known Member

    Joined:
    Apr 18, 2003
    Messages:
    49
    Likes Received:
    0
    Trophy Points:
    6
    I agree with what you are saying. The problem appears to be with Apache: some processes are taking a lot of CPU time whilst most of the others behave normally. I have attempted to mitigate this by doing the old Solaris trick of setting MaxRequestsPerChild to a very low level, so that even if some children are running amok Apache should catch them and kill them. There is an extra overhead, but I don't see what else I can do.

    With regard to moving servers - I think I have to. I was kind of waiting until RH4 U1 was available so I could get a 2.6 kernel etc., but I will probably push ahead with moving now.

    Yes - ran chkrootkit and had an audit - nothing seems amiss.
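
One way to pin down which processes are eating CPU time is to read the counters in /proc directly. The Python sketch below is a minimal example assuming a Linux /proc filesystem; the function names are hypothetical. It ranks processes by cumulative user plus system CPU time since each process started, which is not the same as the recent CPU percentage that top sorts on with P.

```python
import os


def parse_stat(line):
    """Parse one /proc/<pid>/stat line into (pid, command, cpu_ticks)."""
    # The command is wrapped in parentheses and may itself contain spaces
    # or parentheses, so split on the first "(" and the last ")".
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    fields = tail.split()
    # fields[0] is the state (field 3 in proc(5)); utime and stime are
    # fields 14 and 15, which are indexes 11 and 12 here.
    return int(pid_text), comm, int(fields[11]) + int(fields[12])


def busiest(proc_root="/proc", limit=10):
    """Return up to `limit` (cpu_seconds, pid, command) tuples, busiest first."""
    ticks_per_second = os.sysconf("SC_CLK_TCK")
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, ticks = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or is unreadable
        rows.append((ticks / ticks_per_second, pid, comm))
    return sorted(rows, reverse=True)[:limit]
```

Running `busiest()` twice a few seconds apart and comparing the totals shows which httpd children are still accumulating CPU time between the two samples.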
     
  9. chirpy

    chirpy Well-Known Member

    Joined:
    Jun 15, 2002
    Messages:
    13,475
    Likes Received:
    20
    Trophy Points:
    38
    Location:
    Go on, have a guess
    The other thing you can do is to disable KeepAlives if too many children are hanging around claiming resources unnecessarily, though this could cause a hike in CPU as new children are started. If you have KeepAlives already disabled, you could enable them but with a low KeepAliveTimeout, e.g. 3 seconds. It can be a balancing act with Apache, but as you know, it's always difficult with too many processes chasing too few resources.
     
  10. dc2447

    dc2447 Well-Known Member

    Joined:
    Apr 18, 2003
    Messages:
    49
    Likes Received:
    0
    Trophy Points:
    6
    I have KeepAlives disabled already. I don't think the issue is resources in general, just that a certain set of criteria is being met that results in the server spiralling out of control every so often.

    Am pricing a new server as we speak.
     
  11. dc2447

    dc2447 Well-Known Member

    Joined:
    Apr 18, 2003
    Messages:
    49
    Likes Received:
    0
    Trophy Points:
    6
    Have ordered a new server and am transferring accounts as we speak; however, the problematic server [above] is exhibiting the exact same problems again today. I keep rebooting the server and within 10 seconds it is unusable. The server is a dual Xeon with 4 GB of RAM.

    Nightmare.
     