The Community Forums

Interact with an entire community of cPanel & WHM users!

Server ground to a halt

Discussion in 'General Discussion' started by dev.null, Dec 9, 2003.

  1. dev.null

    dev.null Well-Known Member

    May 27, 2003
    Help Help Help.

    I got a call this morning that a client couldn't get to their site. Sure enough, I couldn't ssh in. Funny thing is that ssh never timed out (I spent over five minutes trying to connect), but it never connected either. Next time I'll use `ssh -v` to watch the progress (if any).

    So I went to the server and saw iptables logging to the screen (I have it log various forms of traffic), so I know the network was up and running. As a side note, I've struggled to get my syslog.conf to *not* log to the screen, but it somehow always reverts. It grants my wish for a week or so, but eventually the output ends up both on the screen and in the log file I direct it to.

    I tried to log in. It took my name but never got around to asking for my password (this is a console, no X running).

    I switched to another console (CTRL-ALT-F2) and tried the same thing, with the same results.

    I switched the box off and brought it back up. All seemed well (apart from the fact that it knew the disks weren't cleanly unmounted and prompted for a forced check).

    I let it come up and checked the logs (now via ssh back at my desk), and saw that at about 20 after midnight everything stopped logging: cron, network traffic, exim, all of it.
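One way to pin down the moment the logging stopped is to look for the largest gap between consecutive timestamps in a log file. The following is only a rough sketch, assuming traditional syslog lines that begin with a `Mon DD HH:MM:SS` stamp; the function names and the sample lines are made up for illustration and are not taken from the server in question.

```python
from datetime import datetime


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' of a syslog-style line.

    Traditional syslog stamps carry no year, so the caller supplies one.
    Returns None when the line does not start with a stamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def largest_gap(lines, year):
    """Return (gap, last_before, first_after) for the longest silence, or None."""
    best = None
    previous = None
    for line in lines:
        stamp = parse_stamp(line, year)
        if stamp is None:
            continue  # skip lines without a recognizable timestamp
        if previous is not None:
            gap = stamp - previous
            if best is None or gap > best[0]:
                best = (gap, previous, stamp)
        previous = stamp
    return best


# Hypothetical sample lines, invented for this example.
SAMPLE = [
    "Dec  9 00:19:58 host exim[123]: example entry",
    "Dec  9 00:20:01 host crond[456]: example entry",
    "Dec  9 08:45:10 host syslogd: restart",
]
```

Running `largest_gap(open("/var/log/messages"), 2003)` against each log in turn (the path is an assumption about the layout) would show whether cron, exim, and the kernel log all went quiet at the same second or trailed off one after another.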

    The last traffic I saw on the "network" was 127.0.0.1:2095 traffic, meaning something internal was talking to webmaild. Is there a known vulnerability with webmail?

    The logging intervals look fine, as if the system didn't gradually go bad but just suddenly halted. Like I said, the kernel was evidently still running, because various parts of the network stack were up and logging.

    The system was last rebooted on Thanksgiving morning, so it had been up for less than two weeks, and it has run for months on end in the past without any problem.

    1. What else do I need to do as a post-mortem?
    2. Is there some monitoring tool I can put in place that will record system activity and show a slow degradation over time?
    3. What can I do to mitigate this? I've (briefly) seen the clustering feature listed in WHM. How does it work, and would it have helped prevent site downtime in this case?
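As an illustration of what question 2 is asking for, here is a minimal sketch of a load-average logger, not a substitute for a real monitoring package. It assumes a Unix system where `os.getloadavg()` is available; the function names, the log format, and the 0.5 margin are arbitrary choices made for this example.

```python
import os
import time


def format_sample(timestamp, loads, cpu_count):
    """Build one log line: local time, the three load averages, and load per CPU."""
    one, five, fifteen = loads
    per_cpu = one / max(cpu_count, 1)
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    return f"{stamp} load={one:.2f},{five:.2f},{fifteen:.2f} per_cpu={per_cpu:.2f}"


def is_trending_up(loads, margin=0.5):
    """True when the 1-minute average exceeds the 15-minute average by more than margin."""
    one, _five, fifteen = loads
    return one - fifteen > margin


def monitor(path, interval=60, samples=None):
    """Append one sample to `path` every `interval` seconds.

    Runs forever when `samples` is None, otherwise stops after that many samples.
    """
    cpus = os.cpu_count() or 1
    taken = 0
    while samples is None or taken < samples:
        loads = os.getloadavg()  # Unix only
        line = format_sample(time.time(), loads, cpus)
        if is_trending_up(loads):
            line += " RISING"  # short-term load is pulling away from the long-term one
        with open(path, "a") as log:
            log.write(line + "\n")
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

Calling `monitor("/var/log/loadwatch.log")` (a hypothetical path) from a long-running process leaves a trail that survives a hard reset. After a hang, the last few lines show whether the load crept up beforehand or the box stopped abruptly.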

    Any comments, pointers, etc. are welcome here. I've only been running cPanel for about six months, and I lived in Slackware for five years before that (as opposed to Red Hat for the past six months).

    Thanks for your help.

  2. Website Rob

    Website Rob Well-Known Member

    Mar 23, 2002
    Alberta, Canada
    cPanel Access Level:
    Root Administrator
    Although I cannot help with the cause of or solution to this problem, I did run into a similar situation.

    Various monitoring sites (the DC's included) did not show the server as down, and yet pages would not load, I could not get email, and I could not SSH to the box. A call to the DC, where someone physically checked the server, showed the load was so high that nothing else could run. A reboot of the server corrected it.

    To date, this has not happened again, nor do I know what caused it.
  3. tAzMaNiAc

    tAzMaNiAc Well-Known Member

    Feb 16, 2003
    Sachse, TX
    A bad PHP script can cause the load on your box to skyrocket. I had it happen and luckily got in to restart processes and kill off the bad script.

    It had hit loads of >30 on a single-CPU machine. Bad.

  4. james61

    james61 Registered

    Aug 31, 2003
    I've seen an Athlon XP2200 hit 200+ load before :p
