The Community Forums


Server Froze Up

Discussion in 'General Discussion' started by Marc Tremblay, Mar 19, 2017.

  1. Marc Tremblay

    Marc Tremblay Member

    Hi, I've been running this dedicated server for 3.5 years. Three weeks ago I had to replace one of the two hard drives in a SoftRaid array because smartd reported two errors after a graceful reboot I did following a cPanel upgrade. For the last three weeks everything ran super smoothly and the server was very fast and responsive.

    This morning I woke up to find that every website I opened timed out, an SSH connection could not even be established, and ping showed 100% packet loss! I received *NO* alert email from cPanel at all. There was no option other than to hard reboot the machine.

    After the hard reboot, everything was functional again within a minute or so, and I could eventually log in to WHM and check the Munin graphs. You can see that at about the same time the backup began, the WHOLE machine froze or something: no disk activity (ZERO), no network activity, nothing, until I rebooted an hour ago. I am now puzzled about what could have happened and how to make sure it doesn't happen again, since many clients called this morning to report that their email and websites weren't working, obviously.

    I also checked the smartctl reports over SSH, and all disks report PASSED. As a side question, there's one thing I don't understand, though: is it normal in a SoftRaid that only sdb is used constantly and sda is almost NEVER used at all?
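    A "PASSED" from `smartctl -H` is only the drive's overall self-assessment, which generally flips to FAILED only once a normalized attribute falls below its vendor threshold, so it can coexist with the kind of errors smartd reported earlier. A minimal sketch, assuming ATA drives and the standard attribute table printed by `smartctl -A /dev/sdX`, that pulls out a few raw counters often watched as early warning signs. The attribute selection and the `worrying_attributes` helper are illustrative assumptions, not part of smartmontools or cPanel, and the sample table is invented:

```python
from typing import Dict

# Raw counters often watched as early signs of failing media (illustrative choice).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def worrying_attributes(smartctl_a_output: str) -> Dict[str, int]:
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    found: Dict[str, int] = {}
    for line in smartctl_a_output.splitlines():
        # Attribute rows have 10 columns; RAW_VALUE (the last) may itself contain spaces.
        parts = line.split(None, 9)
        if len(parts) == 10 and parts[0].isdigit() and parts[1] in WATCHED:
            raw = int(parts[9].split()[0])
            if raw > 0:
                found[parts[1]] = raw
    return found


# Invented sample in the usual `smartctl -A` layout.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   031   045   000    Old_age   Always       -       31 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

print(worrying_attributes(SAMPLE))  # {'Current_Pending_Sector': 8}
```

    On a real server you would feed it the text output of `smartctl -A` for each member disk; a non-zero pending or reallocated count despite "PASSED" is usually worth raising with the hosting provider.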

    Can anybody PLEASE help me figure this weird problem out?!
    - Removed -

    Thanks

    Here are the Munin screenshots.
     

    #1 Marc Tremblay, Mar 19, 2017
    Last edited by a moderator: Mar 19, 2017
  2. Infopro

    Infopro cPanel Sr. Product Evangelist
    Staff Member

    Looks like the server was offline. Have you spoken with your Hosting provider about this?
     
  3. Marc Tremblay

    Marc Tremblay Member

    Hi, thank you so much for replying. No, not yet, because my hosting provider takes no responsibility for the software side, and I'm not sure whether this is a software or a hardware problem. Munin would have continued to log the graphs if only the internet connection at the data center had been down, no??

    EDIT: I just called them, and as I thought, they simply told me there was no intervention or downtime on their network overnight, so this is either a software issue OR a hardware issue that their monitoring tools cannot detect. It's as if the server COMPLETELY FROZE: CPU usage was 0 for the whole night, not 1%, ZERO.
     
    #3 Marc Tremblay, Mar 19, 2017
    Last edited: Mar 19, 2017
  4. Marc Tremblay

    Marc Tremblay Member

    I'm trying to post more information (no pictures) and it won't let me! It says it looks like spam. What the?

    Trying for the 3rd time now:

    1) Temperature was 30 °C to 31 °C between 9 PM and 1 AM, when this occurred (just when backups start).

    2) RAM was mostly unused: about 3 GB used out of 64 GB. All the rest was cache, and about 10 GB was even completely free.

    3) Hard disk free space was (and still is) at least 280 GB.
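    Telling "used" memory apart from cache matters when reading figures like these, because Linux deliberately fills otherwise idle RAM with page cache that it can drop on demand. A minimal sketch of roughly the accounting recent procps `free` versions use, parsing `/proc/meminfo` text (values in KiB). The `memory_summary_kib` helper is hypothetical, and the sample numbers are invented to resemble the figures above, not taken from the server:

```python
from typing import Dict


def memory_summary_kib(meminfo_text: str) -> Dict[str, int]:
    """Split MemTotal into used / cache / free, roughly as recent procps `free` does (KiB)."""
    fields: Dict[str, int] = {}
    for line in meminfo_text.splitlines():
        name, _, rest = line.partition(":")
        tokens = rest.split()
        if tokens:
            fields[name.strip()] = int(tokens[0])  # first token is the number; "kB" follows
    total = fields["MemTotal"]
    free = fields["MemFree"]
    # Buffers, page cache and reclaimable slab can all be dropped under memory pressure.
    cache = fields.get("Buffers", 0) + fields.get("Cached", 0) + fields.get("SReclaimable", 0)
    return {"total": total, "used": total - free - cache, "cache": cache, "free": free}


# Invented sample, shaped like a 64 GB server with about 3 GB in use.
SAMPLE = """\
MemTotal:       65808000 kB
MemFree:        10485760 kB
Buffers:          500000 kB
Cached:         51000000 kB
SReclaimable:     700000 kB
"""

summary = memory_summary_kib(SAMPLE)
print(round(summary["used"] / 1024 ** 2, 1))  # used memory in GiB
```

    With numbers like these, low "used" memory and a large cache point away from memory exhaustion as the cause of the freeze.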
     
  5. Marc Tremblay

    Marc Tremblay Member

    At exactly 1:00 AM (when this occurred), /var/log/messages got one line full of NUL characters, and NOTHING else was appended until I asked the data center to hard reboot the server. It has been running as if nothing happened for the past 5 hours now. Please, somebody from cPanel, help me figure this out. Any clue, anybody?
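    Since the log simply went silent, it can help to measure that silent window directly and compare it with the Munin graphs. A minimal sketch, assuming classic syslog lines that begin with a 15-character `Mon DD HH:MM:SS` timestamp in an English/C locale (syslog omits the year, so the caller supplies one). `largest_gap` is a hypothetical helper and the log lines below are made up:

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple


def largest_gap(lines: Iterable[str], year: int) -> Optional[Tuple[datetime, datetime, timedelta]]:
    """Return (last_before, first_after, gap) for the longest silence between timestamped lines."""
    prev: Optional[datetime] = None
    best: Optional[Tuple[datetime, datetime, timedelta]] = None
    for line in lines:
        try:
            # Classic syslog prefix, e.g. "Mar 19 01:00:01"; prepend the year syslog leaves out.
            ts = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a parsable timestamp (e.g. a NUL-filled line)
        if prev is not None and (best is None or ts - prev > best[2]):
            best = (prev, ts, ts - prev)
        prev = ts
    return best


# Made-up log excerpt with a stall between 01:00:01 and the next boot.
LOG = [
    "Mar 19 00:58:01 myhost crond[2101]: hourly job finished",
    "Mar 19 01:00:01 myhost backup[2230]: starting",
    "\x00" * 32,
    "Mar 19 09:47:12 myhost kernel: booting",
]

start, end, gap = largest_gap(LOG, year=2017)
print(start, "->", end, gap)  # 2017-03-19 01:00:01 -> 2017-03-19 09:47:12 8:47:11
```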
     
  6. Infopro

    Infopro cPanel Sr. Product Evangelist
    Staff Member

    Just post new posts as needed instead of edits.
     
  7. Infopro

    Infopro cPanel Sr. Product Evangelist
    Staff Member

    Hand-picking a few quotes from your thread:

    You should never need to reboot after a cPanel update. Assuming your Hosting Provider replaced the drive;

    No, I don't think it is.

    Who replaced the drive(s)? If it were me, and I had my Hosting Provider replace a drive on my rig and then this issue came up, I wouldn't be asking about intervention or downtime on their network so much as whether they configured the server properly.

    I don't see this as an issue with your cPanel installation so much as it might be a possible hardware issue.

    If your Hosting Provider is not being helpful, you might want to consider hiring a System Admin to take a closer look at this for you.
     
  8. Marc Tremblay

    Marc Tremblay Member

    First of all, thank you so much for taking two minutes to reply to this thread. For god's sake, my face has looked like a huge question mark since Sunday, so thank you for trying to help me here :P

    I did not "need" to reboot the server after the cPanel update. I did a graceful reboot of my own will so that they could replace the drive at that moment, which they did, and it worked fine for at least 3 weeks in a row without any problem that I could notice. Changing the hard drive may not even be related to the issue, but I thought it was worth mentioning. Also, regarding my secondary question, which you answered, I got the end of the story today: sdb was the only disk used back then because, somehow, sda had not been added to the SoftRaid array at the time I wrote the OP. A stupid mistake I must have made, but now it is added correctly and the SoftRaid is totally clean and functional with both drives, as intended. I have not needed to reboot since Sunday morning (when I wrote the OP).
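    A member missing from a Linux md array, like sda here, shows up in `/proc/mdstat` as a device count such as `[2/1]` and an underscore in the member flags (`[_U]` instead of `[UU]`), which is why only one disk carried all the I/O. A minimal sketch, assuming the SoftRaid is Linux md (mdadm) software RAID, which the thread does not actually state. `degraded_arrays` is a hypothetical helper and the sample text is illustrative:

```python
import re
from typing import Dict, Optional

HEADER = re.compile(r"^(md\d+)\s*:")                    # e.g. "md0 : active raid1 sdb1[1]"
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]")   # e.g. "[2/1] [_U]"


def degraded_arrays(mdstat_text: str) -> Dict[str, str]:
    """Return {array: member_flags} for md arrays that are missing at least one member."""
    degraded: Dict[str, str] = {}
    current: Optional[str] = None
    for line in mdstat_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if current and status:
            expected, active, flags = int(status.group(1)), int(status.group(2)), status.group(3)
            if active < expected or "_" in flags:
                degraded[current] = flags
            current = None  # only the first status line after a header belongs to it
    return degraded


SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1]
      1048512 blocks [2/1] [_U]

md1 : active raid1 sdb2[1] sda2[0]
      976759936 blocks super 1.2 [2/2] [UU]
      bitmap: 0/8 pages [0KB], 65536KB chunk

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))  # {'md0': '_U'}
```

    On the server you would pass it the contents of `/proc/mdstat`; `mdadm --detail` on each array gives a fuller per-array view. Running a check like this from cron would have flagged the half-populated array long before the OP.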

    SO... you think it's a hardware issue? It's been 3 days in a row without any sign of a problem yet... I'm crossing my fingers so tight, man, it's not even funny. I woke up this morning at 3:00 AM paranoid that the server was down... thankfully it was just a nightmare :P
     
  9. Infopro

    Infopro cPanel Sr. Product Evangelist
    Staff Member

    Going by your screenshots, the server seems to have had some sort of issue for almost 9 hours. I would expect almost any software issue that came up to leave some tracks in the logs for closer investigation after the fact.

    Making sure you've got full, proper backups of all accounts saved off that server, somewhere else, is always a good idea. If you're not sure what's going on with the server, making sure you're backing things up properly every night is important right now, I think.
     
  10. Marc Tremblay

    Marc Tremblay Member

    Yes, and also, as I stated before, exactly when all the graphs suddenly dropped to zero, /var/log/messages got a line full of NUL characters, and NOTHING else was appended to it until I asked the data center to hard reboot the server, which immediately brought it back up; it has been running ever since. It's almost as if the hard drives suddenly got disconnected or something, no?

    EDIT: Taking that back. NUL characters being WRITTEN to disk means the disks were still attached obviously :P
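    A commonly cited explanation for a run of NUL bytes in a log after a hard reset is that the file's length had already been extended on disk while the data for that region was never flushed, so it reads back as zeros. If that is what happened here, the NULs would not by themselves prove the disks were still attached at 1:00 AM. A minimal sketch for locating such runs in a log file read as bytes; `find_nul_runs` and its 16-byte minimum are illustrative choices, and the sample data is made up:

```python
import re
from typing import List, Tuple


def find_nul_runs(data: bytes, min_run: int = 16) -> List[Tuple[int, int, int]]:
    """Return (line_number, byte_offset, run_length) for each run of >= min_run NUL bytes."""
    pattern = re.compile(rb"\x00{%d,}" % min_run)
    runs = []
    for match in pattern.finditer(data):
        line_no = data.count(b"\n", 0, match.start()) + 1  # 1-based line where the run starts
        runs.append((line_no, match.start(), match.end() - match.start()))
    return runs


# Made-up log bytes: one normal entry, a 40-byte NUL run, then the first post-reboot entry.
first = b"Mar 19 00:59:01 myhost crond[2101]: hourly job finished\n"
data = first + b"\x00" * 40 + b"Mar 19 09:47:12 myhost kernel: booting\n"
print(find_nul_runs(data))
```

    Against the real file you would read it in binary mode, e.g. `open("/var/log/messages", "rb").read()`, and then compare the reported line with the timestamps on either side of it.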
     