The Community Forums

Interact with an entire community of cPanel & WHM users!

Servers keep crashing

Discussion in 'General Discussion' started by Reado, Oct 16, 2010.

  1. Reado

    Reado Well-Known Member

    Joined:
    Sep 8, 2009
    Messages:
    161
    Likes Received:
    4
    Trophy Points:
    18
    Location:
    United Kingdom
    cPanel Access Level:
    DataCenter Provider
    I'm running two servers hosted locally on a XenServer platform. Both run CentOS 5.5 and cPanel-RELEASE. Every now and then they stop responding to SSH and console access. XenCenter reports CPU cores 0 and 7 at 100%, while the other cores sit at 0%.

    Here's the log from around the time of the crash. Today the server stopped responding just after 13:00; it was rebooted at around 13:04.

    Code:
    Oct 16 12:55:29 l1vs05rcms pure-ftpd: (?@127.0.0.1) [INFO] New connection from 127.0.0.1
    Oct 16 12:55:29 l1vs05rcms pure-ftpd: (?@127.0.0.1) [INFO] Logout.
    Oct 16 13:00:18 l1vs05rcms pure-ftpd: (?@127.0.0.1) [INFO] New connection from 127.0.0.1
    Oct 16 13:00:18 l1vs05rcms pure-ftpd: (?@127.0.0.1) [INFO] Logout.
    Oct 16 13:00:23 l1vs05rcms kernel: Firewall: *TCP_IN Blocked* IN=eth0 OUT= MAC=f6:9d:fb:8d:5e:dd:00:50:7f:8e:78:c0:08:00 SRC=218.108.63.210 DST=10.1.1.116 LEN=60 TOS=0x00 PREC=0x00 TTL=38 ID=34974 DF PROTO=TCP SPT=41871 DPT=22 WINDOW=5840 RES=0x00 SYN URGP=0
    Oct 16 13:00:26 l1vs05rcms smbd[24690]: [2010/10/16 13:00:26,  0] printing/print_cups.c:cups_connect(103)
    Oct 16 13:00:26 l1vs05rcms smbd[24690]:   Unable to connect to CUPS server localhost:631 - Connection refused
    Oct 16 13:00:26 l1vs05rcms smbd[24691]: [2010/10/16 13:00:26,  0] printing/print_cups.c:cups_connect(103)
    Oct 16 13:00:26 l1vs05rcms smbd[24691]:   Unable to connect to CUPS server localhost:631 - Connection refused
    Oct 16 13:04:45 l1vs05rcms syslogd 1.4.1: restart.
    Oct 16 13:04:45 l1vs05rcms kernel: klogd 1.4.1, log source = /proc/kmsg started.
    Oct 16 13:04:45 l1vs05rcms kernel: Linux version 2.6.18-194.17.1.el5xen (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Wed Sep 29 14:12:56 EDT 2010
    
    Can anyone help to diagnose what could have caused this? Are there any other logs that may be able to tell me what happened? I figured with it happening bang on 13:00 it was a cron job, but there are no cron jobs set to run at 13:00 that I can see.

    Any help would be greatly appreciated. Thanks in advance.
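
    A rough way to narrow down the crash window from /var/log/messages is to look for the silence before each syslogd restart line. The sketch below is only an illustration: the function name crash_windows and its year parameter are made up for this example (classic syslog timestamps carry no year), and it assumes the "Mon DD HH:MM:SS" prefix seen in the log above. Bear in mind that syslogd also writes a restart line when logs are rotated, so only a long gap is interesting.

```python
from datetime import datetime

def crash_windows(lines, year=2010):
    """Yield (last_entry_time, restart_time) for every syslogd restart line."""
    previous = None
    for raw in lines:
        line = raw.lstrip()
        try:
            # Syslog prefix looks like "Oct 16 13:00:26" (15 characters, no year).
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if "syslogd" in line and "restart" in line and previous is not None:
            yield previous, stamp
        previous = stamp

# Possible usage on the server itself:
#   with open("/var/log/messages") as log:
#       for last_seen, restarted in crash_windows(log):
#           print(last_seen, "->", restarted, restarted - last_seen)
```

    Fed the lines around the reboot in the log above, this pairs the last smbd entry at 13:00:26 with the restart at 13:04:45, which shows that nothing was logged between the hang and the reboot.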
     
  2. cPanelTristan

    cPanelTristan Quality Assurance Analyst
    Staff Member

    Joined:
    Oct 2, 2010
    Messages:
    7,623
    Likes Received:
    21
    Trophy Points:
    38
    Location:
    somewhere over the rainbow
    cPanel Access Level:
    Root Administrator
    Is there anything in /var/log/dmesg that shows any system faults?

    Also, to clarify, do you run the main Xen instance itself or only have nodes on the Xen machine?

    I did want to mention something about cron jobs. I've recently seen some tickets where people believe crons were either causing high load or crashes. In all the years I've been administering machines, I've only seen one server crash from a cron job and it was a java-based cron job. I'm just trying to point out the unlikelihood that a cron job would cause a machine to have high load or crash.
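
    For anyone who wants to rule cron in or out rather than go by eye: entries such as */5 in the minute field also fire at 13:00 even though "13" never appears in them. Below is a simplified sketch that checks only the minute and hour fields of a crontab line. It ignores the day and month fields, @reboot-style shortcuts and month or day names, and the function names and sample commands are invented for illustration.

```python
def field_matches(field, value, low, high):
    """Return True if one cron field ("*/5", "0,30", "9-17", "13") includes value."""
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False

def runs_at(cron_line, hour, minute):
    """Return True if the line's minute and hour fields both match the given time."""
    fields = cron_line.split()
    if len(fields) < 6 or fields[0].startswith("#"):
        return False  # comments, blank lines, environment settings
    try:
        return (field_matches(fields[0], minute, 0, 59)
                and field_matches(fields[1], hour, 0, 23))
    except ValueError:
        return False  # anything this simplified parser does not understand

# Hypothetical crontab entries, not taken from the server in this thread.
for entry in ["*/5 * * * * /usr/local/bin/check", "30 2 * * * /usr/local/bin/backup"]:
    print(runs_at(entry, 13, 0), entry)
```

    Running each line of the root and user crontabs through runs_at(line, 13, 0) gives a list of everything scheduled for that minute, which is a quicker check than searching for "13".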
     
  3. Reado

    Reado Well-Known Member

    Hi Tristan,

    In /var/log/dmesg I can see this:

    Code:
    device-mapper: uevent: version 1.0.3
    device-mapper: ioctl: 4.11.5-ioctl (2007-12-12) initialised: dm-devel@redhat.com
    device-mapper: dm-raid45: initialized v0.2594l
    EXT3-fs: INFO: recovery required on readonly filesystem.
    EXT3-fs: write access will be enabled during recovery.
    kjournald starting.  Commit interval 5 seconds
    EXT3-fs: xvda1: orphan cleanup on readonly fs
    ext3_orphan_cleanup: deleting unreferenced inode 3932161
    ext3_orphan_cleanup: deleting unreferenced inode 2762928
    ext3_orphan_cleanup: deleting unreferenced inode 2763561
    The last entry repeats quite a few times then says this:

    Code:
    ext3_orphan_cleanup: deleting unreferenced inode 4162628
    ext3_orphan_cleanup: deleting unreferenced inode 2767840
    EXT3-fs: xvda1: 80 orphan inodes deleted
    EXT3-fs: recovery complete.
    EXT3-fs: mounted filesystem with ordered data mode.
    Is that good?

    There are a few IRQ errors, such as:

    Code:
    Failed to obtain physical IRQ 6
    floppy0: no floppy controllers found
    lp: driver loaded but no devices found
    But I can't see any system faults, panics, or anything obvious.
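
    Since dmesg output is long, a small filter can make this kind of review less error-prone. The following is a minimal sketch, not a complete fault detector: the pattern list is my own guess at useful keywords, and the classify name is made up for the example.

```python
import re

# Illustrative patterns only; extend them for your own kernel and drivers.
PATTERNS = {
    "filesystem recovery": re.compile(r"EXT3-fs.*(recovery|orphan)|ext3_orphan_cleanup"),
    "I/O error": re.compile(r"I/O error|end_request", re.IGNORECASE),
    "kernel fault": re.compile(r"panic|Oops|BUG:|soft lockup|Call Trace", re.IGNORECASE),
    "out of memory": re.compile(r"Out of memory|oom-killer", re.IGNORECASE),
}

def classify(lines):
    """Count log lines per category; each line is counted under its first match."""
    counts = {}
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] = counts.get(label, 0) + 1
                break
    return counts

# Possible usage:
#   with open("/var/log/dmesg") as log:
#       print(classify(log))
```

    On the excerpts above, only the "filesystem recovery" category would register; the IRQ, floppy and lp lines match none of these patterns.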

    Regarding Xen, I'm not sure what you mean by instance or nodes, but I'm basically running XenServer 5.5 Dell Edition - installed on an 8GB USB stick connected to the server's internal USB port. This is the embedded edition of XenServer. The VMs are running on 2 separate RAID-5 arrays; 3 hard drives per array. Two of the VMs are the cPanel servers.

    I installed CentOS 5.5 with the minimal configuration (no extra packages etc), then as soon as it was good to go installed the "latest" cPanel.

    Attached is what XenServer reported at the time of the crash. If you look closely you can see a spike in hard drive usage at around 13:00, at which point the CPU goes up and the network (the SSH session I had open) drops to zero.
     

  4. cPanelTristan

    cPanelTristan Quality Assurance Analyst
    Staff Member

    No, it isn't good. You have orphaned inodes, and the messages in /var/log/dmesg indicate the file system was read-only. You need to ensure you have backups stored off the machine, and you should probably move to another drive or machine.

    You don't need to look for issues caused by running processes. A read-only file system isn't related to processes; it's related to hardware issues.
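
    To see whether a file system is mounted read-only right now, one option is to read /proc/mounts, where each line lists the device, mount point, type and options. A minimal sketch follows; the helper name and the sample text are made up for illustration (only xvda1 appears in this thread).

```python
def read_only_mounts(mounts_text):
    """Return (device, mount_point) pairs whose mount options include 'ro'."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # not a well-formed mount entry
        device, mount_point, _fstype, options = fields[:4]
        if "ro" in options.split(","):
            found.append((device, mount_point))
    return found

# Hypothetical /proc/mounts content for demonstration.
sample = (
    "/dev/xvda1 / ext3 ro,data=ordered 0 0\n"
    "/dev/xvda2 /home ext3 rw,data=ordered 0 0\n"
    "proc /proc proc rw 0 0\n"
)
print(read_only_mounts(sample))

# On a live system:
#   print(read_only_mounts(open("/proc/mounts").read()))
```

    An ext3 root that shows up as ro after boot has finished is worth investigating; some pseudo file systems may legitimately be read-only.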
     
  5. Reado

    Reado Well-Known Member

    When you say read-only, is this due to something happening at the VM level or at the XenServer level?

    I have 3 other Windows servers running on this Xen host: our Primary Domain Controller, a SQL Server and an XP VM. They never have any issues, so could it just be a problem with the Linux VMs?

    I seem to remember at the time I installed CentOS there was no "CentOS 5.5" template available, only "CentOS 5.3", but I tried installing CentOS 5.5 with the template anyway and never encountered any issues.
     
    #5 Reado, Oct 20, 2010
    Last edited: Oct 20, 2010
  6. cPanelTristan

    cPanelTristan Quality Assurance Analyst
    Staff Member

    It's at the server level. It is a hardware issue.
     
  7. Reado

    Reado Well-Known Member

    But the Windows VMs don't crash and I never have a problem with them.

    Another thing: the Linux VMs use the latest CentOS kernel (2.6.18-194.17.1.el5xen) instead of the kernel that comes with XenTools (kernel-xen-2.6.18-128.1.10.el5.xs5.5.0.52.i686).

    Could this be the cause of the problem? Is it recommended to use the XenTools kernel over the CentOS version, or doesn't it matter?
     
  8. cPanelTristan

    cPanelTristan Quality Assurance Analyst
    Staff Member

    As far as I'm aware, it's recommended to use the Xen-provided kernel. You could certainly try switching to that kernel. I still highly encourage ensuring you have good off-server backups. No matter what, these orphan inodes and file system errors aren't related to any processes on the machine, so you are inherently looking for hardware or operating system issues.
     
  9. Reado

    Reado Well-Known Member

    Just noticed this in the dmesg log:

    Code:
    XENBUS: Device with no driver: device/vbd/51712
    XENBUS: Device with no driver: device/vbd/51760
    XENBUS: Device with no driver: device/vif/0

    Also, I've now switched to the Xen kernel.
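
    The numbers in the vbd paths of the XENBUS lines are virtual block device IDs, so they can be mapped back to disk names. The sketch below assumes the usual Xen encoding for xvd devices (major number 202 in the high byte, then four bits of disk index and four bits of partition number); the function name decode_vbd is made up for this example, and other device classes are deliberately not handled.

```python
def decode_vbd(number):
    """Translate a Xen virtual block device ID into an xvd device name."""
    major, minor = number >> 8, number & 0xFF
    if major != 202:
        # Only the xvd encoding (disks 0-15) is covered by this sketch.
        raise ValueError(f"not an xvd device number: {number}")
    disk, partition = minor >> 4, minor & 0x0F
    name = "xvd" + chr(ord("a") + disk)
    return name + (str(partition) if partition else "")

for device_id in (51712, 51760):
    print(device_id, "->", decode_vbd(device_id))
```

    Under that assumption, 51712 is xvda (the disk holding the xvda1 root file system seen earlier) and 51760 is xvdd.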
     