The Community Forums

Interact with an entire community of cPanel & WHM users!

server crash every 2nd day @ same time :)

Discussion in 'General Discussion' started by Rubas, Oct 18, 2004.

  1. Rubas

    Rubas Well-Known Member

    Joined:
    Sep 15, 2003
    Messages:
    125
    Likes Received:
    0
    Trophy Points:
    16
    Situation:
    It is a brand-new sister server, with only one small difference from the other cPanel server: it has the latest firmware for the RAID controller.

    WHM 9.9.0 cPanel 9.9.2-S8
    Redhat EL ES 3
    2.4.21-20.ELsmp

    Dual Xeon 2.8 GHz
    4 GB RAM
    RAID 5 with 4 SCSI drives (Adaptec 2910, aacraid driver)

    The server runs rock-stable, BUT every second day at 2:00 AM it goes down!

    The logs show nothing interesting; only once did I find a SCSI error as the last entry.
    I suspect a problem similar to the "smart check" issue with SCSI RAID systems, but with the smart check enabled the system crashed every time I ran upcp!

    Okay, it has to be an issue with a cron job, but every cron job runs at least daily!

    /var/log
    /var/message

    It has to be an issue with this job:

    Oct 18 02:00:00 cpanel03 CROND[5310]: (root) CMD (/scripts/upcp)

    But upcp runs daily!
    I can run upcp 10 times without a problem, and the only difference if I start it is
    On the crash days the server stops working before it can make the backup.
    But I also have no problem running /scripts/cpbackup 10 times.
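    To line the "every second day" crashes up against cron, the upcp runs can be pulled straight out of the cron log and compared with the crash times (for example from `last reboot`). A minimal sketch, assuming cron logs to /var/log/cron (the usual RHEL 3 location) in the CROND format quoted above; `upcp_runs` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the timestamp of every cron-started upcp run.
# Assumes syslog-style CROND lines such as
#   Oct 18 02:00:00 cpanel03 CROND[5310]: (root) CMD (/scripts/upcp)
# The pattern has no closing parenthesis, so "/scripts/upcp manual" also matches.
# Defaults to /var/log/cron (an assumption); pass another log file as $1.
upcp_runs() {
    grep -F 'CMD (/scripts/upcp' "${1:-/var/log/cron}" |
        awk '{ print $1, $2, $3 }'   # month, day, time
}
```

    Running `upcp_runs /var/log/cron.1` on a rotated log works the same way.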


    Like I said, it only crashes every second day, and everything runs at least daily!
    I couldn't explain it. First I thought of the "smartcheck issue" and disabled it with "touch /var/cpanel/disablesmartcheck"; next I thought it had to do with the cpanellogd issue (cpanellogd spawning thousands of processes (fix) - http://forums.cpanel.net/showthread.php?t=30705).

    Honestly, I have no idea :)

    Next I will run /scripts/upcp separately to test my upcp theory.

    What does upcp do only every second day and not every day?
     
    #1 Rubas, Oct 18, 2004
    Last edited: Oct 18, 2004
  2. Rubas

    Rubas Well-Known Member

    Joined:
    Sep 15, 2003
    Messages:
    125
    Likes Received:
    0
    Trophy Points:
    16
    I separated /scripts/cpbackup from /scripts/upcp and changed the time at which upcp runs.

    The server still crashed on the second day when upcp was called from cron, yet today I ran a lot of updates with upcp from the shell without any problem!

    Strange!
     
  3. jsteel

    jsteel Well-Known Member

    Joined:
    Jul 4, 2002
    Messages:
    646
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Atlanta, GA

    Any resolution on this? We have one new box with the same problem (though we use SATA RAID and have SMART disabled). We have numerous other boxes with exactly the same hardware and no problems.
     
  4. Rubas

    Rubas Well-Known Member

    Joined:
    Sep 15, 2003
    Messages:
    125
    Likes Received:
    0
    Trophy Points:
    16
    No. I have talked a lot with cPanel support over the last few days, but without success.

    I know roughly where in upcp the server crashes, but I cannot reproduce the error.
    Still, it crashes every second day, even though upcp runs daily, and only when upcp is called from cron.

    Changing the cron job to "upcp manual" does not help either...


    The current workaround is to run only cpbackup from cron and call upcp from the shell every day (and delete the new upcp cron job after each run).
    With this approach the server is running rock-stable.

    If you have a new idea, please let me know :)
     
  5. jsteel

    jsteel Well-Known Member

    Joined:
    Jul 4, 2002
    Messages:
    646
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Atlanta, GA
    We are seeing the same behavior. Manual upcp's seem fine, but random cron-based upcp's cause the failure. We end up having to power cycle the server each time.

    The only difference we notice between this server and the others is that it got the latest Perl RPM as part of the base install when the server was built (the cPanel Perl 5.8.1 installer was run on it afterwards, and 'perl -v' shows the system is using cPanel's Perl):

    For example:

    Good Servers have:

    perl-5.8.0-88.4

    Bad Server has:

    perl-5.8.0-88.7

    But on both, a 'perl -v' yields:

    This is perl, v5.8.1 built for i686-linux

    We remember something similar happening months ago, when RHEL's up2date first overwrote Perl, which led to the tweak setting. Our guess is that something related to the installation of the latest RPM may still be causing the issue.

    If you've got an existing ticket open, you may want to pass this info on to support. If you could verify your Perl RPM set as a reply here, that would be great ('rpm -qa perl*').
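    The three things compared in this thread (the Red Hat perl RPMs, the version `perl -v` reports, and the size and date of /usr/bin/perl) can be gathered in one go on each box. A sketch only; `perl_report` and `perl_version_from` are hypothetical names, and the version parsing assumes the 5.8-era banner "This is perl, v5.8.1 built for ...", which newer Perls no longer print.

```shell
#!/bin/sh
# Print just the version number from "perl -v" output read on stdin,
# e.g. "5.8.1" from "This is perl, v5.8.1 built for i686-linux".
perl_version_from() {
    sed -n 's/^This is perl, v\([0-9][0-9.]*\) .*/\1/p'
}

# Collect the details to paste into a reply or a support ticket.
perl_report() {
    echo "== perl RPMs =="
    rpm -qa 'perl*'          # quoted so the shell cannot expand perl* itself
    echo "== perl -v =="
    perl -v | perl_version_from
    echo "== /usr/bin/perl =="
    ls -l /usr/bin/perl
}
```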
     
  6. Rubas

    Rubas Well-Known Member

    Joined:
    Sep 15, 2003
    Messages:
    125
    Likes Received:
    0
    Trophy Points:
    16
    Yes, I also noticed the difference in the Perl versions but couldn't figure out why (no multi-threaded Perl).


    Good
    perl-5.8.0-88.4
    # perl -v
    This is perl, v5.8.1 built for i686-linux

    Bad
    perl-5.8.0-88.7 (no perl update - just a fresh installation)
    # perl -v
    This is perl, v5.8.4 built for i686-linux


    I set up a monitoring script that sends me a process list every second.

    The last info before the crash:
    This would be the next step if there is no crash with upcp
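    A watcher along these lines can be sketched as a loop that appends a timestamped process list every second and flushes it to disk, so the last snapshot before a hard lock-up has a chance of surviving the power cycle. This version writes to a local file rather than sending the list anywhere; the log path and the `snapshot`/`watch_loop` names are assumptions. The file grows quickly (one full `ps` listing per second), so it needs rotating or truncating.

```shell
#!/bin/sh
# Hypothetical log location; set LOG before sourcing to override it.
LOG=${LOG:-/var/log/pswatch.log}

# Append one timestamped process list to the file named in $1.
snapshot() {
    {
        echo "==== $(date '+%Y-%m-%d %H:%M:%S')"
        ps auxww
    } >> "$1" 2>&1
    sync    # flush buffers so the latest snapshot reaches the disk
}

# Take a snapshot every second, forever.
watch_loop() {
    while :; do
        snapshot "$LOG"
        sleep 1
    done
}
```

    Saved as, say, pswatch.sh (a hypothetical name), it could be started detached with `nohup sh -c '. ./pswatch.sh; watch_loop' >/dev/null 2>&1 &`.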


    Support's only idea was an issue with multi-threaded Perl, and the ticket is now closed.
     
    #6 Rubas, Nov 2, 2004
    Last edited: Nov 2, 2004
  7. jsteel

    jsteel Well-Known Member

    Joined:
    Jul 4, 2002
    Messages:
    646
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Atlanta, GA
    What was your ticket number? We'll open a new one with both of our information.

    One additional thing we found: even though we ran the perl581installer on all RHEL servers, the good servers show the following:

    -rwxr-xr-x 2 root root 1002181 Apr 18 2004 /usr/bin/perl*


    And the bad server shows:

    -rwxr-xr-x 2 root root 994853 Oct 28 16:51 /usr/bin/perl*
     
    #7 jsteel, Nov 2, 2004
    Last edited: Nov 2, 2004
  8. Rubas

    Rubas Well-Known Member

    Joined:
    Sep 15, 2003
    Messages:
    125
    Likes Received:
    0
    Trophy Points:
    16
    ID# 77087

    Please keep me up to date!
     
  9. Jasonbd

    Jasonbd Member

    Joined:
    Jan 4, 2004
    Messages:
    22
    Likes Received:
    0
    Trophy Points:
    1
    Location:
    Texas
    I have the same problem...

    I wish I could give y'all more information, but y'all have already listed everything I have experienced...

    -jb
     
  10. jsteel

    jsteel Well-Known Member

    Joined:
    Jul 4, 2002
    Messages:
    646
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Atlanta, GA
    What's your OS, cPanel version and list of perl RPMs?
     
  11. Jasonbd

    Jasonbd Member

    Joined:
    Jan 4, 2004
    Messages:
    22
    Likes Received:
    0
    Trophy Points:
    1
    Location:
    Texas
    OS - RedHat ES 3
    Perl - v5.8.4 perl-5.8.0-88.7
    Cpanel - 9.9.8-RELEASE_5
     
    #11 Jasonbd, Nov 3, 2004
    Last edited: Nov 3, 2004
  12. jsteel

    jsteel Well-Known Member

    Joined:
    Jul 4, 2002
    Messages:
    646
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Atlanta, GA
    Jason & Rubas:

    Do you know if your problematic servers were built out using RHEL ES 3 Update 3 specifically?
     
  13. Rubas

    Rubas Well-Known Member

    Joined:
    Sep 15, 2003
    Messages:
    125
    Likes Received:
    0
    Trophy Points:
    16
    Yes, the server was built from the RHEL 3 Update 3 ISO.
    I know this for certain, because we had some trouble: with the "Update 3 CDs" you need at least the first two CDs for a minimal installation, not only the first.
     
    #13 Rubas, Nov 4, 2004
    Last edited: Nov 4, 2004
  14. Jasonbd

    Jasonbd Member

    Joined:
    Jan 4, 2004
    Messages:
    22
    Likes Received:
    0
    Trophy Points:
    1
    Location:
    Texas
    Mine too; the server was built from the RHEL 3 Update 3 ISO.
     
  15. bbailey

    bbailey Registered

    Joined:
    Aug 10, 2004
    Messages:
    2
    Likes Received:
    0
    Trophy Points:
    1
    Location:
    Atlanta, GA
    In our case, this problem has turned out to be related to the laus package. After disabling this service, the server that had been crashing every other day has stayed up, and enabling it on other previously stable test machines resulted in them starting to crash every other day.

    This package started getting included in a "minimal" installation as of RHEL 3 update 3, and the audit daemon is enabled by default. If you don't need it (if you didn't know it was running or what it was, then you don't need it) you can safely disable it and reclaim a bunch of disk space from /var/log/audit.d.

    root@localhost# chkconfig audit off
    root@localhost# service audit stop
    root@localhost# ps -ef | grep '[a]uditd' (make sure it's stopped; the bracket keeps grep from matching its own process)
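    Both halves of this step (off at boot, stopped now) can be checked in one go. A minimal sketch with hypothetical helper names, assuming the init script is called "audit" and the daemon's process name is "auditd", as in the commands above; if chkconfig does not know the "audit" service at all, the boot-time check simply finds nothing.

```shell
#!/bin/sh
# Succeeds if a "chkconfig --list" line read on stdin shows any runlevel "on",
# e.g. "audit  0:off 1:off 2:on 3:on 4:on 5:on 6:off".
enabled_in_any_runlevel() {
    grep -q '[0-6]:on'
}

check_audit_disabled() {
    # Boot-time state.
    if chkconfig --list audit 2>/dev/null | enabled_in_any_runlevel; then
        echo "audit service is still enabled in some runlevel"
        return 1
    fi
    # Match the exact command name, so this check cannot match itself.
    if ps -e -o comm= | grep -qx auditd; then
        echo "auditd is still running"
        return 1
    fi
    echo "audit is off at boot and auditd is not running"
}
```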

    It also involves a kernel module (named "audit"), which you may also want to disable. Doing so will prevent the userspace tools that support auditing from generating errors when they can no longer find /dev/audit.

    root@localhost# service crond stop
    root@localhost# service atd stop
    root@localhost# rmmod audit
    root@localhost# lsmod | grep audit (make sure it's gone)
    root@localhost# echo "alias char-major-10-224 off" >> /etc/modules.conf
    root@localhost# service crond start
    root@localhost# service atd start
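    Repeating the `echo ... >> /etc/modules.conf` step adds a duplicate alias line each time. A minimal sketch of an idempotent version, which appends the line only if the exact line is not already present; `add_line_once` is a hypothetical helper, and /etc/modules.conf is the file named above.

```shell
#!/bin/sh
# Hypothetical helper: append line $2 to file $1 unless an identical
# line is already there (-x: whole line, -F: fixed string, -q: quiet).
# If the file does not exist yet, grep fails quietly and the line is written.
add_line_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage for the step above (run as root):
#   add_line_once /etc/modules.conf "alias char-major-10-224 off"
```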

    You could even go so far as to remove the laus package altogether.

    root@localhost# rpm -e laus

    Reports on the Taroon mailing list and Red Hat's Bugzilla indicate that other activities, such as restarting Lotus Domino server, have approximately the same effect as our nightly run of upcp. I don't think this is a problem with cPanel, other than that upcp causes occasional system activity which seems to trigger whatever problem auditd has. Updated kernel and laus packages that resolve this issue should be out with Update 4.

    Ref:
    [bugzilla] System running LAuS hanging regularly
    [bugzilla] audit service on by default?
    [bugzilla] cron and laus problem
    [bugzilla] Kernel panic when stopping Lotus Domino 6.52
    [taroon-list] Re: Kernel panics
     
  16. Rubas

    Rubas Well-Known Member

    Joined:
    Sep 15, 2003
    Messages:
    125
    Likes Received:
    0
    Trophy Points:
    16
    Thank you for your first post bbailey!
     
  17. jsteel

    jsteel Well-Known Member

    Joined:
    Jul 4, 2002
    Messages:
    646
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Atlanta, GA
    As an FYI, we've already posted the resolution to cPanel as part of the open ticket we created and copied Nick directly on it (so no need to report it to them again). Hopefully Nick will figure out what in upcp is causing the problem with system auditing.
     
  18. AlaskanWolf

    AlaskanWolf Well-Known Member

    Joined:
    Aug 11, 2001
    Messages:
    537
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Fremont CA
    Did you try disabling the cPanel update scripts? We had 2 servers ages ago with RAID on them, and it turned out that cPanel's nightly updates were causing the servers to crash every night.
     
  19. Rubas

    Rubas Well-Known Member

    Joined:
    Sep 15, 2003
    Messages:
    125
    Likes Received:
    0
    Trophy Points:
    16
  20. Jeewhizz

    Jeewhizz Well-Known Member

    Joined:
    Mar 12, 2003
    Messages:
    51
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    London, England
    I'm having the same issues on a server at the moment. I have disabled and removed laus, and will see if this makes a difference.

    Jee
     