  1. #1
    Member
    Join Date: Sep 2003
    Posts: 146

    Server crash every 2nd day @ same time :)

    Situation:
    It is a brand-new sister server, with only one small difference from the other cPanel server: it has the latest firmware for the RAID controller.

    WHM 9.9.0 cPanel 9.9.2-S8
    Redhat EL ES 3
    2.4.21-20.ELsmp

    Dual Xeon 2.8 GHz
    4 GB RAM
    RAID 5 with 4 SCSI drives (Adaptec 2910 -> aacraid)

    The server runs rock-stable, BUT every 2nd day at 2:00 AM it crashes!

    The logs show nothing interesting; only once did I find an entry about a SCSI error at the end.
    I suspected a problem like the "smart check" issue with SCSI RAID systems, but with the smart check enabled the system crashed every time I ran upcp!

    Okay, it has to be an issue with a cron job, but every cron job runs at least daily!

    /var/log
    Oct 18 02:00:00 cpanel03 CROND[5295]: (root) CMD (/usr/local/sim/sim -q >> /dev/null 2>&1)
    sim runs every 5mins
    Oct 18 02:00:00 cpanel03 CROND[5299]: (root) CMD (/usr/local/sbin/spri -q >> /dev/null 2>&1)
    spri runs every 45mins
    Oct 18 02:00:00 cpanel03 CROND[5301]: (root) CMD (/usr/local/cpanel/bin/dcpumon >/dev/null 2>&1)
    dcpumon runs every 5mins

    Oct 18 02:00:00 cpanel03 CROND[5310]: (root) CMD (/scripts/upcp)

    Oct 18 02:00:00 cpanel03 CROND[5297]: (root) CMD (/usr/local/sbin/lsm -c >> /dev/null 2>&1)
    lsm runs every 10mins
    Oct 18 02:00:00 cpanel03 CROND[5304]: (root) CMD (/bin/rm /tmp/cpanel.TMP* >>/dev/null 2>&1)
    the rm of /tmp/cpanel.TMP* runs every 60mins
    Oct 18 02:00:00 cpanel03 CROND[5308]: (root) CMD (/usr/local/cpanel/whostmgr/bin/dnsqueue > /dev/null 2>&1)
    dnsqueue runs every 15mins
    Oct 18 02:00:01 cpanel03 crontab[5842]: (root) LIST (root)
    Oct 18 02:00:01 cpanel03 crontab[5844]: (root) LIST (root)
    Oct 18 02:00:01 cpanel03 crontab[5845]: (root) LIST (root)
    Oct 18 02:00:01 cpanel03 crontab[5846]: (root) REPLACE (root)
    /var/message
    Oct 18 02:00:06 cpanel03 proftpd[5272]: cpanel03.xxxx (127.0.0.1[127.0.0.1]) - FTP login timed out, disconnected
    Oct 18 02:00:06 cpanel03 proftpd[5272]: cpanel03.xxxx (127.0.0.1[127.0.0.1]) - FTP session closed.
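
    For anyone repeating this kind of check, here is a minimal sketch of a filter that lists only the commands cron started at a given second. It assumes syslog-style cron lines like the ones quoted above; cron_cmds_at is a made-up helper name, and the log path in the usage comment is an assumption.

```shell
# Print the command of every cron job started at a given HH:MM:SS.
# Reads syslog-style cron lines on stdin.
# cron_cmds_at is a hypothetical helper name, not a cPanel tool.
cron_cmds_at() {
    # $1: timestamp to match, e.g. 02:00:00
    awk -v ts="$1" '
        $3 == ts && /CROND\[[0-9]+\]: \([^)]+\) CMD \(/ {
            line = $0
            sub(/^.*CMD \(/, "", line)   # drop everything up to "CMD ("
            sub(/\)$/, "", line)         # drop the closing parenthesis
            print line
        }'
}

# Usage (log path is an assumption; adjust to where cron logs on your system):
#   cron_cmds_at 02:00:00 < /var/log/cron
```

    Comparing the output for a crash night and a good night shows at a glance whether a different set of jobs was started.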

    It has to be an issue with this entry: Oct 18 02:00:00 cpanel03 CROND[5310]: (root) CMD (/scripts/upcp)

    But upcp runs daily!
    I can run upcp 10 times by hand without a problem, and the only difference when I start it myself is this block, which calls cpbackup only when upcp is not started by a human:
    if (!$ishuman) {
        system("/scripts/cpbackup");
    }
    On crash days the server stops working before it can make the backup.
    But I also have no problem running /scripts/cpbackup 10 times.


    Like I said, it only crashes every second day, and everything runs at least once a day!
    I can't explain it. First I thought of the "smartcheck issue" and disabled it with "touch /var/cpanel/disablesmartcheck"; next I thought it had to do with the cpanellog issue (cpanellogd spawning thousands of processes (fix) - http://forums.cpanel.net/showthread.php?t=30705).

    Currently I have no idea.

    The next thing I will do is separate /scripts/cpbackup from /scripts/upcp to test my upcp theory.

    What does upcp do only every 2nd day and not every day?
    Last edited by Rubas; 10-18-2004 at 02:35 AM.

  2. #2
    Member
    Join Date: Sep 2003
    Posts: 146

    I separated /scripts/cpbackup from /scripts/upcp and changed the run time for upcp.

    The server still crashed on the 2nd day when upcp was called from cron, yet I ran a lot of updates with upcp from the shell today without any problem!

    Strange!

  3. #3
    Member
    Join Date: Jul 2002
    Location: Atlanta, GA
    Posts: 646

    Quote: Originally Posted by Rubas
        I separated /scripts/cpbackup from /scripts/upcp and changed the run time for upcp.

        The server still crashed on the 2nd day when upcp was called from cron, yet I ran a lot of updates with upcp from the shell today without any problem!

        Strange!

    Any resolution on this? We have one new box experiencing the same problem (though we use SATA RAID and do have smart disabled). We have numerous other boxes with the exact same hardware having no problems.

  4. #4
    Member
    Join Date: Sep 2003
    Posts: 146

    No - I have talked a lot with cPanel support over the last few days, but without success.

    I know roughly where in upcp the server crashes, but I cannot reproduce the error.
    It crashes every second day, even though upcp runs daily, and only when upcp is called from cron.

    Changing the cron job to "upcp manual" does not help either ...

    The current workaround is to run only cpbackup from cron and to call upcp from the shell every day (and delete the new upcp cron job after each run).
    With this tactic the server runs rock-stable.

    If you have a new idea, please let me know.
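
    One way to narrow down the "cron vs. shell" difference is to start the command from the shell with roughly the environment cron would give it. The sketch below is only that: run_like_cron is a hypothetical helper, and the variable values follow the usual crontab(5) defaults, which are an assumption about this server. It reproduces the environment only, not anything else crond itself does.

```shell
# Run a command line with roughly the environment Vixie cron provides:
# a near-empty environment, /bin/sh as the shell, and no terminal on stdin.
# run_like_cron is a hypothetical helper; the values are assumed defaults.
run_like_cron() {
    env -i HOME=/root LOGNAME=root SHELL=/bin/sh PATH=/usr/bin:/bin \
        /bin/sh -c "$1" </dev/null
}

# Usage:
#   run_like_cron '/scripts/upcp'
```

    If a run started this way behaves like a manual run, the environment is probably not the trigger and the difference lies in crond itself.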

  5. #5
    Member
    Join Date: Jul 2002
    Location: Atlanta, GA
    Posts: 646

    Quote: Originally Posted by Rubas
        No - I have talked a lot with cPanel support over the last few days, but without success.

        I know roughly where in upcp the server crashes, but I cannot reproduce the error.
        It crashes every second day, even though upcp runs daily, and only when upcp is called from cron.

        Changing the cron job to "upcp manual" does not help either ...

        The current workaround is to run only cpbackup from cron and to call upcp from the shell every day (and delete the new upcp cron job after each run).
        With this tactic the server runs rock-stable.

        If you have a new idea, please let me know.

    We are seeing the same behavior. Manual upcp's seem fine, but random cron-based upcp's cause the failure. We end up having to power cycle the server each time.

    The only thing we notice different with this one particular server compared to the others is that it has the latest Perl RPM installed as part of the base install when the server was built (and subsequently had the cPanel Perl 5.8.1 installer run on it - a 'perl -v' does show the system is using cPanel's Perl):

    For example:

    Good Servers have:

    perl-5.8.0-88.4

    Bad Server has:

    perl-5.8.0-88.7

    But on both, a 'perl -v' yields:

    This is perl, v5.8.1 built for i686-linux

    We remember something similar occurring back when the first Perl overwrite from RHEL up2date occurred months ago leading to the tweak setting. Our guess is that something related to the installation of the latest RPM may still be causing the issue.

    If you've got an existing ticket open, you may want to pass this info on to support. If you could verify your Perl RPM set as a reply here, that would be great ('rpm -qa perl*').
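
    A rough sketch for collecting those facts on each server in one go: the installed Perl RPMs, the version the perl binary actually reports, and the size and date of /usr/bin/perl. perl_version_from and perl_report are hypothetical helper names; the underlying commands are the ones already used in this thread.

```shell
# Extract the version number from `perl -v` output read on stdin,
# e.g. "This is perl, v5.8.1 built for i686-linux" gives "5.8.1".
perl_version_from() {
    sed -n 's/^This is perl.*[ (]v\([0-9][0-9.]*\).*/\1/p'
}

# Print the Perl facts being compared for the current server.
# Needs rpm and perl on the PATH, so run it on the server itself.
perl_report() {
    echo "rpm:     $(rpm -qa 'perl*' | sort | tr '\n' ' ')"
    echo "runtime: $(perl -v | perl_version_from)"
    echo "binary:  $(ls -l /usr/bin/perl)"
}

# Usage:
#   perl_report
```

    Running perl_report on a good and a bad server and diffing the two outputs makes the mismatch between the RPM version and the runtime version easy to spot.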

  6. #6
    Member
    Join Date: Sep 2003
    Posts: 146

    Yes, I also noticed the difference between the Perl versions but couldn't figure out why (no multi-threaded Perl).


    Good
    perl-5.8.0-88.4
    # perl -v
    This is perl, v5.8.1 built for i686-linux

    Bad
    perl-5.8.0-88.7 (no perl update - just a fresh installation)
    # perl -v
    This is perl, v5.8.4 built for i686-linux


    I set up a monitoring script that sends me a process list every second.

    The last info before the crash:
    CROND
    19581 19582 19582 19582 ? -1 S 0 0:00 \_ /bin/sh -c (/scripts/upcp manual)
    19582 19583 19582 19582 ? -1 S 0 0:00 | \_ /usr/bin/perl /scripts/upcp manual
    19583 19892 19582 19582 ? -1 S 0 0:00 | \_ /usr/bin/perl /scripts/rpmup2
    19892 20031 19582 19582 ? -1 S 0 0:00 | \_ /usr/bin/perl /scripts/installgd
    20031 20072 19582 19582 ? -1 S 0 0:00 | \_ /usr/sbin/userhelper -t -w up2date --nox -i perl-CPAN libpng-devel libjpeg-devel XFree86-devel iconv jpeg xpm png
    20072 20075 19582 19582 ? -1 R 0 0:01 | \_ /usr/bin/python -u /usr/sbin/up2date --nox -i perl-CPAN libpng-devel libjpeg-devel XFree86-devel iconv jpeg xpm png
    19581 19657 4088 4088 ? -1 S 47 0:00 \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem root
    This would be the next step if upcp does not crash:
    4088 11482 4088 4088 ? -1 S 0 0:00 \_ CROND
    11482 11483 11483 11483 ? -1 S 0 0:00 \_ /bin/sh -c (/scripts/upcp manual)
    11483 11484 11483 11483 ? -1 S 0 0:00 | \_ /usr/bin/perl /scripts/upcp manual
    11484 11791 11483 11483 ? -1 S 0 0:00 | \_ /usr/bin/perl /scripts/rpmup2
    11791 11825 11483 11483 ? -1 S 0 0:00 | \_ /usr/bin/perl /usr/local/cpanel/bin/checkperlmodules
    11825 11837 11483 11483 ? -1 S 0 0:00 | \_ /usr/bin/perl /scripts/realperlinstaller Net::AIM Net::SSLeay Archive::Tar GD::Graph Tree::MultiNode Tie::IxHash HTML::Entities IO::Tty Bundle::DBD::mysql CGI MD5 Digest::MD5 Expect Mail::SpamAssassin Net::DNS Bundle::Interchange Geo::IPfr
    11837 11845 11483 11483 ? -1 S 0 0:00 | \_ /usr/sbin/userhelper -t -w up2date --nox -i ncftp
    11845 11848 11483 11483 ? -1 D 0 0:00 | \_ /usr/bin/python -u /usr/sbin/up2date --nox -i ncftp
    11482 11558 4088 4088 ? -1 S 47 0:00 \_ /usr/sbin/sendmail -FCronDaemon -i -odi


    Support's only idea was an issue with multi-threaded Perl, and the ticket is currently closed.
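
    For reference, a monitor of this kind can be sketched as below. This is not the script actually used here; snapshot and watch_procs are hypothetical names, and the remote host in the usage comment is a placeholder. The `ps axjf` call is chosen because its columns (PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND) appear to match the listings above.

```shell
# Print a timestamp header, then run the given command.
snapshot() {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S')"
    "$@"
}

# Emit a process-tree snapshot every second until interrupted.
watch_procs() {
    while :; do
        snapshot ps axjf
        sleep 1
    done
}

# Usage (host name is a placeholder): stream the snapshots off the machine
# so the last one survives a hard crash.
#   watch_procs | ssh loghost 'cat >> cpanel03-ps.log'
```

    Sending the output to another machine matters here, because a local log may not be flushed to disk before the server locks up.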
    Last edited by Rubas; 11-02-2004 at 02:56 AM.

  7. #7
    Member
    Join Date: Jul 2002
    Location: Atlanta, GA
    Posts: 646

    What was your ticket number? We'll open a new one with both of our information.

    One additional thing we found is even though we installed the perl581installer on all RHEL servers, the good servers show the following:

    -rwxr-xr-x 2 root root 1002181 Apr 18 2004 /usr/bin/perl*


    And the bad server shows:

    -rwxr-xr-x 2 root root 994853 Oct 28 16:51 /usr/bin/perl*
    Last edited by jsteel; 11-02-2004 at 08:23 AM.

  8. #8
    Member
    Join Date: Sep 2003
    Posts: 146

    Quote: Originally Posted by jsteel
        What was your ticket number? We'll open a new one with both of our information.

    ID# 77087

    Please keep me up to date!

  9. #9
    Member
    Join Date: Jan 2004
    Location: Texas
    Posts: 24

    I have the same problem...

    I wish I could give y'all more information, but y'all already listed everything I have experienced...

    -jb

  10. #10
    Member
    Join Date: Jul 2002
    Location: Atlanta, GA
    Posts: 646

    Quote: Originally Posted by Jasonbd
        I have the same problem...

        I wish I could give y'all more information, but y'all already listed everything I have experienced...

        -jb

    What's your OS, cPanel version and list of Perl RPMs?

  11. #11
    Member
    Join Date: Jan 2004
    Location: Texas
    Posts: 24

    OS - RedHat ES 3
    Perl - v5.8.4 perl-5.8.0-88.7
    Cpanel - 9.9.8-RELEASE_5
    Last edited by Jasonbd; 11-03-2004 at 10:08 AM.

  12. #12
    Member
    Join Date: Jul 2002
    Location: Atlanta, GA
    Posts: 646

    Jason & Rubas:

    Do you know if your problematic servers were built out using RHEL ES 3 Update 3 specifically?

  13. #13
    Member
    Join Date: Sep 2003
    Posts: 146

    Quote: Originally Posted by jsteel
        Jason & Rubas:

        Do you know if your problematic servers were built out using RHEL ES 3 Update 3 specifically?

    Yes, the server was built from the RHEL ES 3 Update 3 ISO.
    I know this for certain: we had some trouble because with the "update 3 CDs" you need at least the first 2 CDs for a minimal installation, not just the first.
    Last edited by Rubas; 11-04-2004 at 09:13 AM.

  14. #14
    Member
    Join Date: Jan 2004
    Location: Texas
    Posts: 24

    Mine too: the server was built from the RHEL ES 3 Update 3 ISO.

  15. #15
    Registered User
    Join Date: Aug 2004
    Location: Atlanta, GA
    Posts: 2

    In our case, this problem has turned out to be related to the laus package. After disabling this service, the server that had been crashing every other day has stayed up, and enabling it on other previously stable test machines resulted in them starting to crash every other day.

    This package started getting included in a "minimal" installation as of RHEL 3 update 3, and the audit daemon is enabled by default. If you don't need it (if you didn't know it was running or what it was, then you don't need it) you can safely disable it and reclaim a bunch of disk space from /var/log/audit.d.

    root@localhost# chkconfig audit off
    root@localhost# service audit stop
    root@localhost# ps -ef | grep auditd    # make sure it's stopped

    It also involves a kernel module (named "audit"), which you may also want to disable. Doing so will prevent the userspace tools that support auditing from generating errors when they can no longer find /dev/audit.

    root@localhost# service crond stop
    root@localhost# service atd stop
    root@localhost# rmmod audit
    root@localhost# lsmod | grep audit    # make sure it's gone
    root@localhost# echo "alias char-major-10-224 off" >> /etc/modules.conf
    root@localhost# service crond start
    root@localhost# service atd start

    You could even go so far as to remove the laus package altogether.

    root@localhost# rpm -e laus
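
    To see whether a given box is in the affected state before changing anything, a small check along these lines may help. It is a sketch only: audit_module_loaded and audit_service_enabled are hypothetical helper names, and they assume the usual column layout of lsmod and `chkconfig --list` output.

```shell
# Succeeds if lsmod-style output on stdin lists a module named exactly "audit".
audit_module_loaded() {
    awk '$1 == "audit" { found = 1 } END { exit !found }'
}

# Succeeds if `chkconfig --list audit` output on stdin has any runlevel "on".
audit_service_enabled() {
    grep -q '[0-6]:on'
}

# Usage on the server:
#   lsmod | audit_module_loaded && echo "audit module is loaded"
#   chkconfig --list audit | audit_service_enabled && echo "audit starts at boot"
```

    Both helpers read from stdin so they can also be run against saved output from a machine that is currently down.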

    Reports on the Taroon mailing list and Red Hat's Bugzilla indicate that other activities such as restarting Lotus Domino server have approximately the same effect as our nightly running of upcp. I don't think this is a problem with cPanel, other than it causes occasional system activity which seems to trigger whatever problem auditd has. Updated kernel and laus packages that resolve this issue should be out with update 4.

    Ref:
    [bugzilla] System running LAuS hanging regularly
    [bugzilla] audit service on by default?
    [bugzilla] cron and laus problem
    [bugzilla] Kernel panic when stopping Lotus Domino 6.52
    [taroon-list] Re: Kernel panics
