#1
10-18-2004, 02:30 AM
Rubas (Registered User)
Join Date: Sep 2003
Posts: 142
Server crash every 2nd day at the same time :)

Situation:
It is a brand-new sister server, with only one small difference from the other cPanel server: it has the latest firmware for the RAID controller.

WHM 9.9.0 cPanel 9.9.2-S8
Red Hat EL ES 3
2.4.21-20.ELsmp

Dual Xeon 2.8 GHz
4 GB RAM
RAID 5 with 4 SCSI drives (Adaptec 2910 -> aacraid)

The server runs rock-stable, BUT every second day at 2:00 AM it goes down!

The logs show nothing interesting; only once did I find a SCSI error as the last entry.
I think it is a problem similar to the "smart check" issue with SCSI RAID systems, but with the smart check enabled, the system crashed every single time I ran upcp!

Okay, it has to be an issue with a cron job, but every cron job runs at least daily!

/var/log
Quote:
Oct 18 02:00:00 cpanel03 CROND[5295]: (root) CMD (/usr/local/sim/sim -q >> /dev/null 2>&1)
sim runs every 5mins
Oct 18 02:00:00 cpanel03 CROND[5299]: (root) CMD (/usr/local/sbin/spri -q >> /dev/null 2>&1)
spri runs every 45mins
Oct 18 02:00:00 cpanel03 CROND[5301]: (root) CMD (/usr/local/cpanel/bin/dcpumon >/dev/null 2>&1)
dcpumon runs every 5mins

Oct 18 02:00:00 cpanel03 CROND[5310]: (root) CMD (/scripts/upcp)

Oct 18 02:00:00 cpanel03 CROND[5297]: (root) CMD (/usr/local/sbin/lsm -c >> /dev/null 2>&1)
lsm runs every 10mins
Oct 18 02:00:00 cpanel03 CROND[5304]: (root) CMD (/bin/rm /tmp/cpanel.TMP* >>/dev/null 2>&1)
the /tmp/cpanel.TMP* cleanup runs every 60mins
Oct 18 02:00:00 cpanel03 CROND[5308]: (root) CMD (/usr/local/cpanel/whostmgr/bin/dnsqueue > /dev/null 2>&1)
dnsqueue runs every 15mins
Oct 18 02:00:01 cpanel03 crontab[5842]: (root) LIST (root)
Oct 18 02:00:01 cpanel03 crontab[5844]: (root) LIST (root)
Oct 18 02:00:01 cpanel03 crontab[5845]: (root) LIST (root)
Oct 18 02:00:01 cpanel03 crontab[5846]: (root) REPLACE (root)
/var/message
Quote:
Oct 18 02:00:06 cpanel03 proftpd[5272]: cpanel03.xxxx (127.0.0.1[127.0.0.1]) - FTP login timed out, disconnected
Oct 18 02:00:06 cpanel03 proftpd[5272]: cpanel03.xxxx (127.0.0.1[127.0.0.1]) - FTP session closed.
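
To list everything crond launched in the minute of a crash (as in the log excerpt above), a small helper can pull the commands out of the cron log. This is a sketch, not something from the thread: it assumes the syslog line format shown above and that cron logs to /var/log/cron, the usual location on Red Hat, which may differ on your box. The function name cron_jobs_at is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: print the commands crond started at a given time stamp.
# Assumes syslog-style lines such as:
#   Oct 18 02:00:00 host CROND[5310]: (root) CMD (/scripts/upcp)
# Usage: cron_jobs_at "Oct 18 02:00" [/var/log/cron]
# Note: syslog pads single-digit days with a space, e.g. "Oct  8 02:00".
cron_jobs_at() {
    stamp=$1
    log=${2:-/var/log/cron}   # assumed default location on Red Hat systems
    # Keep only lines from the requested minute, then print the text between
    # "CMD (" and the final ")" on CROND lines; other lines are dropped.
    grep "^$stamp" "$log" |
        sed -n 's/.*CROND\[[0-9]*]: ([^)]*) CMD (\(.*\))$/\1/p'
}
```

Comparing the output for a crash night with the same minute on a quiet night shows whether the set of jobs actually differs.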

It has to be an issue with this entry:
Oct 18 02:00:00 cpanel03 CROND[5310]: (root) CMD (/scripts/upcp)

But upcp runs daily!
I can run upcp 10 times in a row without a problem, and the only difference when cron starts it is:
Quote:
if (!$ishuman) {
system("/scripts/cpbackup");
}
On crash days the server stops working before it gets to the backup.
But I also have no problem running /scripts/cpbackup 10 times.

Like I said, it only crashes every second day, and everything runs at least daily!
I can't explain it. First I thought of the "smartcheck issue" and disabled it with "touch /var/cpanel/disablesmartcheck"; next I thought it had to do with the cpanellogd issue (cpanellogd spawning thousands of processes (fix)).

At the moment I have no idea.

Next I will split up /scripts/upcp to test my upcp theory.

What does upcp do only every second day and not every day?

Last edited by Rubas; 10-18-2004 at 02:35 AM.

#2
10-20-2004, 12:38 AM
Rubas (Registered User)
Join Date: Sep 2003
Posts: 142
I separated /scripts/cpbackup from /scripts/upcp and changed the time upcp runs.

The server crashed on the second day when upcp was called from cron. Today I ran a lot of updates with upcp from the shell without any problem!

Strange!

#3
11-01-2004, 02:05 PM
jsteel (cPanel Partner NOC)
Join Date: Jul 2002
Location: Atlanta, GA
Posts: 639
Quote:
Originally Posted by Rubas
I separated /scripts/cpbackup from /scripts/upcp and changed the time upcp runs.

The server crashed on the second day when upcp was called from cron. Today I ran a lot of updates with upcp from the shell without any problem!

Strange!

Any resolution on this? We have one new box experiencing the same problem (though we use SATA RAID and do have SMART disabled). We have numerous other boxes with the exact same hardware and no problems.

#4
11-01-2004, 02:18 PM
Rubas (Registered User)
Join Date: Sep 2003
Posts: 142
No. I have talked a lot with cPanel support over the last few days, but without success.

I know roughly where in upcp the server crashes, but I cannot reproduce the error.
It crashes every second day, even though upcp runs daily, and only when upcp is called from cron.

Changing the cron job to "upcp manual" doesn't help either...


My current workaround is to run only cpbackup from cron and to call upcp from the shell every day (deleting the upcp cron job that gets recreated after each run).
With this tactic the server is running rock-stable.

If you have a new idea, please let me know.

#5
11-01-2004, 10:17 PM
jsteel (cPanel Partner NOC)
Join Date: Jul 2002
Location: Atlanta, GA
Posts: 639
Quote:
Originally Posted by Rubas
No. I have talked a lot with cPanel support over the last few days, but without success.

I know roughly where in upcp the server crashes, but I cannot reproduce the error.
It crashes every second day, even though upcp runs daily, and only when upcp is called from cron.

Changing the cron job to "upcp manual" doesn't help either...


My current workaround is to run only cpbackup from cron and to call upcp from the shell every day (deleting the upcp cron job that gets recreated after each run).
With this tactic the server is running rock-stable.

If you have a new idea, please let me know.
We are seeing the same behavior. Manual upcp's seem fine, but random cron-based upcp's cause the failure. We end up having to power cycle the server each time.

The only difference we notice between this particular server and the others is that it got the latest Perl RPM as part of the base install when the server was built (the cPanel Perl 5.8.1 installer was run on it afterwards, and 'perl -v' does show the system is using cPanel's Perl):

For example:

Good Servers have:

perl-5.8.0-88.4

Bad Server has:

perl-5.8.0-88.7

But on both, a 'perl -v' yields:

This is perl, v5.8.1 built for i686-linux

We remember something similar happening months ago, when RHEL's up2date first overwrote Perl, which led to the tweak setting. Our guess is that something related to the installation of the latest RPM may still be causing the issue.

If you've got an existing ticket open, you may want to pass this info on to support. If you could post your Perl RPM set as a reply here ('rpm -qa "perl*"'), that would be great.
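
Collecting these details (the Perl RPM set and the interpreter actually in use) can be scripted so that every affected box reports them the same way. A rough sketch; the function name perl_report is made up. Quoting "perl*" keeps the shell from expanding the pattern against files in the current directory before rpm sees it.

```sh
#!/bin/sh
# Hypothetical helper: print the Perl RPMs and the perl interpreter in use,
# so reports from different servers can be compared side by side.
perl_report() {
    echo "== Perl RPMs =="
    # Quote the pattern so rpm, not the shell, does the matching.
    rpm -qa 'perl*' | sort
    echo "== perl on PATH =="
    command -v perl
    # The version banner line, e.g. "This is perl, v5.8.1 built for i686-linux"
    perl -v | grep 'This is perl'
}
```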

#6
11-02-2004, 02:51 AM
Rubas (Registered User)
Join Date: Sep 2003
Posts: 142
Yes, I also noticed the difference in the Perl version but couldn't figure out why (no multithreaded Perl).


Good
perl-5.8.0-88.4
# perl -v
This is perl, v5.8.1 built for i686-linux

Bad
perl-5.8.0-88.7 (no perl update - just a fresh installation)
# perl -v
This is perl, v5.8.4 built for i686-linux


I set up a monitoring script that sends me a process list every second.

The last output before the crash:
Quote:
CROND
19581 19582 19582 19582 ? -1 S 0 0:00 \_ /bin/sh -c (/scripts/upcp manual)
19582 19583 19582 19582 ? -1 S 0 0:00 | \_ /usr/bin/perl /scripts/upcp manual
19583 19892 19582 19582 ? -1 S 0 0:00 | \_ /usr/bin/perl /scripts/rpmup2
19892 20031 19582 19582 ? -1 S 0 0:00 | \_ /usr/bin/perl /scripts/installgd
20031 20072 19582 19582 ? -1 S 0 0:00 | \_ /usr/sbin/userhelper -t -w up2date --nox -i perl-CPAN libpng-devel libjpeg-devel XFree86-devel iconv jpeg xpm png
20072 20075 19582 19582 ? -1 R 0 0:01 | \_ /usr/bin/python -u /usr/sbin/up2date --nox -i perl-CPAN libpng-devel libjpeg-devel XFree86-devel iconv jpeg xpm png
19581 19657 4088 4088 ? -1 S 47 0:00 \_ /usr/sbin/sendmail -FCronDaemon -i -odi -oem root
This is what the next step looks like when upcp does not crash:
Quote:
4088 11482 4088 4088 ? -1 S 0 0:00 \_ CROND
11482 11483 11483 11483 ? -1 S 0 0:00 \_ /bin/sh -c (/scripts/upcp manual)
11483 11484 11483 11483 ? -1 S 0 0:00 | \_ /usr/bin/perl /scripts/upcp manual
11484 11791 11483 11483 ? -1 S 0 0:00 | \_ /usr/bin/perl /scripts/rpmup2
11791 11825 11483 11483 ? -1 S 0 0:00 | \_ /usr/bin/perl /usr/local/cpanel/bin/checkperlmodules
11825 11837 11483 11483 ? -1 S 0 0:00 | \_ /usr/bin/perl /scripts/realperlinstaller Net::AIM Net::SSLeay Archive::Tar GD::Graph Tree::MultiNode Tie::IxHash HTML::Entities IO::Tty Bundle::DBD::mysql CGI MD5 Digest::MD5 Expect Mail::SpamAssassin Net::DNS Bundle::Interchange Geo::IPfr
11837 11845 11483 11483 ? -1 S 0 0:00 | \_ /usr/sbin/userhelper -t -w up2date --nox -i ncftp
11845 11848 11483 11483 ? -1 D 0 0:00 | \_ /usr/bin/python -u /usr/sbin/up2date --nox -i ncftp
11482 11558 4088 4088 ? -1 S 47 0:00 \_ /usr/sbin/sendmail -FCronDaemon -i -odi

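Rubas does not post the monitoring script itself. The output above has the column layout of `ps axjf` (PPID, PID, PGID, SID, TTY, TPGID, STAT, UID, TIME, COMMAND), so something along these lines would produce comparable snapshots. This is a sketch under assumptions: the log path, the function names and the "=== timestamp ===" marker line are made up, and calling sync after every write is a blunt way to give the last snapshot a chance to survive a hard hang.

```sh
#!/bin/sh
# Hypothetical sketch: append a timestamped process tree to a log once per
# interval, flushing to disk each time so the last snapshot survives a hang.
# Usage: log_process_tree /var/log/pstree.log 0 1   (count 0 = run forever)
log_process_tree() {
    log=$1
    count=${2:-0}      # number of snapshots; 0 means loop until killed
    interval=${3:-1}   # seconds between snapshots
    n=0
    while [ "$count" -eq 0 ] || [ "$n" -lt "$count" ]; do
        {
            echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
            ps axjf    # same columns as the snapshots quoted above
        } >> "$log"
        sync           # push the data to disk before the next interval
        n=$((n + 1))
        sleep "$interval"
    done
}

# Print only the most recent snapshot from such a log,
# i.e. the last "=== ... ===" marker line and everything after it.
last_snapshot() {
    awk '/^=== .* ===$/ { buf = "" } { buf = buf $0 "\n" } END { printf "%s", buf }' "$1"
}
```

After a crash and reboot, `last_snapshot /var/log/pstree.log` would show the final state recorded before the box hung, provided the log lives on a filesystem that came through the crash intact.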

Support's only idea was an issue with multithreaded Perl, and the ticket is now closed.

Last edited by Rubas; 11-02-2004 at 02:56 AM.

#7
11-02-2004, 08:18 AM
jsteel (cPanel Partner NOC)
Join Date: Jul 2002
Location: Atlanta, GA
Posts: 639
What was your ticket number? We'll open a new one with both of our information.

One additional thing we found: even though we ran the perl581installer on all RHEL servers, the good servers show the following:

-rwxr-xr-x 2 root root 1002181 Apr 18 2004 /usr/bin/perl*


And the bad server shows:

-rwxr-xr-x 2 root root 994853 Oct 28 16:51 /usr/bin/perl*
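
Size and date differences like these suggest that /usr/bin/perl was replaced on one set of boxes after the package was installed. `rpm -V perl` compares the installed files against the RPM database and flags any file whose size (S), digest (5) or modification time (T) changed, so it can show which of the two binaries still matches the RPM. The wrapper below is only a sketch; its name is made up, and it simply looks for a digest mismatch on /usr/bin/perl.

```sh
#!/bin/sh
# Hypothetical helper: succeed if "rpm -V perl" reports that /usr/bin/perl
# no longer matches the digest recorded in the RPM database.
# rpm -V prints one line per changed file, e.g.
#   S.5....T    /usr/bin/perl
# where the third character is "5" when the file digest differs.
# A file reported as "missing" is not counted as replaced here.
perl_binary_replaced() {
    rpm -V perl 2>/dev/null |
        awk '$NF == "/usr/bin/perl" && substr($1, 3, 1) == "5" { found = 1 }
             END { exit !found }'
}
```

Running `perl_binary_replaced && echo replaced || echo unchanged` on a good and a bad server would show whether the binaries differ from their respective RPMs.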

Last edited by jsteel; 11-02-2004 at 08:23 AM.

#8
11-02-2004, 08:21 AM
Rubas (Registered User)
Join Date: Sep 2003
Posts: 142
Quote:
Originally Posted by jsteel
What was your ticket number? We'll open a new one with both of our information.
ID# 77087

Please keep me up to date!

#9
11-02-2004, 05:06 PM
Jasonbd (Registered User)
Join Date: Jan 2004
Location: Texas
Posts: 24
I have the same problem...

I wish I could give y'all more information, but y'all have already listed everything I have experienced...

-jb

#10
11-02-2004, 05:49 PM
jsteel (cPanel Partner NOC)
Join Date: Jul 2002
Location: Atlanta, GA
Posts: 639
Quote:
Originally Posted by Jasonbd
I have the same problem...

I wish I could give y'all more information, but y'all have already listed everything I have experienced...

-jb
What's your OS, cPanel version and list of perl RPMs?

#11
11-03-2004, 09:54 AM
Jasonbd (Registered User)
Join Date: Jan 2004
Location: Texas
Posts: 24
OS - RedHat ES 3
Perl - v5.8.4 perl-5.8.0-88.7
Cpanel - 9.9.8-RELEASE_5

Last edited by Jasonbd; 11-03-2004 at 10:08 AM.

#12
11-04-2004, 08:57 AM
jsteel (cPanel Partner NOC)
Join Date: Jul 2002
Location: Atlanta, GA
Posts: 639
Jason & Rubas:

Do you know if your problematic servers were built out using RHEL ES 3 Update 3 specifically?

#13
11-04-2004, 09:09 AM
Rubas (Registered User)
Join Date: Sep 2003
Posts: 142
Quote:
Originally Posted by jsteel
Jason & Rubas:

Do you know if your problematic servers were built out using RHEL ES 3 Update 3 specifically?
Yes, the server was built from the RHEL 3 Update 3 ISO.
I know this for certain, because we had some trouble: with the "Update 3" CDs you need at least the first two CDs for a minimal installation, not just the first one.

Last edited by Rubas; 11-04-2004 at 09:13 AM.

#14
11-04-2004, 09:17 AM
Jasonbd (Registered User)
Join Date: Jan 2004
Location: Texas
Posts: 24
Mine too; the server was built from the RHEL 3 Update 3 ISO.

#15
11-08-2004, 11:02 AM
bbailey (cPanel Partner NOC)
Join Date: Aug 2004
Location: Atlanta, GA
Posts: 2
In our case, this problem has turned out to be related to the laus package. After disabling this service, the server that had been crashing every other day has stayed up, and enabling it on other previously stable test machines resulted in them starting to crash every other day.

This package started getting included in a "minimal" installation as of RHEL 3 update 3, and the audit daemon is enabled by default. If you don't need it (if you didn't know it was running or what it was, then you don't need it) you can safely disable it and reclaim a bunch of disk space from /var/log/audit.d.

root@localhost# chkconfig audit off
root@localhost# service audit stop
root@localhost# ps -ef | grep '[a]uditd'    # make sure it's stopped

It also involves a kernel module (named "audit"), which you may also want to disable. Doing so will prevent the userspace tools that support auditing from generating errors when they can no longer find /dev/audit.

root@localhost# service crond stop
root@localhost# service atd stop
root@localhost# rmmod audit
root@localhost# lsmod | grep audit    # make sure it's gone
root@localhost# echo "alias char-major-10-224 off" >> /etc/modules.conf
root@localhost# service crond start
root@localhost# service atd start

You could even go so far as to remove the laus package altogether.

root@localhost# rpm -e laus
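
After these steps it may be worth confirming that nothing audit-related is left, so the box really is running without it. A minimal check, assuming (as in the commands above) that the module shows up as "audit" in lsmod and the daemon as "auditd" in the process table; the function name audit_disabled is made up.

```sh
#!/bin/sh
# Hypothetical check: succeed only if the "audit" kernel module is not loaded
# and no process named exactly "auditd" is running.
audit_disabled() {
    # lsmod lists one module per line; the first column is the module name.
    if lsmod | awk '$1 == "audit" { found = 1 } END { exit !found }'; then
        echo "audit kernel module is still loaded" >&2
        return 1
    fi
    # pgrep -x matches the whole process name; exit status 0 means found.
    if pgrep -x auditd >/dev/null; then
        echo "auditd is still running" >&2
        return 1
    fi
    echo "audit module unloaded and auditd not running"
    return 0
}
```

To see how much space the old audit trail takes before cleaning it up, `du -sh /var/log/audit.d` gives a quick total.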

Reports on the Taroon mailing list and Red Hat's Bugzilla indicate that other activities such as restarting Lotus Domino server have approximately the same effect as our nightly running of upcp. I don't think this is a problem with cPanel, other than it causes occasional system activity which seems to trigger whatever problem auditd has. Updated kernel and laus packages that resolve this issue should be out with update 4.

Ref:
[bugzilla] System running LAuS hanging regularly
[bugzilla] audit service on by default?
[bugzilla] cron and laus problem
[bugzilla] Kernel panic when stopping Lotus Domino 6.52
[taroon-list] Re: Kernel panics

All times are GMT -5.
© cPanel Inc