|
|||
|
Situation:
It is a brand-new sister server, with only one small difference from the other cPanel server: it has the latest firmware for the RAID controller.

WHM 9.9.0 cPanel 9.9.2-S8
Red Hat EL ES 3, kernel 2.4.21-20.ELsmp
Dual Xeon 2.8 GHz
4 GB RAM
RAID 5 with 4 SCSI drives (Adaptec 2910 -> aacraid)

The server runs rock stable, BUT every 2nd day at 2:00 AM it goes down! The logs show me nothing interesting; once I found an entry about a SCSI error at the very end. I think it is a problem like the "smart check" with SCSI RAID systems, but with the smart check on, the system crashed every time I ran upcp! Okay, it has to be an issue with a cron job, but every cron job runs at least daily!

/var/log
Quote:
Quote:
It has to be an issue with

Oct 18 02:00:00 cpanel03 CROND[5310]: (root) CMD (/scripts/upcp)

But upcp runs daily! I can run upcp 10 times without a problem, and the only difference if I start it is
Quote:
But I also have no problem running /scripts/cpbackup 10 times. Like I said, it only crashes every second day, and everything runs at least every day!

I can't explain it. First I thought of the "smartcheck issue" and disabled it with "touch /var/cpanel/disablesmartcheck". Next I thought it had to do with the cpanellogd issue (cpanellogd spawning thousands of processes (fix)). Actually I have no idea.

Next I will separate /scripts/upcp to test my upcp theory. What does upcp do only every 2nd day and not every day?

Last edited by Rubas; 10-18-2004 at 02:35 AM.
|
|||
|
I separated /scripts/cpbackup from /scripts/upcp and changed the time at which upcp runs.
The server crashed on the 2nd day when upcp was called from cron. I ran a lot of updates with upcp from the shell today without any problem! Strange!
|
|||
|
Quote:
Any resolution on this? We have one new box experiencing the same problem (though we use SATA RAID and do have SMART disabled). We have numerous other boxes with the exact same hardware having no problems.
|
|||
|
No, I have talked a lot with cPanel support over the last few days, but without success.
I know roughly where in upcp the server crashes, but I cannot reproduce the error. Still, it crashes every second day, even though upcp runs daily, and only when upcp is called from cron. Changing the cron job to "upcp manual" does not help either ... The current workaround is to run only cpbackup from cron and to call upcp from the shell every day (and delete the new upcp cron job after it runs). With this tactic the server is running rock stable. If you have a new idea, please let me know.
|
|
|||
|
Quote:
The only thing we notice that is different about this one particular server compared to the others is that it has the latest Perl RPM installed as part of the base install when the server was built (and it subsequently had the cPanel Perl 5.8.1 installer run on it; a 'perl -v' does show the system is using cPanel's Perl). For example:

Good servers have: perl-5.8.0-88.4
Bad server has: perl-5.8.0-88.7

But on both, a 'perl -v' yields:
This is perl, v5.8.1 built for i686-linux

We remember something similar occurring months ago, when the first Perl overwrite from RHEL up2date happened, which led to the tweak setting. Our guess is that something related to the installation of the latest RPM may still be causing the issue.

If you've got an existing ticket open, you may want to pass this info on to support. If you could verify your Perl RPM set as a reply here, that would be great ('rpm -qa perl*').
|
|||
|
Yes, I also noticed the difference between the Perl versions but couldn't figure out why (no multi-threaded Perl).

Good
perl-5.8.0-88.4
# perl -v
This is perl, v5.8.1 built for i686-linux

Bad
perl-5.8.0-88.7 (no Perl update, just a fresh installation)
# perl -v
This is perl, v5.8.4 built for i686-linux

I set up a monitoring script which sends me a process list every second. The last info before the crash:
Quote:
Quote:
Support's only idea was an issue with multi-threaded Perl, and the ticket is now closed.

Last edited by Rubas; 11-02-2004 at 02:56 AM.
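A monitoring loop of the kind described in this post could be sketched roughly as follows. This is not the original script: the function names, the log file argument, and the choice of ps columns are all assumptions, and this version only appends to a local file and calls sync, whereas the post describes having the process list sent off the box.

```shell
#!/bin/bash
# Hypothetical sketch of a once-per-second process-list logger.
# snapshot/monitor and their arguments are illustrative names only.

snapshot() {
    # Print one timestamped snapshot of the process list.
    printf '=== %s ===\n' "$(date '+%Y-%m-%d %H:%M:%S')"
    ps -eo pid,ppid,stat,etime,args
}

monitor() {
    # $1 = log file
    # $2 = number of snapshots to take (0 = run until killed)
    # $3 = seconds to sleep between snapshots (default 1)
    local log_file=$1 count=${2:-0} interval=${3:-1} i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        snapshot >> "$log_file"
        sync    # try to get the snapshot onto disk before a possible hard crash
        i=$((i + 1))
        sleep "$interval"
    done
}
```

Started shortly before the 2:00 AM cron run, for example with `monitor /root/upcp-watch.log 0 1 &`, the last complete snapshot in the log would show what was running just before the box went down. If the disk stops accepting writes during the crash, the tail of the log may still be lost, which is presumably why the original sent the data elsewhere.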
|
|||
|
What was your ticket number? We'll open a new one with both of our information.

One additional thing we found: even though we installed the perl581installer on all RHEL servers, the good servers show the following:
-rwxr-xr-x 2 root root 1002181 Apr 18 2004 /usr/bin/perl*

And the bad server shows:
-rwxr-xr-x 2 root root 994853 Oct 28 16:51 /usr/bin/perl*

Last edited by jsteel; 11-02-2004 at 08:23 AM.
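To compare good and bad servers on the three facts discussed so far (the Perl RPM set, the 'perl -v' banner, and the /usr/bin/perl binary), something like the following could be run on each box. The function names are made up for illustration, and the banner parsing assumes the "This is perl, v5.8.x ..." format quoted in this thread.

```shell
#!/bin/bash
# Sketch only: these function names are hypothetical, not part of cPanel or RHEL.

perl_version_from_banner() {
    # Reads `perl -v` output on stdin and prints only the version number.
    # Assumes the banner format quoted in this thread: "This is perl, v5.8.1 ...".
    sed -n 's/^This is perl, v\([0-9.]*\).*/\1/p'
}

perl_fingerprint() {
    # $1 = path to the perl binary (defaults to /usr/bin/perl).
    local perl_bin=${1:-/usr/bin/perl}
    echo "rpm packages:"
    rpm -qa 'perl*' | sort
    echo "perl -v version: $("$perl_bin" -v | perl_version_from_banner)"
    echo "binary:"
    ls -l "$perl_bin"
}
```

Running `perl_fingerprint > /tmp/perl-$(hostname).txt` on each server and diffing the resulting files would put the differences side by side.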
|
|||
|
Quote:
|
|
|||
|
Quote:
I know this exactly. We had some trouble because with the "update 3 CDs" you need at least the first 2 CDs for a minimal installation, not only the first.

Last edited by Rubas; 11-04-2004 at 09:13 AM.
|
|||
|
In our case, this problem has turned out to be related to the laus package. After disabling this service, the server that had been crashing every other day has stayed up, and enabling it on other previously stable test machines resulted in them starting to crash every other day.

This package started getting included in a "minimal" installation as of RHEL 3 update 3, and the audit daemon is enabled by default. If you don't need it (if you didn't know it was running or what it was, then you don't need it), you can safely disable it and reclaim a bunch of disk space from /var/log/audit.d.

root@localhost# chkconfig audit off
root@localhost# service audit stop
root@localhost# ps -ef | grep auditd (make sure it's stopped)

It also involves a kernel module (named "audit"), which you may also want to disable. Doing so will prevent the userspace tools that support auditing from generating errors when they can no longer find /dev/audit.

root@localhost# service crond stop
root@localhost# service atd stop
root@localhost# rmmod audit
root@localhost# lsmod | grep audit (make sure it's gone)
root@localhost# echo "alias char-major-10-224 off" >> /etc/modules.conf
root@localhost# service crond start
root@localhost# service atd start

You could even go so far as to remove the laus package altogether.

root@localhost# rpm -e laus

Reports on the Taroon mailing list and Red Hat's Bugzilla indicate that other activities, such as restarting a Lotus Domino server, have approximately the same effect as our nightly run of upcp. I don't think this is a problem with cPanel, other than that it causes occasional system activity which seems to trigger whatever problem auditd has.

Updated kernel and laus packages that resolve this issue should be out with update 4.

Ref:
[bugzilla] System running LAuS hanging regularly
[bugzilla] audit service on by default?
[bugzilla] cron and laus problem
[bugzilla] Kernel panic when stopping Lotus Domino 6.52
[taroon-list] Re: Kernel panics
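The two "make sure" checks and the modules.conf edit from the post above could be wrapped in small helpers, roughly as sketched here. The function names are hypothetical. The one behavioral difference from the commands above is that the alias line is appended only if it is not already present, so running it twice does not duplicate the entry.

```shell
#!/bin/bash
# Sketch only: helper names are illustrative, not part of laus or RHEL.

audit_module_loaded() {
    # Reads `lsmod` output on stdin; succeeds if a module named exactly "audit" is listed.
    awk '$1 == "audit" { found = 1 } END { exit !found }'
}

audit_alias_disabled() {
    # $1 = path to modules.conf; succeeds if the "off" alias line is already there.
    grep -Eq '^[[:space:]]*alias[[:space:]]+char-major-10-224[[:space:]]+off[[:space:]]*$' "$1" 2>/dev/null
}

disable_audit_alias() {
    # $1 = path to modules.conf; appends the alias line only when it is missing.
    local conf=$1
    audit_alias_disabled "$conf" || echo "alias char-major-10-224 off" >> "$conf"
}
```

For example, `lsmod | audit_module_loaded && echo "audit module still loaded"` after the rmmod step, and `disable_audit_alias /etc/modules.conf` in place of the plain echo.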