CPanel crashes every single day

Godcore

Active Member
Dec 31, 2002
39
0
156
We recently installed CPanel on a new server and started importing a series of accounts from Ensim.

Since we have been using CPanel, our server has locked up anywhere from 1 to 4 times a day and there does not seem to be any rhyme or reason to it.

We have rebuilt the server with all new hardware (including a complete reinstall of CPanel). We have checked for cron jobs, manually tested every CGI on the server, run complex processes (like backup) manually to make sure they do not lock up the server. We've even tried both the latest stable and latest edge releases of CPanel with no success. We have done everything we know to do and it still locks up every day.

We are using a 1.6GHz P4 with 1GB RAM and RedHat 8.0 (we had also tried RedHat 7.3 earlier with the same issue), and we currently have only about 20-30 sites on this server. It is not running a heavy load at all and seems to run beautifully except for the daily crashes.

The customers on this server are obviously angry and we've run out of ideas for troubleshooting. Does anyone here have any ideas or insight? Has anyone else had this trouble?
 

orca

Well-Known Member
Sep 23, 2002
75
0
156
Switzerland
Hi there,

I once had trouble with WHM crashing and not restarting. I submitted a trouble ticket and Nick fixed it in a short time.
 

Godcore

Active Member
Dec 31, 2002
39
0
156
Once we reboot the server, all is well. The problem is that the entire server crashes every day.
 

Godcore

Active Member
Dec 31, 2002
39
0
156
Thanks, but this has happened with two completely separate servers. Everything down to the power cord has been replaced.
 

rpmws

Well-Known Member
Aug 14, 2001
1,822
8
318
back woods of NC, USA
I can't believe it... this post is ME!!!

2 totally different machines, brand new. All different hardware, different NOCs, and now the new box locks up and needs a manual reboot. It just dies. It only returns pings.

Guys, I have spent weeks on this. It's killing me. Does anyone have a clue? I have never seen anything in the logs give it away. Please help!
 

rpmws

Well-Known Member
Aug 14, 2001
1,822
8
318
back woods of NC, USA
Nope, no spike in traffic, nothing in the logs. It's not the firewall; in fact the old box has no firewall. The old box had mod_gzip, and I was almost sure it was mod_gzip. Nope, it's not on the new box. Then I was sure it was a certain account. Nope, it crashed on me both before and after I moved that account.

The only thing I know in my case is that these are 2 different servers with 2 totally different sets of hardware, different NOCs and IPs, but the same cPanel edge version and the same Apache, PHP and MySQL (mod_gzip was only on the old box). The new box crashed with only 9 sites on it, and the old one, which had 300 sites on it, crashed after those same 9 sites had been moved off. So I know it's not one of the 9 sites, because both boxes crashed before and after those 9 were moved around.

What really sux is that this has happened to me 20 times in the last few months. That's the reason I went to a new machine with freshly installed accounts; I just knew the old box was bad. It also sux that it locks up completely. I have been in top and it just stops. I keep top running, the screen freezes, and the load is almost never past .6. ssh, pop3 and apache stop. chkservd will NOT restart them. A power cycle is the only thing that helps.

I thought it was NT worms scanning my range of IPs. In fact the last thing in access_log on the old box was almost always just that: worms hitting all 255 IPs. Nope. I put all sites on 1 IP on the old box and deleted 250 IPs, same issue. Moved to the new box with 5 IPs and boom, same issue, only no worms right before the crash this time. It's almost like it gets flooded with either legitimate traffic bursts or worms and that trips it into a halt. chkservd doesn't restart it. Driving me nuts. Eric looked at the new box today. He grepped the messages. I didn't hear back from him.

I am wondering if it is the old skin I use. I mean very old, like 2 years old. It's a stripped-down iconic with no real features in it. That's the only thing that makes my box different from the rest.

Anyone out there have any clue? Oh, by the way, the box returns pings only. No connections to common services.
 

rpmws

Well-Known Member
Aug 14, 2001
1,822
8
318
back woods of NC, USA
I want to add that this has happened on both boxes during the day at high traffic times and also at night. No user has come to me and said "you know, the last 3 times I have clicked on whatever, boom! no server". I would kiss the poor SOB and thank him and then hit delete if they did come forward. Oh, both boxes are clean of hacks and kits, which is another reason I moved to a new fresh install, just in case.
 

Sash

Well-Known Member
Feb 18, 2003
252
0
166
PM me. We've encountered a problem like this as well. I can tell you what we did to fix it.

Mike
 

rpmws

Well-Known Member
Aug 14, 2001
1,822
8
318
back woods of NC, USA
No RAID.

New box: Dell dual 2GHz Xeon, 1GB RAM, 73GB SCSI

Old box: dual P3, 1GB RAM, 60GB IDE
2.4.18-27.7.xsmp (SMP)
 

shaun

Well-Known Member
PartnerNOC
Verified Vendor
Nov 9, 2001
708
1
318
San Clemente, Ca
cPanel Access Level
DataCenter Provider
I had a problem with smartcheck killing a couple of machines way back, but it was a RAID bug. I guess you can try disabling smartcheck:

echo > /scripts/smartcheck
chattr +ia /scripts/smartcheck

If it crashes again, then run

chattr -ia /scripts/smartcheck

and when upcp runs it will replace that file.
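
The same disable-and-undo steps can be sketched as a pair of shell functions that keep a copy of the script first, since `echo >` throws the original contents away. This is only a sketch: the function names and the `.disabled-backup` suffix are assumptions, not anything cPanel provides, and `chattr` needs root and a filesystem that supports these flags.

```shell
#!/bin/sh
# Hypothetical helpers; names and the backup suffix are illustrative only.

# Blank a script and try to lock it so it cannot be rewritten.
disable_script() {
    target="$1"
    # Keep a copy so the script can be put back by hand.
    cp -p "$target" "$target.disabled-backup" || return 1
    # Same as the post: replace the contents with a single newline.
    echo > "$target"
    # Same flags as the post; warn instead of failing if chattr is refused.
    chattr +ia "$target" 2>/dev/null ||
        echo "warning: could not set +ia on $target" >&2
}

# Remove the lock and put the saved copy back.
restore_script() {
    target="$1"
    chattr -ia "$target" 2>/dev/null || true
    mv "$target.disabled-backup" "$target"
}
```

On the server this would be called as `disable_script /scripts/smartcheck`, and undone with `restore_script /scripts/smartcheck` if the crashes continue.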
 

carock

Well-Known Member
Sep 25, 2002
263
7
168
St. Charles, MO
What kernel are you running? We had a problem with the kernel installed from the RedHat 7.3 CDs. There was a bug that stopped the server when a lot of disk I/O was happening. The console showed something about a kjournald problem.

We found the problem by researching the console errors on Yahoo.

We had to upgrade to a newer SMP kernel.

Chuck
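
A quick way to answer the kernel question and look for the kjournald messages is sketched below. The function name is made up, and the log path is an assumption (on RedHat the kernel messages usually land in /var/log/messages). If the box freezes hard, the messages may only ever appear on the console and never reach the log, so an empty result does not rule this bug out.

```shell
#!/bin/sh
# Hypothetical helper: print the running kernel and count log lines that
# mention kjournald (case-insensitive).
kernel_report() {
    log_file="$1"
    echo "running kernel: $(uname -r)"
    # grep -c exits non-zero when the count is 0, so do not treat that as an error.
    hits=$(grep -ci 'kjournald' "$log_file" 2>/dev/null || true)
    echo "kjournald mentions in $log_file: ${hits:-0}"
}
```

Example call: `kernel_report /var/log/messages`.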
 

rpmws

Well-Known Member
Aug 14, 2001
1,822
8
318
back woods of NC, USA
Originally posted by carock
What kernel are you running? We had a problem with the kernel installed from the RedHat 7.3 CDs. There was a bug that stopped the server when a lot of disk I/O was happening. The console showed something about a kjournald problem.

We found the problem by researching the console errors on Yahoo.

We had to upgrade to a newer SMP kernel.

Chuck
2.4.18-27.7.xsmp (SMP)
Dual Dell Xeon 1GB ram dual 73 GB scsi
 

rpmws

Well-Known Member
Aug 14, 2001
1,822
8
318
back woods of NC, USA
Well, it's been up almost 2 days.. world record!!! Fingers crossed.
 

rhood

Well-Known Member
Feb 15, 2003
90
0
156
Try keeping top open to see if CPU usage surges just before the server becomes unavailable and only responds to pings. I say that because we had an almost identical issue on two servers, and in both cases it turned out to be runaway scripts.
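
Watching top by hand only works if someone is looking at the moment of the lockup. A rough alternative, assuming a hard freeze still leaves earlier writes on disk, is to append a snapshot of the load and the busiest processes to a file every few seconds and read the tail of it after the reboot. The function name and the log path are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: append one timestamped snapshot to a log file and
# flush it to disk so it has a chance of surviving a freeze.
snapshot() {
    log_file="$1"
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        # Load averages (Linux only); ignore the error elsewhere.
        cat /proc/loadavg 2>/dev/null || true
        # Ten processes using the most CPU, highest first.
        ps -eo pid,pcpu,pmem,comm 2>/dev/null | sort -k2,2nr | head -n 10
    } >> "$log_file"
    sync
}

# Example loop, left commented out so nothing runs until you want it to:
# while :; do snapshot /root/freeze-watch.log; sleep 10; done
```

If a runaway script is the cause, the last few snapshots before the gap in timestamps should show it climbing.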
 

rpmws

Well-Known Member
Aug 14, 2001
1,822
8
318
back woods of NC, USA
I have a 19 inch screen that runs 24/7 right here beside me. Its only job is to run "top" and "tail -f /var/log/exim_mainlog".

Once the box stops, the "top" screen freezes. Loads are almost never over .50.

I used to have problems with mod_gzip filling up /tmp when a looping error script would build an infinite output to a .wrk file. That would crash apache and eventually the box. That one is easy to catch and fix. This kind really sux!!! So far so good :)
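
For the /tmp case, a small check like the one below is one way to catch it before the disk fills. It is a sketch under assumptions: the function name and the default 10 MB threshold are invented here, and it only looks at files ending in .wrk directly inside the directory you give it.

```shell
#!/bin/sh
# Hypothetical helper: show how full the filesystem is and list *.wrk files
# larger than a threshold (in KB, default 10240).
large_wrk_files() {
    dir="$1"
    min_kb="${2:-10240}"
    # Second line of POSIX-format df output holds the usage percentage.
    df -P "$dir" | awk 'NR==2 {print "filesystem usage: " $5}'
    # Only regular files directly in $dir, bigger than the threshold.
    find "$dir" -maxdepth 1 -type f -name '*.wrk' -size +"${min_kb}k" -exec ls -l {} \;
}
```

Run from cron as, for example, `large_wrk_files /tmp 10240`, it would print any oversized work file along with its owner and size.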