Ready to pull out hair - Server is grinding to a halt every few days

Kaith Rustaz

Active Member
Jun 5, 2002
37
0
306
We have a server that needs to be rebooted every couple of days.

Email freezes. Webmail starts giving errors. WHM starts generating partial or blank pages.
Rebooting fixes this, for a while.

Here are the symptoms:

In WHM:
Go to Email - Mail Queue Manager - this hangs.
Go to Account Functions - Quota Manager - this hangs.

If you try to access SquirrelMail for a site, you get this error:
ERROR : Connection dropped by imap-server.
Query: FETCH 1:* (FLAGS UID RFC822.SIZE BODY.PEEK[HEADER.FIELDS (Date To Cc From Subject X-Priority Content-Type)])

NeoMail just hangs.



I have run the following commands (in order, checking after each):
/scripts/restartsrv_spamd
/etc/init.d/exim restart
/etc/init.d/cpanel restart

/scripts/upcp --force
/scripts/eximup --force


I've restarted every service, verified I can access all the ports that are supposed to be accessible, checked that all drives have plenty of room and generally beaten my head against the wall.

Any ideas? Other than adding a daily reboot to cron?
 

adept2003

Well-Known Member
Aug 11, 2003
281
0
166
~ "/(extra|special)/data"
Is there anything odd in your log files?
Does the server need rebooting at around the same time every day?
Have you tried updating apache? (Could be that your existing build hasn't compiled correctly.)
Do you have a particular process running around the time when the server starts playing up (eg. backup), and what is the server load at? May be worth checking how much space is available in /tmp too.
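
One way to answer the load and process questions above is to log a periodic snapshot, so there is something to look at after the next hang. This is only a rough sketch, assuming a Linux box with /proc and a ps that accepts -o. The function name `snapshot` and the paths in the usage note are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print one timestamped snapshot of the load average,
# /tmp usage and any processes in uninterruptible sleep (state D on Linux),
# which can be a hint that something is stuck waiting on I/O.
snapshot() {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
    # Prefer /proc/loadavg; fall back to uptime if it is not readable.
    echo "load: $(cat /proc/loadavg 2>/dev/null || uptime 2>/dev/null)"
    # -P keeps each filesystem on one line so the last line is the /tmp entry.
    echo "tmp: $(df -P /tmp 2>/dev/null | tail -n 1)"
    echo "blocked processes (state D):"
    # Skip the header row and keep only rows whose state starts with D.
    ps -eo pid,stat,comm 2>/dev/null | awk 'NR > 1 && $2 ~ /^D/' || true
}
```

To use it, add a `snapshot` call at the end of the file and run it from cron every few minutes, appending to a log, for example `*/5 * * * * /root/snapshot.sh >> /var/log/snapshot.log 2>&1` (both paths are assumptions).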
 

Kaith Rustaz

Is there anything odd in your log files?

Nope. Haven't seen anything unusual.

Does the server need rebooting at around the same time every day?

No. Sometimes it'll need it back to back; other times it'll go days, even weeks, before the problem shows up again.

Have you tried updating apache? (Could be that your existing build hasn't compiled correctly.)

Yup.

Do you have a particular process running around the time when the server starts playing up (eg. backup), and what is the server load at?

Server load was at 0.5 or below most of the time. Current snapshot:
17:02:24 up 3 days, 23:31, 1 user, load average: 1.08, 0.85, 0.62

May be worth checking how much space is available in /tmp too.
[email protected] [/home/pointbre/mail]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda6             2.0G  187M  1.7G  10% /
/dev/sda1              99M   45M   49M  48% /boot
/dev/sda2             9.7G  2.2G  7.1G  24% /usr
/dev/sda3             7.7G  3.2G  4.2G  43% /var
/dev/sda8             162G   23G  131G  15% /home
none                  501M     0  501M   0% /dev/shm
/dev/sdb1             184G   21G  154G  12% /backup
/backup/Tmp           4.7G  2.1G  2.4G  47% /tmp


We're running
WHM 10.6.0 cPanel 10.6.0-R55
CentOS 3.4 i686 - WHM X v3.1.0

Tried this as well:

rm -f /usr/local/cpanel/cpanel
/scripts/updatenow
/scripts/upcp --force

Without success.

Our tech team got a suggestion from cPanel to reinstall cPanel.
When I checked, the issue was still there.
 

Kaith Rustaz

We've had this problem for a while, but restarting cPanel used to fix it. Over the last two months, it's been happening more and more frequently, with the last couple being only days apart. There was an issue with a runaway script that we thought was responsible for the problem, but removing that hasn't resolved it.

Also, when trying to access SquirrelMail, some accounts get the "ERROR : Connection dropped by imap-server." message, while others work fine. I haven't found any successful accesses with NeoMail though.

Tried telnetting to port 143, as one thread I found suggested. Worked fine.

* OK [CAPABILITY IMAP4REV1 LOGIN-REFERRALS AUTH=LOGIN] [xxx.xxx.xxx.xxx] IMAP4rev1 2003.339-cpanel at Mon, 22 Aug 2005 17:12:53 -0400 (EDT)
* BYE xxx.xxx.xxx IMAP4rev1 server terminating connection
a01 OK LOGOUT completed


Connection to host lost.

C:\Documents and Settings\rmh>
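
The manual telnet check above could be scripted so it can run when the problem shows up again. The sketch below only classifies the first line the server sends (IMAP4rev1 greetings start with `* OK`, `* PREAUTH` or `* BYE`). The function name `classify_imap_greeting` is hypothetical, and a healthy greeting does not rule out the per-mailbox FETCH failure that SquirrelMail reports.

```shell
#!/bin/sh
# Hypothetical helper: classify the first line an IMAP server sends.
# Prints "ok", "bye" or "unexpected" so the result is easy to log.
classify_imap_greeting() {
    case "$1" in
        "* OK"*)      echo ok ;;          # server is ready for a login
        "* PREAUTH"*) echo ok ;;          # connection is already authenticated
        "* BYE"*)     echo bye ;;         # server refused the connection
        *)            echo unexpected ;;  # empty, garbled or non-IMAP reply
    esac
}

# Possible usage, assuming a netcat binary named nc with a -w timeout option:
#   greeting=$(printf 'a01 LOGOUT\r\n' | nc -w 5 localhost 143 | head -n 1)
#   classify_imap_greeting "$greeting"
```

Logging the result next to a timestamp would show whether the IMAP daemon still answers at the moment webmail starts throwing errors.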
 

adept2003

Kaith Rustaz said:
/backup/Tmp 4.7G 2.1G 2.4G 47% /tmp
Wowsers!! Sure, it's a big tmp partition, but it's got a lotta stuff in there! You should clear out your tmp partition - read this thread: http://forums.cpanel.net/showthread.php?t=35129

One of our servers is running CentOS 4.1, and although our tmp partition is much smaller, the amount of space used averages around 6-10MB... not 2.1GB!!!

It may not fix the underlying cause, but it may help to reduce the frequency of reboots.
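
Before clearing anything, it may help to see what is actually stale. A minimal sketch, assuming a POSIX find. The function name `list_stale_files` and the 7-day cutoff in the usage note are only illustrative, and the function prints paths without deleting anything.

```shell
#!/bin/sh
# Hypothetical helper: list regular files under a directory whose last
# modification is older than the given number of days. find counts age in
# whole 24-hour periods, so "+7" matches files at least 8 days old.
# -xdev stays on one filesystem; -type f skips sockets and directories.
list_stale_files() {
    dir=$1
    days=$2
    find "$dir" -xdev -type f -mtime +"$days" -print
}
```

For example, `list_stale_files /tmp 7 | head` shows a sample of candidates, and `du -sh /tmp` gives the overall total to compare against.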
 

Kaith Rustaz

adept2003 said:
Wowsers!! Sure, it's a big tmp partition, but it's got a lotta stuff in there! You should clear out your tmp partition - read this thread: http://forums.cpanel.net/showthread.php?t=35129

One of our servers is running CentOS 4.1, and although our tmp partition is much smaller, the amount of space used averages around 6-10MB... not 2.1GB!!!
Most of that's leftovers from the script issue we just resolved (a poorly written banner server that didn't handle file-not-found errors well).

Thanks for the link, I'll take a look.