Strange problem: server lock every few minutes

Secret Agent
I have a strange problem on my server. Every 15 minutes or so it locks up: nothing loads, no websites, no SSH. (The timing is random; 15 minutes is only a rough guess.) I am mind-boggled as to what could be causing this. I'm 100% positive it is my server and not my ISP or the network settings in my home office. I have confirmed this many times by immediately connecting to other sites and running a speed test: everything works except my server.
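Because the interval is only a guess, it may help to record the exact time of each lockup. Below is a minimal sketch, assuming Python 3.6+ on a machine outside the server (for example the home office PC). It polls a TCP port such as 22 (SSH) or 80 (HTTP) and prints a timestamped line whenever the port switches between reachable and unreachable. The host name, port and polling interval are placeholders you would need to adjust.

```python
import socket
import time
from datetime import datetime
from typing import Optional


def is_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, host unreachable, DNS failure
        return False


def monitor(host: str, port: int, interval: float = 10.0,
            max_checks: Optional[int] = None) -> None:
    """Poll host:port every `interval` seconds and log each UP/DOWN transition.

    Runs forever unless `max_checks` is given.
    """
    previous = None
    checks = 0
    while max_checks is None or checks < max_checks:
        up = is_reachable(host, port)
        if up != previous:
            stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {host}:{port} {'UP' if up else 'DOWN'}", flush=True)
            previous = up
        checks += 1
        time.sleep(interval)
```

For example, you could leave `monitor("server.example.com", 22)` running in a terminal (the host name is hypothetical). It would produce a list of DOWN/UP times that can be compared against the cron schedule and the Apache logs.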

Apart from the common "file does not exist" messages, which are normal, these are the only errors I see in the Apache error logs:

[Tue Nov 29 00:11:30 2005] [notice] SIGHUP received. Attempting to restart
[Tue Nov 29 00:13:11 2005] [warn] pid file /usr/local/apache/logs/httpd.pid overwritten -- Unclean shutdown of previous Apache run?
[Tue Nov 29 00:13:12 2005] [notice] Apache configured -- resuming normal operations
[Tue Nov 29 00:13:12 2005] [notice] suEXEC mechanism enabled (wrapper: /usr/local/apache/bin/suexec)
[Tue Nov 29 00:13:12 2005] [notice] Accept mutex: sysvsem (Default: sysvsem)

[Tue Nov 29 05:12:19 2005] [error] mod_ssl: SSL handshake interrupted by system [Hint: Stop button pressed in browser?!] (System error follows)
[Tue Nov 29 05:12:19 2005] [error] System: Connection reset by peer (errno: 104)


Could either of these be the cause? Other than these two, I don't know what else could make the server behave so bizarrely. Even /var/log/messages is 0 bytes.
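To check whether these entries line up with the lockups, you can extract the timestamps from the error log and see what was logged around each outage time. This is a minimal sketch. It assumes the Apache 1.3 line format shown above (`[Tue Nov 29 00:13:11 2005] [warn] ...`) and an English/C locale for the day and month names. The five-minute window is an arbitrary choice.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Apache 1.3 error_log lines look like: [Tue Nov 29 00:13:11 2005] [warn] message
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>\w+)\] (?P<msg>.*)$")

Entry = Tuple[datetime, str, str]  # (timestamp, level, message)


def parse_error_log(lines: Iterable[str]) -> List[Entry]:
    """Parse error_log lines into (timestamp, level, message), skipping anything else."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # blank or non-matching line
        # strptime day/month names are locale-dependent; assumes an English locale
        ts = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
        entries.append((ts, m.group("level"), m.group("msg")))
    return entries


def entries_near(entries: List[Entry], when: datetime, minutes: int = 5) -> List[Entry]:
    """Return the entries logged within `minutes` of `when`, in either direction."""
    window = timedelta(minutes=minutes)
    return [e for e in entries if abs(e[0] - when) <= window]
```

For example, pass the lines of the error log to `parse_error_log` (the path /usr/local/apache/logs/error_log is an assumption based on the pid file location above). Then call `entries_near(entries, datetime(2005, 11, 29, 0, 12))` to list everything logged within five minutes of a recorded lockup.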

Crontab:

21 0 * * * /scripts/upcp
0 1 * * * /scripts/cpbackup
*/15 * * * * /usr/local/cpanel/whostmgr/bin/dnsqueue > /dev/null 2>&1
2,58 * * * * /usr/local/bandmin/bandmin
0 0 * * * /usr/local/bandmin/ipaddrmap
0 6 * * * /scripts/exim_tidydb > /dev/null 2>&1
*/1 * * * * /home/abc/parser/cerberus-2.649.Linux-2.4.18-3smp /home/abc/parser/boot.xml DEBUG /home/abc/parser/cerberus.log
27 10 * * * cd /usr/local/cpanel/whostmgr/docroot/cgi/fantastico/scripts/ ; /usr/local/cpanel/3rdparty/bin/php cron.php > /dev/null 2>&1
*/5 * * * * /usr/local/cpanel/bin/dcpumon >/dev/null 2>&1
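Since the lockups seem to come roughly every 15 minutes, it may be worth checking which of these jobs fire at the exact times the outages are recorded. Note that the dnsqueue entry is scheduled `*/15` (every 15 minutes) and the cerberus parser runs every minute. Below is a minimal, illustrative schedule matcher; it is not cron's own code. It handles only numeric fields with `*`, lists, ranges and steps. It also simplifies one rule: real cron ORs day-of-month and day-of-week when both are restricted, whereas this sketch ANDs them. That gives the same result for the entries above, since both fields are `*`.

```python
from datetime import datetime
from typing import List


def field_matches(field: str, value: int, low: int, high: int) -> bool:
    """Check one numeric cron field (e.g. "*", "*/15", "2,58", "1-5", "0-30/10")."""
    for part in field.split(","):
        base, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if base == "*":
            start, end = low, high
        elif "-" in base:
            first, last = base.split("-")
            start, end = int(first), int(last)
        else:
            start = int(base)
            # "N/step" means "from N up to the maximum, every step"; a bare "N" is just N
            end = high if step_text else start
        if start <= value <= end and (value - start) % step == 0:
            return True
    return False


def jobs_due(crontab_text: str, when: datetime) -> List[str]:
    """Return the commands whose schedule matches the minute `when` (simplified, see above)."""
    cron_dow = (when.weekday() + 1) % 7  # cron: 0 = Sunday; Python weekday(): 0 = Monday
    due = []
    for line in crontab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("@"):
            continue  # blank lines, comments and @reboot/@daily shortcuts are not handled
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # e.g. MAILTO=... environment lines
        minute, hour, dom, month, dow, command = parts
        if (field_matches(minute, when.minute, 0, 59)
                and field_matches(hour, when.hour, 0, 23)
                and field_matches(dom, when.day, 1, 31)
                and field_matches(month, when.month, 1, 12)
                and (field_matches(dow, cron_dow, 0, 7)
                     or (cron_dow == 0 and field_matches(dow, 7, 0, 7)))):  # 7 = Sunday too
            due.append(command)
    return due
```

Fed the crontab above and `datetime(2005, 11, 29, 0, 15)`, `jobs_due` returns the dnsqueue, cerberus and dcpumon commands for that minute.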

I did an OS reload a few days ago, hoping to resolve this after spending so much time trying to trace it beforehand. Unfortunately, I still can't trace it. I confirmed that on the same router only my server is affected, no other servers are, and the data center network is perfectly fine.

Server Info:

CentOS 4.2
cPanel 10.x
PHP 4.4.1 (even 4.4.0 was tested with same problem)
Apache 1.3.34
MySQL 4.1.13

Software installed:

APF
BFD
SIM
LSM
LES
Nessus
Logwatcher
mod_dosevasive
mod_throttle
eaccelerator
zend optimizer
libsafe
Exim ACL

Note: I have used the above configuration, software and modules for a very long time, so I just cannot figure out what is causing this.

Tested:
PHP 4.4.0 / 4.4.1
Database Optimized
/etc/my.cnf optimized
php.ini optimized
server load always near 0 (cpu average 0.10 / memory average 20%)
apache logs (see above)
/var/log/messages (oddly, 0 bytes)
recompiled Apache
restarted services, even the server
network confirmed OK via pings, MRTG/RTG, traceroutes, etc.
upgraded cPanel via upcp --force
disabled APF
uninstalled nessus

I hope someone can help me out with this. I am racking my brains trying to trace this little ghost of a problem.
 
chirpy
Are you seeing load average spikes on the server at the same time? If not, it's likely to be one of:

1. Faulty NIC

2. Faulty NOC

3. Other faulty hardware on the server
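One way to answer the load question with data rather than averages is to sample the load on the server itself. Write the output to a local file so it survives even when the network path is down. This is a minimal sketch that reads the Linux /proc/loadavg file. It assumes Python 3.6+ is available; the stock Python on CentOS 4 is much older, so this would need a newer interpreter or a port. The defaults (a sample every 5 seconds, 720 samples, i.e. one hour) and the spike threshold of 2.0 are arbitrary choices. A gap in the timestamps during an outage would suggest the whole machine stalled, rather than just its network connection.

```python
import time
from datetime import datetime
from typing import Tuple


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg contents."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def sample_load(path: str = "/proc/loadavg", interval: float = 5.0,
                samples: int = 720, threshold: float = 2.0) -> None:
    """Print a timestamped load line every `interval` seconds, flagging 1-minute values >= threshold."""
    for _ in range(samples):
        with open(path) as f:
            one, five, fifteen = parse_loadavg(f.read())
        flag = "  <-- spike" if one >= threshold else ""
        stamp = datetime.now().strftime("%H:%M:%S")
        print(f"{stamp} {one:.2f} {five:.2f} {fifteen:.2f}{flag}", flush=True)
        time.sleep(interval)
```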
 
Secret Agent
Thanks for the reply. I'm just about 100% sure it is not the hardware.

I ran smartmon hard drive tests, 0 bad blocks
I ran memtester - all good, 4GB total

I've done a lot of testing.

As a matter of fact, I think I may have traced the problem back to my home office network. It has become an insane ghost problem, unlike anything I have dealt with before. For a while now I have known how to handle both the networking infrastructure at home and the testing of the server in the data center.

I'm mind-boggled. I had already intended to buy a new router for the home office anyway, a Belkin Pre-N (excellent, by the way). The issue was ongoing both before and after the new router was set up. I have also connected the home PC directly to the modem, and the problem persisted. My ISP says there is no problem on their end, and the data center says there is no problem on their end either.
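To settle whether the home office or the server is at fault, you could run the same reachability check from two places at once: the home office and a second network (for example, another hosted machine). Then compare when each one saw the server go down. Outages seen only from home point toward the home connection or ISP path. Outages seen from both at the same time point toward the server or its data center uplink. This is a minimal sketch of that comparison. It assumes each probe's log has already been turned into (start, end) datetime pairs, and that the clocks on both machines are reasonably in sync (e.g. via NTP).

```python
from datetime import datetime
from typing import List, Tuple

Interval = Tuple[datetime, datetime]  # (start, end) of one recorded outage


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two closed time intervals share at least one instant."""
    return a[0] <= b[1] and b[0] <= a[1]


def classify_outages(home: List[Interval],
                     remote: List[Interval]) -> Tuple[List[Interval], List[Interval]]:
    """Split home-office outages into those also seen remotely and those seen only from home."""
    shared, home_only = [], []
    for outage in home:
        if any(overlaps(outage, other) for other in remote):
            shared.append(outage)
        else:
            home_only.append(outage)
    return shared, home_only
```

If `home_only` holds most of the outages, the home side is the more likely culprit. If nearly all of them land in `shared`, the server or data center is.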

I have done an OS reload on both the hosting server and my home system. The problem was there before the reloads and still is after them.

Very difficult thing to trace.