I know this is a difficult thing to track down, but I'm at my wits' end with this machine.
I have this machine from ThePlanet.com. They are pretty helpful if you tell them exactly what to do, or if you get the right person on the phone.
Either way, this machine crashes once, sometimes twice, per day. I think it's a hardware-related issue, but ThePlanet ran all their tests and swears that's not the problem.
Here's what they said:
Code:
==============================================
Jason R. - Monday June 25th, 2007; 11:10 PM CDT
I will be bringing the server down for the hardware diagnostics shortly. Please stand by for updates.
==============================================
Jason R. - Monday June 25th, 2007; 11:43 PM CDT
The server has pasted the system memory test. It has proceeded on to the full hardware diagnostics test. I will update you when it has been complete.
==============================================
Jason R. - Tuesday June 26th, 2007; 12:08 AM CDT
The server has completed and passed all aspects of the hardware diagnostics test including system test, system bus tests, and full hard drive tests. I have placed the server back online and it is responding to ping and ssh. Please contact us if we can be of any further assistance, thank you.
==============================================

Anyone who can give me a tip on something I left out to check would be greatly appreciated.
Here are the contents of crontab -e:
Code:
42 3 * * * /scripts/upcp
0 1 * * * /scripts/cpbackup
*/15 * * * * /usr/local/cpanel/whostmgr/bin/dnsqueue > /dev/null 2>&1
2,58 * * * * /usr/local/bandmin/bandmin
0 0 * * * /usr/local/bandmin/ipaddrmap
50 4 * * * /usr/local/cpanel/whostmgr/docroot/cgi/cpaddons_report.pl --notify
0 6 * * * /scripts/exim_tidydb > /dev/null 2>&1
*/5 * * * * /usr/local/cpanel/bin/dcpumon >/dev/null 2>&1

Kernel:
Code:
Linux a.hcph.us 2.6.9-42.EL #1 Sat Aug 12 09:17:58 CDT 2006 i686 i686 i386 GNU/Linux

Here are the contents of '/var/log/messages'. There's nothing unusual in the Apache logs or the domlogs. This server only hosts 1 website.
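In case it helps anyone line the cron schedule up against the lockup times, here is a rough Python sketch that lists which of the crontab entries above would fire at a given hour and minute. The names `field_matches` and `jobs_running_at` are made up for this example. It assumes the day, month and weekday fields are all `*` (true for every entry in this crontab) and only understands `*`, `*/n`, single numbers and comma lists, not ranges.

```python
# Crontab entries copied from the post above.
CRONTAB = """\
42 3 * * * /scripts/upcp
0 1 * * * /scripts/cpbackup
*/15 * * * * /usr/local/cpanel/whostmgr/bin/dnsqueue > /dev/null 2>&1
2,58 * * * * /usr/local/bandmin/bandmin
0 0 * * * /usr/local/bandmin/ipaddrmap
50 4 * * * /usr/local/cpanel/whostmgr/docroot/cgi/cpaddons_report.pl --notify
0 6 * * * /scripts/exim_tidydb > /dev/null 2>&1
*/5 * * * * /usr/local/cpanel/bin/dcpumon >/dev/null 2>&1
"""


def field_matches(field, value):
    """Return True if a cron minute/hour field matches value.

    Handles '*', '*/n', single numbers and comma-separated lists.
    Ranges such as '1-5' are NOT handled (none appear in this crontab).
    """
    for part in field.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if value % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def jobs_running_at(crontab_text, hour, minute):
    """List the commands scheduled for hour:minute.

    Assumption: day-of-month, month and weekday fields are ignored,
    which is only safe because they are all '*' here.
    """
    hits = []
    for line in crontab_text.splitlines():
        fields = line.split(None, 5)
        # Skip comments, blank lines and anything without 5 time fields + command.
        if len(fields) < 6 or line.lstrip().startswith("#"):
            continue
        if field_matches(fields[0], minute) and field_matches(fields[1], hour):
            hits.append(fields[5])
    return hits


# Example: jobs_running_at(CRONTAB, 3, 42) returns ['/scripts/upcp']
```

If a lockup keeps landing a few minutes after the same entry fires, that job would be the first thing I'd disable for a day.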
I thought it might be a security problem, so I had ThePlanet hook the server up to their CiscoGuard and monitor it for a day. Here's the reply: "I found no signs of compromise at this time. Back to support."
I've run the following in response to some of the messages in '/var/log/messages':
Code:
[root@a ]# /scripts/ftpupd --force
[root@a ]# /scripts/upcp --force

I've also installed PRM from http://r-fx.org/prm.php to catch any runaway processes.
Want to see some console outputs? My escalation procedures at ThePlanet require console outputs for all hard reboots.
Code:
Bennie G. - Wednesday June 27th, 2007; 6:20 PM CDT
Upon attaching console, noticed that the machine was not responsive. No information was available onscreen. Performed a hard reboot, monitored boot process and noticed no errors. Machine now responds to ping and ssh.
==============================================
Jeremy M. - Tuesday June 26th, 2007; 4:49 AM CDT
The server has been rebooted, and is responding to PING and SSH. I did see a lot of firewall-related activity on the server, which seems to be what caused it to lock up.
==============================================
Brandon B. - Monday June 25th, 2007; 6:53 PM CDT
Your server's console was full of netfilter log messages and your server was unresponsive to keyboard input, so I power cycled your server. It is back online again.
*****************************************************************************
bbroyles@PlanetShell ~ $ date && ssh root@74.52.78.186; nmap 74.52.78.186
Mon Jun 25 18:51:23 CDT 2007
ssh: connect to host 74.52.78.186 port 22: Connection refused

Starting nmap 3.93 ( http://www.insecure.org/nmap/ ) at 2007-06-25 18:51 CDT
Interesting ports on server1.companyx.tv (74.52.78.186):
(The 1654 ports scanned but not shown below are in state: filtered)
PORT     STATE  SERVICE
20/tcp   closed ftp-data
21/tcp   open   ftp
22/tcp   closed ssh
25/tcp   open   smtp
53/tcp   open   domain
80/tcp   open   http
110/tcp  open   pop3
143/tcp  open   imap
443/tcp  open   https
465/tcp  open   smtps
953/tcp  closed rndc
993/tcp  open   imaps
995/tcp  open   pop3s
3306/tcp open   mysql

Nmap finished: 1 IP address (1 host up) scanned in 32.136 seconds
*****************************************************************************
==============================================
Matthew L. - Sunday June 24th, 2007; 4:23 PM CDT
We apologize for the delay as we have had a minor issue with our ticket system, but it is now back up.
We are unable to attain a console output with multiple keyboards both USB and PS/2. We will be rebooting your server shortly.
==============================================
Ian R. - Sunday June 24th, 2007; 10:53 AM CDT
Your server is now online and responding to ping and SSH requests. Unfortunately, the server was unresponsive to keyboard input, and as such I was unable to retrieve any console output.
==============================================

For the reboots before the 24th, I never asked for console outputs.
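Since two of the techs mention netfilter/firewall messages flooding the console, one thing I could check is whether those messages spike right before each lockup. Below is a rough Python sketch, not something ThePlanet ran. The function names `netfilter_lines_per_hour` and `busiest_hours` are made up for this example, and it assumes the firewall logs through the iptables LOG target (lines carrying `IN=`, `OUT=` and `SRC=` fields) with the traditional syslog timestamp at the start of each line.

```python
import re
from collections import Counter

# Assumption: iptables LOG target output, e.g. "... IN=eth0 OUT= MAC=... SRC=1.2.3.4 ..."
NETFILTER_LINE = re.compile(r"\bIN=\S*\s+OUT=\S*.*\bSRC=\S+")
# Assumption: traditional syslog timestamp, e.g. "Jun 25 18:51:23"
STAMP = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}")


def netfilter_lines_per_hour(lines):
    """Count netfilter-looking log lines, bucketed by month, day and hour."""
    counts = Counter()
    for line in lines:
        stamp = STAMP.match(line)
        if stamp and NETFILTER_LINE.search(line):
            month, day, hour = stamp.groups()
            counts[f"{month} {int(day):2d} {hour}:00"] += 1
    return counts


def busiest_hours(lines, top=5):
    """Return the `top` hours with the most netfilter lines, busiest first."""
    return netfilter_lines_per_hour(lines).most_common(top)


# Usage on the server (path taken from the post):
#   with open("/var/log/messages", errors="replace") as log:
#       for hour, count in busiest_hours(log):
#           print(hour, count)
```

If the busiest hours sit just before the reboot times in the tickets, that would back up what Jeremy M. and Brandon B. saw on the console. If they don't, the firewall logging is probably a side effect and not the cause.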
Let me mention why I think this is hardware-related. This machine has been freezing ever since I got it. I ordered it through one of their salesmen over the phone, and he failed to put a cPanel license on the order. That's all right; I told him I would just install cPanel myself (I've done it a million times). So I get the box, log in over SSH, check the partitioning, then start the cPanel install. Everything goes fine until the layer2 install starts. It froze about midway through. So I called them and told them to reload the server and install cPanel for me. It took them forever (and they charged me $25), then they handed the machine over to me. Anyway, this server has been crashing daily since day one. I'm sure their next step will be to tell me to reload and reinstall AGAIN, with another reload fee.
If someone could point me in the right direction as to what to ask ThePlanet, I would appreciate it. This server belongs to a customer of mine, and only hosts one website. He's trying to launch a pretty nice website, and can't afford to have server problems.
Sorry if this post is too long. I wanted to make sure I included any relevant information that I had.







