server locking up, and not sure how to figure out why

schwim

Well-Known Member
Aug 2, 2006
Hi there guys,

I've got a server:

CPU: P4 3.0 GHz
Mem: 2 GB
OS: Fedora Core 4
Apache: 1.3.37

Its load is quite low at the moment (I haven't moved all of the sites over yet), but every day or so it shuts down and I have to call the host to reboot it. I've checked /var/log/messages, and all I see is this (last 100 lines):

Sep 8 12:30:04 server kernel: audit(1157733004.107:23821): user pid=26879 uid=0 auid=0 msg='PAM session close: user=root exe="/usr/sbin/crond" (hostname=?, addr=?, terminal=cron result=Success)'
Sep 8 12:30:05 server kernel: Firewall: *UDP_IN Blocked* IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:15:c7:c7:07:40:08:00 SRC=208.109.96.4 DST=255.255.255.255 LEN=130 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=UDP SPT=34454 DPT=22 LEN=110
Sep 8 12:30:05 server kernel: Firewall: *UDP_IN Blocked* IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:15:c7:c7:07:40:08:00 SRC=208.109.96.4 DST=255.255.255.255 LEN=130 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=UDP SPT=34454 DPT=22 LEN=110
Sep 8 12:30:07 server kernel: Firewall: *UDP_IN Blocked* IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:15:c7:c7:07:40:08:00 SRC=208.109.96.4 DST=255.255.255.255 LEN=130 TOS=0x00 PREC=0x00 TTL=63 ID=0 DF PROTO=UDP SPT=34456 DPT=22 LEN=110
Apr 18 01:03:49 server syslogd 1.4.1: restart.
Apr 18 01:03:49 server kernel: klogd 1.4.1, log source = /proc/kmsg started.
Apr 18 01:03:49 server kernel: Linux version 2.6.16-1.2115_FC4_HPTRAID ([email protected]) (gcc version 4.0.2 20051125 (Red Hat 4.0.2-8)) #1 SMP Mon Jun 12 16:13:33 MST 2006
Apr 18 01:03:49 server kernel: BIOS-provided physical RAM map:
Apr 18 01:03:49 server kernel: BIOS-e820: 0000000000000000 - 000000000009e000 (usable)
Apr 18 01:03:49 server kernel: BIOS-e820: 000000000009e000 - 00000000000a0000 (reserved)
Apr 18 01:03:49 server kernel: BIOS-e820: 00000000000f0000 - 0000000000100000 (reserved)
Apr 18 01:03:49 server kernel: BIOS-e820: 0000000000100000 - 000000007eef0000 (usable)
Apr 18 01:03:49 server kernel: BIOS-e820: 000000007eef0000 - 000000007eef3000 (ACPI NVS)
Apr 18 01:03:49 server kernel: BIOS-e820: 000000007eef3000 - 000000007ef00000 (ACPI data)
Which doesn't tell me a whole heck of a lot. The times and dates are all out of whack, and I can't find anything that shows a reason for a shutdown. I'm assuming that the Apr 18 entries are from when the server was being rebooted. I don't see anything wrong in the final moments of the server logs. :(
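Since the interesting lines are the ones written just before each reboot, one way to find them is to locate every syslogd restart marker (the "syslogd 1.4.1: restart." line above) and print the lines that precede it. This is a minimal sketch, assuming the log uses the same restart wording as the excerpt above; the function name and the default of 20 context lines are illustrative choices, not part of any existing tool.

```python
import re
from typing import List

# Matches the marker syslogd writes when it starts, as seen in the log above:
# "Apr 18 01:03:49 server syslogd 1.4.1: restart."
RESTART_RE = re.compile(r"syslogd [\d.]+: restart\.")


def lines_before_reboots(lines: List[str], context: int = 20) -> List[List[str]]:
    """For each syslogd restart marker, return up to `context` lines logged before it."""
    chunks = []
    for i, line in enumerate(lines):
        if RESTART_RE.search(line):
            # The lines just before the restart are the last ones written
            # before the box went down.
            chunks.append(lines[max(0, i - context):i])
    return chunks

# Usage (hypothetical):
#   with open("/var/log/messages", errors="replace") as fh:
#       for chunk in lines_before_reboots(fh.read().splitlines()):
#           print("\n".join(chunk)); print("---")
```

If a chunk ends in ordinary cron or firewall noise, as in the excerpt above, the machine probably died without logging anything, which itself points toward a hard hang or a hardware/driver problem rather than an orderly shutdown.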

How in the heck do I find out what is causing this problem?

thanks,
json
 

schwim

Thanks for the reply, celliott.

My host tells me not to upgrade it because of the built-in RAID support. Is there a way to find out whether that is what's causing the problem?
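One rough way to start checking whether the RAID driver is involved is to pull out only the kernel messages that mention the storage driver or generic failure words, across all of the rotated messages files. This is a sketch only: the keyword list is an assumption (the "hpt" entry is a guess based on the HPTRAID suffix in the kernel version string), and a clean result does not rule the driver out, since a hard hang often logs nothing at all.

```python
import re
from typing import Iterable, List

# Hypothetical keyword list: adjust to whatever your driver and kernel actually log.
SUSPECT_PATTERNS = [r"hpt", r"raid", r"I/O error", r"Oops", r"panic"]
SUSPECT_RE = re.compile("|".join(SUSPECT_PATTERNS), re.IGNORECASE)


def suspect_kernel_lines(lines: Iterable[str]) -> List[str]:
    """Keep only kernel messages that mention one of the suspect patterns."""
    return [line for line in lines
            if " kernel: " in line and SUSPECT_RE.search(line)]
```

Note that the boot banner itself ("Linux version ..._HPTRAID") will match, so expect one hit per reboot; anything else that shows up shortly before a restart is worth a closer look.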

thanks,
json
 

rhenderson

Well-Known Member
Apr 21, 2005
Oklahoma
cPanel Access Level
Root Administrator
The above is not much help; it appears to show what happens after the reboot. Have you looked at the other error logs as well, e.g. the httpd error log? You might have to go back farther than 100 lines, but I have seen instances where the problem was never recorded in the messages file. When the sites are down over the web, can you still SSH in?

You might look at SIM and PRM from RfxNetworks and install those. Then you could get an email about a runaway process as its load starts to climb, and possibly even have it killed and restarted before it takes the whole server down.

I had a problem a while back with a faulty script someone was running: it looped writing a file to /tmp and filled up the /tmp folder, which then caused write errors. The only way I found it was that, when the server started slowing down, I was able to SSH in, run ps aux, find the PID with the big load and strace it; then the problem was obvious. Nothing was ever reported to the /var/log/messages file.
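The "SSH in, run ps aux, find the PID with the big load" step can be sketched in a few lines, which is handy if you want to run it from cron and save a snapshot before the box locks up. This is a minimal sketch, assuming the standard procps `ps aux` column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the 50% threshold and the function names are illustrative. Keep in mind that `ps` reports %CPU averaged over each process's lifetime, so a process that only just started spinning may still show a modest number.

```python
import subprocess
from dataclasses import dataclass
from typing import List


@dataclass
class Proc:
    user: str
    pid: int
    cpu: float
    mem: float
    command: str


def parse_ps_aux(text: str) -> List[Proc]:
    """Parse `ps aux` output into Proc records."""
    procs = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)   # COMMAND may itself contain spaces
        if len(fields) < 11:
            continue
        procs.append(Proc(fields[0], int(fields[1]), float(fields[2]),
                          float(fields[3]), fields[10]))
    return procs


def runaway(procs: List[Proc], cpu_limit: float = 50.0) -> List[Proc]:
    """Processes above `cpu_limit` %CPU, busiest first: candidates for `strace -p PID`."""
    return sorted((p for p in procs if p.cpu > cpu_limit),
                  key=lambda p: p.cpu, reverse=True)


def snapshot() -> List[Proc]:
    """Take a live snapshot by running `ps aux` (Linux procps)."""
    out = subprocess.run(["ps", "aux"], capture_output=True,
                         text=True, check=True).stdout
    return parse_ps_aux(out)
```

For the /tmp case described above, pairing this with a periodic free-space check on /tmp would have shown the folder filling up before the write errors started.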
 

schwim

Hi there rhenderson, and thanks very much for your reply,

I agree that it's of no use. I've edited the log view so that it actually shows the last time it was up before the reboot. I checked Apache's logs as well, and they only showed missing images up until the shutdown.

I'm looking at rfxnetworks' scripts and I'd like to install them, but I just had WayOfTheWeb make their modifications to the server, and I'm afraid I'll create a conflict of some sort if I install competing scripts (both sets of scripts seem to do some of the same things). Also, the tar files for PRM and SIM contain absolutely no install instructions, and I'm not the person you want to trust to install a server script without instructions :)

I've written to WOTW support to ask whether there's any problem with integration or whether they offer a comparable service.

thanks,
json
 

rhenderson


Good call. I don't know what is installed on the server, but anything that monitors process load and emails you when there is a problem, or kills and restarts whatever is causing a big server load, will help, as long as it takes a snippet of the log and sends it to you. The RFX scripts are easy to install (if I remember right, it was something like ./install after untarring them; see http://www.rfxnetworks.com/appdocs/README.sim). You might also check WHM's CPU/Memory/MySQL Usage page, which should show an average since midnight; it might contain a clue.
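Until one of those packages is in place, the "watch the load and email me a snippet of the log" idea can be approximated with a small script run from cron. This is a sketch under several assumptions: the 1-minute load average is read from the first field of /proc/loadavg, a local mail server is listening on localhost port 25, and the threshold, addresses and tail length you pass in are placeholders to adjust. It only alerts; it does not kill anything.

```python
import smtplib
from collections import deque
from email.message import EmailMessage
from typing import List, Optional


def parse_loadavg(text: str) -> float:
    """The first field of /proc/loadavg is the 1-minute load average."""
    return float(text.split()[0])


def build_alert(load: float, limit: float, log_tail: List[str],
                to_addr: str, from_addr: str) -> Optional[EmailMessage]:
    """Return an alert message when `load` exceeds `limit`, else None."""
    if load <= limit:
        return None
    msg = EmailMessage()
    msg["Subject"] = f"High load: {load:.2f} (limit {limit:.2f})"
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content("Last log lines:\n" + "\n".join(log_tail))
    return msg


def check_and_send(limit: float, to_addr: str, from_addr: str,
                   log_path: str = "/var/log/messages", tail: int = 50) -> None:
    """Meant to be run from cron; assumes an MTA on localhost:25."""
    with open("/proc/loadavg") as fh:
        load = parse_loadavg(fh.read())
    with open(log_path, errors="replace") as fh:
        # deque keeps only the last `tail` lines without loading the whole file.
        log_tail = [line.rstrip("\n") for line in deque(fh, maxlen=tail)]
    msg = build_alert(load, limit, log_tail, to_addr, from_addr)
    if msg is not None:
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)
```

Running it every minute or two means that even if the box locks up hard, the last alert in your inbox should show what the load and the log looked like shortly before.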
 

schwim

I'm waiting on responses from both service providers regarding the scripts, but I've been keeping an eye on CPU/Memory/MySQL usage; you can find the page source here

MailScanner's usage seems very high, doesn't it? I had the lockup problem before MailScanner was even installed, but I know it could exacerbate things if it's eating up too much CPU or memory.
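To judge whether one program really dominates, it can help to add up %CPU and %MEM across all of its processes, since a daemon that runs several children (as MailScanner may) looks small per process but large in total. This is a minimal sketch, assuming standard procps `ps aux` output; the name normalization (dropping the directory and a trailing colon from rewritten process titles) is an illustrative heuristic, not how WHM computes its figures.

```python
from collections import defaultdict
from typing import Dict, Tuple


def usage_by_command(ps_aux_text: str) -> Dict[str, Tuple[float, float, int]]:
    """Sum %CPU and %MEM per program name from `ps aux` output.

    Returns {name: (total_cpu, total_mem, process_count)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for line in ps_aux_text.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        # First word of COMMAND without its directory, e.g. "/usr/sbin/httpd" -> "httpd";
        # titles like "name: status" are reduced to "name".
        name = fields[10].split()[0].rsplit("/", 1)[-1].rstrip(":")
        entry = totals[name]
        entry[0] += float(fields[2])
        entry[1] += float(fields[3])
        entry[2] += 1
    return {k: (round(v[0], 2), round(v[1], 2), v[2]) for k, v in totals.items()}
```

Sorting the result by total CPU and comparing snapshots taken at quiet and busy times should show whether the mail scanner is consistently the top consumer or only spikes occasionally.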

Does anything else look out of sorts to you?

thanks,
json