blackice

Active Member
Dec 21, 2004
40
0
156
Oxfordshire UK
Hi there,

We've had a problem with one machine which, at the same times each day (it's a pattern), completely locks up and refuses to do anything. Inspection of the system via KVM over IP shows a kernel panic, but as far as I can see there's nothing in any of the logs that gives a clue to what might be causing it.

If I go any further without some error messages, someone's going to shoot me ;) This is from /var/log/messages:

Code:
Aug 30 04:13:38 lapwing xinetd[3328]: Exiting...
Aug 30 04:13:38 lapwing xinetd: xinetd shutdown succeeded
Aug 30 04:13:39 lapwing xinetd: xinetd startup succeeded
Aug 30 04:13:39 lapwing xinetd[342]: Server in.ntalkd is not executable [file=/etc/xinetd.d/ntalk] [line=8]
Aug 30 04:13:39 lapwing xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/ntalk] [line=8]
Aug 30 04:13:39 lapwing xinetd[342]: Server in.qpopper is not executable [file=/etc/xinetd.d/pop-3] [line=8]
Aug 30 04:13:39 lapwing xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/pop-3] [line=8]
Aug 30 04:13:39 lapwing xinetd[342]: Server in.talkd is not executable [file=/etc/xinetd.d/talk] [line=8]
Aug 30 04:13:39 lapwing xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/talk] [line=8]
Aug 30 04:13:39 lapwing xinetd[342]: Server in.telnetd is not executable [file=/etc/xinetd.d/telnet] [line=8]
Aug 30 04:13:39 lapwing xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/telnet] [line=8]
Aug 30 04:13:40 lapwing xinetd[342]: Must specify a server in ntalk
Aug 30 04:13:40 lapwing xinetd[342]: Must specify a server in pop-3
Aug 30 04:13:40 lapwing xinetd[342]: Must specify a server in talk
Aug 30 04:13:40 lapwing xinetd[342]: Must specify a server in telnet
Aug 30 04:13:40 lapwing xinetd[342]: xinetd Version 2.3.12 started with libwrap loadavg options compiled in.
Aug 30 04:13:40 lapwing xinetd[342]: Started working: 1 available service
Aug 30 04:13:54 lapwing stunnel[3696]: Connection closed: 2674 bytes sent to SSL, 622 bytes sent to socket
This is a slice from slightly before the 04:15 crash, and it's the only suspicious thing I've turned up.

Any suggestions on where to look, how to fix it and what might be causing it?

Anything is a welcome suggestion right now XD

We're moving to a new machine tomorrow, so it's not really urgent, but I'd like to sort it out all the same :)
 

chirpy

Well-Known Member
Verified Vendor
Jun 15, 2002
13,437
31
473
Go on, have a guess
Couple of things:

1. Make sure that the laus rpm is not installed; if it is, remove it using the instructions that I've posted on the forum.

2. If you're running APF, make sure that /etc/apf/deny_hosts.rules and/or /etc/apf/ad/ad.rules aren't very big. If they are, clear them down and restart APF.

3. Make sure the cpbackup and upcp cron jobs are not clashing with the daily 04:00 cron run.
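
For anyone who wants a quick way to check point 2, here is a minimal Python sketch that counts the entries in the two APF files named above. The 1000-entry threshold, the treatment of `#` lines as comments, and the helper names `count_rules` and `oversized` are my own assumptions for illustration, not anything APF defines.

```python
from pathlib import Path

# Paths are the ones mentioned in the post; the threshold is an arbitrary assumption.
APF_RULE_FILES = ["/etc/apf/deny_hosts.rules", "/etc/apf/ad/ad.rules"]
MAX_ENTRIES = 1000  # hypothetical cut-off for "very big"


def count_rules(path):
    """Count non-blank, non-comment lines in a rules file; 0 if the file is missing."""
    p = Path(path)
    if not p.is_file():
        return 0
    with p.open(errors="replace") as fh:
        return sum(
            1 for line in fh
            if line.strip() and not line.lstrip().startswith("#")
        )


def oversized(paths, limit=MAX_ENTRIES):
    """Return {path: entry_count} for every file holding more entries than the limit."""
    counts = {path: count_rules(path) for path in paths}
    return {path: n for path, n in counts.items() if n > limit}


if __name__ == "__main__":
    for path, n in oversized(APF_RULE_FILES).items():
        print(f"{path}: {n} entries, consider clearing it down and restarting APF")
```

If nothing is printed, neither file is above the chosen limit (or neither exists on the box).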
 

blackice

Hi,

Yes, APF is installed. The files were a tad big, so I cleaned them out and restarted APF. laus is not installed.

I don't think the crons were clashing, but I've set upcp to run at 8PM GMT every day. cpbackup will run at 11PM GMT.

I'll see if that fixes it and get back to you ;) Many thanks.
 

blackice

Still not fixed.

Nothing's changed, despite the APF cleanout.


Any ideas from here?

Edit: While trying to transfer to another machine, it seems to be timing out during the transfer process.

Code:
..38%.. ..38%.. ..38%.. ..38%.. ..38%.. ..39%.. ..39%.. ..39%.. ..40%.. ..40%.. ..40%.. ..41%.. ..41%.. ..41%.. ..41%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..43%.. ..43%.. ..43%.. ..43%.. ..44%.. ..44%.. ..44%.. ..44%.. ..44%.. ..45%.. ..45%.. ..45%.. ..45%.. ..45%.. ..45%.. ..46%.. ..46%.. ..47%.. ..47%.. ..47%.. ..47%.. ..48%.. ..48%.. ..48%..
Timeout ... (internal death)
Thu Sep 1 08:52:54 2005 [30916] error: Died at whostmgr/bin/whostmgr2.pl line 5148.
main::__ANON__('ALRM') called at whostmgr/bin/whostmgr2.pl line 5161
eval {...} called at whostmgr/bin/whostmgr2.pl line 5144
main::scpsession('Copying account package file', '[PASS CENSORED]', '/scripts/sshcontrol', '--ctl', 'scp', '--user', 'root', '--host', ...) called at whostmgr/bin/whostmgr2.pl line 19727
main::remotecopy('txt', 'Copying account package file', 'password', '[PASS CENSORED]', 'user', 'root', 'port', 22, ...) called at whostmgr/bin/whostmgr2.pl line 3752
main::copyacct() called at whostmgr/bin/whostmgr2.pl line 443
[a fatal error or timeout occurred while processing this directive]

Any ideas for a fix for this, while we're at it? :(
 

iCARus

Well-Known Member
Apr 8, 2003
113
0
166
We have problems with one server too. It runs CentOS 3.1 with the latest CURRENT cPanel build.
The server just kills all services, but it is still online and responds to ping. Only httpd, mail, ... don't work.

We get the same errors in the messages log after a reboot:
Aug 30 04:13:39 xinetd[342]: Server in.ntalkd is not executable [file=/etc/xinetd.d/ntalk] [line=8]
Aug 30 04:13:39 xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/ntalk] [line=8]
Aug 30 04:13:39 xinetd[342]: Server in.qpopper is not executable [file=/etc/xinetd.d/pop-3] [line=8]
Aug 30 04:13:39 xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/pop-3] [line=8]
Aug 30 04:13:39 xinetd[342]: Server in.talkd is not executable [file=/etc/xinetd.d/talk] [line=8]
Aug 30 04:13:39 xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/talk] [line=8]
Aug 30 04:13:39 xinetd[342]: Server in.telnetd is not executable [file=/etc/xinetd.d/telnet] [line=8]
Aug 30 04:13:39 xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/telnet] [line=8]
We have had 3 crashes, but the times were all different (1st crash = 24.8.05 at 02:39 PM, 2nd crash = 25.8.05 at 12:04 PM, 3rd crash = today at 04:19 PM).
This is really weird, because there is nothing suspicious in the logs; the server just closes all services. RAM is OK, CPU is OK... and the server worked for months without any downtime.

We have already checked what chirpy suggested, but still nothing.

Any ideas? Anyone?

Thanks all!
 

iCARus

Thanks for the suggestion, Sheldon. I'll try it tonight.

Regards.
 

chirpy

Those xinetd messages, while annoying, aren't causing a problem (they're just enabled services in /etc/xinetd.d/ that don't work). I'd go with what Sheldon advised, but be aware that the FSCK could take some time to run before your server comes back up again. I would also recommend you upgrade to CentOS v3.5 from v3.1
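
To see at a glance which /etc/xinetd.d/ entries are producing those messages, something like the following Python sketch would do. It only pattern-matches the log format shown earlier in this thread; the function name `broken_services` and the sample list are illustrative assumptions, not part of xinetd.

```python
import re

# Matches the "Server X is not executable [file=...]" lines quoted in this thread.
NOT_EXECUTABLE = re.compile(
    r"xinetd\[\d+\]: Server (?P<server>\S+) is not executable "
    r"\[file=(?P<file>[^\]]+)\]"
)


def broken_services(log_lines):
    """Map each xinetd config file to the server binary that could not be run."""
    found = {}
    for line in log_lines:
        match = NOT_EXECUTABLE.search(line)
        if match:
            found[match.group("file")] = match.group("server")
    return found


# Sample lines copied from the /var/log/messages excerpt in the first post.
SAMPLE = [
    "Aug 30 04:13:39 lapwing xinetd[342]: Server in.ntalkd is not executable [file=/etc/xinetd.d/ntalk] [line=8]",
    "Aug 30 04:13:39 lapwing xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/ntalk] [line=8]",
    "Aug 30 04:13:39 lapwing xinetd[342]: Server in.telnetd is not executable [file=/etc/xinetd.d/telnet] [line=8]",
]

if __name__ == "__main__":
    for config_file, server in broken_services(SAMPLE).items():
        print(f"{config_file}: {server} is missing or not executable")
```

Setting `disable = yes` in each file it lists should stop xinetd from complaining about them on every restart.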
 

blackice

chirpy said:
Those xinetd messages, while annoying, aren't causing a problem (they're just enabled services in /etc/xinetd.d/ that don't work). I'd go with what Sheldon advised, but be aware that the FSCK could take some time to run before your server comes back up again. I would also recommend you upgrade to CentOS v3.5 from v3.1
We're going to try the reboot with a filesystem check now. We're using CentOS 3.4 on our machine.

Also, we've got another machine, but cPanel's transfer script won't work. Is there a way to transfer the accounts manually? The new machine is completely fresh (apart from a configured cPanel etc., same OS), but we're planning to switch the IPs so that there's no downtime while DNS changes go through.

I.e.: transfer the accounts, move the old box to a spare IP, and give the new box the old box's IP.

Anyone know a manual way to transfer? Of course, it may not be needed, but it'd be helpful to know anyway :)
 

iCARus

Sorry, my mistake: this box has CentOS 3.5. So we will try the FSCK.
 

iCARus

OK.
The problem is still there. Today we had another crash and all services stopped responding.

This is the "top" output from just a few seconds before the freeze:

9 root 17 0 0 0 0 RW 96.1 0.0 6:00 0 bdflush
2768 root 15 0 10980 10M 1848 S 16.9 0.3 7:55 0 /usr/local/apache/bin/httpd -DSSL
15190 root 16 0 1652 1652 892 R 3.3 0.0 0:00 1 top
5557 mailnull 15 0 1892 1892 1544 S 0.9 0.0 0:03 1 /usr/sbin/exim -bd
5337 root 15 0 1464 1464 1048 S 0.4 0.0 0:04 0 sshd: root@pts/2
1 root 15 0 500 500 440 S 0.0 0.0 0:10 1 init
2 root RT 0 0 0 0 SW 0.0 0.0 0:00 0 migration/0
3 root RT 0 0 0 0 SW 0.0 0.0 0:00 1 migration/1
4 root 15 0 0 0 0 SW 0.0 0.0 0:00 1 keventd
5 root 34 19 0 0 0 SWN 0.0 0.0 0:00 0 ksoftirqd/0
6 root 34 19 0 0 0 SWN 0.0 0.0 0:00 1 ksoftirqd/1
7 root 15 0 0 0 0 SW 0.0 0.0 2:33 0 kswapd
8 root 15 0 0 0 0 SW 0.0 0.0 3:48 0 kscand
10 root 15 0 0 0 0 SW 0.0 0.0 1:06 0 kupdated
11 root 25 0 0 0 0 SW 0.0 0.0 0:00 1 mdrecoveryd
17 root 25 0 0 0 0 SW 0.0 0.0 0:00 1 scsi_eh_0
20 root 15 0 0 0 0 SW 0.0 0.0 7:29 0 kjournald
98 root 25 0 0 0 0 SW 0.0 0.0 0:00 1 khubd
665 root 15 0 0 0 0 SW 0.0 0.0 0:00 0 kjournald
1117 root 15 0 576 576 496 S 0.0 0.0 1:04 0 syslogd -m 0
1121 root 15 0 440 440 380 S 0.0 0.0 0:01 0 klogd -x
bdflush just came out of nowhere and the services froze, but ping still works. Does anyone have any idea?
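
In case it helps anyone catch the same thing, here is a rough Python sketch that picks the busiest process out of a saved snapshot like the one above. It assumes the ninth whitespace-separated column is %CPU and that the command starts at the thirteenth, which is how the lines above appear to be laid out; the names `parse_top_line` and `busiest` are made up for this example.

```python
def parse_top_line(line):
    """Parse one process line; return None for headers or anything malformed."""
    fields = line.split(None, 12)  # keep the command (with its arguments) whole
    if len(fields) < 13:
        return None
    try:
        return {
            "pid": int(fields[0]),
            "user": fields[1],
            "cpu": float(fields[8]),  # assumed %CPU column
            "command": fields[12],
        }
    except ValueError:
        return None


def busiest(lines):
    """Return the parsed process with the highest %CPU, or None if none parse."""
    procs = [p for p in map(parse_top_line, lines) if p is not None]
    return max(procs, key=lambda p: p["cpu"]) if procs else None


# A few lines copied from the snapshot above.
SNAPSHOT = [
    "9 root 17 0 0 0 0 RW 96.1 0.0 6:00 0 bdflush",
    "2768 root 15 0 10980 10M 1848 S 16.9 0.3 7:55 0 /usr/local/apache/bin/httpd -DSSL",
    "5557 mailnull 15 0 1892 1892 1544 S 0.9 0.0 0:03 1 /usr/sbin/exim -bd",
]

if __name__ == "__main__":
    top_proc = busiest(SNAPSHOT)
    if top_proc:
        print(f"{top_proc['command']} (pid {top_proc['pid']}) at {top_proc['cpu']}% CPU")
```

Running it against snapshots saved at intervals would show whether bdflush is always the process at the top just before a freeze.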
 

iCARus

We took the server offline and checked the RAM, and we found that one 1 GB module is not compatible with the other two.
We removed that module and the server has now been running for almost 2 days without any big load.

I think the RAM was the guilty party behind all our troubles ;)

We're still waiting to see whether the trouble occurs again.

Regards.