The Community Forums

Really weird server problem..

Discussion in 'General Discussion' started by blackice, Aug 30, 2005.

  1. blackice

    blackice Active Member

    Joined:
    Dec 21, 2004
    Messages:
    40
    Location:
    Oxfordshire UK
    Hi there,

    We've had a problem with one machine which, at the same times each day (it's a pattern), completely locks up and stops responding. Inspecting the system via KVM over IP shows a kernel panic, but as far as I can see there's nothing in any of the logs that gives a clue to what might be causing it.

    If I go any further without posting some error messages, someone's going to shoot me ;) This is from /var/log/messages:

    Code:
     Aug 30 04:13:38 lapwing xinetd[3328]: Exiting...
    Aug 30 04:13:38 lapwing xinetd: xinetd shutdown succeeded
    Aug 30 04:13:39 lapwing xinetd: xinetd startup succeeded
    Aug 30 04:13:39 lapwing xinetd[342]: Server in.ntalkd is not executable [file=/etc/xinetd.d/ntalk] [line=8]
    Aug 30 04:13:39 lapwing xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/ntalk] [line=8]
    Aug 30 04:13:39 lapwing xinetd[342]: Server in.qpopper is not executable [file=/etc/xinetd.d/pop-3] [line=8]
    Aug 30 04:13:39 lapwing xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/pop-3] [line=8]
    Aug 30 04:13:39 lapwing xinetd[342]: Server in.talkd is not executable [file=/etc/xinetd.d/talk] [line=8]
    Aug 30 04:13:39 lapwing xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/talk] [line=8]
    Aug 30 04:13:39 lapwing xinetd[342]: Server in.telnetd is not executable [file=/etc/xinetd.d/telnet] [line=8]
    Aug 30 04:13:39 lapwing xinetd[342]: Error parsing attribute server - DISABLING SERVICE [file=/etc/xinetd.d/telnet] [line=8]
    Aug 30 04:13:40 lapwing xinetd[342]: Must specify a server in ntalk
    Aug 30 04:13:40 lapwing xinetd[342]: Must specify a server in pop-3
    Aug 30 04:13:40 lapwing xinetd[342]: Must specify a server in talk
    Aug 30 04:13:40 lapwing xinetd[342]: Must specify a server in telnet
    Aug 30 04:13:40 lapwing xinetd[342]: xinetd Version 2.3.12 started with libwrap loadavg options compiled in.
    Aug 30 04:13:40 lapwing xinetd[342]: Started working: 1 available service
    Aug 30 04:13:54 lapwing stunnel[3696]: Connection closed: 2674 bytes sent to SSL, 622 bytes sent to socket
    
    This is a slice from slightly before the 04:15 crash, and it's the only suspicious thing I've turned up.

    Any suggestions on where to look, how to fix it and what might be causing it?

    Anything is a welcome suggestion right now XD

    We're moving to a new machine tomorrow, so it isn't urgent, but I'd like to sort it out all the same :)
     
    #1 blackice, Aug 30, 2005
    Last edited: Aug 30, 2005
  2. shashank

    shashank Well-Known Member
    PartnerNOC

    Joined:
    Apr 12, 2003
    Messages:
    159
    cPanel Access Level:
    Root Administrator
    Does it crash around 4 AM daily?
     
  3. blackice

    blackice Active Member

    Joined:
    Dec 21, 2004
    Messages:
    40
    Location:
    Oxfordshire UK
    It's at about 05:15 GMT daily, plus a few other times, not just that one.
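    Since the lock-ups follow a daily pattern, one way to pin down each occurrence is to look for silences in /var/log/messages: once the kernel panics, syslogd stops writing, so a long gap followed by boot messages marks a candidate crash window. A minimal Python sketch, assuming the classic syslog timestamp format shown in the log above ("Aug 30 04:13:38 ..."). Syslog omits the year, so `year` is a caller-supplied assumption, and the 10-minute threshold is arbitrary. A quiet but healthy server also produces gaps, so treat the results as candidates to cross-check against the KVM console.

```python
from datetime import datetime, timedelta


def parse_syslog_time(line, year):
    """Parse the 'Mon DD HH:MM:SS' prefix of a classic syslog line.

    Syslog does not record the year, so the caller supplies it (assumption).
    Returns None for lines that don't start with a timestamp.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None


def find_gaps(lines, year, threshold=timedelta(minutes=10)):
    """Return (last_entry_before_gap, first_entry_after_gap) pairs for every
    place where consecutive log entries are more than `threshold` apart."""
    gaps = []
    prev = None
    for line in lines:
        ts = parse_syslog_time(line, year)
        if ts is None:
            continue  # skip continuation lines or anything unparseable
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts))
        prev = ts
    return gaps
```

    For example, `find_gaps(open("/var/log/messages"), 2005)` lists the silent windows; if they keep starting near the same minute of the day, scheduled jobs are a sensible first suspect.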
     
  4. chirpy

    chirpy Well-Known Member

    Joined:
    Jun 15, 2002
    Messages:
    13,475
    Location:
    Go on, have a guess
    A few things:

    1. Make sure that the laus RPM is not installed; if it is, remove it using the instructions I've posted on the forum.

    2. If you're running APF, make sure that /etc/apf/deny_hosts.rules and/or /etc/apf/ad/ad.rules aren't very big. If they are, clear them down and restart APF.

    3. Make sure the cpbackup and upcp cron jobs are not clashing with the daily 04:00 cron run.
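    Point 3 can be checked mechanically by listing every cron entry that fires during a given hour. A rough Python sketch: it expands the minute and hour fields (supporting `*`, plain numbers, `a-b` ranges, comma lists and `/step`), skips `@daily`-style shortcuts and environment lines, and takes a `has_user_field` flag because /etc/crontab carries a user column while per-user crontabs (`crontab -l`) do not. On stock CentOS the daily run-parts job in /etc/crontab typically starts at 04:02, but check your own file; the schedules used in the tests are hypothetical.

```python
def expand_field(field, lo, hi):
    """Expand one cron time field ('*', '4', '1,5', '2-6', '*/15') into the
    set of values it matches. Month/day names such as 'mon' are not handled."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/", 1)
            step = int(step_text)
        if part == "*":
            start, end = lo, hi
        elif "-" in part:
            first, last = part.split("-", 1)
            start, end = int(first), int(last)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def jobs_in_hour(crontab_text, hour, has_user_field=False):
    """Return (sorted_minutes, command) for every entry that fires in `hour`."""
    hits = []
    for raw in crontab_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments and @reboot/@daily-style shortcuts (not expanded here).
        if not line or line.startswith(("#", "@")):
            continue
        fields = line.split()
        # Environment assignments such as MAILTO=root have too few fields.
        if len(fields) < 6 or "=" in fields[0]:
            continue
        if hour in expand_field(fields[1], 0, 23):
            command = " ".join(fields[6 if has_user_field else 5:])
            hits.append((sorted(expand_field(fields[0], 0, 59)), command))
    return hits
```

    Run it on `open("/etc/crontab").read()` with `has_user_field=True`, then on the output of `crontab -l` with the default, and compare which jobs land in the 4 AM hour.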
     
  5. blackice

    blackice Active Member

    Joined:
    Dec 21, 2004
    Messages:
    40
    Location:
    Oxfordshire UK
    Hi,

    Yes, APF is installed. The files were a tad big, so I cleaned them out and restarted APF. Laus is not installed.

    I don't think the crons were clashing, but I've set upcp to run at 8 PM GMT every day, and cpbackup will run at 11 PM GMT.

    I'll see if that fixes it and get back to you ;) Many thanks.
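    To keep an eye on the APF files from point 2 of chirpy's list, here is a small Python sketch that counts the entries in /etc/apf/deny_hosts.rules and /etc/apf/ad/ad.rules. It assumes one entry per non-blank, non-comment line, and the 5000-entry limit is an arbitrary placeholder rather than an APF setting.

```python
import os

# Paths taken from the post above; the limit below is a hypothetical threshold.
APF_RULE_FILES = ("/etc/apf/deny_hosts.rules", "/etc/apf/ad/ad.rules")


def count_entries(lines):
    """Count non-blank lines that are not '#' comments (one entry per line assumed)."""
    return sum(1 for line in lines if line.strip() and not line.lstrip().startswith("#"))


def oversized(paths=APF_RULE_FILES, limit=5000):
    """Return {path: entry_count} for existing files holding more than `limit` entries."""
    report = {}
    for path in paths:
        if os.path.exists(path):
            with open(path) as f:
                n = count_entries(f)
            if n > limit:
                report[path] = n
    return report
```

    Run from cron, anything it reports can be cleared down and APF restarted, as described above.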
     
  6. blackice

    blackice Active Member

    Joined:
    Dec 21, 2004
    Messages:
    40
    Location:
    Oxfordshire UK
    Still not fixed.

    Nothing has changed, despite the APF cleanout.

    Any ideas from here?

    Edit: While trying to transfer to another machine, it seems to be timing out during the transfer process.

    Code:
    ..38%.. ..38%.. ..38%.. ..38%.. ..38%.. ..39%.. ..39%.. ..39%.. ..40%.. ..40%.. ..40%.. ..41%.. ..41%.. ..41%.. ..41%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..42%.. ..43%.. ..43%.. ..43%.. ..43%.. ..44%.. ..44%.. ..44%.. ..44%.. ..44%.. ..45%.. ..45%.. ..45%.. ..45%.. ..45%.. ..45%.. ..46%.. ..46%.. ..47%.. ..47%.. ..47%.. ..47%.. ..48%.. ..48%.. ..48%..
    Timeout ... (internal death) Thu Sep 1 08:52:54 2005 [30916] error: Died at whostmgr/bin/whostmgr2.pl line 5148.
        main::__ANON__('ALRM') called at whostmgr/bin/whostmgr2.pl line 5161
        eval {...} called at whostmgr/bin/whostmgr2.pl line 5144
        main::scpsession('Copying account package file', '[PASS CENSORED]', '/scripts/sshcontrol', '--ctl', 'scp', '--user', 'root', '--host', ...) called at whostmgr/bin/whostmgr2.pl line 19727
        main::remotecopy('txt', 'Copying account package file', 'password', '[PASS CENSORED]', 'user', 'root', 'port', 22, ...) called at whostmgr/bin/whostmgr2.pl line 3752
        main::copyacct() called at whostmgr/bin/whostmgr2.pl line 443
    [a fatal error or timeout occurred while processing this directive]

    Any ideas for a fix for this, while we're at it? :(
     
    #6 blackice, Sep 1, 2005
    Last edited: Sep 1, 2005
  7. iCARus

    iCARus Well-Known Member

    Joined:
    Apr 8, 2003
    Messages:
    113
    We have problems with one server too. It's CentOS 3.1 with the latest CURRENT cPanel.
    The server just kills all services, but it stays online and answers ping. Only httpd, mail, etc. stop working.

    We get the same errors in messages after reboot.
    We've had 3 outages, and the times were different each time (1st crash: 24.8.05 at 02:39 PM; 2nd crash: 25.8.05 at 12:04 PM; 3rd crash: today at 04:19 PM).
    This is really weird, because there's nothing suspicious in the logs; the server just shuts down all services. RAM is OK, CPU is OK... the server worked for months without any downtime.

    We've already checked what chirpy suggested, but still nothing.

    Any ideas? Anyone?

    Thanks all!
     
    #7 iCARus, Sep 1, 2005
    Last edited: Sep 1, 2005
  8. Sheldon

    Sheldon Well-Known Member

    Joined:
    Jun 7, 2004
    Messages:
    378
    Location:
    Canada
    # shutdown -rF now

    Try that; it could be a filesystem problem, and -F forces an fsck on the next boot. Also turn off any unnecessary services, which may help in identifying the problem.

    -Sheldon
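    Before forcing a check, you can see how overdue one is: `tune2fs -l <device>` prints the ext2/ext3 superblock, including `Mount count`, `Maximum mount count` and `Last checked`. A Python sketch that parses that output; the device path is a hypothetical example, tune2fs needs root, and a maximum of -1 (or 0) means mount-count-based checks are turned off.

```python
import subprocess


def parse_tune2fs(text):
    """Turn the 'Key:   value' lines printed by `tune2fs -l` into a dict.
    Lines without a colon (such as the version banner) are ignored."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def check_due(info):
    """True if the mount count has reached the configured maximum.
    A maximum of -1 or 0 means mount-count checks are disabled."""
    count = int(info.get("Mount count", "0"))
    maximum = int(info.get("Maximum mount count", "-1"))
    return maximum > 0 and count >= maximum


def read_superblock(device):
    """Run `tune2fs -l` on a device such as '/dev/sda1' (hypothetical; needs root)."""
    result = subprocess.run(["tune2fs", "-l", device],
                            capture_output=True, text=True, check=True)
    return parse_tune2fs(result.stdout)
```

    `read_superblock("/dev/sda1")["Last checked"]` then shows when that filesystem was last checked.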
     
  9. iCARus

    iCARus Well-Known Member

    Joined:
    Apr 8, 2003
    Messages:
    113
    Thanks for the suggestion, Sheldon. I'll try it tonight, outside peak hours.

    Regards.
     
  10. chirpy

    chirpy Well-Known Member

    Joined:
    Jun 15, 2002
    Messages:
    13,475
    Location:
    Go on, have a guess
    Those xinetd messages, while annoying, aren't causing a problem (they're just services enabled in /etc/xinetd.d/ whose server programs aren't there). I'd go with what Sheldon advised, but be aware that the fsck could take some time to run before your server comes back up. I would also recommend upgrading from CentOS v3.1 to v3.5.
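    To see which entries produce those messages, here is a Python sketch that reads each file in /etc/xinetd.d/, treats a service as enabled unless it sets `disable = yes` (xinetd's default is enabled), and reports enabled services whose `server` is not an executable file. The parser only understands simple `name = value` and `name += value` lines with one service per file, which is the usual layout there but still an assumption.

```python
import os


def parse_xinetd(text):
    """Parse a one-service xinetd file into {attribute: value}.
    Handles 'name = value' and 'name += value'; comments after '#' are dropped."""
    attrs = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue  # skips 'service telnet', '{' and '}'
        name, _, value = line.partition("=")
        attrs[name.strip().rstrip("+-").strip()] = value.strip()
    return attrs


def broken_services(directory="/etc/xinetd.d"):
    """List (file, server) for enabled services whose server is not executable."""
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path) as f:
            attrs = parse_xinetd(f.read())
        server = attrs.get("server")
        enabled = attrs.get("disable", "no") != "yes"
        if enabled and server and not os.access(server, os.X_OK):
            broken.append((name, server))
    return broken
```

    Setting `disable = yes` in each reported file and restarting xinetd should silence the warnings; as noted above, they are noise rather than the cause of the crashes.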
     
  11. blackice

    blackice Active Member

    Joined:
    Dec 21, 2004
    Messages:
    40
    Location:
    Oxfordshire UK
    We're going to try the reboot with a filesystem check now. We're using CentOS 3.4 on our machine.

    Also, we've got another machine, but cPanel's transfer script won't work. Is there a way to transfer the accounts manually? The new machine is completely fresh (apart from a configured cPanel etc., same OS), and we're planning to swap the IPs so there's no downtime while DNS changes propagate.

    I.e.: transfer the accounts, move the old box to a spare IP, and give the new box the old box's IP.

    Does anyone know a manual way to transfer? It may not be needed, of course, but it'd be helpful to know anyway :)
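    One commonly used manual route, sketched here as an assumption rather than an official procedure: package each account on the old box with cPanel's /scripts/pkgacct, copy the resulting archive over with scp, and unpack it on the new box with /scripts/restorepkg. The archive name `cpmove-USER.tar.gz` under /home, root SSH access to the new box, and the example host are assumptions to verify against your cPanel version. The Python below only prints the commands unless `dry_run=False`. Running scp yourself also sidesteps the WHM copy step that was killed by an ALRM timeout in the trace above, though it won't help if the network link itself is stalling.

```python
import shlex
import subprocess


def transfer_commands(user, dest_host, archive_dir="/home"):
    """Build the commands for moving one account (assumed cPanel workflow)."""
    archive = f"{archive_dir}/cpmove-{user}.tar.gz"  # assumed pkgacct output name
    return [
        ["/scripts/pkgacct", user],                                # package on the old box
        ["scp", archive, f"root@{dest_host}:{archive_dir}/"],      # copy to the new box
        ["ssh", f"root@{dest_host}", "/scripts/restorepkg", user], # restore on the new box
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

    For example, `run(transfer_commands("exampleuser", "203.0.113.5"))` prints the three steps for a hypothetical account so they can be reviewed before running them for real.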
     
  12. chirpy

    chirpy Well-Known Member

    Joined:
    Jun 15, 2002
    Messages:
    13,475
    Location:
    Go on, have a guess
  13. iCARus

    iCARus Well-Known Member

    Joined:
    Apr 8, 2003
    Messages:
    113
    Sorry, my mistake: this box has CentOS 3.5. So we'll try the fsck.
     
  14. iCARus

    iCARus Well-Known Member

    Joined:
    Apr 8, 2003
    Messages:
    113
    OK, the problem is still there. Today we had another crash and all services stopped responding.

    Here is "top" from just a few seconds before the freeze:

    bdflush just came out of nowhere and services froze, but ping still works. Does anyone have any idea?
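    Since the top output didn't make it into the post, one way to capture what the box is doing right before a freeze is a tiny logger that appends a timestamped /proc/loadavg snapshot every few seconds and calls fsync after each write, so each sample is on disk before the hang. A Linux-only Python sketch; the log path and interval are hypothetical, and if the freeze is itself stuck disk I/O (bdflush is the 2.4 kernel's buffer-flush daemon), the last few samples may still be lost.

```python
import os
import time


def parse_loadavg(text):
    """Split /proc/loadavg ('1m 5m 15m running/total last_pid') into a dict."""
    one, five, fifteen, procs, _last_pid = text.split()
    running, total = procs.split("/")
    return {"1m": float(one), "5m": float(five), "15m": float(fifteen),
            "running": int(running), "total": int(total)}


def sample_forever(log_path, interval=10):
    """Append a timestamped load snapshot every `interval` seconds, fsyncing each one."""
    with open(log_path, "a") as log:
        while True:
            with open("/proc/loadavg") as f:
                load = parse_loadavg(f.read())
            log.write(f"{time.strftime('%Y-%m-%d %H:%M:%S')} {load}\n")
            log.flush()
            os.fsync(log.fileno())  # force the sample to disk before the next sleep
            time.sleep(interval)
```

    Started in the background with something like `sample_forever("/root/load.log")`, the tail of that file after the next reboot shows how load and process counts moved in the minutes before the freeze.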
     
  15. iCARus

    iCARus Well-Known Member

    Joined:
    Apr 8, 2003
    Messages:
    113
    Likes Received:
    0
    Trophy Points:
    16
    We took the server offline to check the RAM and found that one of the 1GB modules was not compatible with the other two.
    We removed that module, and the server has now been running for almost 2 days without any big load spikes.

    I think faulty RAM was behind all our troubles ;)

    We're still waiting to see if the problem occurs again.

    Regards.
     
