Ghost on Server! crash without reason

Discussion in 'General Discussion' started by wimp, Jan 3, 2005.

  1. wimp

    wimp Well-Known Member

    Joined:
    Jul 13, 2002
    Messages:
    301
    Likes Received:
    0
    Trophy Points:
    16
    Hi all,
    There really seems to be a ghost on my server. It freezes a few times a day because of high server load, and I couldn't find any reason for it. I have already had two admin companies upgrade the kernel (for the iowait problem), apply other server tweaks and security hardening, and install the APF firewall, but nothing helped.
    I installed SIM and PRM, disabled the spam and antivirus filters, and changed the SMTP settings so that the user nobody cannot send e-mail. I asked the NOC to check the server for hardware problems, and they also installed a new NIC, but again nothing changed. The server load is normally 0.50 to 2.50 and suddenly goes up to 30.00 to 80.00.
    So, is there anyone who could give me a tip on what to check to solve this problem?
    When the load goes high, I see the following in "messages":


    ==========================
    Jan 3 04:21:02 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=69.105.56.3$
    Jan 3 04:21:02 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=69.105.56.3$
    Jan 3 04:21:03 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=63.151.206.$
    Jan 3 04:21:09 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=63.151.206.$
    Jan 3 04:22:09 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=69.152.39.1$
    Jan 3 04:22:12 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=69.152.39.1$
    Jan 3 04:22:18 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=69.152.39.1$
    Jan 3 04:22:26 server proftpd[28395]: server.an-dns.com (127.0.0.1[127.0.0.1]) - FTP login timed out, disconnected
    Jan 3 04:22:26 server proftpd[28395]: server.an-dns.com (127.0.0.1[127.0.0.1]) - FTP session closed.
    Jan 3 04:23:11 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=202.9.99.25$
    Jan 3 04:24:39 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=205.207.184$
    Jan 3 04:24:49 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=206.168.193$
    Jan 3 04:26:53 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=217.18.64.7$
    Jan 3 04:26:58 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=209.237.25.$
    Jan 3 04:27:45 server wall[2964]: wall: user root broadcasted 1 lines (58 chars)
    Jan 3 04:27:51 server shutdown: shutting down for system reboot
    Jan 3 04:27:57 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=209.15.115.$
    Jan 3 04:28:02 server init: Switching to runlevel: 6
    Jan 3 04:28:11 server kernel: ** IN_TCP DROP ** IN=eth0 OUT= MAC=00:01:02:9b:3f:86:00:02:85:0d:7c:80:08:00 SRC=218.7.153.2$
    Jan 3 04:28:16 server portsentry[3299]: securityalert: Psionic PortSentry is shutting down
    Jan 3 04:28:16 server portsentry[3299]: adminalert: Psionic PortSentry is shutting down
    Jan 3 04:28:20 server portsentry: portsentry shutdown succeeded
    Jan 3 04:28:21 server xfs[2099]: terminating
    Jan 3 04:28:24 server xfs: xfs shutdown succeeded
    Jan 3 04:28:26 server mysql: Killing mysqld with pid 3016
    Jan 3 04:28:27 server mysql: Wait for mysqld to exit
    Jan 3 04:28:28 server mysql: .
    Jan 3 04:28:59 server last message repeated 31 times
    Jan 3 04:29:00 server wall[7103]: wall: user root broadcasted 1 lines (58 chars)
    Jan 3 04:29:00 server mysql: .
    Jan 3 04:29:01 server mysql: gave up waiting!
    Jan 3 04:29:01 server rc: Stopping mysql: succeeded
    Jan 3 04:29:01 server antirelayd: antirelayd shutdown failed
    Jan 3 04:29:02 server exim: exim shutdown succeeded
    Jan 3 04:29:02 server exim: antirelayd shutdown failed
    Jan 3 04:30:10 server syslogd 1.4.1: restart.
    Jan 3 04:30:10 server syslog: syslogd startup succeeded
    Jan 3 04:30:10 server syslog: klogd startup succeeded
    Jan 3 04:30:10 server kernel: klogd 1.4.1, log source = /proc/kmsg started.
    Jan 3 04:30:10 server kernel: Linux version 2.4.21-20.0.1.EL (centos@centos-build) (gcc version 3.2.3 20030502 (Red Hat Li$
    Jan 3 04:30:10 server kernel: BIOS-provided physical RAM map:
    =============================


    Sometimes I see this before it crashes:
    ============================
    Jan 3 22:31:56 server named[1899]: zone domain.com/IN: loaded serial 2004112201
    Jan 3 22:31:56 server named[1899]: zone domain.com/IN: loaded serial 2004112601
    Jan 3 22:31:56 server named[1899]: zone domain.com/IN: loaded serial 2004112001
    Jan 3 22:31:56 server named[1899]: zone domain.com/IN: loaded serial 2004122301
    Jan 3 22:31:56 server named[1899]: zone domain.com/IN: loaded serial 2004112301
    Jan 3 22:31:56 server named[1899]: zone domain.com/IN: loaded serial 2004112701
    Jan 3 22:31:56 server named[1899]: zone domain.com/IN: loaded serial 2004112301
    Jan 3 22:31:56 server named[1899]: zone domain.com/IN: loaded serial 2004112003
    Jan 3 22:31:56 server named[1899]: zone domain.com/IN: loaded serial 2004112502
    and so on...
    ===========================
    And sometimes I see this before it crashes:
    ===========================
    Jan 3 21:27:37 server proftpd[10367]: server.an-dns.com (127.0.0.1[127.0.0.1]) - FTP session opened.
    Jan 3 21:27:37 server proftpd[10367]: server.an-dns.com (127.0.0.1[127.0.0.1]) - FTP session closed.
    Jan 3 21:35:58 server proftpd[11236]: server.an-dns.com (127.0.0.1[127.0.0.1]) - FTP session opened.
    Jan 3 21:35:58 server proftpd[11236]: server.an-dns.com (127.0.0.1[127.0.0.1]) - FTP session closed.
    Jan 3 21:49:06 server proftpd[12260]: server.an-dns.com (127.0.0.1[127.0.0.1]) - FTP login timed out, disconnected
    Jan 3 21:49:06 server proftpd[12260]: server.an-dns.com (127.0.0.1[127.0.0.1]) - FTP session closed.
    Jan 3 22:31:03 server syslogd 1.4.1: restart.
    ==========================



    And I can see something like this in "top":

    =============================
    16:23:33 up 51 min, 2 users, load average: 85.73, 38.67, 16.35
    375 processes: 361 sleeping, 5 running, 9 zombie, 0 stopped
    CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
               total    9.6%    2.0%    2.7%   0.3%     1.2%   83.8%    0.0%
    Mem:   962848k av,  953632k used,    9216k free,  0k shrd,  112048k buff
           720240k actv, 135076k in_d,  10068k in_c
    Swap: 1959920k av,  156764k used, 1803156k free             290416k cached

      PID USER     PRI  NI  SIZE   RSS SHARE STAT %CPU %MEM  TIME CPU COMMAND
     2004 root      19   4 16796   124    64 S N   2.5  0.0  1:19   0 httpd
     1969 mailnull  15   0   544   272   188 S     1.9  0.0  0:07   0 exim
    21544 someuser  19   4  2788  2788  1088 D N   0.4  0.2  0:00   0 search.pl
     2548 root      24   8  3560  1372   472 S N   0.3  0.1  0:04   0 cppop
    21532 root      15   0  1276  1276   656 R     0.3  0.1  0:00   0 top
        5 root      15   0     0     0     0 SW    0.2  0.0  0:01   0 kscand
    21002 root      15   0  2528  1940   636 D     0.2  0.2  0:01   0 mkvhostspasswd
    21619 root      15   0  3440  3440  2432 S     0.2  0.3  0:00   0 exim
        4 root      15   0     0     0     0 SW    0.1  0.0  0:01   0 kswapd
        7 root      15   0     0     0     0 DW    0.1  0.0  0:00   0 kupdated
       12 root      15   0     0     0     0 DW    0.1  0.0  0:09   0 kjournald
    20707 otherus   23   8  4288  2764   968 S N   0.1  0.2  0:00   0 cppop
    21631 root      19   0  3168  3168  2376 D     0.1  0.3  0:00   0 exim
        1 root      15   0   128    80    56 S     0.0  0.0  0:04   0 init
        2 root      15   0     0     0     0 SW    0.0  0.0  0:00   0 keventd
        3 root      34  19     0     0     0 SWN   0.0  0.0  0:00   0 ksoftirqd/0
        6 root      25   0     0     0     0 SW    0.0  0.0  0:00   0 bdflush
        8 root      25   0     0     0     0 SW    0.0  0.0  0:00   0 mdrecoveryd
      104 root      25   0     0     0     0 SW    0.0  0.0  0:00   0 khubd
      576 root      25   0     0     0     0 SW    0.0  0.0  0:00   0 kjournald
      577 root      25   0     0     0     0 SW    0.0  0.0  0:00   0 kjournald
      579 root      15   0     0     0     0 DW    0.0  0.0  0:00   0 loop0
      580 root      15   0     0     0     0 DW    0.0  0.0  0:00   0 kjournald
     1835 root      15   0   244   216   156 D     0.0  0.0  0:01   0 syslogd
     1839 root      15   0    72     4     0 S     0.0  0.0  0:00   0 klogd
     1884 named     25   0 10004  8752   724 S     0.0  0.9  0:25   0 named
     1916 root      15   0  1388   408    88 D     0.0  0.0  0:00   0 chkservd
     1975 mailnull  25   0   344     4     0 S     0.0  0.0  0:00   0 exim
     1982 root      15   0   780   728   432 S     0.0  0.0  0:13   0 antirelayd
     2018 root      25   0   152     4     0 S     0.0  0.0  0:00   0 mysqld_safe
     2052 mysql     21   6 29572   17M  1164 S N   0.0  1.8  0:07   0 mysqld
     3101 root      15   0  3560   756   304 S     0.0  0.0  0:00   0 cpsrvd
     3119 root      21   6  2736   984   340 D N   0.0  0.1  0:00   0 eximstats
     3126 root      34  19  6164  2200    40 S N   0.0  0.2  0:08   0 cpanellogd
     3152 root      23   8  3212   324   124 S N   0.0  0.0  0:00   0 cppop

    ==============================
     
  2. kris1351

    kris1351 Well-Known Member

    Joined:
    Apr 18, 2003
    Messages:
    963
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Lewisville, Tx
    Take a look at your resources during the crash time. I am seeing a very low amount of free memory on that last top you had.

    Mem: 962848k av, 953632k used, 9216k free, 0k shrd, 112048k buff

    It is very possible that you are eating up your memory and causing a crash there.
     
  3. dezignguy

    dezignguy Well-Known Member

    Joined:
    Sep 26, 2004
    Messages:
    534
    Likes Received:
    0
    Trophy Points:
    16
    kris1351: Linux uses as much memory as it possibly can. It keeps programs and cached disk data in RAM rather than leaving RAM idle, so it's normal for 'free' memory to be low on Linux (though not on Windows). It will drop cache or move unused pages out to swap space on the drive if it needs more RAM for something else. The last top shows a bit of swap in use, but nothing that seems worth worrying about.
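
    To make the point above concrete, here is a quick sum over the figures in the top header from the first post. Memory held as buffers and cache can mostly be reclaimed when programs need it, so it is fairer to count it alongside 'free'. This is only a rough sketch, and the variable names are made up for illustration.

```shell
# Figures copied from the top header in the first post (all in kB).
mem_free=9216
mem_buffers=112048
mem_cached=290416

# Buffers and cache can mostly be reclaimed by the kernel when programs need RAM.
mem_available=$((mem_free + mem_buffers + mem_cached))
echo "${mem_available}k effectively available"
```

    This prints 411680k, so roughly 400 MB of the 940 MB is still usable even though only 9 MB shows as free.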

    I haven't looked over your output carefully, but the real concern seems to be the 9 zombies after only 51 minutes of uptime in that last top. That's very high and is likely related to your troubles (right now my server has been up for 8 days with 0 zombie processes, and it has gone for months without zombies as well). Find out what those processes are and why they aren't working properly.
     
  4. GOT

    GOT Get Proactive!

    Joined:
    Apr 8, 2003
    Messages:
    900
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Norfolk, VA
    cPanel Access Level:
    DataCenter Provider
    There really is not enough info here to give you a solid answer.

    You MIGHT want to look at the CPU usage stats in WHM, as that can give you a rough feel for what is taking up a lot of CPU cycles overall.

    I know you indicated that you had some people come in and take a look. Were they not able to tell you anything useful?
     
  5. bimal

    bimal Member

    Joined:
    Jan 1, 2002
    Messages:
    7
    Likes Received:
    0
    Trophy Points:
    1
    I had the same problem. I deleted and moved some huge log files, and the server has now been running for 15 days without any problem. Try this:

    1. Clean up /tmp.
    2. Clean up your huge log files.

    find . -size +100000 -print
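
    A note on the command above: with no unit suffix, find counts -size in 512-byte blocks, so +100000 matches files larger than about 51 MB. As a hedged sketch, the same idea can be wrapped in a small helper that takes the threshold in kilobytes and sorts the hits by size. The function name list_big_files is hypothetical, and the k suffix assumes a find that supports it (GNU and BSD find do).

```shell
# list_big_files DIR MIN_KB: print files under DIR larger than MIN_KB kilobytes,
# largest first, as "size_in_kB<TAB>path". The function name is hypothetical.
list_big_files() {
    dir=$1
    min_kb=$2
    # -xdev stays on one filesystem; du -k reports disk usage in kB.
    find "$dir" -xdev -type f -size +"${min_kb}"k -exec du -k {} + | sort -rn
}

# Example: files over roughly 50 MB under /var/log
# list_big_files /var/log 51200
```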
     
    #5 bimal, Jan 7, 2005
    Last edited: Jan 7, 2005
  6. Alexandre Duran

    Alexandre Duran Well-Known Member

    Joined:
    May 6, 2003
    Messages:
    61
    Likes Received:
    0
    Trophy Points:
    6
    Location:
    Rio de Janeiro - BRAZIL
    Execute the command:

    df

    And post the results here.
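
    When reading the df output, the thing to look for is a partition that is nearly full, either in space or in inodes. A minimal sketch that filters df output for that, assuming the POSIX format from df -P and mount points without spaces (full_filesystems is a made-up name):

```shell
# full_filesystems LIMIT: read `df -P` style output on stdin and print the
# mount point and usage of every filesystem at or above LIMIT percent.
# The function name is hypothetical.
full_filesystems() {
    limit=$1
    awk -v limit="$limit" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # "93%" -> "93"
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Typical use: disk space first, then inodes (-i reports inode usage).
# df -P  | full_filesystems 90
# df -Pi | full_filesystems 90
```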
     
  7. wimp

    wimp Well-Known Member

    Joined:
    Jul 13, 2002
    Messages:
    301
    Likes Received:
    0
    Trophy Points:
    16
    I already emptied the log files on the server, but I will look around to see what else is there. Thanks.
     
  8. philb

    philb Well-Known Member

    Joined:
    Jan 28, 2004
    Messages:
    116
    Likes Received:
    0
    Trophy Points:
    16
    Did you reboot the server, or is this something it did 'by itself'?
     
  9. wimp

    wimp Well-Known Member

    Joined:
    Jul 13, 2002
    Messages:
    301
    Likes Received:
    0
    Trophy Points:
    16
    Hi,
    How can I see the processes associated with those zombies? I currently have 11 after only 30 minutes.

    thanks
     
  10. GOT

    GOT Get Proactive!

    Joined:
    Apr 8, 2003
    Messages:
    900
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Norfolk, VA
    cPanel Access Level:
    DataCenter Provider
    ps -ax | grep defunct
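
    The command above lists the zombies themselves. To see which process spawned them, the parent PID is needed as well. A rough sketch, assuming a ps that accepts -eo with the pid, ppid, stat and comm columns (procps does); zombie_parents is a hypothetical helper name:

```shell
# zombie_parents: read `ps -eo pid,ppid,stat,comm` output on stdin and print
# each zombie (STAT starting with Z) together with its parent process.
# The function name is hypothetical.
zombie_parents() {
    awk 'NR > 1 {
        name[$1] = $4                      # remember every command by PID
        if ($3 ~ /^Z/) { zpid[++n] = $1; zppid[n] = $2 }
    }
    END {
        for (i = 1; i <= n; i++)
            printf "zombie %s <- parent %s (%s)\n", zpid[i], zppid[i], name[zppid[i]]
    }'
}

# Typical use:
# ps -eo pid,ppid,stat,comm | zombie_parents
```

    A zombie is already dead and cannot be killed. It disappears when its parent reaps it or when the parent itself exits, so the parent is the process to investigate.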
     
  11. Promethyl

    Promethyl Well-Known Member

    Joined:
    Mar 27, 2004
    Messages:
    68
    Likes Received:
    0
    Trophy Points:
    6
    Processor util is pretty damn high. I mean, it's not 300 or anything, but I'd stay as far from 100 as possible.

    What about mysql... is the max_connections_per_hour getting you?
    ( http://www.fedoraforum.org/forum/showthread.php?t=26074 )

    hrm...

    Using APF? Turn off the drop logging.
    http://www.crucialparadigm.com/reso...ng/changing-apf-log-for-tdp-udp-tcp-drops.php

    General search:
    http://www.google.com/search?q=serv...ient=firefox-a&rls=org.mozilla:en-US:official

    Good luck to you, mate. If push comes to shove, review the iptables/ifconfig/ipf/apf rules. If you have to, get a new box and migrate all the accounts.

    Having customers means you have very little time to play games.
     
  12. wish

    wish Member

    Joined:
    Aug 14, 2003
    Messages:
    9
    Likes Received:
    0
    Trophy Points:
    1
    There is a known but poorly documented VM bug in early RHEL 3 kernels. It caused us some problems for a while and shows up as high I/O wait states and system crashes. The two latest kernel updates fix it. If you haven't solved the problem and you're using an "older" RHEL 3 kernel, this may be worth looking into.
     
  13. Promethyl

    Promethyl Well-Known Member

    Joined:
    Mar 27, 2004
    Messages:
    68
    Likes Received:
    0
    Trophy Points:
    6
    So are you a proponent of rebuilding/updating the system kernel?

    For future reference, could you provide a link for how to update the kernel for the users?
     
  14. dezignguy

    dezignguy Well-Known Member

    Joined:
    Sep 26, 2004
    Messages:
    534
    Likes Received:
    0
    Trophy Points:
    16
    Hmm, he does seem to have a high I/O wait percentage, so either there are heavily disk-intensive scripts/programs running, or he's running an old (insecure) kernel with the RHEL VM bug.
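
    One way to check for disk-intensive programs: processes stuck waiting on I/O show up in uninterruptible sleep (STAT D), as several do in the top output from the first post, and on Linux they count toward the load average. A minimal sketch that picks them out of ps output, assuming a ps that accepts -eo pid,stat,comm (io_blocked is a made-up name):

```shell
# io_blocked: read `ps -eo pid,stat,comm` output on stdin and print the
# processes in uninterruptible sleep (STAT starting with D), which are
# usually waiting on disk I/O. The function name is hypothetical.
io_blocked() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $3 }'
}

# Typical use, repeated a few times while the load is climbing:
# ps -eo pid,stat,comm | io_blocked
```

    If the same user scripts keep turning up in that list during a spike, they are good candidates for the source of the disk load.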

    Run 'uname -a' and make sure that you're running the latest kernel, currently "2.4.21-27.0.1.EL".
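
    For a scripted version of that check, uname -r prints just the kernel release, which can be compared against the version named above. This is only a sketch: it assumes a sort that supports -V (version sort, a GNU coreutils extension that older systems may lack), and kernel_older_than is a hypothetical helper name.

```shell
# kernel_older_than RUNNING WANTED: succeed if RUNNING sorts before WANTED.
# Relies on GNU `sort -V` (version sort); the function name is hypothetical.
kernel_older_than() {
    running=$1
    wanted=$2
    [ "$running" != "$wanted" ] &&
        [ "$(printf '%s\n' "$running" "$wanted" | sort -V | head -n 1)" = "$running" ]
}

# Typical use with the version mentioned in this thread:
# if kernel_older_than "$(uname -r)" "2.4.21-27.0.1.EL"; then
#     echo "kernel update available"
# fi
```

    Note that the release string on a given box may carry a suffix such as smp, so treat the result as a hint rather than a definitive answer.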

    Promethyl, keeping the kernel up to date is highly recommended: it fixes bugs, improves performance, and closes serious security holes. That applies to the standard RHN/up2date kernels. Running a bleeding-edge or custom-compiled kernel isn't something to do unless you really know what you're doing.
     