Hang, Failed, Recovered and Out of Memory

Discussion in 'General Discussion' started by Scott Baird, Jan 9, 2017.

  1. Scott Baird

    Scott Baird Member

    • CENTOS 6.4 x86_64 standard – webserver
    • WHM 60.0 (build 31)
    • Load Averages: 0.01 0.07 0.08
    Our server has been crashing intermittently since last month. Below are the emails we received during the latest crash. What is causing these issues, and how can I fix them?

    1. Subject: RECOVERED: clamd (10.0.0.1)
    Code:
    The service “clamd” is now operational.
    
    Server webserver.xyz.com
    Primary IP Address 10.0.0.1
    Service Name clamd
    Service Status recovered
    Notification The service “clamd” is now operational.
    Service Check Raw Output The 'clamd' service passed the check.
    Startup Log No startup log
    Memory Information
    Used 695 MB
    Available 2.96 GB
    Installed 3.64 GB
    Load Information 0.10 19.66 71.82
    Uptime 32 days, 5 hours, 52 minutes, and 44 seconds
    IOStat Information avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.35    0.02    0.21    1.10    0.00   97.32
    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sdb              16.09      1094.43       203.80 3049039383  567781500
    sda              15.97      1092.30       203.80 3043113078  567781500
    md127            32.52       391.62       203.39 1091048192  566625206
    Top Processes
    PID Owner CPU % Memory % Command
    24823 mysql 0.07 1.25 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/webserver.xyz.com.err --open-files-limit=10000 --pid-file=/var/lib/mysql/webserver.xyz.com.pid
    10533 nscd 0.01 0.05 /usr/sbin/nscd
    1664 root 0.01 0.01 irqbalance
    25008 root 0.00 11.14 /usr/local/cpanel/3rdparty/bin/clamd
    10358 root 0.00 0.48 cPhulkd - processor
    2. HANG: ⚠: chkservd (10.0.0.1) --- 10 consecutive emails saying the same thing.
    Code:
      
    The chkservd subprocess with PID “24249” ran for “10 minutes and 19 seconds”. The system terminated this sub-process when it exceeded the time allowed between checks, which is “5 minutes”. To determine why, check the “/var/log/chkservd.log” and “/usr/local/cpanel/logs/tailwatchd_log” files.
    
    You likely received this notification as a symptom of a larger problem. If your server is experiencing a high load, we recommend that you investigate the cause. If you continue to receive this notification, it is likely that your system is unable to handle demand or there is a misconfiguration that delays restarts.
    
    If you are sure that no misconfigurations exist, you should consider gradually increasing the following options in WHM’s “Tweak Settings” feature: “The number of times chkservd will allow a previous check to complete before terminating the check”, “The number of seconds between chkservd service checks”, or both. (https://webserver.xyz.com:2087/scripts2/tweaksettings?find=chkservd)
    
    Notification Type    hang ⚠
    Server    webserver.xyz.com
    Primary IP Address    10.0.0.1
    Service    chkservd
    Memory Information  
    Used    3.52 GB
    Available    121 MB
    Installed    3.64 GB
    Load Information    91.03 95.95 105.82
    Uptime    32 days, 5 hours, 7 minutes, and 12 seconds
    IOStat Information    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.35    0.02    0.21    1.04    0.00   97.39
    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sdb              15.96      1093.60       203.17 3043751367  565482516
    sda              15.83      1091.36       203.17 3037525958  565483636
    md127            32.04       388.10       202.76 1080186088  564335598
    ChkServd Version    17.0
    Top Processes  
    PID    Owner    CPU %    Memory %    Command
    23246    mysql    0.20    0.77    /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/webserver.xyz.com.err --open-files-limit=10000 --pid-file=/var/lib/mysql/webserver.xyz.com.pid
    22471    xyzweb    0.07    0.87    /usr/bin/php /home/xyzweb/public_html/index.php
    22297    xyzweb    0.07    0.69    /usr/bin/php /home/xyzweb/public_html/index.php
    24593    root    0.07    0.00    [iostat]
    22300    xyzweb    0.06    1.25    /usr/bin/php /home/xyzweb/public_html/index.php
    Configure chkservd:
    https://webserver.xyz.com:2087/scripts2/tweaksettings?find=chkservd
    
    Disable HTML notifications:
    https://webserver.xyz.com:2087/scripts2/tweaksettings?find=chkservd_plaintext_notify
    
    Preview of “cpanel_chkservd_log_tail.txt”
    Loading services .....clamd..Service Check Started
    The previous service check is still running (309 second). It will be terminated if still hanging after 2 check intervals. (1/2)
    Service Check Started
    The previous service check was still running (1137 second). It was terminated.
    Service Check Started
    [2017-01-07 20:06:13 -0700] Disk check .... / (/) [10.49%] ... /var/tmp (/var/tmp) [8.46%] ... /var/named/chroot/etc/named.rfc1912.zones (/var/named/chroot/etc/named.rfc1912.zones) [10.49%] ... /var/named/chroot/etc/named (/var/named/chroot/etc/named) [10.49%] ... /var/named/chroot/etc/named.root.key (/var/named/chroot/etc/named.root.key) [10.49%] ... /tmp (/tmp) [8.46%] ... /var/named/chroot/etc/rndc.key (/var/named/chroot/etc/rndc.key) [10.49%] ... /var/named/chroot/etc/named.iscdlv.key (/var/named/chroot/etc/named.iscdlv.key) [10.49%] ... /var/named/chroot/usr/lib64/bind (/var/named/chroot/usr/lib64/bind) [10.49%] ... /boot (/boot) [12.73%] ... {status:ok} ... Done
    ..imap....ipaliases....lmtp....mailman....mysql....named....nscd....pop....queueprocd....rsyslogd....sshd..Done
    The previous service check is still running (496 second). It will be terminated if still hanging after 2 check intervals. (1/2)
    Loading services .....clamd....cpanellogd....cpdavd....cphulkd....cpsrvd....crond....dnsadmin....exim....ftpd....httpd..Service Check Started
    The previous service check was still running (1015 second). It was terminated.
    Preview of “cpanel_tailwatchd_log_tail.txt”
    [19529] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] Restored /var/log/maillog (size:2491642) to 2491642 (requested 2491642)
    [19529] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] Restored /usr/local/apache/logs/modsec_audit.log (size:0) to 0 (requested 0)
    [19529] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] /var/log/exim_mainlog opened with inode 18874454
    [19529] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] /var/log/maillog opened with inode 18874490
    [19529] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] /usr/local/apache/logs/modsec_audit.log opened with inode 19140862
    [19529] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [START] 19529 1483706956
    [19523] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] The tailwatchd driver 'Cpanel::TailWatch::JailManager' is not enabled.
    [19523] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch::Eximstats] Loading email sending limits from 1483704000 - 1483707600
    [19523] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] inotify enabled. watch file is /var/cpanel/.tailwatchd_inotify_alarm_trick
    [19523] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] Opened /usr/local/cpanel/logs/tailwatchd_log in append mode

    3. FAILED: clamd (10.0.0.1)
    Code:
    The service “clamd” appears to be down.
    Server    webserver.xyz.com
    Primary IP Address    10.0.0.1
    Service Name    clamd
    Service Status    failed ⛔
    Notification    The service “clamd” appears to be down.
    Service Check Method    The system’s command to check or to restart this service failed.
    Number of Restart Attempts    1
    Startup Log    No startup log
    Memory Information  
    Used    701 MB
    Available    2.96 GB
    Installed    3.64 GB
    Load Information    41.64 94.60 119.13
    Uptime    32 days, 5 hours, 44 minutes, and 57 seconds
    IOStat Information    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.35    0.02    0.21    1.10    0.00   97.32
    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sdb              16.09      1094.58       203.82 3048955631  567728412
    sda              15.97      1092.46       203.82 3043046502  567728412
    md127            32.52       391.63       203.40 1090897864  566572302
    Top Processes  
    PID    Owner    CPU %    Memory %    Command
    24823    mysql    0.07    1.22    /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/webserver.xyz.com.err --open-files-limit=10000 --pid-file=/var/lib/mysql/webserver.xyz.com.pid
    10533    nscd    0.01    0.05    /usr/sbin/nscd
    1664    root    0.01    0.01    irqbalance
    25008    root    0.00    11.14    /usr/local/cpanel/3rdparty/bin/clamd
    10358    root    0.00    0.48    cPhulkd - processor

    4. Out of memory: ⚠ The process “php” was terminated because the system is low on memory. --- 17 consecutive emails saying the same thing, each with a different PID and with log files attached.
    Code:
      
    In order to avoid a system crash due to low memory, the kernel terminated the process named “php” with the PID “22469”.
    
    Server webserver.xyz.com
    Primary IP Address 10.0.0.1
    Process Name php
    Event Time Sunday, January 8, 2017 at 1:27:17 AM UTC
    PID 22469
    Process UID 516
    Process Username xyzweb
    Process Total Virtual Memory 251428kB
    Process Anonymous Resident Set Size 53648kB
    Process File Resident Set Size 1072kB
    Process OOM Score 12
    Status Out of Memory ⚠
    Memory Information
    Used 3.51 GB
    Available 129 MB
    Installed 3.64 GB
    Load Information 95.15 89.00 98.80
    Uptime 32 days, 5 hours, 10 minutes, and 35 seconds
    IOStat Information avg-cpu:  %user   %nice %system %iowait  %steal   %idle
               1.35    0.02    0.21    1.05    0.00   97.38
    Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
    sdb              15.98      1093.71       203.25 3044301079  565748596
    sda              15.85      1091.43       203.25 3037962566  565749724
    md127            32.09       388.42       202.84 1081159696  564599694
    Top Processes
    PID Owner CPU % Memory % Command
    23828 xyzweb 0.04 1.40 /usr/bin/php /home/xyzweb/public_html/wp-login.php
    22300 xyzweb 0.06 1.26 /usr/bin/php /home/xyzweb/public_html/index.php
    22444 xyzweb 0.06 1.16 /usr/bin/php /home/xyzweb/public_html/index.php
    22363 xyzweb 0.06 1.11 /usr/bin/php /home/xyzweb/public_html/index.php
    22361 xyzweb 0.06 1.08 /usr/bin/php /home/xyzweb/public_html/index.php
    
    For additional details, see the attached dmesg log dump.
    
    Preview of “oom_dmesg.txt”
    [2774660.076765] Out of memory: Kill process 22469 (php) score 12 or sacrifice child
    [2774660.077188] Killed process 22469, UID 516, (php) total-vm:251428kB, anon-rss:53648kB, file-rss:1072kB
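
With 17 of these notifications, it can help to tally the kills per user and process rather than read each email. The following is a minimal sketch, assuming the dmesg lines follow the same "Killed process" format as the two lines above. The function names `parse_oom_kills` and `summarize` are hypothetical, and other kernel versions may word these lines differently.

```python
import re
from collections import Counter

# Matches kernel lines in the format shown in the oom_dmesg.txt preview, e.g.:
# [..] Killed process 22469, UID 516, (php) total-vm:251428kB, anon-rss:53648kB, file-rss:1072kB
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+), UID (?P<uid>\d+), \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_kills(dmesg_text):
    """Return one dict per 'Killed process' line found in dmesg_text."""
    kills = []
    for line in dmesg_text.splitlines():
        match = KILLED_RE.search(line)
        if match:
            # Every captured field is numeric except the process name.
            kills.append({
                key: (value if key == "name" else int(value))
                for key, value in match.groupdict().items()
            })
    return kills


def summarize(kills):
    """Count kills and sum resident memory (anon + file RSS, kB) per (uid, name)."""
    counts = Counter()
    rss_kb = Counter()
    for kill in kills:
        key = (kill["uid"], kill["name"])
        counts[key] += 1
        rss_kb[key] += kill["anon_rss"] + kill["file_rss"]
    return counts, rss_kb


# The two lines from the oom_dmesg.txt preview in this post.
SAMPLE = """\
[2774660.076765] Out of memory: Kill process 22469 (php) score 12 or sacrifice child
[2774660.077188] Killed process 22469, UID 516, (php) total-vm:251428kB, anon-rss:53648kB, file-rss:1072kB
"""

if __name__ == "__main__":
    counts, rss_kb = summarize(parse_oom_kills(SAMPLE))
    for (uid, name), n in counts.items():
        print(f"uid={uid} process={name} kills={n} rss_kb={rss_kb[(uid, name)]}")
```

Feeding it the full dmesg output instead of `SAMPLE` would show whether all of the kills belong to the same account (UID 516 here), which narrows down where to look.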
     


  2. cPanelMichael

    cPanelMichael Forums Analyst
    Staff Member

    Hello,

    The server's load averages, as shown in the notification emails (for example, 91.03 95.95 105.82 in the chkservd hang notification), are likely the reason the services are failing. The following thread is a good place to start when troubleshooting the cause of the high load averages:

    Troubleshooting high server loads on Linux servers
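
As a rough first check, a load average can be put in context by dividing it by the number of CPU cores. The following is a minimal sketch, not a cPanel tool. The helper names are hypothetical, the threshold of 1.0 per core is a common rule of thumb rather than a hard limit, and the core count of 4 is an assumption because the thread does not state how many cores the server has.

```python
import os


def parse_loadavg(text):
    """Parse the first three fields of /proc/loadavg (1, 5, 15 minute loads)."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def load_per_core(loads, cores):
    """Divide each load average by the core count."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return tuple(load / cores for load in loads)


def is_overloaded(loads, cores, threshold=1.0):
    """True if the 1-minute load per core exceeds the threshold.

    The default threshold of 1.0 is a rule of thumb (an assumption),
    not a value taken from cPanel or the kernel.
    """
    return load_per_core(loads, cores)[0] > threshold


def current_loads(path="/proc/loadavg"):
    """Read the live load averages on a Linux host."""
    with open(path) as handle:
        return parse_loadavg(handle.read())


if __name__ == "__main__":
    # Values from the chkservd hang notification in this thread.
    reported = (91.03, 95.95, 105.82)
    assumed_cores = 4  # hypothetical: the thread does not give the core count
    per_core = load_per_core(reported, assumed_cores)
    print("reported load per core:", ", ".join(f"{v:.2f}" for v in per_core))
    print("overloaded:", is_overloaded(reported, assumed_cores))

    # On the server itself, compare against the real core count.
    if os.path.exists("/proc/loadavg"):
        cores = os.cpu_count() or 1
        print("live overloaded:", is_overloaded(current_loads(), cores))
```

By this measure the load is far above capacity during the hang, while the idle figures in the first post (0.01 0.07 0.08) are not, so the investigation should focus on what was running when the load spiked.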

    Let us know if this helps.

    Thank you.
     