Hang, Failed, Recovered and Out of Memory

Scott Baird

Member
Feb 18, 2016
Spanish Fork, UT
cPanel Access Level
Root Administrator
  • CentOS 6.4 x86_64 standard – webserver
  • WHM 60.0 (build 31)
  • Load Averages: 0.01 0.07 0.08
Our server has been crashing on and off since last month. Below are the emails we received during the latest crash. What is causing these issues, and how can I fix them?

1. Subject: RECOVERED: clamd (10.0.0.1)
Code:
The service “clamd” is now operational.

Server webserver.xyz.com
Primary IP Address 10.0.0.1
Service Name clamd
Service Status recovered
Notification The service “clamd” is now operational.
Service Check Raw Output The 'clamd' service passed the check.
Startup Log No startup log
Memory Information
Used 695 MB
Available 2.96 GB
Installed 3.64 GB
Load Information 0.10 19.66 71.82
Uptime 32 days, 5 hours, 52 minutes, and 44 seconds
IOStat Information    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.35    0.02    0.21    1.10    0.00   97.32
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb              16.09      1094.43       203.80 3049039383  567781500
sda              15.97      1092.30       203.80 3043113078  567781500
md127            32.52       391.62       203.39 1091048192  566625206
Top Processes
PID Owner CPU % Memory % Command
24823 mysql 0.07 1.25 /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/webserver.xyz.com.err --open-files-limit=10000 --pid-file=/var/lib/mysql/webserver.xyz.com.pid
10533 nscd 0.01 0.05 /usr/sbin/nscd
1664 root 0.01 0.01 irqbalance
25008 root 0.00 11.14 /usr/local/cpanel/3rdparty/bin/clamd
10358 root 0.00 0.48 cPhulkd - processor
2. HANG: ⚠: chkservd (10.0.0.1) --- 10 consecutive emails saying the same thing.
Code:
The chkservd subprocess with PID “24249” ran for “10 minutes and 19 seconds”. The system terminated this sub-process when it exceeded the time allowed between checks, which is “5 minutes”. To determine why, check the “/var/log/chkservd.log” and “/usr/local/cpanel/logs/tailwatchd_log” files.

You likely received this notification as a symptom of a larger problem. If your server is experiencing a high load, we recommend that you investigate the cause. If you continue to receive this notification, it is likely that your system is unable to handle demand or there is a misconfiguration that delays restarts.

If you are sure that no misconfigurations exist, you should consider gradually increasing the following options in WHM’s “Tweak Settings” feature: “The number of times chkservd will allow a previous check to complete before terminating the check”, “The number of seconds between chkservd service checks”, or both. (https://webserver.xyz.com:2087/scripts2/tweaksettings?find=chkservd)

Notification Type    hang ⚠
Server    webserver.xyz.com
Primary IP Address    10.0.0.1
Service    chkservd
Memory Information  
Used    3.52 GB
Available    121 MB
Installed    3.64 GB
Load Information    91.03 95.95 105.82
Uptime    32 days, 5 hours, 7 minutes, and 12 seconds
IOStat Information    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.35    0.02    0.21    1.04    0.00   97.39
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb              15.96      1093.60       203.17 3043751367  565482516
sda              15.83      1091.36       203.17 3037525958  565483636
md127            32.04       388.10       202.76 1080186088  564335598
ChkServd Version    17.0
Top Processes  
PID    Owner    CPU %    Memory %    Command
23246    mysql    0.20    0.77    /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/webserver.xyz.com.err --open-files-limit=10000 --pid-file=/var/lib/mysql/webserver.xyz.com.pid
22471    xyzweb    0.07    0.87    /usr/bin/php /home/xyzweb/public_html/index.php
22297    xyzweb    0.07    0.69    /usr/bin/php /home/xyzweb/public_html/index.php
24593    root    0.07    0.00    [iostat]
22300    xyzweb    0.06    1.25    /usr/bin/php /home/xyzweb/public_html/index.php
Configure chkservd:
https://webserver.xyz.com:2087/scripts2/tweaksettings?find=chkservd

Disable HTML notifications:
https://webserver.xyz.com:2087/scripts2/tweaksettings?find=chkservd_plaintext_notify

Preview of “cpanel_chkservd_log_tail.txt”
Loading services .....clamd..Service Check Started
The previous service check is still running (309 second). It will be terminated if still hanging after 2 check intervals. (1/2)
Service Check Started
The previous service check was still running (1137 second). It was terminated.
Service Check Started
[2017-01-07 20:06:13 -0700] Disk check .... / (/) [10.49%] ... /var/tmp (/var/tmp) [8.46%] ... /var/named/chroot/etc/named.rfc1912.zones (/var/named/chroot/etc/named.rfc1912.zones) [10.49%] ... /var/named/chroot/etc/named (/var/named/chroot/etc/named) [10.49%] ... /var/named/chroot/etc/named.root.key (/var/named/chroot/etc/named.root.key) [10.49%] ... /tmp (/tmp) [8.46%] ... /var/named/chroot/etc/rndc.key (/var/named/chroot/etc/rndc.key) [10.49%] ... /var/named/chroot/etc/named.iscdlv.key (/var/named/chroot/etc/named.iscdlv.key) [10.49%] ... /var/named/chroot/usr/lib64/bind (/var/named/chroot/usr/lib64/bind) [10.49%] ... /boot (/boot) [12.73%] ... {status:ok} ... Done
..imap....ipaliases....lmtp....mailman....mysql....named....nscd....pop....queueprocd....rsyslogd....sshd..Done
The previous service check is still running (496 second). It will be terminated if still hanging after 2 check intervals. (1/2)
Loading services .....clamd....cpanellogd....cpdavd....cphulkd....cpsrvd....crond....dnsadmin....exim....ftpd....httpd..Service Check Started
The previous service check was still running (1015 second). It was terminated.
Preview of “cpanel_tailwatchd_log_tail.txt”
[19529] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] Restored /var/log/maillog (size:2491642) to 2491642 (requested 2491642)
[19529] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] Restored /usr/local/apache/logs/modsec_audit.log (size:0) to 0 (requested 0)
[19529] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] /var/log/exim_mainlog opened with inode 18874454
[19529] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] /var/log/maillog opened with inode 18874490
[19529] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] /usr/local/apache/logs/modsec_audit.log opened with inode 19140862
[19529] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [START] 19529 1483706956
[19523] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] The tailwatchd driver 'Cpanel::TailWatch::JailManager' is not enabled.
[19523] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch::Eximstats] Loading email sending limits from 1483704000 - 1483707600
[19523] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] inotify enabled. watch file is /var/cpanel/.tailwatchd_inotify_alarm_trick
[19523] [2017-01-06 05:49:16 -0700] [Cpanel::TailWatch] [INFO] Opened /usr/local/cpanel/logs/tailwatchd_log in append mode
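
To see how long each check hung across the whole of /var/log/chkservd.log rather than just this preview, a small script along these lines might help. It is only a sketch: it assumes every hang message has the same wording as the two variants in the log tail above, and the function name `hang_durations` is made up for illustration.

```python
import re

# Matches both the "is still running" warning and the
# "was still running ... It was terminated." message shown in the log tail.
HANG_RE = re.compile(
    r"previous service check (?:is|was) still running \((\d+) second\)"
)


def hang_durations(log_text):
    """Return a list of (seconds, terminated) tuples, one per hang message."""
    results = []
    for line in log_text.splitlines():
        match = HANG_RE.search(line)
        if match:
            terminated = "It was terminated" in line
            results.append((int(match.group(1)), terminated))
    return results


# Two lines copied from the chkservd log preview above.
sample = (
    "The previous service check is still running (309 second). "
    "It will be terminated if still hanging after 2 check intervals. (1/2)\n"
    "The previous service check was still running (1137 second). "
    "It was terminated.\n"
)

for seconds, terminated in hang_durations(sample):
    print(seconds, "terminated" if terminated else "still running")
```

Run against the full log, the same function would show whether the hangs cluster around particular times, which could then be compared with the Event Time in the out-of-memory emails.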

3. FAILED: clamd (10.0.0.1)
Code:
The service “clamd” appears to be down.
Server    webserver.xyz.com
Primary IP Address    10.0.0.1
Service Name    clamd
Service Status    failed ⛔
Notification    The service “clamd” appears to be down.
Service Check Method    The system’s command to check or to restart this service failed.
Number of Restart Attempts    1
Startup Log    No startup log
Memory Information  
Used    701 MB
Available    2.96 GB
Installed    3.64 GB
Load Information    41.64 94.60 119.13
Uptime    32 days, 5 hours, 44 minutes, and 57 seconds
IOStat Information    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.35    0.02    0.21    1.10    0.00   97.32
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb              16.09      1094.58       203.82 3048955631  567728412
sda              15.97      1092.46       203.82 3043046502  567728412
md127            32.52       391.63       203.40 1090897864  566572302
Top Processes  
PID    Owner    CPU %    Memory %    Command
24823    mysql    0.07    1.22    /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --user=mysql --log-error=/var/lib/mysql/webserver.xyz.com.err --open-files-limit=10000 --pid-file=/var/lib/mysql/webserver.xyz.com.pid
10533    nscd    0.01    0.05    /usr/sbin/nscd
1664    root    0.01    0.01    irqbalance
25008    root    0.00    11.14    /usr/local/cpanel/3rdparty/bin/clamd
10358    root    0.00    0.48    cPhulkd - processor

4. Out of memory: ⚠ The process “php” was terminated because the system is low on memory. --- 17 consecutive emails saying the same thing, each with a different PID and log files attached.
Code:
In order to avoid a system crash due to low memory, the kernel terminated the process named “php” with the PID “22469”.

Server webserver.xyz.com
Primary IP Address 10.0.0.1
Process Name php
Event Time Sunday, January 8, 2017 at 1:27:17 AM UTC
PID 22469
Process UID 516
Process Username xyzweb
Process Total Virtual Memory 251428kB
Process Anonymous Resident Set Size 53648kB
Process File Resident Set Size 1072kB
Process OOM Score 12
Status Out of Memory ⚠
Memory Information
Used 3.51 GB
Available 129 MB
Installed 3.64 GB
Load Information 95.15 89.00 98.80
Uptime 32 days, 5 hours, 10 minutes, and 35 seconds
IOStat Information    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.35    0.02    0.21    1.05    0.00   97.38
Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sdb              15.98      1093.71       203.25 3044301079  565748596
sda              15.85      1091.43       203.25 3037962566  565749724
md127            32.09       388.42       202.84 1081159696  564599694
Top Processes
PID Owner CPU % Memory % Command
23828 xyzweb 0.04 1.40 /usr/bin/php /home/xyzweb/public_html/wp-login.php
22300 xyzweb 0.06 1.26 /usr/bin/php /home/xyzweb/public_html/index.php
22444 xyzweb 0.06 1.16 /usr/bin/php /home/xyzweb/public_html/index.php
22363 xyzweb 0.06 1.11 /usr/bin/php /home/xyzweb/public_html/index.php
22361 xyzweb 0.06 1.08 /usr/bin/php /home/xyzweb/public_html/index.php

For additional details, see the attached dmesg log dump.

Preview of “oom_dmesg.txt”
[2774660.076765] Out of memory: Kill process 22469 (php) score 12 or sacrifice child
[2774660.077188] Killed process 22469, UID 516, (php) total-vm:251428kB, anon-rss:53648kB, file-rss:1072kB
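
Since there are 17 of these emails, it may be easier to summarise the kills from the dmesg output in one pass. The following is a rough sketch, assuming every "Killed process" line has exactly the format shown in the preview above (other kernel versions print different fields). The names `parse_oom_kills` and `KILL_RE` are only illustrative.

```python
import re
from collections import Counter

# Pattern for the "Killed process" line format shown in oom_dmesg.txt above.
KILL_RE = re.compile(
    r"Killed process (?P<pid>\d+), UID (?P<uid>\d+), \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_kills(dmesg_text):
    """Return one dict per OOM kill; sizes are integers in kB."""
    kills = []
    for line in dmesg_text.splitlines():
        match = KILL_RE.search(line)
        if match:
            fields = match.groupdict()
            kills.append(
                {k: (v if k == "name" else int(v)) for k, v in fields.items()}
            )
    return kills


# The two lines from the dmesg preview above.
sample = (
    "[2774660.076765] Out of memory: Kill process 22469 (php) score 12 "
    "or sacrifice child\n"
    "[2774660.077188] Killed process 22469, UID 516, (php) "
    "total-vm:251428kB, anon-rss:53648kB, file-rss:1072kB\n"
)

kills = parse_oom_kills(sample)
by_name = Counter(k["name"] for k in kills)
resident_kb = sum(k["anon_rss"] + k["file_rss"] for k in kills)
print(by_name, resident_kb)
```

Fed the full `dmesg` output, the counter would show which process names and UIDs are being killed most often, and the resident sizes give a rough idea of how much memory each killed php process was holding.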
 


cPanelMichael

Administrator
Staff member
Apr 11, 2011