TailWatch / Service Manager has stopped monitoring MySQL

Feb 24, 2019
12
2
3
London
cPanel Access Level
Root Administrator
Hi,

I'm hoping someone can advise me, please.

Since June 2018 I have been the administrator of a cPanel server that hosts a single WordPress website. Occasionally an "abusive user" hammers the website's features, and the MySQL server gets stuck with its processes in a state of "statistics" or "waiting for table level lock".

From June until late November, Service Monitor/chkservd used to restart the MySQL server and clear the processes. I would receive an email saying:
  • "MYSQL appears to be down", or;
  • "the service MySQL failed to restart"
  • followed closely by an email saying "MySQL is now operational"
Everyone else using the site would experience minimal disruption. Since early December, Service Monitor/chkservd has apparently stopped monitoring MySQL, and the first I know of an abusive user stalling the website is when Service Monitor/chkservd emails me to say "http is unresponsive". I check the site and it's unusable, with MySQL processes showing a state of "statistics" or "waiting for table level lock" indefinitely, and I have to restart MySQL manually to clear it.

We have identified the root of the problem, but I have to wait for the developers to be available to work on a fix. In the meantime I'd be deeply appreciative if Service Monitor/chkservd went back to restarting the MySQL server automatically, so that I don't have to spend my entire life manually monitoring the website/server for disruption!

  • Nothing has been changed in the configuration of cPanel.
  • In Service Monitor MySQL is checked under both "Enable" and "Monitor".

Does anyone have any ideas how I can make it work again please?

By the way:
  • cPanel v76.0.20
  • MySQL version 5.6
Thanks!
 

cPanelLauren

Product Owner II
Staff member
Nov 14, 2017
13,266
1,300
363
Houston
Hi @TwilightZoneCP


If the service manager shows that the MySQL service is in fact monitored and enabled, it should be automatically restarting the service after a failure. Is there anything MySQL-related, such as service restart failures, in /var/log/chkservd.log?
 
Feb 24, 2019
London
cPanel Access Level
Root Administrator
Here is an example from /var/log/chkservd.log from when Service Manager was successfully monitoring and restarting MySQL...
Code:
Service Check Started
Loading services .....apache_php_fpm....cpanellogd....cpdavd....cphulkd....cpsrvd....crond....dnsadmin....exim....ftpd....httpd....imap....ipaliases....lmtp....mailman....mysql....named....nscd....pop....queueprocd....rsyslogd....spamd....sshd..Done
[2018-10-19 13:51:29 +0100] Disk check .... / (/) [19.77%] ... /var/tmp (/var/tmp) [5.46%] ... /tmp (/tmp) [5.46%] ... {status:ok} ... Done
[2018-10-19 13:51:29 +0100] OOM check ......OOM Event:[anon_rss=308488kB,file_rss=0kB,is_cgroup=0,pid=29324,proc_name=mysqld,score=108,seconds_since_boot=697213.249977,time=1539953265,total_vm=9319476kB,uid=993,user=mysql]....OOM Event:[anon_rss=2824kB,file_rss=0kB,is_cgroup=0,pid=1653,proc_name=named,score=9,seconds_since_boot=697213.367078,time=1539953265,total_vm=755852kB,uid=25,user=named].....Skipped OOM Notification (too soon)......Skipped OOM Notification (too soon)...... Done
[2018-10-19 13:51:29 +0100] Service check ....
queueprocd [[check command:+][socket connect:N/A]]...
sshd [[check command:+][socket connect:N/A]]...
spamd [[check command:+][socket connect:N/A]]...
rsyslogd [[check command:+][socket connect:N/A]]...
pop [[check command:+][socket connect:+]]...
p0f [[check command:N/A][socket connect:N/A]]...
nscd [[check command:+][socket connect:N/A]]...
named [[check command:-][check command output:(XID wc7gm3) The “named” service is down.
The subprocess “/usr/local/cpanel/scripts/restartsrv_named” reported error number 255 when it ended.][socket connect:N/A][fail count:1]Restarting named....
[notify:failed service:nameserver]]...
mysql [[check command:-][check command output:(XID eg4ua2) The “mysql” service is down.
The subprocess “/usr/local/cpanel/scripts/restartsrv_mysql” reported error number 255 when it ended.][socket connect:N/A][fail count:1]Restarting mysql....
[notify:failed service:mysql]]...
mailman [[check command:+][socket connect:N/A]]...
lmtp [[check command:+][socket connect:+]]...
ipaliases [[check command:+][socket connect:N/A]]...
interval [[check command:N/A][socket connect:N/A]]...
imap [[socket_service_auth:1][check command:+][socket connect:+]]...
httpd [[check command:N/A][socket connect:+]]...
ftpd [[socket_service_auth:1][check command:+][socket connect:+]]...
exim [[check command:+][socket connect:+]]...
dnsadmin [[http_service_auth:1][check command:+][socket connect:+]]...
crond [[check command:+][socket connect:N/A]]...
cpsrvd [[http_service_auth:1][check command:N/A][socket connect:+]]...
cphulkd [[check command:+][socket connect:+]]...
cpdavd [[http_service_auth:1][check command:+][socket connect:+]]...
cpanellogd [[check command:+][socket connect:N/A]]...
cpanel_php_fpm [[check command:N/A][socket connect:N/A]]...
apache_php_fpm [[check command:+][socket connect:N/A]]...Done
Service Check Finished
Here is an example from /var/log/chkservd.log from a time when MySQL was stuck for almost 2 hours and I had to restart MySQL manually to resolve the issue...
Code:
Service Check Started
Loading services .....apache_php_fpm....cpanellogd....cpdavd....cphulkd....cpsrvd....crond....dnsadmin....exim....ftpd....httpd....imap....ipaliases....lmtp....mailman....mysql....named....nscd....pop....queueprocd....rsyslogd....spamd....sshd..Done
[2019-02-24 08:13:23 +0000] Disk check .... / (/) [69.83%] ... /tmp (/tmp) [5.46%] ... /var/tmp (/var/tmp) [5.46%] ... {status:ok} ... Done
[2019-02-24 08:13:23 +0000] OOM check ....Done
[2019-02-24 08:13:23 +0000] Service check ....
queueprocd [[check command:+][socket connect:N/A]]...
sshd [[check command:+][socket connect:N/A]]...
spamd [[check command:+][socket connect:N/A]]...
rsyslogd [[check command:+][socket connect:N/A]]...
pop [[check command:+][socket connect:+]]...
p0f [[check command:N/A][socket connect:N/A]]...
nscd [[check command:+][socket connect:N/A]]...
named [[check command:+][socket connect:N/A]]...
mysql [[check command:+][socket connect:N/A]]...
mailman [[check command:+][socket connect:N/A]]...
lmtp [[check command:+][socket connect:+]]...
ipaliases [[check command:+][socket connect:N/A]]...
imap [[socket_service_auth:1][check command:+][socket connect:+]]...
httpd [Service check failed to complete
Timeout while trying to get data from service: Died[check command:N/A][socket connect:-][socket failure threshold:16/3][fail count:14]Restarting httpd....
[notify:failed service:httpd]]...
ftpd [[socket_service_auth:1][check command:+][socket connect:+]]...
exim [[check command:+][socket connect:+]]...
dnsadmin [[http_service_auth:1][check command:+][socket connect:+]]...
crond [[check command:+][socket connect:N/A]]...
cpsrvd [[http_service_auth:1][check command:N/A][socket connect:+]]...
cphulkd [[check command:+][socket connect:+]]...
cpgreylistd [[check command:N/A][socket connect:N/A]]...
cpdavd [[check command:+][socket connect:N/A]]...
cpanellogd [[check command:+][socket connect:N/A]]...
cpanel_php_fpm [[check command:N/A][socket connect:N/A]]...
apache_php_fpm [[check command:+][socket connect:N/A]]...Done
Service Check Finished
Does this shed any light on the matter? It all means next to nothing to me! Thank you!
 

cPanelLauren

Hi @TwilightZoneCP

The output in the second code box shows chkservd checking MySQL but not detecting it as down: the mysql line reports "check command:+". I'm curious what the output of the following is:

Code:
ps faux |grep -i mysql
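
For anyone comparing many check runs rather than two excerpts, the per-service lines in /var/log/chkservd.log can be summarised with a small filter. This is only a sketch: `service_verdicts` is a hypothetical helper name, and it assumes the line layout seen in the excerpts posted above (service name, then `[[check command:+]` or `[[check command:-]`). It looks at the check command only, not at the socket connect result.

```shell
#!/bin/sh
# Hypothetical helper (not part of cPanel): read chkservd log text on stdin
# and print one verdict per check line for the named service.
#   "up"      -> the line contains [check command:+]
#   "down"    -> the line contains [check command:-]
#   "unknown" -> neither marker is present (for example check command:N/A)
service_verdicts() {
    svc="$1"
    awk -v svc="$svc" '
        # Only lines that begin with "<service> [[" are per-service results.
        index($0, svc " [[") == 1 {
            if (index($0, "[check command:+]"))      print "up"
            else if (index($0, "[check command:-]")) print "down"
            else                                     print "unknown"
        }'
}

# Example use (run as root on the server):
#   service_verdicts mysql < /var/log/chkservd.log | sort | uniq -c
```

A long run of "up" verdicts during a period when the site was unusable would suggest that the check itself was passing, rather than that monitoring had been switched off.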
 
Feb 24, 2019
London
cPanel Access Level
Root Administrator
Here is the output....

# ps faux |grep -i mysql
root 25697 0.0 0.0 112708 996 pts/0 S+ 15:13 0:00 \_ grep --color=auto -i mysql
mysql 7454 0.0 0.0 113312 1292 ? Ss Feb25 0:00 /bin/sh /usr/bin/mysqld_safe
mysql 7701 1.9 11.5 6074220 926084 ? Sl Feb25 26:12 \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plugin-dir=/usr/lib64/mysql/plugin --log-error=serversaddress.com.err --open-files-limit=10000 --pid-file=serversaddress.com.pid
 

cPanelLauren

Hi @TwilightZoneCP

Nothing weird there; it all looks as it should. You don't have a custom MySQL directory, so that rules that out. Have you ever made any changes to the my.cnf? If not, can you please open a ticket using the link in my signature? Once it's open, please reply here with the ticket ID so that we can update this thread with the resolution once the ticket is resolved.


Thanks!
 
Feb 24, 2019
London
cPanel Access Level
Root Administrator
This is in the my.cnf...

[mysqld]
performance-schema=on
default-storage-engine=MyISAM
slow_query_log=1
slow_query_log_file=slow-query.log
innodb_file_per_table=1
max_allowed_packet=268435456
innodb_buffer_pool_instances=4
query_cache_size=0
query_cache_type=0
query_cache_limit=1M
tmp_table_size=16M
max_heap_table_size=16M
innodb_buffer_pool_size=4G
innodb_log_file_size=2G
open_files_limit=10000
 

cPanelLauren

Hi @TwilightZoneCP

You're most welcome and I'm sure we'll be able to get to the bottom of this soon. I'll update this thread with the outcome of the ticket as soon as more information is available.


Thanks!
 

cPanelLauren

Hello,


I checked in on this ticket today. It looks like the service was indeed being monitored, but the restart wasn't occurring because it was waiting for a table-level lock on a database to be cleared. The analyst's advice was to look at switching to InnoDB for row-level locking.
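
As a rough illustration of what switching to InnoDB could involve, the sketch below turns a list of MyISAM tables into `ALTER TABLE ... ENGINE=InnoDB` statements for review. `gen_innodb_alters` is a hypothetical helper name, and the approach assumes the tables can be converted as they are. It only prints SQL and executes nothing, so take a backup and read the generated statements before running any of them.

```shell
#!/bin/sh
# Hypothetical helper: read "schema<TAB>table" pairs on stdin and print one
# ALTER TABLE statement per pair. Lines with any other field count are skipped.
gen_innodb_alters() {
    awk -F'\t' 'NF == 2 { printf "ALTER TABLE `%s`.`%s` ENGINE=InnoDB;\n", $1, $2 }'
}

# Example use: list MyISAM tables outside the system schemas, then review
# the generated statements before applying them.
#   mysql -N -B -e "SELECT table_schema, table_name
#                   FROM information_schema.tables
#                   WHERE engine = 'MyISAM'
#                     AND table_schema NOT IN
#                         ('mysql', 'information_schema', 'performance_schema')" \
#     | gen_innodb_alters > convert_to_innodb.sql
```

Note that the my.cnf posted earlier sets default-storage-engine=MyISAM, so newly created tables would still default to MyISAM unless that setting is changed as well.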
 
Feb 24, 2019
London
cPanel Access Level
Root Administrator
It appears so. Also, for anyone else reviewing this thread in future: I upgraded MySQL from version 5.6 to version 5.7, and testing suggests that this has resolved the "table level lock" issue, as the server is now handling much bigger requests without problems.
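
For anyone wanting to check whether the lock pile-up described in this thread is happening on their own server, here is a minimal sketch that counts threads stuck in the "Waiting for table level lock" state. `count_lock_waiters` is a hypothetical helper name, and it assumes tab-separated `SHOW FULL PROCESSLIST` output with the usual column order (Id, User, Host, db, Command, Time, State, Info). What to do when the count is non-zero, such as alerting or restarting MySQL, is left as a decision for the administrator.

```shell
#!/bin/sh
# Hypothetical helper: read tab-separated SHOW FULL PROCESSLIST rows on stdin
# and print how many threads have been in the "Waiting for table level lock"
# state for at least min_secs seconds (default 60).
count_lock_waiters() {
    min_secs="${1:-60}"
    awk -F'\t' -v min="$min_secs" '
        # Column 7 is State, column 6 is Time (seconds in the current state).
        index(tolower($7), "waiting for table level lock") && $6 + 0 >= min { n++ }
        END { print n + 0 }'
}

# Example use (for instance from cron):
#   mysql -N -B -e 'SHOW FULL PROCESSLIST' | count_lock_waiters 120
```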
 