[CPANEL-21627] Chkservd reports service failures during graceful reboots

leadwatch

Registered
Jun 11, 2018
NJ, USA
cPanel Access Level
Root Administrator
Every time I do a graceful reboot in WHM, I receive a cPanel alert email a few minutes later for each service, saying that it failed. The message is the same for every service. The alerts usually arrive only once, but sometimes a few are sent a second time. After that they stop.

For example:
Service Check Raw Output:
The “mysql” service is down.
The subprocess “/usr/local/cpanel/scripts/restartsrv_mysql” reported error number 255 when it ended.

The server is running CentOS 7.5 on KVM with cPanel/WHM v70.0.48 (though the issue has been happening for months, seemingly regardless of version). The setup is the host's default. I run updates and the associated graceful restarts (which is when the issue occurs) as needed.

I'm relatively new to server administration. I've been searching for a solution but have come up empty, apart from turning off failure alerts altogether, which I don't want to do.

Does anyone know why this happens and how I can fix it without turning off alerts?
 

cPanelMichael

Administrator
Staff member
Apr 11, 2011
Hello @leadwatch,

This can happen after a reboot when the service monitoring process (Chkservd) starts before the other services. You can try increasing the following option under the System tab in WHM >> Tweak Settings from its default value of "3" to a value such as "4":

ChkServd TCP check failure threshold

Per its description:

The number of times a ChkServd TCP check must fail before notification is sent and the service is restarted. On heavily loaded systems these types of service checks fail occasionally, producing erroneous indications that services are down. A value of 3 or higher is recommended for most systems.
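To illustrate how a failure threshold like this behaves, here is a minimal Python sketch. It is hypothetical and is not Chkservd's actual implementation: the `ThresholdMonitor` class, the `on_fail` callback (standing in for "notification is sent and the service is restarted"), and the rule that a successful check resets the counter are all assumptions made for illustration.

```python
from typing import Callable, Dict


class ThresholdMonitor:
    """Hypothetical model of a "fail N times before acting" service check."""

    def __init__(self, threshold: int, on_fail: Callable[[str], None]) -> None:
        self.threshold = threshold
        self.on_fail = on_fail  # stands in for "notify and restart the service"
        self.failures: Dict[str, int] = {}

    def record(self, service: str, check_ok: bool) -> None:
        if check_ok:
            # Assumption: a successful check resets the failure count.
            self.failures[service] = 0
            return
        count = self.failures.get(service, 0) + 1
        if count >= self.threshold:
            # Threshold reached: act once, then start counting again.
            self.on_fail(service)
            count = 0
        self.failures[service] = count


alerts = []
mon = ThresholdMonitor(threshold=3, on_fail=alerts.append)

# One isolated failure followed by a success: no alert.
mon.record("mysql", False)
mon.record("mysql", True)

# Three failures in a row: exactly one alert.
for _ in range(3):
    mon.record("mysql", False)

print(alerts)  # ['mysql']
```

Under these assumptions, raising the threshold filters out brief, isolated check failures, but checks that keep failing for several cycles in a row (for example, while services are being stopped) will still reach whatever threshold is configured.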

Thank you.
 

leadwatch

Hi Michael,

I set the ChkServd TCP check failure threshold to 5 and rebooted the server to test. It did not fix the issue: I still got a series of alert emails for failed services right after the reboot. Do you have any other suggestions?

Thanks.
 

cPanelMichael

Hello @leadwatch,

To follow up: per support ticket 9730779, it appears the services were not actually failing. Instead, Chkservd's service checks ran while the server was in the process of shutting down, so the notifications were sent once the server booted back up.

Thank you.
 

leadwatch

the services were not actually failing. Instead the service checks from Chkservd occurred while the server was in the process of shutting down and thus the notifications were sent when the server booted back up
Yes, this is the problem I'm having. I suspected the services were not actually failing, and it's nice to have confirmation. However, that still leaves me with the same problem: a flood of unnecessary and unwanted failure emails every time I restart the server.

Response from cPanel support:
It was not a problem! I do believe there could be room for improvement here, where we could suspend chkservd if cpsrvd issues a shutdown/reboot (i.e., a graceful restart from within WHM). Then, during the tailwatchd startup process, we can check whether there is a suspended state and 'unsuspend' based on how much time has passed. I am going to do some testing on this and will be sure to update this ticket if an improvement case gets pushed out for this.

Regarding the flood of email, I am unfortunately not seeing anything specific that can be changed to alter that behavior at the moment. The improvement case would definitely prevent that from happening so frequently as well. We greatly appreciate your understanding.
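The suspend-then-unsuspend idea from the support reply can be sketched roughly as follows. This is a hypothetical Python illustration, not cPanel code: the marker file, its JSON format, the 10-minute `GRACE_SECONDS` window, and both function names are invented. A reboot request records a timestamp; when the monitor starts, it keeps checks suspended until enough time has passed, then clears the marker.

```python
import json
import os
import tempfile
import time
from typing import Optional

GRACE_SECONDS = 600  # hypothetical: maximum length of a reboot-time suspension


def suspend_monitoring(marker: str, now: Optional[float] = None) -> None:
    """Called when a graceful reboot is requested: record when checks were suspended."""
    with open(marker, "w") as fh:
        json.dump({"suspended_at": time.time() if now is None else now}, fh)


def monitoring_enabled_at_startup(marker: str, now: Optional[float] = None) -> bool:
    """Called when the monitor starts: decide whether service checks should run."""
    if not os.path.exists(marker):
        return True  # no suspension recorded
    with open(marker) as fh:
        suspended_at = json.load(fh)["suspended_at"]
    now = time.time() if now is None else now
    if now - suspended_at >= GRACE_SECONDS:
        os.remove(marker)  # enough time has passed: unsuspend
        return True
    return False  # still inside the window: keep checks suspended


marker = os.path.join(tempfile.mkdtemp(), "monitor.suspend")
suspend_monitoring(marker, now=1000.0)
print(monitoring_enabled_at_startup(marker, now=1100.0))  # False (100 s < 600 s)
print(monitoring_enabled_at_startup(marker, now=1700.0))  # True (700 s >= 600 s)
print(os.path.exists(marker))  # False (marker cleared)
```

A real monitor would presumably re-evaluate the marker periodically rather than only at startup; this sketch shows only the elapsed-time decision.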
Has anyone in the community experienced this issue? Does anyone know of a workaround or a setting that may be causing this?
 

cPanelMichael

Hello @leadwatch,

We do have an internal case open (CPANEL-21627) that would address this issue by suspending Chkservd (the service monitoring daemon) when a graceful reboot is initiated through Web Host Manager. I'll monitor this case and update this thread with more information on its status as it becomes available. There's no workaround to report at this time, but you can safely ignore the notifications that are sent while the server is rebooting.

Thank you.