Services fail and recover constantly

ContinuIT

Member
Mar 9, 2019
Ijamsville, MD
cPanel Access Level
Root Administrator
Since yesterday, cPanel keeps sending notifications that the nameserver, spamd, cpanel-dovecot-solr, and clamd services have failed. They recover in about 4 minutes, but 10 minutes later it happens again. There are no updates pending and, as far as we can tell, resources are fine.

/etc/redhat-release:CentOS Linux release 7.6.1810 (Core)
/usr/local/cpanel/version:11.78.0.16
/var/cpanel/envtype:kvm
CPANEL=release

This is from one of the failure notifications:

Service Check Raw Output

(XID zfwq28) The “clamd” service is down.

The subprocess “/usr/local/cpanel/scripts/restartsrv_clamd” reported error number 3 when it ended.

Startup Log

Mar 09 08:50:41 colony2.example.com systemd[1]: Starting clamd antivirus daemon...
Mar 09 08:51:08 colony2.example.com systemd[1]: Started clamd antivirus daemon.
Mar 09 09:06:10 colony2.example.com systemd[1]: clamd.service: main process exited, code=killed, status=9/KILL
Mar 09 09:06:11 colony2.example.com systemd[1]: Unit clamd.service entered failed state.
Mar 09 09:06:11 colony2.example.com systemd[1]: clamd.service failed.
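
A note on reading this log: "code=killed, status=9/KILL" means the clamd main process was terminated by signal 9 (SIGKILL) rather than exiting on its own. The log does not say what sent the signal. The kernel's out-of-memory killer is one common source, but that is an assumption here, not something this output confirms. Below is a minimal Python sketch for pulling such lines out of a saved log. The function name find_signal_kills and the regex are illustrative, written against the exact CentOS 7 wording shown above, and may need adjusting for other systemd versions.

```python
import re

# Matches systemd's wording (as seen in the log above) for a main process
# terminated by a signal, e.g.
# "clamd.service: main process exited, code=killed, status=9/KILL".
KILLED_RE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ systemd\[1\]: "
    r"(?P<unit>\S+): main process exited, code=killed, "
    r"status=(?P<signo>\d+)/(?P<signame>\w+)$"
)


def find_signal_kills(lines):
    """Return (timestamp, unit, signal number, signal name) for each kill line."""
    hits = []
    for line in lines:
        m = KILLED_RE.match(line.strip())
        if m:
            hits.append((m["stamp"], m["unit"], int(m["signo"]), m["signame"]))
    return hits


# The startup log from the notification, copied verbatim.
startup_log = """\
Mar 09 08:50:41 colony2.example.com systemd[1]: Starting clamd antivirus daemon...
Mar 09 08:51:08 colony2.example.com systemd[1]: Started clamd antivirus daemon.
Mar 09 09:06:10 colony2.example.com systemd[1]: clamd.service: main process exited, code=killed, status=9/KILL
Mar 09 09:06:11 colony2.example.com systemd[1]: Unit clamd.service entered failed state.
Mar 09 09:06:11 colony2.example.com systemd[1]: clamd.service failed.
""".splitlines()

for stamp, unit, signo, signame in find_signal_kills(startup_log):
    print(f"{stamp}: {unit} was terminated by signal {signo} ({signame})")
```

Running the same function over the logs of the other failing services would show whether they are all being killed the same way and at the same times.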

Memory Information

Used: 1.33 GB
Available: 2.37 GB
Installed: 3.7 GB

Load Information

0.93 4.65 4.40

Uptime

13 hours, 6 minutes, and 11 seconds

IOStat Information


avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.09    0.06    2.23    1.79    0.47   91.36

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
vda             251.74      6900.76      1433.97  325509870   67640632

I also received a "High 5 minute load average" alert (6.51) from one site, but not every time the services go down. It seems to be the only other thing happening, yet when I check, the load looks fine.

Time: Sat Mar 9 08:47:00 2019 -0500

1 Min Load Avg: 27.28

5 Min Load Avg: 6.51

15 Min Load Avg: 4.27

Running/Total Processes: 48/521
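
The three figures in this alert are moving averages over 1, 5, and 15 minutes, so a 1-minute value of 27.28 next to a 15-minute value of 4.27 suggests a brief burst rather than sustained load. That may be why the load looks fine when checked afterwards. Below is a rough Python sketch that parses an alert like this one and flags that pattern. The helper names and the factor of 3 are arbitrary choices for illustration, not cPanel settings.

```python
import re

# Matches the alert lines shown above, e.g. "1 Min Load Avg: 27.28".
ALERT_RE = re.compile(r"^(1|5|15) Min Load Avg: ([\d.]+)$")


def parse_load_alert(text):
    """Return {window_in_minutes: load} for the load lines found in an alert."""
    loads = {}
    for line in text.splitlines():
        m = ALERT_RE.match(line.strip())
        if m:
            loads[int(m.group(1))] = float(m.group(2))
    return loads


def is_short_spike(loads, factor=3.0):
    """True when the 1-minute average is at least `factor` times the 15-minute one.

    `factor` is a hypothetical threshold chosen for this example.
    """
    return loads[1] >= factor * loads[15]


alert = """\
1 Min Load Avg: 27.28
5 Min Load Avg: 6.51
15 Min Load Avg: 4.27
"""

loads = parse_load_alert(alert)
print(loads, "short spike" if is_short_spike(loads) else "no short spike")
```

By the same test, the "0.93 4.65 4.40" reading from the clamd notification is not a spike: by then the 1-minute average had already fallen back below the longer ones.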


Does anyone have any idea where to look for the problem?
 

ContinuIT

I increased the memory allocated to one of the WordPress installations, for which I had occasionally been receiving emails about excessive resource usage, and that seems to have stopped the problem. Weird, as they seemed unrelated, but there you go.
I think we should hold off and see if it happens again before you guys invest time in the issue.