Thank you for looking into this.
I have a managed Cloud VPS with InMotion Hosting that hosts 4 websites (Server Version: Apache/2.4.38 (cPanel) OpenSSL/1.0.2q mod_bwlimited/1.4 mod_cpanel/1.4, Server MPM: event). I have been having this issue every few days since around the end of 2018. WHM is set to update daily, so I'm currently running version 78.0.17.
Services I have been notified are down include cphulkd, httpd, apache_php_fpm, cpanellogd, crond, exim and cpsrvd.
I also find that if I set the off-peak times for CPBackup, Backup and UPCP to my preferred times, they revert to different times after a couple of days.
IMH tech support thought it was a memory issue, but later confirmed I had not exceeded my allocation. I was given a 5-day trial of the next package up, with double the memory (3GB RAM, burstable to 6GB as needed), and there seemed to be fewer failure notifications, so I upgraded permanently. However, every few days I still get notifications about services failing, and my websites still go down. It is so frustrating.
Support have forgotten about it now. The one thing they told me was that a process had taken cPHulk down, but that process ID wasn't recorded anywhere that would tell them what the process was.
From today's investigation: chkservd (the service that sent the email; it monitors the other services and makes sure they stay online) correctly determined that cPHulk was offline during that time period, while every other service was online. Since only cPHulk failed, I checked /usr/local/cpanel/logs/cphulkd.log to see whether the service logged why it was killed. We found the following:
[2019-02-15 05:20:18 +0000] info [cPhulkd] DB processor shutdown via SIGTERM with pid 6181
[2019-02-15 05:20:18 +0000] info [cPhulkd] processor shutdown via SIGTERM with pid 929
[2019-02-15 05:35:06 +0000] info [cPhulkd] processor startup with pid 7152
[2019-02-15 05:35:06 +0000] info [cPhulkd] DB processor startup with pid 7593
While it is normal for cPHulk's DB processor to be started and stopped, the main processor itself should remain online. The logs above show that a process with ID 929 was what killed cPHulk. Unfortunately, just an hour later no process is running as ID 929 any longer, so we can't tell which external process issued this SIGTERM and killed cPHulk.
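A minimal Python sketch, assuming every relevant line follows the format shown above, that pulls main-processor shutdowns out of cphulkd.log and measures how long cPHulk stayed down before the next startup. The function names are hypothetical; DB processor restarts are skipped because they are routine, and the pid reported is simply the number printed on the shutdown line.

```python
import re
from datetime import datetime

# Matches cphulkd.log lines in the format shown above, e.g.
# [2019-02-15 05:20:18 +0000] info [cPhulkd] processor shutdown via SIGTERM with pid 929
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \w+ \[cPhulkd\] "
    r"(?P<db>DB )?processor (?P<event>startup|shutdown)"
    r"(?: via (?P<signal>SIG\w+))? with pid (?P<pid>\d+)"
)

SAMPLE = [
    "[2019-02-15 05:20:18 +0000] info [cPhulkd] DB processor shutdown via SIGTERM with pid 6181",
    "[2019-02-15 05:20:18 +0000] info [cPhulkd] processor shutdown via SIGTERM with pid 929",
    "[2019-02-15 05:35:06 +0000] info [cPhulkd] processor startup with pid 7152",
    "[2019-02-15 05:35:06 +0000] info [cPhulkd] DB processor startup with pid 7593",
]


def parse_line(line):
    """Return (timestamp, is_db, event, signal, pid), or None for other lines."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S %z")
    return ts, m["db"] is not None, m["event"], m["signal"], int(m["pid"])


def main_processor_outages(lines):
    """Pair each main-processor shutdown with the next main-processor startup.

    Returns (shutdown_time, signal, pid, downtime) tuples; downtime is a
    timedelta, or None if no later startup was logged.
    """
    outages = []
    pending = None  # (time, signal, pid) of a shutdown awaiting its startup
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, is_db, event, signal, pid = parsed
        if is_db:
            continue  # DB processor restarts are routine
        if event == "shutdown":
            if pending is not None:
                outages.append((*pending, None))  # two shutdowns in a row
            pending = (ts, signal, pid)
        elif pending is not None:  # a startup ends the pending outage
            outages.append((*pending, ts - pending[0]))
            pending = None
    if pending is not None:
        outages.append((*pending, None))  # still down at end of log
    return outages


if __name__ == "__main__":
    for when, sig, pid, downtime in main_processor_outages(SAMPLE):
        print(when, sig, pid, downtime)
```

On the four sample lines it reports a single outage: SIGTERM, pid 929, lasting 14 minutes 48 seconds. To scan the real file, pass the lines of /usr/local/cpanel/logs/cphulkd.log to `main_processor_outages`.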
OOM kills originate from the kernel rather than from another process with its own ID, and the kernel's OOM killer sends SIGKILL, not SIGTERM, so that rules out the low-memory theory.
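The memory theory can also be checked directly by searching the kernel log for OOM-killer messages. A minimal sketch, assuming the input is `dmesg` output or the server's kernel log (for example /var/log/messages on CentOS); the message wording varies by kernel version, so the pattern only covers the common forms and is an assumption. `find_oom_kills` is a hypothetical name.

```python
import re

# Two common wordings of the kernel OOM-killer message (exact text varies by
# kernel version, so this pattern is an assumption, not an exhaustive list):
#   Out of memory: Kill process 1234 (httpd) score 201 or sacrifice child
#   Out of memory: Killed process 1234 (httpd) total-vm:...
# IGNORECASE also catches "Memory cgroup out of memory: Kill process ...".
OOM_RE = re.compile(
    r"out of memory: kill(?:ed)? process (\d+) \(([^)]*)\)", re.IGNORECASE
)


def find_oom_kills(lines):
    """Return (pid, command) for each OOM-killer message in the given lines."""
    kills = []
    for line in lines:
        m = OOM_RE.search(line)
        if m:
            kills.append((int(m.group(1)), m.group(2)))
    return kills
```

No matches around the time of an outage is consistent with the conclusion that memory pressure was not the cause.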
To give us better resources for digging deeper into these service downtime events, I've temporarily installed some advanced logging via cPanel System Snapshot, which records process lists every few minutes and keeps them for 24 hours. If you receive another service downtime email, reply to us again just as you did today, and hopefully with this more detailed logging we can reach a conclusion and find a resolution.
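The point of the snapshot logging is that a PID seen in a log line can still be mapped to a command after that process has exited. A minimal Python sketch of the idea, not how cPanel's tool actually works: `ProcessHistory` and `read_proc_table` are hypothetical names, and reading /proc assumes a Linux server.

```python
import os
from collections import deque

RETENTION_SECONDS = 24 * 60 * 60  # keep 24 hours of snapshots


class ProcessHistory:
    """Hypothetical in-memory PID history: which command held each PID at
    each snapshot time, so a PID from a log can be looked up later."""

    def __init__(self, retention=RETENTION_SECONDS):
        self.retention = retention
        self.snapshots = deque()  # (timestamp, {pid: command}), oldest first

    def record(self, timestamp, processes):
        """Store one snapshot and drop any older than the retention window."""
        self.snapshots.append((timestamp, dict(processes)))
        cutoff = timestamp - self.retention
        while self.snapshots and self.snapshots[0][0] < cutoff:
            self.snapshots.popleft()

    def lookup(self, pid):
        """Return (timestamp, command) for every snapshot containing pid."""
        return [(ts, procs[pid]) for ts, procs in self.snapshots if pid in procs]


def read_proc_table():
    """Linux only: map each PID listed in /proc to its command name."""
    table = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as fh:
                table[int(entry)] = fh.read().strip()
        except OSError:
            continue  # process exited between listdir() and open()
    return table
```

Calling `history.record(time.time(), read_proc_table())` from a scheduler every few minutes, then `history.lookup(929)` after an outage, would show which command held that PID in any snapshot that caught it. PIDs are reused, so a match is a lead rather than proof.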
When my sites and services go down, WHM Service Status says all the processes are running, yet when I click on Apache Status it says Apache is not responding.
If I manually restart Apache via WHM, it comes back online immediately, so it's frustrating that cPanel can't do the same thing automatically.
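The symptom in the last two paragraphs is that the httpd process exists but does not answer requests, so a process-level check passes. A minimal Python sketch of a check that asks Apache for an actual response instead, assuming the server listens on http://127.0.0.1/. `is_responding`, `check_and_restart` and the `restart` callable are hypothetical; on cPanel the callable might run `/scripts/restartsrv_httpd`, which is an assumption to verify on your own server.

```python
import http.client
import urllib.error
import urllib.request

# Bypass any proxy settings: the check must talk to the local server directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def is_responding(url="http://127.0.0.1/", timeout=5.0):
    """True if any HTTP response arrives within timeout.

    An error status (403, 500, 501, ...) still counts as responding: the
    server answered, which is what "running" alone does not guarantee.
    """
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # server replied, just with an error status
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False  # refused, timed out, reset, malformed reply, ...


def check_and_restart(restart, url="http://127.0.0.1/", timeout=5.0):
    """Call restart() if the server does not answer; return True if it did."""
    if is_responding(url, timeout):
        return False
    restart()
    return True
```

Scheduling the check (for example from cron every few minutes) is left out; the sketch only shows why "process running" and "Apache responding" need to be tested separately.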