This used to be a very occasional problem that I attributed to memory issues. I've since solved the memory problem and the server appeared stable for a stretch, but now the problem is back and happening more often. Anywhere from daily to once every couple of days, I'll see alerts that multiple services have failed. cpsrvd is a usual suspect, as is the BIND nameserver; clamd, IMAP, and Exim sometimes follow shortly thereafter. The weird part is that all of the services are still running. They just occasionally throw mostly unhelpful errors.
What I believe is happening is that something all of the failing services have in common, possibly for authentication, is failing, and the issue only self-corrects once cPanel works through its usual restart routines and reaches the thing that's actually broken. Until cPanel gets to that point, I'll see things like this:
dovecot: auth: Error: auth worker: Aborted PASSV request for [email protected]: Shutting down
dovecot: auth: Error: net_connect_unix(anvil-auth-penalty) failed: Permission denied
Or, from another affected service:
2019-02-19 21:38:07.244 [31472] 1gwHlt-0008BS-4Z == [email protected] R=virtual_user T=dovecot_virtual_delivery_no_batch defer (-44): LMTP error after RCPT TO:<[email protected]>: 451 4.3.0 <[email protected]> Temporary internal error
When I try to access a site on this server while these issues are ongoing, I get a 503 error instead of the page I'm after. As I said, this is all very temporary, usually resolved about 5 minutes after cPanel notices, but I'd like to know what causes it and whether there's any way to avoid it. The only thing my own digging has uncovered is that all of the affected services seem to rely on one common mechanism, and it's that mechanism, rather than the individual services, that has fallen over.
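To check whether the failures really cluster around one shared mechanism, something like the following could tally the error signatures quoted above across the mail logs. This is only a rough sketch: the regular expressions are derived from the two log excerpts in this post, and the names `PATTERNS`, `classify`, and `summarize` are made up for illustration. Other services (cpsrvd, BIND, clamd) would need their own patterns.

```python
import re
from collections import Counter

# Signatures taken from the log excerpts in this post.
# The labels are hypothetical names chosen for this sketch.
PATTERNS = {
    "dovecot_socket_permission": re.compile(
        r"net_connect_unix\(([^)]+)\) failed: Permission denied"
    ),
    "dovecot_auth_shutdown": re.compile(
        r"auth worker: Aborted \w+ request .*: Shutting down"
    ),
    "exim_lmtp_defer": re.compile(
        r"defer \(-?\d+\): LMTP error .*451 4\.3\.0"
    ),
}


def classify(line):
    """Return the label of the first signature that matches, or None."""
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return label
    return None


def summarize(lines):
    """Count how many lines match each known error signature."""
    counts = Counter()
    for line in lines:
        label = classify(line)
        if label is not None:
            counts[label] += 1
    return counts
```

Feeding it the lines from the Dovecot and Exim logs (for example with `summarize(open(path))`, where `path` is wherever those logs live on the server) should show whether the socket permission errors and the LMTP deferrals appear during the same incidents.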