All sites on the server go down at about the same time every day

rinkleton

Well-Known Member
Jul 16, 2015
Cleveland
cPanel Access Level
Root Administrator
For about a week now, every day sometime between 11am and 3pm Eastern (typically around 2pm), all the sites on two different servers go down for several minutes. It seems to be limited to sites with SSL certificates, both cPanel and Let's Encrypt. We have very few sites without SSL, so it has been hard to test. Sometimes the outages are staggered across different sites or for end users on different ISPs, but eventually they all go down together. During an outage I was able to ping and tracert the domain successfully, but some people have said those fail for them. I don't see anything in the logs that would indicate an issue. We've contacted our hosting company (Liquid Web) and they haven't found any issues. Based on some other threads, I thought it might be an SSL stapling issue, so I set
Code:
SSLStaplingResponderTimeout 11
per a recommendation, but that didn't help. I'll try turning stapling off next. Then again, I'd have thought that if it were an SSL stapling issue, only one of the CAs would be affected.

I'm starting to run out of ideas, any suggestions of where to look?
 

cPanelLauren

Product Owner II
Staff member
Nov 14, 2017
Houston
What is the status of the following when this occurs?

Code:
sar -r
Code:
free -m
Code:
apachectl status
I also wouldn't suggest disabling SSL stapling; instead, identify the underlying issue. There are other workarounds for SSL stapling issues, depending on what's going on, and stapling problems shouldn't crash the server.
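Because each outage only lasts a few minutes, it may help to capture the requested output automatically rather than racing to type it. A minimal sketch, assuming bash; the function name and the default /root/outage-snapshots directory are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the output of the requested diagnostics to a
# timestamped file so an outage snapshot can be compared with a normal one.
snapshot_diagnostics() {
  local outdir="${1:-/root/outage-snapshots}"   # assumed location
  mkdir -p "$outdir" || return 1
  local file="$outdir/snapshot-$(date +%Y%m%d-%H%M%S).txt"
  local cmd
  for cmd in "sar -r" "free -m" "apachectl status"; do
    echo "=== $cmd ===" >>"$file"
    # Skip commands that are not installed (sar comes from the sysstat package).
    if command -v "${cmd%% *}" >/dev/null 2>&1; then
      # Unquoted $cmd splits "sar -r" into command + argument; timeout stops a
      # hung `apachectl status` from blocking the rest of the snapshot.
      timeout 30 $cmd >>"$file" 2>&1
    else
      echo "(${cmd%% *} not installed)" >>"$file"
    fi
  done
  echo "$file"
}
```

Run from root's crontab with a schedule such as `*/5 11-15 * * *`, this would take a snapshot every five minutes across the usual outage window, assuming the server clock is on Eastern time. Note that `apachectl status` depends on mod_status being reachable locally.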
 

rinkleton

Well-Known Member
Jul 16, 2015
Cleveland
cPanel Access Level
Root Administrator
I'm running an experiment to see if it is stapling. I turned stapling off yesterday and did not have any sites go down. I've re-enabled it today, so I'll see if the issue comes back. So far it does look like stapling is the issue. I will try the commands you recommended tomorrow.

I am seeing these errors:
Code:
[Fri May 15 09:06:39.221445 2020] [ssl:error] [pid 5225:tid 47083512002304] (70007)The timeout specified has expired: [client 5.45.82.205:46424] AH01985: error reading response from OCSP server
[Fri May 15 09:06:39.221494 2020] [ssl:error] [pid 5225:tid 47083512002304] AH01941: stapling_renew_response: responder error
and
Code:
[Fri May 15 09:04:53.400211 2020] [ssl:error] [pid 5336:tid 47083476281088] [client 72.14.199.99:35593] AH01980: bad response from OCSP server: 503 Service Unavailable
[Fri May 15 09:04:53.400235 2020] [ssl:error] [pid 5336:tid 47083476281088] AH01941: stapling_renew_response: responder error
I also see some other errors for the same tid (different pid and client IP) related to ModSecurity rules being triggered. Not sure if that is related.
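To see whether these stapling errors cluster in the 11am to 3pm window, it may help to count them per hour. A sketch assuming GNU grep, awk and sort, and the EasyApache 4 log location /etc/apache2/logs/error_log (adjust if yours differs); the function name is hypothetical, and the AH codes are the ones from the log lines above. Apache writes these timestamps in the server's local time.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count mod_ssl stapling errors per hour in an Apache error log.
stapling_errors_per_hour() {
  local log="${1:-/etc/apache2/logs/error_log}"   # assumed EasyApache 4 path
  # AH01985 = error reading OCSP response, AH01980 = bad HTTP response from the
  # OCSP server, AH01941 = stapling_renew_response: responder error
  grep -E 'AH0(1941|1980|1985)' "$log" |
    awk '{
      # Timestamp fields look like: [Fri  May  15  09:06:39.221445  2020]
      year = substr($5, 1, 4)
      hour = substr($4, 1, 2)
      count[year " " $2 " " $3 " " hour ":00"]++
    }
    END { for (k in count) print k, count[k] }' |
    sort -k1,1n -k2,2M -k3,3n -k4,4   # year, month name, day, hour
}
```

Each output line reads like `2020 May 15 09:00 3`, which can be compared directly against the times the sites were reported down.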
 

porplemontage

Member
Apr 6, 2013
cPanel Access Level
Root Administrator
I added these to pre_virtualhost_global.conf:
Code:
SSLStaplingResponderTimeout 12
SSLStaplingStandardCacheTimeout 172800
This increases the timeout for each responder query to 12 seconds and caches responses for two days (172800 seconds) instead of the default one hour (3600 seconds). My Apache restarts each night, so the idea is that it only actually queries the responder once per day, right after the restart at around 3am, and I'm no longer getting those stapling errors in the afternoon.
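When tweaking values like these repeatedly (11, then 12, and so on), a small helper that replaces an existing directive or appends it if absent avoids piling up duplicate lines in the include file. A sketch assuming GNU sed; `set_directive` is a hypothetical name, the EasyApache 4 include path in the comments is an assumption, and backing up the file first is advisable:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set an Apache directive in an include file, replacing an
# existing line for that directive or appending one, so re-running is safe.
set_directive() {
  local file="$1" name="$2" value="$3"
  touch "$file" || return 1
  if grep -qE "^[[:space:]]*${name}[[:space:]]" "$file"; then
    sed -i -E "s|^[[:space:]]*${name}[[:space:]].*|${name} ${value}|" "$file"
  else
    printf '%s %s\n' "$name" "$value" >>"$file"
  fi
}

# Example usage (path assumed for EasyApache 4):
#   conf=/etc/apache2/conf.d/includes/pre_virtualhost_global.conf
#   set_directive "$conf" SSLStaplingResponderTimeout 12
#   set_directive "$conf" SSLStaplingStandardCacheTimeout 172800
#   apachectl configtest && /scripts/restartsrv_httpd
```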
 

rinkleton

Well-Known Member
Jul 16, 2015
Cleveland
cPanel Access Level
Root Administrator
I tried raising the timeout to 11 and it didn't help; I doubt going to 12 would make the difference. I'd rather find the root cause than just force the errors to happen at night. I can't imagine this is how stapling is meant to work.

It seems like if SSLStaplingResponderTimeout is hit, Apache should just carry on as if stapling were disabled. Clients would see a slight delay, but in the end it would work. But I'm no stapling expert.
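Per the mod_ssl documentation, a failed responder query is not simply treated as "stapling off" by default: SSLStaplingReturnResponderErrors and SSLStaplingFakeTryLater both default to on, so when a query fails mod_ssl synthesizes a "tryLater" OCSP response and passes it to the client, and invalid responses stay cached for SSLStaplingErrorCacheTimeout (600 seconds by default). How a given browser handles a tryLater staple is client-dependent. The sketch below (hypothetical function name; it only greps the files you pass it and ignores VirtualHost scope) reports which stapling directives are set and which fall back to those defaults:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each stapling directive found in the given config
# files, or the mod_ssl default when it is not set. Requires bash 4+.
report_stapling_settings() {
  [ $# -gt 0 ] || { echo "usage: report_stapling_settings FILE..." >&2; return 1; }
  local -A defaults=(
    [SSLUseStapling]=off
    [SSLStaplingResponderTimeout]=10
    [SSLStaplingStandardCacheTimeout]=3600
    [SSLStaplingErrorCacheTimeout]=600
    [SSLStaplingReturnResponderErrors]=on
    [SSLStaplingFakeTryLater]=on
  )
  local name found
  for name in SSLUseStapling SSLStaplingResponderTimeout \
      SSLStaplingStandardCacheTimeout SSLStaplingErrorCacheTimeout \
      SSLStaplingReturnResponderErrors SSLStaplingFakeTryLater; do
    # Directive names are case-insensitive; take the last occurrence found.
    found=$(grep -hiE "^[[:space:]]*${name}[[:space:]]" "$@" 2>/dev/null |
      tail -n 1 | awk '{print $2}')
    if [ -n "$found" ]; then
      echo "$name $found"
    else
      echo "$name ${defaults[$name]} (default)"
    fi
  done
}
```

For example, `report_stapling_settings /etc/apache2/conf/httpd.conf /etc/apache2/conf.d/includes/*.conf` (paths assumed for EasyApache 4).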
 

cPanelLauren

Product Owner II
Staff member
Nov 14, 2017
Houston
There are quite a few threads on this as well, but I would still want to see the output of the requested commands.



We also have a tutorial on addressing these here: Tutorial - How to address OCSP responder errors
 

rinkleton

Well-Known Member
Jul 16, 2015
Cleveland
cPanel Access Level
Root Administrator
Sites just went down again. I can't say for sure the issue is OCSP or even SSL in general, but there is a strong correlation. I've tried most of the suggestions in the threads and tutorial you linked, such as pinging the CAs when the sites go down, but nothing fails the way I would expect. I'm turning off stapling for the weekend, so I'll see how that goes.
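Pinging a CA only shows that the host answers ICMP, not that its OCSP responder returns a timely answer. A more direct test is to query the responder with OpenSSL and time the request. A sketch assuming the site certificate and its issuing intermediate are available as separate PEM files (the paths and function name are hypothetical); `-timeout 10` mirrors Apache's default SSLStaplingResponderTimeout:

```shell
#!/usr/bin/env bash
# Hypothetical helper: query the OCSP responder named in a certificate and time it.
query_ocsp() {
  local cert="$1" issuer="$2" url
  [ -r "$cert" ] && [ -r "$issuer" ] || { echo "missing cert or issuer file" >&2; return 2; }
  # Read the responder URL from the certificate's Authority Information Access field.
  url=$(openssl x509 -noout -ocsp_uri -in "$cert") || return 1
  [ -n "$url" ] || { echo "no OCSP URI in certificate" >&2; return 1; }
  # -noverify skips signature checks: we only care whether a response arrives in time.
  time openssl ocsp -issuer "$issuer" -cert "$cert" -url "$url" -noverify -timeout 10
}
```

Running something like `query_ocsp /root/site.crt /root/site-issuer.crt` during an outage, and again when things are normal, would show whether the responder is slow or erroring at the moments the sites go down.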
 

rinkleton

Well-Known Member
Jul 16, 2015
Cleveland
cPanel Access Level
Root Administrator
To update this: I had left stapling off for a few weeks and there were zero issues. I re-enabled it last Friday and now the issues are back. So it's 100% stapling, but it's hard to pin down where the problem lies.
 

rinkleton

Well-Known Member
Jul 16, 2015
116
6
68
Cleveland
cPanel Access Level
Root Administrator
The problem still happens every few weeks, and it only affects some users. Restarting Apache does not fix it. However, turning stapling off, restarting Apache, then re-enabling stapling and restarting Apache again does fix it.

I realize this might not be a cPanel issue directly, but does anyone have ideas whether it's an Apache, OS, browser, ISP, or OCSP issue?
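Since only some users are affected, it may help to compare what the server actually staples when an affected and an unaffected user connect. A sketch assuming OpenSSL's `s_client -status` option and coreutils `timeout` (both function names are hypothetical); running `check_staple example.com` from different networks during an incident would show whether the staple is missing, successful, or something like tryLater:

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the OCSP staple from `openssl s_client -status`
# output read on stdin. Prints none, the response status, or unknown.
parse_staple_status() {
  local out status
  out=$(cat)
  status=$(grep -oE 'OCSP Response Status: [A-Za-z]+' <<<"$out" | head -n 1)
  if grep -q 'OCSP response: no response sent' <<<"$out"; then
    echo none
  elif [ -n "$status" ]; then
    echo "${status##* }"   # keep only the status word, e.g. successful
  else
    echo unknown
  fi
}

# Connect to HOST:443 with SNI, request a staple, and report what was returned.
check_staple() {
  local host="$1"
  # </dev/null makes s_client exit after the handshake; timeout guards against hangs.
  timeout 15 openssl s_client -connect "${host}:443" -servername "$host" -status \
    </dev/null 2>/dev/null | parse_staple_status
}
```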