SOLVED CPANEL-40507 - WHM/CPANEL Broke two sites on my server (that I know of) out of the blue today.

JoseDieguez

Well-Known Member
PartnerNOC
Jan 26, 2016
57
31
68
Chile
cPanel Access Level
Root Administrator
I can confirm this problem persists on the latest cPanel version.

It's hard to believe we have had to keep explaining to our clients why their websites just disappear...
 

cPRex

Jurassic Moderator
Staff member
Oct 19, 2014
12,481
1,966
363
cPanel Access Level
Root Administrator
UPDATE

I want to start by acknowledging your frustration with the time that it has taken to resolve this ongoing issue, and assure you that we are making every effort to resolve this issue as soon as possible. I'm hoping this explanation will show why this took longer to resolve than a typical high-priority case.

The process that we use to update the Apache configuration on a cPanel server has gotten quite complex through the years, and there are now several different subsystems of code that come into play. To name a few:

1. The domain/account creation logic
2. The task queue (queueprocd) that we use to asynchronously perform tasks
3. The file locking logic that we use to ensure that multiple processes do not attempt to read/write to httpd.conf at the same time
4. The logic to surgically insert VirtualHosts into the Apache configuration
5. The logic to restart Apache after the configuration has been built
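To illustrate item 3 in general terms, here is a minimal sketch of advisory file locking around a configuration write. It is not cPanel's implementation: the `.lock` file name, the `locked` helper, and `append_vhost` are all hypothetical names chosen for this example, and it assumes a Unix-like system where `fcntl.flock` is available.

```python
import fcntl
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def locked(path):
    """Hold an exclusive advisory lock on `path + ".lock"` for the duration of the block."""
    lock_path = path + ".lock"  # hypothetical lock-file naming, for illustration only
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until no other process holds the lock
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def append_vhost(conf_path, vhost_block):
    """Append a VirtualHost block while holding the lock, then swap the file in atomically."""
    with locked(conf_path):
        existing = ""
        if os.path.exists(conf_path):
            with open(conf_path) as f:
                existing = f.read()
        # Write to a temporary file in the same directory, then rename it over the
        # original, so readers never observe a half-written configuration.
        directory = os.path.dirname(os.path.abspath(conf_path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as tmp:
            tmp.write(existing + vhost_block)
        os.replace(tmp_path, conf_path)
```

The lock only helps if every writer goes through the same helper. A process that edits the file without taking the lock can still interleave with the others.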

Unfortunately, we have had difficulty replicating this issue even occasionally in our internal testing infrastructure, which has made it far more difficult for us to track down the root cause of the issue.

After looking at the data from affected servers, we were able to identify that this was occurring when a new domain/account was added to the server, and that the issue itself was a race condition that occurred from asynchronous tasks that are offloaded to queueprocd during domain/account creation.

The first attempt to resolve this issue (102.0.12) did three things:

1. It updated queueprocd's logic to force certain processes to run sequentially in order to remove the chance of the race condition occurring
2. It updated the domain/account creation logic so that certain processes were offloaded to queueprocd faster
3. It added some debug logging to help us analyze the issue further in the event that the first two changes did not resolve the issue
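The idea behind the first change, running dependent tasks strictly one after another, can be sketched generically as a queue drained by a single worker. This is an illustration only and says nothing about how queueprocd is actually built. `SequentialTaskQueue` and the three task labels are hypothetical stand-ins.

```python
import queue
import threading


class SequentialTaskQueue:
    """Run submitted tasks one at a time, in the order they were submitted."""

    def __init__(self):
        self.errors = []
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, func, *args):
        self._tasks.put((func, args))

    def wait(self):
        """Block until every submitted task has finished."""
        self._tasks.join()

    def _run(self):
        while True:
            func, args = self._tasks.get()
            try:
                func(*args)
            except Exception as exc:
                # Record the failure and keep the worker alive for later tasks.
                self.errors.append(exc)
            finally:
                self._tasks.task_done()


# Hypothetical stand-ins for the steps that follow a domain being added.
log = []
tasks = SequentialTaskQueue()
tasks.submit(log.append, "write userdata")
tasks.submit(log.append, "insert VirtualHost")
tasks.submit(log.append, "restart Apache")
tasks.wait()
print(log)  # ['write userdata', 'insert VirtualHost', 'restart Apache']
```

Because there is only one worker, a later step can never start before an earlier one has finished. The cost is that independent tasks no longer run in parallel.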

The second attempt to resolve this issue (102.0.14) was a targeted change that added logic to rebuild the userdata in the event that it was missing (that is, when the race condition had occurred). Due to the complexity of the logic that we use to surgically insert VirtualHosts into the Apache configuration, it turned out that this code path was not reached every single time that a new domain/account is created on a server.

The third attempt to resolve this issue, which will be released in 102.0.15, does two things:

1. It resolves a specific race condition that was found in the logic we use to update the userdata during domain and account creation. This race condition was identified as a direct result of the additional debug logging that we added during our first attempt to resolve this issue.
2. In case the 102.0.15 update does not completely resolve the issue (although we do hope it takes care of things), we've added additional debug logging to one of the code paths that is taken when new domains are added to the server, to help us analyze this issue further.

We plan to release 102.0.15 soon after it goes through final testing on our end.
 

JoseDieguez

cPRex said:
UPDATE

1. It resolves a specific race condition that was found in the logic we use to update the userdata during domain and account creation. This race condition was identified as a direct result of the additional debug logging that we added during our first attempt to resolve this issue.

We plan to release 102.0.15 soon after it goes through final testing on our end.

I just want to comment that we have reported at least 2-3 cases (but have seen many, many more; we just can't keep clients waiting), and I'm pretty sure those were not domain/account creation cases.

Even in this thread, there are comments about accounts that failed to create or renew SSL certificates (because of the AutoSSL-Sectigo issues), which then triggered the issue. (Or I may be mixing things up, because I'm following a few threads on these forums.)

But I personally saw a case where an account failed to renew its SSL certificate (because of the AutoSSL-Sectigo issue), and after the certificate expired and we forced the installation (by manually running AutoSSL a few times), this bug showed up.
 

azadhussnain

Well-Known Member
May 28, 2020
62
0
6
India
cPanel Access Level
Root Administrator
Hello,

Every two or three days, customers report to me that their website is redirecting to the default web page. I thought it was an SSL issue, so I went to cPanel -> Lets Encrypt SSL and tried to issue a certificate, but I got this error: error code 403 urn:ietf:params:acme:error:unauthorized

I tried reinstalling Apache and it worked, but I also found another workaround: remove all TXT records and then reissue the certificate. But reinstalling Apache and removing TXT records is only a temporary fix, because after a few days another customer will come and report the same issue.

I am looking for a permanent solution to this issue. Is it fixed in 0.15?
 

JoseDieguez

While it's still early in the day here, we're seeing MUCH better results with 0.15 at this point.
We just got a case, but it was for a very big website that we couldn't leave in that state, so we had to fix it for them right away.

The server is on 0.15.

Is it possible for the issue to occur on version 0.14 and then, after the server is updated to 0.15, for the issue to remain on that account?

I'm just trying to imagine a situation that would explain the issue on 0.15.
 

ServerAdminSD

Registered
May 11, 2022
3
0
1
Leiden
cPanel Access Level
Root Administrator
@JoseDieguez - yes, if the site was in a broken state in .14 it wouldn't have been fixed as part of the update to .15, but would have needed to be manually adjusted, so that's totally possible.
We are running on .15 as well and are still experiencing issues. In your comment you mention "would have needed to be manually adjusted". Can you describe this in a little more detail or post an example of how to do it? Or will "/scripts/rebuildhttpdconf && /scripts/restartsrv_apache" solve the problem?

PS: If you need an example of a server that is experiencing the issue right now, to do some testing or analyse logs, please send me a DM. On this server the issue occurred today and we decided not to change anything, so we have a test case to analyse.
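For anyone who wants to script the two commands quoted above, here is a minimal sketch that runs them in order and stops at the first failure, the same way `&&` does in a shell. Only the two script paths come from this thread. `REPAIR_COMMANDS`, `run_repair`, and the `dry_run` flag are hypothetical names for this example, and nothing here claims this is the manual adjustment that was meant.

```python
import subprocess

# The two commands quoted in this thread, in the order they were given.
REPAIR_COMMANDS = [
    ["/scripts/rebuildhttpdconf"],
    ["/scripts/restartsrv_apache"],
]


def run_repair(runner=subprocess.run, dry_run=False):
    """Run each command in order, stopping at the first failure (like `&&` in a shell)."""
    executed = []
    for command in REPAIR_COMMANDS:
        if not dry_run:
            # check=True makes subprocess.run raise CalledProcessError on a non-zero exit,
            # so the second command never runs if the first one fails.
            runner(command, check=True)
        executed.append(command)
    return executed


# Show what would be run without touching the server.
print(run_repair(dry_run=True))
```

Calling `run_repair()` with no arguments would actually execute both scripts, so it needs root on the affected server.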
 

Steini Petur

Well-Known Member
Apr 24, 2016
99
25
68
Iceland
cPanel Access Level
Root Administrator
Same issue on our server,

Posted today at 14:07
This website has been down too for the same SSL problem I think.
REDACTED THE DOMAIN


Server is

Operating System: CloudLinux v6.10.0
Product: cPanel & WHM v102.0.15 (STANDARD)


Ran rebuildhttpdconf and it fixed it for him. He thought it was an SSL problem, but it turned out to be the "default page" issue, which is usually caused by the vhost record, so I can confirm that we're still getting it even on 102.0.15.
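Since the "default page" symptom points at a missing vhost record, one way to spot affected sites before customers do is to compare the domains you expect against the names present in the Apache configuration. The sketch below assumes the configuration text has already been read into a string and uses a simple regex scan rather than a real Apache config parser. `configured_hostnames`, `missing_vhosts`, and the sample domains are hypothetical.

```python
import re

VHOST_BLOCK = re.compile(r"<VirtualHost\b[^>]*>(.*?)</VirtualHost>", re.DOTALL | re.IGNORECASE)
NAME_DIRECTIVE = re.compile(r"^\s*Server(?:Name|Alias)\s+(.+)$", re.MULTILINE | re.IGNORECASE)


def configured_hostnames(conf_text):
    """Collect every ServerName/ServerAlias value found inside VirtualHost blocks."""
    names = set()
    for block in VHOST_BLOCK.findall(conf_text):
        for values in NAME_DIRECTIVE.findall(block):
            # ServerAlias may list several names on one line.
            names.update(value.lower() for value in values.split())
    return names


def missing_vhosts(conf_text, expected_domains):
    """Return the expected domains that have no VirtualHost entry, sorted."""
    present = configured_hostnames(conf_text)
    return sorted(d for d in expected_domains if d.lower() not in present)


sample = """
<VirtualHost 203.0.113.10:80>
  ServerName example.com
  ServerAlias www.example.com mail.example.com
</VirtualHost>
"""
print(missing_vhosts(sample, ["example.com", "shop.example.net"]))  # ['shop.example.net']
```

A check like this could gate the rebuild so that it only runs when a domain is actually missing, instead of rebuilding on a fixed schedule.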
 

Steini Petur

WHM 102.0.15. Still loses SSL domains. rebuildhttpdconf fixes it.
Yeah, this is happening so much for us now that our board is just flaming with "SSL" questions, and it's always the same: the fix is rebuildhttpdconf. At this point you wonder if you should just crontab rebuildhttpdconf every 30 minutes.

I attached our board, filtered to anything mentioning "SSL". It's never "help installing SSL", it's just "SSL is broken".

Got to say, this is getting a bit annoying after v102.

PS: Keep in mind that's just the tickets with SSL in the title or body. Some are just "HELP HELP MY WEBSITE IS DOWN!" where no one mentions SSL and we just fix it. Please, cPanel, figure this out soon.
 

Attachments

azadhussnain

In 102.0.14 there is an issue: I installed FleetSSL, and after 1-2 days I was unable to see the Lets Encrypt SSL option in cPanel. I tried running yum -y install letsencrypt-cpanel, and it reports that it is already installed. Then I removed letsencrypt-cpanel and installed it again, which fixed the issue, but it also recurs frequently.