SOLVED [CPANEL-31558] - DNS Clustering issue

Operating System & Version
CentOS 6.10, CentOS 7.7
cPanel & WHM Version
v84.0.21

Bdzzld

Well-Known Member
Apr 3, 2004
410
5
168
I've a setup of three servers:
  1. s1.dom.ext = Dedicated Server /w IP-address 1.1.1.1 running CentOS 6.10, cPanel/WHM v84.0.21
  2. s2.dom.ext = Dedicated Server /w IP-address 2.2.2.2 running CentOS 6.10, cPanel/WHM v84.0.21
  3. vps.dom.ext = VPS /w IP-address 3.3.3.3 running CentOS 7.7, DNSonly v84.0.21
The DNS cluster of these servers was running without any problems using the old method of accesshashes.
Due to a recent network migration, however, the DNS Cluster needed to be configured again (cPanel's "IP Migration Wizard" did not automatically change the IP-addresses, but that's another thing) using the new API keys. Unfortunately this did not go as smoothly as planned:

DNS Clustering for s1.dom.ext:
[Screenshot: 1580636195583.png]
DNS Clustering for s2.dom.ext:
[Screenshot: 1580636283328.png]
DNS Clustering for vps.dom.ext:
[Screenshot: 1580636349617.png]
All API keys have been set up with only "DNS Clustering" access and "Trust relationship" enabled.

What can be the reason for this error?
Thanking you in advance.
 

cPanelLauren

Product Owner
Staff member
Nov 14, 2017
13,296
1,252
313
Houston
Hello,

We previously had an issue where DNS Clustering required the ALL ACL rather than the clustering ACL, but this was reported as resolved in v84.0 with CPANEL-29830.

Is the issue resolved when you add the token with the ALL ACL?
 

Bdzzld

@cPanelLauren :
I created a new API token on server s1.dom.ext with ALL access (everything checked) and then removed the erroneous entry on server vps.dom.ext. Then I added s1.dom.ext back to the DNS Cluster on vps.dom.ext. Afterwards the DNS Cluster on server vps.dom.ext did not show any errors. However, the DNS Cluster on server s1.dom.ext now started to show the same error, so I also created a new API token on server vps.dom.ext with ALL access (everything checked) and then removed the erroneous entry on server s1.dom.ext. Then I added vps.dom.ext back to the DNS Cluster on s1.dom.ext. Afterwards the DNS Cluster on server s1.dom.ext did not show any errors. However, the DNS Cluster on server vps.dom.ext started to show the same error again and we're back to where we started!
 

jndawson

Well-Known Member
Aug 27, 2014
289
31
78
Western US
cPanel Access Level
DataCenter Provider
We are running into the exact same thing. We have 5 WHM servers and one DNSOnly server running v86.0.3. They ran just fine for several years. We needed to upgrade one WHM server to CentOS 7, so we built a new server and decommissioned the old one, and now we get the same "API key used has insufficient ACLs. The clustering ACL is required" error. We were using the access hash, but can't get clustering to work.

We cleared out all of the tokens and restarted using the new, 'but experimental', API management tool. We created new tokens with only DNS clustering assigned, and for a brief few minutes, every server was clustering. Then, all of the API clustered servers started reporting the same error. We painstakingly repeated the process to be sure we weren't missing anything. Same results.

What is going on? What do we do to fix this?
 

Bdzzld

@jndawson : We've been waiting ten(!) days for cPanel to resolve this bug.
We also removed the original accesshash to try to get clustering working again, which did not help either.
 

cPanelLauren

So, to clarify, when you set the ACL for all servers in the cluster with ALL - does the issue persist? If so I'd suggest that you open a ticket with us. In any event where you are actively "waiting" for a fix or the issue is an emergency you should open a ticket where you can be assisted 24/7.

I found a few cases related to this, one of which involves Reverse Trust, but the errors are different. I'd like to lean toward CPANEL-29163 (setting up reverse trust in clustering requires an accesshash to be set up) being the resolution, but that would require updating to v86, which is in CURRENT right now, and I would prefer to be certain before recommending that.
 

jndawson

We're using v86 and having the same issues.
 

jndawson

Quick update: We cleared all API tokens and reverse trust tokens, created new API tokens with 'everything' selected, and rebuilt the DNS cluster. Within 2 minutes it was all broken again.
 

jndawson

Another update: Techs have determined that the API tokens are working and that clustering is working. However, they're unsure why the errors persist, and they've bumped it to a LIII tech.
 

jndawson

Another update: The techs reproduced the issue in their test environment. They tried disabling the reverse trust option on all servers, after which the error went away and clustering kept working. However, the cause is still under investigation.
 

cPanelLauren

Product Owner
Staff member
Nov 14, 2017
13,296
1,252
313
Houston
@jndawson and all

It looks like this will most likely result in a new case. I can't be 100% sure of that, but the investigation seems to be leaning that way.

I spoke to @cPanelHB who is currently working the ticket and he mentioned he will also respond here with his findings when he has something of value to provide you guys.
 

cPanelHB

Technical Analyst
Staff member
Sep 6, 2018
42
7
83
Houston
cPanel Access Level
Root Administrator
Hello,

Thank you all for your patience.

There are two principal reasons why this issue occurs, both of which have to do with the autoconfiguration of Reverse Trust.

In version 84, the autoconfiguration of reverse trust can fail on multi-server clusters due to an API token naming collision. This causes authentication errors, and is fixed in version 86:
  • Fixed case CPANEL-30569: Fix issue where API keys were overwritten for other machines in a DNS cluster when automatic reverse trust setup was selected.
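The naming collision described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Server` class, the `link` helper, and the token names (`reverse_trust`, `reverse_trust_<peer>`) are hypothetical and do not reflect how cPanel actually names or stores its tokens. It only shows why issuing tokens for several peers under one shared name leaves earlier peers holding a secret that is no longer valid.

```python
import secrets


class Server:
    """Toy model of one cluster member (hypothetical, not cPanel's code)."""

    def __init__(self, name):
        self.name = name
        self.tokens = {}        # token name -> secret this server accepts
        self.peer_secrets = {}  # peer name -> secret presented to that peer

    def issue_token(self, token_name):
        # Issuing under an existing name replaces the previous secret.
        secret = secrets.token_hex(8)
        self.tokens[token_name] = secret
        return secret

    def accepts(self, secret):
        return secret in self.tokens.values()


def link(client, server, name_for):
    """Server issues a token for client; client stores the secret."""
    client.peer_secrets[server.name] = server.issue_token(name_for(client))


def can_reach(client, server):
    return server.accepts(client.peer_secrets.get(server.name))


def run(name_for):
    s1, s2, vps = Server("s1"), Server("s2"), Server("vps")
    link(s2, s1, name_for)   # s2 obtains a token on s1
    link(vps, s1, name_for)  # vps obtains a token on s1
    return can_reach(s2, s1), can_reach(vps, s1)


# Assumed buggy behaviour: one shared token name for every peer.
fixed = run(lambda client: "reverse_trust")
# Assumed fixed behaviour: a distinct token name per peer.
per_peer = run(lambda client: f"reverse_trust_{client.name}")

print(fixed)     # (False, True): s2's secret was overwritten
print(per_peer)  # (True, True)
```

With a shared name, the second peer's setup silently replaces the first peer's secret, which matches the symptom of a cluster that works briefly and then starts failing authentication.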
However, there is a second issue which is not fixed in 86, having to do with the formatting of the files used to store cluster authentication credentials. On such servers, the credentials work, but the interface erroneously claims that the token may not have the Clustering privilege, or that the remote server may require a cPanel update.

I have filed a new internal case CPANEL-31558 regarding the second issue. When it is resolved, the case number will be referenced on our changelogs at go.cpanel.net/changelogs

In the meantime, try the following as a workaround:

On the server with a clustering error displayed in WHM » DNS Clustering, click the "Edit" button in the DNS Clustering interface. Ensure that "Setup Reverse Trust Relationship" is un-checked. Make no changes and click save. Refresh the clustering interface and see if the issue is resolved.

In the example screenshots in the original post, this would mean logging into vps.dom.ext and clicking the "Edit" button for the 1.1.1.1 server in the DNS Cluster UI.

If the error is not resolved (or if you get a 403 error), manually generate an API token with the DNS Clustering ACL on the remote system, and when editing the cluster member, use the new token. Ensure that "Setup Reverse Trust Relationship" is un-checked.

Until case CPANEL-31558 is resolved, any time you are setting up or changing DNS clustering, make sure that the "Setup Reverse Trust Relationship" box is always un-checked.
 

jndawson

My update: The cPanel techs working on this issue were great, and they kept us informed every step of the way. Our DNS cluster is working without the reverse trust enabled. Hopefully, they'll have a fix soon.

Kudos to Hans and his team!
 

cPanelHB

jndawson said: "Our DNS cluster is working without the reverse trust enabled."
To clarify on this a little -- your servers do have reverse trust configured, but it was done manually by you.

It's like this:

1. You connected server A to server B. When choosing "setup reverse trust", it also tries to connect server B to server A.
2. You connected server B to server A. When choosing "setup reverse trust", it tries to get server A to connect to server B, but you had already done that. When it auto-configures reverse trust, it interferes with what you did in step 1.

By un-checking the "setup reverse trust" check-box and doing both steps, you are manually configuring "reverse trust" in the sense that each server can connect to the other.
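As a rough illustration of the two manual steps above, here is a minimal sketch in which each server is added on the other with reverse trust un-checked. All names (`Server`, `issue_token`, `add_cluster_member`, `can_reach`) are hypothetical and only model the idea that trust is one-directional per step. This is not cPanel's implementation.

```python
class Server:
    """Toy cluster member (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.accepted = set()   # tokens this server issued and will accept
        self.credentials = {}   # peer name -> token used when calling that peer


def issue_token(server, label):
    """Manually generate a token on `server` (stand-in for an API token)."""
    token = f"{server.name}:{label}"
    server.accepted.add(token)
    return token


def add_cluster_member(local, remote, token):
    """Add `remote` on `local` with reverse trust un-checked: one direction only."""
    if token not in remote.accepted:
        raise PermissionError(f"{remote.name} does not accept this token")
    local.credentials[remote.name] = token


def can_reach(local, remote):
    return local.credentials.get(remote.name) in remote.accepted


a, b = Server("A"), Server("B")

# Step 1: on A, add B using a token generated on B.
add_cluster_member(a, b, issue_token(b, "for-A"))
one_way = (can_reach(a, b), can_reach(b, a))

# Step 2: on B, add A using a token generated on A.
add_cluster_member(b, a, issue_token(a, "for-B"))
two_way = (can_reach(a, b), can_reach(b, a))

print(one_way)  # (True, False)
print(two_way)  # (True, True)
```

After both steps each server holds a credential the other accepts, which is all that "reverse trust" amounts to in this model, and neither step touches the credential created by the other.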

You're not currently missing out on anything -- everything on your cluster is configured. It's just a bit of a trick trying to get it set up in the first place.
 

Bdzzld

An update from my part (as I'm the original poster): After all servers were automatically upgraded to 11.86.0.8, I edited the entry for server s1.dom.ext (1.1.1.1) in the DNS Cluster GUI of the DNSonly VPS vps.dom.ext and saved it again (Reverse Trust was already disabled). This seems to have solved the problem as all DNS Cluster GUIs no longer show an error.

A question: Is this a BIND-only related error or does this problem also occur in a PowerDNS setup? In other words, will this problem also occur when upgrading from BIND to PowerDNS? In that case I guess it may be better to postpone such an upgrade till [CPANEL-31558] is solved.
 

cPanelHB

Hello,

Thank you for getting back to us.

Bdzzld said: "Is this a BIND-only related error or does this problem also occur in a PowerDNS setup?"
The issue is unrelated to the nameserver software in use. It happens for both BIND and PowerDNS.

Bdzzld said: "In that case I guess it may be better to postpone such an upgrade till [CPANEL-31558] is solved."
You should be able to switch between BIND and PowerDNS without causing issues.

Bdzzld said: "... and saved it again (Reverse Trust was already disabled)."
I just want to comment that the checkbox for "Setup Reverse Trust Relationship" isn't an indicator for the current state of the reverse trust relationship. It's asking if you want it to try to set it up again, regardless of whether or not there is a pre-existing reverse trust relationship.

So, similar to jndawson, I suspect you already had a working reverse trust relationship, and still have one even now.

Let us know if you have any further questions, issues, or concerns.