DNS Cluster "Could not communicate with remote API server"

thowden

Well-Known Member
May 17, 2013
56
5
58
cPanel Access Level
Root Administrator
Hi All

I have a DNS Cluster of 3 DNS Only servers with 6 Web servers using Write-Only connections to each of the DNS Only servers.

I have had an ongoing issue when manually running 'Synchronize DNS Records' against the cluster: errors show for one or more of the DNS servers. This seems to occur with any of the web servers, without any discernible pattern. Any of the web servers may show any of the name servers as 'unknown', but never all of them; at least one name server always shows OK.

[Screenshot: 1595371161952.png]
If I select another page in the WHM console and return then the error message is gone and all 3 of the DNS servers are listed as connected and ok.

I am just not clear whether this is expected when selecting the manual Synchronize DNS Records option from the WHM menu, or whether I actually have an API or token issue.

Thoughts?

Thanks.
 

cPAdminsMichael

Well-Known Member
Dec 19, 2016
182
65
103
Denmark
cPanel Access Level
Root Administrator
Hmm... sounds odd. Are you running NAT, or are your servers on different VLANs?
 

cPAdminsMichael

Hm, I don't have a good explanation for your issue. It's a bit odd that it's random; otherwise I would say that it's network or firewall related.
I suggest you open a support ticket (if you haven't already done so).
 

andrew.n

Well-Known Member
Jun 9, 2020
626
181
43
EU
cPanel Access Level
Root Administrator
As far as I know, the APIs use the cPanel default ports to communicate (2083, 2086, etc.). Are you able to telnet to these ports from one server to another? Do you have cPHulk or host access restrictions in place? These could explain the connection issues you see.
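
If telnet is not installed, the same reachability test can be sketched in Bash. This is only an illustration: the check_ports function name is made up, it relies on Bash's /dev/tcp redirection and the coreutils timeout command, and the ports to test are whatever you pass in.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of cPanel): try to open a TCP connection
# to each given port on a host and report whether it succeeded.
# Assumes Bash's /dev/tcp support and the coreutils `timeout` command.
check_ports() {
  local host=$1 port failed=0
  shift
  for port in "$@"; do
    # Give up after 5 seconds so a filtered port does not hang the check.
    if timeout 5 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      echo "${host}:${port} reachable"
    else
      echo "${host}:${port} unreachable"
      failed=1
    fi
  done
  # Non-zero exit status if any port could not be reached.
  return "$failed"
}
```

For example, `check_ports ns1.example.com 2083 2086` (with a placeholder hostname) prints one line per port. A single successful check does not rule out an intermittent problem, so it may be worth repeating it a few times.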
 

cPanelLauren

Product Owner
Staff member
Nov 14, 2017
13,296
1,271
313
Houston
Can you please open a ticket using the link in my signature? Once it is open, please reply here with the ticket ID so that we can update this thread with the resolution once the ticket is resolved.


Thanks!
 

thowden

Hi Andrew

Thanks for the input. I'd expect that if it were a port or firewall issue, the port would be either open or closed, not intermittent. I can trigger the sync, get the error, check the DNS cluster page to see an outage, screenshot it, refresh it, and have all servers connected. It looks more like a timing issue.
 

blue928

Member
Feb 7, 2021
5
1
1
United States
cPanel Access Level
Root Administrator
I'm having this issue as well. I followed the instructions on setting up a separate DNS Only server and added it to the cluster. Everything works fine, and I got a green light for the first server just like the OP did in the above screenshot. I used Terraform to set up my first server, then used the same Terraform plan to spin up an exact replica of the server and configured its hostname and IP address.


When I added the server I got an intermediate page that said that everything was working fine: synchronization was good and reverse trust was good. Then, when I went back to the clusters page, I got the same error for the second server.

I rebuilt the server manually just in case my Terraform plan did something wonky. No go: no matter what I do, it always displays that error. I can telnet to and from each server, and I can access all recommended ports per the documentation.

Just like the OP the only difference between these two servers is that they are in different geographically distributed datacenters. I have not tested if that could be the case, but then again, that would defeat the purpose of having reliable DNS servers if they were in the same data center.

I see that a support ticket was submitted from above. Was this issue resolved? Can you post the results or how to resolve if so?

Thanks!
 

andrewmoras

Active Member
Feb 6, 2021
35
23
8
Remote
cPanel Access Level
DataCenter Provider
I'm seeing the same problem on two nameservers that recently got upgraded to cPanel v94 DNS ONLY. Whenever I "edit" one server I get:

DNS Cluster Management
The Trust Relationship has been established.
The remote server, ns1.domain.com, is running WHM version: 10.0.0
The new role for IP ADDRESS is sync. Return to Cluster Status


but when I go back to cluster status I see: "Could not communicate with remote API server."

Anyone else having the same issue?
 

cPRex

Jurassic Moderator
Staff member
Oct 19, 2014
7,418
1,000
313
cPanel Access Level
Root Administrator
@blue928 - in the ticket that was opened, we discovered intermittent network issues, although that customer did not write back saying what the official resolution was.

It sounds like you may be experiencing the following interface error, which you can ignore: cPanel

Can you check that and see if that is the case? The same would apply for @andrewmoras
 

andrewmoras

It seems like you're right, as always, @cPRex. Looking forward to seeing this resolved :)

Thanks,
Andrew
 

cPJustinD

Administrator
Staff member
Jan 12, 2021
286
51
103
Houston
cPanel Access Level
Root Administrator
Hello DoghouseAgency! That's certainly odd. If the issue is not intermittent, the originally reported issue should have been resolved; however, as you're still experiencing the issue with a build that's already had the fix applied, it would be best to open a support ticket so that our analysts can review the issue more thoroughly and determine exactly what is occurring. You can submit a support request using the "Submit a ticket" link in my signature below.

Please be sure to link this thread when opening the ticket and provide the ticket number here to track the issue properly. If our analysts help you resolve the issue, please be sure to post the resolution here as it may help other community members with similar issues.

I hope that this helps. If you have any other questions or concerns, please let us know!
 

HD-Sam

Active Member
PartnerNOC
Sep 23, 2003
42
0
156
Iowa City, Iowa
I'm running into a similar issue here. It's happening with one particular nameserver running cPanel DNSonly on old Dell hardware with XenServer/XCP. We even reinstalled DNSonly on AlmaLinux 8, but no luck: we get the same "Could not communicate with remote API server" error. Reverse trust was established. Public IPs, different geo zones. No VLANs. The DNS zones sync without issues. Telnet is fine between both.

One important note: it's the only VM sitting on the server. It was very slow when it was on CentOS 7, and it's very slow on AlmaLinux 8. No hardware issues reported by OMSA. We'll be replacing the server sometime next week and importing the VM. My gut feeling is that it's the server itself.

I'll post my findings then.
 

thowden

Hi All

For what it's worth, I am still experiencing this same issue, a year later, with no resolution from my upstream service provider, who have supposedly engaged with cPanel support.

I am now in the habit of hitting up the Manual Synchronise DNS Servers process, getting the error message, and refreshing the cluster links once, twice, or three times depending on the mood of the server(s). When I finally get a clear indication that the sync will work, I sync the DNS servers. 24 hours later, rinse and repeat the same actions.

It never happened with the old DNS server daemon; it has only happened since we started using PowerDNS. So it could be a WHM/cPanel issue introduced around the same time as PowerDNS was added as the preferred tool, or it could be coincidence, but Rule 39: There is no such thing as a coincidence.

At the time of my original post, I could not find anyone with the same issue. Obviously, I am now not Robinson Crusoe: there is an underlying issue that has not been resolved, and it is highly unlikely that, across multiple ISPs, multiple IaaS providers, and multiple geographical locations, we all have a network issue.

Over the year since reporting this my upstream support provided these gems in the ticket thread:

" We are still getting the API related errors on nameservers for your server, to resolve and fix the root cause of intermittent issues related to DNS synchronization. We will keep you posted about the same. "
"This behavior occurs due to the request for information from cluster members timing out; The timeout is 7 seconds but often it takes longer to read packages on a remote server. The screenshot provided in the document (initial reply) is currently running cPanel version 90. This issue was recently resolved in cPanel version 96. You can mostly ignore this error as the DNS cluster does continue to function even though the error appear or upgrade the cPanel version to 96 for DNS servers. "
At which point I upgraded to the non-stable .96 release and 24 hours later, I responded to the thread with:

"Our 3 servers were already upgraded to Cpanel 96 and it is not 'fixed'.

If you check two of our servers now you will see the error again on the Sync page. The other server is apparently not affected today but that is consistent in the random manner that this issue arises.

This is an ongoing issue that I cannot "mostly ignore" as it means that our DNS cluster is NOT WORKING from the perspective that if it is not sync'ing it is not working. The 3 DNS Only servers will continue to function but with out-of-date records.

If it is a time-out issue, it needs to be fixed. If it is a comms issue, it needs to be fixed.

Whatever it is it needs to be fixed and it is not fixed, and I cannot ignore it. "
That was back in May 2021. I have had nothing further from either my provider or cPanel.

YMMV

cheers
 

HD-Sam

Update: We swapped out the server today and all is working now.
We exported the VM from the old server, and imported exactly as it was on the new server. Identical VM, new hardware. Problem solved.

It must have been hardware or network related for us because the old server was running very slowly. SSH was slow to respond and it was the only VM on the server. Perhaps there is a new network timeout value for the dns cluster in a new cPanel update? The issue existed with both BIND and PowerDNS. Hardware swap fixed it all.
 

Host1no

Registered
Oct 2, 2010
1
0
51
It turns out this is caused by a 7-second timeout. If your DNS server does not respond to the API call within 7 seconds, the call times out. For some reason my cPanel DNS Only servers respond much more slowly than our full cPanel servers, even though they have less CPU use and memory pressure. Support created this KB article, which is being updated, regarding this issue. As far as I know, cPanel is investigating it in CPANEL-38426.


You can check whether this is the cause by running
Bash:
time whmapi1 installed_versions | tail -0
and seeing whether it takes more than 7 seconds to complete. You might want to run it a few times to confirm that the real times you get are representative.
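
To repeat that check several times in one go, something like the following could work. It is a rough sketch, not anything from cPanel: time_runs is a made-up helper name, it assumes GNU date (for the %N nanosecond format), and the 7000 ms limit in the usage example is simply the 7-second timeout mentioned above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command several times, print the wall-clock
# time of each run in milliseconds, and count runs that exceed a limit.
# Usage: time_runs <runs> <limit_ms> <command> [args...]
# Assumes GNU date, which supports %N (nanoseconds).
time_runs() {
  local runs=$1 limit_ms=$2
  shift 2
  local i start end elapsed_ms slow=0
  for ((i = 1; i <= runs; i++)); do
    start=$(date +%s%N)
    "$@" > /dev/null 2>&1          # discard the command's own output
    end=$(date +%s%N)
    elapsed_ms=$(( (end - start) / 1000000 ))
    if (( elapsed_ms > limit_ms )); then
      echo "run ${i}: ${elapsed_ms} ms (over limit)"
      slow=$((slow + 1))
    else
      echo "run ${i}: ${elapsed_ms} ms"
    fi
  done
  echo "${slow} of ${runs} runs exceeded ${limit_ms} ms"
  # Exit status is non-zero if any run was over the limit.
  (( slow == 0 ))
}
```

On a DNS Only server this could be called as `time_runs 5 7000 whmapi1 installed_versions`. If some runs are over the limit and others are not, that would be consistent with the error appearing and disappearing on the cluster status page.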