DNS cluster issues - BIND not reloading

xanthi

Member
Oct 17, 2010
6
0
51
Hi,

I'm having issues with DNS clustering. I currently have 3 servers in the cluster.

When I update or add a zone on server A, the changes are reflected in servers B and C (checked by viewing the DNS zone), but BIND is only reloaded on server A.

Adding or editing a zone on either server B or server C again reflects the changes in the other two servers but BIND is only reloaded on server A (regardless of which server the change is made on).

So, it seems the clustering is working but for some reason BIND is not reloading on servers B or C. I can restart named manually on B and C using
Code:
service named restart
and BIND restarts correctly, with no errors displayed and with the updated zones.

I have checked the logs using
Code:
tail -f /var/log/messages
when restarting BIND manually and there's nothing unusual there.

I have tried running
Code:
/scripts/upcp
I have checked that none of the following exist:
Code:
/etc/binddisable
/etc/nameddisable
/etc/dnsdisable
and all of the info in the files in the following is correct:
Code:
/var/cpanel/cluster/root/config
Is there anything I've missed? What else could cause this problem?

Thanks
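
The three "disable" file checks listed above can be run in one pass. This is only a minimal sketch: the file names are the ones from this post, while the function name and the optional root-prefix argument are made up for illustration.

```shell
# Report which of the DNS "disable" touch files exist.
# An optional root prefix can be passed as $1 (hypothetical convenience for
# checking a mounted copy of a server's filesystem); default is the live system.
check_dns_disable_files() {
    root="${1:-}"
    found=0
    for f in /etc/binddisable /etc/nameddisable /etc/dnsdisable; do
        if [ -e "${root}${f}" ]; then
            echo "present: ${f}"
            found=1
        fi
    done
    [ "$found" -eq 0 ] && echo "no disable files found"
    # Exit status 0 means none of the files exist, 1 means at least one does.
    return "$found"
}

# Example: check_dns_disable_files
```

Running it on each cluster member gives a quick yes/no answer without listing the files by hand.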
 

cPanelTristan

Quality Assurance Analyst
Staff member
Oct 2, 2010
7,607
43
348
somewhere over the rainbow
cPanel Access Level
Root Administrator
Please check if you have anything set in WHM > Tweak Settings for this option:

BIND deferred restart time [?]

Time (in seconds) before dnsadmin will wait before restarting BIND. Additional restart requests during this time period will be silently discarded. On systems that process very frequent DNS updates a setting of 300 or 600 seconds is recommended. On systems with few DNS changes, the default setting of 0 is recommended. Note that DNS changes will not take effect until the restart is complete.
Also, you can grep that value in /var/cpanel/cpanel.config file:

Code:
grep bind /var/cpanel/cpanel.config
Any value higher than 0 means that the BIND restart is deferred for that many seconds.
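
To compare that setting across all cluster members, the grep can be wrapped in a small helper that labels each value. This is a rough sketch, assuming the file uses plain key=value lines; the function name is hypothetical, and it matches on the substring "bind" exactly as the grep above does rather than assuming a particular key name.

```shell
# Print every key containing "bind" from a key=value config file and note
# whether its value is zero, greater than 0, or not a plain number.
report_bind_settings() {
    config="${1:-/var/cpanel/cpanel.config}"
    grep -i 'bind' "$config" | while IFS='=' read -r key value; do
        case "$value" in
            ''|*[!0-9]*) note="non-numeric" ;;
            *)
                if [ "$value" -gt 0 ]; then
                    note="greater than 0"
                else
                    note="zero"
                fi
                ;;
        esac
        echo "${key}=${value} (${note})"
    done
}

# Example: report_bind_settings /var/cpanel/cpanel.config
```

Not every key that contains "bind" is necessarily the deferred restart time, so the labels only describe the value, not what the key does.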
 

xanthi

Thanks for the reply - all servers have the default value of 0 seconds for that setting...
 

cPanelTristan

Could you turn on more verbose logging in WHM > Tweak Settings, and then create a zone to see the results:

Log dnsadmin requests [?]

Log dnsadmin requests to /usr/local/cpanel/logs/dnsadmin.log
At that point, try to test creating or changing a zone file to see the results. I would also suggest tailing the error log for cPanel when doing that:

Code:
tail -fn0 /usr/local/cpanel/logs/error_log
You might also want to check /var/log/messages and the /var/named/data/named.run file (this one might not exist, depending on how logging is set up).
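
Since some of these logs may be missing on a given server, one way to watch them together is to filter the list down to the files that exist first. A minimal sketch follows; the helper name is hypothetical, and the paths in the example are the ones mentioned in this post.

```shell
# Print only those of the given log files that exist and are readable.
existing_logs() {
    for f in "$@"; do
        [ -r "$f" ] && echo "$f"
    done
    return 0
}

# Example: follow every available log at once, showing new lines only.
#   tail -Fn0 $(existing_logs /usr/local/cpanel/logs/error_log \
#       /usr/local/cpanel/logs/dnsadmin.log /var/log/messages \
#       /var/named/data/named.run)
```

When tail is given several files it prefixes each chunk of output with the file name, which makes it easier to see which log reacted to a zone edit.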
 

xanthi

I've enabled logging and more verbose logging.

The problem is not as straightforward as I initially thought:
A new zone is added correctly, and BIND is reloaded on all servers. Editing this new zone also works correctly. So it seems the problem is actually that BIND does not reload when the zone files for certain existing domains are edited.

Editing one of the zones which does not work correctly outputs the following:

Code:
# tail -fn0 /usr/local/cpanel/logs/error_log
[2011-04-22 22:33:12 +0000] warn [dnsadmin-ssl] Could not read from /var/named/co.uk.db on omega at whostmgr/bin/dnsadmin-ssl line 1604
        main::getzonelocal() called at whostmgr/bin/dnsadmin-ssl line 539
        main::local_action_handler('GETZONE') called at whostmgr/bin/dnsadmin-ssl line 341

Editing a (broken) zone outputs nothing to /var/log/messages, whereas editing a new (working) zone outputs:

Code:
Apr 22 22:43:12 omega named[21714]: zone domain.com/IN/external: loaded serial 2011042002
Apr 22 22:43:12 omega named[21714]: zone domain.com/IN/external: sending notifies (serial 2011042002)
Apr 22 22:43:12 omega named[21714]: zone domain.com/IN/internal: loaded serial 2011042002
Apr 22 22:43:12 omega named[21714]: zone domain.com/IN/internal: sending notifies (serial 2011042002)
Does this mean that it is an issue with individual .db zone files?
 

cPanelTristan

This appears to mean there's an issue with certain zone files. Can you check the permissions on those files?

Code:
ls -lah /var/named/domain.com.db
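
To check every zone file in one go rather than one domain at a time, something like the following could list only the files whose ownership differs from what is expected. It is a sketch under assumptions: it relies on GNU stat (the -c option), the function name is made up, and the expected owner:group is passed in rather than hard-coded.

```shell
# List *.db files in a directory whose owner:group differs from the expected
# value, together with their actual ownership and permission string.
list_mismatched_zones() {
    dir="$1"
    want="$2"    # expected ownership, written as owner:group
    for f in "$dir"/*.db; do
        [ -e "$f" ] || continue    # skip the literal pattern when nothing matches
        have=$(stat -c '%U:%G' "$f")
        if [ "$have" != "$want" ]; then
            echo "$f $have $(stat -c '%A' "$f")"
        fi
    done
}

# Example: list_mismatched_zones /var/named named:named
```

An empty result means every zone file already has the expected owner and group.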
 

xanthi

All zone files (*.db) are owned by named:named. Some have permissions
-rw-r--r--
and others have permissions
-rw-r-----

However, I have tested this and there is no correlation between these permissions and the zones that do or don't work correctly. Editing a zone in WHM changes the permissions of those zone files from -rw-r----- to -rw-r--r--, but broken zones remain broken and working zones remain working.
 

cPanelTristan

Can you please open a ticket about this issue and provide an example of a non-working zone and a working zone in the ticket? You can submit a ticket using either WHM > Support Center > Contact cPanel or the link in my signature.
 

JeffP.

Well-Known Member
Sep 28, 2010
164
15
68
This issue has since been resolved :)

The problem was that the /etc/named.conf.cache file on one or more of the servers was corrupt. The solution was quick and easy:

Code:
# cd /etc
# mv named.conf.cache named.conf.cache.old
I just renamed /etc/named.conf.cache on all cluster members, and the issue was resolved. named.conf.cache was successfully and automatically regenerated.
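
If the rename has to be repeated on several cluster members, it may help to wrap it so that it is safe to run twice and never overwrites an earlier backup. This is just a sketch: the function name is hypothetical, and the timestamp in the backup name is an illustrative variation on the ".old" suffix used above.

```shell
# Move a cache file aside under a timestamped name so it can be restored later.
# Reports and returns non-zero if the file is not there.
set_aside_cache() {
    file="${1:-/etc/named.conf.cache}"
    if [ ! -e "$file" ]; then
        echo "not found: $file"
        return 1
    fi
    backup="${file}.$(date +%Y%m%d%H%M%S).old"
    mv -- "$file" "$backup" && echo "moved to: $backup"
}

# Example (run as root on each cluster member): set_aside_cache
```

Keeping the old file around makes it possible to compare it with the regenerated one afterwards.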
 

RACKSET

Active Member
Apr 28, 2006
44
0
156
localhost
For the record, I had the same issue on a server, and the cause was a permissions problem: the /var/named/named.* files were owned by root.root.
I changed the group with chgrp named /var/named/named.* and restarted named, which resolved it. I'm not sure, but cPanel should check whether this can cause the issue and, if so, fix it during the upcp cron run.
 

Peer1_Stuart

Registered
Jul 20, 2012
1
1
3
cPanel Access Level
DataCenter Provider
I recently experienced the same issue, and it was indeed resolved by removing "named.conf.cache" and restarting named.
It might be worth mentioning that this file does not appear to be re-created until you attempt to add, edit or delete a DNS zone.
 