Exim problems after CURRENT update???

teakwood

Active Member
Jul 8, 2006
Sydney
Two days in a row I have found mail backed up for several hours in the exim queue on one VPS. This followed a cPanel update.

Another similar VPS did not exhibit the problem, but that machine was having a problem with upcp not working. Today I got upcp to work manually, and it received the update - now its mail queue is clogging too. In both cases, this behaviour has not been evident in the past.

When I try to manually deliver messages ("Deliver now" in the WHM queue manager) I get "connection refused", like this:

Message 1GQJPc-0000ID-KW is not frozen
delivering 1GQJPc-0000ID-KW
Connecting to mailwash33.pair.com [66.39.2.33]:25 ... failed: Connection refused
LOG: MAIN
mailwash33.pair.com [66.39.2.33]: Connection refused
LOG: MAIN
== [email protected] R=lookuphost T=remote_smtp defer (111): Connection refused

Curiously, manual experimentation has shown that restarting Exim (and various other things) doesn't fix the problem, but restarting APF fixes it immediately.
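The `defer (111)` in the log above is errno 111, which is `ECONNREFUSED` on Linux. Because restarting APF clears it, a refusal generated by a local firewall rule (rather than by the remote host) seems a likely suspect. The minimal Python sketch below tests an outbound port-25 connection from the server itself and tells "refused" apart from "timed out". `probe_smtp` is a hypothetical helper, not part of Exim or cPanel. If a firewall rule treats users differently (the SMTP tweak is generally described as letting only root and the mail system through, which is an assumption here), run it as the user Exim delivers as, e.g. with `sudo -u`. Otherwise the result may not reflect what Exim sees.

```python
import socket
import sys


def probe_smtp(host, port=25, timeout=10.0):
    """Attempt a plain TCP connect and classify the result.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # ECONNREFUSED (111 on Linux), the error Exim logs as "defer (111)".
        # A local REJECT rule in the OUTPUT chain typically produces it too.
        return "refused"
    except socket.timeout:
        # No answer at all, e.g. a DROP rule or an unreachable host.
        return "timeout"
    except OSError as exc:
        # DNS failures and other socket errors end up here.
        return f"error: {exc}"


if __name__ == "__main__":
    # Usage: python3 probe_smtp.py mailwash33.pair.com [more hosts...]
    for host in sys.argv[1:]:
        print(host, probe_smtp(host))
```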

However, it's looming as a major problem because I have to log in and check the mail queue constantly or people end up with 5-hour delays on their email.
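A small script run from cron could flag a backlog so nobody has to check the queue by hand. The sketch below assumes Exim's `-bpc` option, which prints the number of messages on the queue, and uses a hypothetical threshold of 50. `queue_count` and `queue_alert` are illustrative names. There is a catch, visible later in this thread: while outbound mail is blocked, an alert sent by local email would sit in the same stuck queue. Writing to a log file or using another channel is safer.

```python
import subprocess

THRESHOLD = 50  # hypothetical; set it just above your normal queue size


def queue_count(run=subprocess.run):
    """Number of messages on Exim's queue, as reported by `exim -bpc`."""
    result = run(["exim", "-bpc"], capture_output=True, text=True, check=True)
    return int(result.stdout.strip())


def queue_alert(count, threshold=THRESHOLD):
    """Return a warning string if the queue looks backed up, else None."""
    if count > threshold:
        return f"Exim queue backed up: {count} messages (threshold {threshold})"
    return None
```

From a cron-driven script, `msg = queue_alert(queue_count())` can be followed by writing `msg` to a log whenever it is not `None`.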

And I'm guessing this is going to be a problem for other people as well when they get this update.

Both these systems are set to auto-update cPanel to CURRENT. Can I reset to RELEASE without breaking anything?
 

raventec

Well-Known Member
Apr 19, 2003
I'm not going to promise anything, but I've usually had pretty good success downgrading by setting the version to RELEASE and then running /scripts/upcp --force (I tend to run CURRENT).
Have you tried reinstalling Exim? Sometimes that helps with weird problems.
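The downgrade described above has two steps: switch the update tier to RELEASE, then run `/scripts/upcp --force`. The tier is normally changed in WHM's update settings. For scripting the change across several VPSes, the sketch below assumes the tier is stored as a `CPANEL=` line in `/etc/cpupdate.conf`. That location and key are assumptions to verify on your own server, and the file should be backed up first. `set_update_tier` is a hypothetical helper.

```python
import re
from pathlib import Path


def set_update_tier(path, tier="release"):
    """Set CPANEL=<tier> in a KEY=value file, appending the line if missing.

    The key name and file location are assumptions; check them first.
    """
    p = Path(path)
    lines = p.read_text().splitlines() if p.exists() else []
    out, found = [], False
    for line in lines:
        if re.match(r"\s*CPANEL\s*=", line):
            out.append(f"CPANEL={tier}")  # replace the existing tier line
            found = True
        else:
            out.append(line)  # leave every other setting untouched
    if not found:
        out.append(f"CPANEL={tier}")
    p.write_text("\n".join(out) + "\n")
```

For example, call `set_update_tier("/etc/cpupdate.conf")`, then run `/scripts/upcp --force`.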
 

teakwood

Active Member
Jul 8, 2006
Sydney
Thanks raventec. I'm sure going to have to give it a try, no good being the only one using my supplier on CURRENT, I get all the problems they don't know about. :(

If anyone from cPanel cares, my mail queue was stuck again this morning. I have now tied the failure to the running of /scripts/upcp - no mail goes out from that point, including the message from the upcp cron job.
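One way to back up the link between upcp and the refusals with numbers is to count Exim's `defer (111)` entries per hour in the main log. If they jump in the hour the upcp cron job runs, that supports the connection. The sketch below assumes cPanel's log is `/var/log/exim_mainlog` and that lines start with Exim's usual `YYYY-MM-DD HH:MM:SS` timestamp. `refusals_per_hour` is a hypothetical name. Occasional refusals from genuinely unreachable remote hosts are normal, so the sudden jump is what matters.

```python
from collections import Counter


def refusals_per_hour(lines):
    """Count 'defer (111)' (connection refused) log lines per hour.

    Keys are the first 13 characters of each line, i.e. "YYYY-MM-DD HH",
    which assumes Exim's default timestamp prefix.
    """
    counts = Counter()
    for line in lines:
        if "defer (111)" in line:
            counts[line[:13]] += 1
    return counts
```

Typical use is to open the main log, pass the file object in, and print `sorted(refusals_per_hour(log).items())` next to the time the upcp cron job runs.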
 

chris74108

Well-Known Member
Apr 30, 2004
teakwood said:
Thanks raventec. I'm sure going to have to give it a try, no good being the only one using my supplier on CURRENT, I get all the problems they don't know about. :(

If anyone from cPanel cares, my mail queue was stuck again this morning. I have now tied the failure to the running of /scripts/upcp - no mail goes out from that point, including the message from the upcp cron job.
Have you checked your mail logs to see what's going on?
 

teakwood

Active Member
Jul 8, 2006
Sydney
A whole series of "Connection refused" errors, just like when I try to deliver manually, plus several "retry time not reached for any host".
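The "retry time not reached for any host" entries are Exim honouring its retry rules. After a host refuses a connection, Exim will not try that host again until the rule allows, even if the firewall has since been fixed. The Python sketch below is a simplified model that assumes the stock Exim rule `* * F,2h,15m; G,16h,1h,1.5; F,4d,6h`; cPanel's `exim.conf` may use different values. `retry_schedule` is a hypothetical name, and the model ignores queue-runner timing.

```python
def retry_schedule(rules, horizon_min):
    """Minutes after the first failure at which a retry becomes allowed.

    rules: ("F", cutoff, interval) or ("G", cutoff, start, multiplier) tuples,
    all times in minutes measured from the first failure. This sketch assumes
    at most one G rule, since the geometric interval is carried forward.
    """
    t, times, g_interval = 0.0, [], None
    while t < horizon_min:
        # The active rule is the first one whose cutoff has not been passed.
        rule = next((r for r in rules if t < r[1]), None)
        if rule is None:
            break  # past the last cutoff: Exim gives up and bounces the mail
        if rule[0] == "F":
            step = rule[2]  # fixed interval
        else:
            # Geometric: start at rule[2], then multiply by rule[3] each time.
            g_interval = rule[2] if g_interval is None else g_interval * rule[3]
            step = g_interval
        t += step
        times.append(t)
    return times


# Stock rule "F,2h,15m; G,16h,1h,1.5; F,4d,6h" expressed in minutes.
DEFAULT = [("F", 120, 15), ("G", 960, 60, 1.5), ("F", 5760, 360)]
```

Under this model, roughly ten hours after the first failure the gap between attempts is already about five hours (607.5 to 911.25 minutes). That would fit the delays described at the start of the thread. After fixing the firewall, `exim -qf` forces a delivery attempt for every queued message regardless of retry times, and `-qff` also includes frozen messages.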
 

teakwood

Active Member
Jul 8, 2006
Sydney
Would have to wait until it's backed up again to answer, and I think I'll be putting in an APF restart after upcp as a workaround, so I might never know. :)
 

hexstar

Active Member
Jun 23, 2004
Internet
odd...dnsreport.com shows no issues for that domain...

DNS Report for xxxx.com
Generated by www.DNSreport.com at 02:30:47 GMT on 22 Sep 2006.
Category Status Test Name Information
Parent PASS Missing Direct Parent check OK. Your direct parent zone exists, which is good. Some domains (usually third or fourth level domains, such as example.co.us) do not have a direct parent zone ('co.us' in this example), which is legal but can cause confusion.
INFO NS records at parent servers Your NS records at the parent servers are:

ns1.viceinternet.com. [66.98.204.10] [TTL=172800] [US]
ns2.viceinternet.com. [66.98.205.77] [TTL=172800] [US]

[These were obtained from f.gtld-servers.net]
PASS Parent nameservers have your nameservers listed OK. When someone uses DNS to look up your domain, the first step (if it doesn't already know about your domain) is to go to the parent servers. If you aren't listed there, you can't be found. But you are listed there.
PASS Glue at parent nameservers OK. The parent servers have glue for your nameservers. That means they send out the IP address of your nameservers, as well as their host names.
PASS DNS servers have A records OK. All your DNS servers either have A records at the zone parent servers, or do not need them (if the DNS servers are on other TLDs). A records are required for your hostnames to ensure that other DNS servers can reach your DNS servers. Note that there will be problems if your DNS servers do not have these same A records.
NS INFO NS records at your nameservers Your NS records at your nameservers are:

ns1.viceinternet.com. [66.98.204.10] [TTL=14400]
ns2.viceinternet.com. [66.98.205.77] [TTL=14400]

FAIL Open DNS servers ERROR: One or more of your nameservers reports that it is an open DNS server. This usually means that anyone in the world can query it for domains it is not authoritative for (it is possible that the DNS server advertises that it does recursive lookups when it does not, but that shouldn't happen). This can cause an excessive load on your DNS server. Also, it is strongly discouraged to have a DNS server be both authoritative for your domain and be recursive (even if it is not open), due to the potential for cache poisoning (with no recursion, there is no cache, and it is impossible to poison it). Also, the bad guys could use your DNS server as part of an attack, by forging their IP address. Problem record(s) are:

Server 66.98.204.10 reports that it will do recursive lookups.
Server 66.98.205.77 reports that it will do recursive lookups.


See this page for info on closing open DNS servers.
PASS Mismatched glue OK. The DNS report did not detect any discrepancies between the glue provided by the parent servers and that provided by your authoritative DNS servers.
PASS No NS A records at nameservers OK. Your nameservers do include corresponding A records when asked for your NS records. This ensures that your DNS servers know the A records corresponding to all your NS records.
PASS All nameservers report identical NS records OK. The NS records at all your nameservers are identical.
PASS All nameservers respond OK. All of your nameservers listed at the parent nameservers responded.
PASS Nameserver name validity OK. All of the NS records that your nameservers report seem valid (no IPs or partial domain names).
PASS Number of nameservers OK. You have 2 nameservers. You must have at least 2 nameservers (RFC2182 section 5 recommends at least 3 nameservers), and preferably no more than 7.
PASS Lame nameservers OK. All the nameservers listed at the parent servers answer authoritatively for your domain.
PASS Missing (stealth) nameservers OK. All 2 of your nameservers (as reported by your nameservers) are also listed at the parent servers.
PASS Missing nameservers 2 OK. All of the nameservers listed at the parent nameservers are also listed as NS records at your nameservers.
PASS No CNAMEs for domain OK. There are no CNAMEs for xxxx.com. RFC1912 2.4 and RFC2181 10.3 state that there should be no CNAMEs if an NS (or any other) record is present.
PASS No NSs with CNAMEs OK. There are no CNAMEs for your NS records. RFC1912 2.4 and RFC2181 10.3 state that there should be no CNAMEs if an NS (or any other) record is present.
PASS Nameservers on separate class C's OK. You have nameservers on different Class C (technically, /24) IP ranges. You must have nameservers at geographically and topologically dispersed locations. RFC2182 3.1 goes into more detail about secondary nameserver location.
PASS All NS IPs public OK. All of your NS records appear to use public IPs. If there were any private IPs, they would not be reachable, causing DNS delays.
PASS TCP Allowed OK. All your DNS servers allow TCP connections. Although rarely used, TCP connections are occasionally used instead of UDP connections. When firewalls block the TCP DNS connections, it can cause hard-to-diagnose problems.
FAIL Single Point of Failure ERROR: Although you have at least 2 NS records, they both point to the same server, resulting in a single point of failure. You are required to have at least 2 nameservers per RFC 1035 section 2.2.
INFO Nameservers versions Your nameservers have the following versions:

66.98.204.10: "9.2.4"
66.98.205.77: "9.2.4"
PASS Stealth NS record leakage Your DNS servers do not leak any stealth NS records (if any) in non-NS requests.
SOA INFO SOA record Your SOA record [TTL=14400] is:

Primary nameserver: ns1.viceinternet.com.
Hostmaster E-mail address: admin.viceinternet.com.
Serial #: 2004012101
Refresh: 28800
Retry: 7200
Expire: 3600000
Default TTL: 86400

PASS NS agreement on SOA serial # OK. All your nameservers agree that your SOA serial number is 2004012101. That means that all your nameservers are using the same data (unless you have different sets of data with the same serial number, which would be very bad)! Note that the DNS Report only checks the NS records listed at the parent servers (not any stealth servers).
PASS SOA MNAME Check OK. Your SOA (Start of Authority) record states that your master (primary) name server is: ns1.viceinternet.com.. That server is listed at the parent servers, which is correct.

PASS SOA RNAME Check OK. Your SOA (Start of Authority) record states that your DNS contact E-mail address is: admin@viceinternet.com. (techie note: we have changed the initial '.' to an '@' for display purposes).
PASS SOA Serial Number OK. Your SOA serial number is: 2004012101. This appears to be in the recommended format of YYYYMMDDnn, where 'nn' is the revision. So this indicates that your DNS was last updated on 21 Jan 2004 (and was revision #1). This number must be incremented every time you make a DNS change.
PASS SOA REFRESH value OK. Your SOA REFRESH interval is : 28800 seconds. This seems normal (about 3600-7200 seconds is good if not using DNS NOTIFY; RFC1912 2.2 recommends a value between 1200 to 43200 seconds (20 minutes to 12 hours)). This value determines how often secondary/slave nameservers check with the master for updates.
PASS SOA RETRY value OK. Your SOA RETRY interval is : 7200 seconds. This seems normal (about 120-7200 seconds is good). The retry value is the amount of time your secondary/slave nameservers will wait to contact the master nameserver again if the last attempt failed.
WARN SOA EXPIRE value WARNING: Your SOA EXPIRE time is : 3600000 seconds. This seems a bit high. You should consider decreasing this value to about 1209600 to 2419200 seconds (2 to 4 weeks). RFC1912 suggests 2-4 weeks. This is how long a secondary/slave nameserver will wait before considering its DNS data stale if it can't reach the primary nameserver.
PASS SOA MINIMUM TTL value OK. Your SOA MINIMUM TTL is: 86400 seconds. This seems normal (about 3,600 to 86400 seconds or 1-24 hours is good). RFC2308 suggests a value of 1-3 hours. This value used to determine the default (technically, minimum) TTL (time-to-live) for DNS entries, but now is used for negative caching.
MX INFO MX Record Your 1 MX record is:
0 dirk.xxxx.com. [TTL=14400] IP=64.34.200.176 [TTL=14400] [US]
PASS Low port test OK. Our local DNS server that uses a low port number can get your MX record. Some DNS servers are behind firewalls that block low port numbers. This does not guarantee that your DNS server does not block low ports (this specific lookup must be cached), but is a good indication that it does not.
PASS Invalid characters OK. All of your MX records appear to use valid hostnames, without any invalid characters.
PASS All MX IPs public OK. All of your MX records appear to use public IPs. If there were any private IPs, they would not be reachable, causing slight mail delays, extra resource usage, and possibly bounced mail.
PASS MX records are not CNAMEs OK. Looking up your MX record did not just return a CNAME. If an MX record query returns a CNAME, extra processing is required, and some mail servers may not be able to handle it.
PASS MX A lookups have no CNAMEs OK. There appear to be no CNAMEs returned for A records lookups from your MX records (CNAMEs are prohibited in MX records, according to RFC974, RFC1034 3.6.2, RFC1912 2.4, and RFC2181 10.3).
PASS MX is host name, not IP OK. All of your MX records are host names (as opposed to IP addresses, which are not allowed in MX records).
INFO Multiple MX records NOTE: You only have 1 MX record. If your primary mail server is down or unreachable, there is a chance that mail may have troubles reaching you. In the past, mailservers would usually re-try E-mail for up to 48 hours. But many now only re-try for a couple of hours. If your primary mailserver is very reliable (or can be fixed quickly if it goes down), having just one mailserver may be acceptable.
PASS Differing MX-A records OK. I did not detect differing IPs for your MX records (this would happen if your DNS servers return different IPs than the DNS servers that are authoritative for the hostname in your MX records).
PASS Duplicate MX records OK. You do not have any duplicate MX records (pointing to the same IP). Although technically valid, duplicate MX records can cause a lot of confusion, and waste resources.
FAIL Reverse DNS entries for MX records ERROR: The IP of one or more of your mail server(s) have no reverse DNS (PTR) entries (if you see "Timeout" below, it may mean that your DNS servers did not respond fast enough). RFC1912 2.1 says you should have a reverse DNS for all your mail servers. It is strongly urged that you have them, as many mailservers will not accept mail from mailservers with no reverse DNS entry. You can double-check using the 'Reverse DNS Lookup' tool at the DNSstuff site (it contacts your servers in real time; the reverse DNS lookups in the DNS report use our local caching DNS server). The problem MX records are:
176.200.34.64.in-addr.arpa [No reverse DNS entry (rcode: 3 ancount: 0)]
Mail PASS Connect to mail servers OK: I was able to connect to all of your mailservers.
WARN Mail server host name in greeting WARNING: One or more of your mailservers is claiming to be a host other than what it really is (the SMTP greeting should be a 3-digit code, followed by a space or a dash, then the host name). If your mailserver sends out E-mail using this domain in its EHLO or HELO, your E-mail might get blocked by anti-spam software. This is also a technical violation of RFC821 4.3 (and RFC2821 4.3.1). Note that the hostname given in the SMTP greeting should have an A record pointing back to the same server. Note that this one test may use a cached DNS record.

dirk.xxxx.com claims to be invalid hostname 'XXXX':
220 XXXX ESMTP
PASS Acceptance of NULL <> sender OK: All of your mailservers accept mail from "<>". You are required (RFC1123 5.2.9) to receive this type of mail (which includes reject/bounce messages and return receipts).
PASS Acceptance of postmaster address OK: All of your mailservers accept mail to postmaster@xxxx.com (as required by RFC822 6.3, RFC1123 5.2.7, and RFC2821 4.5.1).
PASS Acceptance of abuse address OK: All of your mailservers accept mail to abuse@xxxx.com.
PASS Acceptance of domain literals OK: All of your mailservers accept mail in the domain literal format (user@[64.34.200.176]).
PASS Open relay test OK: All of your mailservers appear to be closed to relaying. This is not a thorough check, you can get a thorough one here.
dirk.xxxx.com OK: 554 <Not.abuse.see.www.DNSreport.[email protected]>: Relay access denied
WARN SPF record Your domain does not have an SPF record. This means that spammers can easily send out E-mail that looks like it came from your domain, which can make your domain look bad (if the recipient thinks you really sent it), and can cost you money (when people complain to you, rather than the spammer). You may want to add an SPF record ASAP, as 01 Oct 2004 was the target date for domains to have SPF records in place (Hotmail, for example, started checking SPF records on 01 Oct 2004).
WWW
INFO WWW Record Your www.xxxx.com A record is:

www.xxxx.com. CNAME xxxx.com. [TTL=14400]
xxxx.com. A 66.98.204.10 [TTL=14400] [US]

PASS All WWW IPs public OK. All of your WWW IPs appear to be public IPs. If there were any private IPs, they would not be reachable, causing problems reaching your web site.
PASS CNAME Lookup OK. You do have a CNAME record for www.xxxx.com, which can cause some confusion. However, this is legal. Your CNAME entry also returns the A record for the CNAME entry, which is good -- otherwise, it would require an extra DNS lookup, which slightly delays the initial access to the website and use extra bandwidth. Note that if the CNAME points to another CNAME, it will likely cause problems.


Legend:

* Rows with a FAIL indicate a problem that in most cases really should be fixed.
* Rows with a WARN indicate a possible minor problem, which often is not worth pursuing.
* Note that all information is accessed in real-time (except where noted), so this is the freshest information about your domain.
* Note that automated usage is not tolerated; please only view the DNS report directly with your web browser.



(C) Copyright 2000-2006 DNSstuff.com
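Two of the SOA rows above follow simple conventions that are easy to check by hand. The serial is in `YYYYMMDDnn` form, and the RNAME encodes the contact mailbox with its first `.` standing for `@`. The minimal Python sketch below decodes both, using the values from this report. The function names are hypothetical, and escaped dots (`\.`) in an RNAME local part are not handled.

```python
from datetime import date


def decode_serial(serial):
    """Split a YYYYMMDDnn SOA serial into (date, revision)."""
    s = str(serial)
    if len(s) != 10:
        raise ValueError("serial is not in YYYYMMDDnn form")
    return date(int(s[:4]), int(s[4:6]), int(s[6:8])), int(s[8:])


def rname_to_email(rname):
    """Turn an SOA RNAME like 'admin.example.com.' into 'admin@example.com'."""
    local, _, domain = rname.rstrip(".").partition(".")
    return f"{local}@{domain}"
```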
 

easyhoster1

Well-Known Member
Sep 25, 2003
Hmm...........This is the first thing I would look at;

FAIL Reverse DNS entries for MX records ERROR: The IP of one or more of your mail server(s) have no reverse DNS (PTR) entries (if you see "Timeout" below, it may mean that your DNS servers did not respond fast enough). RFC1912 2.1 says you should have a reverse DNS for all your mail servers. It is strongly urged that you have them, as many mailservers will not accept mail from mailservers with no reverse DNS entry.
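The failing row concerns the PTR record at `176.200.34.64.in-addr.arpa`. That name lives in the reverse zone for the IP block, which is usually controlled by the hosting provider or data center rather than by your own domain's zone. Fixing it therefore typically means asking them. The Python sketch below checks the record from any machine. The helper names are hypothetical, and the lookups go through the system resolver, so results may be cached. Some receiving servers also check that the PTR name resolves back to the same IP, which `forward_confirmed` models.

```python
import ipaddress
import socket


def ptr_name(ip):
    """Name under which the PTR record for ip is published."""
    return ipaddress.ip_address(ip).reverse_pointer


def reverse_lookup(ip):
    """PTR hostname for ip via the system resolver, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


def forward_confirmed(ip):
    """True if ip's PTR name resolves back to ip (forward-confirmed rDNS)."""
    name = reverse_lookup(ip)
    if name is None:
        return False
    try:
        return ip in socket.gethostbyname_ex(name)[2]
    except (socket.herror, socket.gaierror):
        return False
```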
 

teakwood

Active Member
Jul 8, 2006
Sydney
Today I left one VPS "unprotected", set the other to restart APF after upcp.

The unprotected one had a clogged mail queue. I could connect by web, ssh, pop3 and I could connect out (ping etc). But exim does the "connection refused" thing until I restart APF.
 

jamesbond

Well-Known Member
Oct 9, 2002
teakwood said:
The unprotected one had a clogged mail queue. I could connect by web, ssh, pop3 and I could connect out (ping etc). But exim does the "connection refused" thing until I restart APF.
Do you have the SMTP tweak enabled in WHM's security settings, perhaps? You could try disabling it. Could it be that the SMTP iptables rule gets flushed when you start APF?
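This suggestion hinges on how an iptables chain is evaluated: rules are checked top to bottom, and the first match decides. The toy model below shows why a catch-all port-25 REJECT is harmless only while the exemption rules above it are in place. The rule set, the `Rule`/`verdict` names, and the Exim user `mailnull` are illustrative assumptions, not the rules cPanel actually installs. If an update left only the REJECT behind, that would fit both fixes reported in this thread. Restarting APF presumably flushes and reloads the chains, and `iptables -D OUTPUT 1` deletes whatever rule sits first in OUTPUT.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    target: str                      # "ACCEPT" or "REJECT"
    dport: Optional[int] = None      # destination port; None matches any
    uid_owner: Optional[str] = None  # sending user; None matches any


def verdict(chain, user, dport, policy="ACCEPT"):
    """Walk the chain top to bottom; the first matching rule decides."""
    for rule in chain:
        if rule.dport is not None and rule.dport != dport:
            continue
        if rule.uid_owner is not None and rule.uid_owner != user:
            continue
        return rule.target
    return policy  # nothing matched: fall through to the chain policy


# Hypothetical tweak-style rules: exemptions first, then a catch-all REJECT.
TWEAK = [
    Rule("ACCEPT", 25, "root"),
    Rule("ACCEPT", 25, "mailnull"),  # assumed Exim delivery user
    Rule("REJECT", 25),
]
# The failure mode being modelled: exemptions missing, REJECT still present.
BROKEN = [Rule("REJECT", 25)]
```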
 

teakwood

Active Member
Jul 8, 2006
Sydney
Yes, the SMTP tweak is enabled, and has been for months before this update created the problem. :D

I have a workaround (a cron job that runs fw after upcp), so I'm not looking for another. It would be good if someone could figure out what it was about the update that caused the problem, so it doesn't hit everyone at once when it is moved to RELEASE.
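The workaround here is a cron job that runs `fw` after upcp. Running both from a single wrapper avoids guessing how long upcp takes, because the firewall restart always follows the update. In the sketch below, `/scripts/upcp` comes from this thread. `/etc/init.d/apf restart` is an assumption standing in for whatever restart command your APF install provides (teakwood uses `fw`). `update_then_restart_firewall` is a hypothetical name.

```python
import subprocess

# /scripts/upcp is from the thread; the APF restart command is an assumption.
UPCP = ["/scripts/upcp"]
APF_RESTART = ["/etc/init.d/apf", "restart"]


def update_then_restart_firewall(upcp=UPCP, restart=APF_RESTART, run=subprocess.run):
    """Run the cPanel update, then restart APF even if the update fails."""
    try:
        result = run(upcp)
    finally:
        run(restart)  # always reload the firewall after upcp has finished
    return result.returncode
```

The existing upcp cron entry would then point at this script instead of calling `/scripts/upcp` directly.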
 

NetMan1

Registered
Jan 16, 2004
Re: Exim problems after CURRENT update

I have had this problem for months. One of the techs at my data center advised me to telnet into the server and run the following:
iptables -D OUTPUT 1
then restart Exim. This has worked every time. It takes a few minutes, but the status light turns green again. If I don't get to it quickly enough, the CPU load goes over 100%.
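`iptables -D OUTPUT 1` deletes whatever happens to be rule 1 in the OUTPUT chain. If the rule order differs on another box, it could remove the wrong rule. Listing the chain first with `iptables -L OUTPUT -n --line-numbers` shows which rule is doing the rejecting. The Python parser below for that listing assumes the usual `num target prot opt source destination` columns. `smtp_block_rules` is a hypothetical name.

```python
def smtp_block_rules(listing):
    """Rule numbers of REJECT/DROP rules matching tcp dpt:25.

    `listing` is the text printed by `iptables -L OUTPUT -n --line-numbers`,
    assumed to have the columns: num target prot opt source destination [match].
    """
    numbers = []
    for line in listing.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip the "Chain OUTPUT ..." line and the column header
        num, target, extra = int(fields[0]), fields[1], fields[6:]
        if target in ("REJECT", "DROP") and "dpt:25" in extra:
            numbers.append(num)
    return numbers
```

Delete by the number it reports (`iptables -D OUTPUT <num>`), then restart Exim. The rule may come back whenever the process that added it runs again.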
 

madlady

Registered
May 3, 2008
Currently, disabling the SMTP tweak in WHM's Security Center has stopped the 'connection refused' errors for me, so I hope this helps someone :)