Hi there. I had the same odd problem some time ago, and after reading a lot of the Exim docs I found a solution. It now works like a breeze, with no more than 15 messages in the queue at any one time (well, you may find more messages if some customers send massive newsletters or if you use newsgroups, but even then there will never be more than 600 at a given time).
Hope this helps; I've been meaning to share it for some time.
First, you need to know that every time you click Save in the cPanel Exim Config Editor, the Exim config file is completely rewritten. So if you manually change anything there, it will be lost the next time you save the Exim config from WHM. cPanel builds the Exim config file (/etc/exim.conf) from templates. This is why you must know them and repeat your modifications in all of them.
I found that the bulletproof way to customize your retry times is to edit the parts of the templates that cPanel simply copies and pastes into the final config. You can adjust the values below to whatever fits your scenario best. The only way to test different values without cPanel overwriting them is to edit exim.conf directly and restart Exim via SSH.
I like to edit using pico: pico /etc/exim.conf
Now scroll down (Pg Down) to the end of the file. You'll find the default retry rule:
* * F,2h,15m; G,16h,1h,1.5; F,4d,8h
The problem with this approach is that it generalizes, dealing with every error in the same way. But you and I know that not all bounce errors are born equal.
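To get a feel for what the default rule above actually does, here is a rough Python sketch (mine, not part of Exim or cPanel) that turns a retry string into an idealized list of retry times. It assumes F means fixed intervals, G means intervals that start at the given value and grow by the given factor, and every cutoff is counted from the first failure. Real Exim only retries on queue runs, so actual times will drift. The function names are made up for illustration.

```python
import re

# Seconds per unit for Exim-style time values such as "15m" or "1h30m".
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def parse_time(text):
    """Convert a time value like '15m' or '1h30m' to seconds."""
    parts = re.findall(r"(\d+)([smhdw])", text)
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"bad time value: {text!r}")
    return sum(int(n) * _UNITS[u] for n, u in parts)


def parse_rules(spec):
    """Split 'F,2h,15m; G,16h,1h,1.5' into (letter, cutoff, interval, factor)."""
    rules = []
    for chunk in spec.split(";"):
        fields = [f.strip() for f in chunk.split(",")]
        letter = fields[0]
        cutoff = parse_time(fields[1])
        interval = parse_time(fields[2])
        factor = float(fields[3]) if letter == "G" else 1.0
        rules.append((letter, cutoff, interval, factor))
    return rules


def retry_schedule(spec):
    """Idealized retry times, in seconds after the first failure."""
    rules = parse_rules(spec)
    times, now, gap = [], 0, 0
    while True:
        # Use the first sub-rule whose cutoff has not been reached yet.
        active = next((r for r in rules if now < r[1]), None)
        if active is None:
            break  # past the last cutoff: the message would bounce
        letter, _cutoff, interval, factor = active
        if letter == "F":
            gap = interval
        else:  # "G": grow the previous gap, but never below the start value
            gap = max(interval, gap * factor)
        now += gap
        times.append(now)
    return times


for seconds in retry_schedule("F,2h,15m; G,16h,1h,1.5; F,4d,8h"):
    print(f"{seconds / 3600:.2f}h")
```

Feeding it a single-step rule such as "F,4h,30m" makes it easy to compare how many retries each candidate setting would cost you.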
Well, to deal with my daily scenario, I replaced the default setting with this:
* rcpt_4xx F,12h,30m
* timeout F,4h,30m
* refused F,1h,20m
* lost_connection F,1h,20m
* * F,4h,30m
This makes more sense, because it handles different situations separately:
- Yahoo, AOL and greylisted messages that receive a 4xx error will be retried every 30 minutes for 12 hours, because sooner or later they will reach their destinations. It's only a matter of patience.
- Destination domains that return a connection timeout may be down, or some antispam system may be returning a camouflaged error. So I CONSIDER that there is no reason to hammer the destination for more than 4 hours. Also, a timed-out connection consumes memory, and we don't want to waste resources.
- Refused connections may well be due to antispam blocks. 10% may be due to greylisting. That's why I'll retry them for no more than 1 hour.
- Lost connections too. It may be an antispam system, it may be a black hole... I don't want to waste CPU time on them.
- And for all the remaining error situations, just keep trying every 30 minutes for 4 hours. Since the most common errors have already been addressed, I THINK that retrying these for more than 4 hours makes no sense.
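One detail worth spelling out: Exim scans the retry rules from top to bottom and uses the first one that matches, which is why the catch-all "* *" line has to come last. Here is a small Python sketch of that lookup using the five rules from this post. It is a simplification that only compares the error name exactly (real Exim also has variants such as refused_MX), and the function names are hypothetical.

```python
# The custom retry section from the post: (address pattern, error name, rule).
RETRY_RULES = [
    ("*", "rcpt_4xx", "F,12h,30m"),
    ("*", "timeout", "F,4h,30m"),
    ("*", "refused", "F,1h,20m"),
    ("*", "lost_connection", "F,1h,20m"),
    ("*", "*", "F,4h,30m"),
]

# Minutes per unit, kept as integers so the retry count is exact.
MINUTES = {"m": 1, "h": 60, "d": 1440}


def to_minutes(value):
    """Convert a single-unit value such as '30m' or '12h' to minutes."""
    return int(value[:-1]) * MINUTES[value[-1]]


def pick_rule(error):
    """Return the first rule whose error field matches, scanning top-down."""
    for _pattern, rule_error, rule in RETRY_RULES:
        if rule_error in ("*", error):
            return rule
    raise LookupError(error)


def summarize(error):
    """Describe how an error of this kind would be retried."""
    _letter, cutoff, interval = pick_rule(error).split(",")
    attempts = to_minutes(cutoff) // to_minutes(interval)
    return f"{error}: every {interval} for {cutoff}, about {attempts} retries"


for name in ("rcpt_4xx", "timeout", "refused", "lost_connection", "quota"):
    print(summarize(name))
```

An error that has no line of its own, like quota here, falls through to the final catch-all rule.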
You may think this is great already, but there is more to read...
Scroll up to the first screen of exim.conf, at the very beginning. Locate these variables and change their values:
ignore_bounce_errors_after = 6h
timeout_frozen_after = 5h
auto_thaw = 23h
deliver_queue_load_max = 3
This will reinforce the retry times with behavioural settings. The variable names are self-explanatory, as you can see.
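If you want to check which of those four values a config file currently has, a few lines of Python can do it. This is only a sketch under my own assumptions: it looks at plain "name = value" lines in the main section (everything before the first "begin" line), skips comments, and the helper names are invented for this example.

```python
import re

# The values suggested in the post; adjust to your own scenario.
WANTED = {
    "ignore_bounce_errors_after": "6h",
    "timeout_frozen_after": "5h",
    "auto_thaw": "23h",
    "deliver_queue_load_max": "3",
}

OPTION_LINE = re.compile(r"^\s*([a-z_0-9]+)\s*=\s*(.*?)\s*$")


def read_options(config_text):
    """Collect the current values of the WANTED options from the main section."""
    found = {}
    for line in config_text.splitlines():
        if line.split()[:1] == ["begin"]:
            break  # the main section ends at the first "begin" line
        if line.lstrip().startswith("#"):
            continue  # commented-out setting
        match = OPTION_LINE.match(line)
        if match and match.group(1) in WANTED:
            found[match.group(1)] = match.group(2)
    return found


def differences(config_text):
    """Map each option that is missing or different to (current, wanted)."""
    current = read_options(config_text)
    return {
        name: (current.get(name), wanted)
        for name, wanted in WANTED.items()
        if current.get(name) != wanted
    }


sample = "auto_thaw = 7d\ntimeout_frozen_after = 5h\nbegin acl\n"
for name, (have, want) in differences(sample).items():
    print(f"{name}: have {have}, want {want}")
```

On a real server you would pass it the contents of /etc/exim.conf instead of the sample string.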
Together, these changes process messages better without wasting resources on queuing.
BUT there are mooore facts to pay attention to.
Up to here, you know how to improve how the queue works. But the next time you save the Exim configuration from cPanel, all this work will be lost. To prevent this, you'll need to repeat all these changes in the template files at /etc/exim.conf* (there are about 5 files). That way, you'll have covered the templates that cPanel uses to build the final config file.
But you had better test first by changing only exim.conf and restarting the Exim service from the SSH command line.
And before letting you go... There's something more you need to know: the next time you run eximup, or the next time you upgrade cPanel, all the templates will be upgraded too and obviously reset to their defaults, so you'll need to redo all this work again... and again. Isn't that great?
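Since repeating the same edits in several files by hand is error-prone, here is a hedged Python sketch of how the option changes could be previewed across the templates. It is my own illustration, not a cPanel tool: it only rewrites option lines that already exist, it does not touch the retry rules, and preview() writes nothing. The glob pattern is the /etc/exim.conf* location mentioned above; the function names are hypothetical.

```python
import glob
import os
import re

# The values suggested in the post; adjust to your own scenario.
NEW_VALUES = {
    "ignore_bounce_errors_after": "6h",
    "timeout_frozen_after": "5h",
    "auto_thaw": "23h",
    "deliver_queue_load_max": "3",
}


def apply_values(text, values=NEW_VALUES):
    """Return text with every existing 'name = value' line set to the new value."""
    names = "|".join(re.escape(name) for name in values)
    pattern = re.compile(rf"^({names})[ \t]*=.*$", re.MULTILINE)

    def swap(match):
        name = match.group(1)
        return f"{name} = {values[name]}"

    return pattern.sub(swap, text)


def preview(pattern="/etc/exim.conf*"):
    """List the files that apply_values() would change. Nothing is written."""
    changed = []
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8", errors="replace") as handle:
            text = handle.read()
        if apply_values(text) != text:
            changed.append(path)
    return changed
```

Calling preview() as root shows which templates still differ. Writing the result of apply_values() back is left out on purpose, so you can make a backup copy of each file first.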