imap failing daily

SenB

Registered
May 23, 2010
2
0
51
Hey all -

Got a strange issue lately with a cPanel/WHM server. It's been running for a while now with no problem, but lately I've been getting messages every morning like so:

imap failed @ Wed May 19 05:29:35 2010. A restart was attempted automagically.
Service Check Method: [check command]

Cmd Service Check Raw Output: dovecot is not running
The thing is, this always happens at exactly the same time every day (+/- a minute or so), and it's always right after the upcp script runs. IOW, I get a mail from the upcp script, and then right after, I get another mail indicating that imap has failed.

This has happened daily for four days in a row now, and always in that same order - upcp runs, imap fails. I don't see anything in the upcp output indicating that dovecot is being updated, but I have to believe that the two things are related somehow.

The imap service seems to restart with no problem, but the notifications are not helpful - real imap failures are set to notify me by SMS, and getting a text message every day at 5:30 AM for a non-issue is getting old fast. Anyone run into this, or have any idea what the problem is?
 

mikelegg

Well-Known Member
Mar 29, 2005
330
3
166
I've also been getting the same message at random times for the last 10 days.

The service has always restarted by the time I check the server so I haven't been overly concerned, but it would be good to work out the cause and fix it.
 

ibrow99

Registered
Jan 7, 2006
2
0
151
I am also getting the same message and very regularly - at least 10 times a day.

I've done some searching on this forum, but found no answers as yet, so I've had a look into the logs and can see that Dovecot is being shut down and then restarted every 2 minutes:

Code:
# tail -f /var/log/maillog | grep "Killed\|starting"
Jun 10 13:04:01 vps1930 dovecot: dovecot: Killed with signal 15 (by pid=11429 uid=0 code=kill)
Jun 10 13:04:02 vps1930 dovecot: Dovecot v1.2.11 starting up (core dumps disabled)
Jun 10 13:06:03 vps1930 dovecot: dovecot: Killed with signal 15 (by pid=23880 uid=0 code=kill)
Jun 10 13:06:03 vps1930 dovecot: Dovecot v1.2.11 starting up (core dumps disabled)
Jun 10 13:08:01 vps1930 dovecot: dovecot: Killed with signal 15 (by pid=7225 uid=0 code=kill)
Jun 10 13:08:02 vps1930 dovecot: Dovecot v1.2.11 starting up (core dumps disabled)
Jun 10 13:10:02 vps1930 dovecot: dovecot: Killed with signal 15 (by pid=17604 uid=0 code=kill)
Jun 10 13:10:03 vps1930 dovecot: Dovecot v1.2.11 starting up (core dumps disabled)
The emails I am getting coincide with the moments when Dovecot has been killed but has not yet started again. E.g. I had an email:
imap failed @ Thu Jun 10 13:00:21 2010. A restart was attempted automagically.

and the logs said:
Code:
Jun 10 13:00:04 vps1930 dovecot: dovecot: Killed with signal 15 (by pid=24269 uid=0 code=kill)
Jun 10 13:00:25 vps1930 dovecot: Dovecot v1.2.11 starting up (core dumps disabled)
It seems to me that this must have something to do with Dovecot constantly killing itself (or being killed?) and the monitor coincidentally checking before it has started again, but I don't know what is causing the kills.
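To put numbers on those gaps, a minimal sketch along these lines could pair each "Killed" entry with the next "starting up" entry and print how long Dovecot was down. The function name dovecot_gaps is made up for illustration. It assumes the syslog line format shown in the excerpts above and that a kill and its restart fall on the same day.

```shell
#!/bin/sh
# dovecot_gaps: read maillog lines on stdin and print, for every
# "Killed" -> "starting up" pair, how many seconds Dovecot was down.
# Assumption: both events happen on the same day (no midnight rollover).
dovecot_gaps() {
    awk '
        /dovecot: Killed with signal/ {
            # Field 3 is the HH:MM:SS timestamp in the syslog prefix.
            split($3, t, ":")
            killed_at = t[1] * 3600 + t[2] * 60 + t[3]
            killed_stamp = $1 " " $2 " " $3
            have_kill = 1
            next
        }
        /Dovecot .* starting up/ && have_kill {
            split($3, t, ":")
            started_at = t[1] * 3600 + t[2] * 60 + t[3]
            printf "%s down for %ds\n", killed_stamp, started_at - killed_at
            have_kill = 0
        }
    '
}

# Example, using the log path from this thread:
#   dovecot_gaps < /var/log/maillog
```

A gap of a second or two is unlikely to be caught by the service monitor, while a gap like the 21 seconds in the second excerpt leaves plenty of room for a check to land in it.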

Any help much appreciated.
Rob
 

mikelegg

Well-Known Member
Mar 29, 2005
330
3
166
It has been suggested that bumping up the "Maximum IMAP Connections Per IP" value may help.

I've just increased mine from 20 to 30, I'll post the results if it's made any difference after a few days.
 

mikelegg

Well-Known Member
Mar 29, 2005
330
3
166
Another suggestion is to go to WHM >> Tweak Settings and scroll down to the following item (it's the second last item on the page) and increase the value ...

The number of times a ChkServd TCP check must fail before notification is sent and the service is restarted. On heavily loaded systems these types of service checks fail occasionally producing erroneous indications that services are down. A setting of 0 will disable all notifications and restarts due to TCP checks. Setting this value to 3 or higher is recommended for most systems.

I've increased mine from 3 to 4.
 

ibrow99

Registered
Jan 7, 2006
2
0
151
I have just tweaked my settings as advised above, so we shall see if this works.

Also, I did a bit more rooting around and found a crontab entry set to go off every 2 minutes:
*/2 * * * * sh /root/check >/tmp/cron 2>&1

I contacted my hosting company to see what this was and they said it would be safe to cancel it - so I commented it out.

So far since then I haven't had the irritating emails. However, I don't know what this script is or whether it is safe to leave it disabled.

As such I will uncomment it and see how the "tweak settings" changes go.

Thanks for the info, will keep you posted with any more findings
 

mikelegg

Well-Known Member
Mar 29, 2005
330
3
166
The changes I made above don't appear to have made any difference.

But I just found this entry in my LogWatch email ...

dovecot: dovecot: Fatal: Time just moved backwards by 10 seconds. This might cause a lot of problems, so I'll just kill myself now. TimeMovedBackwards - Dovecot Wiki: 2 Time(s)

Might be another avenue to explore
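To see how often this happens and by how much, a rough sketch like the following could be run against the mail log. The function name time_jumps is hypothetical, and it assumes the message appears in the log with the same syslog prefix (month, day, time, host) as the Dovecot lines quoted earlier in this thread.

```shell
#!/bin/sh
# time_jumps: read maillog lines on stdin, print the timestamp and size of
# every "Time just moved backwards" event, then a total count.
time_jumps() {
    awk '
        /Time just moved backwards by/ {
            # The number of seconds is the word right after "by".
            for (i = 1; i < NF; i++) {
                if ($i == "by") {
                    print $1, $2, $3, $(i + 1) "s"
                    total++
                    break
                }
            }
        }
        END { printf "%d backwards jump(s)\n", total }
    '
}

# Example, using the log path from this thread:
#   time_jumps < /var/log/maillog
```

If the timestamps printed here line up with the "imap failed" notification times, that would point at the clock rather than at upcp or connection limits.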
 

karikas

Member
Mar 8, 2006
15
0
151
Any luck?

Has anyone had any luck in resolving this issue? I've been experiencing the same thing, had both my host and management company on the case for the past three weeks with no results. I've confirmed that running the /scripts/upcp script appears to be the culprit.

Still knocking ideas around with our server management company... if we find a solution I'll post it here!
 

mikelegg

Well-Known Member
Mar 29, 2005
330
3
166
The timing of mine doesn't coincide with the running of the upcp script at all.

I'm still pursuing the time adjustment problem that the link above discusses.

I too will post the solution once I find it.
 

garconcn

Well-Known Member
Oct 29, 2009
167
17
68
We fixed a similar problem by running /scripts/courierup --force
Maybe you can try /scripts/dovecotup --force?
 

mikelegg

Well-Known Member
Mar 29, 2005
330
3
166
I haven't had the problem for a few days now.

Perhaps there was something in one of the recent upcp's that fixed it.
 

karikas

Member
Mar 8, 2006
15
0
151
I hadn't received it for a few days but it came back with a vengeance two nights ago and continues to restart in the middle of the night. Thought we had it licked but apparently not. I'm wondering indeed if this is a cpanel issue I can't do much/anything about!
- Mike
 

karikas

Member
Mar 8, 2006
15
0
151
Well, my management company decided to take the easy way out and we just switched from dovecot to courier. Problem solved. Seems more of an avoidance than a solution but we hardly have any email accounts on that server and it certainly did the trick.

- Mike
 

karikas

Member
Mar 8, 2006
15
0
151
This solution worked for us (even though it's an old thread now I'll put this here for people who wander this way looking for help):

- Create (or update) a script at /scripts/postupcp
- In the contents of that script, put a command to restart dovecot (/sbin/service dovecot restart)
- Save it, chmod to 755

...and that should do it =) This creates a dovecot restart after every cPanel automatic update, which gets dovecot to "realign" itself and not get hung up on the clock which causes all those problems in the first place.
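The three steps above could be sketched roughly as follows. This is only an illustration of the workaround described in this post, not an official cPanel procedure; the helper name install_postupcp is made up, and it appends to an existing hook rather than overwriting it, on the assumption that you may already have other commands in there.

```shell
#!/bin/sh
# install_postupcp HOOK_PATH
# Creates the hook if it is missing, appends the Dovecot restart line if it
# is not already present, and makes the file executable (mode 755).
install_postupcp() {
    hook=$1
    restart_cmd='/sbin/service dovecot restart'

    # Start a new script with a shebang only when no hook exists yet.
    [ -f "$hook" ] || printf '#!/bin/sh\n' > "$hook"

    # Append the restart command once; running this again changes nothing.
    grep -qxF "$restart_cmd" "$hook" || printf '%s\n' "$restart_cmd" >> "$hook"

    chmod 755 "$hook"
}

# Usage, as root on the cPanel server:
#   install_postupcp /scripts/postupcp
```

Keep in mind this only restarts Dovecot after each update run; it does nothing about whatever is making the clock jump.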
 

cPanelTristan

Quality Assurance Analyst
Staff member
Oct 2, 2010
7,607
40
348
somewhere over the rainbow
cPanel Access Level
Root Administrator
Dovecot is very time sensitive. If your clock keeps skewing backwards, it's more important to fix the clock issue on the machine itself. Courier is not as time sensitive, which is why it doesn't error out when the clock isn't working right on a server.

Here's a discussion on time skew issues and how they can be corrected (ntpd being used and other methods based on whether the system is a VPS or a dedicated machine):

clock skew inside CTs +openvz - Web Hosting Talk

Your workaround for the Dovecot restarts does not fix the underlying issue. Time skew should not be happening on a machine. If the clock keeps skewing backwards, there's something wrong.
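One way to confirm whether the wall clock really steps backwards, independent of Dovecot, is to poll it and compare consecutive readings. The sketch below is only an illustration: watch_clock is a made-up name, and it relies on `date +%s`, which is available on GNU and BSD systems but is not strictly POSIX. With one-second resolution it will only catch steps of a second or more.

```shell
#!/bin/sh
# watch_clock SAMPLES INTERVAL
# Polls the wall clock SAMPLES times, INTERVAL seconds apart, and reports
# any reading that is earlier than the previous one.
# Returns 1 if a backwards step was seen, 0 otherwise.
watch_clock() {
    samples=$1
    interval=$2
    prev=$(date +%s)
    seen=0
    i=0
    while [ "$i" -lt "$samples" ]; do
        sleep "$interval"
        now=$(date +%s)
        if [ "$now" -lt "$prev" ]; then
            echo "clock moved backwards by $((prev - now))s at $(date)"
            seen=1
        fi
        prev=$now
        i=$((i + 1))
    done
    return "$seen"
}

# Example: sample once a second for an hour.
#   watch_clock 3600 1
```

If this reports backwards steps, the results are worth passing along to whoever runs the host node or hardware, since the fix has to happen at that level.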
 

mikelegg

Well-Known Member
Mar 29, 2005
330
3
166
Thanks karikas, I've implemented your suggestion. I'll wait and see what happens with the next cPanel upgrade.

It's odd though, I only have this problem on one server, all of the others are fine.
 

mikelegg

Well-Known Member
Mar 29, 2005
330
3
166
Nope, didn't fix it. I'll have to delve into the time problem (again). The failures don't line up with cPanel updates anyway.

I've just disabled ntpd
Code:
/sbin/chkconfig ntpd off
I'll see if that helps.
 

cPanelTristan

Quality Assurance Analyst
Staff member
Oct 2, 2010
7,607
40
348
somewhere over the rainbow
cPanel Access Level
Root Administrator
Disabling ntpd isn't what was suggested. If you have time skew, ntpd would be more likely to help than hurt.

Do you have Dovecot showing time skew on your system where time keeps going backwards? Please read the link I provided previously where time skew was discussed. There is something wrong with the system itself if time keeps going backwards. Please contact your hosting provider, datacenter or NOC to have them help you with this system issue.