The Community Forums

Interact with an entire community of cPanel & WHM users!

POP Problems

Discussion in 'General Discussion' started by beddo, Jul 3, 2008.

  1. beddo

    beddo Well-Known Member

    Joined:
    Jan 19, 2007
    Messages:
    157
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    England
    cPanel Access Level:
    DataCenter Provider
    Hi folks, we're having some problems with POP mail on our cPanel server.

    A few weeks ago I ran upcp and got everything up to date - I've just run it again in case any bugs have been fixed and will monitor. I'm not even sure the problem is with our server but need to investigate everything to check.

    I'm not sure if there is any way to turn on full logging for courier pop3 but what we are seeing is a very limited case.

    We have two local clients on BT Broadband, both of which keep losing the ability to check mail (sometimes randomly and only affecting some PCs). It only happens when mail is waiting on the server. If there is nothing, the connection goes through fine.

    No other people have reported the problem, so BT Broadband customers not local to our town are fine (though they do go through different exchanges, for those who know UK ADSL setups) and other ISPs' customers are fine.

    I did a sniff of one of the sessions and it went like this:

    1) Client connects to server
    2) Client logs in
    3) Client sends STAT
    4) Client starts RETR
    5) TCP RST received for the connection (apparently from our server)

    The maillog simply shows a client login and then no log out.
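
For anyone wanting to replay the sniffed sequence from a script rather than a mail client, here is a minimal sketch using Python's standard poplib module. The function name probe_mailbox, the connect parameter, and the host and credentials in the usage note are all hypothetical; it only reports which RETR fails and how, and says nothing about where the reset originates.

```python
import poplib


def probe_mailbox(host, user, password, timeout=30, connect=poplib.POP3):
    """Connect, log in, STAT, then RETR each message and note where it fails.

    `connect` defaults to poplib.POP3 (plain, non-SSL POP3) and can be swapped
    for a stand-in class when testing without a server.
    """
    results = []
    conn = connect(host, timeout=timeout)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _total_size = conn.stat()
        for number in range(1, count + 1):
            try:
                _response, _lines, octets = conn.retr(number)
                results.append((number, "ok", octets))
            except (OSError, poplib.error_proto) as exc:
                # A TCP reset or a timeout surfaces here as an OSError subclass.
                results.append((number, "failed", repr(exc)))
                break
    finally:
        try:
            conn.quit()
        except (OSError, poplib.error_proto):
            pass  # the connection may already be gone after a reset
    return results
```

Calling probe_mailbox("mail.example.com", "user@example.com", "secret") from an affected PC and from an unaffected one would give two result lists to compare message by message.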

    The odd thing is, the problem goes away if they reset their router but then comes back later - but it comes back at the same time for both of our clients, so it has to be either something in the ISP's network or something on our server.

    Where on earth do I start looking to rule the server out?
     
  2. jayh38

    jayh38 Well-Known Member

    Joined:
    Mar 3, 2006
    Messages:
    1,215
    Likes Received:
    0
    Trophy Points:
    36
    You pretty much did rule out the server by stating no one else has problems. The client is behind a router, and it is possible that the router is filtering some needed packets.

    You can install ConfigServer Mail Manage (www.configserver.com) to quickly create a test account on their domain and test this yourself. Test both POP and webmail just to be complete.

    Good luck.
     
  3. beddo

    beddo Well-Known Member

    To be honest I think it is some combination. The upcp has changed something and whatever has changed seems to disagree with these particular accounts. I have since realised that there are other similarities. Not only are both sites on BT Broadband but they are also on similar Netgear routers that are a few years old.

    So if there is a problem in the system somewhere with either BT or the routers, it wasn't appearing before the upcp. Thus it stands to reason that something can be done on the mail server (or even undone) which will allow it to work around the problem, as it has done since the server was set up.
     
  4. Infopro

    Infopro cPanel Sr. Product Evangelist
    Staff Member

    Joined:
    May 20, 2003
    Messages:
    14,472
    Likes Received:
    200
    Trophy Points:
    63
    Location:
    Pennsylvania
    cPanel Access Level:
    Root Administrator
    Twitter:
    This is probably no help at all, but have you clicked the Exim Configuration Editor in WHM recently? Recent upgrades have made changes in this area; if yours needs a reset there will be a message waiting for you there. (You should have been mailed about it as well.)

    Anyway, just a thought.
     
  5. beddo

    beddo Well-Known Member

    I have indeed; I do it after every update. Most of the ACLs I put in manually have been incorporated into the config properly so there are no problems there anymore.
     
  6. cPanelKenneth

    cPanelKenneth cPanel Development
    Staff Member

    Joined:
    Apr 7, 2006
    Messages:
    4,461
    Likes Received:
    22
    Trophy Points:
    38
    cPanel Access Level:
    Root Administrator
    Are the clients connecting via SSL or non-SSL?
     
  7. beddo

    beddo Well-Known Member

    Non-SSL.
    I've had client programs do silly things with SSL so unless it is needed I don't bother.

    I upgraded the firmware on one of the routers on Saturday. I will see if that has any effect and, if it does, I'll do the other one. (I'm not doing both so I can tell when the non-upgraded one has trouble and check whether the upgraded one still works.)

    I can't really think of much else to try except replacing the routers, as it's only those older Netgear ones that seem to be affected.
     
  8. beddo

    beddo Well-Known Member

    OK, this one is now seriously doing my nut in. I spent several hours on site at one place trying to resolve this.

    The full lowdown:

    3 machines using Windows XP Pro SP2 With Outlook 2000 SP3 configured to collect vanilla POP3 mail.

    Each PC has its own mail account plus one shared account (mail@). Problems have been experienced on both individual and shared ones.

    The problems:

    1) The mail client gets to a particular message and then just hangs there with no response. Outlook locks up completely and has to be killed from task manager.

    OR

    2) The mail client gets to the same message, waits a few minutes and then errors out with a TCP/IP error sending data to the server (generic Outlook 2000 message).

    My testing (all with no effect whatsoever):

    Router has been changed (was previously an old Netgear DG834 running firmware v1.01.01), now a Thomson Speedtouch 780 WL

    Mail hosting server has been restarted

    Antivirus (Trend WFB - Was Symantec 9.something when this started) fully disabled

    Tested with Outlook 2000 - Problem evident.
    Tested with Outlook Express - Problem evident.
    Tested with Mozilla Thunderbird - Problem STILL evident!

    So I tested it with Thunderbird on my work PC via remote - Problem not evident.

    I tested it from my HTC PDA over their wireless connection - Problem not evident.

    I then ran a telnet session from one of their PCs and saw something I have never seen before when retrieving the problematic message

    Code:
    +OK Hello there.
    user mail@domain.tld
    +OK Password required.
    pass ******
    +OK logged in.
    retr 25
    +OK 214523 octets follow.
    Comments:
    The server waits a few moments after saying octets follow and then says "Comments:". It then waits a while longer and repeats the same message. It does it a few more times and then the connection drops.

    This surely is a problem with the server software, as the only valid status responses to RETR are +OK and -ERR. In fact, I can't find any mention of "Comments:" in the protocol spec at all.
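
One hedged observation on the transcript: in RFC 1939 a successful RETR is a multi-line response, so everything after the +OK status line is message content, up to a line holding a single dot. Read that way, "Comments:" could be the first line of the message arriving before the transfer stalls, rather than a second status reply. That is an assumption, not a diagnosis. The sketch below (parse_retr_reply is a hypothetical helper) splits a captured reply along those lines.

```python
def parse_retr_reply(lines):
    """Split the lines of a POP3 RETR reply into its parts.

    Returns (status, announced_octets, body_lines, complete), where `complete`
    is True only if the terminating "." line was seen.
    """
    if not lines:
        raise ValueError("empty reply")
    status_line = lines[0]
    if status_line.startswith("-ERR"):
        return ("err", None, [], True)
    if not status_line.startswith("+OK"):
        raise ValueError("not a POP3 status line: %r" % status_line)

    # Many servers announce the size as the first token after +OK.
    parts = status_line.split()
    announced = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None

    body = []
    complete = False
    for line in lines[1:]:
        if line == ".":
            complete = True  # end of the multi-line response
            break
        # Undo byte-stuffing: a leading "." on a content line is dropped.
        body.append(line[1:] if line.startswith(".") else line)
    return ("ok", announced, body, complete)
```

Fed the two lines from the telnet session, this yields an "ok" status, 214523 announced octets, a one-line body of "Comments:", and complete set to False, which matches a transfer that began and never finished.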

    Why this only happens from the PCs in the client's office completely baffles me though. It always happens on the same message on every PC but it happens far too frequently for it to be an issue with each specific message.

    The specific message in question follows, but somewhat changed to protect private details. There is nothing special about it except it was a forward and came from squirrelmail. I used to work at the source ISP so if needs be I can contact them, but I highly doubt it is anything their end.

     
  9. Infopro

    Infopro cPanel Sr. Product Evangelist
    Staff Member

    How large is the email?
     
  10. beddo

    beddo Well-Known Member

    You might have just hit something I missed.
    The server says "+OK 214523 octets follow."

    There is no way the message is over 200 KB. In fact, when I take the whole message (retrieved over a different connection) and save it to its own file, it comes to 2403 bytes.

    So not only is the server giving an incorrect response to the RETR, it is also stating the wrong size for the message.
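
A small sketch of the size comparison, assuming the saved copy may use bare LF line endings while the size on the wire counts each line ending as CRLF (servers differ in how they report this, so treat it as an assumption). The helper names pop3_octets and size_gap are hypothetical.

```python
def pop3_octets(message: bytes) -> int:
    """Size of a message with every line ending counted as CRLF."""
    unix = message.replace(b"\r\n", b"\n")  # normalise first to avoid double counting
    return len(unix.replace(b"\n", b"\r\n"))


def size_gap(announced: int, saved: bytes) -> int:
    """How many octets the announced size exceeds the saved copy by."""
    return announced - pop3_octets(saved)
```

Even in the worst case, where every one of the 2403 saved bytes was a bare line feed, the CRLF form would be only 4806 octets, so line-ending differences cannot account for an announced size of 214523.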

    In case you are thinking the problem is just a timeout because the message is too big, it can't be. In response to the RETR, it isn't sending ANY data other than the OK response.

    It's not a firewall on the server either, as I dropped it temporarily to check.
     
  11. Infopro

    Infopro cPanel Sr. Product Evangelist
    Staff Member

  12. beddo

    beddo Well-Known Member

    Well at first I didn't see how it could be related, but then I noticed something.

    There are only six messages in the mailbox, but the maildirsize file does not contain the first line stating the quota:

    Code:
          -23658           -1
           -4890           -1
          -58387           -1
         -429153           -1
          -23052           -1
          -93111           -1
           -1139           -1
           -3383           -1
           -1143           -1
          -54487           -1
          -22033           -1
        -3497973           -1
         -211691           -1
    2671 1
           -2671           -1
          -25649           -1
         -137234           -1
          -15119           -1
          -61673           -1
          -22000           -1
         -330813           -1
          -11670           -1
         -212670           -1
         -137405           -1
          -22025           -1
           -7932           -1
          -57709           -1
    2420 1
           -2420           -1
    4321 1
    2411 1
    4352 1
    4164 1
    1902 1
    5953 1
    I moved the file to a backup and tried to get the file recreated by logging out and in etc to no avail. I had to actually modify the quota for the mailbox in order to get it to recreate the file.
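
For checking a maildirsize file by script, here is a rough sketch. It assumes my understanding of Courier's Maildir++ quota format, which is worth verifying against the Courier documentation: the first line holds the quota definition (for example 10485760S, optionally with a message-count part such as 1000C), and every later line is a "bytes count" pair added to a running total. The function name parse_maildirsize is hypothetical.

```python
import re

# Assumed shape of the quota definition line, e.g. "10485760S" or "10485760S,1000C".
QUOTA_LINE = re.compile(r"\d+[SC](,\d+[SC])*")


def parse_maildirsize(text):
    """Return (quota_definition_or_None, total_bytes, total_messages)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    quota = None
    if lines and QUOTA_LINE.fullmatch(lines[0]):
        quota = lines[0]
        lines = lines[1:]
    total_bytes = 0
    total_messages = 0
    for line in lines:
        size, count = line.split()  # raises ValueError on a malformed line
        total_bytes += int(size)
        total_messages += int(count)
    return quota, total_bytes, total_messages
```

A None in the first position flags a file like the one above, where the quota line is missing, and the two totals can be compared against the actual contents of the mailbox.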

    It is odd though because I can't understand how something like that would affect just the three PCs in question rather than affecting the mailbox for every location.

    They are currently using mail via IMAP so I'll have to put one of them back to POP3 mail and see if the problem recurs unless anyone has any other ideas of what to check.
     
  13. Infopro

    Infopro cPanel Sr. Product Evangelist
    Staff Member

    So the problem was not corrected by recreating the file? You mention IMAP; syncing files and folders in IMAP is different from POP, which just downloads mail to you.
     
  14. beddo

    beddo Well-Known Member

    I know IMAP and POP are different; that's why I've changed their setup to use IMAP, so they can at least still check their mail while POP is broken. I don't want to start encouraging users to leave mail on the server though, otherwise they'll quickly hit their quota!

    Unfortunately I can't test it easily as there is no predicting which message is going to screw up, plus I'm not out at that site.
     
  15. beddo

    beddo Well-Known Member

    Right, the problem has now recurred on one of the PCs on a different mailbox that is only accessed by a single PC.

    This mailbox does NOT have the missing line of the maildirsize so that is not the issue.

    :(

    I have opened a ticket (280965) on it now because I'm stuck!
     
    #15 beddo, Jul 16, 2008
    Last edited: Jul 16, 2008
  16. jayh38

    jayh38 Well-Known Member

    Any follow-up on this? Just curious about the end result.
     
  17. beddo

    beddo Well-Known Member

    No, everyone seems to think they're waiting for someone else to test something, and I've had so many other problems to deal with that I haven't had the chance to rule anything else out.

    Essentially I believe there to have been two problems. Firstly, there is whatever caused the random "Comments:" response, which has now been resolved.
    Secondly there are the messages just locking the queue up. On the application level the server is acknowledging the RETR command issued by the client but we have one of the following possible problems:

    The ack is not making it out onto the wire (server problem)
    The ack is getting lost in transit (oh the fun of finding someone to accept responsibility)
    The ack is getting to the destination network but not making it up the wire to the PCs (only possible reason left would be the main switch which is next on our list to test, if that doesn't work then a hub between the edge router and the switch and a sniff to prove the packet doesn't make it to the network).

    In the meantime all I can do is log onto the server whenever the problem surfaces and move the bad mail out of the mail directory.
     