Over the last few weeks I have been experiencing strange behaviour on a small cPanel server. Connections to SSH, SMTP, IMAP and HTTP (including the WHM web services) stop responding for several minutes, after which everything works again for a few minutes. However, ICMP/ping gives instant replies the whole time, and active SSH sessions are not dropped but stay alive.
This behaviour seems to be related to a single host (the one I use to manage the machine). Connections from other IPs at the same time show no trouble.
There is nothing in the logs that gives me any clue. The timing of the problem also gives no hint of a lockout timer of some sort.
I have:
- checked all logs for any clues: apache, cpanel, dovecot, lfd, maillog, exim, messages, audit/cphulk, modsec, etc.
- disabled LFD/CSF > no effect
- disabled SELinux > no effect
- disabled cPHulk > no effect
- checked system monitoring (graphs): no CPU, network, memory or I/O related issues seem apparent
- checked all whitelists to see if the troubled host is still on it (yes it is)
- checked if the IP was listed in iptables, no it is not
- checked if there is a connection limit issue (no more than a handful of established connections)
- updated all yum packages
- rebooted several times
- made a tcpdump on the external interface of the cPanel machine, which confirms what I see: a regular incoming request from the troubled host, followed by TCP retransmissions for all of its TCP traffic, while the same services keep answering other IPs normally. ICMP requests and the already established SSH session from the troubled host also keep working. After a few minutes everything returns to normal and works again.
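To pin down the timing more precisely (whether all ports fail at the same moment, and exactly how long each outage lasts), a small probe could be run from the troubled host. This is a minimal sketch, not part of cPanel or any existing tool: `probe`, `outages` and `monitor` are hypothetical helpers, and the host and port list are placeholders to fill in. Each sample opens a brand-new TCP connection, which is exactly what fails here, so the output should line up with the retransmissions seen in tcpdump.

```python
import socket
import time
from typing import Dict, List, Optional, Tuple

def probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a NEW TCP connection to host:port completes within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, unreachable and timed-out connects
        return False

def outages(samples: List[Tuple[float, bool]]) -> List[Tuple[float, float]]:
    """Collapse (timestamp, ok) samples into (start, end) failure intervals.

    start is the first failed sample; end is the first successful sample after it,
    or the last sample if the outage is still ongoing when sampling stops.
    """
    intervals: List[Tuple[float, float]] = []
    start: Optional[float] = None
    for ts, ok in samples:
        if not ok and start is None:
            start = ts
        elif ok and start is not None:
            intervals.append((start, ts))
            start = None
    if start is not None:
        intervals.append((start, samples[-1][0]))
    return intervals

def monitor(host: str, ports: List[int], interval: float = 10.0,
            count: int = 360) -> Dict[int, List[Tuple[float, float]]]:
    """Probe every port `count` times, `interval` seconds apart; return outages per port."""
    samples: Dict[int, List[Tuple[float, bool]]] = {p: [] for p in ports}
    for _ in range(count):
        now = time.time()
        for p in ports:
            samples[p].append((now, probe(host, p)))
        time.sleep(interval)
    return {p: outages(s) for p, s in samples.items()}
```

For example, `monitor("server.example.com", [22, 25, 143, 80, 2087])` (placeholder hostname; 2087 is the usual WHM SSL port) samples for about an hour and returns the start/end timestamps of every outage per port. If the intervals are identical across ports and have a regular length, that points towards a timer-based block rather than load.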
Somehow it seems some application is blocking the troubled IP from making new TCP connections, but I'm out of ideas. Does anyone have any leads for me to look at?
P.S. It's an oversized AWS Amazon Linux machine which has been working without any (noticeable) problems for the last year or so.