The Community Forums

High loads with EXIM, when Mailman lists used

Discussion in 'E-mail Discussions' started by ttremain, Nov 6, 2005.

  1. ttremain

    ttremain Well-Known Member

    Joined:
    Feb 16, 2003
    Messages:
    212
    Likes Received:
    0
    Trophy Points:
    16
    We typically run "release" and this started about 2 weeks ago, on multiple servers.

    A couple of our clients have larger mailing lists, or multiple smaller mailing lists that
    only they can send to.

    When a mailing is sent via these lists, server load can easily jump from 0.10 to 30 or 40.
    At that point, Exim is running several tasks at 99% load.

    These lists had no problems until recently, and they have been tested at much lower output than
    normal, with bad results (high loads, Apache unreachable or too slow to respond, etc.).

    We are at our wits' end here, and would love some ideas.
     
  2. chirpy

    chirpy Well-Known Member

    Joined:
    Jun 15, 2002
    Messages:
    13,475
    Likes Received:
    20
    Trophy Points:
    38
    Location:
    Go on, have a guess
    A few ideas:

    1. Make sure that you're using a local DNS resolver in /etc/resolv.conf and not a remote one

    2. Disable sender callouts in WHM > Exim Configuration Editor

    3. You can offload email to the queue on high loads by adding the following to the first textarea of the advanced mode exim config editor:

    queue_only_load = 6
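
    As a rough illustration of what queue_only_load does, the Python sketch below mimics the decision: when the 1-minute load average is higher than the threshold, incoming mail is queued rather than delivered immediately. The function name should_queue_only is hypothetical, and this only mirrors the idea; it is not Exim's code.

```python
import os

# Threshold taken from the setting suggested above (queue_only_load = 6).
QUEUE_ONLY_LOAD = 6.0


def should_queue_only(load_average, threshold=QUEUE_ONLY_LOAD):
    """Return True when new mail should be queued instead of delivered at once.

    Hypothetical helper: it mirrors the idea behind queue_only_load,
    not Exim's actual implementation.
    """
    return load_average > threshold


if __name__ == "__main__":
    # os.getloadavg() returns the 1-, 5- and 15-minute load averages (Unix only).
    one_minute, _, _ = os.getloadavg()
    action = "queue only" if should_queue_only(one_minute) else "deliver immediately"
    print(f"load {one_minute:.2f}: {action}")
```

    Queued messages are picked up by later queue runs, so delivery can feel slower while the load stays above the threshold.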
     
  3. ttremain

    ttremain Well-Known Member

    I'll certainly try #'s 2 and 3 right now.

    The first resolver we use is on the same VLAN, but not part of
    the same DNS cluster. This makes it close, but public settings are
    not overridden by the local DNS.

    Thanks!
     
  4. budway

    budway Well-Known Member

    Joined:
    Apr 16, 2003
    Messages:
    186
    Likes Received:
    0
    Trophy Points:
    16
    It's actually sad to deal with Mailman: it does not throttle its load or sending.

    If all your customers send 1,000,000 e-mails, Mailman will actually try to send all of them on the fly.

    It's actually sad!
     
  5. chirpy

    chirpy Well-Known Member

    I'm afraid I don't understand your update. Are you asking a question about something related to this thread?

    If you're talking about the fact that mailman doesn't rate limit mail sending, that is true, but you can tweak both mailman and exim to prevent large mailing lists bringing your server to a stop.
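
    One possible shape for the Mailman side, assuming Mailman 2.1, where site overrides live in Mailman/mm_cfg.py (a Python file). The two settings below are defined in Mailman's Defaults.py; the values are illustrative assumptions, not recommendations from this thread. They cap how much Mailman hands to the MTA per SMTP transaction and per connection; they do not rate-limit over time.

```python
# Sketch of overrides for Mailman 2.1's mm_cfg.py (values are assumptions).
# In a real mm_cfg.py these lines go below the existing "from Defaults import *".

# Maximum number of recipients Mailman passes in a single SMTP transaction.
SMTP_MAX_RCPTS = 10

# Number of SMTP sessions Mailman runs over one connection before it
# closes and reopens the connection; 0 means no limit.
SMTP_MAX_SESSIONS_PER_CONNECTION = 50
```

    Mailman's qrunners generally need a restart before changes to mm_cfg.py take effect, and whether smaller batches actually lower the load depends on the rest of the setup.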
     
  6. ttremain

    ttremain Well-Known Member

    After disabling callouts, and setting queue_only_load, we are now getting nothing but complaints from our client.

    "how come our email isn't as quick as it was two weeks ago?" Mind you, two weeks ago we didn't have load issues, either...

    I believe something serious has changed in either exim or mailman.
     
  7. lloyd_tennison

    lloyd_tennison Well-Known Member

    Joined:
    Mar 12, 2004
    Messages:
    698
    Likes Received:
    1
    Trophy Points:
    18
    Some Exim update has changed server load, as I have also been seeing this in the last week or so. Exim is using more resources than it used to, especially with Mailman, even with Mailman throttled.

    We have to either wait for Exim 4.54 or set up the Mailman limits again. I know I am going to set up the limits again, as I have no idea about 4.54.
     
    #7 lloyd_tennison, Nov 16, 2005
    Last edited: Dec 12, 2005
  8. MichaelShanks

    MichaelShanks Well-Known Member
    PartnerNOC

    Joined:
    Aug 20, 2001
    Messages:
    104
    Likes Received:
    0
    Trophy Points:
    16
    There is definitely a problem with Exim. I'm noticing high load on about a dozen servers, all from mail processes; killing off spamd seems to help.
     
  9. MichaelShanks

    MichaelShanks Well-Known Member
    PartnerNOC

  10. ttremain

    ttremain Well-Known Member

    I have added to the ticket.

    Thanks!
     
  11. ttremain

    ttremain Well-Known Member

    Killing off spamd (disabling SpamAssassin in Tweak Settings and Service Manager) does not seem to make any difference here.
     
  12. jondolar

    jondolar Well-Known Member

    Joined:
    Feb 15, 2004
    Messages:
    46
    Likes Received:
    0
    Trophy Points:
    6
    exim processes

    I have a very similar issue but I can't narrow it down to any mailing list.

    Exim starts increasing CPU utilization (as displayed in top) to 99% and then kicks off another exim process. After a few minutes, that process jumps to 99% and then another exim process starts. After about half an hour I have 6 or 7 exim processes and the server load is sitting at about 10 (up from 1 to 2). If I let this go, exim will keep kicking off more processes until the load crashes the server.

    I have a cron job to restart exim every 1/2 hour and that "fixes" the issue with the load. However, this is obviously not a permanent solution.

    This is so consistent and is happening over 5+ days. It starts happening immediately after a reboot as well. It looks like there is something causing the exim process to "hang" and it starts another process which eventually "hangs" as well.

    Do you think this is related?
    Any thoughts?
     
  13. ttremain

    ttremain Well-Known Member

    This could very well be closer to the cause than any of us have gotten.

    It may or may not be related, but does sound very much like what I am seeing.

    As an added note, very little mail is actually getting sent per minute/hour while this is happening.
     
  14. chirpy

    chirpy Well-Known Member

    Have you tried running strace on the PID to find out exactly what it is doing?
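
    For anyone who has not used strace this way, a minimal Python sketch of attaching to a running process follows. The helper names and the output path are hypothetical; -p (attach to a PID) and -o (write the trace to a file) are standard strace options, and attaching normally requires root.

```python
import subprocess


def build_strace_command(pid, output_path):
    """Build the argument list for attaching strace to a running process.

    Hypothetical helper: -p attaches to the given PID and -o writes
    the trace to a file instead of stderr.
    """
    return ["strace", "-p", str(pid), "-o", output_path]


def trace_process(pid, output_path, seconds=10):
    """Trace a process for a few seconds, then stop strace again."""
    tracer = subprocess.Popen(build_strace_command(pid, output_path))
    try:
        tracer.wait(timeout=seconds)
    except subprocess.TimeoutExpired:
        # On SIGTERM strace detaches and the traced process keeps running.
        tracer.terminate()
        tracer.wait()
```

    Calling trace_process(1234, "/tmp/exim.trace") with the PID of a busy exim process taken from top would leave a few seconds of system calls in that file for inspection.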
     
  15. MichaelShanks

    MichaelShanks Well-Known Member
    PartnerNOC

    Try disabling spamd; I'm 70% sure this will fix it. Kill off chkservd first, though, so cPanel doesn't bring it back up.

    There is definitely a problem with cPanel. I have gone from stable servers to having half a dozen with problems; this happened in the space of one night, and killing off spamd brings them under control.

    My Bugzilla report was dismissed, stating "not much we can do as we can't recreate locally".
     
  16. MichaelShanks

    MichaelShanks Well-Known Member
    PartnerNOC

    It may be the case that spamd just isn't very good at dealing with more than a few mails a minute and needs some optimisation.
     
  17. fleksi

    fleksi Well-Known Member

    Joined:
    Sep 17, 2003
    Messages:
    125
    Likes Received:
    0
    Trophy Points:
    16
    How to optimize spamd?
    Thank you,
    -fl-
     
  18. chirpy

    chirpy Well-Known Member

    There have been documented problems with spamd and throughput discussed on the forums in the past. Many people were indeed finding email being rejected with 421 errors because of it, and it appeared to be a design flaw in the spamd/exim/cPanel implementation. Using SpamAssassin through alternative scanning methods (i.e. using the Perl modules through a 3rd party wrapper) isn't affected by that issue at all.
     
  19. lloyd_tennison

    lloyd_tennison Well-Known Member

    cPanel was offered access to two different servers to recreate the problem, and they do not seem to have responded to either offer.
     
  20. ttremain

    ttremain Well-Known Member

    Here is an example, not running too high quite yet...

    top - 02:35:13 up 37 days, 2:33, 3 users, load average: 20.63, 14.48, 12.21
    Tasks: 140 total, 22 running, 118 sleeping, 0 stopped, 0 zombie
    Cpu(s): 98.8% us, 1.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.2% hi, 0.0% si
    Mem: 1026300k total, 1008080k used, 18220k free, 131460k buffers
    Swap: 4192956k total, 160k used, 4192796k free, 586920k cached

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    3064 root 25 0 9664 3920 2660 R 10.3 0.4 0:31.32 /usr/sbin/exim -Mc 1EhPEZ-0000nO-8N
    3085 root 25 0 8480 3932 2648 R 10.3 0.4 0:22.72 /usr/sbin/exim -Mc 1EhPFH-0000nj-Tk
    3093 root 25 0 8804 3892 2648 R 10.3 0.4 0:20.41 /usr/sbin/exim -Mc 1EhPFb-0000ns-RR
    2580 root 25 0 9716 3920 2648 R 9.9 0.4 1:51.11 /usr/sbin/exim -Mc 1EhP8g-0000fZ-Ij
    2592 root 25 0 8596 3924 2648 R 9.9 0.4 1:52.82 /usr/sbin/exim -Mc 1EhP8l-0000fl-Dj
    2591 root 25 0 8704 3940 2648 R 9.9 0.4 1:48.01 /usr/sbin/exim -Mc 1EhP8l-0000fl-4R
    3076 root 25 0 8320 3892 2648 R 9.9 0.4 0:26.29 /usr/sbin/exim -Mc 1EhPEu-0000nZ-8Q
    3084 root 25 0 8680 3896 2648 R 9.9 0.4 0:22.51 /usr/sbin/exim -Mc 1EhPFH-0000nj-Fu
    3086 root 25 0 8796 3916 2648 R 9.9 0.4 0:23.12 /usr/sbin/exim -Mc 1EhPFI-0000nj-4K
    3095 root 25 0 9828 3900 2648 R 9.9 0.4 0:20.33 /usr/sbin/exim -Mc 1EhPFc-0000ns-8T
    3102 root 25 0 8956 3904 2648 R 9.9 0.4 0:18.42 /usr/sbin/exim -Mc 1EhPFw-0000o0-Gr
    3130 root 25 0 8728 3800 2556 R 9.9 0.4 0:03.89 /usr/sbin/exim -Mc 1EhPIE-0000oS-1X
    3131 root 25 0 9948 3796 2556 R 9.9 0.4 0:03.82 /usr/sbin/exim -Mc 1EhPIE-0000oS-EQ
    3075 root 25 0 10044 3904 2648 R 9.6 0.4 0:26.42 /usr/sbin/exim -Mc 1EhPEt-0000nZ-P4
    3103 root 25 0 9900 3904 2648 R 9.6 0.4 0:18.32 /usr/sbin/exim -Mc 1EhPFw-0000o0-MO
    2630 root 25 0 8484 3956 2676 R 8.6 0.4 1:51.61 /usr/sbin/exim -Mc 1EhP93-0000gO-D0
    3094 root 25 0 10096 3916 2648 R 7.6 0.4 0:20.12 /usr/sbin/exim -Mc 1EhPFc-0000ns-16
    3129 root 25 0 9272 3792 2556 R 7.6 0.4 0:03.86 /usr/sbin/exim -Mc 1EhPID-0000oS-Sf
    2652 root 25 0 8788 3968 2684 R 7.0 0.4 1:48.03 /usr/sbin/exim -Mc 1EhP93-0000gO-Lv
    3063 root 25 0 8428 3896 2648 R 7.0 0.4 0:29.01 /usr/sbin/exim -Mc 1EhPEZ-0000nO-3L
    3134 root 25 0 8740 3776 2556 R 6.6 0.4 0:01.51 /usr/sbin/exim -MCS -MCP -MC remote_smtp mx1.mail.yahoo.com 4.79.181.14 2 1EhP8g-0000fZ-C6
    3054 root 24 0 8296 3888 2660 S 6.3 0.4 0:32.68 /usr/sbin/exim -MCS -MCP -MC remote_smtp imailv.emirates.net.ae 195.229.241.57 2 1EhOdy-00009A-FA


    I ran an strace on the PID 2591 and got the following, however it just continued over and over and over, with different sets of numbers:


    _llseek(6, 9889215, [9889215], SEEK_SET) = 0
    _llseek(6, 0, [9889215], SEEK_CUR) = 0
    read(6, "419856 262046 .\n1114419856 26204"..., 4096) = 4096
    _llseek(6, 9885135, [9885135], SEEK_SET) = 0
    _llseek(6, 0, [9885135], SEEK_CUR) = 0
    read(6, "419781 261972 .\n1114419781 26197"..., 4096) = 4096
    _llseek(6, 9881055, [9881055], SEEK_SET) = 0
    _llseek(6, 0, [9881055], SEEK_CUR) = 0
    read(6, "419746 262005 .\n1114419746 26200"..., 4096) = 4096
    _llseek(6, 9876975, [9876975], SEEK_SET) = 0
    _llseek(6, 0, [9876975], SEEK_CUR) = 0
    read(6, ".\n1114419187 293559 .\n1114419187"..., 4096) = 4096
    _llseek(6, 9872881, [9872881], SEEK_SET) = 0
    _llseek(6, 0, [9872881], SEEK_CUR) = 0
    read(6, "418962 293569 .\n1114418962 29356"..., 4096) = 4096
    _llseek(6, 9868801, [9868801], SEEK_SET) = 0
    _llseek(6, 0, [9868801], SEEK_CUR) = 0
    read(6, "418922 293299 .\n1114418922 29329"..., 4096) = 4096
    _llseek(6, 9864721, [9864721], SEEK_SET) = 0
    _llseek(6, 0, [9864721], SEEK_CUR) = 0
    read(6, "418569 293298 .\n1114418570 29329"..., 4096) = 4096
    _llseek(6, 9860641, [9860641], SEEK_SET) = 0
    _llseek(6, 0, [9860641], SEEK_CUR) = 0
    read(6, ".\n1114413232 110994 .\n1114413232"..., 4096) = 4096


    I have no idea how to read that...
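
    One way to read a trace like this is to pull out the absolute seek offsets and look at how they change. The Python sketch below does that for the SEEK_SET lines quoted above (the helper names are hypothetical). Each seek lands roughly 4 KB before the previous one and is followed by a 4096-byte read, which suggests the process is stepping backwards through a file of at least about 9.9 MB on descriptor 6. Which file that is cannot be told from this excerpt; on Linux, listing /proc/<pid>/fd for the process would show it.

```python
import re

# The SEEK_SET lines from the strace excerpt above.
TRACE = """\
_llseek(6, 9889215, [9889215], SEEK_SET) = 0
_llseek(6, 9885135, [9885135], SEEK_SET) = 0
_llseek(6, 9881055, [9881055], SEEK_SET) = 0
_llseek(6, 9876975, [9876975], SEEK_SET) = 0
_llseek(6, 9872881, [9872881], SEEK_SET) = 0
_llseek(6, 9868801, [9868801], SEEK_SET) = 0
_llseek(6, 9864721, [9864721], SEEK_SET) = 0
_llseek(6, 9860641, [9860641], SEEK_SET) = 0
"""

# Captures the file descriptor and the requested absolute offset.
SEEK_SET_RE = re.compile(r"_llseek\((\d+), (\d+), \[\d+\], SEEK_SET\)")


def seek_offsets(trace):
    """Return the absolute offsets of all SEEK_SET calls, in order."""
    return [int(match.group(2)) for match in SEEK_SET_RE.finditer(trace)]


def backward_steps(offsets):
    """Return how far each seek moved back from the previous one, in bytes."""
    return [earlier - later for earlier, later in zip(offsets, offsets[1:])]


offsets = seek_offsets(TRACE)
steps = backward_steps(offsets)
print(steps)  # [4080, 4080, 4080, 4094, 4080, 4080, 4080]
```

    Every step is positive, so the offsets only ever decrease: the reads walk from the end of the file towards the start, one block at a time.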
     