The Community Forums

Interact with an entire community of cPanel & WHM users!

SpamAssassin bayes scores don't match (debug vs. processing)

Discussion in 'E-mail Discussions' started by shacker23, Dec 19, 2012.

  1. shacker23

    shacker23 Well-Known Member

    Joined:
    Feb 20, 2005
    Messages:
    263
    Likes Received:
    1
    Trophy Points:
    16
    I've been training SpamAssassin heavily, but it isn't doing any good: for many users, around 50% of incoming mail is still spam, and the reason is that the Bayes scores are too low.

    Here's an example. I copied a spam message that the mail server had already processed to /root/tmp/spam-example. The headers stored in that file show one Bayes result, but when I run the same file through spamassassin manually, I see a very different one:

    The message as processed normally:

    Code:
    cat /root/tmp/spam-example
    
    X-Spam-Status: No, score=1.4
    X-Spam-Score: 14
    X-Spam-Bar: +
    
    pts  rule name              description
    ---- ---------------------- --------------------------------------------------
     1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                                [URIs: fcagahaujeqaraf.tk]
    -0.0 T_RP_MATCHES_RCVD      Envelope sender domain matches handover relay
                                domain
     1.6 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                                [URIs: fcagahaujeqaraf.tk]
    -1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                                [score: 0.0000]
     0.0 HTML_MESSAGE           BODY: HTML included in message
    X-Spam-Flag: NO

    Here's the same message run through spamassassin from the command line in debug mode:

    Code:
    spamassassin -t -D < /root/tmp/spam-example 
    
    Content analysis details: (6.8 points, 2.0 required)
    
    pts  rule name              description
    ---- ---------------------- --------------------------------------------------
     1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                                [URIs: fcagahaujeqaraf.tk]
     1.6 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                                [URIs: fcagahaujeqaraf.tk]
     3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                                [score: 1.0000]
     0.0 SINGLE_HEADER_3K       A single header contains 3K-4K characters
    -0.0 T_RP_MATCHES_RCVD      Envelope sender domain matches handover relay
                                domain
     0.0 HTML_MESSAGE           BODY: HTML included in message

    So when I run the message through manually, the Bayes probability is very high (BAYES_99, 3.5 points), but as processed normally by Exim it gets a negative Bayes score (BAYES_00, -1.9 points)!

    Ideas? Thanks.
     
  2. shacker23

    In another test, with a different message, I get 0.6 points when it is processed normally, but 14.7 points if I run it through SpamAssassin manually.

    It really feels as if the two runs are looking at two totally separate Bayes databases.
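
    One way to check that hunch is to compare the training counters of each Bayes database. The sketch below assumes a stock SpamAssassin install where `sa-learn --dump magic` prints "non-token data" lines for nspam and nham; the helper name `bayes_counts` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read `sa-learn --dump magic` output on stdin and
# print the number of spam and ham messages the Bayes database has learned.
# Assumes the usual layout, where the count is the third field and the
# counter name (nspam / nham) is the last field on the line.
bayes_counts() {
    awk '
        BEGIN          { s = "?"; h = "?" }
        $NF == "nspam" { s = $3 }
        $NF == "nham"  { h = $3 }
        END            { printf "nspam=%s nham=%s\n", s, h }
    '
}

# Usage, run as root ("someuser" is a placeholder account name):
#   sa-learn --dump magic | bayes_counts
#   su - someuser -s /bin/sh -c 'sa-learn --dump magic' | bayes_counts
```

    If the two commands report very different counts, the manual run and the normal delivery path are probably not reading the same database.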
     
  3. shacker23

    I finally got to the bottom of this: running sa-learn as root only trains the Bayes database used for root's own mail! For training to affect another user's mail, the admin needs to run the sa-learn commands "as" that user. I've written up a blog post with detailed notes on spam/ham training for WHM administrators:

    http://birdhouse.org/blog/2012/12/22/spam-training-on-cpanel-for-desktop-mail-clients/

    I hope someone finds it useful!
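
    As a rough illustration of running sa-learn "as" another user (not necessarily the exact method from the blog post), here is a minimal sketch. The function name `train_as_user`, the `DRY_RUN` variable, and the example account and mail path are all assumptions; only `sa-learn --spam` / `sa-learn --ham` and `su -s ... -c ...` are standard commands.

```shell
#!/bin/sh
# Hypothetical helper: train one account's Bayes database from root.
#   train_as_user USER spam|ham PATH
# By default (DRY_RUN=1) it only prints the command it would run;
# set DRY_RUN=0 to actually execute it.
train_as_user() {
    user=$1
    kind=$2
    path=$3
    case "$kind" in
        spam|ham) ;;
        *) echo "usage: train_as_user USER spam|ham PATH" >&2; return 2 ;;
    esac
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Show the command without touching any Bayes database.
        printf "su - %s -s /bin/sh -c 'sa-learn --%s %s'\n" "$user" "$kind" "$path"
    else
        # Run sa-learn under the target account so it updates that
        # user's own database rather than root's.
        su - "$user" -s /bin/sh -c "sa-learn --$kind '$path'"
    fi
}

# Example (placeholder account and folder):
#   train_as_user alice spam /home/alice/mail/.Junk/cur
#   DRY_RUN=0 train_as_user alice ham /home/alice/mail/cur
```

    The dry-run default is there so you can confirm the account and path before any database is modified.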
     