SpamAssassin Bayes scores don't match (debug vs. normal processing)

shacker23

Well-Known Member
Feb 20, 2005
263
1
168
I've been training SpamAssassin like crazy, but it's not doing any good - many users are still getting around 50% spam, and the reason is that the Bayes scores are too low.

Check this example: I copied a spam message that had already been processed by the mail server to /root/tmp/spam-example. When I compare the headers stored in that file with the result of running the same file through spamassassin manually, I see very different Bayes results:

The message as processed normally:

Code:
cat /root/tmp/spam-example

X-Spam-Status: No, score=1.4
X-Spam-Score: 14
X-Spam-Bar: +

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: fcagahaujeqaraf.tk]
-0.0 T_RP_MATCHES_RCVD      Envelope sender domain matches handover relay
                            domain
 1.6 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                            [URIs: fcagahaujeqaraf.tk]
-1.9 BAYES_00               BODY: Bayes spam probability is 0 to 1%
                            [score: 0.0000]
 0.0 HTML_MESSAGE           BODY: HTML included in message
X-Spam-Flag: NO

Here's the same message run through spamassassin from the command line in debug mode:

Code:
spamassassin -t -D < /root/tmp/spam-example 

Content analysis details: (6.8 points, 2.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
 1.7 URIBL_BLACK            Contains an URL listed in the URIBL blacklist
                            [URIs: fcagahaujeqaraf.tk]
 1.6 URIBL_WS_SURBL         Contains an URL listed in the WS SURBL blocklist
                            [URIs: fcagahaujeqaraf.tk]
 3.5 BAYES_99               BODY: Bayes spam probability is 99 to 100%
                            [score: 1.0000]
 0.0 SINGLE_HEADER_3K       A single header contains 3K-4K characters
-0.0 T_RP_MATCHES_RCVD      Envelope sender domain matches handover relay
                            domain
 0.0 HTML_MESSAGE           BODY: HTML included in message

So when I run it through manually, the message hits BAYES_99 (3.5 points) - but as processed normally by Exim, it hits BAYES_00 (-1.9 points), a negative Bayes score!
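One way to narrow this down might be to look at which Bayes database files the manual run actually opens. The sketch below filters SpamAssassin's debug output for lines that mention the Bayes files. The exact wording of those debug lines is an assumption on my part and may differ between versions, so treat the pattern as a starting point rather than a guaranteed match.

```shell
#!/bin/sh
# Sketch: show which Bayes database files a manual SpamAssassin run opens.
# ASSUMPTION: debug lines mention "bayes:" plus a bayes_toks/bayes_seen path;
# adjust the pattern if your SpamAssassin version words them differently.

# Keep only debug lines that mention a bayes_toks or bayes_seen file.
bayes_db_lines() {
    grep -E 'bayes: .*bayes_(toks|seen)'
}

# Usage (run as the account you want to inspect; debug output goes to stderr):
#   spamassassin -t -D bayes < /root/tmp/spam-example 2>&1 | bayes_db_lines
```

If the path that shows up is under root's home directory, the manual run is not reading the same database as normal delivery for a user's mailbox.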

Ideas? Thanks.
 

shacker23

In another test, with a different message, I get 0.6 points after processing normally, but 14.7 points if I run it through SA manually.

It really feels as if the two runs are looking at two totally separate Bayes databases.
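A rough way to test that suspicion is to compare the learned message counts that sa-learn reports for different accounts. The sketch below assumes sudo is available and that `sa-learn --dump magic` prints its usual "non-token data" lines. The function names and the account name in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Sketch: compare Bayes message counts between accounts.
# ASSUMPTION: `sa-learn --dump magic` prints lines ending in "nspam" and
# "nham" with the count in the third column.

# Reduce `sa-learn --dump magic` output (on stdin) to nspam/nham counts.
bayes_counts() {
    awk '$NF == "nspam" || $NF == "nham" { print $NF "=" $3 }'
}

# Print the counts for each account named on the command line.
# sudo -H sets HOME to the target user's home directory, so sa-learn
# reads that user's own Bayes database.
compare_accounts() {
    for account in "$@"; do
        echo "== $account"
        sudo -H -u "$account" sa-learn --dump magic | bayes_counts
    done
}

# Usage ("exampleuser" is a hypothetical cPanel account):
#   compare_accounts root exampleuser
```

Very different nspam/nham numbers for root and for a mail user would suggest that the training is landing in a different database from the one used at delivery time.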
 

shacker23

I finally got to the bottom of this - running sa-learn as root only trains root's own Bayes database, so it has no effect on how other users' mail is scored! The admin needs to run sa-learn commands "as" each of the other users for the training to take effect. I've written up a blog post with detailed notes on spam/ham training for WHM administrators:

http://birdhouse.org/blog/2012/12/22/spam-training-on-cpanel-for-desktop-mail-clients/

I hope someone finds it useful!
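For anyone who just wants the gist of running sa-learn "as" another user, here is a minimal sketch, not the exact procedure from the blog post. It assumes sudo is installed; the function name `train_as`, the account name and the mail folder path are hypothetical placeholders. By default it only prints the command it would run, so you can check it before training anything.

```shell
#!/bin/sh
# Sketch: train one account's Bayes database by running sa-learn as that user.
# The account and folder used in the usage comment are hypothetical.

# train_as ACCOUNT KIND FOLDER
#   KIND is "spam" or "ham"; FOLDER holds the messages to learn from.
#   With DRY_RUN unset or 1, the command is only printed, not executed.
train_as() {
    account=$1
    kind=$2
    folder=$3
    case $kind in
        spam|ham) ;;
        *) echo "kind must be 'spam' or 'ham'" >&2; return 1 ;;
    esac
    # sudo -H sets HOME to the target user's home, so sa-learn updates
    # that user's Bayes database rather than root's.
    set -- sudo -H -u "$account" sa-learn "--$kind" "$folder"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage (hypothetical account and folder):
#   train_as exampleuser spam /home/exampleuser/mail/.Junk/cur
#   DRY_RUN=0 train_as exampleuser ham /home/exampleuser/mail/cur
```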