Bayesian training is done with sa-learn. Information on its use can be found in its documentation:
sa-learn - train SpamAssassin's Bayesian classifier
There are two primary types of training you can apply: supervised and unsupervised (bayes_auto_learn).
Supervised learning
This means keeping a copy of all or most of your mail, separated into spam and ham piles, and periodically re-training using those. It produces the best results, but requires more work from you, the user.
(An easy way to do this, by the way, is to create a new folder for 'deleted' messages and, instead of deleting messages from other folders, simply move them there. Then keep all spam in a separate folder and never delete it. As long as you remember to move misclassified mail into the correct folder, it is easy enough to keep the training set up to date.)
Unsupervised learning from SpamAssassin rules
Also called 'auto-learning' in SpamAssassin. Based on statistical analysis of the SpamAssassin success rates, we can automatically train the Bayesian database with a certain degree of confidence that our training data is accurate.
It should be supplemented with some supervised training, if possible.
Auto-learning is enabled by default, but can be turned off by setting the SpamAssassin configuration parameter bayes_auto_learn to 0.
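As a sketch of that change, the shell function below appends the line "bayes_auto_learn 0" to a SpamAssassin configuration file. The function name disable_bayes_auto_learn is hypothetical, and so is the file location in the example call: the site-wide file is usually called local.cf, but its directory differs between distributions and installs.

```shell
# disable_bayes_auto_learn FILE  (hypothetical helper)
# Appends "bayes_auto_learn 0" to a SpamAssassin config file such as local.cf.
# If FILE already sets bayes_auto_learn, nothing is changed and a warning is
# printed so the existing line can be edited by hand.
disable_bayes_auto_learn() {
    cf="$1"
    # Match only the bayes_auto_learn option itself, not options such as
    # bayes_auto_learn_threshold_spam that share the prefix.
    if grep -q '^[[:space:]]*bayes_auto_learn[[:space:]]' "$cf" 2>/dev/null; then
        echo "bayes_auto_learn is already set in $cf; edit that line instead" >&2
        return 1
    fi
    # If the file is non-empty and lacks a trailing newline, add one first so
    # the new setting starts on its own line.
    if [ -s "$cf" ] && [ -n "$(tail -c 1 "$cf")" ]; then
        printf '\n' >> "$cf"
    fi
    printf 'bayes_auto_learn 0\n' >> "$cf"
}
```

For example, run disable_bayes_auto_learn /etc/mail/spamassassin/local.cf (adjust the path to your install), then spamassassin --lint to confirm the configuration still parses. If you use spamd, restart it so it picks up the new setting.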
- One great (and pretty user-friendly) way to provide that supervised training is to set up the following:
- In each mail account that will be participating in AutoLearn, create two folders, Spam and Ham.
- Have your users move spam and ham messages into these folders.
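- Create two cron jobs that run sa-learn on those folders, like so (replace the placeholder paths with the real folder locations, and add --mbox if a folder is a single mbox file):
sa-learn --ham /path/to/Ham
sa-learn --spam /path/to/Spam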
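The two cron jobs above might call a small helper like the one below. It is a minimal sketch that assumes the Spam and Ham folders are Maildir folders (one message per file in cur/ and new/ subdirectories); the function name train_folder, the SA_LEARN override variable and the example paths are hypothetical. By default sa-learn trains the Bayes database of the user it runs as, so schedule the jobs under that account.

```shell
# train_folder MODE FOLDER  (hypothetical helper)
# Feeds every message in a Maildir folder to sa-learn as spam or ham.
# Set SA_LEARN to another command (e.g. echo) for a dry run.
train_folder() {
    mode="$1"
    dir="$2"
    # Only the two training modes sa-learn understands are accepted.
    case "$mode" in
        spam|ham) ;;
        *) echo "usage: train_folder spam|ham MAILDIR_FOLDER" >&2; return 2 ;;
    esac
    # A Maildir folder keeps its messages in the cur/ and new/ subdirectories.
    if [ ! -d "$dir/cur" ] || [ ! -d "$dir/new" ]; then
        echo "not a Maildir folder: $dir" >&2
        return 1
    fi
    "${SA_LEARN:-sa-learn}" "--$mode" "$dir/cur" "$dir/new"
}
```

For example, a script run nightly from a crontab line such as 0 3 * * * $HOME/bin/sa-train.sh (a hypothetical script) could define train_folder and then call train_folder spam "$HOME/Maildir/.Spam" followed by train_folder ham "$HOME/Maildir/.Ham". One script covering both folders keeps the schedule in a single place; two separate crontab entries work just as well.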
The definition of ham (legitimate, non-spam mail) can be found here.