Understanding Apache SpamAssassin's Bayes filtering

hm2k

Well-Known Member
Jul 19, 2005
I've recently been pestered by a user complaining about their space being used up...

On further investigation, I found that their "/home/<user>/.spamassassin" directory is taking up most of the space...

Code:
[email protected] [/home/<user>/.spamassassin]# ls -alh
total 291M
drwx------   2 <user> <user>  12K Aug 17 10:42 ./
drwx--x--x  20 <user> <user> 4.0K Aug 14 14:24 ../
-rw-------   1 <user> <user> 4.0K Aug  7 08:58 __db.bayes_toks.expire10017
-rw-------   1 <user> <user>    0 Aug  9 19:39 __db.bayes_toks.expire11928
-rw-------   1 <user> <user>    0 Aug  7 06:30 __db.bayes_toks.expire13341
-rw-------   1 <user> <user>    0 Aug  7 06:22 __db.bayes_toks.expire13343
-rw-------   1 <user> <user>    0 Aug 15 10:12 __db.bayes_toks.expire14224
-rw-------   1 <user> <user>    0 Aug 14 19:01 __db.bayes_toks.expire1492
-rw-------   1 <user> <user> 4.0K Aug 16 18:33 __db.bayes_toks.expire15519
-rw-------   1 <user> <user>    0 Aug  6 21:42 __db.bayes_toks.expire16051
-rw-------   1 <user> <user>    0 Aug 15 10:42 __db.bayes_toks.expire1638
-rw-------   1 <user> <user>    0 Aug  5 19:28 __db.bayes_toks.expire18180
-rw-------   1 <user> <user>    0 Aug  7 11:04 __db.bayes_toks.expire2010
-rw-------   1 <user> <user>    0 Aug  6 21:45 __db.bayes_toks.expire21733
..crop..
-rw-------   1 <user> <user>    0 Aug 10 10:39 __db.bayes_toks.expire3738
-rw-------   1 <user> <user>    0 Aug 14 18:09 __db.bayes_toks.expire5448
-rw-------   1 <user> <user> 4.0K Aug  9 16:40 __db.bayes_toks.expire5849
-rw-------   1 <user> <user>    0 Aug  4 07:52 __db.bayes_toks.expire5981
-rw-------   1 <user> <user>    0 Aug  4 07:36 __db.bayes_toks.expire5982
-rw-------   1 <user> <user> 4.0K Aug 15 07:52 __db.bayes_toks.expire7186
-rw-------   1 <user> <user>    0 Aug  8 23:46 __db.bayes_toks.expire7784
-rw-------   1 <user> <user>    0 Aug  6 12:26 __db.bayes_toks.expire9659
-rw-------   1 <user> <user> 9.9M Aug 17 09:46 auto-whitelist
-rw-------   1 <user> <user>    0 Aug 17 10:35 bayes.lock
-rw-------   1 <user> <user>  20K Aug 17 10:41 bayes_journal
-rw-------   1 <user> <user> 2.5M Aug 17 10:35 bayes_seen
-rw-------   1 <user> <user>  20M Aug 17 10:35 bayes_toks
-rw-------   1 <user> <user> 4.3M Aug 12 12:39 bayes_toks.expire10001
-rw-------   1 <user> <user> 2.1M Aug 12 11:03 bayes_toks.expire10113
-rw-------   1 <user> <user> 2.1M Aug  8 21:22 bayes_toks.expire10183
-rw-------   1 <user> <user> 4.3M Aug 10 03:57 bayes_toks.expire11708
-rw-------   1 <user> <user> 520K Aug 10 04:39 bayes_toks.expire11710
-rw-------   1 <user> <user> 9.3M Aug 12 08:48 bayes_toks.expire12138
-rw-------   1 <user> <user>  72K Aug  8 10:05 bayes_toks.expire1361
-rw-------   1 <user> <user> 148K Jul 19  2006 bayes_toks.expire1367
-rw-------   1 <user> <user> 2.6M Sep 14  2006 bayes_toks.expire13765
-rw-------   1 <user> <user> 2.2M Sep 12  2006 bayes_toks.expire13825
-rw-------   1 <user> <user> 8.5M Aug 12 08:25 bayes_toks.expire14207
-rw-------   1 <user> <user>  12K Jul 29 18:22 bayes_toks.expire1430
-rw-------   1 <user> <user>  72K Aug 10 12:51 bayes_toks.expire14335
-rw-------   1 <user> <user> 4.2M Aug 13 22:00 bayes_toks.expire1438
-rw-------   1 <user> <user> 268K Aug 14 21:44 bayes_toks.expire1493
-rw-------   1 <user> <user> 4.2M Aug 13 14:09 bayes_toks.expire1502
-rw-------   1 <user> <user> 5.1M Aug 14  2006 bayes_toks.expire15528
-rw-------   1 <user> <user>  48K Jul 29 20:52 bayes_toks.expire1567
-rw-------   1 <user> <user> 9.6M Jan 26  2007 bayes_toks.expire15712
-rw-------   1 <user> <user> 4.3M Aug 11 02:38 bayes_toks.expire15781
-rw-------   1 <user> <user> 9.5M Aug 12 04:44 bayes_toks.expire16054
-rw-------   1 <user> <user> 9.0M Aug 13 06:45 bayes_toks.expire1607
-rw-------   1 <user> <user> 1.1M Aug 13 02:01 bayes_toks.expire1608
-rw-------   1 <user> <user> 2.4M Sep 12  2006 bayes_toks.expire16086
-rw-------   1 <user> <user>  72K Aug  7 18:38 bayes_toks.expire1610
-rw-------   1 <user> <user> 8.6M Aug 11 00:50 bayes_toks.expire1626
-rw-------   1 <user> <user> 4.2M Aug 12 20:24 bayes_toks.expire1646
-rw-------   1 <user> <user> 4.2M Aug 14 06:39 bayes_toks.expire1654
-rw-------   1 <user> <user> 2.1M Aug 12 00:12 bayes_toks.expire1666
..crop..
-rw-------   1 <user> <user> 9.2M Aug 14 03:14 bayes_toks.expire5705
-rw-------   1 <user> <user> 2.3M Sep 12  2006 bayes_toks.expire5786
-rw-------   1 <user> <user> 2.2M Sep 12  2006 bayes_toks.expire5787
-rw-------   1 <user> <user> 2.2M Aug  9 17:36 bayes_toks.expire5848
-rw-------   1 <user> <user> 136K Aug  7 16:04 bayes_toks.expire5916
-rw-------   1 <user> <user>  72K Jul 29 19:20 bayes_toks.expire6116
-rw-------   1 <user> <user> 1.1M Sep 12  2006 bayes_toks.expire6252
-rw-------   1 <user> <user> 4.3M Aug 11 16:25 bayes_toks.expire7599
-rw-------   1 <user> <user> 8.9M Aug  6 22:07 bayes_toks.expire7627
-rw-------   1 <user> <user> 520K Sep 13  2006 bayes_toks.expire7870
-rw-------   1 <user> <user> 520K Sep 13  2006 bayes_toks.expire7871
-rw-------   1 <user> <user> 4.3M Aug 11 14:38 bayes_toks.expire7930
-rw-------   1 <user> <user> 3.0M Sep 14  2006 bayes_toks.expire8115
-rw-------   1 <user> <user> 2.3M Sep 12  2006 bayes_toks.expire8517
-rw-------   1 <user> <user> 2.4M Sep  9  2006 bayes_toks.expire9277
-rw-------   1 <user> <user> 1.1M Aug  8 22:03 bayes_toks.expire9377
-rw-------   1 <user> <user> 4.3M Aug 11 11:10 bayes_toks.expire9380
-rw-------   1 <user> <user> 2.9M Aug  9  2006 bayes_toks.expire9467
-rw-------   1 <user> <user> 1.1M Aug 14  2006 bayes_toks.expire958
-rw-------   1 <user> <user> 2.1M Aug 14 15:20 bayes_toks.expire9709
-rw-------   1 <user> <user> 2.2M Aug  6 11:24 bayes_toks.expire9982
-rw-r--r--   1 <user> <user> 1.5K Jan 30  2006 user_prefs
As you can see, it ain't pretty...
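To see where the space actually goes, the listing can be summarised by file group. Below is a minimal Python sketch, assuming the only grouping that matters is the two families of `.expire<number>` files in the listing. The function names `usage_by_group` and `format_report` are made up for illustration and are not part of SpamAssassin.

```python
import os
from collections import defaultdict


def usage_by_group(directory):
    """Sum file sizes (in bytes) in `directory`, grouped by a rough name category."""
    totals = defaultdict(int)
    for entry in os.scandir(directory):
        # Skip sub-directories and symlinks; only regular files are counted.
        if not entry.is_file(follow_symlinks=False):
            continue
        name = entry.name
        if name.startswith("__db.bayes_toks.expire"):
            group = "__db.bayes_toks.expire*"
        elif name.startswith("bayes_toks.expire"):
            group = "bayes_toks.expire*"
        else:
            # Everything else (bayes_toks, bayes_seen, user_prefs, ...) is its own group.
            group = name
        totals[group] += entry.stat(follow_symlinks=False).st_size
    return dict(totals)


def format_report(totals):
    """Return one line per group, largest first, with sizes in MiB."""
    lines = []
    for group, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        lines.append(f"{size / (1024 * 1024):8.1f}M  {group}")
    return "\n".join(lines)
```

Calling `print(format_report(usage_by_group(path)))` with `path` set to the user's real `.spamassassin` directory should show whether the `bayes_toks.expire*` files, rather than the live `bayes_toks` database, account for most of the 291M.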

I had a quick read of this... http://wiki.apache.org/spamassassin/BayesFaq

I'm now overwhelmed with information on SA's Bayes filter, yet I still have no idea what these files are for, how this works, why they are needed, or how to deal with them.

How should I handle these files so they are not taking up so much space?

What is the actual purpose of these files?

If I remove the bayes_toks* files, how will this impact this user?
 

ryan-fah

Member
Dec 20, 2006
You should be OK to delete all of the files in that directory except for user_prefs.

I have had problems with spamd crashing or taking up a lot of resources in the past; after removing those files I saw no further problems.

I'm not sure what those files contain, but I suspect they log what has been rejected, accepted and any other possibly useful info for dealing with spam.

Edit: Just another thought on this, you could back up the directory yourself, then delete all but user_prefs and see how it goes. If there are any noticeable problems then just restore the files you backed up.

Those files should automagically recreate themselves.
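The back-up-then-delete approach above could be scripted roughly like this. It is only a sketch under a few assumptions: nothing (spamd, sa-learn) is writing to the directory while it runs, and the helper names `is_stale_expire_file` and `backup_and_clean` are hypothetical, not anything SpamAssassin ships. As far as I understand it, the `bayes_toks.expire<number>` and `__db.bayes_toks.expire<number>` files are temporary copies left behind by interrupted Bayes expiry runs, so `stale_only=True` removes just those and leaves the live `bayes_toks`, `bayes_seen` and `auto-whitelist` data alone. Deleting the live files as well throws away what the Bayes filter has learned so far, so it would need to relearn from new mail before it contributes to scoring again.

```python
import shutil
from pathlib import Path

# Files that are never deleted (the user's own settings).
KEEP = frozenset({"user_prefs"})


def is_stale_expire_file(name):
    """True for names that look like leftover expiry temp files (assumption)."""
    return name.startswith(("bayes_toks.expire", "__db.bayes_toks.expire"))


def backup_and_clean(sa_dir, backup_dir, stale_only=False, keep=KEEP):
    """Copy sa_dir to backup_dir, then delete files from sa_dir.

    With stale_only=True only the *.expire<number> leftovers are removed;
    otherwise every regular file except those named in `keep` is removed.
    Returns the list of removed file names.
    """
    sa_dir = Path(sa_dir)
    backup_dir = Path(backup_dir)
    # copytree raises FileExistsError if backup_dir already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(sa_dir, backup_dir)
    removed = []
    for path in sorted(sa_dir.iterdir()):
        if not path.is_file() or path.name in keep:
            continue
        if stale_only and not is_stale_expire_file(path.name):
            continue
        path.unlink()
        removed.append(path.name)
    return removed
```

Running it first with `stale_only=True` is the more cautious option: if that alone frees most of the space, the live Bayes data never needs to be touched. If something does go wrong, copying the files back from the backup directory restores the previous state.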
 