SpamAssassin



Update August 2003
Having used SAproxy for a while, I have stopped using it and now use K9 instead. Although K9 is not quite as configurable, its Bayesian analysis and learning work automatically and reliably. I have never managed to get Bayesian learning working properly with SAproxy (it is fine with SpamAssassin itself), nor have I managed to get the Internet-based blacklists to work without hanging the application. Without these two facilities, I constantly have to tweak the SAproxy rules, and that is a game I can never win; it is fun only for a very short while! See my page on K9 for more details about this tool.
Note, though, that I still use SpamAssassin on my web server, and in that context it works very well indeed.

SpamAssassin is an open source (free) tool for managing spam email. Not only is it a toolset in its own right (for use under Linux), it is also embedded in a number of other tools, such as the excellent (and free) SAproxy, a Windows POP3 proxy.

This page is a record of the settings I personally use with SpamAssassin (SA), both on this web/email server and for my work email client (MS Outlook). It is a bit of a brain dump, I'm afraid (as is too much of this site), so the information gets a bit chaotic towards the end. If I find that people other than me are looking at the page, I will tidy it up.

Although this page may tend to put you off SA and SAproxy, it shouldn't! The basic setup for SAproxy is very easy and will more than suffice for most people. Sadly, I'm an inveterate tinkerer, hence the details here.

Please see the Anti Spam page for general details on dealing with spam.

As well as the SpamAssassin.org web site, there is quite a good write-up of what SA is all about on the IBM developerWorks site.

Server SpamAssassin Settings

# Need a score of 5 or more
required_hits 5
# Change the subject line on spam
rewrite_subject 1
subject_tag [SPAM _HITS_]
# Limit email languages to English and German
ok_languages en de
# Limit email locales (charsets) to western only
ok_locales en
# Tweak the scores if required (note the 2 decimal places; doesn't seem to work with one)
# Reduce score on Outlook MUA Forged as it sometimes gets it wrong (normally 3.5)
score FORGED_MUA_OUTLOOK 3.00
# Increase score for attached executables as I don't generally get these
score MICROSOFT_EXECUTABLE 5.00
# Force mail from the following domains to be accepted
whitelist_from *serifsoftware.com *moneyextra.net *winxpnews.com *woodyswatch.com
whitelist_from *thegadgetstore.com *neptune.svr2-speedyservers.com *dennisnet.co.uk
whitelist_from *palmdigitalmedia.com *wrdc.com *wrdclogsys.com *googlealert.com
whitelist_from *developershed.com *sony-europe.com *spywareinfo.com *freshmeat.net

Note the fair number of whitelisted entries. I've found that SpamAssassin seems to give more false positives than the purely Bayesian tools. I believe this is partly because SpamAssassin is running on a web host, so I don't get the chance to update the Bayesian learning. It is also due to the large number of hits from public blackhole lists.
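The whitelist_from entries are glob patterns rather than regular expressions: "*" matches any run of characters and case is ignored. The Python sketch below is a rough illustration of that matching using the standard fnmatch module. It is not SpamAssassin's own code, and the helper name matches_list and the sample addresses are made up. One thing it makes visible is that a leading "*" with no "@" also matches look-alike domains.

```python
from fnmatch import fnmatchcase
from typing import Iterable

# A few of the server's whitelist_from patterns, for illustration only.
WHITELIST = ["*serifsoftware.com", "*freshmeat.net", "*googlealert.com"]


def matches_list(address: str, patterns: Iterable[str]) -> bool:
    """Return True if the sender address matches any glob pattern.

    Approximates whitelist_from/blacklist_from matching: '*' matches any run
    of characters and case is ignored. fnmatch also understands '?' and
    [...] classes, so this is only an approximation of SpamAssassin's globs.
    """
    addr = address.lower()
    return any(fnmatchcase(addr, pattern.lower()) for pattern in patterns)


if __name__ == "__main__":
    for sender in ("News@SerifSoftware.com",
                   "someone@notserifsoftware.com",
                   "spammer@example.com"):
        print(f"{sender}: {matches_list(sender, WHITELIST)}")
```

If the look-alike match matters, a pattern such as *@serifsoftware.com is tighter, although it no longer covers subdomains.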

If this were for an organisation, I would probably increase the required_hits setting to reduce the chance of false positives even further.

Client SAproxy Settings

In Outlook, I have the POP3 server set to "localhost" and the user name set to "username:serverIP" rather than just "username". Note, though, that I have had to use Outlook's advanced settings to set the outgoing SMTP user ID and password separately, because this client connects to a Microsoft Exchange server that requires a log-in to send email.

Whilst playing with these settings, I've noticed that SA is significantly more accurate than even the new Outlook 2003 junk filters. This holds even on default settings and without the network and Bayesian learning tests switched on. With Outlook's junk filters, I had been fighting a losing battle trying to keep up with spam domains, as these seem to change faster than my underwear! Now I don't need to do this, and I am getting fewer false positives too. However, SA still misses a few at present, so I have left Outlook's own filters on while I continue to tweak the settings.

# SpamAssassin user preferences file.
################################################################
# How many hits before a mail is considered spam.
# required_hits 5
# Add indicative text to spam subject headings
subject_tag ** POSSIBLE SPAM ** (_HITS_)
# Allow mislabeled stuff
whitelist_from *W2Knews.com
# Force mail from some domains that SA missed to be recognised as spam
blacklist_from *advizemark*.com *aoforu.com *.ew01.com *pogobuzz.com *ptofgld.com *pogstation.com *fundetective.com *amazdir.com
# Tweak the scores if required
# Note the 2 decimal places; doesn't seem to work with one
# If invisible text is included then it is certainly spam, or the mail is rubbish anyway!
score HTML_FONT_INVISIBLE 2.00
# Hmm, these are only ever spam for me so bump up the scores
score IMPOTENCE 4.00
score PENIS_ENLARGE 4.00
score PENIS_ENLARGE2 4.00
# No JS thanks!
score HTML_JAVASCRIPT 1.00
score JAVASCRIPT_URI 1.00
score HTML_EVENT_UNSAFE 1.00
score HTML_WIN_OPEN 1.00
score HTML_WIN_BLUR 1.00
score HTML_EVENT 1.00
# New rules
# kill stupid pogo* mails
uri JK_POGO_URI /pgodir|ptogd\.com/i
describe JK_POGO_URI Persistent spam domain (pogostation, etc.)
score JK_POGO_URI 20.00
# kill stupid azod* mails
uri JK_AZOD_URI /azod1\.com/i
describe JK_AZOD_URI Persistent spam domain (azod1.com)
score JK_AZOD_URI 20.00
# kill mails with oin= in a url
uri JK_OIN_URI /oin\=/i
describe JK_OIN_URI oin= in url (typical spam url)
score JK_OIN_URI 2.00

Note the black and white lists, which override the default rules. As you can see, only one domain has generated false positives. This is largely due to the format of its HTML mailings, and I could avoid the problem by switching to plain-text emails if I wanted to. The mails from this domain were generating scores of less than 6.
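SpamAssassin's uri rules are Perl regular expressions applied to the URLs found in a message. The three custom JK_* rules in the preferences file are simple enough that Python's re module treats them the same way. That makes it easy to try them against sample URLs before adding them to SAproxy. This is a sketch only: the sample URLs and the function name rules_hit are made up, and real SpamAssassin counts a rule once per message rather than once per URL.

```python
import re

# The custom uri rules from the user preferences file, as Python patterns.
# The Perl /i modifier becomes re.IGNORECASE; the scores are copied across.
URI_RULES = {
    "JK_POGO_URI": (re.compile(r"pgodir|ptogd\.com", re.IGNORECASE), 20.0),
    "JK_AZOD_URI": (re.compile(r"azod1\.com", re.IGNORECASE), 20.0),
    "JK_OIN_URI": (re.compile(r"oin\=", re.IGNORECASE), 2.0),
}


def rules_hit(uri: str) -> dict:
    """Return the custom rules a single URI triggers, mapped to their scores."""
    return {name: score
            for name, (pattern, score) in URI_RULES.items()
            if pattern.search(uri)}


if __name__ == "__main__":
    for uri in ("http://www.PTOGD.com/offer",
                "http://example.com/signup?join=1",
                "http://example.com/news"):
        print(uri, rules_hit(uri))
```

The second sample shows that oin= also fires inside join=. That is tolerable only because JK_OIN_URI carries a low score of 2.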

I certainly would not auto-delete emails using these settings unless the score exceeded 15. In fact, I am using Outlook filters to flag scores above 15 to see whether I can safely auto-delete them in the future. A very safe threshold for auto-deletion is 100 or above, as reaching it requires an entry in a blacklist (a blacklist_from match adds 100 points).

The other settings are made on the tabs of SAproxy's own settings dialog.


Note that I am not yet using the network checks (valid DNS, etc.) or the Bayesian learning checks. The former seem particularly prone to breaking the flow of emails, and the latter seem to need a lot more setup. This doesn't seem to be much of an issue, though, as SA is doing very well indeed without them. Hopefully this will improve over time.


In the language settings, I allow only English and German rather than the default of every language.

The next tab is for tweaking the rules. I have found that you really do need to edit the rules there rather than directly in the text configuration file; editing the file directly seems to break SAproxy fairly easily.

Also remember that you need to shut down and re-start SAproxy to pick up new/changed rules.

General Comments

The default required hit level for SA is 5. Whilst this is fine for personal use, it is rather too aggressive if you are running a server for an organisation (it may generate some false positives). In that case it is better to keep the server at a higher level and allow users to put more aggressive filters in place if required.

By default, SA always adds some X-headers to emails showing the score, so most email clients can filter on them:

X-Spam-Status: No, hits=0.1 required=5.0 tests=AWL version=2.52-cvs
X-Spam-Level:
X-Spam-Checker-Version: SpamAssassin 2.52-cvs (1.174.2.7-2003-03-20-exp)

The example above is from an email that was not marked as spam; the following one was:

X-Spam-Flag: YES
X-Spam-Status: Yes, hits=11.4 required=5.0 tests=CALL_FREE,CLICK_BELOW,DATE_IN_FUTURE_03_06,EXCUSE_1,
	EXCUSE_19,EXCUSE_3,EXCUSE_7,HTML_80_90,HTML_FONT_BIG,
	HTML_FONT_COLOR_BLUE,HTML_IMAGE_ONLY_04,
	HTML_LINK_CLICK_HERE,HTML_MESSAGE,HTML_TAG_EXISTS_TBODY,
	HTML_WEB_BUGS,MIME_HEADER_CTYPE_ONLY,MIME_HTML_ONLY,
	NORMAL_HTTP_TO_IP,OFFER,OFFERS_ETC,ONLY_COST,RECEIVE_OFFER,
	SUBJ_FOR_ONLY version=2.52-cvs
X-Spam-Level: ***********
X-Spam-Checker-Version: SpamAssassin 2.52-cvs 1.174.2.7-2003-03-20-exp
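Because these headers are plain text, they are also easy to process outside a mail client. The Python sketch below pulls the flag, score, threshold and test names out of an X-Spam-Status value in the 2.5x layout shown above. The function and type names are made up for illustration. Later SpamAssassin releases changed the layout somewhat (for example, score= instead of hits=), so treat this as a starting point rather than a complete parser.

```python
import re
from typing import List, NamedTuple


class SpamStatus(NamedTuple):
    is_spam: bool
    hits: float
    required: float
    tests: List[str]


def parse_spam_status(value: str) -> SpamStatus:
    """Parse an X-Spam-Status value such as
    "Yes, hits=11.4 required=5.0 tests=A,B,C version=2.52-cvs".

    Assumes the layout shown above; "score=" is accepted as well as "hits=".
    """
    # Unfold continuation lines (newline followed by whitespace), as RFC 5322 does.
    text = re.sub(r"\r?\n[ \t]+", " ", value).strip()
    head = re.match(
        r"(Yes|No),\s*(?:hits|score)=(-?[\d.]+)\s+required=(-?[\d.]+)", text)
    if head is None:
        raise ValueError(f"unrecognised X-Spam-Status: {value!r}")
    # The test list runs from "tests=" up to the next "key=" field or the end.
    tests_match = re.search(r"tests=(.*?)(?:\s+\w+=|$)", text)
    tests = []
    if tests_match:
        tests = [t.strip() for t in tests_match.group(1).split(",") if t.strip()]
    return SpamStatus(
        is_spam=head.group(1) == "Yes",
        hits=float(head.group(2)),
        required=float(head.group(3)),
        tests=tests,
    )


if __name__ == "__main__":
    print(parse_spam_status("No, hits=0.1 required=5.0 tests=AWL version=2.52-cvs"))
```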

The X-Spam-Level header shows one asterisk for each whole point of the score (11 asterisks for the score of 11.4 above). You can use this with Outlook's filters, say, to delete spam with a score of 100 or more, or at least to flag it.
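Simple filters such as Outlook's rules can only test whether a header contains some text. A run of N asterisks appears in X-Spam-Level exactly when the score is at least N, and the short Python sketch below spells out that reasoning. The function names are illustrative. The sketch assumes one asterisk per whole point with no upper limit. Whether very high scores are capped depends on the SpamAssassin version, so it is worth checking the header of a real blacklisted message before relying on a 100-asterisk rule.

```python
def spam_level(score: float) -> str:
    """Build an X-Spam-Level value: one '*' per whole point of score.

    int() truncates, so 11.4 gives 11 asterisks as in the header above;
    negative scores give an empty value. No upper cap is applied here.
    """
    return "*" * max(0, int(score))


def level_at_least(level_header: str, threshold: int) -> bool:
    """What a 'header contains' filter effectively tests: is there a run of
    `threshold` asterisks anywhere in the X-Spam-Level value?"""
    return "*" * threshold in level_header


if __name__ == "__main__":
    for score in (0.1, 11.4, 14.9, 15.0):
        print(score, level_at_least(spam_level(score), 15))
```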

When an email is marked as spam, the original mail is encapsulated in a MIME attachment, and a report detailing the tests and scores appears in its place. All of these default settings are configurable.


A very simple filter in Outlook 2003 (earlier versions are broadly similar) just moves any email with "*** SPAM ***" in its headers to the Junk E-mail folder. A second filter can set a flag if the hit level is 15 or more (this could, of course, be set to delete instead).
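The two Outlook filters can be roughly emulated in Python to check which messages they would catch. The function name and action labels below are made up. The default tag is the "*** SPAM ***" text the first filter looks for. In practice the tag has to match whatever subject_tag you actually configured; the client preferences above, for instance, use "** POSSIBLE SPAM **". This is a sketch of the logic only, not how Outlook itself is implemented.

```python
from email import message_from_string
from typing import List


def outlook_style_actions(raw_message: str,
                          tag: str = "*** SPAM ***",
                          flag_stars: int = 15) -> List[str]:
    """Roughly emulate the two Outlook filters described above.

    Filter 1: move the mail to Junk E-mail if any header contains `tag`.
    Filter 2: flag it if X-Spam-Level holds at least `flag_stars` asterisks.
    Both filters can apply to the same message.
    """
    msg = message_from_string(raw_message)
    actions = []
    # Search the whole header block, one "Name: value" line per header.
    header_text = "\n".join(f"{name}: {value}" for name, value in msg.items())
    if tag in header_text:
        actions.append("move to Junk E-mail")
    level = msg.get("X-Spam-Level", "")
    if "*" * flag_stars in level:
        actions.append("flag")
    return actions
```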

SAproxy, being based on Perl (it contains compiled Perl code), is rather a memory hog; be aware of this if you are running it on a machine with marginal memory.

If you look in the installation directory, you may notice that some files seem to be missing. Worry not: look wherever your user files are kept (e.g. C:\Documents and Settings\<user>\ on XP) and you will find a "spamassassin" folder containing the configuration and log files.

One other thing that may not be obvious is that you can add your own rules! If you look through the full SpamAssassin documentation, you will see that you can create rules against headers, body text and URIs. SAproxy supports all of these, though you will need to be able to write Perl-compatible regular expressions.

Check out the SAproxy forum on SourceForge for lots of help and information.

Non-local network tests require a DNS server to be available. Unfortunately, SA only checks for a valid, reachable DNS server when the configuration is loaded, and on a laptop this will never be the case if SAproxy is loaded at start-up. You can, however, disable the DNS check by including "dns_available yes" in the configuration settings and setting an environment variable "RES_NAMESERVERS" to the name of an accessible DNS server. This part of SAproxy seems to be very buggy under Windows; you have been warned.

If you want the Bayesian learning capabilities, there is a lot more work to do! There are threads on the forum if you need to know how to do this. Note that turning this on seems to bump the memory requirements up to 10-20MB (yes MB).

You need to reload SAproxy after making changes to the settings files.

An alternative to SAproxy is Pop3proxy, but this is much more complex to set up and requires Perl to be installed.

When playing with the settings, you will probably find that SAproxy is rather sensitive and often stops working. You can sometimes find out what the problem is by looking at the log file in the spamassassin folder in your user settings. Often, though, you can't, and it's time to start again, so keep a copy of your rules and settings. An error in the rules settings is particularly bad: either only the rules up to that point are run, or you cannot reliably get email at all.

You may find that Outlook seems to hang fairly regularly on certain emails. If this happens, try increasing the POP3 "Server Timeout" value (in the advanced settings for the POP3 account). I have mine set to 3 minutes, and this seems to have stopped the problem (it doesn't actually take 3 minutes to download an email, by the way).

Final comment: SA is powerful, complex and ... addictive! Don't let EITHER spam OR anti-spam rule your life! And don't let this page put you off; the basics are easy, and the default settings will give you most of what you need.

© Copyright Julian Knight, July 2008 All rights reserved.
Page: Updated 2008-07-10 08:50:08, Author Julian Knight