SpamAssassin
Update August 2003

SpamAssassin (SA) is an open source (free) tool for helping to manage spam email. Not only is it a tool set in its own right (for use under Linux), it is also embedded into a number of other tools such as the excellent (and free) SAproxy, which is a Windows POP3 proxy.

This page is a record of the settings I personally use with SA, both on this web/email server and for my work email client (MS Outlook). It is a bit of a brain dump I'm afraid (as is too much of this site), so the information gets a bit chaotic towards the end. If I find that people other than just me are looking at the page, I will tidy it up.

Although this page may tend to put you off SA and SAproxy, it shouldn't! The basic setup for SAproxy is very easy and will more than suffice for most people. Sadly, I'm an inveterate tinkerer, hence the details here.

Please see the Anti Spam page for general details on dealing with spam. As well as the SpamAssassin.org web site, there is quite a good write-up of what SA is all about on the IBM developerWorks site.
Server SpamAssassin Settings

    # Need a score of 5 or more
    required_hits 5

    # Change the subject line on spam
    rewrite_subject 1
    subject_tag [SPAM _HITS_]

    # Limit email languages to English and German
    ok_languages en de

    # Limit email locales (charsets) to western only
    ok_locales en

    # Tweak the scores if required (Note the 2 decimal places, doesn't seem to work with one)
    # Reduce score on Outlook MUA Forged as it sometimes gets it wrong (normally 3.5)
    score FORGED_MUA_OUTLOOK 3.00
    # Increase score for attached executables as I don't generally get these
    score MICROSOFT_EXECUTABLE 5.00

    # Force mail from following domains to be accepted
    whitelist_from *serifsoftware.com *moneyextra.net *winxpnews.com *woodyswatch.com
    whitelist_from *thegadgetstore.com *neptune.svr2-speedyservers.com *dennisnet.co.uk
    whitelist_from *palmdigitalmedia.com *wrdc.com *wrdclogsys.com *googlealert.com
    whitelist_from *developershed.com *sony-europe.com *spywareinfo.com *freshmeat.net

Note the fair number of white-listed entries. I've found that SpamAssassin seems to give a larger number of false positives than the pure Bayesian tools. I believe this is partly because SpamAssassin is running on a web host, so I don't get the chance to update the Bayesian learning. It is also due to large numbers of positives from public blackhole lists. If this were for an organisation, I would probably increase the required hits setting to ensure that there was even less chance of false positives.

Client SAproxy Settings

In Outlook I have the POP3 server set to "localhost" and the main id set to "username:serverIP" as opposed to just "username". Note though that I have had to use Outlook's advanced settings to independently set the outgoing SMTP user id and password, as this client connects to a Microsoft Exchange server which requires a log-in to send email.
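To make the wildcard behaviour of the whitelist_from lines in the server settings above a little more concrete, here is a minimal Python sketch. It assumes the matching described in the SpamAssassin documentation (file-glob style patterns where only `*` and `?` are special, compared case-insensitively). The helper names `glob_to_regex` and `is_whitelisted` are made up for illustration and are not part of SpamAssassin itself.

```python
import re

# A few of the patterns from the server settings above.
WHITELIST = ["*serifsoftware.com", "*freshmeat.net", "*googlealert.com"]


def glob_to_regex(pattern):
    """Translate a whitelist_from style glob into a compiled regex.

    Assumption: only '*' (any run of characters) and '?' (any single
    character) are wildcards; everything else is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def is_whitelisted(sender, patterns=WHITELIST):
    """Return True if the whole sender address matches any glob."""
    return any(glob_to_regex(p).fullmatch(sender) for p in patterns)
```

One thing this makes obvious: because my patterns start with a bare `*`, an entry such as `*freshmeat.net` would also accept an address at "notfreshmeat.net". Writing `*@freshmeat.net` or `*.freshmeat.net` would be tighter.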
Whilst playing with these settings, I've noticed that SA is significantly more accurate than even the new Outlook 2003 junk filters, even on default settings and without the network and Bayesian learning tests switched on. With Outlook's junk filters I have been fighting a losing battle trying to keep up with spam domains, as these seem to change faster than my underwear! Now I don't need to do this and I am getting fewer false positives too. However, SA still misses a few at present, so I have left Outlook's own filters on while I continue to tweak the settings.

    # SpamAssassin user preferences file.
    ################################################################

    # How many hits before a mail is considered spam.
    # required_hits 5

    # Add indicative text to spam subject headings
    subject_tag ** POSSIBLE SPAM ** (_HITS_)

    # Allow mislabeled stuff
    whitelist_from *W2Knews.com

    # Force acknowledgement from some domains that SA missed
    blacklist_from *advizemark*.com *aoforu.com *.ew01.com *pogobuzz.com *ptofgld.com *pogstation.com *fundetective.com *amazdir.com

    # Tweak the scores if required
    # Note the 2 decimal places, doesn't seem to work with one

    # If invisible text is included then certainly spam or the mail is rubbish anyway!
    score HTML_FONT_INVISIBLE 2.00

    # Hmm, these are only ever spam for me so bump up the scores
    score IMPOTENCE 4.00
    score PENIS_ENLARGE 4.00
    score PENIS_ENLARGE2 4.00

    # No JS thanks!
    score HTML_JAVASCRIPT 1.00
    score JAVASCRIPT_URI 1.00
    score HTML_EVENT_UNSAFE 1.00
    score HTML_WIN_OPEN 1.00
    score HTML_WIN_BLUR 1.00
    score HTML_EVENT 1.00

    # New rules

    # kill stupid pogo* mails
    uri JK_POGO_URI /pgodir|ptogd\.com/i
    describe JK_POGO_URI Persistent spam domain (pogostation,etc)
    score JK_POGO_URI 20.00

    # kill stupid azod* mails
    uri JK_AZOD_URI /azod1\.com/i
    describe JK_AZOD_URI Persistent spam domain (azod1.com)
    score JK_AZOD_URI 20.00

    # kill mails with oin= in a url
    uri JK_OIN_URI /oin\=/i
    describe JK_OIN_URI oin= in url (typical spam url)
    score JK_OIN_URI 2.00

Note the black and white lists, which override the default rules. As you can see, I've only had one domain generating false positives. This is due largely to the format of their HTML mailings, and I could avoid the problem by switching to plain text emails if I wanted to. The mails from this domain were generating scores of less than 6.

I certainly would not auto-delete emails using these settings unless the score exceeded 15, and in fact I am using Outlook filters to flag scores in excess of 15 to see if I can auto-delete in the future. A very safe score for auto-deletes is 100 or above, as this requires an entry in a blacklist.

Here are the other settings:
Note that the next tab is for tweaking the rules. I have found that you really do need to edit the rules here rather than directly in the text configuration file, as editing the file seems to break SAproxy fairly easily. Also remember that you need to shut down and restart SAproxy to pick up new or changed rules.

General Comments

The default required hit level for SA is 5. Whilst this is fine for personal use, it is rather too aggressive if you are running a server for an organisation (it may generate some false positives). In that case it is better to keep the server at a higher level and allow users to put more aggressive filters in place if required.

SA by default will always add some X- headers to emails showing the hit number, so most email clients can filter on this:

    X-Spam-Status: No, hits=0.1 required=5.0 tests=AWL version=2.52-cvs
    X-Spam-Level:
    X-Spam-Checker-Version: SpamAssassin 2.52-cvs (1.174.2.7-2003-03-20-exp)

The above example is from an email that is not marked as spam; the following is marked:

    X-Spam-Flag: YES
    X-Spam-Status: Yes, hits=11.4 required=5.0
        tests=CALL_FREE,CLICK_BELOW,DATE_IN_FUTURE_03_06,EXCUSE_1,
        EXCUSE_19,EXCUSE_3,EXCUSE_7,HTML_80_90,HTML_FONT_BIG,
        HTML_FONT_COLOR_BLUE,HTML_IMAGE_ONLY_04,
        HTML_LINK_CLICK_HERE,HTML_MESSAGE,HTML_TAG_EXISTS_TBODY,
        HTML_WEB_BUGS,MIME_HEADER_CTYPE_ONLY,MIME_HTML_ONLY,
        NORMAL_HTTP_TO_IP,OFFER,OFFERS_ETC,ONLY_COST,RECEIVE_OFFER,
        SUBJ_FOR_ONLY
        version=2.52-cvs
    X-Spam-Level: ***********
    X-Spam-Checker-Version: SpamAssassin 2.52-cvs 1.174.2.7-2003-03-20-exp

The spam level header shows one asterisk for each whole point of the score (11 asterisks for the 11.4 above). You can use this with Outlook's filters, say to delete spam with a score > 100, or at least to flag it.

When an email is marked as spam, the actual mail is encapsulated into a MIME attachment and a report appears detailing the tests and scores. All of these default settings are configurable.
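If you want to filter on these headers from a script rather than from Outlook's rules, the values are easy to pull apart. The following is only a rough Python sketch based on the 2.52-style header layout shown above; the function names `parse_spam_status` and `spam_level` are my own invention for illustration, and other SA versions may lay the header out differently.

```python
import re


def parse_spam_status(value):
    """Parse the value part of an X-Spam-Status header.

    Assumes the layout shown above:
    'Yes|No, hits=N required=N tests=A,B,C version=...'
    """
    # Long headers are folded over several lines; collapse the whitespace.
    flat = " ".join(value.split())
    hits = re.search(r"\bhits=(-?\d+(?:\.\d+)?)", flat)
    required = re.search(r"\brequired=(-?\d+(?:\.\d+)?)", flat)
    tests = re.search(r"\btests=(.*?)(?:\s+version=|$)", flat)
    if hits is None or required is None:
        raise ValueError("not a recognisable X-Spam-Status value")
    names = [t.strip() for t in tests.group(1).split(",")] if tests else []
    return {
        "is_spam": flat.lower().startswith("yes"),
        "hits": float(hits.group(1)),
        "required": float(required.group(1)),
        "tests": [n for n in names if n],
    }


def spam_level(value):
    """Count the asterisks in an X-Spam-Level header value."""
    return value.count("*")


if __name__ == "__main__":
    status = parse_spam_status(
        "No, hits=0.1 required=5.0 tests=AWL version=2.52-cvs"
    )
    print(status["is_spam"], status["hits"], status["tests"])
```

A script could then, for example, flag anything where `hits` is above 15 and leave everything else to the normal subject-line tagging.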
SAproxy, being based on Perl (it contains compiled Perl code), is rather a memory hog. You may need to be aware of this if you are running on a machine with marginal memory.

If you look in the installation directory, you may realise that some files seem to be missing. Worry not: look wherever your user files are kept (e.g. c:\Documents and Settings\<user>\ on XP) and you will find a "spamassassin" folder. The configuration and log files are there.

One other thing that may not be obvious is that you can add your own rules! If you look through the full SpamAssassin documentation, you will see that you can create rules against headers, body text and URIs. SAproxy supports all of these, though you will need to be able to write Perl-compatible regular expressions. Check out the SAproxy forum on SourceForge for lots of help and information.

Non-local network tests require a DNS server to be available. Unfortunately SA only checks for a valid, reachable DNS server when the configuration is loaded. On a laptop this will never be the case if SAproxy is loaded on start-up. You can, however, disable the DNS check by including "dns_available yes" in the configuration settings (see above) and setting an environment variable "RES_NAMESERVERS" to the name of an accessible DNS server. This part of SAproxy seems to be very buggy under Windows - you've been warned.

If you want the Bayesian learning capabilities, there is a lot more work to do! There are threads on the forum if you need to know how to do this. Note that turning this on seems to bump the memory requirements up to 10-20MB (yes MB).

You need to reload SAproxy after making changes to the settings files.

An alternative to SAproxy is Pop3proxy, but this is much more complex to set up and requires Perl to be installed.

When playing with the settings, you will probably find that SAproxy is rather sensitive and often stops working.
You can sometimes find out what the problem is by looking at the log file, which is in the spamassassin folder in your user settings. But then again, often you can't and it's time to start again, so keep a copy of your rules and settings. If you make an error in the rules settings in particular, either only the rules up to that point are run or you cannot reliably get email at all.

You may find that Outlook seems to hang fairly regularly on certain emails. If this happens, try increasing the POP3 "Server Timeout" value (advanced settings for the POP3 account). I have mine set to 3 minutes and this seems to have stopped the issues (it doesn't take 3 minutes to download an email, by the way).

Final comment: SA is powerful, complex and ... addictive! Don't let EITHER spam OR anti-spam rule your life! And don't let this page put you off: the basics are easy and the default settings will give you most of what you need.
Page: Updated 2008-07-10 08:50:08, Author Julian Knight