Email Anti-Spam Strategies and Tools

What Is Spam?

Spam is the common term for unsolicited email, especially unsolicited commercial email (UCE), which is the more formal name for it. Spam is not only unsolicited but generally wastes a great deal of time and money for those receiving it. Worse still, it is all too often obscene and/or fraudulent. This can easily cause serious personal problems: in 2003, a British man lost custody of his child and nearly ended up in prison after pornography was found on his PC, put there by trojan software that was quite likely delivered by email.

Techniques for Managing and Combating Spam

The best way to combat spam is to avoid it in the first place. For most people it is not too difficult to avoid most spam, though it is practically impossible to avoid it all.

There are, however, times when it cannot be avoided. For example, my work email address was published on a security web site where I am a contributor. Because of that, even with my company's server-based spam blocking, I get more spam than legitimate mail, and I cannot change my work email address.

Below I will outline strategies for avoiding spam and a strategy for managing it when you have no choice.

As always on this site, this information comes from my own experience and outlines what has worked for me; your mileage may, of course, vary.

Avoiding Spam

There are two places from which spammers harvest most of their addresses:

  • Web sites
    There are several ways your email address may end up on a web site.
    • Search engines - this usually happens when your address appears on another web site that then gets indexed
    • Archived newsgroups and mailing lists
    • Personal web sites
    • Someone else publishing your address (often trying to be helpful though also often not telling you that they have done it)
  • Newsgroups and mailing lists

So to avoid the majority of spam, you will need to avoid using your real email address in these places:

  • Web sites
    Make sure that you never include your actual email address on a web site, either as text or as a link. If you really must show it, make sure that it is in a format that humans can read but harvesting robot software cannot, e.g. f b l o g g s AT d o m a i n DOT c o m, or similar. Alternatively, you could embed the text in a bitmap.
    Better still, provide a form that people fill out to send you email, though this requires some form of scripting capability on the server. Note that even most "free" web space providers offer simple form-based email scripts that you can use for this purpose.
  • Newsgroups
    Here again you need to use an address that cannot be harvested but that humans can read should they need to send you a private reply. Of course, you can go one step further if you are sure that you never want to be replied to privately; in this case, set your email address to something like "Fred Bloggs <fred@[127.0.0.1]>" in your mail or news reading software. A final alternative is to use a throw-away address, preferably from a different domain from your main one (so if your main address is fbloggs@domain.com, use news001@domain.com or, better, fb1234@yahoo.com or similar).
  • Mailing Lists
    Although mailing lists are normally email based, they are sometimes archived on one or more web sites, and unless the mailing list software removes or hides email addresses, your address may suddenly appear on the web in a form that can be harvested. So make sure that you use throw-away addresses in mailing lists and confirm with the list owner that addresses are never shown in archives.
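
The "humans can read it, robots cannot" format suggested for web sites above can be produced mechanically. What follows is only a minimal sketch in Python: the function name obfuscate is made up for this illustration, and it assumes the harvesting software is simply looking for text of the form name@domain, so a more determined harvester may well see through it.

```python
def obfuscate(address: str) -> str:
    """Return a spaced-out form of an email address, e.g.
    'f b l o g g s AT d o m a i n DOT c o m'.

    Hypothetical helper for illustration only; it assumes harvesters
    are matching plain 'name@domain' patterns.
    """
    local, _, domain = address.partition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {address!r}")

    def spaced(text: str) -> str:
        # Put a space between every character: "abc" -> "a b c"
        return " ".join(text)

    # Space out each part of the domain and join the parts with the word DOT
    domain_part = " DOT ".join(spaced(label) for label in domain.split("."))
    return f"{spaced(local)} AT {domain_part}"


print(obfuscate("fbloggs@domain.com"))
```

The result can be pasted into a page as ordinary text; do not wrap it in a mailto: link, as that would put the real address back into the page source.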

The other way you are likely to start receiving spam is by registering on web sites and being caught out by the small print. Whenever you register on a site, read the terms and conditions carefully and select or deselect the check boxes for receiving mail (especially from "selected" third parties!) as appropriate. Even better, ask yourself why the owners of the site want so much information about you and whether it is really required. If it isn't, enter dummy entries, e.g. "a.b@c.com" for email, and so on. Don't let people collect unnecessary information about you.

Dealing With Spam

OK, so you can't (or don't want to) change your old address but you are getting lots of spam. What can you do about it?

Well, there are lots of software tools that claim to deal with spam, and you will almost certainly need to try some of them yourself to see what works for you. However, before you go out and spend money, start by checking what your email package can already do, and then look for some free software to augment that if needed.

So, for example, if you are using Microsoft Outlook, it has some limited junk mail filtering (somewhat improved in Outlook 2003), but this is far from perfect. However, if you connect to your mail server using POP3, you can easily augment it with one of the free tools listed on this page.

SAproxy is a Windows program; Linux, BSD and other UNIX workstation users can use SpamAssassin natively. Using SpamAssassin means that you do not need to program lots of filters yourself, but you should be aware that it will occasionally produce a "false positive", where a legitimate email is tagged as spam. Because of this, you should not automatically delete tagged mail but should scan through it occasionally before deletion. Be very wary indeed of any tool that claims to deal with spam without giving false positives; such a claim is impossible to prove and should not be relied on.
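
To illustrate the "tag, then review" approach, here is a minimal sketch in Python that sorts messages into an inbox pile and a review pile instead of deleting anything. It assumes the tagging shows up as an "X-Spam-Flag: YES" header, which is what SpamAssassin normally adds to mail it considers spam; your own setup may tag mail differently (for example by rewriting the subject line). The function names and sample messages are invented for the example.

```python
from email import message_from_string


def is_tagged_spam(raw_message: str) -> bool:
    """True if the message carries an 'X-Spam-Flag: YES' header.

    Assumption: the filter marks spam with this header, as
    SpamAssassin does by default.
    """
    msg = message_from_string(raw_message)
    return msg.get("X-Spam-Flag", "").strip().upper() == "YES"


def sort_mail(raw_messages):
    """Split messages into (inbox, review). Nothing is deleted, so
    false positives can still be rescued from the review pile."""
    inbox, review = [], []
    for raw in raw_messages:
        (review if is_tagged_spam(raw) else inbox).append(raw)
    return inbox, review


# Two made-up messages: one tagged by the filter, one not
sample_spam = "Subject: Cheap pills\nX-Spam-Flag: YES\n\nBuy now\n"
sample_ham = "Subject: Meeting\n\nSee you at 10\n"

inbox, review = sort_mail([sample_spam, sample_ham])
print(len(inbox), "kept,", len(review), "held for review")
```

In a real mail client the same effect is usually achieved with a single rule that moves any mail with the spam header into a separate folder.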

See my SpamAssassin/SAproxy page for the two sets of settings I use with SA (one for my own email server and one for my work email client).

POP3 Filter Tools

There are a number of tools that will work with your current email client, at least if you are using POP3 to collect your mail. I've listed a few of the MS Windows ones here though I have only tried two: SAproxy and K9.

Of these, I currently use K9 as it seems to need minimal configuration and management and has a very small overhead.

  • K9
    This is the one that I currently use. I switched from SAproxy after I found that I was playing catch-up with the rules all of the time. K9 is also small (99kB) and fast compared to some of the others, though it is perhaps not quite as configurable. After a few weeks of use I am getting around one mail every few days that K9 does not correctly identify as spam. It is similar to POPfile (and based on it) in using Bayesian analysis, but unlike the other tools it is written in C and is therefore smaller and quicker than they are (they are generally based on Perl or Python scripts). It is a Windows-only solution though. Use the link above to see more detail.
  • POPfile
    POPfile is similar to K9 (it uses "Naive Bayesian" analysis and so needs to be trained) but is based on cross-platform Perl and so is very large at 2.5MB. It also has black and white lists.
    There is a dedicated Windows version and a Perl version that works in both Windows and Linux. The latter is the one to use if you are on Linux or need to maintain both Windows and Linux workstations with similar settings. There is active discussion of POPfile and K9 in the grc.spam newsgroup on news.grc.com.
    Note that POPfile now has a third-party (but still free) utility that works natively with MS Outlook. This even works when getting emails from MS Exchange.
    POPfile can also do general classification, so you don't have to use it only for anti-spam.
  • SAproxy
    Update, July 2004: The freeware version of this seems to have gone; it has been replaced by a paid-for "pro" version.
    This is based on SpamAssassin (for Linux), which is probably the most popular open source server-based spam handling software. It is compiled Perl (a cross-platform scripting language) and so, at 684kB, is much larger than K9; it is also much slower. It is, however, highly configurable: you can specify your own rules and change the weighting of any of the rules. Unfortunately, I have never been able to get the network-based rules (which look up emails in public black-lists) to work reliably; in addition, getting the probability-based rules to work is also difficult. This means that SAproxy is far less accurate than K9 unless you constantly tweak the rules. SAproxy also has a tendency to hang on large emails, and until these issues are sorted out, I cannot really recommend it.
  • SpamBayes
    I don't know too much about this one except that it is based on cross-platform Python (another popular scripting language) and so again is very large at 3MB.
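
Several of the tools above rely on "Naive Bayesian" analysis, which is why they need training before they become accurate. The Python sketch below shows the general idea only and is not how K9, POPfile or SpamBayes are actually implemented: the class name, the whitespace tokenising and the add-one smoothing are all assumptions made to keep the example short.

```python
import math
from collections import Counter


class NaiveBayes:
    """Tiny two-class, word-based Naive Bayes classifier (illustrative only)."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.message_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokens(text):
        # Very crude tokeniser: lower-case and split on whitespace
        return text.lower().split()

    def train(self, text, label):
        """Record the words of one message the user has labelled."""
        self.word_counts[label].update(self.tokens(text))
        self.message_counts[label] += 1

    def score(self, text, label):
        """Log of P(label) times the product of P(word | label).

        Add-one smoothing stops an unseen word (or an untrained
        class) from forcing the probability to zero.
        """
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_words = sum(self.word_counts[label].values())
        total_messages = sum(self.message_counts.values())
        result = math.log((self.message_counts[label] + 1) / (total_messages + 2))
        for word in self.tokens(text):
            count = self.word_counts[label][word] + 1
            result += math.log(count / (total_words + len(vocabulary) + 1))
        return result

    def classify(self, text):
        """Return whichever label has the higher score."""
        return max(self.LABELS, key=lambda label: self.score(text, label))


nb = NaiveBayes()
nb.train("cheap pills buy now", "spam")
nb.train("win money now", "spam")
nb.train("meeting agenda for monday", "ham")
nb.train("lunch on monday", "ham")

print(nb.classify("buy cheap pills"))
print(nb.classify("monday meeting"))
```

With only four training messages the classifier is obviously fragile; the real tools improve as you correct their mistakes over days or weeks, which matches the training period described above.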

© Copyright Julian Knight, July 2008 All rights reserved.
Page: Updated 2008-07-10 08:50:08, Author Julian Knight