Bogofilter is a mail filter that classifies e-mail as spam or ham (non-spam) through statistical analysis of a message's headers and body. The program learns from the user's classifications and corrections. It was originally written by Eric S. Raymond and is now maintained by David Relson, Matthias Andree and Greg Louis, together with a group of contributors.
The statistical technique used is known as Bayesian filtering. Its use for spam filtering was first described by researchers at Microsoft in the paper "A Bayesian Approach to Filtering Junk E-mail". Gary Robinson, in his weblog "Rants", suggested refinements that improve discrimination between spam and ham. Bogofilter's primary algorithm uses the f(w) parameter and the Fisher inverse chi-square technique that he describes.
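The two ingredients named above, Robinson's f(w) and the Fisher inverse chi-square combination, can be sketched in Python as follows. This is an illustrative reading of Robinson's published description, not Bogofilter's C implementation. The function names, the argument layout, and the values chosen for the prior strength s and the unknown-word probability x are assumptions made for the example and are not Bogofilter's defaults.

```python
import math


def robinson_f(spam_count, ham_count, n_spam_msgs, n_ham_msgs, s=1.0, x=0.5):
    """Robinson's f(w): a token's spam probability, smoothed toward a prior.

    s (prior strength) and x (probability assumed for an unseen token)
    are illustrative values, not Bogofilter's configured defaults.
    """
    n = spam_count + ham_count
    if n == 0:
        return x  # never-seen token: fall back to the prior
    spam_ratio = spam_count / max(n_spam_msgs, 1)
    ham_ratio = ham_count / max(n_ham_msgs, 1)
    p = spam_ratio / (spam_ratio + ham_ratio)  # raw estimate p(w)
    return (s * x + n * p) / (s + n)


def chi2q(x2, v):
    """Upper-tail probability of a chi-square variable with even v degrees of freedom."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_score(token_probs):
    """Combine per-token f(w) values into one score in [0, 1]; higher means spammier."""
    eps = 1e-12  # keep the logarithms finite
    probs = [min(max(p, eps), 1.0 - eps) for p in token_probs]
    v = 2 * len(probs)
    spam_ev = 1.0 - chi2q(-2.0 * sum(math.log(1.0 - p) for p in probs), v)
    ham_ev = 1.0 - chi2q(-2.0 * sum(math.log(p) for p in probs), v)
    return (spam_ev - ham_ev + 1.0) / 2.0


if __name__ == "__main__":
    print(fisher_score([0.99, 0.98, 0.97]))  # strongly spam-like tokens
    print(fisher_score([0.01, 0.02, 0.03]))  # strongly ham-like tokens
```

Because the spam and ham evidence are computed separately and then balanced, a message with strong indicators in both directions lands near 0.5 rather than being forced toward either class.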
Bogofilter is run by a mail delivery agent (MDA) script to classify an incoming message as spam or ham, using wordlists stored in a BerkeleyDB, SQLite3 or QDBM database. Bogofilter processes both plain text and HTML. It supports multipart MIME messages, decoding base64, quoted-printable and uuencoded text, and ignores attachments such as images.
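The idea of walking a multipart message, decoding its text parts, and skipping attachments can be illustrated with Python's standard email package. This is only a sketch of the general approach and does not reproduce Bogofilter's own parser; the function name extract_text is hypothetical.

```python
from email import message_from_bytes, policy


def extract_text(raw):
    """Return the decoded text/* parts of a raw message, skipping everything else."""
    msg = message_from_bytes(raw, policy=policy.default)
    texts = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # container part: its children are visited by walk()
        if part.get_content_maintype() != "text":
            continue  # images and other non-text attachments are ignored
        # get_content() undoes base64 / quoted-printable and the charset
        texts.append(part.get_content())
    return texts


if __name__ == "__main__":
    raw = (
        b"MIME-Version: 1.0\r\n"
        b"Content-Type: text/plain; charset=utf-8\r\n"
        b"Content-Transfer-Encoding: base64\r\n"
        b"\r\n"
        b"aGVsbG8gd29ybGQ=\r\n"
    )
    print(extract_text(raw))
```

The decoded strings would then be tokenized and looked up in the wordlists; that step is left out here.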
Standard tests at TREC 2005 showed that Bogofilter compares well to its competitors SpamBayes, CRM114 and DSPAM. Other competitors include SpamProbe and QSF.
Bogofilter is written in C, and runs on Linux, FreeBSD, NetBSD, OpenBSD, Solaris, Mac OS X, HP-UX, AIX and other platforms.
This article, or an earlier revision of it, was edited from bogofilter's homepage.