Bogofilter

Bogofilter is a mail filter that classifies e-mail as spam or ham (non-spam) by statistical analysis of the message's header and content (body). The program learns from the user's classifications and corrections. It was originally written by Eric S. Raymond after he read Paul Graham's article "A Plan for Spam". It is now maintained by David Relson, Matthias Andree and Greg Louis, together with a group of contributors.

The statistical technique used is known as Bayesian filtering. Bogofilter's primary algorithm uses the f(w) parameter and the Fisher inverse chi-square technique described by Gary Robinson.
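The calculation can be illustrated with a minimal Python sketch of the f(w) smoothing and Fisher inverse chi-square combination as they are usually presented in Gary Robinson's write-up of the method. This is not Bogofilter's C implementation: the function names are hypothetical, and the values of s and x are illustrative assumptions rather than Bogofilter's configured defaults.

```python
import math

def robinson_f(spam_hits, ham_hits, spam_total, ham_total, s=1.0, x=0.5):
    """Smoothed spam probability f(w) for one token.

    spam_total and ham_total (messages trained on) must be positive.
    s (weight of the prior) and x (score assumed for an unseen token)
    are illustrative values, not Bogofilter's configured defaults.
    """
    n = spam_hits + ham_hits
    if n == 0:
        return x  # no evidence yet: fall back to the prior
    spam_rate = spam_hits / spam_total
    ham_rate = ham_hits / ham_total
    p = spam_rate / (spam_rate + ham_rate)  # raw per-token estimate p(w)
    return (s * x + n * p) / (s + n)

def chi2q(x2, df):
    """Upper-tail probability of the chi-square distribution, even df only."""
    assert df > 0 and df % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(fw_values):
    """Combine f(w) values (each strictly between 0 and 1) into one score.

    Returns a value near 1 for spam-like input and near 0 for ham-like input.
    """
    if not fw_values:
        return 0.5  # nothing to judge
    df = 2 * len(fw_values)
    # Close to 1 when the tokens look spammy (every f(w) near 1).
    spam_side = chi2q(-2.0 * sum(math.log(f) for f in fw_values), df)
    # Close to 1 when the tokens look hammy (every f(w) near 0).
    ham_side = chi2q(-2.0 * sum(math.log(1.0 - f) for f in fw_values), df)
    return (1.0 + spam_side - ham_side) / 2.0
```

Summing logarithms instead of multiplying the probabilities directly avoids floating-point underflow on long messages. A score near 0.5 indicates that the evidence is weak or contradictory.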

Bogofilter may be run by an MDA or mail client to classify messages as they are delivered to recipient mailboxes, or by an MTA to classify messages as they are received from the sending SMTP server. Bogofilter examines tokens in the message body and header, and refers to wordlists stored by BerkeleyDB, SQLite or QDBM to calculate a probability score that a new message is spam. It processes plain text and HTML, and supports reading multi-part MIME messages, including base64, quoted-printable, and uuencoded text or HTML. Bogofilter ignores non-text attachments, such as images.
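The token extraction step can be illustrated with a short Python sketch built on the standard library's email package. It is not Bogofilter's lexer: the token pattern, the crude HTML tag stripping and the tokenize function are simplifying assumptions made for this example. It decodes the text parts of a MIME message, whatever their transfer encoding, and skips non-text attachments.

```python
import email
import re
from email import policy
from email.message import EmailMessage

# Assumed token shape: runs of three or more letters or digits.
TOKEN_RE = re.compile(r"[a-z0-9]{3,}")
TAG_RE = re.compile(r"<[^>]+>")  # crude HTML tag stripper, for illustration

def tokenize(raw_bytes):
    """Return lowercase tokens from the Subject header and all text parts."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    chunks = [str(msg.get("Subject", ""))]
    for part in msg.walk():
        # Multipart containers and non-text attachments are skipped here.
        if part.get_content_maintype() != "text":
            continue
        text = part.get_content()  # undoes base64 / quoted-printable
        if part.get_content_subtype() == "html":
            text = TAG_RE.sub(" ", text)
        chunks.append(text)
    return TOKEN_RE.findall(" ".join(chunks).lower())

# Usage: a two-part message with a text body and an image attachment.
sample = EmailMessage()
sample["Subject"] = "Cheap offer"
sample.set_content("Buy cheap pills now")
sample.add_attachment(b"\x89PNG\r\n", maintype="image", subtype="png",
                      filename="x.png")
tokens = tokenize(sample.as_bytes())
```

In a real filter, each token would then be looked up in the wordlist database to obtain its spam and ham counts.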

Bogofilter's statistical algorithms can be tuned by modifying coefficients and other settings in its configuration file, or by using the automated bogotune utility included with the software, which attempts to optimise those coefficients to maximise filtering efficiency for a particular corpus of spam and non-spam.
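The idea of tuning against a corpus can be sketched in Python for a single parameter, the score at or above which a message is treated as spam. This is a deliberately simplified illustration and not bogotune's actual procedure. The function name and the rule used here (choose the lowest cutoff that keeps false positives within a budget, then count the false negatives) are assumptions made for the example.

```python
def pick_spam_cutoff(ham_scores, spam_scores, max_false_positives=0):
    """Return (cutoff, false_positives, false_negatives).

    A message counts as spam when its score is >= cutoff. Candidates are
    tried in ascending order, so the first one within the false-positive
    budget also lets the fewest spam messages through.
    """
    # float("inf") is a last resort meaning "classify nothing as spam".
    candidates = sorted(set(ham_scores) | set(spam_scores)) + [float("inf")]
    for cutoff in candidates:
        fp = sum(1 for score in ham_scores if score >= cutoff)
        if fp <= max_false_positives:
            fn = sum(1 for score in spam_scores if score < cutoff)
            return cutoff, fp, fn

# Usage with made-up scores for a small labelled corpus.
ham = [0.01, 0.2, 0.6]
spam = [0.55, 0.9, 0.99]
strict = pick_spam_cutoff(ham, spam, max_false_positives=0)
lenient = pick_spam_cutoff(ham, spam, max_false_positives=1)
```

Allowing one false positive lowers the cutoff and catches more spam, which shows the trade-off that any tuning of this kind has to balance.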

Standard tests at TREC 2005 showed that Bogofilter compared well with its competitors SpamBayes, CRM114 and DSPAM. Other competitors include SpamProbe and QSF.

Bogofilter is written in C and released under the GNU GPL. It runs on Linux, FreeBSD, NetBSD, OpenBSD, Solaris, Mac OS X, HP-UX, AIX and other platforms.

This article, or an earlier revision of it, was edited from bogofilter's homepage.