DSPAM
From Wikipedia, the free encyclopedia
DSPAM is a statistical spam filter for mail transfer agents (MTAs), written by Jonathan A. Zdziarski.
It is a scalable, open-source, content-based spam filter designed for multi-user enterprise systems. DSPAM is MTA-independent and can integrate with many different types of Unix-based email systems. As an adaptive filter, DSPAM learns from and adapts to each user's email. Instead of working from a list of "rules" to identify spam, DSPAM's probabilistic engine examines the content of each message and learns what type of content the user considers desirable.
libdspam is the core engine used in DSPAM and can be linked into third-party applications, in accordance with the GNU General Public License, for drop-in spam filtering. libdspam contains the core routines for email (or document) classification and, optionally, storage backend support.
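The idea of a content-based, adaptive filter can be illustrated with a minimal sketch. The code below is a toy naive Bayes classifier written for illustration only; it is not DSPAM's actual engine, and the class name `AdaptiveFilter`, its methods, and the tokenizer are all hypothetical.

```python
import math
import re
from collections import Counter


class AdaptiveFilter:
    """Toy per-user statistical filter (hypothetical; not DSPAM's engine)."""

    def __init__(self):
        # Number of training messages of each class that contained a token.
        self.tokens = {"spam": Counter(), "ham": Counter()}
        # Number of training messages seen for each class.
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumption: a token is a lowercase run of letters, digits, $, ' or -.
        return set(re.findall(r"[a-z0-9$'-]+", text.lower()))

    def train(self, text, label):
        """Learn from one message the user has labelled 'spam' or 'ham'."""
        self.messages[label] += 1
        self.tokens[label].update(self.tokenize(text))

    def spam_probability(self, text):
        """Combine per-token evidence into a single spam probability."""
        n_spam, n_ham = self.messages["spam"], self.messages["ham"]
        # Start from the smoothed prior odds, then add each token's log-odds.
        log_odds = math.log((n_spam + 1) / (n_ham + 1))
        for tok in self.tokenize(text):
            p_spam = (self.tokens["spam"][tok] + 1) / (n_spam + 2)
            p_ham = (self.tokens["ham"][tok] + 1) / (n_ham + 2)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so that exp() cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


user_filter = AdaptiveFilter()
user_filter.train("buy cheap pills now", "spam")
user_filter.train("cheap pills online", "spam")
user_filter.train("meeting agenda for monday", "ham")
user_filter.train("lunch on monday?", "ham")
```

After this training, `user_filter.spam_probability("cheap pills offer")` comes out well above 0.5 and `user_filter.spam_probability("monday meeting")` well below it. No rule was written by hand; the verdicts follow entirely from the messages this one user labelled.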
Accuracy. DSPAM author Jonathan Zdziarski claims that typical users of DSPAM have reported between 99.5% and 99.95% accuracy, although this varies from user to user. The highest claimed level of accuracy is 99.991%, followed by 99.987%. Accuracy refers to the percentage of messages correctly classified. In the independent spam filter test performed at TREC 2005, the leading DSPAM filter had the following misclassification rates: spam misclassification rate 1.88%, ham misclassification rate 0.58%. These results were significantly worse than those of the leading filters in the test and far below the stated levels of accuracy.
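To compare the TREC 2005 figures with the claimed accuracy, the two misclassification rates can be combined into a single accuracy value. The sketch below assumes a spam fraction for the mail stream, which the figures above do not state; the function name `overall_accuracy` and the parameter `spam_fraction` are illustrative.

```python
def overall_accuracy(spam_miss_rate, ham_miss_rate, spam_fraction):
    """Fraction of all messages classified correctly.

    spam_fraction is an assumed share of spam in the mail stream;
    it is not given by the TREC figures quoted in the text.
    """
    error = spam_fraction * spam_miss_rate + (1 - spam_fraction) * ham_miss_rate
    return 1 - error


SPAM_MISS = 0.0188  # 1.88% of spam misclassified
HAM_MISS = 0.0058   # 0.58% of ham misclassified

# Overall accuracy for a few assumed spam fractions.
accuracies = {f: overall_accuracy(SPAM_MISS, HAM_MISS, f) for f in (0.0, 0.5, 1.0)}
```

Whatever the mix, the result lies between 98.12% (all spam) and 99.42% (all ham), so it stays below the 99.5% lower bound of the claimed range. With an assumed even split it is 98.77%.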
Training is necessary for any adaptive filter, so DSPAM must be trained to learn what a user does and does not consider spam. Alternatively, a single training database can be established for a large group so that individual users need not participate. DSPAM can also accommodate a hybrid of the two, called "merged groups", which uses a central community database for initial decision making and then stores each user's differences against it as that user's personal mail is learned.
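The merged-group idea can be sketched as a shared table of token counts plus a small per-user table of differences. This is an illustrative data structure only, not DSPAM's storage format; the class `MergedGroupStore` and its methods are hypothetical names.

```python
from collections import Counter, defaultdict


class MergedGroupStore:
    """Shared community counts plus per-user diffs (hypothetical sketch)."""

    def __init__(self, shared_counts):
        # Community database: (token, label) -> count. Never modified here.
        self.shared = Counter(shared_counts)
        # Per-user differences against the community database.
        self.user_diffs = defaultdict(Counter)

    def learn(self, user, tokens, label):
        """Record one of the user's own messages as a diff only."""
        for tok in set(tokens):
            self.user_diffs[user][(tok, label)] += 1

    def count(self, user, token, label):
        """Effective count for a user: community value plus personal diff."""
        key = (token, label)
        diff = self.user_diffs.get(user)
        return self.shared[key] + (diff[key] if diff else 0)


store = MergedGroupStore({("pills", "spam"): 10, ("newsletter", "spam"): 5})
# Alice wants the newsletter, so her training is stored only in her diff.
store.learn("alice", ["newsletter"], "ham")
```

Here `store.count("alice", "newsletter", "ham")` is 1, while the same query for a user with no training, such as `"bob"`, is 0. Both users still see the community's spam count of 5 for that token. New users therefore get decisions from the shared data immediately, and each user's storage grows only with their own training.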