DSPAM
From Wikipedia, the free encyclopedia
| DSPAM | |
|---|---|
| Latest release | 3.6.8 / June 7, 2006 |
| OS | Unix |
| Use | Email spam filter |
| License | GPL |
| Website | dspam.nuclearelephant.com |
DSPAM is a statistical spam filter for mail transfer agents (MTAs), written by Jonathan A. Zdziarski.
It is a scalable, open-source, content-based spam filter designed for multi-user enterprise systems. DSPAM is MTA-independent and can integrate with many different types of Unix-based email systems. As an adaptive filter, it learns and adapts to each user's email. Rather than working from a list of "rules" to identify spam, DSPAM's probabilistic engine examines the content of each message and learns what type of content the user considers desirable.
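The contrast between rule lists and a learning, probabilistic engine can be made concrete with a small example. The following is a minimal sketch of a generic naive Bayes-style token classifier, assuming simple word tokens and per-user training labels; it is an illustration of the general technique only, not DSPAM's actual algorithm, tokenizer, or data format, and the names `TokenClassifier`, `train`, and `classify` are hypothetical.

```python
import math
import re
from collections import Counter


class TokenClassifier:
    """Hypothetical naive Bayes-style spam classifier (not DSPAM's own engine)."""

    def __init__(self):
        # Per-class document frequency of each token, plus per-class message totals.
        self.token_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Unique lower-case word tokens; real filters use much richer tokenizers.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label):
        # Learn from a message the user has labelled "spam" or "ham".
        self.message_counts[label] += 1
        self.token_counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        spam_n = self.message_counts["spam"]
        ham_n = self.message_counts["ham"]
        # Start from smoothed prior odds, then add each token's log-likelihood ratio.
        log_odds = math.log((spam_n + 1) / (ham_n + 1))
        for token in self.tokenize(text):
            p_spam = (self.token_counts["spam"][token] + 1) / (spam_n + 2)
            p_ham = (self.token_counts["ham"][token] + 1) / (ham_n + 2)
            log_odds += math.log(p_spam / p_ham)
        # Numerically stable logistic function: converts log-odds to a probability.
        if log_odds >= 0:
            return 1 / (1 + math.exp(-log_odds))
        z = math.exp(log_odds)
        return z / (1 + z)

    def classify(self, text, threshold=0.5):
        return "spam" if self.spam_probability(text) >= threshold else "ham"


clf = TokenClassifier()
clf.train("cheap pills buy now", "spam")
clf.train("win money now", "spam")
clf.train("meeting agenda for monday", "ham")
clf.train("lunch on monday?", "ham")
print(clf.classify("buy cheap pills"))  # spam
print(clf.classify("monday agenda"))    # ham
```

Scoring only the tokens present in a message, and ignoring absent ones, is a deliberate simplification that keeps the sketch short; the key point is that the verdict comes from statistics learned from the user's own labelled mail rather than from hand-written rules.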
Engine
libdspam is the core engine of DSPAM. Under the terms of the GNU General Public License, it can be linked into third-party applications to provide drop-in spam filtering. libdspam contains the core routines for classifying email (or other documents) and, optionally, storage backend support.
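The separation between classification routines and optional storage backends is a common library design. The following Python sketch shows the idea under stated assumptions: learning and scoring logic talks to storage only through an abstract interface, so backends can be swapped. All names (`TokenStats`, `StorageBackend`, `MemoryBackend`, `FilterCore`) are hypothetical; this is not libdspam's C API.

```python
from dataclasses import dataclass
from typing import Dict, Protocol, Set, Tuple


@dataclass
class TokenStats:
    spam_hits: int = 0
    ham_hits: int = 0


class StorageBackend(Protocol):
    """Hypothetical interface for wherever per-user token statistics are kept."""

    def get(self, user: str, token: str) -> TokenStats: ...

    def put(self, user: str, token: str, stats: TokenStats) -> None: ...


class MemoryBackend:
    """In-memory backend; a production system might use a database instead."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], TokenStats] = {}

    def get(self, user: str, token: str) -> TokenStats:
        stored = self._data.get((user, token), TokenStats())
        return TokenStats(stored.spam_hits, stored.ham_hits)  # hand out a copy

    def put(self, user: str, token: str, stats: TokenStats) -> None:
        self._data[(user, token)] = stats


class FilterCore:
    """Learning/scoring logic that reaches storage only through the interface."""

    def __init__(self, backend: StorageBackend) -> None:
        self.backend = backend

    def learn(self, user: str, tokens: Set[str], is_spam: bool) -> None:
        for token in tokens:
            stats = self.backend.get(user, token)
            if is_spam:
                stats.spam_hits += 1
            else:
                stats.ham_hits += 1
            self.backend.put(user, token, stats)

    def token_spamminess(self, user: str, token: str) -> float:
        # Smoothed share of a token's sightings that were in spam; 0.5 if unseen.
        stats = self.backend.get(user, token)
        return (stats.spam_hits + 1) / (stats.spam_hits + stats.ham_hits + 2)


core = FilterCore(MemoryBackend())
core.learn("alice", {"prize", "offer"}, is_spam=True)
print(core.token_spamminess("alice", "prize"))  # 0.6666666666666666
print(core.token_spamminess("bob", "prize"))    # 0.5
```

Replacing `MemoryBackend` with any other class that provides the same `get` and `put` methods requires no change to `FilterCore`.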
Accuracy
DSPAM's author, Jonathan Zdziarski, claims that typical users have reported accuracy between 99.5% and 99.95%, although results vary from user to user. The highest claimed accuracy is 99.991%, followed by 99.987%. Here, accuracy is the percentage of messages classified correctly. In the independent spam filter evaluation at TREC 2005, the best-performing DSPAM configuration had a spam misclassification rate of 1.88% and a ham (legitimate mail) misclassification rate of 0.58%. These results were significantly worse than those of the leading filters in the test and far below the stated levels of accuracy.
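The claimed accuracy figures and the TREC 2005 rates measure slightly different things: accuracy counts all errors over all messages, while each misclassification rate is computed over one class only (spam or ham). A small sketch of the arithmetic, with illustrative function names, shows how they relate; the corpus size used is a hypothetical assumption, not TREC data.

```python
def rates(spam_total, spam_missed, ham_total, ham_flagged):
    """Return (spam misclassification rate, ham misclassification rate, accuracy)."""
    spam_rate = spam_missed / spam_total   # spam delivered as ham
    ham_rate = ham_flagged / ham_total     # ham wrongly flagged as spam
    accuracy = 1 - (spam_missed + ham_flagged) / (spam_total + ham_total)
    return spam_rate, ham_rate, accuracy


def accuracy_for_mix(spam_fraction, spam_rate, ham_rate):
    """Overall accuracy when `spam_fraction` of all mail is spam."""
    return 1 - (spam_fraction * spam_rate + (1 - spam_fraction) * ham_rate)


# Hypothetical corpus: 10,000 spam and 10,000 ham, at the TREC 2005 DSPAM rates.
print(rates(10_000, 188, 10_000, 58))
# Bounds over every possible mix (all ham, then all spam):
print(accuracy_for_mix(0.0, 0.0188, 0.0058), accuracy_for_mix(1.0, 0.0188, 0.0058))
```

Because overall accuracy is a weighted average of the two per-class success rates, the TREC rates correspond to an accuracy between 98.12% (all spam) and 99.42% (all ham) for any mix, below the claimed 99.5% to 99.95% range. For scale, 99.95% accuracy means roughly one error per 2,000 messages.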
Training is necessary for any adaptive filter, so DSPAM must be trained to learn what a user considers spam and what they do not. Alternatively, a single training database can be established for a large group so that individual users need not participate. DSPAM also supports a hybrid of the two, called "merged groups": a central community database is used for initial decisions, and the differences from it are stored as each user's personal mail is learned.
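The "merged groups" arrangement can be pictured as an overlay: a user's effective token counts are the community counts plus whatever that user has learned since. The following is a hypothetical sketch of that idea only; the class name, its methods, and the use of plain per-token counts are assumptions, not DSPAM's storage format.

```python
from collections import Counter


class MergedGroupStore:
    """Hypothetical 'merged group': shared community counts plus per-user diffs."""

    def __init__(self, group_spam, group_ham):
        # Community database, used as the starting point for every user.
        self.group = {"spam": Counter(group_spam), "ham": Counter(group_ham)}
        # user -> {"spam": Counter, "ham": Counter} holding only that user's changes.
        self.diffs = {}

    def learn(self, user, tokens, label):
        # Training touches only the user's diff; community data stays unchanged.
        diff = self.diffs.setdefault(user, {"spam": Counter(), "ham": Counter()})
        diff[label].update(tokens)

    def counts(self, user, token):
        # Effective (spam, ham) counts = community counts + this user's diff.
        diff = self.diffs.get(user, {"spam": Counter(), "ham": Counter()})
        return (
            self.group["spam"][token] + diff["spam"][token],
            self.group["ham"][token] + diff["ham"][token],
        )


store = MergedGroupStore({"invoice": 5}, {"invoice": 1})
store.learn("alice", ["invoice"], "ham")
store.learn("alice", ["invoice"], "ham")
print(store.counts("alice", "invoice"))  # (5, 3)
print(store.counts("bob", "invoice"))    # (5, 1)
```

In this sketch a new user starts with the community's statistics and diverges only as far as their own training takes them, while the shared data is never modified by any single user.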