DSPAM

From Wikipedia, the free encyclopedia

DSPAM

Latest release:	3.6.8 / June 7, 2006
OS:	Unix
Use:	Email spam filter
License:	GPL
Website:	dspam.nuclearelephant.com

DSPAM is a statistical spam filter for mail transfer agents (MTAs), written by Jonathan A. Zdziarski.

It is an open-source, content-based filter designed to scale to multi-user enterprise systems. DSPAM is independent of any particular MTA and can be integrated with many Unix-based email systems. It is an adaptive filter, meaning it learns from and adapts to each user's email. Instead of matching messages against a fixed list of rules, DSPAM's probabilistic engine examines the content of each message and learns which kinds of content the user considers wanted.
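The following Python sketch illustrates the general idea of an adaptive, content-based filter: it counts which tokens appear in messages a user has labeled as spam or legitimate mail ("ham"), then scores new messages from those counts. It is a simplified naive Bayes classifier chosen for illustration, not DSPAM's own algorithm; the tokenizer, the add-one smoothing, and the names `TokenFilter`, `train` and `spam_probability` are all assumptions.

```python
import math
import re
from collections import Counter


class TokenFilter:
    """Simplified adaptive token classifier (illustrative; not DSPAM's algorithm)."""

    def __init__(self):
        # For each class, how many trained messages contained each token.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Number of trained messages per class.
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: the set of lowercase word-like strings in the text.
        return set(re.findall(r"[a-z0-9$']+", text.lower()))

    def train(self, text, label):
        # Learn from a message the user has labeled "spam" or "ham".
        self.messages[label] += 1
        self.counts[label].update(self.tokenize(text))

    def spam_probability(self, text):
        # Naive Bayes over the tokens present, in log space, with add-one smoothing.
        total = self.messages["spam"] + self.messages["ham"]
        scores = {}
        for label in ("spam", "ham"):
            n = self.messages[label]
            score = math.log((n + 1) / (total + 2))  # smoothed class prior
            for token in self.tokenize(text):
                score += math.log((self.counts[label][token] + 1) / (n + 2))
            scores[label] = score
        # P(spam | text) = 1 / (1 + exp(ham_score - spam_score)); clamp to avoid overflow.
        diff = max(min(scores["ham"] - scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    f = TokenFilter()
    f.train("cheap pills buy now", "spam")
    f.train("win money now", "spam")
    f.train("meeting notes for monday", "ham")
    f.train("lunch on monday?", "ham")
    print(round(f.spam_probability("buy cheap pills"), 2))        # 0.89
    print(round(f.spam_probability("notes for the meeting"), 2))  # 0.11
```

No rules are written by hand here: every score comes from what the filter has been trained on, so each user's filter drifts toward that user's own mail.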

Engine

libdspam is the core engine of DSPAM. Under the terms of the GNU General Public License, it can be linked into third-party applications to add drop-in spam filtering to a project. libdspam contains the core routines for classifying email (or other documents) and, optionally, support for storage backends.

Accuracy

DSPAM author Jonathan Zdziarski states that typical users of DSPAM have reported accuracy between 99.5% and 99.95%, although results vary from user to user. Accuracy refers to the percentage of messages classified correctly. The highest claimed accuracy is 99.991%, followed by 99.987%. In the independent spam filter test at TREC 2005, the best-performing DSPAM configuration had a spam misclassification rate of 1.88% and a ham (legitimate mail) misclassification rate of 0.58%. These results were considerably worse than those of the leading filters in the test and well below the claimed levels of accuracy.
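Because accuracy, as defined above, combines the two per-class error rates, the TREC figures can be compared with the claimed accuracy once a spam/ham mix is fixed. The article does not give the mix of the test corpus, so this short Python sketch treats the spam fraction as a parameter; the function name `overall_accuracy` and the sample fractions are hypothetical.

```python
def overall_accuracy(spam_miss_rate, ham_miss_rate, spam_fraction):
    """Fraction of all messages classified correctly, given per-class error rates."""
    if not 0.0 <= spam_fraction <= 1.0:
        raise ValueError("spam_fraction must be between 0 and 1")
    # Overall error is the mix-weighted average of the two misclassification rates.
    error = spam_fraction * spam_miss_rate + (1.0 - spam_fraction) * ham_miss_rate
    return 1.0 - error


if __name__ == "__main__":
    # TREC 2005 rates for the best DSPAM configuration, as quoted above.
    SPAM_MISS, HAM_MISS = 0.0188, 0.0058
    for fraction in (0.0, 0.5, 0.8, 1.0):  # hypothetical spam shares of the mail stream
        acc = overall_accuracy(SPAM_MISS, HAM_MISS, fraction)
        print(f"spam share {fraction:.0%}: accuracy {acc:.2%}")
```

Since overall accuracy is a weighted average of 98.12% (all spam) and 99.42% (all ham), it stays within that band for any mix, below the lower end of the 99.5% to 99.95% range attributed to typical users.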

As with any adaptive filter, DSPAM must be trained to learn what a user does and does not consider spam. Instead of having every user train the filter, a single training database can be shared by a large group so that individual users need not participate. DSPAM also supports a hybrid of these approaches called "merged groups": a central, community database is used for initial classification decisions, and as each user's personal mail is learned, DSPAM stores only the differences from that community database.
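The "merged groups" arrangement can be illustrated with a small Python sketch. It assumes a simplified model in which each token has a spam count and a ham count; the class and method names (`MergedGroupStore`, `learn`, `counts`) are hypothetical and do not reflect DSPAM's actual storage backends. The shared community table is never modified: each user's training only adds to that user's diff, and lookups combine the two.

```python
from collections import Counter


class MergedGroupStore:
    """Illustrative 'merged group' token store (hypothetical; not DSPAM's backend).

    A shared community table supplies baseline token counts; each user keeps
    only the counts added by that user's own training.
    """

    def __init__(self, community):
        # community: {token: Counter({"spam": n, "ham": m})}, shared and read-only here.
        self.community = community
        # user -> {token: Counter}, holding only that user's deltas.
        self.user_diffs = {}

    def learn(self, user, tokens, label):
        # Record the user's training in the user's diff; the community table is untouched.
        diff = self.user_diffs.setdefault(user, {})
        for token in tokens:
            diff.setdefault(token, Counter())[label] += 1

    def counts(self, user, token):
        # Effective counts = community baseline + this user's deltas.
        merged = Counter(self.community.get(token, Counter()))
        merged.update(self.user_diffs.get(user, {}).get(token, Counter()))
        return merged


if __name__ == "__main__":
    community = {"invoice": Counter({"spam": 5, "ham": 20})}
    store = MergedGroupStore(community)
    store.learn("alice", ["invoice"], "spam")
    print(store.counts("alice", "invoice"))  # Counter({'ham': 20, 'spam': 6})
    print(store.counts("bob", "invoice"))    # Counter({'ham': 20, 'spam': 5})
```

In this sketch a new user starts with the community's statistics at no storage cost, and each user's footprint grows only with that user's own training.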