Breidbart Index

From Wikipedia, the free encyclopedia

The Breidbart Index, developed by Seth Breidbart[1], measures the severity of newsgroup spam. It is calculated over a 45-day window and takes into account the number of newsgroups to which each copy of a message is posted. It is defined as the sum, over every copy of the message, of the square root of the number of newsgroups to which that copy is cross-posted: \mbox{BI} = \sum_{k=1}^m \sqrt{n_k}, where m is the number of copies (postings count as copies of the same message if they are substantively identical) and n_k is the number of newsgroups to which the k-th copy is cross-posted.

For the Big 8 and alt.* hierarchies, it is generally agreed[2] that messages are cancellable spam when the Breidbart Index exceeds 20, at which point they can be auto-cancelled from news servers. Other hierarchies have their own rules; many of them (the smaller, local ones) are much more restrictive.

Example: if two copies of a posting are made, one to 9 groups and one to 16, the BI value is \sqrt{9} + \sqrt{16} = 3 + 4 = 7. This is not cancellable spam according to the above criterion. A single message would have to be cross-posted to more than 400 groups to be considered cancellable spam, since \sqrt{400} = 20; this is beyond the capacity of most news servers.
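
The calculation in the example can be sketched in Python. This is a minimal illustration, not a reference implementation: the function name `breidbart_index` and the constant `CANCEL_THRESHOLD` are hypothetical, and the sketch assumes that substantively identical copies within the 45-day window have already been grouped, so the input is simply the list of newsgroup counts n_k, one per copy.

```python
import math
from typing import Iterable

# Threshold quoted for the Big 8 and alt.* hierarchies (assumed constant name).
CANCEL_THRESHOLD = 20


def breidbart_index(groups_per_copy: Iterable[int]) -> float:
    """Return the sum of sqrt(n_k) over every copy of a message.

    Each element of groups_per_copy is n_k, the number of newsgroups
    to which one copy was cross-posted.
    """
    return sum(math.sqrt(n) for n in groups_per_copy)


# Two copies: one cross-posted to 9 groups, one to 16.
bi = breidbart_index([9, 16])
print(bi)                      # 7.0
print(bi > CANCEL_THRESHOLD)   # False: not cancellable under this criterion
```

Because the index uses a strict "exceeds 20" comparison, a single copy cross-posted to exactly 400 groups scores 20 and does not qualify, while 401 groups does.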

A more aggressive criterion, Breidbart Index Version 2, has been proposed[3]. This is calculated as \mbox{BI2} = \sum_{k=1}^m (n_k + \sqrt{n_k})/2.

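In the above example, this evaluates to (9 + \sqrt{9} + 16 + \sqrt{16})/2 = (9 + 3 + 16 + 4)/2 = 16. Because BI2 grows much faster than BI with the number of newsgroups, a posting reaches the auto-cancellation threshold far sooner. A single message would need to be cross-posted to only 35 newsgroups to exceed a threshold of 20, since (35 + \sqrt{35})/2 \approx 20.5 whereas (34 + \sqrt{34})/2 \approx 19.9.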
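
The BI2 arithmetic can be checked with a short Python sketch. The names `breidbart_index_v2` and `min_groups_to_exceed` are hypothetical, and the sketch assumes the same threshold of 20 and a strict "exceeds" comparison. Under that reading, the smallest single cross-post that passes the threshold is 35 groups; a 34-group posting scores just under 20.

```python
import math
from typing import Iterable


def breidbart_index_v2(groups_per_copy: Iterable[int]) -> float:
    """Return the sum of (n_k + sqrt(n_k)) / 2 over every copy of a message."""
    return sum((n + math.sqrt(n)) / 2 for n in groups_per_copy)


def min_groups_to_exceed(threshold: float) -> int:
    """Smallest n for which a single copy's BI2 strictly exceeds threshold."""
    n = 1
    while (n + math.sqrt(n)) / 2 <= threshold:
        n += 1
    return n


# Same two copies as in the example: 9 groups and 16 groups.
print(breidbart_index_v2([9, 16]))   # 16.0
print(min_groups_to_exceed(20))      # 35
```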

References
