Talk:Bloom filter

From Wikipedia, the free encyclopedia

I propose changing insert to add and test membership to query. I think add better matches what actually happens and query is just more concise. --Pfunk42 22:42, 1 Jun 2005 (UTC)

Contents

[edit] Discoverer?

I propose changing discoverer in "...named after its discoverer Burton H. Bloom," to inventor. Algorithms like this aren't 'discovered' so much as invented.

I tried "conceived by" whoever in year whatever, a pattern I've seen elsewhere. Deco 01:21, 5 Jun 2005 (UTC)

[edit] Removing from a Bloom filter

It feels like you can't. This should probably be emphasized in the article, as this is a distinction from similar data structures. It's mentioned in the intro, but then never discussed...I don't have the technical expertise to say definitively how to handle this, so I can't write it up myself. -- Metahacker 15:32, 21 Jun 2005 (UTC)

Good idea. I added a short explanation. You think it'll do? Deco 19:46, 21 Jun 2005 (UTC)

[edit] Adding and removing in a single-bit Bloom filter

You could implement a voting algorithm in a single bit Bloom filter to allow both adds and deletes, at the cost of admitting false negatives, with an add setting k hash-bits, and a delete clearing k hash bits. The difference from an ordinary Bloom filter would be that the decision threshold would be less than requiring all k bits to be set for a rejection: for example, say, k/2, for say k=20. In this way, although both false-positive and false-negative results would be possible, the threshold factor could be tuned to give low error rates for both types of error.

Has anyone done this? Or are multi-bit Bloom filters simply superior? Inquiring minds weant to know. -- The Anome 13:54, September 9, 2005 (UTC)

[edit] Adding and removing in a counting Bloom filter

Riffing on The Anome's suggestion above: You could also implement a "counting" Bloom filter, by using b bits per hash instead of only one. For example, using b=2, you'd replace Bloom's array of m bits with an array of m 2-bit counters. Adding an element would increment all k relevant counters; if any counter was already at 3, it would "stick" there instead of wrapping around. Therefore, a count of 0 represents "absence," and a count of 1 or 2 indicates 1 or 2 entries sharing that counter. A count of 3 indicates at least three entries sharing that counter.

To remove an element, you'd decrement all k relevant counters, except for those counters that contained 3 — such a counter you'd leave at 3, since to decrement it would falsely imply that it was shared by exactly two entries.

This kind of "counting" Bloom filter would have all the problems with false positives of a normal Bloom filter, but it would allow a certain number of removals to proceed correctly. Add enough elements, though, and removals would stop working — you'd be unable to remove any element all of whose counters had maxed out.

I suspect that the added complexity of implementation wouldn't make a "counting" Bloom filter worth the trouble, but it's worth a mention on this Talk page, at least. :) --Quuxplusone, 19:06, 18 March 2006 (UTC)

[edit] Bloomier filter insertion

The bloomier filter insertion description is wrong:

"Now that we have the structure and a search algorithm, we also need to know how to insert new key/value pairs. The program must not attempt to insert the same key with both values. If the value is 0, insert the key into A0 and then test if the key is in B0. If so, this is a false positive for B0, and the key must also be inserted into A1 recursively in the same manner. If we reach the last level, we simply insert it. When the value is 1, the operation is similar but with A and B reversed.".

So any key in A0 that is a false positive in B0 must be present in A1 (and so on). But a key inserted in A0 might BECOME a false positive key in B0 after other insertions in B0. Imagine we insert a key in A0 which is "almost present" in B0 - only one of the k bits is not set; the key is not (yet) a false positive in B0. But after this, maybe we insert a key in B0 that also sets that kth bit to 1; now the initial key is in A0, it is a false positive in B0, and it is not in A1 - violating the data structure invariants. --Raduberinde 08:09, 28 July 2006 (UTC)