Talk:Challenge-response spam filtering

Hey, I just broke this out from Stopping_e-mail_abuse, so the content still needs a bit of massaging. Any help is welcome!

One of my goals for this page is to produce a compendium of all the objections to C/R along with a 1-2 sentence rebuttal of each (where there exists a sensible rebuttal). Basically a summary of the arguments we've all heard a thousand times by now.

-- Megacz 04:16, 18 April 2006 (UTC)

This page has a long way to go towards meeting Wikipedia's neutral tone guidelines. It's mostly a "why this is evil" page right now. It's hard to say where to begin to edit it.

It's an NPOV summary of both sides of a debate; I would be worried if half of the content wasn't negative. For the record, I've added very little since moving the content here from Stopping_e-mail_abuse, and nobody was complaining about the tone when it was over there. Additions are quite welcome! Megacz 17:34, 29 April 2006 (UTC)

The page is 90% criticisms, very little else. I don't see that as NPOV or a summary of both sides. It also includes side-notes about captchas, which are fairly rare in C/R systems (and much more common in web systems for signup and posting). Not that the note about most captchas and the blind isn't true.

I have found that while everybody agrees that they don't want autoresponders (of any kind, including vacation programs, virus software, mail bouncers and mailing list opt-in confirmations) to go to forged addresses, just about everybody also agrees they would much rather get a challenge than have their mail discarded. Of course everybody would prefer it to just be delivered with no extra work but we would all like a pony too.

Bradtem, you certainly have valid concerns, and I do not dispute them. But the debate is far more complex than that, and I think that the history of the debate deserves to be documented. For the record, the page is not 90% criticisms, and many of the criticisms are criticisms only of particular misimplementations of C/R. Distinguishing these from "universal criticisms" is one of the primary objectives of this page. Megacz 18:10, 24 May 2006 (UTC)

Excellent initiative Megacz, but let me help you by trying to make the page unbiased. In my opinion the subject (as any subject) deserves a proper introduction and explanation and only 1 section describing its controversy. And though I do not agree that the debate is that complex, I do agree that it could be valuable to document the discussion you and others conducted. But in this 'discussion' page only, not in the main page explaining 'C-R' of course. OldCar 16:44, 18 June 2007 (UTC)

1 That seems backwards (compendium of all the objections to C/R)
2 NPOV
3 requested e-mail
4 Forwarders
5 Basic Assumptions
6 Tune up
7 Flawed C/R implementations
8 C/R or C-R
9 Best practices
10 Criticisms

That seems backwards (compendium of all the objections to C/R)

I'd think that NPOV would require that the article first describe C/R systems. Later in the article criticisms and responses could appear. A good exercise for the anti-C/R folks would be to properly present the NPOV case for C/R.

Minasbeede 15:47, 26 June 2006 (UTC)

Sounds like a great idea. I myself use a C/R system, although I wrote it myself because I am against all existing systems [known to me]. So I'm not sure that I fall clearly into either of these camps. You're welcome to add the parts you suggest! Megacz 17:49, 27 June 2006 (UTC)

A number of the problems I see cited are not ones I understand to be common, such as loops between fighting C/R systems, and frequent challenges to properly classified mailing list mail with a Precedence header. It would be good to see citations of which systems had these bugs. --Bradtem 22:49, 28 October 2006 (UTC)

NPOV

"the inconsiderate burden their beloved C/R systems place on innocent bystanders" ?

This could be fixed by removing "inconsiderate" and "beloved", couldn't it? ConditionalZenith 10:58, 8 September 2006 (UTC)

No response, and the original complainer didn't leave details. Making changes as specified and removing POV template. ConditionalZenith 11:02, 11 September 2006 (UTC)

requested e-mail

In the comment on this edit Megacz claims "replies are passed through by all sane CR systems. only UNREQUESTED emails are challenged". I agree: a "sane" C/R system would not challenge requested email, but Megacz fails to show that such C/R systems exist. If C/R systems could determine requested email, they wouldn't need to challenge. Erik Warmelink 19:11, 2 June 2007 (UTC)

My own C/R system has this property; it uses the fact [citation needed] that the Message-ID of an email is propagated into the In-Reply-To and References headers of replies to that email. Moreover, the "requested" you mention refers to the challenge, not the original message. The challenge is requested by the original message. Megacz 01:18, 6 June 2007 (UTC)
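For readers following the thread, here is a minimal Python sketch of the check described above. It is not Megacz's actual system: the function names and the `sent_ids` set (Message-IDs of mail the C/R user has sent) are hypothetical. It relies only on the convention that a mail client replying to a message copies that message's Message-ID into In-Reply-To and References; a sender who knows a valid Message-ID could forge these headers, so treat this as an illustration rather than a complete defence.

```python
from email import message_from_string
from email.message import Message


def referenced_ids(msg: Message) -> set[str]:
    """Collect the Message-IDs named in In-Reply-To and References."""
    ids: set[str] = set()
    for header in ("In-Reply-To", "References"):
        for value in msg.get_all(header, []):
            # Each header holds one or more whitespace-separated <id@host> tokens.
            for token in str(value).split():
                if token.startswith("<") and token.endswith(">"):
                    ids.add(token)
    return ids


def is_reply_to_sent_mail(msg: Message, sent_ids: set[str]) -> bool:
    """True if msg references a Message-ID that the C/R user sent earlier."""
    return bool(referenced_ids(msg) & sent_ids)


# Hypothetical record of Message-IDs from the user's outgoing mail.
sent_ids = {"<abc123@example.org>"}

reply = message_from_string(
    "From: friend@example.com\n"
    "In-Reply-To: <abc123@example.org>\n"
    "Subject: Re: hello\n\nThanks!\n"
)
stranger = message_from_string(
    "From: unknown@example.net\nSubject: hello\n\nHi there\n"
)

print(is_reply_to_sent_mail(reply, sent_ids))     # reply passes without a challenge
print(is_reply_to_sent_mail(stranger, sent_ids))  # unknown sender would be challenged
```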

Forwarders

The article says:

If the message was sent to an address which forwards to the C/R user's address, an extra message will be generated. Alternatively, one could employ C/R on the forwarding machine and then unconditionally accept messages which pass through it.

I am not convinced that the "alternative" would be a major problem of using SMTP response codes. A better alternative is using spam defences on the forwarding machines, instead of aiding a Denial-of-service attack. Erik Warmelink 20:52, 7 June 2007 (UTC)

Unfortunately, there are no "spam defences" which can be employed that are both effective and do not cause false positives (which are far more harmful than spam or blowback). Megacz 19:20, 8 June 2007 (UTC)

Since challenges look like spam (IMHO, they are spam), sending challenges does not eliminate false positives: a reasonable spam filter will reject them. Anyway, you didn't answer the question why depending on the spam defences of the forwarder (or depending on its challenges, if you insist) would be a major problem of using SMTP response codes. Erik Warmelink 20:17, 9 June 2007 (UTC)

Basic Assumptions

C/R isn't an anti-spam technique; it sends spam. It can't guarantee zero false positives, because if the challenges get filtered (and challenges are unsolicited bulk e-mail) non-spam will be refused.

If one really believes both assumptions, one can easily remove all spam defences. That does guarantee zero false positives (assuming unlimited bandwidth and disk space) and spam was considered less harmful than false positives anyway. Erik Warmelink 20:43, 9 June 2007 (UTC)

Those false positives are the result of decisions by other machines which fail to check the In-Reply-To and References headers. Those decisions are not decisions which are made by the C/R system, and therefore the C/R has not "falsely made a positive identification of spam". Megacz 03:49, 13 June 2007 (UTC)

Rejecting unsolicited bulk e-mail before inspecting the body isn't a failure, it is a feature. Erik Warmelink 03:26, 14 June 2007 (UTC)

You are referring to the situation in which the original sender wanted to send an email to the CR user. In that situation, a reply was solicited by the original sender. Therefore the message which was rejected was not "unsolicited". Megacz 16:13, 14 June 2007 (UTC)

A challenge isn't the kind of reply which was solicited. Anyway, whether you like it or not, challenges do get filtered, sometimes because the machine sending them has already been determined to be a spam source. Erik Warmelink 12:10, 16 June 2007 (UTC)

C-R systems do not send spam. The challenge that is sent might be unsolicited, but is definitely not bulk e-mail. Every incoming e-mail would receive at most one challenging reply. OldCar 16:26, 18 June 2007 (UTC)

Challenges are e-mail. Since they are sent by automated systems and are substantively identical, they are bulk. Erik Warmelink 00:17, 6 July 2007 (UTC)

That would make all auto-responders qualify as bulk e-mail, including the very useful 'change-of-address' notices that inform the sender of a message that the recipient address he used is obsolete and that a different address should be used instead. And, following the same reasoning as above, they would qualify as spam. In the definition of spam the word 'bulk' means that 'numerous recipients' are involved, and it's obvious that 'numerous' there means more than only the people who send me an e-mail. I removed the comments in the article referring to challenges as 'spam'. OldCar 17:32, 30 July 2007 (UTC)

All 'change-of-address' auto-responders written after RFC 821 (which documented the reply code 551 User not local; please try <forward-path>) are spambots. Of course that RFC was written only 25 years ago, so perhaps you have not heard of it yet. Erik Warmelink 00:18, 31 July 2007 (UTC)

Could you (or someone) please give some sound proof of that bold statement? And could you please explain then why currently the IETF is proposing RFC 3834 as a standard on 'Automatic Responses to Electronic Mail' if auto-responders would generally be considered spambots? OldCar 07:07, 31 July 2007 (UTC)

They are spambots because they are automated systems (bots) which send unsolicited bulk e-mail (spam). RFC 3834 has a section labelled 6. Security Considerations. Erik Warmelink 20:47, 3 August 2007 (UTC)

I've removed the 'Basic Assumptions' for 2 reasons:

The article was biased enough as it was (see this discussion page). Adding a section with a 'justification' for C-R systems doesn't really help.
One of the two assumptions given was nonsensical: 'False positives are more harmful than spam itself ...', the other irrelevant for an understanding of C-R.

OldCar 16:26, 18 June 2007 (UTC)

Tune up

I've tuned up some of these descriptions and added dates to the reference articles, organizing them chronologically. Some C-R applications take many steps before sending a challenge with the goal of minimizing challenges. Pjbrockmann 01:23, 18 July 2007 (UTC)

Removed offensive blog post highlighting social problems with C-R. Pjbrockmann 02:39, 18 July 2007 (UTC)

Flawed C/R implementations

I propose to remove the section on 'flawed C/R implementations' to give this article a more NPOV. There will always be flawed implementations, but as long as there's one good implementation (possible) they're not very relevant to the topic of C/R systems. OldCar 17:30, 30 July 2007 (UTC)

As far as I know, a good C/R implementation is impossible. Challenges either

slow/block legitimate e-mail and bother legitimate senders (which is not good)
send challenges to spammers (which is neither good nor bad)
send backscatter (which is unsolicited bulk e-mail, a crime in civilized countries). Erik Warmelink 23:48, 30 July 2007 (UTC)

Those are objections against C/R in general and could be treated in the general 'Criticisms' section insofar as they make sense. OldCar 07:22, 31 July 2007 (UTC)

Removing objections to 'flawed C/R implementations' would make the article less neutral, since all C/R implementations are flawed. To get a more NPOV, the section could be re-added to Criticisms. Erik Warmelink 20:41, 3 August 2007 (UTC)

Saying that 'all C/R implementations are flawed' is not really an NPOV in my opinion. But if no-one has any further objections I'll remove the 'flawed C/R implementations' section and I'll try to combine the paragraphs that are in it with the 'Criticisms' section. OldCar 13:07, 6 August 2007 (UTC)

I concur with OldCar. Megacz 17:50, 6 August 2007 (UTC)

C/R or C-R

What's best? 'C/R' or 'C-R' systems? If 'C/R' is better, shouldn't the topic be changed to 'Challenge/response spam filtering'? OldCar 17:37, 30 July 2007 (UTC)

IMHO, C/R is the accepted usage, but I suspect there's a problem including a slash in a title ... richi 20:49, 30 July 2007 (UTC)

Best practices

I tried to give this section a bit more NPOV and tried to 'clean it up' a little so it would read nicer. I removed the following paragraph because it is (currently) a completely impractical suggestion and probably too technical to bother an average reader with this early in the article. It might be better to swap this section with the 'Criticisms' section and let this section be some kind of 'recommendations' on the use and implementation of C/R systems, following from the criticisms. OldCar 06:51, 1 August 2007 (UTC)

Some suggest that challenges should be issued not by creating a new message, but by placing the challenge in the SMTP session rejection-code. When the receiving mail system rejects an e-mail this way, it is the sending system that actually creates the bounce message. As a result, the bounce message will almost always be sent to the real sender, and it will be in a format and language that the sender will usually recognize. Unfortunately, this technique has two major problems:

The bounce message by the sender's MTA typically does not include the rejection code in the subject; some MTAs use a subject like "COULD NOT DELIVER MESSAGE", and bury the rejection code in a large mass of cryptic SMTP transcript at the end of the email. Novice users (and even sophisticated users) may mistake such messages for true delivery failures and give up.

If the message was sent to an address which forwards to the C/R user's address, an extra message will be generated. Alternatively, one could employ C/R on the forwarding machine and then unconditionally accept messages which pass through it. —The preceding unsigned comment was added by OldCar (talk • contribs) 06:51, 1 August 2007 (UTC).
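To make the removed paragraph easier to picture, here is a rough Python sketch of what a challenge carried in an SMTP rejection could look like on the wire. It only formats the reply text; it is not taken from any existing C/R product. The reply code 550, the wording, and the URL are assumptions made up for illustration. The one firm detail is the SMTP multiline reply syntax: every line but the last uses the code followed by a hyphen, and the last line uses the code followed by a space.

```python
def smtp_rejection(code: int, lines: list[str]) -> str:
    """Format a (possibly multiline) SMTP error reply.

    All lines except the last are written as 'CODE-text'; the last line is
    written as 'CODE text'. Each line ends with CRLF.
    """
    if not 400 <= code <= 599:
        raise ValueError("a rejection must use a 4xx or 5xx reply code")
    if not lines:
        raise ValueError("at least one line of reply text is required")
    out = []
    for index, text in enumerate(lines):
        separator = " " if index == len(lines) - 1 else "-"
        out.append(f"{code}{separator}{text}\r\n")
    return "".join(out)


# Hypothetical challenge text; the URL is a placeholder, not a real service.
challenge = smtp_rejection(
    550,
    [
        "Your message was not delivered yet.",
        "To confirm you are a real sender, visit",
        "https://example.org/confirm?token=HYPOTHETICAL",
    ],
)
print(challenge, end="")
```

Because the sender's own mail server turns this reply into the bounce, how visible the challenge text is depends entirely on that server, which is the first problem the paragraph describes.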

Criticisms

I removed the 'Flawed C/R implementations' section and rewrote the 'Criticisms' section. I tried to rephrase some of the criticisms and think I succeeded in explaining some of them more clearly. But as I am not a big critic of C/R myself, I hope real critics don't think I did this to weaken the criticisms and I invite them to edit it if they think I did.

What I did do is remove some of the criticisms or move them to this page, as in my opinion they don't belong in the article itself. I will give my reasons for each criticism that I (re)moved:

From "Challenges sent to forged e-mail addresses" I removed "Misdirected challenges can also result in C/R systems being blacklisted on DNSBLs[1][2]. This is likely to harm the deliverability of a C/R user's email."

This is not specific to C/R, but could hold for all systems that can create e-mail backscatter. So either we put this statement in all articles describing such systems, or someone puts it in the article about e-mail backscatter once and we refer to that article. I vote for the latter.

"Such challenges are also more likely to be filtered as spam by conventional spam filters, thus increasing the number of false positives suffered by C/R users [citation needed]."

I didn't understand this and it seems to be waiting for a citation since June.

"===Discrimination against the disabled ===" was removed, also because it's not specific to C/R; it belongs in the article about CAPTCHA tests, to which we can refer.

"===Challenges often look like spam or phishing e-mails === Many challenge messages look like phishing attempts since they often ask the receiver to visit an unfamiliar website or perform an equally confusing task. Other challenge messages contain advertising for the C/R system that is being used. Challenge messages that don't disclose any information about the original e-mail will often confuse the receiver of the challenge as to why they are getting it, but challenge messages that do disclose information will, in cases of forged e-mail addresses, effectively be forwarding the original spam."

I moved it here because this could use some discussion. I think it involves features of specific C/R systems (which are not specified, by the way) but doesn't add anything that is valid for the general principle of C/R.

"===Interaction with other challenge-response systems === C/R systems can interact badly with other C/R systems. If two persons both use C/R and one emails the other, the two C/R systems can generally exhibit one of three behaviors:

They become trapped in a loop, each challenging the other, neither one willing to deliver the challenge messages -- or the original message.
The C/R systems attempt to avoid this by automatically whitelisting addresses to which mail is sent.
They may recognize the challenge as "bulk" priority and elect not to challenge it, thereby avoiding a loop, but failing to deliver the message. In this case, SMTP-level rejections can ensure that the initial sender is notified of the failure by his/her own outgoing SMTP server, thereby bypassing the C/R."

As it says: C/R systems can interact badly... It's not general to C/R, and there's already a remark in the 'Best practices' section that C/R systems should comply with RFC 3834, which solves any such problem.

OldCar 08:17, 8 August 2007 (UTC)
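As a rough illustration of the RFC 3834 point above, the following Python sketch shows the kind of check that keeps two automatic responders from challenging each other. The function name is hypothetical and no particular C/R product is claimed to work this way. It assumes that challenges are themselves labelled "Auto-Submitted: auto-replied", as RFC 3834 recommends for automatic replies, and it also treats the non-standard but widely used Precedence values bulk, junk and list as a reason not to respond.

```python
from email import message_from_string
from email.message import Message


def may_auto_respond(msg: Message) -> bool:
    """Decide whether sending an automatic challenge to msg is appropriate."""
    # RFC 3834: do not respond when Auto-Submitted has any value other than "no".
    auto_submitted = str(msg.get("Auto-Submitted", "no")).lower()
    keyword = auto_submitted.split(";", 1)[0].strip()  # drop optional parameters
    if keyword != "no":
        return False
    # Precedence is not standardized; skipping these values is a common convention.
    precedence = str(msg.get("Precedence", "")).strip().lower()
    if precedence in {"bulk", "junk", "list"}:
        return False
    return True


personal = message_from_string("From: a@example.com\nSubject: hi\n\nHello\n")
other_challenge = message_from_string(
    "From: cr@example.net\nAuto-Submitted: auto-replied\n"
    "Subject: Please confirm\n\nClick to confirm\n"
)
list_mail = message_from_string(
    "From: list@example.org\nPrecedence: bulk\nSubject: digest\n\n...\n"
)

print(may_auto_respond(personal))         # a challenge may be sent
print(may_auto_respond(other_challenge))  # never challenge another challenge
print(may_auto_respond(list_mail))        # leave mailing-list traffic alone
```

Note that this only avoids the loop; whether the original message is then delivered, held or dropped is a separate policy decision, which is what the third behaviour in the quoted list is about.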

I removed the following criticism:

"===Effectiveness===

While C/R systems could be extremely effective at eliminating spam, critics state that the criticisms mentioned here will currently still force users of C/R systems to review their challenged mail regularly, looking for wanted mail for which the sender has not responded to the challenge."

As far as I know, false positives are unfortunately inherent to almost all of today's commonly used anti-spam measures, and therefore this remark doesn't add any criticism specific to C/R.
It is not true that users of C/R systems are forced to review their blocked messages. Quite the contrary, though this is my personal experience and opinion: when I was still using Bayesian filtering, before I discovered C/R, I regularly checked (part of) my blocked messages. Now every (non-whitelisted) sender gets a very polite request with a challenge and I feel very comfortable not checking them at all. A sender might still choose not to answer the challenge, but at least he KNOWS I didn't get it...

If there are critics who are determined that this aspect should be mentioned, I suggest changing the whole 'criticisms' section to a 'pros/cons' section, as I see the effectiveness of C/R as a big plus.

OldCar (talk) 17:45, 20 November 2007 (UTC)

Agreed. That's precisely the point of CR -- that you don't have to check any sort of "spam folder" Megacz (talk) 15:21, 21 November 2007 (UTC)

I removed a blog reference as one of the citations of criticisms, but it was reverted, so I'll post my reasons, wait a couple of days, and try again. The blog article in question is self-described as a "rant". The author is complaining, but he does not do any analysis or critique; he's merely annoyed. He is not, apart from these few comments, a critic of challenge/response or e-mail systems in general. Finally, the inflammatory title compromises the neutrality of the article. Do editors really consider something like this cite-worthy? Anyway, there is already a second citation, so on top of everything else, it's redundant. Robben (talk) 02:33, 13 January 2008 (UTC)

Agreed. I reverted it originally, but that's because I disagreed with your opinion that Jeremy isn't a WP:RS. If however we're already using that link as a cite, that's cool ... richi (talk) 12:00, 17 January 2008 (UTC)

Wait, no, I disagree (I misread your last point, sorry). Jeremy is a widely-read, well-known Yahoo! employee. Yes, the language he uses is strong, but his sentiments and the discussion they generated in the comments are extremely germane, IMHO. The ref has been there for some time, and I'd like to see some editorial consensus for its removal first ... richi (talk) 12:15, 17 January 2008 (UTC)

OK, Mr. Zawodny may be notable, but including a personal insult as a citation makes it difficult to maintain neutrality, even if the article is balanced in pros and cons later. You do know what "blow me" means, right? He's telling users of that software to suck his ****. Yes, that is strong language, as you put it.

If you honestly feel this statement *requires* more than one citation, then how about an alternative cite? Here's a professional article from a recognized source: http://www.usatoday.com/tech/news/2003-06-05-spam-challenge-response_x.htm. It has the signature quote we're looking for, "Challenge-response is very, very unfriendly and rude to legitimate senders of e-mail." Robben (talk) 12:19, 23 January 2008 (UTC)