Talk:Scunthorpe Problem

From Wikipedia, the free encyclopedia

Perhaps link to Medireview, since it is the same idea? - Jax

First paragraph could use some serious work toward conciseness. - IstvanWolf 04:06, 20 May 2006 (UTC)

Bullets 1 and 3 are not Scunthorpe problems; they're simple word filtering. Bullet 2 should make clear that it was the substring, not just the "word" Allah, that was blocked. The external link cited says that "Kallahar" and "Callahan" were blocked, so #2, once edited, will be a substring/Scunthorpe problem. The prose section is good but I think it would make sense to mention how rarely the short list of (American) obscenities are substrings of inoffensive (American) words. The problem disproportionately affects proper nouns and compounded or elided words (such as domain names or login IDs). —Preceding unsigned comment added by 63.251.87.214 (talk • contribs) 19:15, August 3, 2006

There are some fair points here. In its strictest sense, the phrase Scunthorpe Problem refers to substrings of letters within a word. However, I added bullets 1 and 3 because they are also examples of computers failing to show the sort of common sense interpretation of language that humans take for granted. While this may not please the purists, it could be argued that these fall within the general definition of the term Scunthorpe Problem since they are all examples of computer obscenity filters doing silly things. I'll also have a look at the first paragraph. --Ianmacm 06:27, 4 August 2006 (UTC)
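The distinction drawn in this thread between substring matching and simple word filtering can be illustrated with a minimal Python sketch. The function names and the one-entry block list are hypothetical, chosen only to mirror the Allah/Callahan/Kallahar case mentioned above; this is not a description of how any real filter is implemented.

```python
import re

def substring_filter(text, blocked):
    """Return True if any blocked term occurs anywhere in the text."""
    lowered = text.lower()
    return any(term.lower() in lowered for term in blocked)

def whole_word_filter(text, blocked):
    """Return True only if a blocked term occurs as a complete word."""
    for term in blocked:
        # \b anchors the match at word boundaries, so "Callahan" is not hit.
        if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            return True
    return False

# Hypothetical block list, for illustration only.
BLOCKED = ["allah"]

for name in ["Allah", "Callahan", "Kallahar"]:
    print(name, substring_filter(name, BLOCKED), whole_word_filter(name, BLOCKED))
```

Under these assumptions the substring filter flags all three names, while the whole-word filter flags only the first. The Scunthorpe problem in the strict sense is the behaviour of the first function.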

UK bias?

From reading this article, you'd almost think this sort of thing only happens in the U.K. But I think the real question is, does anyone call it this outside of the U.K.? — The Storm Surfer 08:29, 4 May 2007 (UTC)

This is an interesting point. Although most pieces of computer jargon were coined in the USA, the term Scunthorpe Problem comes from the UK and seems to have stuck. There is a piece about the Scunthorpe Problem on CNET at [1] which mentions the now famous problems that the people of Scunthorpe had in accessing the internet back in 1996. To the best of my knowledge, the term Scunthorpe Problem is still the best-known way of describing overenthusiastic computer obscenity filters.--Ianmacm 20:49, 4 May 2007 (UTC)

Text cleanup

Irina Slutskaya can be banned by the Scunthorpe Problem

The article has been given a slight rewrite and an image added showing the comical results when an attempt was made to do a Google search on Irina Slutskaya using a public library computer. The specific claim about the Salon.com message boards and Cialis was removed, because while it was made in good faith and may well be true, it lacks a citation.--Ianmacm 15:14, 15 May 2007 (UTC)

To expand slightly, the program that banned Irina Slutskaya was RM SafetyNet Plus [2], a popular piece of software used to filter internet content on public computers. I first came across bizarre behaviour by this program when it blocked a search on Procter and Gamble. After scratching my head for a while, I realised that it was blocking the word gamble, and it also blocked searches on the word casino. This is word filtering rather than a strict example of the Scunthorpe Problem, but it shows how computers can be tripped up in this area. RM SafetyNet Plus allows searches on the word dick, but blocks searches on the word cock.

The origin of the phrase The Scunthorpe Problem may be traced back to the article Google's chastity belt too tight on CNET news in April 2004 [3]. This contains a paragraph entitled The Scunthorpe problem, and may have helped to popularise the phrase, if not actually inventing it. It is fair to say that typing the phrase Scunthorpe problem into a search engine usually brings up either the Wikipedia or the CNET article, and there is not a great deal else to read.

Nowadays most computers have learned not to block words like Sussex or Scunthorpe, but there is still room for mistaken decisions, as this article shows.--Ianmacm 18:28, 17 May 2007 (UTC)

LiveJournal

I removed this from the article pending further research:

* In 2007, the filtering of the racial slur spic from the LiveJournal search engine prohibited users from searching for spicy food, the Spice Girls, or hospice.

This appears to be based on a blog entry at [4]. Intrigued by this, and looking for more information, I created an account at LiveJournal and did a search on the Spice Girls and some of the other "banned" terms. Although they did not return results, the search engine did not warn that the terms were banned on the grounds of taste and decency. Some clear confirmation is needed here, or there is the possibility of an urban legend slipping into the article. Further comments on this are welcome. --♦IanMacM♦ (talk to me) 11:26, 9 December 2007 (UTC)

We don't have a source - people have noticed that some terms have become unsearchable and have been experimenting with it. There does seem to be a hit list of terms based on sex, ethnic insults and some Nazi-related stuff. However, there are no real sources for this stuff - it's not published outside of LJ and isn't officially admitted to as far as I know. This is too recent to make WP I think, and needs to be picked up by someone else first. Secretlondon (talk) 16:47, 9 December 2007 (UTC)
I don't think it's required that the term be blocked out of reasons of taste or decency - it still counts as an example of filtering based on substring, which ends up catching longer innocent terms. But yes, I agree it's probably best to leave out for now - I think entries should really have some 3rd party source, otherwise it ends up as original research. Mdwh (talk) 17:32, 9 December 2007 (UTC)

A more up-to-date "hit list" is at [5] (by the same author as the original blog entry; slightly cleaned-up language; more accurate list). Confirmation of the code ignoring some interest searches and returning an error message is at [6]. LiveJournal staff admit the situation exists, but the admission is buried in discussion threads at the official announcement communities. Waiting on more detailed official comment or explanation, which has been promised. Elfwreck (talk) 18:42, 9 December 2007 (UTC)

Thanks for the feedback. This is an interesting situation, and if true it would be well worth mentioning in the article. As the other contributors have pointed out, the situation with the evidence at the moment is anecdotal and could be seen as original research. No mainstream media outlet has picked up on this yet, and the staff at LiveJournal have not formally confirmed that the list of banned terms exists. For these reasons it is right to remain cautious at the moment, although the information could be included at a later date. --♦IanMacM♦ (talk to me) 19:38, 9 December 2007 (UTC)

Datapoints: LiveJournal added the interest-search-blocking code mid 2007: the changes are visible in the changelog (http://community.livejournal.com/changelog/5260013.html). I have extensively tested elfwreck's findings (I have no previous connection to elfwreck) and blogged about them here (http://viv.id.au/blog/?p=1205). —Preceding unsigned comment added by Waawa (talk • contribs) 02:22, 10 December 2007 (UTC)

We still can't use blogs as a source. This has a comment by an LJ engineer. She says it was implemented at the end of October and has only changed significantly once. However, this is still very recent, not officially admitted to, with no reliable sources. Secretlondon (talk) 05:12, 10 December 2007 (UTC)

Update: it appears that LJ has now said as much as they're going to say at this stage. A staff member representing LJ has updated here admitting to the search interest blocking, and explicitly refusing to elaborate. Further here, in reference to the false-positive substring blocking issue: "Unfortunately, we can't clarify the problem which caused the blocked search-terms. Over time we hope to eliminate terms and substring matches, but it won't be something we *can* comment on." —Preceding unsigned comment added by Waawa (talk • contribs) 02:27, 20 December 2007 (UTC)

I'm a LiveJournal user who's taken an interest in this, and at the moment I believe that the situation is as follows. Most of the blocked terms are still blocked. Matters are slightly confused by the fact that a different interest-"censoring" problem has come to light (and, I think, been fixed) in the meantime, but the blocks we're discussing here are still in place. As far as I know, there has been no further staff comment at all, even on the lines of "we can't say anything" or "we're under legal constraints" or whatever. Most pertinently to us here on WP, also as far as I know there has been no coverage at all of the issue in anything we could use as a reliable source. I did wonder whether The Register might pick up on it, even if only to snigger, but no, not a peep. (And yes, "snigger" is blocked as well.) Loganberry (Talk) 16:00, 28 March 2008 (UTC)

Cumming

I removed this from the article:

  • Genealogists researching the surname Cumming have found that their correspondence has been blocked.

This was done partly because it lacked a citation, and partly because it repeated the point about magna cum laude being blocked. --♦IanMacM♦ (talk to me) 10:19, 21 January 2008 (UTC)

A quick update: RM SafetyNet Plus [7] managed to block both magna cum laude and Cumming, in addition to Irina Slutskaya as mentioned in the article. This program is an excellent example of the problems caused by enforcing rigid rules about perceived "obscene" strings of letters. --♦IanMacM♦ (talk to me) 09:02, 23 January 2008 (UTC)
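When a filter blocks something unexpectedly, as with the Procter and Gamble search described earlier on this page, it helps to see which term triggered the block and whether it matched a whole word or only part of one. The sketch below is a hypothetical diagnostic helper in Python. The term list is an assumption made for illustration, and nothing here reflects the actual rules used by RM SafetyNet Plus or any other product.

```python
import re

def find_matches(text, blocked):
    """List (term, word, kind) for every blocked term found in the text.

    kind is "whole word" when the word equals the term, and "substring"
    when the term is only part of a longer word.
    """
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):
        lowered = word.lower()
        for term in blocked:
            if term in lowered:
                kind = "whole word" if lowered == term else "substring"
                hits.append((term, word, kind))
    return hits

# Assumed term list, for illustration only.
BLOCKED = ["cum", "gamble"]

for query in ["magna cum laude", "Cumming", "Procter and Gamble"]:
    print(query, find_matches(query, BLOCKED))
```

With this assumed list, "magna cum laude" and "Procter and Gamble" are reported as whole-word matches and "Cumming" as a substring match. This suggests that switching a filter to whole-word matching would help with the surname but not with the Latin phrase.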

The opposite of the Scunthorpe Problem

On the other hand, some deviant Net users purposefully bypass a swear filter, creating the opposite of the Scunthorpe problem.

Examples

  • For the F-word: "phuc," "phuck," or "phukk."
  • For the A-word: "Azzholl," "asholl," "ashol," or "azzhoe."
  • For the B-word: "Bytch," "Bithc," "Bittch," or "Bicch."
  • For the S-word: "Sith," "siht," "shyte," or "5hi+."

Other examples apply, especially those in leetspeak. (Added 10 April 2008 by 220.55.128.75)

fsck is interesting too, and is also a minced oath. -- 201.9.109.30 (talk) 23:02, 3 May 2008 (UTC)