Talk:Clustering illusion

This article is within the scope of WikiProject Philosophy, which collaborates on articles related to philosophy. To participate, you can edit this article or visit the project page for more details.
This article has not yet received a rating on the quality scale.
This article has been rated as mid-importance on the importance scale.

SAT Reference?

The claim that SAT answers are intentionally declustered should have an attribution.

Has this been proven?

In another example, Londoners during World War II developed elaborate theories on the impacts of Nazi V-2 rocket attacks on the city. Dividing up the city in certain ways seemed to produce clusters of bombings that were believed to be intentional. In fact there was no way the V-2 rockets could have been so precise, and any clustering was due solely to random variation.

Lazarus666 07:26, 4 Sep 2004 (UTC)

The example is from the referenced Gilovich book, page 19. Is there any specific part you have doubts about?

--Taak 23:14, 5 Sep 2004 (UTC)
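
For anyone who wants to see the effect described in the V-2 passage for themselves, here is a minimal sketch. It assumes impacts land uniformly at random on a square grid; the figures (500 hits, a 24 x 24 grid) and the function names are made up for illustration and are not taken from Gilovich or from any wartime record.

```python
import random
from collections import Counter

def scatter_hits(n_hits, grid_size, seed=0):
    """Drop n_hits points uniformly at random onto a grid_size x grid_size
    grid and return a Counter mapping (row, col) -> hits in that cell."""
    rng = random.Random(seed)
    return Counter(
        (rng.randrange(grid_size), rng.randrange(grid_size))
        for _ in range(n_hits)
    )

def summarize(counts, grid_size):
    """Return (number of empty cells, hits in the busiest cell)."""
    empty = grid_size * grid_size - len(counts)
    busiest = max(counts.values())
    return empty, busiest

# Hypothetical numbers, chosen only for illustration.
counts = scatter_hits(n_hits=500, grid_size=24)
empty, busiest = summarize(counts, grid_size=24)
print(f"empty cells: {empty}, most hits in one cell: {busiest}")
```

Even with no aiming at all, some cells collect several hits while many stay empty, which is the kind of apparent cluster the passage describes.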

The following quote is shamelessly borrowed, without permission - from V2ROCKET.COM, make of it what you will.

Several factors come into play for the "modest" number of V-2s Antwerp suffered each day, but the main reasons were the German bottleneck in their alcohol and liquid-oxygen supply and the enormous dispersion of the still imperfect weapon. Antwerp would probably have suffered more direct impacts if the Germans would have equipped all of their units with the Leitstrahl remote guidance apparatus instead of just the single SS 500 Batterie.

From other information on the same site one may notice that the Leitstrahl remote guidance apparatus equipped V-2 rockets had the ability to strike a target within 250 meters, even at a 250 kilometer range, whereas the less accurate version had a typical dispersion at the target of 4 to 11 km.

Lazarus666 18:47, 11 Sep 2004 (UTC)

Erroneous example?

User:80.175.217.179 removed the following, calling it an erroneous example in the edit summary:

Believing that a date and time with an obvious pattern (e.g., 01:02:03 04/05/06) is rarer (i.e., "won't ever happen again") than one without an obvious pattern (e.g., 07:03:34 10/24/06).

I see why one may find this erroneous, as nothing is really clustering here, but the intro defines the illusion like this:

the clustering illusion refers to the natural human tendency to associate some meaning to certain types of patterns which must inevitably appear in any large enough data set

- and that seems applicable.--Niels Ø 07:45, 7 May 2006 (UTC)

I'm the one who added the example, and I agree with Niels Ø. While the name of the illusion is clustering, the description encompasses various kinds of "pattern illusions". So could User:80.175.217.179 please give a brief explanation of why the example is erroneous? Thanks. --Nick 15:06, 12 June 2006 (UTC)


Theory v. Practice

                   "Consider the sequence "XXOXOXOOOXOXOOOXOX"; is it random?"

Yes, I considered it! As a practical gambler, I am obsessed with my own theory that "all sequences are random sequences if we define some limit". For that reason, I compared the given sequence with my database, built from diligent note-taking and analysis of gambling outcomes. I vouch for the validity and correctness of all my data (it is archived), although a few errors could have crept in. At present I have a corresponding RANDOM SEQUENCE, experienced in the casino, which matches the given one in sixteen (16) places. I suppose, though I cannot calculate it, that even this match has an extraordinarily low probability. (Moreover, the other day I experienced a series of actions which matched the random binary result eleven or twelve times. Interesting!) When I have finished my present tasks, I'll return to my search, and if I find the matching sequence I will post the date when it occurred and the exact data sequence which caused it. Until then, I hope that fate allows me to reach the level of knowledge necessary to understand your teaching. Yours with thanks 144.139.11.122 02:51, 21 June 2007 (UTC)

Your suggestion that "all sequences are random sequences if we define some limit" is not quite accurate. Of course, it depends on what definition of randomness you choose. If we use the standard definition that "outcomes cannot be predicted in advance", then it is certainly not true, since many sequences are generated deterministically. If you are using the definition of information theory, that a sequence is random if it cannot be described more concisely than simply by reading it out, then by definition some sequences are not random since otherwise the definition would be meaningless. For example, "00000000000000000000000000000000000000000000000000" can more easily be described as "50 0s". Robin S 16:40, 18 September 2007 (UTC)
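
As a rough illustration of the "50 0s" point, here is a minimal sketch using run-length encoding. It is only one simple way of describing a sequence more concisely, not the formal information-theoretic definition, and the function name is my own.

```python
from itertools import groupby

def run_length_encode(sequence):
    """Collapse a sequence into (symbol, run length) pairs."""
    return [(symbol, len(list(group))) for symbol, group in groupby(sequence)]

zeros = "0" * 50
mixed = "XXOXOXOOOXOXOOOXOX"  # the sequence quoted at the top of this section

print(run_length_encode(zeros))       # a single pair describes all fifty symbols
print(len(run_length_encode(mixed)))  # many short runs, so little is saved
```

The fifty zeros shrink to one pair, while the mixed sequence needs nearly as many pairs as it has symbols.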


I beg to differ! I am sorry that I cannot state my point(s) in axiomatic mathematical language, but my deliberate avoidance of formal mathematics led to my conclusion, which could be falsified like any philosophical thought. Regarding your fifty zeroes example: I am sure that somewhere in the UNIVERSE there once was, or there will be, a random sequence which corresponds to your sample. Moreover, with my very limited knowledge, I assume that the first combination of 37 roulette outcomes in 37 repeated trials with replacement would be: 00000000000000000000000000000000000, that is 37 zeroes, which is one of all the available outcomes (37 to the power of 37, which is the number of possible events; Bewersdorff, Luck, Logic and White Lies, Ch. 14, p. 90). Regarding the "definition": it is exactly what it says, a definition, made by human beings to find their way around the UNIVERSE until they lose their way. Then we will find a new "definition", of which transformations I have seen quite a few in my life. Yours 121.210.9.3 21:04, 4 December 2007 (UTC)
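
To put a number on the roulette example, here is a small sketch. It assumes a single-zero wheel with 37 equally likely pockets and independent spins; the constant names are mine.

```python
from fractions import Fraction

POCKETS = 37  # assumption: single-zero wheel, pockets 0 through 36
SPINS = 37

# Every ordered sequence of 37 spins is one of POCKETS ** SPINS possibilities.
total_sequences = POCKETS ** SPINS

# Any one specific sequence, including 37 zeroes, has the same probability.
p_all_zeros = Fraction(1, total_sequences)

print(total_sequences)
print(float(p_all_zeros))
```

Under these assumptions the run of 37 zeroes is exactly as likely as any other single sequence of 37 spins; it only looks more special.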

Sequence of Prime Numbers isn't Random?

Another way of looking at Robin's argument (above) is to say, "Is there a way to 'compress' the data?" Obviously the statement "write fifty zeros" is more "compressed" than "00000000000000000000000000000000000000000000000000", just as "write the first 20 natural numbers" is more compressed than "0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19". So one way of saying a sequence is random is to say "there is no more compressed way to describe this sequence."
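
The compression view can be tried out with a general-purpose compressor. This is only a sketch: zlib is a practical stand-in for "the most compressed description", not a measure of it, and the "noise" below comes from Python's seeded pseudorandom generator, so strictly speaking it too has a short description (the seed plus the algorithm) that zlib simply cannot find.

```python
import random
import zlib

def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed form of data (maximum effort)."""
    return len(zlib.compress(data, 9))

n = 10_000
rng = random.Random(0)  # fixed seed so the run is repeatable

zeros = b"0" * n                                     # highly patterned
noise = bytes(rng.getrandbits(8) for _ in range(n))  # no pattern zlib can use

print("zeros:", compressed_size(zeros), "bytes")
print("noise:", compressed_size(noise), "bytes")
```

The zeros collapse to a tiny fraction of their length, while the noise does not shrink at all.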

But this is exactly why I believe the article is incorrect to say that "OXXOXOXOOOXOXOOOXOX" is non-random. Simply attaching meaning to the sequence doesn't mean it's not random. Saying "an X stands for a prime number" doesn't predict when the next X will occur; in fact, it is well known that the distribution of prime numbers is (in one sense) random.

One might say, "But you can 'compress' the sequence by saying, 'X stands for a prime number.'" But this ignores the fact that calculating the next prime number takes more information than simply stating what the next prime number is. The algorithm for creating fifty zeros or twenty natural numbers is much shorter than a prime-computing function.

So I believe the statement about "OXXOXOXOOOXOXOOOXOX" being non-random should be removed or rewritten. --KSnortum 21:57, 2 December 2007 (UTC)
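
For reference, here is a minimal sketch of the rule under discussion: mark position k with X if k is prime and with O otherwise, counting positions from 1. The helper names are my own, and trial division is used only because it is short.

```python
def is_prime(n):
    """Trial-division primality test; adequate for small n."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def prime_marker_sequence(length):
    """Return a string whose k-th character (1-based) is X if k is prime."""
    return "".join("X" if is_prime(k) else "O" for k in range(1, length + 1))

print(prime_marker_sequence(19))
```

The first 19 positions reproduce the sequence from the article, and the same few lines extend it to any length. Whether that makes the sequence "non-random" is the question being argued here.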

This is truly tricky. If you write down 10 random dice throws, is it still random? It's what you actually got (which is now in the unalterable past); it is what is written on your paper; the probability that the first number is a six is either 0 or 1 (either/or). But if you then phone a friend and ask him about the first number, he'd say the probability is 1/6. So randomness is about what you know. If you know for a fact that OXXOXOXOOOXOXOOOXOX is defined by the prime sequence, it's not random; you'd even be able to generate the next letter with certainty. If you don't know (and haven't noticed the possibility), it is as random as your ten dice throws, and odds for the next letter being X are close to 50% (or might it even be a Z?!?). If you don't know but have hypothesised the connection to the prime sequence, it's something in between random and non-random. You might (quite subjectively, of course) estimate the probability of the next letter being X much below 50% - say if you had to make a bet on it.
And what does this mean in terms of improving our article? I'm afraid I don't really know.--Niels Ø (noe) 08:42, 3 December 2007 (UTC)
What I was trying to say about compression is said better here.
And I still think that just because you can calculate the next number in a sequence doesn't make it non-random. The digits in the decimal approximation of π are certainly random, yet they can be calculated. Taken to the extreme, you would never be able to produce random numbers, because you would always have to "know" how to calculate the next one! --KSnortum (talk) 04:35, 6 December 2007 (UTC)
You could use a physical noise signal to generate true random numbers. Numbers from a formula are pseudorandom numbers, at best. - If you ask someone to give you 15 random digits and they said 314159265358979 - wouldn't you say they'd been cheating?--Niels Ø (noe) (talk) 17:06, 6 December 2007 (UTC)
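
The point about numbers from a formula can be shown with a seeded generator. This sketch uses Python's standard random module; the seed value and function name are arbitrary choices of mine.

```python
import random

def digits_from_seed(seed, count):
    """Produce count decimal digits from a generator started at seed."""
    rng = random.Random(seed)
    return "".join(str(rng.randrange(10)) for _ in range(count))

first = digits_from_seed(2007, 15)
second = digits_from_seed(2007, 15)

print(first)
print(second)
print(first == second)
```

Anyone who knows the seed and the algorithm can reproduce every digit, even though the digits may look patternless to someone who does not.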

Patterns where none exist?

The intro says: "see patterns where actually none exist". I say: if one can see a pattern, then it definitely exists. SJ2571 (talk) 11:36, 30 January 2008 (UTC)

Clarification: I'm talking about seeing patterns literally. SJ2571 (talk) 11:37, 30 January 2008 (UTC)

Rewrite

I'm rewriting this article as it has lost sight of the original definition of the clustering illusion. "The natural human tendency to 'see patterns where actually none exist'" is an overgeneralization of its more specific meaning. Gilovich defines it as "the intuition that random events such as coin flips should alternate between heads and tails more than they do... Random distributions seem to us to have too many clusters or streaks of consecutive outcomes of the same type, and so we have difficulty accepting their true origins." As this person defines it: "The observation that people frequently view random distributions, for example, sequences of coin-tosses, as seeming to have too many clusters or ‘streaks’ of consecutive outcomes of the same type" [1].
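
To make the streak definition concrete, here is a minimal simulation sketch. It assumes fair, independent coin flips; the sample size (20 flips), the streak length (4), and the function names are illustrative choices of mine, not figures from Gilovich.

```python
import random
from itertools import groupby

def longest_run(sequence):
    """Length of the longest block of identical consecutive outcomes."""
    return max(len(list(group)) for _, group in groupby(sequence))

def share_with_streak(n_flips, min_run, trials, seed=0):
    """Fraction of simulated fair-coin sequences of n_flips flips that
    contain a run of at least min_run identical outcomes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        flips = [rng.choice("HT") for _ in range(n_flips)]
        if longest_run(flips) >= min_run:
            hits += 1
    return hits / trials

share = share_with_streak(n_flips=20, min_run=4, trials=10_000)
print(f"share of 20-flip sequences with a streak of 4 or more: {share:.2f}")
```

In runs like this, well over half of the short sequences contain a streak of four or more, which is why a genuinely random sequence tends to look "too streaky" to the eye.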

Much of the deleted content would be better placed in apophenia or pareidolia.


Since, according to a branch of mathematics known as Ramsey Theory, complete mathematical disorder in any physical system is an impossibility, it may be more correct to state, however, that the clustering illusion refers to the natural human tendency to associate some meaning to certain types of patterns which must inevitably appear in any large enough data set.

I'm not 100% sure what the above means, but the clustering illusion refers to seeing "clusters" or "streaks" in typical and most small samples of random data, not patterns that will eventually appear in a large amount of data.


Whether or not patterns exist in a data set can often be decided by means of statistical analysis, or even methods of computational cryptanalysis. The sequence "OXXOXOXOOOXOXOOOXOX" may appear random to most viewers, but if the position of the X's are associated with prime numbers, and the O's with composite numbers, the pattern is clearly non-random. Data compression algorithms are designed, in a sense, to "look for patterns" in data, and to create alternative representations from which it is possible to reconstruct the original data from a compressed form. Large datasets which contain "clusters" of a non-random nature can in general be expected to compress well, given the right encoding algorithm. On the other hand, if there is no real clustering, or pattern, in a particular data set, then one would expect it to compress poorly, if at all.

The clustering illusion is not about any kind of pattern, it's about streaks or clusters specifically, and in small sets of data, not large ones.

--Taak (talk) 00:23, 29 April 2008 (UTC)