Talk:Sieve of Eratosthenes
From Wikipedia, the free encyclopedia
Contents |
[edit] Comments
Apparently:
KOΣKINON EPATOΣΘENOϒΣ. or, The Sieve of Eratosthenes. Being an Account of His Method of Finding All the Prime Numbers, by the Rev. Samuel Horsley, F. R. S. Samuel Horsley Philosophical Transactions (1683-1775), Vol. 62. (1772), pp. 327-347.
Is the paper that popularized this algorithm, if anyone wants to fill in historical details.
--Imran 21:15 13 Jun 2003 (UTC)
If a person wants fully working code, they can check code repositories. If a person wants performance hacks, they can check programmer's journals. Wikipedia is neither a code repository nor a programmer's journal.
Pseudocode is used on Wikipedia as a last resort to explain an algorithm, because most programming languages are not readable by common encyclopedia readers. There is also no excuse for repeating the fairly understandable explanation of the algorithm with pseudocode. Feel free, however, to insert external links to either. For an example, see Sieve of Atkin, which, while it does not have a good explanation yet, does have an external link to working, optimized code. — 131.230.74.194 21:28, 11 July 2005 (UTC)
I added a more simplified pseudocode -- ba 24.57.157.81 06:52, 5 March 2007 (UTC)
[edit] "A Reduced Redundant Exclusions (RRE) Prime Sieve"
The main reason for the inefficiency of the Sieve of Eratosthenes is simply that it spends much time excluding non-primes which have already been excluded. This is not so apparent in the range 1- 100 where it is usually demonstrated. But a quick check of prime densities: 25 upto 100; 168 upto 1,000; 78,498 upto 1,000,000 and 50,847,534 upto 1,000,000,000 makes it obvious that a prime sieve that reduces the number of these redundant exclusions will go much faster. I was unable to check the Atkins Sieve due to it's complexity but a simple variation of the Sieve of Eratosthenes will vastly reduce redundant exclusions if not eliminate them altogether:
Step 1 Sequentially write down the integers from 2 to m
Step2 Cross out all numbers > 2; which are products of 2 and numbers in the table >=2
Step 3 Find the smallest remaining number >2; it is 3
Step 4 Cross out all numbers > 3; which are products of 3 and un-crossed out numbers, in the table, >=3
Step 5 Find the smallest remaining number >3; it is 5
Step 6 Cross out all numbers > 5; which are products of 5 and un-crossed out numbers, in the table, >=5
Step 7 Continue while " the smallest remaining number" squared is =< m
—Preceding unsigned comment added by Rhnmcl (talk • contribs)
- I'm sorry, but I don't see any difference with the algorithm in the article, except for redundancy in your description (all numbers >n are >=n). Qwertyus 12:46, 17 June 2006 (UTC)
Consider step 4 (for example); " the algorythm in the article " will cross out { 6,9,12,15,18, etc ); where 6,12,18 are redundant exclusions....they have already been crossed out ! Whereas step 4 (if read as I intend) will cross out only ( 9, 15, 21, 27, 33 etc} —Preceding unsigned comment added by Rhnmcl (talk • contribs)
- The article already states: The crossing-off of multiples can be started at the square of the number, as lower multiples have already been crossed out in previous steps. This saves a lot of time, particularly for large numbers. Qwertyus 15:30, 19 June 2006 (UTC)
A good first step is to disregard all multiples of two: that is, the sieve represents only odd numbers. This is easy enough, because the pattern is just as simple as for consecutive numbers, you just have to be careful ablout the starting point for striking out every third, fifth, seventh, etc. Having the sieve elements not represent multiples of three as well as more difficult, and matters worsen if even more primes are not represented. Possibly this is what the fancier method involves for a certain collection of primes to be skipped. Another ploy is to choose the seive length to be the product of the first few primes, say N, with the intention of producing a batch of primes over 0 to N - 1, then start again with N to 2N - 1 for a second batch, and so on for however many batches you wish (since in actual computers, sieve arrays cannot be of indefinite length). The point of this is that the initial pattern of the sieve is the same and so does not have to be prepared more than once. But one can do better...
- The above ploy works well if the sieve length is the biggest primorial that fits in fast memory. For PCs with i86 processors, 16-bit intrasegment addressing is fastest and accomodates a sieve size 17# = 2.3.5.7.11.13.17 = 510510, letting each data bit represent an odd number.Cuddlyable3 22:43, 9 September 2007 (UTC)
[edit] No redundant exclusions
Nevertheless, it is irksome to sieve away and strike out elements that have previously been struck out when seiving with a smaller prime. This can be handled. So far, prime production is via division (test factorisation), and addition (seiving). Now comes prime production via multiplication, relying on the fundamental theorem of arithmetic that every number has a unique prime factorisation. The plan therefore is to generate all products of primes, and strike out the corresponding element of the seive: there will be one strike only on each composite number. Imagine a sieve of some large size. Start a process say "Breed" with two; it generates all powers of two and naturally, strikes them from the sieve up to the limit of the sieve. Further, for each power of two (being 1 to whatever) it takes that number and commands a Breed with that and three (the next higher prime) and five, and seven, and so forth until the multiple exceeds the seive length. Likewise start a basic Breed with three to generate all powers of three and further with those powers and the next primes up, five, seven, etc. And likewise a basic Breed with five, and so on, until the first level would start beyond the length of the seive. In this scheme it is helpful to schedule matters so that the smaller composite numbers are generated first so that the needed additional primes for the process can be extracted from the seive as it is being adjusted.
As for running speeds, the administration is much more complex. Clearly the multiplicative process requires time slightly less than linear with respect to the highest number N, being one action for every composite number, of which there are N less the number of primes up to N, or N - (N/Ln(N)) approximately, whereas the additive sieving method requires one pass for every prime less than the square root of N (very slightly less), each pass of which requires N/prime(i) actions, that is N/2 actions for 2, N/3 for 3 and so on; the summation of 1/p(i) for i = 1 to n, such that p(n + 1)**2 > N. The summation of 1/p(i) has been studied (by Hardie, I recall), but I lack the reference just now. Whichever, the action count is proportional to N*(summation of 1/p(i)), or N*(something that increases with N), which is clearly more than linear.
Supposing that multiplication and addition take about the same time (analysis of the processes say they have the same complexity, but offer little advice on methods to perform the processes), accordingly, for ever larger N, the multiplicative generation of composite numbers should eventually outrun the additive generation with redundancy. NickyMcLean 22:13, 18 June 2006 (UTC)
Apart from the computational advantaages of a "zero redundant exclusions" prime sieve; which remain to be established. There is the theoretical possibility that such a sieve could be used to establish an expression for the number of compound numbers less than n; and hense the number of primes less than n, say p; from p = n - c;regards rhnmcl
[edit] A Fully Deterministic Boolean Approach For The Sieve of Eratosthenes
--Ausples 22:46, 14 May 2007 (UTC) The following algorithm/code has runtime of Big-O(N−P). This means the range we are checking for prime numbers minus the number of primes found (there are no calculations for finding a prime number here, only true/false checks, primality was determined based on previous factors).
In computer science you can use boolean conditions to create the Sieve of Eratosthenes that runs sub-linear Big-O(N-P), by using a boolean memory location for each prime number in N, as follows:
O(100000000);
private void O(int N) { //<-- start copy.
int S = N; int p = 0; //this will always be a prime number only. Pre-Deterministic. int d = 0; //this will always be a factor of a prime number only. Deterministic. bool[] primeCheck = new bool[S]; //create a boolean array for the length to be checked for primes 0->N. for (p=2; p<S; p++) { switch(primeCheck[p]) { case true: continue; case false: { for (d=p+p; d < S; d+=p) { switch(primeCheck[d]) { case true: continue; case false: primeCheck[d] = true; break; } } //trace(p); //p will be the NEXT prime number. break; } } }
} //<-- end paste.
Functional closed circuit case.
Closed Circuit Time Testing
6/01/2006 4:19:20 PM
Scope(O): 100,000,000
Primes Found(P): 5,761,455
Memory Operations(N): 94,238,545 (Actual CPU runtime operations) (Time Complexity. Processor memory operations)
Time(Actual):
hours:0
minutes:0
seconds:10
milliseconds:843
6/01/2006 4:19:31 PM
As you can see, the code finds 5.7 million primes in 100 million possible choices in N-P operations. Always. This code means that it is possible to run the Sieve of Eratosthenes in sub-linear Big-O(N-P) operations on a computer without factoring. Modifying the "for" loops to skip 2's does not improve the time complexity, nor does it improve the speed of the actual runtime.
At first glance, it would seem this algorithm is of complexity Big-O(N^2) or simply N^2, as they are nested loops. However, the sieve logic predetermines what the computer will or will not operate on at runtime. The computer algorithm will only operate on memory once for each location and does not operate on prime locations (but may optionally output at runtime). This gives us a sub-linear Big-O(N-P) where 0->N is the range the computer will check for primes, and P is the number of prime locations visited. This algorithm is of course restricted to finding primes in the range 0->N, where N can be as large as the maximum array size allowable by the compiler of your choice. There is no math in this code. We are not factoring primes here. They are fully determined during runtime. Big-O encapsulates the time complexity of loop control and conditional structures used for any program, as they are constants.
We use control structures to determine true/false conditions in the boolean array to determine a prime location based on when the memory is checked, which are constants for Big-O notation. Since we are concerned with primes, we are concerned with location rules, just like calculus, which is always represented as a straight line, for a polynomial equation. In this case, the above code appears to be N^2, but further examination of loop structure, control structure and runtime mapping prove complexity Big-O(N-P). The rules of the Sieve of Eratosthenes can be found in this article from Wichita State University's Math and Statistics department and the National Institue of Standards and Technology, respectively ->-> [1],[2]. We determine prime locations in range 0->N by marking all non-primes using the inner loop, then visiting the next prime using the outer loop (which will be predetermined), then repeating until we reach N. This is sub-linear by virtue that P is not a constant. If P were a constant, then the time complexity would be Big-O(N). Above it is Big-O(N-P). There is no log N here. We always know that the algorithm will produce the same results each time for infinite N. There is no guess work, or modulus factoring.
Bibliography:
A1. Computer Science: An Overview by J. Glenn Brookshear, 1997 Addison Wesley publishing company. 0.3 The Evolution of Computer Science, p. 9 (Conceptual Theory)
B1. The World Treasury of Physics, Astronomy, and Mathematics: Edited by Timothy Ferris, Time Warner Book Group. Part Three: The Cosmos of Numbers::Artificial Intelligence and All That:::”The Computer and the Brain” by John Von Neuman 1958. p.483. (Full excerpt: p478-491) (Application Theory)
B2. The World Treasury of Physics, Astronomy, and Mathematics: Edited by Timothy Ferris, Time Warner Book Group. Part Three: The Cosmos of Numbers::Artificial Intelligence and All That:::”Can a Machine Think?” by Alan Turning ~1950. p.495-496. (Full excerpt: p492-519) (Application Theory)
C1. Algorithmics: The Spirit of Computing by David Harel, 1987~1996 Addison Wesley publishing company. 6.3 Order of Magnitude Improvements. p. 130-131. (Conceptual Theory)
D1. The Elegant Universe: Superstrings, Hidden Dimensions, and the Quest for the Ultimate Theory by Brian Greene, 1999 Vintage Books. The Super In Superstrings. p. 166,167 (System~Model Conceptual Theory)
Keep in mind this is a clever computer science solution, not a conventional one. --Ausples 16:04, 11 May 2007 (UTC)
- O(N-P) is not sublinear, but linear: O(N-P) = O(N - N/ln N) = O(N). -- Jitse Niesen (talk) 16:09, 11 May 2007 (UTC)
- Below comments of the form "<-- (...)" were inserted by Ausples. PrimeHunter 18:04, 11 May 2007 (UTC)
Comments by Ausples in PrimeHunter's comments removed by Ausples. Sorry about that. Won't happen again. :-) --Ausples 21:52, 14 May 2007 (UTC)
- Your code makes the assignment "primeCheck[d] = true" when the first prime factor of the composite d is found. There are N-P composites, so the instruction is performed N-P times. As Jitse Niesen notes, O(N-P) = O(N). See Big-O notation.
- But reading memory is also a memory operation, and it takes time (at least one clock cycle and usually more). So all evaluations of primeCheck[d] in your "switch(primeCheck[d])" must also be counted. The total number in the nested loop is above linear. There does not exist an implementation of the Sieve of Eratosthenes which is sublinear or linear in total memory operations or time. It can easily be modified to use sublinear memory size but that is another issue. Also note Wikipedia:No original research. Even if you think you have a sublinear algorithm, you are not allowed to add that claim to an article unless it has been supported by a reliable source. By the way, I have implemented the Sieve of Eratosthenes at least a dozen times and know it well. The Sieve of Atkin is asymptotically faster. PrimeHunter 16:37, 11 May 2007 (UTC)
The coded algorithm does not operate on prime number locations, therefore, it runs Big-O(N-P). As above. We do not know until the end of N how many exist in P, so P is not a constant. P is different for each N, therefore it's not constant. If it were, we could concede this to be Big-O(N). If it moves faster than N, but the variable P is not known until the end of runtime, the algorithm time complexity is reduced by the factor P, so it must run at Big-O(N-P), which is less than linear for some factor which equals the total number of primes found in N. How do we say that in Mathematics? Is that not sub-linear? Also, we're not concerned with the physical time results. Only that it runs algorithmically in Big-O(N-P). This also prevents overworking your processors and wear on the memory chips by not operating on them so much. :-/ --Ausples 17:24, 11 May 2007 (UTC)
- Please don't delete talk page comments by others (unless they are vandalism, or deletion is allowed by a specific Wikipedia guideline). Write your own comments below and not inside the comments of others.
- The "segmented" Sieve of Eratosthenes (you can Google it) uses sublinear memory. People often omit to say "segmented". There are often claims that the Sieve of Eratosthenes is linear in time (also in Wikipedia), but it's actually slower. However, it's so close that the difference doesn't matter in practice at the sizes computers can handle. The sieve detects each composite d for each prime factor below sqrt(d). And the average number of prime factors is log (log N) which grows very slowly.
- Apparently we have different definitions of "memory operations", but the important issue is time, and we agree that memory reads take time. Sublinear time means o(N) time. (Note the little o, it's not O(N)). See Big-O notation for definitions. PrimeHunter 18:04, 11 May 2007 (UTC)
You can test how many memory operations by incrementing a counter, once after a line of code that sets a value in memory. e.g. "x=3". or in the case above, "primeCheck[d] = true". Although, intereseting to note: Asymptotic Time Complexity-The limiting behavior of the execution time of an algorithm when the size of the problem goes to infinity. This is usually expressed in Big-O notation ->[3].
So then based on Computational_complexity_theory.
This algorithm would have following complexities:
Time Complexity: Big-O(N-P) ~Here we are using memory operations time. Conditional operators are not expressed in Big-O time complexity. That is a technological limitation, not an algorithmic one.
Note1: The "continue" statement forces the processor to move on before any operations occur.
Note2: There is no known equation that produces primes, because it's exclusively an algorithm (thus a computer science problem). This can only be accomplished through an computer algorithm, following the instructions of the algorithm designer. This is exclusively a Sieve of Eratosthenes.[4],[5]
Note 3: The algorithm starts slow (it marks the smallest prime along N before visiting the next prime) but gets faster as P gets bigger. At N/2 it has finished checking off non-primes and only visits prime locations. So log(log N) does not apply for the duration of the algorithm, if it ever did. It still gets done in N-P operations. So for N=100,000,000 it will take the time it takes the processor to operate on N-P bits of memory (for Big-O(N-P)) but there will also be constant time for the control structures (which again are not expressed in the notation.[6],[7],[8]). --Ausples 18:38, 11 May 2007 (UTC)
- Circuit complexity is not about time. Time complexity is the asymptotic total number of individual steps needed to solve a problem. It's not limited to memory writes. And as I said, it's not linear for the Sieve of Eratosthenes. It's actually O(N*log log N) for primes up to N. Counting steps or measuring time in examples with specific N can only allow guessing of the asymptotic behaviour. The algorithm must be analyzed mathematically to prove the complexity. Other qualified mathematicians have already done that, and the log log n term is real. There does not exist a constant k such that the total number of steps is below k*N for all N. This means the algorithm is not linear.
- Category:Primality tests mentions several algorithms which can prove primality without testing potential prime factors, but that is unrelated to the Sieve of Eratosthenes. PrimeHunter 22:10, 11 May 2007 (UTC)
- okay. we said above it was linear, now it's not. Which is it? And why? We need more computer science references. It seems like we're not sure, but I am. Can we find another *I* on this? Jitse says it is not sub-linear, but it is linear. Prime Hunter agrees, but no longer agrees. Anyone else? --Ausples 21:56, 14 May 2007 (UTC)
- Jitse only noted that O(N-P) = O(N) is linear and not sublinear, and I agreed. None of us claimed it was the time complexity of the Sieve of Eratosthenes. PrimeHunter 23:34, 11 May 2007 (UTC)
- Jitse only noted that O(N-P) = O(N) is linear and not sublinear, and I agreed. None of us claimed it was the time complexity of the Sieve of Eratosthenes. PrimeHunter 23:34, 11 May 2007 (UTC)
-
-
- Computer science algorithm time complexities-> [9]. Test the code; runs O(N), "running in at most N steps on inputs of length N exists." It seems like that sums it up. For computer science, this code has algorithm time complexity O(N-P) processor memory operations. Which according to the books below, is how we guage time complexity in computer science. (as above):
- Computer science algorithm time complexities-> [9]. Test the code; runs O(N), "running in at most N steps on inputs of length N exists." It seems like that sums it up. For computer science, this code has algorithm time complexity O(N-P) processor memory operations. Which according to the books below, is how we guage time complexity in computer science. (as above):
-
A1. Computer Science: An Overview by J. Glenn Brookshear, 1997 Addison Wesley publishing company. 0.3 The Evolution of Computer Science, p. 9 (Conceptual Theory)
C1. Algorithmics: The Spirit of Computing by David Harel, 1987~1996 Addison Wesley publishing company. 6.3 Order of Magnitude Improvements. p. 130-131. (Conceptual Theory)
The Sieve of Eratosthenes is an NP complete problem. [10] so far, this is the only known sieve that satisfies the computer science definition for time complexity as per references above in N. Could another computer scientist explain further? Or explain why this is not Big-O(N-P) algorithm time complexity based on computer science, not in math? This isn't just a math problem. Math just claims it, but still can't solve it in N. We want it back. This is OUR Sieve of Eratosthenes.
--Ausples 01:38, 12 May 2007 (UTC)
- I seriously doubt any of the mentioned books use your time complexity definition which does not indicate the time to run the algorithm. An algorithm with eN iterations of a loop but no memory writes would have time complexity 0 by your definition. NP-complete problems are a class of decision problems. The sieve of Eratosthenes is an algorithm and not a decision problem. It makes no sense to say it's NP-complete or even discuss it. Your reference [11] says nothing about the Sieve of Eratosthenes. PrimeHunter 02:47, 12 May 2007 (UTC)
- When you set X=3. That is one memory operation. At runtime, the algorithm sets N-P bits of memory according to Eratosthenes instruction to determine that the next location is prime. What is "0" complexity in Big-O notation is the time to check whether the space is true or false(this is part of a conditional control structure, and constant as far as expressing Big-O is concerned). It writes to memory once and only once for each memory location that does not correspond to a prime number location. You can compare against time complexity of other known solutions, by adding a counter for each time the computer operates on memory, inside the control structures, after the memory is assigned.--Ausples 21:56, 14 May 2007 (UTC)
- Checking for true or false takes time so I would not call it "free". I don't think you have a reliable source (or any source) which says that it can be ignored in time complexity. If I run a program, I care about how long it takes, and not about how the time is split between memory reads, memory writes, boolean checks, and so on. PrimeHunter 13:03, 12 May 2007 (UTC)
- The above references are college text books and fit the requirements for reliable sources. Here, Wikipedia says that PRIMES are in NP or coNP. Primality_test#Complexity. We are still not concerned with the actual time in seconds (again, technological limitation), as loop control structures and conditional structures are constant time. They factor into the time complexity as C*O(N) which makes them arbitrary in Big-O notation (see above references). (although, this algorithm's actual time so far is less than any other known sieve, probably because it's not factoring. It is determined at runtime without calculation (i.e. factoring integers... 7%7=0 makes 7 prime, etc). One could not apply the current conventional rules to this algorithm. This has little to do with math. It's all boolean logic. This algorithm follows Eratosthenes requirements listed in the 3rd paragraph here -> [12]. This also is a reliable source. So far, we find this is pretty general knowledge and known. The time complexity is correct for memory operations. The loops are constants. So, I believe that it is safe to say that the algorithm/code above has runtime complexity of C*O(N-P). We can drop the C, since C is constant, and say Big-O(N-P). ->[13],[14]. I would suggest testing the code above against other sieves for Big-O, for your personal satisfaction. One might have to detach themselves from mathematical thoughts about the Sieve of Eratosthenes as it's not really a math problem, what if we didn't use numbers? What if we used letters? He was pretty clear about the requirements, he simply used the number stream as an example for a prime sieve, and gave us instructions on how to get to each prime location one in linear time (N). The code above literally follows his instructions.[15],[16]. Most current implementations use factoring to determine primality. Math. Again, we are determining the prime factors at runtime above, as per the written user requirements. For comparison consideration, here is a square root implementation (so far, fastest published deterministic Sieve of Eratosthenes code that could be found) that has been tested against the above code. The following referenced algorithm comes from a graduate level, advanced artificial intelligence class. Read Algorithm Implementation 1 here.->[17]. This referenced Sieve of Eratosthenes is also deterministic. It uses an initialization loop, a nested loop, and another loop to display the primes. It does not report at runtime (this a polynomial solution). In the code above, we visit prime locations (and may report them) at runtime, in Big-O(N-P). Again, there are conditions. You must start at 2, and you must have memory size for 0->N. There is no factoring. Boolean logic only.
Note 1: If you want it to go faster, just add processors to reduce actual runtime to (N/number of processors). This is basic parallel computing.
--Ausples 04:28, 13 May 2007 (UTC)
-
- The only thing important for Wikipedia is whether there is a reliable source stating whether the algorithm you show runs in linear time.
- The question whether the algorithm does in reality run in linear time may be interesting, but it is not of relevance for Wikipedia. We all agree that the line primeCheck[d] = true is executed N−P times. We disagree about whether the line switch(primeCheck[d]) is free, as you claim; where in what text book did you find this? What computational model are you using? And we haven't even talked about the instruction d+=p . -- Jitse Niesen (talk) 06:23, 13 May 2007 (UTC)
- Switches are conditional control structures set up by the compiler, therefore, it does not interfere with Big-O time complexity. Second paragraph and conditional section for if/else(applies to switches) -> [18]. Again, a switch is not an operation, but a conditional control structure set up to reduce algorithm operational runtime and improve actual time results making them arbitrary for expressing the time complexity in Big-O. Switches check for true and false. Memory operations that require the computer to change memory are 1 each in complexity for the duration of the structure that controls the operation statement. We make N-P statements throughout the algorithm. Our worst case time complexity could therefore be said to be Big-O(N-P). As for d+=p: one must control the loop counter in order to produce the desired result of sieving out the next non-prime, before the location is checked by the outer loop structure, as it has been determined that "p" is a prime location based on the rules of the algorithm written by Eratosthenes. Further, using "continue" tells the compiler to move to the next location before any operation/calculation is made by the processor. This reduces the number of operations/calculations to N-P. The algorithm knows a location is prime by virtue of the fact it will be the next location it visits in the outer loop. Please ask more questions. A graph of actual test data using real time output (i.e. Sec/Millisec), will still produce a linear result. --Ausples 10:42, 13 May 2007 (UTC)
- As the link you gave says, not only statements incur a cost but expressions do too. There is a cost associated with the expression primeCheck[d] in the switch statement; O(1) according to the lecture notes. So they're not free. -- Jitse Niesen (talk) 11:03, 13 May 2007 (UTC)
- Exactly. And that O(1) expression is evaluated O(n*log log n) times in total, so the time complexity (following conventional definitions) is at least that. Ausples wrote: "One could not apply the current conventional rules to this algorithm". That sums up the main problem. Ausples is in WP:OR territory and Wikipedia does not allow that. By the way, as one who has implemented and seen the sieve many times, I can say with confidence that Ausples' implementation is relatively slow in actual running time compared to many other implementations. PrimeHunter 11:39, 13 May 2007 (UTC)
- I believe that if you plot the result, it looks linear. It's hard to see the difference between O(N) and O(N log N) and O(N log log N), and in practice there is little difference. -- Jitse Niesen (talk) 11:24, 13 May 2007 (UTC)
- As the link you gave says, not only statements incur a cost but expressions do too. There is a cost associated with the expression primeCheck[d] in the switch statement; O(1) according to the lecture notes. So they're not free. -- Jitse Niesen (talk) 11:03, 13 May 2007 (UTC)
The statement->"One could not apply the current conventional rules to this algorithm"--retracted. You can certainly apply conventional computer science rules here, as this algorithm follows those standards (see above references). Prime Hunter wrote: "I can say with confidence that Ausples' implementation is relatively slow in actual running time compared to many other implementations." Again, this does not matter for Big-O notation. Again, this is a technological limitation, not an algorithmic one. Big-O(N-P) does not change from computer to computer. But here is some hard data from my system:
The processor speed is really arbitrary here. It could be done with a 1MHz processor, it's still going to come out as Big-O(N-P)). You will want to plot starting at N=1,000,000(1M) as the computer does not spend any actual time in milliseconds finding primes until N=1,000,000. (And apparently finds 1,624,528 primes when N=26,000,0000 in about 2 and a half seconds on my system. At higher ranges, it still runs Big-O(N-P))
|N (range 0->N)| | |Time in Milliseconds| | |Primes Found| | |Total Processor Memory Operations (N-P)| |
1000 | 0 | 169 | 830 |
10000 | 0 | 1230 | 8769 |
100000 | 0 | 9593 | 90406 |
1000000 | 46 | 78499 | 921500 |
2000000 | 171 | 148934 | 1851065 |
3000000 | 265 | 216817 | 2783182 |
4000000 | 343 | 283147 | 3716852 |
5000000 | 484 | 348514 | 4651485 |
6000000 | 562 | 412850 | 5587149 |
7000000 | 656 | 476649 | 6523350 |
8000000 | 750 | 539778 | 7460221 |
9000000 | 906 | 602490 | 8397509 |
10000000 | 921 | 664580 | 9335419 |
11000000 | 1015 | 726518 | 10273481 |
12000000 | 1156 | 788061 | 11211938 |
13000000 | 1250 | 849253 | 12150746 |
14000000 | 1406 | 910078 | 13089921 |
15000000 | 1468 | 970705 | 14029294 |
16000000 | 1578 | 1031131 | 14968868 |
17000000 | 1671 | 1091315 | 15908684 |
18000000 | 1718 | 1151368 | 16848631 |
19000000 | 1859 | 1211051 | 17788948 |
20000000 | 1921 | 1270608 | 18729391 |
21000000 | 2015 | 1329944 | 19670055 |
22000000 | 2140 | 1389262 | 20610737 |
23000000 | 2234 | 1448222 | 21551777 |
24000000 | 2375 | 1507123 | 22492876 |
25000000 | 2421 | 1565928 | 23432071 |
26000000 | 2531 | 1624528 | 24375471 |
and on and on |
You can get more results for 100M->1xxM and the result will be the same. Linear. Slight millisecond variances (if any) are due to OS restrictions on processor time for multi-threading OS background tasks. This is common on popular operating systems. You can test this up to the limit the OS will allocate for a boolean array. It will still be a linear result. I can draw the graph for you, or you can plot it in a spreadsheet yourself. Big-O(N-P). Actual runtime indicates it, actual operation count proves it. The graph is a straight line. There is no logN anywhere in the above code. Please provide references for your refutation, otherwise this cannot be challenged. I do not understand terms like: "I think", "I believe", "I feel", they are not academic terms. Please break down the time complexity using computer science based references. This has been done by myself throughout this discussion. Where is your reference for refutation? If I were to "believe", I believe that this is what Eratosthenes had in mind, since it runs O(N-P) to the computer, and matches exactly his user requirements.[19],[20] --Ausples 13:35, 13 May 2007 (UTC)
- The below mentioned bug in the original version [21] has been fixed above in the current version. PrimeHunter 20:28, 14 May 2007 (UTC)
- As I have said, asymptotic time complexity must be proved by analyzing the algorithm mathematically. It can never be proved by running test cases. As MathWorld says [22], the average number below N has O(log log N) prime factors. That log log N gets into the real time complexity. This is really outside the scope of a talk page which is intended for discussing improvements to the article, but I will compare actual timings to my code. I debugged (the last "break" must be removed) and ran Ausples' C code:
- 5761455 primes up to 100000000 computed in 3.41 seconds.
- An old C implementation by me with same compiler and cpu:
- 5761455 primes up to 100000000 computed in 0.47 seconds (7 times faster).
- My implementation is the segmented Sieve of Eratosthenes and works to 10^12 with a couple MB ram. Many people have made faster implementations than me. Use Google to find some. By the way, the mentioned bug means that Ausples' program, as displayed above and previously inserted in the article, actually is linear, but it's not the Sieve of Eratosthenes, and it does not compute primes. I have spent far too much time on this discussion and may stop now. As long as you don't put original research into the article, we don't have to agree. PrimeHunter 17:56, 13 May 2007 (UTC)
Please provide references. Further, no breaks need be removed from this code. The last break belongs to the false case in the first switch statement, this is not a bug. In N=100M (as you claim above producing primes in .47 seconds), we compute that around ~10 seconds using the code above, not 3.4 seconds. Actual time is a technological limitation, not an algorithmic one, therefore arbitrary to Big-O time complexity. What is important is that over several controlled tests that the algorithm produces Big-O(N-P). You are also claiming that your algorithm would go faster than the above mentioned "square sieve", (or the quadratic sieve for that matter) cited in that masters level artificial intelligence class ->[23] . Call me skeptical on that note. ;-) If you do not provide references for these claims, they must be false because they would then be based in ego("I"), not reality. We are made to "believe", which again is not academic, even less scientific. Further, if the code above is not producing prime numbers at runtime, then what is it doing? That trace line is a console output. "p" is always a prime number in this algorithm. This can be proven by adding the console output line. Which, was apparently not done, as Prime Hunter says, "but it's not the Sieve of Eratosthenes, and it does not compute primes." Right, it's determinisitic; the outer loop knows that the next location it visits will be a prime location. And it does in fact meet the user requirements set out by Eratosthenes [24],[25]. We covered that. But at least we agree it runs Big-O(N-P). As for your citation above, this applies outside the scope of this article in "factoring" primes. The Sieve of Eratosthenes is concerned with the NEXT prime, not factoring them. Read the requirements again [26],[27], and use a historical context this time. There is no mention of factoring anywhere. It's all about how you get to the NEXT prime (last I checked in calculus, a prime is a derivative location represented by a linear slope for an equation.). This is not a matter of agreeing. This is the Sieve of Eratosthenes as per user requirements [28],[29] . It is a hard fact.--Ausples 20:02, 13 May 2007 (UTC)
- Check your above code which you also inserted in the article. It contains 5 '{' and only 4 '}'. Your indentation clearly indicates the missing '}' should be at the end. The last break does not belong to a switch as you claim. It jumps out of the inner for loop after a single iteration, and that is certainly wrong. The output for N=50 if your alleged primes p are printed is:
- 2 3 5 7 8 9 11 12 13 15 17 19 20 21 23
- If a '}' is instead added in a strange (in view of indentation) place before the last break, then that would fix your bug without having to delete the break.
- As I wrote, my implementation (which is not fast compared to several other) of the Sieve of Eratosthenes runs around 7 times faster than your implementation. I did not compare it to your references. The timings are on one core of my 2.4 GHz Core 2 Duo, where your code takes 3.4s to 10^8, and mine takes 0.47s. You cannot expect me to time it on the same hardware as you happen to have, where your code apparently takes 10s. 7 times faster is an approximately constant factor due to optimization details of the essentially same algorithm. It's not asymptotically faster but it appears you don't understand the concept of asymptotic time complexity as expressed with big-O notation. You point to many references but those I and Jitse have examined don't support your claims. You question my work and integrity on sieving. I will let my record speak for itself: My prime sieves, usually based on Eratosthenes' idea, have set over 100 computational prime records. [30] The primes are published on the web at the links and you are welcome to verify them (preferably not with your own software). I have spent hours on this discussion with no result and will try to focus on more productive editing now. But I will continue to watch the article and look out for original research. PrimeHunter 00:51, 14 May 2007 (UTC)
- Harry G. Mairson, Some new upper bounds on the generation of prime numbers, Communications of the ACM 20(9):664-669, 1977, doi:10.1145/62065.62072, states that the time complexity of a standard implementation of the Sieve of Eratosthenes is O(N log log N) (look at the very start of the second page). None of the links provided by Ausples say anything on the alternative algorithm that Ausples shows, therefore, we can't add to the article. -- Jitse Niesen (talk) 01:31, 14 May 2007 (UTC)
- "}" was added in the wrong place [31], at the end as the indentation incorrectly indicated. It still computes the sequence 2 3 5 7 8 9 11 12 13 15 17 19 20 21 23. As I wrote, one way to fix it is adding a "}" before the last break (and before the output of p). PrimeHunter 12:00, 14 May 2007 (UTC)
Thank you for your references PH. It seems there was a copy/paste error. You can now copy/paste the code block, and modify data types as necessary. The output will be correct(as long as the datatypes are equivalent). Now you can see the alignment more clearly (all braces have end braces). And of course, there are now copy/paste instructions. I'm not quite sure why the code would stream those values, but the code block above is currently producing 2 3 5 7 11 13 17 19 23 -> N. Not sure why you would get that sequence you're referring to. If it was missing a brace, it shouldn't even have compiled. But thanks for trying to fix it... It's fine now. Thanks again for the feedback. One may copy/paste the block again. And no, we cannot add the article at this time. It must remain in discussion for now. :-| --Ausples 19:17, 14 May 2007 (UTC)
- Yes, it's correct now. The mentioned sequence 2 3 5 7 8 9 11 12 ... was the incorrect output when "}" was put at the end of the original version, which I first tried because of the indentation, and which your first correction did. PrimeHunter 20:28, 14 May 2007 (UTC)
Hi everyone,
The order of the entire algorithm is O(N * N / ln(N)).
The outer loop executes once. During its execution, it repeats O(N) times.
The inner loop executes for each prime found. From the prime number theorem, there will be O(N/ln(N)) primes in the range, 0..N.
Each time the inner loop executes, it repeats O(N) times.
Each repeat of the inner loop takes O(1).
--Kevinkor2 15:00, 15 May 2007 (UTC)
- Your statement "Each time the inner loop executes, it repeats O(N) times" is false. For the prime p, the number of repetitions of the inner loop is approximately N/p. That must be summed over primes p < N. http://www-lipn.univ-paris13.fr/~lavault/Articles/SMER05.pdf has the correct argument, with a reference to the reliable and respected: G. Hardy and E. Wright, An introduction to the theory of numbers. Oxford: Clarendon Press, 1979.
- In total there are O(N*log log N) repeats of the inner loop. One way to view this is that on average each number d < N is reached in the inner loop O(log log N) times, because numbers below N have on average O(log log N) prime factors.[32] PrimeHunter 17:15, 15 May 2007 (UTC)
[edit] Another animation
Your opinions are invited on this graphic view of Eratosthenes sieve.Cuddlyable3 20:42, 8 September 2007 (UTC)
- Cuddlyable3 replaced the article's Image:Animation Sieve of Eratosth.gif with Image:EratosthenesSieve.gif which is the image with circles displayed here. I reverted and suggested to discuss first. I think Image:Animation Sieve of Eratosth.gif is much better. It seems very hard to follow what happens with the circles. They have different sizes for no apparent reason, there is a varying number in each ring for no apparent reason, only few circles are numbered, and those representing composites disappear very quickly. If you don't already know how the algorithm works then I don't think the circle image is helpful. PrimeHunter 21:25, 8 September 2007 (UTC)
-
- I agree with PrimeHunter here. I had to watch the animation a couple of times and read the text below before I understood what's going on, and I do already know the algorithm. I like the new image from an artistic point of view, but pedagogically the old image is better. -- Jitse Niesen (talk) 02:57, 9 September 2007 (UTC)
Is the second image better? Cuddlyable3 16:43, 19 September 2007 (UTC)
Features of the latest animation:
- First the natural numbers grow, then the sieve is applied.
- 1 is never prime.
- The spiral layout--
-- is a way of suggesting that the domain of composite and prime numbers is unlimited i.e. not an intrinsic limit of the algorithm
-- avoids emphasizing any particular multiples, which all rectangular tables are bound to do
-- contrasts the regularity of the natural numbers with the irregularity of the prime numbers. Cuddlyable3 19:39, 25 September 2007 (UTC)
- I still think the rectangular image is much better, even more so after it was improved today. Your spiral has more arbitrary limits like how many integers per turn and when to stop writing numbers. I don't think there exists any circle or spiral image I would prefer over the nice rectangular image in the current article. PrimeHunter 23:49, 25 September 2007 (UTC)
- I think it's a good representation. It makes sense from the algorithm perspective. The only suggestion for modification might be to not show numbers that have already been eliminated. For example. 3 already eliminates 15. When the animation gets to 5, it shows 15 again, which has already been eliminated by 3. [Added by 76.105.181.71]
- I am wary of atributing to ancient Eratosthenes any but the simplest version of a prime-finder algorithm. He had no use for cataloging very large primes, nor a practical number representation for doing so; not having a computer he could hardly have an interest in programming one. His insight, and that of Euclid, were about the nature of the whole set of prime numbers and IMO is better represented as a sorting of the natural numbers in place than by showing mechanical processes on rigidly limited lists. "Simplest version" means striking every multiple of each successively discovered prime. That works and involves only additions. It does mean that numbers that are multiples of different primes viz. 15, 21, 33, 35, 55, 77, 165, 231,... get struck repeatedly; avoiding that requires added complications such as multiplication.Cuddlyable3 (talk) 19:40, 13 January 2008 (UTC)
- I think the current version with the squares is much, much easier to understand. Tempshill (talk) 17:27, 26 May 2008 (UTC)
[edit] Possible improvement on picture
I've made a new gif for this page based on some comments stated here: Wikipedia:Featured picture candidates/Sieve of Eratosthenes. Should the old one be replaced with this one? Improvements:
- Clearly shows what's a prime and what's not.
- Fast animation during crossing out. (Although this was already done by brian0918 as well.
Note: I'm quite new. I do my best to do everything roughly right, but there's lots of guidelines to adhere to. If I did anything wrong, just poke me in the right direction. Thanks :) M.qrius (talk) 22:25, 27 December 2007 (UTC) Hi, That is a fantastic GIF picture!But yours is not as pretty as the old one,I am afraid.How do you make these pictures,can you tell me? I want to make some myselfVisame (talk) 18:38, 3 May 2008 (UTC)
[edit] Featured picture candidate
The animation (in rectangular form) is nominated to become a Featured Picture, see Wikipedia:Featured picture candidates/Sieve of Eratosthenes. I said there that animation should be changed so that, when crossing off multiples of p, it starts at 2p instead of p2. I think the current animation, which starts at p2, will confuse some readers (see also the comments at the featured picture candidates page), and it is not an essential part of the algorithm to start at p2 (it does not change the N log log N complexity). However, I'm not sure about the latter bit, so please correct me if I'm wrong. -- Jitse Niesen (talk) 01:59, 26 September 2007 (UTC)
[edit] animated picture
Is it just me that finds animated pictures in articles really annoying? How can you read an article if there's something jumping around on the screen at the same time? I'd much prefer a link to an animated visualization. 78.52.193.99 (talk) 09:47, 24 March 2008 (UTC)
- Anonymous 78.52.193.99 you may represent a small minority. You may exert one click on the Edit tab if you wish to read text with nothing jumping around. Alternatively in an Internet Explorer browser you can go to Tools - Internet options - Advanced - Multimedia - [ ]Play animations in web pages. Remove the tick in the box, click OK, refresh the page and then diagrams won't move. (I checked this in MS IE v. 6.) Info added by Cuddlyable3 (talk) 20:04, 26 May 2008 (UTC)