Talk:Trial division

From Wikipedia, the free encyclopedia

However, for n with at least one small factor, trial division can be a quick way to find that small factor. It is worthwhile to note that for random n, there is a 50% chance that 2 is a factor of n, and a 33% chance that 3 is a factor, and so on. It can be shown that 88% of all positive integers have a factor under 100, and that 91% have a factor under 1000.

I'm not a mathematician, but this seems off to me. It is obviously true that any random positive integer has a 50% chance of being even, but is "88% of all positive integers have a factor under 100" meaningful? Can you apply percentages to an infinite set? Is this really shorthand for "for every random positive integer there is an 88% chance that it has a factor under 100" (which in turn presumably means "for every set of 100 random integers approximately 88 will have a factor under 100")? This is probably just me not understanding how these expressions are used, but still: is "88% of all positive integers" strictly speaking correct? 82.92.119.11 23:14, 28 April 2006 (UTC)

You're right; you can't assign proportions to the entire set of the integers in this way. However, you can look at it as a limit: as n tends to infinity, the proportion of integers below n that have a factor under 100 tends to 88%. — ciphergoth 08:05, 29 April 2006 (UTC)
So you'd say that the expression as used is indeed meaningful, just shorthand for something formally more complicated? 82.92.119.11 09:47, 29 April 2006 (UTC)
Yes. Normally I'd be wary of simplifications like this, but it's such a well behaved proportion that I don't think there are any problems expressing it this way. — ciphergoth 09:54, 29 April 2006 (UTC)

[edit] Reducing the number of trials to nearly (39/105)*N1/2 instead of (1/2)*N1/2

The naive solution to getting the prime factors of a number N by trial division is picking them up one by one and applying modulo division, and if the remainder is zero, it is a prime factor. Trial division begins with 2,3,4,5,6... and ends with N1/2 (or the square root of N) rounded down. Thus the number of trial divisions is roughly equal to -1+N1/2 at the worst case. (The worst case happens when the number is prime or semiprime. Semiprime numbers are numbers with two prime factors of roughly equal size.)

Meanwhile, we can afford to take out all other even factors, since it is the case that if a number K is not divisible by 2, it is also not divisible by positive multiples of 2. (In general, if K is not divisible by L, K is not divisible by all positive multiples of L.) Thus we can first try dividing by 2, then 3,5,7,9,11,13,15,17... until N1/2 rounded down. The number of trials is roughly equal to -1+(1/2)*N1/2 at the worst case. This is the notation which can be found on the current article. The number of divisions is (roughly) equal to the square root of N over 2.

From here it must be noted that it is much cost-efficient to naively apply trial division without regarding the primality testing of such factors. The overhead of getting all prime factors less than or equal to N1/2 is too large for sufficiently large N.

Now a good question is: Is there a way to reduce the number of trials? We saw that by removing all other even numbers from the trials, we basically cut down the trials in half. Now how about removing all remaining multiples of 3? Obviously we do not reduce the number of trials by another 1/3 because some multiples of 3 are also multiples of two. Thus we will only be able to reduce the trials by 1/3 of 1/2, or equivalently, 1/6. How does this happen?

Initially, the numbers in your sequence are 2,3,4,5,6,7,8,9,10,... Applying a 2-sieve, you get 3,5,7,9,11,13,15,17,19,... which is exactly the sequence of trial division without the even numbers. Now by applying a 3-sieve we get 5,7,11,13,17,19,23,25,29,31,35,37,...

Thus in this case we will only have to divide by 2 then 3 then 5,7,11,13,17,19,23,25,29,31,35,37,... We have thus reduced the number of divisions to 1+(1/3)*N1/2.

In the trial sequence 2,3,4,5,6,7,8,9,10,11,12... our common difference between the terms is 1. In the trial sequence 3,5,7,9,11,13,15,17,19... our common difference between the terms is 2. In the trial sequence 5,7,11,13,17,19,23,25,29,31... we do not have a common difference but we do have a difference sequence of 2,4 that repeats. Meaning, we add 2 to the first term to get the next term (5+2=7) and then add 4 to the next term which should give you 7+4=11, etc.

From these three sequence we will easily see that the first composite number that appears in the sequence is q2, where q is the first number in the sequence. It is also true that in order to find all the prime numbers from 2 to some number N, we only have to sieve using prime numbers from 2 to the prime number less than or equal to the square root of N. Thus in order to get the prime numbers up to 100, we only need to sieve by the prime numbers less than or equal to 10, which is the square root of 100. Or in other words, it is sufficient to make 2-sieve, 3-sieve, 5-sieve and 7-sieve to get the prime numbers up to 100. The proof to this is based on the same proof as to why you should do trial division on factors less than or equal to the square root of N.

Of course, we can make improvements on our latest trial sequence by taking out all multiples of 5, the current smallest prime number in the sequence. The sequence will now be: 7,11,13,17,19,23,29,31,37,41,43,47,49,53,59,61,67,71,73,77,79,83,89,91,97,101,103,107,...

Again, the first composite number in the sequence is 72 or 49. The new difference sequence is 4,2,4,2,4,6,2,6. (Meaning from 7, add 4 to get the 2nd term, 2 to get the 3rd term, and so on.)

How much did we reduce? Since some multiples of 5 are also multiples of 2 and/or 3, we were only able to take out 1/30 of the multiples. You now only have approximately 2+(3/10)*N1/2 trials at the worst case.

Does continuous sieving like so advantageous for us in the long run? Not actually. If we try to apply a 7-sieve this time, we will only be able to take out 1/210 of the trials and we will now have 48 numbers in our difference sequence. Our trial sequence is now 11,13,17,19,23,29,31,37,41, 43,47,53,59,61,67,71,73,79,83,89,97,101,103,107,109,113,121,127,131,137,139,143,149,151,157,163,167,169,173,181,187,191, 193,197,199,209,211,221,223,227,229,...

The new difference sequence, with 48 numbers, is the following: 2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10.

The approximate number of trials at the worst case is approximately equal to 3+(31/105)*N1/2. This algorithm when applied is approximately (and asymptotically) 59.04% faster than the conventional trial division with 2 followed by odd numbers. The prime number 923,456,789,012,347 has been tried for this algorithm and the program was able to factor it (trivially, since it is prime) in less than 2 seconds. (The program was written in Java 1.5.0 and run on a Pentium IV 2.6GHz with Windows 2000 SP4. The source code can be found below.)

---

public static void main(String [] args) {
 double N=Double.parseDouble(JOptionPane.showInputDialog(null, "Input number to factorize."));

 System.out.println(d2S(N)+"="+factorize(N).trim().replaceAll(" ","*"));
 System.exit(1);
 /* IMPORTANT: save these three methods in one class.
 /* And don't forget to add:
 /*               import java.io.*;
 /*               import javax.swing.*;
 */
}
public static String factorize(double N) {
 int [] inc={2,4,2,4,6,2,6,4,2,4,6,6,2,6,4,2,6,4,6,8,4,2,4,2,4,8,6,4,6,2,4,6,2,6,6,4,2,4,6,2,6,4,2,4,2,10,2,10};
 if (N<1) return "";
 else if (N<3) return d2S(N)+"";
 else {
  String output="";
  while (N%2==0) {    output+="2 ";    N/=2;   }
  if (N==1) return output;
  else while (N%3==0) {    output+="3 ";    N/=3;   }
  if (N==1) return output;
  else while (N%5==0) {    output+="5 ";    N/=5;   }
  if (N==1) return output;
  else while (N%7==0) {    output+="7 ";    N/=7;   }
  if (N==1) return output;
  else {
   double y=Math.sqrt(N);
   for (int i=11, j=0; i<y; i+=inc[j], j=(j+1)%48) {
    while (N%i==0) {
     output+=i+" ";
     N/=i;
    }
   }
   if (N==1) return output;
   else return output+d2S(N);
  }
 }
}
public static String d2S(double u) {
 u=u-(u%1);
 String output="";
 while (u>0) {
  int temp=(int)(u%10);
  output=temp+output;
  u=u/10;
  u=u-(u%1);
 }
 return output;
}

---

The length of the increment sequence grows faster than O(N!) and yet the reduction of terms converges to almost zero. Thus even though it is practical to implement a trial division algorithm with less trials, the number of terms in the difference sequence is another issue to tackle.

Although this is not yet proven, we can see from our examples above some sort of pattern found in the difference sequences. It is generally a palindromic sequence of even integers followed by an even integer Q (which is apparently the largest in the sequence) followed by 2 and another Q. For example:

2,4 can be expanded into 2,4,2,4. Thus the palindromic sequence is 2, and the Q2Q sequence is 4,2,4. In 4,2,4,2,4,6,2,6 the palindromic sequence is 4,2,4,2,4 and the Q2Q sequence is 6,2,6. In the difference sequence with 48 terms, we can verify that the preceding sequence of 45 even integers is a palindrome, followed by a 10,2,10.

Another thing to note is the sum of the digits of each difference sequence: 2 = 2 = 2, 2+4 = 6 = 2*3, 4+2+4+2+4+6+2+6 = 30 = 2*3*5, 2+4+2+4+6+2+6+4+2+4+6+6+2+6+4+2+6+4+6+8+4+2+4+2+4+8+6+4+6+2+4+6+2+6+6+4+2+4+6+2+6+4+2+4+2+10+2+10 = 210 = 2*3*5*7, etc.

These sums also represent the reciprocals of the amount of composites removed for every sieve you do. (For example, by making a 2-sieve, you remove 1/2. By making a 3-sieve, you remove 1/6. By making a 5-sieve, you remove 1/30. etc.)

Right now I am trying to prove the convergence of the sum of the reciprocals of 2, 6, 30, 210, 2310, 30030, 510510, 9699690, 223092870, 6469693230, etc. They seem to converge somewhere close to 0.70523. It could mean that approximately 70.523% of the positive integers are composite, and the rest are prime (except for 1) which is coincident with the theorem that the set of prime numbers are infinite. This sum of infinite series is asymptotically the number of composites that you will be able to remove by k-sieving from 2 to N1/2 for any sufficiently large N. Best123 04:00, 29 November 2006 (UTC)

I don't understand that. The sum of reciprocals of primorial numbers clearly converges faster than a geometric series, and as you say to about 0.705230.... But the density of primes falls towards zero due to the prime number theorem. --Henrygb 21:45, 5 December 2006 (UTC)
Clarification: I was not trying to obtain the reciprocals of primorial numbers. I was trying to obtain the sum of the reciprocals of 2, 6, 30, 210, 210, 2310, etc. What is this trying to show? This is showing the approximate amount of composite numbers removed after a certain amount of times you k-sieved the set of natural numbers. (Let us not forget that the goal here is to be able to reduce the number of trial divisions by taking out numbers that are surely composites. Recall: if N is not divisible by some k<\sqrt{N}, then N is not divisible by all other positive multiples of k.) For example, by 2-sieving the set of natural numbers, you got one sure prime number (2) and slashed down all other multiples of 2. Since the set of natural numbers are countable, we can safely assume that we "cut down half" of the trials when dividing from 2 to N1/2. We now have a factor of 0.5, because we will be trial dividing only by half of what we should have if we didn't 2-sieve the possible factors. If what we did next was to make a 3-sieve, we would reduce further the set of possible factors, but not by 1 / 3 because some multiples of 3 are also multiples of 2. Thus the number of multiples of 3 which are not factors of 2 are 1 / 2 * 1 / 3 = 1 / 6. Approximately one-sixth of the set of composite natural numbers are divisible by 3 but not by 2. Thus we now have 1 − (1 / 2 + 1 / 2 * 1 / 3) = 1 − (2 / 3) = 1 / 3. After 2-sieving and 3-sieving we will only need to make approximately 1 / 3 of the trial divisions we would have initially had if we tried from 2 to N1/2. If we tried to make an additional 5-sieve then we would now have 1 − (1 / 2 + 1 / 2 * 1 / 3 + 1 / 2 * 1 / 3 * 1 / 5) = 1 − (7 / 10) = 3 / 10. (This is because some multiples of 5 are also multiples of 2 and 3 which were already eliminated during the 2-sieve and 3-sieve.) After 5-sieving we only need to do only approximately 30% of the amount of trial divisions. If we tried to make a 7-sieve (reduces (1 / 2 * 1 / 3 * 1 / 5 * 1 / 7)), an 11-sieve (reduces (1 / 2 * 1 / 3 * 1 / 5 * 1 / 7 * 1 / 11)), a 13-sieve (reduces (1 / 2 * 1 / 3 * 1 / 5 * 1 / 7 * 1 / 11 * 1 / 13)), and so on, the amount of comparisons reduced asymptotically is:
Q_m = 1/2+1/6+1/30+1/210+1/2310+1/30030+1/510510+1/9699690+... \approx 0.70523...
where m is the last prime number we sieved (m-sieve.) Clearly, the infinite series above converges because it is bounded (upper bound) by a geometric sequence with common ratio of ½, which is also convergent. Also, this means that in an asymptotic basis, we can potentially reduce the number of trial divisions to nearly 0.29477 of the original number of trials. However, we can say that it is quite sufficient to reduce the number of trials to 0.3 of the original, since repeatedly sieving our possible factors yields only diminishing returns. Up to 5-sieve should be enough for practical purposes.
In corollary to this we can also say that the trial division algorithm can get no better than O(\sqrt{N}), since no matter how many composites we remove, the number of trials at the worst case is still proportional to \sqrt{N}. Best123 07:03, 15 December 2006 (UTC)