Pollard's rho algorithm

Pollard's rho algorithm is an algorithm for integer factorization. It was invented by John Pollard in 1975.[1] It uses only a small amount of space, and its expected running time is proportional to the square root of the size of the smallest prime factor of the composite number being factorized.

Core ideas

Suppose we need to factorize a number , where is a non-trivial factor. A polynomial modulo , called (e.g., ), is used to generate a pseudo-random sequence: A starting value, say 2, is chosen, and the sequence continues as , , , etc. The sequence is related to another sequence . Since is not known beforehand, this sequence cannot be explicitly computed in the algorithm. Yet, in it lies the core idea of the algorithm.

Because the number of possible values for these sequences are finite, both the sequence, which is mod , and sequence will eventually repeat, even though we do not know the latter. Assume that the sequences behave like random numbers. Due to the birthday paradox, the number of before a repetition occurs is expected to be , where is the number of possible values. So the sequence will likely repeat much earlier than the sequence . Once a sequence has a repeated value, the sequence will cycle, because each value depends only on the one before it. This structure of eventual cycling gives rise to the name "Rho algorithm", owing to similarity to the shape of the Greek character ρ when the values , , etc. are represented as nodes in a directed graph.

This is detected by the Floyd's cycle-finding algorithm: two nodes and (i.e., and ) are kept. In each step, one moves to the next node in the sequence and the other moves to the one after the next node. After that, it is checked whether . If it is not 1, then this implies that there is a repetition in the sequence (i.e. . This works because if the is the same as , the difference between and is necessarily a multiple of . Although this always happens eventually, the resulting GCD is a divisor of other than 1. This may be itself, since the two sequences might repeat at the same time. In this (uncommon) case the algorithm fails, and can be repeated with a different parameter.

Algorithm

The algorithm takes as its inputs n, the integer to be factored; and , a polynomial in x computed modulo n. In the original algorithm, , but nowadays it is more common to use . The output is either a non-trivial factor of n, or failure. It performs the following steps:[2]

    x ← 2; y ← 2; d ← 1
    while d = 1:
        x ← g(x)
        y ← g(g(y))
        d ← gcd(|x - y|, n)
    if d = n: 
        return failure
    else:
        return d

Here x and y corresponds to and in the section about core idea. Note that this algorithm may fail to find a nontrivial factor even when n is composite. In that case, the method can be tried again, using a starting value other than 2 or a different .

Example factorization

Let n = 8051 and g(x) = (x2 + 1) mod 8051.

i x y GCD(|x y|, 8051)
1 5 26 1
2 26 7474 1
3 677 871 97
4 7474 1481 1

97 is a non-trivial factor of 8051. Starting values other than x = y = 2 may give the cofactor (83) instead of 97. One extra iteration is shown above to make it clear that y moves twice as fast as x. Note that even after a repetition, the GCD can return back to 1.

Variants

In 1980, Richard Brent published a faster variant of the rho algorithm. He used the same core ideas as Pollard but a different method of cycle detection, replacing Floyd's cycle-finding algorithm with the related Brent's cycle finding method.[3]

A further improvement was made by Pollard and Brent. They observed that if , then also for any positive integer b. In particular, instead of computing at every step, it suffices to define z as the product of 100 consecutive terms modulo n, and then compute a single . A major speed up results as 100 gcd steps are replaced with 99 multiplications modulo n and a single gcd. Occasionally it may cause the algorithm to fail by introducing a repeated factor, for instance when n is a square. But it then suffices to go back to the previous gcd term, where , and use the regular ρ algorithm from there.

Application

The algorithm is very fast for numbers with small factors, but slower in cases where all factors are large. The ρ algorithm's most remarkable success was the factorization of the ninth Fermat number, F8 = 1238926361552897 * 93461639715357977769163558199606896584051237541638188580280321. The ρ algorithm was a good choice for F8 because the prime factor p = 12389263661552897 is much smaller than the other factor. The factorization took 2 hours on a UNIVAC 1100/42.

The example n = 10403 = 101 · 103

Here we introduce another variant, where only a single sequence is computed, and the gcd is computed inside the loop that detects the cycle.

C++ code sample

The following code sample finds the factor 101 of 10403 with a starting value of x = 2.

int gcd(int a, int b) {
	int remainder;
	while (b != 0) {
		remainder = a % b;
		a = b;
		b = remainder;
	}
	return a;
}

int main () {
	int number = 10403, x_fixed = 2, cycle_size = 2, x = 2, factor = 1;

	while (factor == 1) {
		for (int count=1;count <= cycle_size && factor <= 1;count++) {
			x = (x*x+1)%number;
			factor = gcd(x - x_fixed, number);
		}

		cycle_size *= 2;
		x_fixed = x;
	}
	cout << "\nThe factor is  " << factor;
}

The results

In the following table the third and fourth columns contain secret information not known to the person trying to factor pq = 10403. They are included to show how the algorithm works. If we start with x = 2 and follow the algorithm, we get the following numbers:

x xfixed x mod 101 xfixed mod 101 step
22 2 20
52521
2622622
6772671263
5982693264
39032665265
34182685266
156341855857
3531341897858
5168341817859
37243418888510
9783418698511
98123418158512
59833418248513
99703418728514
2369970347215
36829970467216
20169970977217
70879970177218
102899970887219
25949970697220
84999970157221
49739970247222
27999970727223

The first repetition modulo 101 is 97 which occurs in step 17. The repetition is not detected until step 23, when x (mod 101) = xfixed (mod 101). This causes gcd(x - x_fixed, n) = gcd(2799 - 9970, n) to be p = 101, and a factor is found.

Complexity

If the pseudo random number x = g(x) occurring in the Pollard ρ algorithm were an actual random number, it would follow that success would be achieved half the time, by the Birthday paradox in iterations. It is believed that the same analysis applies as well to the actual rho algorithm, but this is a heuristic claim, and rigorous analysis of the algorithm remains open.[4]

References

  1. Pollard, J. M. (1975), "A Monte Carlo method for factorization", BIT Numerical Mathematics, 15 (3): 331–334, doi:10.1007/bf01933667
  2. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. & Stein, Clifford (2009), "Section 31.9: Integer factorization", Introduction to Algorithms (third ed.), Cambridge, MA: MIT Press, pp. 975–980, ISBN 978-0-262-03384-8 (this section discusses only Pollard's rho algorithm).
  3. Brent, Richard P. (1980), "An Improved Monte Carlo Factorization Algorithm", BIT, 20: 176–184, doi:10.1007/BF01933190
  4. Galbraith, Steven D. (2012), "14.2.5 Towards a rigorous analysis of Pollard rho", Mathematics of Public Key Cryptography, Cambridge University Press, pp. 272–273, ISBN 9781107013926.

Additional reading

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.