Pollard's rho algorithm

This article is about the integer factorization algorithm. For the discrete logarithm algorithm, see Pollard's rho algorithm for logarithms.

Pollard's rho algorithm is a special-purpose integer factorization algorithm. It was invented by John Pollard in 1975.[1] It is particularly effective for a composite number having a small prime factor.

Core ideas

The ρ algorithm is based on Floyd's cycle-finding algorithm and on the observation that (as in the birthday problem) t random numbers x1, x2, ... , xt in the range [1, n] will contain a repetition with probability P > 0.5 if t > 1.177n1/2. The constant 1.177 comes from the more general result that if P is the probability that t random numbers in the range [1, n] contain a repetition, then P > 1 - exp{ - t2/2n }. Thus P > 0.5 provided 1/2 < exp{ - t2/2n }, or t2 > 2nln 2, or t2 > 2n ln 2, or t > (2ln 2)1/2n1/2 = 1.177n1/2.

The ρ algorithm uses g(x). a polynomial modulo n as a generator of a pseudo-random sequence. (The most commonly used function is g(x) = x2 mod n.) Let's assume n = pq. The algorithm generates the sequence x1 = g(2), x2 = g(g(2)), x3 = g(g(g(2))), and so on. Two different sequences will in effect be running at the same time--the sequence {xk} and the sequence {xk mod p}. Since p < n1/2, the latter sequence is likely to repeat earlier than the former sequence. The repetition of the mod p sequence will be detected by the fact that gcd(xk mod p - xm mod p, n) = p, where k < m. Once a repetition occurs, the sequence will cycle, because each term depends only on the previous one. The name ρ algorithm derives from the similarity in appearance between the Greek letter ρ and the directed graph formed by the values in the sequence and their successors. Once it is cycling, Floyd's cycle-finding algorithm will eventually detect a repetition. The algorithm succeeds whenever the sequence {xk mod p} repeats before the sequence {xk}. The randomizing function g(x) must be a polynomial modulo n, so that it will work both modulo p and modulo n. That is, so that g(x mod p) ≡ g(x) (mod p).

Algorithm

The algorithm takes as its inputs n, the integer to be factored; and g(x), a polynomial p(x) computed modulo n. This will ensure that if p|n, and x ≡ y mod p, then g(x) ≡ g(y) mod p. In the original algorithm, g(x) = x2 - 1 mod n, but nowadays it is more common to use g(x) = x2 + 1 mod n. The output is either a non-trivial factor of n, or failure. It performs the following steps:[2]

  1. x ← 2; y ← 2; d ← 1;
  2. While d = 1:
    1. x ← g(x)
    2. y ← g(g(y))
    3. d ← gcd(|x - y|, n)
  3. If d = n, return failure.
  4. Else, return d.

Note that this algorithm may fail to find a nontrivial factor even when n is composite. In that case, you can try again, using a starting value other than 2 or a different g(x). The name ρ algorithm comes from the fact that the values of x (mod d) eventually repeat with period d, resulting in a ρ shape when you graph the values.

Variants

In 1980, Richard Brent published a faster variant of the rho algorithm. He used the same core ideas as Pollard but a different method of cycle detection, replacing Floyd's cycle-finding algorithm with the related Brent's cycle finding method.[3]

A further improvement was made by Pollard and Brent. They observed that if \gcd (a,n) >1, then also \gcd (ab,n)>1 for any positive integer b. In particular, instead of computing \gcd (|x-y|,n) at every step, it suffices to define z as the product of 100 consecutive |x-y| terms modulo n, and then compute a single \gcd (z,n). A major speed up results as 100 gcd steps are replaced with 99 multiplications modulo n and a single gcd. Occasionally it may cause the algorithm to fail by introducing a repeated factor, for instance when n is a square. But it then suffices to go back to the previous gcd term, where \gcd(z,n)=1, and use the regular ρ algorithm from there.

Application

The algorithm is very fast for numbers with small factors, but slower in cases where all factors are large. The ρ algorithm's most remarkable success was the factorization of the eighth Fermat number, F8 = 1238926361552897 * 93461639715357977769163558199606896584051237541638188580280321. The ρ algorithm was a good choice for F8 because the prime factor p = 12389263661552897 is much smaller than the other factor. The factorization took 2 hours on a UNIVAC 1100/42.

Example factorization

Let n = 8051 and g(x) = (x2 + 1) mod 8051.

i xi yi xiyi|, 8051)
1 5 26 1
2 26 7474 1
3 677 871 97

97 is a non-trivial factor of 8051. Starting values other than x = y = 2 may give the cofactor (83) instead of 97.

The Example n = 10403 = 101 . 103

Here we introduce another variant, where only a single sequence is computed, and the gcd is computed inside the loop that detects the cycle.

C++ Pseudocode

The following pseudocode finds the factor 101 of 10403 with a starting value of x = 2.

int g (x) {
	return (x * x + 1) % n;
}
 
int main () {
 
	int n = 10403;
	int x_fixed = 2;
	int cycle_size = 2;
	int x = 2;
	int h = 1;
 
	while (h == 1) {
		int count = 1;
 
		while (count <= cycle_size && h == 1) {
			x = g(x);
			count = count + 1;
			h = gcd(x - x_fixed, n);
		}
 
		if (h != 1)
			break;
 
		cycle_size = 2 * cycle_size;
		x_fixed = x;
	}
	cout << "\nThe factor is  " << h;
}

The Results

In the following table the second and fourth columns contain secret information not known to the person trying to factor pq = 10403. They are included to show how the algorithm works. If we start with x = 2 and follow the algorithm, we get the following numbers:

x xfixed x mod 101 xfixed mod 101 step
22 2 20
52521
2622622
6772671263
5982693264
39032665265
34182685266
156341855857
3531341897<--858
5168341817859
37243418888510
9783418698511
98123418158512
59833418248513
99703418728514
2369970347215
36829970467216
2016997097<--7217
102899970887218
25949970697219
84999970157220
49739970247221
2799997072<--7222

The first repetition modulo 101 is 97 which occurs in step 17. The repetition is not detected until step 22, when x = xfixed (mod 101). This causes gcd(x - x_fixed, n) = gcd(2799 - 9970, n) to be p = 101, and a factor is found.

Complexity

If the pseudorandom number x = g(x) occurring in the Pollard ρ algorithm were an actual random number, it would follow that success would be achieved half the time, by the Birthday paradox in O(\sqrt p)\le O(n^{1/4}) iterations. It is believed that the same analysis applies as well to the actual rho algorithm, but this is a heuristic claim, and rigorous analysis of the algorithm remains open.[4]

References

  1. Pollard, J. M. (1975), "A Monte Carlo method for factorization", BIT Numerical Mathematics 15 (3): 331–334, doi:10.1007/bf01933667
  2. Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L. & Stein, Clifford (2001), "Section 31.9: Integer factorization", Introduction to Algorithms (Second ed.), Cambridge, MA: MIT Press, pp. 896–901, ISBN 0-262-03293-7 (this section discusses only Pollard's rho algorithm).
  3. Brent, Richard P. (1980), "An Improved Monte Carlo Factorization Algorithm", BIT 20: 176–184, doi:10.1007/BF01933190
  4. Galbraith, Steven D. (2012), "14.2.5 Towards a rigorous analysis of Pollard rho", Mathematics of Public Key Cryptography, Cambridge University Press, pp. 272–273, ISBN 9781107013926.

Additional reading

External links