Pollard's p-1 algorithm
From Wikipedia, the free encyclopedia
Pollard's p − 1 algorithm is a number theoretic integer factorization algorithm, invented by John Pollard in 1974. It is a special-purpose algorithm, meaning that it is only suitable for integers with specific types of factors.
The algorithm is based on the insight that numbers of the form ab − 1 tend to be highly composite when b is itself composite. Since it is computationally simple to evaluate numbers of this form in modular arithmetic, the algorithm allows one to quickly check many potential factors with great efficiency. In particular, the method will find a factor p if b is divisible by p − 1, hence the name. When p − 1 is smooth (the product of only small integers) then this algorithm is well-suited to discovering the factor p.
Contents |
[edit] Base concepts
Let n be a composite integer with prime factor p. By Fermat's little theorem, we know that
- for a coprime to p
Let us assume that p − 1 is B-powersmooth for some reasonably sized B (more on the selection of this value later). Recall that a positive integer m is called B-smooth if all prime factors pi of m are such that pi ≤ B. m is called B-powersmooth if all prime powers piei dividing m are such that piei ≤ B.
Let p1, ..., pL be the primes less than B and let e1, ..., eL be the exponents such that
Let
As a shortcut, M = lcm{1, ..., B}. As a consequence of this, (p − 1) divides M, and also if pe divides M this implies that pe ≤ B. Since (p − 1) divides M we know that aM ≡ 1 (mod p), and because p divides n this means gcd(aM − 1, n) > 1.
Therefore if gcd(aM − 1, n) ≠ n, then the gcd is a non-trivial factor of n.
If p − 1 is not B-powersmooth, then aM ≢ 1 (mod p) for at least half of all a.
[edit] Pollard concepts
Let n = pqr, where p and q are distinct primes and r is an integer, such that p − 1 is B-powersmooth and q − 1 is not B-powersmooth. Now, gcd(aM − 1, n) yields a proper factor of n.
In the case where q − 1 is B-powersmooth, the gcd may yield a trivial factor because q divides aM − 1. This is what makes the algorithm specialized. For example, 172189 = 421 × 409. 421 − 1 = 22×3×5×7 and 409 − 1 = 23×3×17. So, an appropriate value of B would be from 7 to 16. If B was selected less than 7 the gcd would have been 1 and if B was selected higher than 16 the gcd would have been n. Of course, we do not know what value of B is appropriate in advance, so this will factor into the algorithm.
To speed up calculations, we also know that when taking the gcd we can reduce one part modulo the other, so gcd(aM − 1, n) = gcd(aM − 1 mod n, n). This can be efficiently calculated using modular exponentiation and the Euclidean algorithm.
[edit] Algorithm and running time
The basic algorithm can be written as follows:
- Inputs: n: a composite integer
- Output: a non-trivial factor of n or failure
-
- select a smoothness bound B
- randomly pick a coprime to n (note: we can actually fix a, random selection here is not imperative)
- for each prime q ≤ B
- (note: this is aM)
- g ← gcd(a − 1, n)
- if 1 < g < n then return g
- if g = 1 then select a higher B and go to step 2 or return failure
- if g = n then go to step 2 or return failure
If g = 1 in step 6, this indicates that for all p − 1 that none were B-powersmooth. If g = n in step 7, this usually indicates that all factors were B-powersmooth, but in rare cases it could indicate that a had a small order modulo p.
The running time of this algorithm is O(B × log B × log2n), so it is advantageous to pick a small value of B.
[edit] Large prime variant
A variant of the basic algorithm is sometimes used. Statistically, there is often a factor p of n such that p − 1 = fq such that f is B-powersmooth and B < q ≤ B', where q is a prime and B' is called a semi-smoothness bound.
As a starting point, this would work into the basic algorithm at step 6 if we encountered gcd = 1 but didn't want to increase B. For all primes B < q1, ..., qL ≤ B', we check if
to obtain a non-trivial factor of n. This is quickly accomplished, because if we let c = aM, and d1 = q1 and di = qi − qi − 1, then we can compute
The running time of the algorithm with this variant then becomes O(B' × log B' × log2n).
[edit] Additional information
Because of this algorithm's effectiveness on certain types of numbers the RSA specifications require that the primes, p and q, be such that p-1 and q-1 are non-B-powersmooth for small values of B.