Pollard's p-1 algorithm

From Wikipedia, the free encyclopedia

Pollard's p − 1 algorithm is a number theoretic integer factorization algorithm, invented by John Pollard in 1974. It is a special-purpose algorithm, meaning that it is only suitable for integers with specific types of factors.

The algorithm is based on the insight that numbers of the form a^b − 1 tend to be highly composite when b is itself composite. Since it is computationally simple to evaluate numbers of this form in modular arithmetic, the algorithm allows one to quickly check many potential factors with great efficiency. In particular, the method will find a factor p if b is divisible by p − 1, hence the name. When p − 1 is smooth (the product of only small integers) then this algorithm is well-suited to discovering the factor p.

1 Base concepts
2 Pollard concepts
3 Algorithm and running time
4 Large prime variant
5 Additional information

[edit] Base concepts

Let n be a composite integer with prime factor p. By Fermat's little theorem, we know that

$a^{p-1} \equiv 1\pmod{p}$ for $a \in (\mathbb{Z}/p\mathbb{Z})^*$

Let us assume that p − 1 is B-powersmooth for some reasonably sized B (more on the selection of this value later). Recall that a positive integer m is called B-smooth if all prime factors p_i of m are such that p_i ≤ B. m is called B-powersmooth if all prime powers p_i^e_i dividing m are such that p_i^e_i ≤ B.

Let p₁, ..., p_L be the primes less than B and let e₁, ..., e_L be the exponents such that

$p_i^{e_i} \le B < p_i^{e_i + 1}$

Let

$M = \prod_{i = 1}^{L} p_i^{e_i}$

As a shortcut, M = lcm{1, ..., B}. As a consequence of this, (p − 1) divides M, and also if p^e divides M this implies that p^e ≤ B. Since (p − 1) divides M we know that a^M ≡ 1 (mod p), and because p divides n this means gcd(a^M − 1, n) > 1.

Therefore if gcd(a^M − 1, n) ≠ n, then the gcd is a non-trivial factor of n.

If p − 1 is not B-powersmooth, then a^M ≢ 1 (mod p) for at least half of all a.

[edit] Pollard concepts

Let n = pqr, where p and q are distinct primes and r is an integer, such that p − 1 is B-powersmooth and q − 1 is not B-powersmooth. Now, gcd(a^M − 1, n) yields a proper factor of n.

In the case where q − 1 is B-powersmooth, the gcd may yield a trivial factor because q divides a^M − 1. This is what makes the algorithm specialized. For example, 172189 = 421 × 409. 421 − 1 = 2²×3×5×7 and 409 − 1 = 2³×3×17. So, an appropriate value of B would be from 7 to 16. If B was selected less than 7 the gcd would have been 1 and if B was selected higher than 16 the gcd would have been n. Of course, we do not know what value of B is appropriate in advance, so this will factor into the algorithm.

To speed up calculations, we also know that when taking the gcd we can reduce one part modulo the other, so gcd(a^M − 1, n) = gcd(a^M − 1 mod n, n). This can be efficiently calculated using modular exponentiation and the Euclidean algorithm.

[edit] Algorithm and running time

The basic algorithm can be written as follows:

Inputs: n: a composite integer

Output: a non-trivial factor of n or failure

select a smoothness bound B
pick a randomly in $(\mathbb{Z}/n\mathbb{Z})^*$ (note: we can actually fix a, random selection here is not imperative)
for each prime q ≤ B

$e \gets \bigg\lfloor \frac{\log{B}}{\log{q}} \bigg\rfloor$

$a \gets a^{q^e} \mod{n}$ (note: this is a^M)
g ← gcd(a − 1, n)
if 1 < g < n then return g
if g = 1 then select a higher B and go to step 2 or return failure
if g = n then go to step 2 or return failure

If g = 1 in step 6, this indicates that for all p − 1 that none were B-powersmooth. If g = n in step 7, this usually indicates that all factors were B-powersmooth, but in rare cases it could indicate that a had a small order modulo p.

The running time of this algorithm is O(B × log B × log²n), so it is advantageous to pick a small value of B.

[edit] Large prime variant

A variant of the basic algorithm is sometimes used. Statistically, there is often a factor p of n such that p − 1 = fq such that f is B-powersmooth and B < q ≤ B', where q is a prime and B' is called a semi-smoothness bound.

As a starting point, this would work into the basic algorithm at step 6 if we encountered gcd = 1 but didn't want to increase B. For all primes B < q₁, ..., q_L ≤ B', we check if

$\gcd\left(a^{q_im}-1, n\right) \neq 1$

to obtain a non-trivial factor of n. This is quickly accomplished, because if we let c = a^M, and d₁ = q₁ and d_i = q_i − q_{i − 1}, then we can compute

$a^{q_1m} = c^{d_1}, a^{q_2m} = c^{d_1}c^{d_2} = a^{q_1m}c^{d_2}, \ldots$

The running time of the algorithm with this variant then becomes O(B' × log B' × log²n).

[edit] Additional information

Because of this algorithm's effectiveness on certain types of numbers the RSA specifications require that the primes, p and q, be such that p-1 and q-1 are non-B-powersmooth for small values of B.

Retrieved from "http://en.wikipedia.org../../../p/o/l/Pollard%27s_p-1_algorithm.html"

Category: Integer factorization algorithms