Pollard's p-1 algorithm

From Wikipedia, the free encyclopedia

Pollard's p − 1 algorithm is a number theoretic integer factorization algorithm, invented by John Pollard in 1974. It is a special-purpose algorithm, meaning that it is only suitable for integers with specific types of factors.

The algorithm is based on the insight that numbers of the form ab − 1 tend to be highly composite when b is itself composite. Since it is computationally simple to evaluate numbers of this form in modular arithmetic, the algorithm allows one to quickly check many potential factors with great efficiency. In particular, the method will find a factor p if b is divisible by p − 1, hence the name. When p − 1 is smooth (the product of only small integers) then this algorithm is well-suited to discovering the factor p.

Contents

[edit] Base concepts

Let n be a composite integer with prime factor p. By Fermat's little theorem, we know that

a^{p-1} \equiv 1\pmod{p} for a \in (\mathbb{Z}/p\mathbb{Z})^*

Let us assume that p − 1 is B-powersmooth for some reasonably sized B (more on the selection of this value later). Recall that a positive integer m is called B-smooth if all prime factors pi of m are such that piB. m is called B-powersmooth if all prime powers piei dividing m are such that pieiB.

Let p1, ..., pL be the primes less than B and let e1, ..., eL be the exponents such that

p_i^{e_i} \le B < p_i^{e_i + 1}

Let

M = \prod_{i = 1}^{L} p_i^{e_i}

As a shortcut, M = lcm{1, ..., B}. As a consequence of this, (p − 1) divides M, and also if pe divides M this implies that peB. Since (p − 1) divides M we know that aM ≡ 1 (mod p), and because p divides n this means gcd(aM − 1, n) > 1.

Therefore if gcd(aM − 1, n) ≠ n, then the gcd is a non-trivial factor of n.

If p − 1 is not B-powersmooth, then aM ≢ 1 (mod p) for at least half of all a.

[edit] Pollard concepts

Let n = pqr, where p and q are distinct primes and r is an integer, such that p − 1 is B-powersmooth and q − 1 is not B-powersmooth. Now, gcd(aM − 1, n) yields a proper factor of n.

In the case where q − 1 is B-powersmooth, the gcd may yield a trivial factor because q divides aM − 1. This is what makes the algorithm specialized. For example, 172189 = 421 × 409. 421 − 1 = 22×3×5×7 and 409 − 1 = 23×3×17. So, an appropriate value of B would be from 7 to 16. If B was selected less than 7 the gcd would have been 1 and if B was selected higher than 16 the gcd would have been n. Of course, we do not know what value of B is appropriate in advance, so this will factor into the algorithm.

To speed up calculations, we also know that when taking the gcd we can reduce one part modulo the other, so gcd(aM − 1, n) = gcd(aM − 1 mod n, n). This can be efficiently calculated using modular exponentiation and the Euclidean algorithm.

[edit] Algorithm and running time

The basic algorithm can be written as follows:

Inputs: n: a composite integer
Output: a non-trivial factor of n or failure
  1. select a smoothness bound B
  2. pick a randomly in (\mathbb{Z}/n\mathbb{Z})^* (note: we can actually fix a, random selection here is not imperative)
  3. for each prime qB
    e \gets \bigg\lfloor \frac{\log{B}}{\log{q}} \bigg\rfloor
    a \gets a^{q^e} \mod{n} (note: this is aM)
  4. g ← gcd(a − 1, n)
  5. if 1 < g < n then return g
  6. if g = 1 then select a higher B and go to step 2 or return failure
  7. if g = n then go to step 2 or return failure

If g = 1 in step 6, this indicates that for all p − 1 that none were B-powersmooth. If g = n in step 7, this usually indicates that all factors were B-powersmooth, but in rare cases it could indicate that a had a small order modulo p.

The running time of this algorithm is O(B × log B × log2n), so it is advantageous to pick a small value of B.

[edit] Large prime variant

A variant of the basic algorithm is sometimes used. Statistically, there is often a factor p of n such that p − 1 = fq such that f is B-powersmooth and B < qB', where q is a prime and B' is called a semi-smoothness bound.

As a starting point, this would work into the basic algorithm at step 6 if we encountered gcd = 1 but didn't want to increase B. For all primes B < q1, ..., qLB', we check if

\gcd\left(a^{q_im}-1, n\right) \neq 1

to obtain a non-trivial factor of n. This is quickly accomplished, because if we let c = aM, and d1 = q1 and di = qiqi − 1, then we can compute

a^{q_1m} = c^{d_1}, a^{q_2m} = c^{d_1}c^{d_2} = a^{q_1m}c^{d_2}, \ldots

The running time of the algorithm with this variant then becomes O(B' × log B' × log2n).

[edit] Additional information

Because of this algorithm's effectiveness on certain types of numbers the RSA specifications require that the primes, p and q, be such that p-1 and q-1 are non-B-powersmooth for small values of B.

In other languages