Dixon's factorization method

From Wikipedia, the free encyclopedia

In number theory, Dixon's factorization method (also Dixon's algorithm) is a general-purpose integer factorization algorithm; it is the prototypical factor base method, and the only factor base method for which a run-time bound not reliant on conjectures about the smoothness properties of values of a polynomial is known.

The algorithm was designed by John D. Dixon, a mathematician at Carleton University, and was published in 1981.

Contents

[edit] Basic idea

Dixon's method is based on finding a congruence of squares modulo the integer N which we intend to factor. Fermat's factorization algorithm finds such a congruence by selecting random or pseudo-random x values and hoping that the integer x^2 \mod N is the square of an integer:

x^2\equiv y^2\quad(\hbox{mod }N),\qquad x\not\equiv\pm y\quad(\hbox{mod }N).

For example, if N=84923, we notice (by starting at 292, the first number greater than \sqrt N and counting up) that 505^2 \mod 84923 is 256, the square of 16. So (505-16)(505+16) = 0 \mod N. Computing the GCD of 505-16 and N using Euclid's algorithm gives us 163, which is a factor of N.

In practice, selecting random x values will take an impractically long time to find a congruence of squares, since there are so few squares less than N.

Dixon's method replaces the condition 'is the square of an integer' with the much weaker one 'has only small prime factors'; for example, there are 292 squares less than 84923, 662 numbers whose prime factors are only 2,3,5 or 7, and 4767 whose prime factors are all less than 30.

If we have lots of numbers a_1 \ldots a_n whose squares can be factorized as a_i^2 \mod n = \prod_{j=1}^m b_i^{e_{ij}} for a fixed set b_1 \ldots b_m of small primes, linear algebra modulo 2 on the matrix eij will give us a subset of the ai whose squares combine to a product of small primes to an even power -- that is, a subset of the ai whose squares combine to a square.

[edit] Method

Firstly, a set of primes less than some bound B is chosen. This set of primes is called the factor base. Then, using the polynomial

p(x) = x2(mod n)

many values of x are tested to see if p(x) factors completely over the factor base. If it does, the pair (x,p(x)) is stored. Such a pair is called a relation. Then, once the number of relations collected exceeds the size of the factor base, we can enter the next stage.

The p(x) values are factorized (this is easy since we are certain they factorize completely over the factor base) and the exponents of the prime factors are converted into an exponent vector mod 2. For example, if the factor base is {2, 3, 5, 7} and the p(x) value is 30870, we have:

30870=2^1\cdot3^2\cdot 5^1\cdot 7^3

This gives an exponent vector of:

\mathbf{v}_i=\begin{bmatrix}1 \\ 2 \\ 1 \\ 3\end{bmatrix}=\begin{bmatrix}1 \\ 0 \\ 1 \\ 1\end{bmatrix}\hbox{ mod 2}

If we can find some way to add these exponent vectors together (equivalent to multiplying the corresponding relations together) to produce the zero vector (mod 2), then we can get a congruence of squares. Thus we can put the exponent vectors together into a matrix, and formulate an equation:

c_1\mathbf{v}_1+c_2\mathbf{v}_2+\cdots+c_n\mathbf{v}_n=\mathbf{0}\hbox{ (mod 2)}

This can be converted into a matrix equation:

\begin{bmatrix}
v_{11} & v_{12} & \cdots & v_{1n}\\
v_{21} & v_{22} & \cdots & v_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
v_{n1} & v_{n2} & \cdots & v_{nn}
\end{bmatrix}\begin{bmatrix}c_1 \\ c_2 \\ \vdots \\ c_n\end{bmatrix}=\begin{bmatrix}0\\0\\ \vdots\\0\end{bmatrix}\hbox{ (mod 2)}

This matrix equation is then solved (using, for example, Gaussian elimination) to find the vector c. Then:

\prod_{k}x_k^2\equiv\prod_{k}p(x_k)\pmod{n}

where the products are taken over all k for which ck = 1. At least one of the ck must be one. Because of the way we have solved for c, the right-hand side of the above congruence is a square. We then have a congruence of squares.

[edit] Example

Considering the factor base {2,3,5,7}, we will try to factor 84923.

513^2 \mod 84923 = 8400 = 2^4 * 3 * 5^2 * 7

537^2 \mod 84923 = 33600 = 2^6 * 3 * 5^2 * 7

So (513 \cdot 537)^2 \mod 84923 = 2^{10} \cdot 3^2 \cdot 5^4 \cdot 7^2

513 times 537 is 20712 (mod 84923).

That is, 20712^2 \mod 84923 = (2^5 \cdot 3 \cdot 5^2 \cdot 7)^2 \mod 84923 = 16800^2 \mod 84923

We then look at 20712-16800 = 3912 and 20712+16800 = 37512, and compute their greatest common divisors with 84923 by using Euclid's algorithm. This is 163 in the case of 3912, and 521 in the case of 37512; and, indeed, 84923 = 521 * 163.

[edit] Optimizations

The quadratic sieve is an optimization of Dixon's method. It solves a quadratic congruence to find suitable x values much faster than simply by random selection.

Other ways to optimize Dixon's method include using a better algorithm to solve the matrix equation, taking advantage of the sparsity of the matrix: a number of size N can't have more than log2N factors, so each row of the matrix is almost all zeros. In practice, the block Lanczos algorithm is often used. Also, the size of the factor base must be chosen carefully: if it is too small, it will be difficult to find numbers that factorize completely over it, and if it is too large, more relations will have to be collected.

A more sophisticated analysis, using the approximation that a number has all its prime factors less than N1 / a with probability about a a (an approximation to the Dickman-de Bruijn function), indicates that choosing too small a factor base is much worse than too large, and that the ideal factor base size is some power of \exp\left(\sqrt{\log N \log \log N}\right).

The optimal complexity of Dixon's method is [1]

O\left(\exp\left(2 \sqrt 2 \sqrt{\log n \log \log n}\right)\right).

[edit] References

  • J. D. Dixon, "Asymptotically fast factorization of integers," Math. Comput., 36(1981), p. 255-260.
Languages