Talk:Quadratic sieve


The paragraph named "Data processing" needs a serious rewrite. I just fixed an error that was introduced six(!) months ago, but even though it no longer contains any logical errors, it does not show what actually happens at all. The 350^2 = (-90)^2 example is a very, very bad one, because 350 actually *is* -90 (mod 55). It only happens to prove the result because 350 - 90 = 260 shares the factor 5 with 55, which itself is true only because 350 contains the factor 5. This is nonsense! I really believe a less contrived example would be welcome, as it would decrease the probability of such errors and would help readers get a better grasp of the concept.

OK, I wrote a new example, with a bigger n and more strictly following the procedure outlined in the article (this example uses sieving). Decrypt3 23:44, Dec 13, 2004 (UTC)
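For what it's worth, the congruence-of-squares step under discussion can be sketched in a few lines (the numbers are those from the old example; the helper name is mine):

```python
from math import gcd

def factor_from_congruence(x, y, n):
    """Given x^2 ≡ y^2 (mod n), attempt to split n via gcd(x - y, n)."""
    assert (x * x - y * y) % n == 0
    d = gcd(x - y, n)
    if 1 < d < n:
        return d, n // d
    return None  # x ≡ ±y (mod n): the congruence was trivial

# The contrived pair criticised above: 350^2 ≡ 90^2 (mod 55),
# and gcd(350 - 90, 55) = gcd(260, 55) = 5 splits 55.
print(factor_from_congruence(350, 90, 55))   # (5, 11)
# A less degenerate pair: 8^2 = 64 ≡ 9 = 3^2 (mod 55), gcd(8 - 3, 55) = 5.
print(factor_from_congruence(8, 3, 55))      # (5, 11)
```

The point of the complaint stands: the first pair only works because 350 and 55 already share the factor 5, whereas the second pair is a genuine nontrivial congruence of squares.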


New stuff

I tried to add a more approachable introduction to the ideas behind the algorithm, based roughly on the presentation from Prime Numbers: A Computational Perspective. Please scan my contributions for errors - number theory isn't quite my forte. Thanks. Deco 22:05, 8 October 2005 (UTC)


Corrections

I'm not familiar with how to edit these pages, but in the example with 1817 you should also produce the term (5, 392), with 392 = 2^3 * 7^2 being smooth.

Also, the line

  x^2 \equiv 1817 \pmod{p}

should read

  (x+42)^2 \equiv 1817 \pmod{p}
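To make the smoothness check behind these terms concrete, here is a small sketch (the factor base {2, 7, 13} is my own assumption for illustration, not necessarily the article's exact base):

```python
def smooth_exponents(m, factor_base):
    """Trial-divide m over the factor base; return the exponent vector or None."""
    exps = []
    for p in factor_base:
        e = 0
        while m % p == 0:
            m //= p
            e += 1
        exps.append(e)
    return exps if m == 1 else None

n, base = 1817, [2, 7, 13]
for x in range(1, 10):
    val = (x + 42) ** 2 - n
    vec = smooth_exponents(val, base)
    if vec is not None:
        print(x, val, vec)   # x = 5 yields the term (5, 392) with 392 = 2^3 * 7^2
```

Combining the x = 1 row (32 = 2^5) with the x = 5 row (392 = 2^3 * 7^2) gives all-even exponents, so (43 * 47)^2 ≡ (2^4 * 7)^2 (mod 1817), and gcd(43 * 47 - 2^4 * 7, 1817) = 23 recovers a factor.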

Space complexity

I'm not yet sure whether the recent correction to the space complexity is correct or not. The optimal choice of the smoothness bound B is indeed the square root of the time complexity. On the other hand, if the bit matrix is represented explicitly, it requires B^2 space. On the other other hand, we don't normally represent it explicitly - we use a sparse matrix representation that only stores a small number of factors per row, and reduce it using the Lanczos method for large sparse linear systems. We can't guarantee that the matrix will be sparse, but in practice it is. The cited reference referred to the number field sieve and so does not apply. Until someone finds an explicit reference describing the space complexity, I've removed any mention of it, since I'm uncertain myself. Deco 18:53, 26 November 2006 (UTC)

  • I searched high and low, but I couldn't find anything on the space complexity, either. Perhaps it's time to ask someone with more knowledge in number theory. On another note, I'd imagine that each main step of the quadratic sieve would have different complexities, with the sieving and matrix-solving steps being particularly large when compared to the other three. Personally, I think that there are still many things that can be added to the article. --Ixfd64 11:55, 27 November 2006 (UTC)
    • My computational number theory book (already referenced in the article) has the following to say in section 6.1.3:
      • There have been suggested at least three alternative sparse-matrix methods intended to replace Gaussian elimination [...] the space required is O(N ln B). Since our factorization matrices have at most O(ln n) nonzero entries per row, the space requirement for the matrix stage of the algorithm, using a sparse encoding, is O(B ln^2 n).
    • If you put this together with the optimal choice of B being O\left(\exp\left(\frac{1}{2}\sqrt{\log n \log\log n}\right)\right), this means the space in the matrix stage is:
      • O\left(\log^2 n \,\exp\left(\frac{1}{2}\sqrt{\log n \log\log n}\right)\right)
    • On the other hand, my book also says that "there are hidden costs for increasing the list of primes p with which we sieve. One is that it is unlikely we can fit the entire sieve array into memory on a computer, so we segment it [...]". It seems difficult to establish which stage dominates in space, although in practice the space issues of the sieve are more easily overcome. As always, little is rigorous when it comes to QS. I suggest that we say that the matrix stage requires the amount of memory shown above, and leave it at that. Deco 12:21, 27 November 2006 (UTC)
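For a sense of scale, the asymptotic bound and the sparse-matrix space estimate quoted above can be evaluated numerically (a rough sketch; the constants and the toy modulus are illustrative only):

```python
import math

def optimal_B(n):
    """Asymptotically optimal smoothness bound: exp((1/2) sqrt(ln n * ln ln n))."""
    ln_n = math.log(n)
    return math.exp(0.5 * math.sqrt(ln_n * math.log(ln_n)))

def matrix_space(n):
    """Sparse-encoding space estimate O(B ln^2 n) from the quoted passage."""
    return optimal_B(n) * math.log(n) ** 2

n = 10 ** 100   # a 101-digit toy modulus
print(f"B ~ {optimal_B(n):.3e}")
print(f"matrix space ~ {matrix_space(n):.3e} (up to constant factors)")
```

For a 100-digit number this puts B somewhere in the tens of millions, which matches the intuition that the matrix stage, while large, is usually easier to accommodate than a naive dense B x B bit matrix would be.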

To do

Some stuff that needs to be done in this article:

  • Explain how the asymptotically optimal value of the smoothness bound B is chosen.
  • Explain better the "sign" bit of the vectors and why it's needed.
  • Talk about multiple polynomial variations.
  • Talk about sparse matrix representations and solution methods and their applicability.
  • Talk about how large a region needs to be sieved.
  • Talk about the use of approximate logarithms to speed up calculations.
  • Similarities and differences with number field sieve.
  • Lots more stuff.

Just some things to keep in mind for future expansion. There is plenty to add here. Deco 01:13, 29 November 2006 (UTC)
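One of the to-do items above, sieving with approximate logarithms, can be sketched as follows. This is a hypothetical toy version (brute-force root finding and a generous fudge factor stand in for the real optimizations); positions that pass the log threshold are only candidates and are confirmed by exact trial division afterwards:

```python
import math

def sieve_smooth_candidates(n, M, factor_base):
    """Approximate-log sieve: accumulate log2(p) along the arithmetic
    progressions where p divides (x+r)^2 - n, then flag positions whose
    accumulated log is close to the size of the sieved value."""
    r = math.isqrt(n) + 1          # sieve the values (x + r)^2 - n for x in [0, M)
    logs = [0.0] * M
    for p in factor_base:
        # roots of t^2 ≡ n (mod p), found by brute force in this small sketch
        for t in range(p):
            if (t * t - n) % p == 0:
                for x in range((t - r) % p, M, p):
                    logs[x] += math.log2(p)
    FUDGE = 6.0                    # generous slack for prime powers we did not sieve
    candidates = []
    for x in range(M):
        if logs[x] >= math.log2((x + r) ** 2 - n) - FUDGE:
            candidates.append(x)
    return candidates

print(sieve_smooth_candidates(1817, 10, [2, 7, 13]))
# smooth positions plus a few false positives; trial division weeds the latter out
```

The payoff is that the inner loop does only additions at stride p, and the expensive trial division is reserved for the handful of flagged positions.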

In the German article http://de.wikipedia.org/wiki/Quadratisches_Sieb some of those questions (such as multiple polynomials and approximate logarithms) are discussed. An example is also given. 87.234.198.146 11:42, 6 February 2007 (UTC)

Implementing a Quadratic Sieve

Hey, as part of my 3rd-year project, one of the things I am doing is implementing this. I think I have found an error in the selection of the factor base: could someone please check whether 13 should be in the factor base? Also, should 23 be? It is a factor of 1817. I will be further correcting this description if I find what I feel is a mistake, but could people possibly review my changes? Cheers, Tompsci 16:01, 18 January 2007 (UTC)

I don't see why 13 shouldn't be in the factor base: (1817/13) = (10/13), and 6^2 ≡ 10 (mod 13), so the Legendre symbol is 1. And you can't exclude 23 unless you know in advance that it is a factor of 1817. But you can't know that without factoring the number, which is the object of the exercise. In practice a quadratic sieve implementation will trial-divide by all primes up to the largest prime in the factor base, check that the number being factored is not prime, and check that it is not a perfect power; in all these situations the quadratic sieve would otherwise fail to find a factor. Someone ought to remove the notice about the article being factually incorrect (apart from my comments below), and perhaps the talk page shouldn't be used as a means of asking questions for a 3rd-year project. :-) Anyhow, 23 is not mentioned as an element of the factor base in the article, is it? wbhart 86.20.38.38 03:45, 11 February 2007 (UTC)
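The factor-base test described here (Euler's criterion) is easy to check directly; a minimal sketch:

```python
def legendre(a, p):
    """Legendre symbol (a/p) for an odd prime p, via Euler's criterion:
    a^((p-1)/2) mod p is 1 for residues, p-1 for non-residues, 0 if p | a."""
    s = pow(a, (p - 1) // 2, p)
    return -1 if s == p - 1 else s

print(legendre(1817, 13))   # 1: 1817 ≡ 10 ≡ 6^2 (mod 13), so 13 belongs in the base
print(legendre(1817, 23))   # 0: 23 divides 1817; initial trial division catches this
```

A symbol of 1 means the sieving polynomial has roots mod p, so p can contribute to smooth values; a symbol of 0 means the prime divides n itself, which is exactly why implementations trial-divide first.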
I think the error in question has been sufficiently addressed by wbhart, so I am going to remove the tag. Schneau 17:25, 25 June 2007 (UTC)

crossover with NFS

In various experiments using the best available free implementations (ggnfs for the NFS, msieve for MPQS) I found the crossover was nearer 100 than 110 digits; a 102-digit number took 17 hours with msieve and 8 hours with ggnfs on the same hardware. Fivemack 11:42, 7 February 2007 (UTC)

- This is a moving target, dependent on:

* the architecture of the hardware used: memory throughput vs. processor clock speed vs. hard disk speed, etc.
* the amount of time invested in optimizing the number field sieve and the quadratic sieve
* the "goodness" of the polynomials used with the number field sieve (still an open research problem)
* the particular numbers being factored - wbhart 86.20.38.38 03:45, 11 February 2007 (UTC)

Fastest factoring method?

The statement that the quadratic sieve is the fastest factoring method for numbers of less than 100 digits and that the number field sieve is the fastest otherwise is factually incorrect. Many numbers of hundreds of decimal digits have been factored with the elliptic curve algorithm, and yet the sun would die of heat death before the quadratic sieve or number field sieve would factor them. Also, below about 40 digits the quadratic sieve is not usually the best algorithm. In practice a cocktail of one or more of SQUFOF, P-1, P+1, the elliptic curve method, Pollard rho, trial factoring and the Brent variant is used.
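To make the small-factor point concrete, here is a toy sketch of one of the methods named above, Pollard's rho (with a simple retry over the polynomial constant; the function name and retry range are my own choices):

```python
from math import gcd

def pollard_rho(n):
    """Pollard's rho with Floyd cycle detection. Its cost scales with the
    size of the smallest factor of n, not with n itself (toy sketch)."""
    for c in range(1, 10):            # retry with a different polynomial on failure
        x = y = 2
        d = 1
        while d == 1:
            x = (x * x + c) % n       # tortoise: one step of x -> x^2 + c
            y = (y * y + c) % n       # hare: two steps
            y = (y * y + c) % n
            d = gcd(abs(x - y), n)
        if d != n:
            return d
    return None

print(pollard_rho(1817))   # 23: the smaller factor of 1817 = 23 * 79
```

This is exactly why such methods are run before the quadratic sieve: a 200-digit number with a 10-digit factor falls to rho or ECM almost instantly, while QS/NFS cost depends only on the size of n.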

The *only* correct statement is that the number field sieve is asymptotically the fastest known algorithm for factoring general integers. The quadratic sieve held this title until the NFS was invented. The elliptic curve method has the same asymptotics, but depends only on the size of the factor, not the number being factored. This makes the elliptic curve algorithm your only hope in certain situations.

For numbers of a special form, many faster algorithms exist, e.g. Zhang's sieve, hybrid QS-GNFS methods, the special number field sieve, and certain variants of Fermat's algorithm (even special numbers of thousands of digits can be factored with these). Thus the qualifier "general integer" is required when talking about comparative times. - wbhart 86.20.38.38 04:02, 11 February 2007 (UTC)

Heat death refers to the death of the universe, and according to current models will take 10^1000 years, which is more than enough time to factor numbers with 100s of digits. I think you meant to say the Sun would die of gravitational collapse. Arvindn 06:15, 12 February 2007 (UTC)

It is the second-fastest (randomized) algorithm for general inputs, as stated in the third sentence: "It is a general-purpose factorization algorithm, meaning that its running time depends solely on the size of the integer to be factored, and not on special structure or properties." No other algorithm can ensure a better running time on all possible inputs (except the NFS for big inputs). It is a Las Vegas algorithm, which can ensure the running time under certain mathematical conditions (which hold with high probability). Of course, for some specific inputs there are faster algorithms; their running times depend on certain properties of the number to be factored. The running time of the ECM depends on the size (length) of the factors. --87.234.198.146 14:45, 20 February 2007 (UTC)

Right, but that is not what the article says in the first sentence. wbhart —The preceding unsigned comment was added by 86.20.36.114 (talk) 12:17, 6 April 2007 (UTC).

Partial Relations & Cycles = Large Primes?

As I see it, the section labeled "Partial relations and cycles" describes the same method as the section "Large primes". The latter seems to do a much more concise job with the description. Am I right or wrong about this?
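For comparison, the large-prime pairing that both sections describe can be sketched like this (the data layout and names are my own illustration):

```python
from collections import defaultdict

def combine_partials(partials):
    """Pair partial relations sharing a large prime. In the product of two
    such relations the large prime appears squared, so it drops out of the
    GF(2) exponent vector and the pair behaves like one full relation."""
    by_prime = defaultdict(list)
    for x, vec, large in partials:
        by_prime[large].append((x, vec))
    fulls = []
    for large, group in by_prime.items():
        for (x1, v1), (x2, v2) in zip(group, group[1:]):
            parity = [(a + b) % 2 for a, b in zip(v1, v2)]
            fulls.append(((x1, x2), parity))
    return fulls

# Hypothetical partials over the factor base {2, 7, 13} for n = 1817:
#   44^2 - 1817 = 119  = 7 * 17       -> vector [0, 1, 0], large prime 17
#   58^2 - 1817 = 1547 = 7 * 13 * 17  -> vector [0, 1, 1], large prime 17
#   46^2 - 1817 = 299  = 13 * 23      -> vector [0, 0, 1], large prime 23 (unmatched)
print(combine_partials([(44, [0, 1, 0], 17), (58, [0, 1, 1], 17), (46, [0, 0, 1], 23)]))
# -> [((44, 58), [0, 0, 1])]
```

"Cycles" generalize this pairing to longer chains of partials whose large primes cancel in a product, but the underlying idea is the same as in the "Large primes" section.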