Talk:Subset sum problem
From Wikipedia, the free encyclopedia
The Subset Sum problem is a good one for addressing the NP-complete class of problems. There are three reasons for this
- It is an exact, and not an optimal problem
- It has a very simple formal definition and problem statement
- It explicitly uses the constraints of numerical addition as part of the problem.
To understand the true nature of the problem you have to understand that what makes the problem difficult is that the number of binary place values it takes to state the problem needs to be equal to the number of decision variables in the problem. Take N as the number of decision variables, and P as the precision of the problem (stated as the number of binary place values that it takes to state the problem)
P is equivalent to the number of constraints on the problem. With P < N, the problem is underdetermined, and it is easy to find a solution that will fit. With P > N, the problem is overdetermined, and the search space is so small, that it is again easy to find an N. The problem is only difficult when P = N. Addressing any other state of the problem than P = N is effectively changing the subject.
Also inherent in the Subset Problem is a concise exposition of the NP-complete conundrum. That is the conundrum of infeasible problems. Let's say you had a magical way of solving the problem in polynomial time. If you can deliver a feasible solution, then you can say "it doesn't matter how I did it, here is the solution and here is the proof that it is right (it adds up)". However, what if someone, without your knowledge, gives you an infeasible problem? Your magical solution algorithm can not deliver a feasible solution. It will either fail, go on for ever, or deliver a solution for which there is no proof of correctness and hence no one will believe you.
This means that if you actually wanted to solve real subset sum problems, you would have to come up with a polynomially sized algorithm for proving infeasibility for problems where P = N. Anything short of that is an evasion.
Most physical problems in life can be solved quite nicely with a +/- 1% error. Being asked to solve an N = 100 subset sum problem with a precision of +/- 1/(2^100) seems silly and irrelevant. However, what it does do is to state in numerical precision terms the logical complexity of the problem. The advantage of this is that it allows you to use the language of numerical analysis to address what is typically seen as a problem of logical analysis. -- PeterG
- That is actually quite a useful explanation and would probably be better placed in the article than sitting here on the talk page. -- Derek Ross | Talk 04:39, 22 Jun 2004 (UTC)
- You misunderstand the subset sum problem. First, it gives YES/NO, not solutions; it is possible for a provably correct algorithm to not find an actual solution. Second, in order for it to give NO, of course it must be able to handle infeasible sets. Writings about magical solutions to a related, but different, problem do not belong here; especially if they are written as if they apply to this problem. 12.207.151.144 13:59, 9 Apr 2005 (UTC)
I've done something about that. Charles Matthews 19:07, 27 Jun 2004 (UTC)
- Thanks, Charles. Good Job! I didn't feel confident enough about the subject to do it myself. -- Derek Ross | Talk 15:32, 2004 Jun 28 (UTC)
Contents |
[edit] Questions about recent addition
I am quite familiar with complexity theory. The text on the talk page contains several ideas I have never seen before. I have several questions about it.
- Why is it important that "It explicitly uses the constraints of numerical addition as part of the problem"?
- I agree about problem being underdetermined/overdetermined when P<N or P>N but I do not see why it means that finding solution is easy. If I write it as a system of P equations in the most natural way, the equations are not linear and the system does not look easy to solve.
- What is the meaning of "The advantage of this is that it allows you to use the language of numerical analysis to address what is typically seen as a problem of logical analysis."?
I would prefer to see the addition removed from the article (it can certainly stay on talk page) if those concerns are not addressed. If the ideas come from a published source (textbook/paper), a reference would be useful. If they are not published, they should not be in the article. Andris 16:01, Jun 28, 2004 (UTC)
Well, no to the last part, anyway. There is plenty of room on WP for discussion round a problem. Charles Matthews 16:40, 28 Jun 2004 (UTC)
Let me clarify. The main problem is with "undetermined/overdetermined" part. Presently, that claim (that subset sum problem is hard only if P=N) looks incorrect to me. I'm fine with discussion but I would rather not have mathematically incorrect claims.
We can wait for clarification and I can look up more literature but, if we do not find evidence that the claim is correct, I still think it should not be there. The rest of discussion can stay. Andris 17:11, Jun 28, 2004 (UTC)
Yes - if it turns out that 'P < N' is really saying 'when P is much smaller than N', for example, that should be explained and modified, with a clearer explanation. Charles Matthews 18:52, 28 Jun 2004 (UTC)
[edit] "equivalence" of two probems
Let P(s) be the problem of deciding if there is a subset summing to s. The article says that the general case P(s) is "equivalent" to the special case P(0). What does this mean? In complexity theory there are many notions of equivalence. Of course we know they are both NP-complete so there is some polynomial-time transformation between them, but I guess something simpler is meant. Certainly P(0) is a special case of P(s), but how does one use P(0) to solve the general P(s)? --Zero 10:59, 3 Aug 2004 (UTC)
- Good point. I have read the article before but never thought about this place. The simplest thing would be to add -s as one more number. Then, a subset summing to s plus -s gives 0. The problem is that there might be a subset of the original instance that sums to 0 and an algorithm for P(0) might return that. There might be some way of tweaking this but it is not immediately clear. Andris 11:23, Aug 3, 2004 (UTC)
- This works, although I am not sure whether this is what was meant. Find a number t>0 such that everything in the original instance of P(s) is more than -t. Add t to every number in the instance of P(s), making them all positive. Create n instances of P(0), one by adding an extra number -s-t, another by adding an extra -s-2t, etc., the last one by adding extra -s-n*t If the original P(s) instance had a set of k numbers summing up to s, then adding -s-k*t creates a solution to P(0).
- The disadvantage is we have to create a separate instance for each -s-i*t. Putting two of them into one instance might result in a solution to P(0) which involves both of them and corresponds to no solution of P(s). I don't yet see how to bypass that without using multiple instances of P(0). Andris 11:32, Aug 3, 2004 (UTC)
[edit] Accuracy problems with "General discussion" section
The general discussion section states that subset sum is not an optimization problem, but a few lines later it says that there is an approximation algorithm for subset sum. By definition, an approximation algorithm is a solution to an optimization problem, so these two statements seem to be contradictory.
Shlurbee 17:32, 23 December 2005 (UTC)
- I've made the change to 'approximate' algorithm in the section title. There should also be some qualification made where the piped link to approximation algorithm occurs, such as 'although SSP is not strictly ...' . Charles Matthews 08:03, 24 December 2005 (UTC)
In my opinion, the General Discussion section adds nothing to the article and should be removed. Mostly it consists of "refutation" of an uninteresting strawman. --Zero 08:24, 24 December 2005 (UTC)
- The trouble with cutting out all 'thinking aloud' sections is that it tends to make the coverage as a whole less accessible to non-experts. So please don't remove it, if it can be improved instead. Charles Matthews 08:37, 24 December 2005 (UTC)
I added "Although the subset sum problem is a decision problem," and also the phrase "approximation version of" in the last paragraph to make the difference between the standard and approximate versions more clear. I think that and the previous changes resolve the dispute. Shlurbee 14:45, 6 January 2006 (UTC)
[edit] Dynamic programming solution is incorrect?
It is clearly stated that Q(1,s) is true. For any natural number n (excluding 0) we have: Q(n, 0) implies Q(n+1, 0) therefore for any natural n Q(n, 0) is true. This correspond to situation when we take an empty subset. IMO the algorithm should be changed to the fallowing:
The problem can be solved as follows using dynamic programming. Suppose the sequence is
- x1, ..., xn
and we wish to find a subset which sums to 0. Let N be the sum of the negative values and P the sum of the positive values. Define the function
- Q(i,s)
to be 0 if there is no subset of x1, ..., xi which sums to s; 1 if there is a nonempty such subset; or 2 if only empty subset sumst to s (i.e. when s is zero).
(Thus, the value we really want is whether Q(n,0) is 2.)
Clearly
- Q(i,s) = 0
if s<N or s>P. Create an array to hold the values Q(i,s) for 1≤i≤n and N≤s≤P. The array can now be filled in using a simple recursion.
Initialize all Q(1, s) to 0. Let Q(1, 0) be 1. Let Q(1, x1) be 2. For i>1, if Q(i-1,s-xi) is nonzero let Q(i,s) be 2 otherwise let it be value of Q(i-1,s).
Moreover, I think it should be clearly stated that we are interested only in nonempty subsets.
Correct me if I'm wrong or correct the article if I'm right :) -- mina86, 217.96.228.130 11:09, 3 April 2006 (UTC)
Yeah, surely 2N subsets means including the empty subset? That subset doesn't sum to 0 because there's nothing in it. Therefore, it should be 2N − 1 subsets. CloudNine 07:37, 29 April 2006 (UTC)
- You are wrong. By convention, is 0. --Mellum 12:43, 29 April 2006 (UTC)
- (nullary sum. :)