Talk:Subset sum problem

From Wikipedia, the free encyclopedia

The Subset Sum problem is a good one for addressing the NP-complete class of problems. There are three reasons for this

- It is an exact, and not an optimal problem

- It has a very simple formal definition and problem statement

- It explicitly uses the constraints of numerical addition as part of the problem.

To understand the true nature of the problem you have to understand that what makes the problem difficult is that the number of binary place values it takes to state the problem needs to be equal to the number of decision variables in the problem. Take N as the number of decision variables, and P as the precision of the problem (stated as the number of binary place values that it takes to state the problem)

P is equivalent to the number of constraints on the problem. With P < N, the problem is underdetermined, and it is easy to find a solution that will fit. With P > N, the problem is overdetermined, and the search space is so small, that it is again easy to find an N. The problem is only difficult when P = N. Addressing any other state of the problem than P = N is effectively changing the subject.

Also inherent in the Subset Problem is a concise exposition of the NP-complete conundrum. That is the conundrum of infeasible problems. Let's say you had a magical way of solving the problem in polynomial time. If you can deliver a feasible solution, then you can say "it doesn't matter how I did it, here is the solution and here is the proof that it is right (it adds up)". However, what if someone, without your knowledge, gives you an infeasible problem? Your magical solution algorithm can not deliver a feasible solution. It will either fail, go on for ever, or deliver a solution for which there is no proof of correctness and hence no one will believe you.

This means that if you actually wanted to solve real subset sum problems, you would have to come up with a polynomially sized algorithm for proving infeasibility for problems where P = N. Anything short of that is an evasion.

Most physical problems in life can be solved quite nicely with a +/- 1% error. Being asked to solve an N = 100 subset sum problem with a precision of +/- 1/(2^100) seems silly and irrelevant. However, what it does do is to state in numerical precision terms the logical complexity of the problem. The advantage of this is that it allows you to use the language of numerical analysis to address what is typically seen as a problem of logical analysis. -- PeterG

That is actually quite a useful explanation and would probably be better placed in the article than sitting here on the talk page. -- Derek Ross | Talk 04:39, 22 Jun 2004 (UTC)
You misunderstand the subset sum problem. First, it gives YES/NO, not solutions; it is possible for a provably correct algorithm to not find an actual solution. Second, in order for it to give NO, of course it must be able to handle infeasible sets. Writings about magical solutions to a related, but different, problem do not belong here; especially if they are written as if they apply to this problem. 12.207.151.144 13:59, 9 Apr 2005 (UTC)

I've done something about that. Charles Matthews 19:07, 27 Jun 2004 (UTC)

Thanks, Charles. Good Job! I didn't feel confident enough about the subject to do it myself. -- Derek Ross | Talk 15:32, 2004 Jun 28 (UTC)

Contents

[edit] Questions about recent addition

I am quite familiar with complexity theory. The text on the talk page contains several ideas I have never seen before. I have several questions about it.

  • Why is it important that "It explicitly uses the constraints of numerical addition as part of the problem"?
  • I agree about problem being underdetermined/overdetermined when P<N or P>N but I do not see why it means that finding solution is easy. If I write it as a system of P equations in the most natural way, the equations are not linear and the system does not look easy to solve.
  • What is the meaning of "The advantage of this is that it allows you to use the language of numerical analysis to address what is typically seen as a problem of logical analysis."?

I would prefer to see the addition removed from the article (it can certainly stay on talk page) if those concerns are not addressed. If the ideas come from a published source (textbook/paper), a reference would be useful. If they are not published, they should not be in the article. Andris 16:01, Jun 28, 2004 (UTC)

Well, no to the last part, anyway. There is plenty of room on WP for discussion round a problem. Charles Matthews 16:40, 28 Jun 2004 (UTC)

Let me clarify. The main problem is with "undetermined/overdetermined" part. Presently, that claim (that subset sum problem is hard only if P=N) looks incorrect to me. I'm fine with discussion but I would rather not have mathematically incorrect claims.

We can wait for clarification and I can look up more literature but, if we do not find evidence that the claim is correct, I still think it should not be there. The rest of discussion can stay. Andris 17:11, Jun 28, 2004 (UTC)

Yes - if it turns out that 'P < N' is really saying 'when P is much smaller than N', for example, that should be explained and modified, with a clearer explanation. Charles Matthews 18:52, 28 Jun 2004 (UTC)

[edit] "equivalence" of two probems

Let P(s) be the problem of deciding if there is a subset summing to s. The article says that the general case P(s) is "equivalent" to the special case P(0). What does this mean? In complexity theory there are many notions of equivalence. Of course we know they are both NP-complete so there is some polynomial-time transformation between them, but I guess something simpler is meant. Certainly P(0) is a special case of P(s), but how does one use P(0) to solve the general P(s)? --Zero 10:59, 3 Aug 2004 (UTC)

Good point. I have read the article before but never thought about this place. The simplest thing would be to add -s as one more number. Then, a subset summing to s plus -s gives 0. The problem is that there might be a subset of the original instance that sums to 0 and an algorithm for P(0) might return that. There might be some way of tweaking this but it is not immediately clear. Andris 11:23, Aug 3, 2004 (UTC)
This works, although I am not sure whether this is what was meant. Find a number t>0 such that everything in the original instance of P(s) is more than -t. Add t to every number in the instance of P(s), making them all positive. Create n instances of P(0), one by adding an extra number -s-t, another by adding an extra -s-2t, etc., the last one by adding extra -s-n*t If the original P(s) instance had a set of k numbers summing up to s, then adding -s-k*t creates a solution to P(0).
The disadvantage is we have to create a separate instance for each -s-i*t. Putting two of them into one instance might result in a solution to P(0) which involves both of them and corresponds to no solution of P(s). I don't yet see how to bypass that without using multiple instances of P(0). Andris 11:32, Aug 3, 2004 (UTC)

[edit] Accuracy problems with "General discussion" section

The general discussion section states that subset sum is not an optimization problem, but a few lines later it says that there is an approximation algorithm for subset sum. By definition, an approximation algorithm is a solution to an optimization problem, so these two statements seem to be contradictory.

Shlurbee 17:32, 23 December 2005 (UTC)

I've made the change to 'approximate' algorithm in the section title. There should also be some qualification made where the piped link to approximation algorithm occurs, such as 'although SSP is not strictly ...' . Charles Matthews 08:03, 24 December 2005 (UTC)

In my opinion, the General Discussion section adds nothing to the article and should be removed. Mostly it consists of "refutation" of an uninteresting strawman. --Zero 08:24, 24 December 2005 (UTC)

The trouble with cutting out all 'thinking aloud' sections is that it tends to make the coverage as a whole less accessible to non-experts. So please don't remove it, if it can be improved instead. Charles Matthews 08:37, 24 December 2005 (UTC)

I added "Although the subset sum problem is a decision problem," and also the phrase "approximation version of" in the last paragraph to make the difference between the standard and approximate versions more clear. I think that and the previous changes resolve the dispute. Shlurbee 14:45, 6 January 2006 (UTC)

[edit] Dynamic programming solution is incorrect?

It is clearly stated that Q(1,s) is true. For any natural number n (excluding 0) we have: Q(n, 0) implies Q(n+1, 0) therefore for any natural n Q(n, 0) is true. This correspond to situation when we take an empty subset. IMO the algorithm should be changed to the fallowing:


The problem can be solved as follows using dynamic programming. Suppose the sequence is

x1, ..., xn

and we wish to find a subset which sums to 0. Let N be the sum of the negative values and P the sum of the positive values. Define the function

Q(i,s)

to be 0 if there is no subset of x1, ..., xi which sums to s; 1 if there is a nonempty such subset; or 2 if only empty subset sumst to s (i.e. when s is zero).

(Thus, the value we really want is whether Q(n,0) is 2.)

Clearly

Q(i,s) = 0

if s<N or s>P. Create an array to hold the values Q(i,s) for 1≤i≤n and N≤s≤P. The array can now be filled in using a simple recursion.

Initialize all Q(1, s) to 0. Let Q(1, 0) be 1. Let Q(1, x1) be 2. For i>1, if Q(i-1,s-xi) is nonzero let Q(i,s) be 2 otherwise let it be value of Q(i-1,s).


Moreover, I think it should be clearly stated that we are interested only in nonempty subsets.

Correct me if I'm wrong or correct the article if I'm right :) -- mina86, 217.96.228.130 11:09, 3 April 2006 (UTC)

Yeah, surely 2N subsets means including the empty subset? That subset doesn't sum to 0 because there's nothing in it. Therefore, it should be 2N − 1 subsets. CloudNine 07:37, 29 April 2006 (UTC)


You are wrong. By convention, \sum_{x \in \emptyset} x is 0. --Mellum 12:43, 29 April 2006 (UTC)
(nullary sum. :)

On the page itself, the quote reads: The running time is of order O(2NN), since there are 2N subsets and, to check each subset, we need to sum at most N elements.

This is specifically talking about a computer program (or Turing machine if you like) running an algorithm to solve the problem. The notation is introduced for the specific purpose of calculating the running time. It seems perfectly obvious that the program will never perform any calculations on the empty set, so the existence of the empty set does not have any effect on the running time. Therefore, there are only 2N − 1 subsets being used to calculate the running time, and the running time is O(2N − 1N).86.151.205.224 19:38, 8 September 2007 (UTC)


[edit] awkward definition

Almost everywhere that I have seen subset-sum defined has defined the numbers in the set to be either over the natural numbers or the positive integers. This is certainly so for Garey/Johnson and CLRS, the stated sources. Shouldn't our definition follow these canonical sources? Fremerl 19:22, 13 January 2007 (UTC)

[edit] "any" vs. "some"

I have rephrased the intro here to expressed the problem as finding some non-empty subset for which the sum equals exactly zero. For a layman, at least, the use of any is ambiguous, and suggests that the sum of any subset equals zero. Others may feel that any is more precise, and want to revert this change. That's OK, but perhaps you could find some choice of words that's not ambiguous for the lay reader. Rupert Clayton 12:11, 6 February 2007 (UTC)

[edit] Generalizations

The statement in the generalizations section, "[the subset sum problem] can actually be defined using any group", is not exactly accurate. For example Z_2 (integers modulo 2) under addition is a group, but finding the answer to the subset sum problem for a set of integers in Z_2 alone is trivial - the power set of Z_2 contains only four sets. Thus, I believe n must be a parameter of the problem, rather than as any particular fixed number (which "it can actually be defined using any group" seems to imply is allowed). I will rephrase the offending line. -- Ben-Arba 18:50, 11 February 2007 (UTC)

[edit] Exponential time algorithm not clear

I'm not convinced by the description of the exponential algorithm by Horowitz and Sahni as described in this article. My main point of confusion is on how do you decide on how to split N elements into N/2 sets? Would you get any improvement if you split on negative vs positive integers to solve the sum = 0 special case? Can someone cite to a more in depth discussion of this algorithm?