Talk:Brent's method
From Wikipedia, the free encyclopedia
[edit] Unclear
The article states:
- then we set e = d, otherwise we set e = m. In the latter case, we do a step according to the bisection method.
Maybe I'm dense, but this is unclear to me. Bisect, using which points? In general, the whole if e then .. part of the article seems opaque, and I can't quite make sense of the intention of the code here. Can someone clarify? linas 23:22, 17 April 2006 (UTC)
BTW, I looked at Mathworld's entry for this. It seems to be a direct rip-off of what's in "Numerical Recipes in C". One minor advantage is that it seems to minimize the number of divisions required (useful in the case of divisions taking a long time, e.g. high-precision work). linas 12:54, 18 April 2006 (UTC)
- I added an algorithm, which should at least be unambiguous. I'll return and see if I can rewrite the text to make it clearer.
- I assume the "Numerical Recipes" book has a good reason for their formulation of inverse quadratic interpolation (perhaps it's minimizing the number of divisions, as you suggest), but I don't know it, which is why I stick to the more obvious formulation. -- Jitse Niesen (talk) 08:11, 19 April 2006 (UTC)
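For reference, the "more obvious formulation" mentioned here is the Lagrange form of inverse quadratic interpolation: fit x as a quadratic function of y through the three points and evaluate it at y = 0. A minimal sketch (function name and test values are my own, not from the article):

```python
def inverse_quadratic_step(a, b, c, fa, fb, fc):
    """Lagrange form of inverse quadratic interpolation: interpolate x
    as a function of y through (fa, a), (fb, b), (fc, c) and evaluate
    at y = 0.  Requires fa, fb, fc to be pairwise distinct."""
    return (a * fb * fc / ((fa - fb) * (fa - fc))
          + b * fa * fc / ((fb - fa) * (fb - fc))
          + c * fa * fb / ((fc - fa) * (fc - fb)))

# Example: f(x) = x^2 - 2, three points bracketing sqrt(2)
f = lambda x: x * x - 2
a, b, c = 1.0, 2.0, 1.5
s = inverse_quadratic_step(a, b, c, f(a), f(b), f(c))
```

With these values s comes out close to sqrt(2) in a single step, which is the point of preferring interpolation over bisection when it behaves.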
-
- Thank you! FWIW, I've now implemented a one-off version in GMP. For my data, the thing very nearly doubles the number of significant figures each iteration, which is a vast improvement over linear interpolation or binary search. For your amusement (my amusement?), here's actual output showing the width of the bracket around the zero, per iteration:
# count=0 delt 0.9958715e-1
# count=1 delt 0.1000055e0
# count=2 delt 0.4072413e-3
# count=3 delt 0.1123433e-7
# count=4 delt 0.1257072e-14
# count=5 delt 0.1586553e-28
# count=6 delt 0.6180884e-54
# count=7 delt 0.3400586e-100
linas 13:55, 21 April 2006 (UTC)
[edit] Questions about Brent's method
To Jitse Niesen:
1. How do you know that the method will "fall back to the more robust bisection method if necessary"?
2. What is the justification for assuming that the root is closer to b than to a when you choose between d and m?
3. Would it not be better to make the new c equal to whichever of the old a or old b does not become the new a, so that c will always (after the first iteration) be the most recently generated value outside the interval [a,b] or [b,a]? Thus the last part of the code would become:
- if f(b) f(d) < 0 then
- c := a
- a := b
- else
- c := b
- (a remains a)
- end if
- b := d
Thanks for your assistance. JRSpriggs 07:02, 25 April 2006 (UTC)
- Explaining my third question. c is only used to perform inverse quadratic interpolation. But that interpolation will not be done if c is the same as either a or b. So we should try to avoid choosing a value for c which will be equal to a when in the "then" case. That is why I am suggesting that it should be the old a rather than the old b which is the new a. JRSpriggs 06:38, 26 April 2006 (UTC)
-
- Ad 1. I suppose that you realize that the statement d := m is the bisection method. As for how I know that the method falls back to the bisection method: I know it because I read it in Atkinson (as referenced in the article). I've looked a bit into it now, and this is discussed in Sections 4.2 and 4.3 in Brent. However, Atkinson's description of Brent's method is not complete, hence the algorithm given in the article is not quite the same as Brent's method, and therefore, the algorithm in the article might not always fall back to the bisection method. I'll look into it when I have some time.
- Ad 2. b is the latest guess for the root, and since the guesses converge to the root, b is usually a much better approximation than a. Perhaps the word assume should not have been used. It is not certain that b is closer to the root, but typically that is the case, and it makes sense to use this.
- Ad 3. I see your point. Let me think a bit about this and consult the original code. Brent does not say anything about it, as far as I can see. -- Jitse Niesen (talk) 13:19, 26 April 2006 (UTC)
Counter-example: If I understand the current version of Brent's method, it will NOT fall back on bisection in the following situation. Let f(x)=x^3+x^2-5x+3=(x+3)(x-1)^2. Let the initial values be a=-4 and b=+4/3, so that initially f(a)=-25 and f(b)=13/27. For this problem, the bisection method will converge slowly to -3. Newton's method (beginning at b) will converge slowly to +1. The secant method will converge slowly to +1. The inverse quadratic method will converge slowly to +1. And Brent's method will NOT converge, because a will remain -4 while b slowly converges to +1. So the length of the interval from a to b will never drop below 5. JRSpriggs 04:40, 27 April 2006 (UTC)
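The arithmetic in this counter-example is easy to verify mechanically; a quick scratch check (Python, my own code, not from the article):

```python
from fractions import Fraction

# f(x) = x^3 + x^2 - 5x + 3 = (x + 3)(x - 1)^2, as in the counter-example
f = lambda x: x**3 + x**2 - 5*x + 3

assert f(-4) == -25                    # f(a) for a = -4
b = Fraction(4, 3)
assert f(b) == Fraction(13, 27)        # f(b) for b = +4/3
assert f(1) == 0 and f(-3) == 0        # double root at 1, simple root at -3

# Because x = 1 is a double root, f > 0 just to the right of 1, so every
# iterate b_n near 1 has f(b_n) > 0 and the sign-change endpoint a = -4
# is never replaced: the bracket length stays above 5.
```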
- Hmm, yes, I guess I have no choice but to add the complete algorithm as given by Brent. Not sure what's the best way to present it though. Perhaps I should start with Dekker's algorithm and then say what Brent added to make sure that it falls back to the bisection method. Let me see whether I can track down the reference to Dekker's algorithm. -- Jitse Niesen (talk) 11:39, 27 April 2006 (UTC)
- I do not know what Brent would have done. But if it was up to me to write a program with what I know now, I would force a bisection step after any step which did not reduce the length of the interval to half or less of its previous length. JRSpriggs 06:28, 18 May 2006 (UTC)
I have taken the liberty of modifying the algorithm to: (1) agree with the verbal text as far as I can make sense of it; (2) avoid the problems which I mentioned above in this section of talk; and (3) add more detail to the input and output parts. If I have gotten any of it wrong (contrary to Brent, that is), feel free to fix it. JRSpriggs 04:08, 14 June 2006 (UTC)
- Regarding the third point, is that really necessary? I'd say that things like checking of inputs, explicitly mentioning that f(a) should be calculated, and outputting the result go too much into detail. After all, this is an encyclopaedia and not a source code repository (see WP:NOT). I added the algorithm to supplement the text, which is hard to follow, but I don't think it should be turned into a computer program. -- Jitse Niesen (talk) 05:34, 14 June 2006 (UTC)
- I used to be a computer programmer, so I have a strong habit of adding more and more detail until I have a workable program. I could change it back, if you think that that would make it more readable. JRSpriggs 05:44, 14 June 2006 (UTC)
-
- As far as mentioning when f should be calculated, I was trying to make clear that it only needs to be calculated once per iteration (and twice during initialization) because f might be arbitrarily difficult to calculate and it might be the limiting cost of performing the algorithm. JRSpriggs 07:52, 15 June 2006 (UTC)
-
- That's a good point. -- Jitse Niesen (talk) 11:37, 16 June 2006 (UTC)
-
[edit] How slow can Brent's method be?
Jitse added "Brent ... inserted an additional test which must be satisfied before the result of the secant method is accepted as the next iterate. ... otherwise the midpoint m is used for the next iterate. This modification ensures that a bisection step is done if the secant method does not progress fast enough. Brent proved that his method requires at most N² iterations, where N denotes the number of iterations for the bisection method." to explain part of how Brent modified Dekker's method. Each iteration reduces the length of the interval within which the root can lie. If an iteration using either inverse quadratic interpolation or the secant method reduces the length to half or less of its previous length, then it is better than a bisection step. Otherwise, a bisection step is used as the next iteration. So the length is halved or better in either one or two iterations. Thus no more than twice as many iterations are necessary to reduce the length by the same amount as the bisection method does to reach convergence. So I do not see how you can say it is N-squared rather than two times N. JRSpriggs 04:24, 14 June 2006 (UTC)
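For concreteness, the N in this discussion is the bisection iteration count, roughly the base-2 logarithm of the initial bracket length over the tolerance. A small helper (my own illustration, not part of the article's pseudocode):

```python
import math

def bisection_iterations(length, eps):
    """Number of halvings needed to shrink a bracket of the given
    length below the tolerance eps (the N in the discussion above)."""
    return max(0, math.ceil(math.log2(length / eps)))

# Shrinking an interval of length 1 below 1e-15 takes about 50 halvings,
# so a "2N" bound would be ~100 iterations, while N^2 would allow ~2500.
N = bisection_iterations(1.0, 1e-15)
```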
- The extra test that Brent introduced does not look at the length of the interval [a_k, b_k], but at the difference b_{k-1} − b_k. So there is no guarantee that the length of the interval [a_k, b_k] is halved every second iteration. -- Jitse Niesen (talk) 05:09, 14 June 2006 (UTC)
- PS: What do you think about adding an example; would that help to explain the method? -- Jitse Niesen (talk) 05:11, 14 June 2006 (UTC)
-
- Then I should take out the "if |b-a| > L/2 then (force bisection step)" if-statement, but that would make the method fail again because of the counter-example I gave in the previous section of talk.
- Regarding an example, we could add "debug" print statements to the algorithm and then give a listing of the printout they would generate. I do not have a compiler presently, so I cannot write it now. I tried doing it by hand, and it quickly becomes very tedious (after about 3 iterations). JRSpriggs 05:58, 14 June 2006 (UTC)
How clever of you to use my "counter-example" as your example. Now I see that I misunderstood the condition for using m instead of s. I thought that it was as given in the version of the algorithm that was there when I first looked at this and which was used by Dekker. So, I will try to change the algorithm to use Brent's actual test. JRSpriggs 07:57, 15 June 2006 (UTC)
- Well, your "counter-example" illustrates a lot of cases, which is the main reason why I decided to use it. And that reminds me, the condition is actually slightly different in Brent's algorithm.
- However, I think it still may happen that the b_n converge to a root, but the a_n stay at the same location. But that does not mean that the algorithm has failed: if the b_n converge, we have a root and we're happy. :) -- Jitse Niesen (talk) 11:37, 16 June 2006 (UTC)
- What then is the test for convergence? We cannot just compare the length of [a,b] to ε then. JRSpriggs 08:35, 17 June 2006 (UTC)
- Actually, on checking Brent's book I discovered that I was wrong. The length of [a,b] always converges to zero, and the convergence test is whether the length of [a,b] is small enough. -- Jitse Niesen (talk) 11:19, 17 June 2006 (UTC)
[edit] Swap all ak with bk or just the last?
I am wondering whether, when we swap the most recent a with b, we should also swap the earlier values? So that instead of "if |f(a)| < |f(b)| then swap (a,b) end if" we would have "if |f(a)| < |f(b)| then swap (a,b) and swap (p,c) and swap (q,d) end if" where p would be the old a and q would be the old p. JRSpriggs 11:37, 18 June 2006 (UTC)
- No, just the last. So "if |f(a)| < |f(b)| then swap (a,b) end if" is correct. -- Jitse Niesen (talk) 12:56, 18 June 2006 (UTC)
[edit] "f(a)*f(s) <= 0"
The condition "f(a)*f(s) <= 0" in the if-else at the end of the pseudocode was originally "f(a)*f(s) < 0", which I believe did not work for the following reason: if f(s)=0, then f(a)*f(s)=0 and we would land in the 'else' side, hence setting a := s. Upon the next repeat-until test we would then miss that s was actually a root.
On the other hand, if the test is f(a)*f(s) <= 0, we have the following possibilities:
- 1. if f(s)=0 then b := s, the next repeat until is the last since f(b)=0, and b is returned
- 2. if f(a)=0 then
- 2.a if f(b)!=0 then we swap a and b just after in "if |f(a)| <= |f(b)| then swap (a,b) end if" and a will therefore be identified as the root in the next repeat until test.
- 2.b or f(b)=0 and b is a root as well.
Erwan.lemonnier 14:07, 15 December 2006 (UTC)
- If f(s)=0, then a would become s and the argument which you gave for case 2 would work for case 1 as well. I am reverting both additions of equal signs. JRSpriggs 06:14, 16 December 2006 (UTC)
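For what it's worth, the difference between the two tests when f(s) lands exactly on a root can be checked in a few lines (Python scratch code; the function and names are my own, not the article's pseudocode):

```python
# roots of f at x = 1 and x = -2; the bracket [0, 3] contains only x = 1
f = lambda x: (x - 1) * (x + 2)
a, b, s = 0.0, 3.0, 1.0      # suppose the method proposes s = 1 exactly
fa, fs = f(a), f(s)          # fa = -2.0, fs = 0.0

# strict test f(a)*f(s) < 0 is False when f(s) = 0, so the 'else' branch
# runs (a := s); the root still ends up at an endpoint with f(a) = 0, which
# a convergence test of the form "f(a) = 0 or f(b) = 0" catches next time
strict = fa * fs < 0

# non-strict test f(a)*f(s) <= 0 is True, so b := s and f(b) = 0 immediately
nonstrict = fa * fs <= 0
```

Either way the exact root becomes a bracket endpoint, which is the substance of the reply above: the choice of <= versus < only changes which endpoint it becomes.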
[edit] Discrepancy between discussion and example algorithm?
-
- d := c
- c := b
- if f(a) f(s) < 0 then b := s else a := s end if
- if |f(a)| < |f(b)| then swap (a,b) end if
At the end, it seems that the previous value of a could end up in c instead of b, whereas the discussion of Brent's improvement discusses using the old values of b. To match that discussion, surely the top 2 lines I've quoted should be after the other 2, not before? 193.133.239.201 09:38, 24 April 2007 (UTC)
- The algorithm is correct. Do not take the words too seriously. a itself is usually a previous value of b as are c and d. s is a probable future value. JRSpriggs 10:17, 24 April 2007 (UTC)
[edit] Possible algorithm error
Hello,
Towards the end of the algorithm there is the statement:
- d := c
however d is never mentioned earlier or later as a variable. Is it possible that it's supposed to be 's'? I see that in earlier incarnations of the page 's' was called 'd'. —Preceding unsigned comment added by 132.206.77.89 (talk • contribs)
- No. The variable d is used in one line
-
- if s is not between (3a + b)/4 and b or (mflag is set and |s−b| ≥ |b−c| / 2) or (mflag is cleared and |s−b| ≥ |c−d| / 2) then
- And it is supposed to be there and is distinct from s. JRSpriggs 03:11, 6 July 2007 (UTC)
How come d is used before it is declared?
[edit] D declaration?
As stated above, it isn't clear what d is initialized to; this needs a little attention from someone who understands the algorithm better than I do. 58.6.101.164 (talk) 13:42, 14 February 2008 (UTC)
- As JRSpriggs says, d is only used in one line:
- if s is not between (3a + b)/4 and b or (mflag is set and |s−b| ≥ |b−c| / 2) or (mflag is cleared and |s−b| ≥ |c−d| / 2) then
- In the first iteration of the repeat loop, mflag is set so it doesn't matter what d is. In all other iterations, the statement d := c has been executed so it's clear what d is. Does this answer your question? -- Jitse Niesen (talk) 13:54, 14 February 2008 (UTC)
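The quoted condition can be transcribed into a single predicate, which makes the role of d explicit (this is my transcription of the article's pseudocode condition, not Brent's original code):

```python
def use_bisection(s, a, b, c, d, mflag):
    """True if the interpolation proposal s should be rejected in
    favour of the midpoint, per the condition quoted above."""
    lo, hi = sorted(((3 * a + b) / 4, b))
    if not (lo < s < hi):                          # s outside the interval
        return True
    if mflag and abs(s - b) >= abs(b - c) / 2:     # previous step was bisection
        return True
    if not mflag and abs(s - b) >= abs(c - d) / 2: # previous step interpolated
        return True
    return False
```

In the first iteration mflag is set, so the |c−d| branch, the only place d appears, cannot be reached before d := c has executed, which is why d needs no explicit initialization.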
[edit] Question on Brent's extra tests
Hello,
The article states a few things without giving a precise reason for them; could you please explain the following?
- Why does the "convergence speed test" depend on whether or not interpolation was used in the previous step? Why is it not possible to require it regardless of which method was used in the previous step, even if interpolation was used?
- When using inverse quadratic interpolation, why does "the condition for accepting s (the value proposed by either linear interpolation or inverse quadratic interpolation)" have to be changed to "s has to lie between (3a_k + b_k)/4 and b_k"? (which is less constraining than "s has to lie between (a_k + b_k)/2 and b_k")
Thanks for your help. laug (talk) 12:47, 10 May 2008 (UTC)
- I do not know. It has been a long time since I worked on this article, and I do not have the reference. But remember, this is Brent's method, not Niesen's method, Spriggs's method, or Laug's method. If I were writing the program freely for my own use, I would change the algorithm in some ways. But it would not then be Brent's method. And an article on Spriggs's method would be Original Research which is frowned on here. JRSpriggs (talk) 19:58, 14 May 2008 (UTC)
- Also, notice that the only way to guarantee that the convergence will not be slower than the Bisection method is to use the Bisection method itself which also guarantees that it will not converge faster. JRSpriggs (talk) 00:36, 18 May 2008 (UTC)