Talk:Interpolation search
From Wikipedia, the free encyclopedia
Contents |
[edit] Error
implementation seems to be defective.
it crashes with an exception!
No, it doesn't... What exceptions are you getting and what parameters do you use??
int[] a={1,2,3,4,5,6,7,13,43,65, 87,111,134,153,413,654,764,875,987};
System.out.println(interpolationSearchb(a,8)); // Prints correctly -1
System.out.println(interpolationSearchb(a,153)); // Prints correct 13
System.out.println(interpolationSearchb(a,999)); // Prints correct -1
--Dodno 08:00, 19 November 2006 (UTC)
- The anon might've run into a signed integer overflow in
(toFind - sortedArray[low]) * (high - low)
. --Mrwojo (talk) 04:27, 30 March 2008 (UTC)
[edit] Higher order
Is higher order interpolation possible ? Frigo 12:58, 4 January 2007 (UTC)
Sure, but except for special cases, would the extra computation be less trouble than a few more probes? The real competitor is the binary chop in one corner and hash table methods in another.NickyMcLean 19:42, 18 February 2007 (UTC)
[edit] Endless repetition
The Java code is littered with multiple comparisons: there are five comparisons against X! Is this really necessary? Especially the addendum to check for the case X = A[L]? A recast would seem in order. Compare with the non-C pseudocode in the binary search discussion page. And for the benefit of non-Java readers, just what are the bounds of the array? That is, if there are N elements, are they indexed as 1:N, or 0:N - 1, or what? Using A.length merely adds to the confusion.
Meanwhile, I've re-ordered the calculation to make more clear the comparison with a binary chop.
public int interpolationSearch(int[] A, int X){ // Returns the index of X in the sorted array A, or -1 if not found int L = 0; //Low bound. int R = A.length - 1; //High bound. int P; //Probe position. while (A[L] < X && A[R] >= X) { P = L + (R - L)*(X - A[L])/(A[R] - A[L]); //Interpolate on the difference X - A[L] //P = L + (R - L)/2 //Binary chop position. if (A[P] < X) L = P + 1; else if (A[P] > X) //Repetition of the comparison code is forced by syntax limitations. R = P - 1; else return P; } if (A[L] == X) return L; else return -1; // Not found }
NickyMcLean 20:10, 18 February 2007 (UTC)
[edit] Complexity of arithmetic?
I think the statement on the page that "In reality, an interpolation search is often no faster..." needs either a recent citation, or should be removed. The costs of branch-cuts and cache-misses in modern architectures should dominate computation of probe location in most cases. —The preceding unsigned comment was added by 18.157.7.102 (talk) 03:48, 24 February 2007 (UTC).
All you have to do is think about it a little. One can revert to actual measurements of cpu time and elapsed time for suitable test runs, but it should be clear that these depend very much on the exact details of the tests and their environment precisely due to fancy modern computer features. As a crude example, I once ran tests purporting to assess how long an arctan(x) took compared to an add(x,y) with the floating-point co-processor: it should surely have been the slowest operation, but there appeared to be no difference. Test programmes for sorting and searching typically prepare an array of simple integers, with no associated information and thus will give little guide to the behaviour of a specific actual usage, in which the key might be complex and there be associated data which is processed on finding the index. In a real application the whole point of finding the index is to do something with it, not discard it as in test runs! If the associated data is remote (as on disc) then this will be far more important than any search of an in-memory array of keys. If the array of keys is itself on disc then minimising such accesses will be important, and so on. None of this can be assessed in simple tests.
So then, look at the pseudocode above. Each cycle as written involves six (or seven, thanks to the repeated comparison to distinguish < from > and =) comparisons to the array of keys, plus some arithmetic on integers. I've decorated the above code with what would be the binary chop probe point, use of which instead would alone remove three key array references and involve much simpler arithmetic. This is a significant difference. Further, the binary chop can be written rather better (this is from the discussion: the pseudocode on the main page is often in error)
L:=0; %Establish outer bounds. R:=N + 1; %One before, and one after, the first and last. Probe:P:=(R - L)/2; %Truncate to integer. Beware integer overflow with (L + R)/2. if P <= 0 then Return(NotFound); %Aha! Nowhere! P:=P + L; %Probe the middle of the span L:R. Case Sign(X - A(P)) %Perform the comparison once. Positive:L:=P, go to Probe; %Shift the left bound up: X follows A(P). Negative:R:=P, go to Probe; %Shift the right bound down: X precedes A(P). Zero:Return(P); %So, X is found, here! X equals A(P). Otherwise; %No other cases are possible.
This has one key array reference per iteration and, one comparison only (if your language's syntax offers a three-way if test or equivalent) plus only very simple integer arithmetic to both control the search and find the next position. Such code will run much more rapidly than the more complex interpolation search, so the question comes down to the likely number of iterations. Twenty binary chop iterations will distinguish amongst a million entries whatever the nature of their distribution of key values, and will have done so with twenty key array accesses and comparisons and twenty runs of simple integer arithmetic. To beat that, the iteration search (as above) would have to make no more than three iterations. If the keys are linearly distributed (exactly), it will manage with one iteration only, but is this likely outside of special cases? NickyMcLean 20:33, 25 February 2007 (UTC)
There are cases where the interpolation may be well worth the effort. If getting a sample for comparison is an expensive operation, quite a lot can be gained even if getting the next sample position is difficult. 217.213.5.57 (talk) 22:44, 10 January 2008 (UTC)