Linear search

From Wikipedia, the free encyclopedia

In computer science, linear search is a search algorithm, also known as sequential search, that is suitable for searching a set of data for a particular value.

It operates by checking every element of a list one at a time in sequence until a match is found. Linear search runs in O(N). If the data are distributed randomly, on average N/2 comparisons will be needed. The best case is that the value is equal to the first element tested, in which case only 1 comparison is needed. The worst case is that the value is not in the list (or is the last item in the list), in which case N comparisons are needed.

The simplicity of the linear search means that if just a few elements are to be searched it is less trouble than more complex methods that require preparation such as sorting the list to be searched or more complex data structures, especially when entries may be subject to frequent revision. Another possibility is when certain values are much more likely to be searched for than others and it can be arranged that such values will be amongst the first considered in the list.

The following pseudocode describes the linear search technique.

For each item in the list:
  Check to see if the item you're looking for matches the item in the list.
    If it matches.
      Return the location where you found it (the index).
    If it does not match.
      Continue searching until you reach the end of the list.

If we get here, we know the item does not exist in the list. Return -1.

In computer implementations, it is usual to search the list in order, from element 1 to N (or 0 to N - 1, if array indexing starts with zero instead of one) but a slight gain is possible by the reverse order. Suppose an array A having elements 1 to N is to be searched for a value x: if not found, the result is zero.

for i:=N:1:-1 do                 %Search from N down to 1. (The step is -1)
 if A[i] = x then QuitLoop i;
next i;
Return(i);          %Or otherwise employ the value.

Implementations of the loop must compare the index value i to the final value to decide whether to continue or terminate the loop. If this final value is some variable such N then a subtraction (i - N) must be done each time, but in going down from N the loop termination condition is for a constant, and moreover a special constant. In this case, zero. Most computer hardware allows the sign to be tested, especially the sign of a value in a register, and so execution would be faster. In the case where the loop was for arrays indexed from zero, the loop would be for i:=N - 1:0:-1 do and the test on the index variable would be for it negative, not zero.

The pseudocode as written relies on the value of the index variable being available when the for-loop's iteration is exhausted, as being the value it had when the loop condition failed, or a 'QuitLoop' was executed. Some compilers take the position that on exit from a for-loop no such value is defined, in which case it would be necessary to copy the index variable's value to a reporting variable before exiting the loop, or to use another control structure such as a while loop, or else explicit code with go to statements in pursuit of the fastest-possible execution.

The following Code example for the Java programming language is a simple implementation of a linear search.

private int linearSearch(int a[], int valueToFind) {
  //a[] is an array of integers to search.
  //valueToFind is the number that will be found.
  //The function returns the position of the value if it was found.
  //The function returns -1 if valueToFind was not found.
  for (int i=0; i<a.length; i++) {
    if (valueToFind == a[i]) {
      return i;
    }
  }
  return -1;
}

The List module in the OCaml standard library defines a function called "mem" that returns true if the given element is in the given list or false if not. This function could be defined as:

let rec mem x = function
    [] -> false
  | h :: t -> h=x || mem x t

Mathematica's unusually powerful pattern matching allows linear search to be implemented by a pattern match:

Mem[x_, {___, x_, ___}] := True
Mem[_, _] := False

Linear search can be used to search an unordered list. The more efficient binary search can only be used to search an ordered list.

If more than a small number of searches are needed, it is advisable to use a more efficient data structure. One approach is to sort and then use binary searches. Another common one is to build up a hash table and then do hash lookups.

[edit] See also

[edit] References

  • Donald Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89685-0. Section 6.1: Sequential Searching, pp.396–408.

[edit] External links