Set (computer science)
From Wikipedia, the free encyclopedia
In computer science, a set is a collection (container) of certain values without any particular order, and no repeated values. It corresponds with the mathematical concept of set, but with the restriction that it has to be finite. Disregarding sequence, and the fact that there are no repeated values, it is the same as a list. A set can be seen as an associative array (partial mapping) in which the value of each key-value pair is ignored.
Contents |
[edit] Implementations
Sets can be implemented using various data structures. Ideal set data structures make it efficient to check if an object is in the set, as well as enabling other useful operations such as iterating through all the objects in the set, performing a union or intersection of two sets, or taking the complement of a set in some limited domain. Any associative array data structure can be used to implement a set by letting the set of keys be the elements of the set and ignoring the values. Because of the similarity to associative arrays, sets are commonly implemented in the same ways, namely, a self-balancing binary search tree for sorted sets (which has O(log n) for most operations), or a hash table for unsorted sets (which has O(1) average-case, but O(n) worst-case, for most operations).
Other popular methods include arrays. In particular a subset of the integers 1..n can be implemented efficiently as an n-bit bit array, which also support very efficient union and intersection operations. A Bloom map implements a set probabilistically, using a very compact representation but risking a small chance of false positives on queries.
However, very few of these data structures support set operations such as union or intersection efficiently. For these operations, more specialized set data structures exist.
[edit] Language Support
One of the earliest languages to support sets was Pascal; many languages now include it, whether in the core language or in a standard library. The Java programming language offers the Set
interface to support sets (with the HashSet
class implementing it using a hash table), and the SortedSet
sub-interface to support sorted sets (with the TreeSet
class implementing it using a binary search tree). In C++, STL provides the "set
" class, which implements a sorted set using a binary search tree; and SGI's STL provides the "hash_set
" class, which implements a set using a hash table. Python has a built-in set type, but no set literal.
[edit] Multiset
A variation of the set is the multiset, which is the same as a set data structures, but allows repeated values. Formally, a multiset can be thought of as an associative array that maps unique elements to positive integers, indicating the mulplicity of the element, although actual implementation may vary. C++'s Standard Template Library provides the "multiset
" class for the sorted multiset, and SGI's STL provides the "hash_multiset
" class, which implements a set using a hash table.
[edit] See also
- Bloom map
- Disjoint-set data structure
- Standard Template Library - contains a C++ set class
- NGenerics - implementation in C#