Dynamic array

From Wikipedia, the free encyclopedia

A dynamic array, growable array, resizable array, dynamic table, or array list is a data structure, an array which is automatically expanded to accommodate new objects if filled beyond its current size. It may also automatically deallocate some of its unused space to save memory. It has become a popular data structure in modern mainstream programming languages, supplied with most standard libraries.

Note that in this article, a dynamic array is not the same thing as a dynamically-allocated array, which is just an ordinary fixed-size array whose size is selected at runtime.

Contents

[edit] Overview

One of the main disadvantages of a simple array is that it has a single fixed size, and although its size can be altered in some environments (for example, with C's realloc function), this is an expensive operation that may involve copying the entire contents of the array.

Dynamic arrays automatically perform this resizing as late as possible, when the programmer attempts to add an element to the end of the array and there is no more space. However, if we added just one element to the array each time it runs out of space, the cost of the resizing operations rapidly becomes prohibitive.

To deal with this, we instead resize the array by a large amount, such as doubling its size. Then, the next time we need to enlarge the array, we just expand it into some of this reserved space. The amount of space we have allocated for the array is called its capacity, and it may be larger than its current logical size. Here's how the operation adding an element to the end might work in pseudocode:

function insertEnd(dynarray a, element e) {
    if (a.size = a.capacity) {
              /* resize a to twice its current capacity */
        a.capacity := a.capacity × 2;
              /* increasing capacity often requires reallocating
                 memory in a different location; then we will need
                 to copy the contents to the new location here */
    }
    a[a.size] := e;
    a.size := a.size + 1;
}

Using amortized analysis, it can be shown that as long as we expand the array by some fixed percentage each time, the cost for inserting n elements will be O(n); we say the insertions have an amortized cost of O(1) each (see big-O notation for an explanation of this notation).

Many dynamic arrays also deallocate some of the underlying storage if its size drops below a certain threshold, such as 30% of the capacity.

In computer science education, dynamic arrays often serve as an elementary example of amortized analysis because of their simplicity and familiarity.

[edit] Performance

In most ways the dynamic array has performance similar to an array, with the addition of new operations to add and remove elements from the end:

  • Getting or setting the value at a particular index (constant time)
  • Iterating over the elements in order (linear time, good cache performance)
  • Add an element to the end (constant amortized time)
  • Remove an element from the end (constant amortized time, or just constant time if there is no buffer shrinking)

Dynamic arrays benefit from many of the advantages of arrays, including good locality of reference and data cache utilization, compactness (low memory use), and random access. They usually have only a small fixed additional overhead for storing information about the size and capacity. This makes dynamic arrays an attractive tool for building cache-friendly data structures.

The main disadvantage of dynamic arrays compared to normal arrays is that when they grow they reserve a great deal of space (linear space in fact) that may never be used. This becomes especially problematic when there are a large number of such arrays. There are variants however (see Variants) which waste only \sqrt{n} space at any time.

[edit] Analysis

In choosing the percentage by which to enlarge the table, there is a time-space tradeoff: the average time per insertion operation is about a/(a−1), where a is the multiplier factor such as 2 by which the table size increases each time. On the other hand, capacity minus size space is allocated but not in use, a quantity bounded above by (a−1)n. The choice a=2 is a commonly-used value, but depending on the application many values between about a=1.2 and a=4 may be suitable.

[edit] Variants

Dynamic arrays supporting additional operations efficiently can be created. For example, by using cells from the middle of the buffer instead of the beginning of the buffer, one can allow amortized constant time insertion and deletion at both ends, at the expense of keeping track of the starting index of the data (this variation may also require copying during buffer growing more often, since functions like realloc() typically only grow in one direction). See the array-based implementation of deques for more information.

Optionally, this double-ended variant can wrap elements around from the beginning to the end or vice versa when it runs out of room, creating a growable circular buffer which only needs to be enlarged when all cells are full. This is more space-efficient but also destroys the useful property that the array's cells occupy contiguous words of memory, allowing normal indexing.

Another variation, notably used by the library sort algorithm, leaves carefully placed gaps between elements to facilitate rapid insertion in the middle of the list.

One of the problems with the simple dynamic array is that it often has a constant fraction of cells not in use, which wastes space. In a 1999 paper[1], Brodnik et al. describe a dynamic array data structure which wastes only \sqrt{n} space for n elements at any point in time, and they prove a lower bound showing that any dynamic array must waste this much space if the operations are to remain amortized constant time. Additionally, they present a variant where growing and shrinking the buffer has not only amortized but worst-case constant time. However, like the circular buffer variant, these structures have the problem that indexing into the array is somewhat slower in practice.

[edit] Language support

C++'s std::vector is an implementation of dynamic arrays, as are the ArrayList classes supplied with the Java API and the .NET Framework. The generic List<> class supplied with version 2.0 of the .NET Framework is also implemented with dynamic arrays. Delphi and D implements dynamic arrays at the language's core. Many scripting languages such as Perl offer dynamic arrays as a built-in primitive data type.

[edit] See also

  • Gap buffer, a data structure that improves the performance of dynamic arrays under some circumstances.
  • Linked list, an alternative to dynamic arrays, offering faster inserts and deletes at the cost of less efficient seek.

[edit] References

  1. ^ Andrej Brodnik, Svante Carlsson, Erik D. Demaine, J. Ian Munro, and Robert Sedgewick. Resizable Arrays in Optimal Time and Space (1999). Workshop on Algorithms and Data Structures, pp.37–48

[edit] External links