Purely functional data structure
In computer science, a purely functional data structure is a data structure that can be implemented in a purely functional language.
Definition
The main difference between an arbitrary data structure and a purely functional one is that the latter is (strongly) immutable. This restriction ensures that the data structure possesses the advantages of immutable objects: full persistence, quick copying of objects, and thread safety. Efficient purely functional data structures may require the use of lazy evaluation and memoization.
Purely functional data structures are often represented in a different way than their imperative counterparts.[1] For example, an array with constant-time access and update is a basic component of most imperative languages. Many imperative data structures, such as the hash table and binary heap, are based on arrays. An array can be replaced by a map or random access list, which admits a purely functional implementation, but access and update operations may run in logarithmic time. Purely functional data structures can be implemented in imperative and object-oriented languages, but their time and/or space performance may be inferior to that of data structures lacking purely functional properties.
Ensuring that a data structure is purely functional
A data structure is never inherently functional. For example, a stack can be implemented as a singly-linked list. This implementation is purely functional as long as the only operations on the stack return a new stack without altering the old stack. However, if the language is not purely functional, the run-time system may be unable to guarantee immutability.
To ensure that a data structure is used in a purely functional way in an impure functional language, modules or classes can be used to hide the representation and allow manipulation only through authorized functions.
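A minimal Haskell sketch of the two preceding paragraphs (module and function names are illustrative): the stack is a singly linked list, every operation returns a new stack, and the export list hides the data constructor so the structure can only be manipulated through the authorized functions.

    module Stack (Stack, empty, push, pop) where

    -- The constructor of `Stack` is deliberately not exported, so the
    -- only way to build or inspect a stack is through the functions
    -- below, all of which return a new stack and never mutate one.
    newtype Stack a = Stack [a]

    empty :: Stack a
    empty = Stack []

    push :: a -> Stack a -> Stack a
    push x (Stack xs) = Stack (x : xs)

    pop :: Stack a -> Maybe (a, Stack a)
    pop (Stack [])       = Nothing
    pop (Stack (x : xs)) = Just (x, Stack xs)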
Examples
Here is a list of abstract data structures with purely functional implementations:
- Stack (first in, last out), implemented as singly linked list,
- Queue, implemented as real-time queue,
- Double-ended queue, implemented as real-time double-ended queue,
- (Multi)set of ordered elements and map indexed by ordered keys, implemented as red–black tree, or more generally by search tree (see the sketch after this list),
- Priority queue, implemented as Brodal queue,
- Random access list, implemented as skew-binary random access list.
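As a minimal illustration of the search-tree item above, here is a Haskell sketch of a persistent set as an (unbalanced) binary search tree; balanced variants such as red–black trees follow the same pattern (names are illustrative):

    data Tree a = Leaf | Node (Tree a) a (Tree a)

    member :: Ord a => a -> Tree a -> Bool
    member _ Leaf = False
    member x (Node l y r)
      | x < y     = member x l
      | x > y     = member x r
      | otherwise = True

    -- Insertion copies only the nodes on the path from the root to the
    -- insertion point; the rest of the tree is shared with the old version.
    insert :: Ord a => a -> Tree a -> Tree a
    insert x Leaf = Node Leaf x Leaf
    insert x t@(Node l y r)
      | x < y     = Node (insert x l) y r
      | x > y     = Node l y (insert x r)
      | otherwise = t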
Design and implementation
In his book Purely Functional Data Structures, computer scientist Chris Okasaki describes techniques used to design and implement purely functional data structures, a small subset of which are summarized below.
Laziness and memoization
Lazy evaluation is particularly interesting in a purely functional language because the order of evaluation never changes the result of a function. It therefore becomes an important tool in the construction of purely functional data structures: it allows a computation to be performed only when its result is actually required. The code of a purely functional data structure can thus, without loss of efficiency, treat in the same way data that will eventually be used and data that will be ignored; only the former is ever actually computed.
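A small Haskell illustration of this on-demand behaviour (names are illustrative):

    -- `squares` describes work on every element of an infinite list,
    -- but laziness ensures that only the elements actually demanded
    -- by `take` are ever computed.
    squares :: [Integer]
    squares = map (^ 2) [1 ..]

    used :: [Integer]
    used = take 3 squares  -- [1,4,9]; the remaining squares are never computed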
One of the key tools in building efficient purely functional data structures is memoization: once a computation is performed, its result is saved and does not have to be recomputed. This is particularly important in lazy implementations, where several later evaluations may require the same result and it cannot be known in advance which of them will require it first.
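In Haskell this memoization comes for free from sharing, as the following sketch shows (names are illustrative):

    -- A `let`-bound value is a shared suspension: whichever use forces
    -- it first performs the computation, and the result is memoized
    -- for every later use.
    main :: IO ()
    main =
      let expensive = sum [1 .. 10000000] :: Integer
      in do
        print expensive  -- the suspension is forced and computed here, once
        print expensive  -- the memoized result is reused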
Amortized analysis and scheduling
Some data structures, even non-purely-functional ones such as dynamic arrays, admit operations that are efficient most of the time (constant time for dynamic arrays) and occasionally inefficient (linear time for dynamic arrays). Amortized analysis can then be used to prove that the average running time of an operation is efficient: the few inefficient operations are rare enough that they do not change the asymptotic time complexity of a whole sequence of operations.
In general, having occasionally inefficient operations is not acceptable for persistent data structures, because the same inefficient operation can be invoked many times on the same old version of the structure, invalidating the amortized bound. Nor is it acceptable for real-time or interactive systems, where the user may require the time taken by each operation to be predictable. Furthermore, this unpredictability complicates the use of parallelism.
To avoid these problems, some data structures allow the inefficient operation to be postponed; this is called scheduling. The only requirement is that the computation of the inefficient operation finishes before its result is actually needed. A constant amount of the inefficient operation's work is performed alongside each subsequent call to an efficient operation, so that the inefficient operation is already complete when its result is needed and each individual operation remains efficient.
Example: queue
For example, amortized queues are composed of two singly linked lists: the front list and the reversed rear list. Elements are added to the rear list and removed from the front list. Whenever the front list becomes empty, the rear list is reversed and becomes the new front list, while the rear list becomes empty. Each cell of a list is added, reversed and removed at most once, so the amortized time complexity of each operation is constant. Real-time queues avoid the inefficient operation in which the whole rear list is reversed at once by adding the invariant that the rear list is never longer than the front list. When the rear list would become longer, the reversed rear list is appended to the front list. Since this operation is inefficient, it is not performed all at once; instead, a constant part of it is carried out during each of the following operations. Each cell of the new front list is thus computed before it is needed, and the new front list is entirely computed before the next inefficient operation has to be invoked.
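A minimal Haskell sketch of the amortized two-list queue described above (names are illustrative; the scheduled real-time refinement, which performs the reversal incrementally, is omitted):

    -- Amortized FIFO queue: the first list holds elements ready to be
    -- removed, the second holds newly added elements in reverse order.
    data Queue a = Queue [a] [a]

    emptyQueue :: Queue a
    emptyQueue = Queue [] []

    -- Restore the invariant that the front list is empty only when the
    -- whole queue is empty; `reverse` is the rare linear-time step.
    check :: Queue a -> Queue a
    check (Queue [] r) = Queue (reverse r) []
    check q            = q

    enqueue :: a -> Queue a -> Queue a
    enqueue x (Queue f r) = check (Queue f (x : r))

    dequeue :: Queue a -> Maybe (a, Queue a)
    dequeue (Queue []      _) = Nothing
    dequeue (Queue (x : f) r) = Just (x, check (Queue f r))

Each element is moved from the rear list to the front list by `reverse` at most once, which is what makes the amortized cost of every operation constant.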
Bibliography
- [1] Purely Functional Data Structures by Chris Okasaki, Cambridge University Press, 1998, ISBN 0-521-66350-4
External links
- Purely Functional Data Structures thesis by Chris Okasaki (PDF format)
- Making Data-Structures Persistent by James R. Driscoll, Neil Sarnak, Daniel D. Sleator, Robert E. Tarjan (PDF)
- Fully Persistent Lists with Catenation by James R. Driscoll, Daniel D. Sleator, Robert E. Tarjan (PDF)
- Persistent Data Structures from the MIT OpenCourseWare course Advanced Algorithms
- What's new in purely functional data structures since Okasaki? on Theoretical Computer Science StackExchange