Referential transparency

From Wikipedia, the free encyclopedia

Referential transparency is a property of computer programming languages, essentially stating that any expression may be replaced with its result without changing the behavior of the program.

Contents

[edit] Definition

An expression is said to be referentially transparent if it can be replaced with its value without changing the program (in other words, yielding a program that has the same effects and output on the same input). The opposite term is referentially opaque.

If all functions involved in the expression are pure functions, then the expression is referentially transparent. Also, some impure functions can be included in the expression if their values are discarded.

This is a cornerstone of functional programming, and it allows memoization (caching of function values).

[edit] Examples and counterexamples

Arithmetic operations are referentially transparent: 5*5 can be replaced by 25, for instance.

Many functions, particularly most of those in the mathematical sense, are referentially transparent: sin(x) is transparent, since it will always give the same result for any given x.

The C++ expression x++; is not transparent, as it changes the value of x.

In most languages, print( "Hello world" ) is not transparent, as replacing it by its value (say, 0) changes the behavior of the program, as "Hello world" isn't printed.

today() is not transparent, as if you evaluate it and replace it by its value (say, "Jan 1, 2001"), you don't get the same result as you will if you run it tomorrow.

[edit] Contrast to imperative programming

Another term for a referentially transparent function is a determinate function.

If the substitution of an expression with its value is valid only at a certain point in the execution of the program, then the expression is not referentially transparent. The definition and ordering of these sequence points are the theoretical foundation of imperative programming, and part of the semantics of an imperative programming language.

However, because a referentially transparent expression can be evaluated at any time, it is not necessary to define sequence points nor any guarantee of the order of evaluation at all. Programming done without these considerations is called purely functional programming.

The chief advantage of writing code in a referentially transparent style is that given an intelligent compiler, static code analysis is easier and better code-improving transformations are possible automatically. For example, when programming in C, there will be a performance penality for including a call to an expensive function inside a loop, even if the function call could be moved outside of the loop without changing the results of the program. The programmer would be forced to perform manual code motion of the call, possibly at the expense of source code readability. However, if the compiler is able to determine that the function call is referentially transparent, it can perform this transformation automatically.

The primary disadvantage of languages which enforce referential transparency is that it makes the expression of operations that naturally fit a sequence-of-steps imperative programing style more awkward and less concise. Such languages often incorporate mechanisms to make these tasks easier while retaining the purely functional quality of the language, such as definite clause grammars and monads.

While in mathematics all function applications are referentially transparent, in programming this is not always the case. For example, take a function that takes no parameters and returns input from the keyboard. A call to this function might be GetInput(). The return value of GetInput() depends on what the user types in, so multiple calls to GetInput() with identical parameters (the empty list) may return different results.

A more subtle example is that of a function that uses a global variable (or a dynamically scoped variable, or a lexical closure) to help it compute its results. Since this variable is not passed as a parameter but can be altered, the results of subsequent calls to the function can differ even if the parameters are identical. (In pure functional programming, assignment is not allowed; thus a function that uses global (or dynamically scoped) variables is still referentially transparent, since these variables cannot change.)

The importance of referential transparency is that it allows a programmer (or compiler) to reason about program behavior. This can help in proving correctness, finding bugs that could not be found through testing, simplifying an algorithm, assisting in modifying code without breaking it, or optimizing code by means of memoization, common subexpression elimination or parallelization.

Some functional programming languages enforce referential transparency for all functions.

With referential transparency, no difference is made or recognised between a reference to a thing and the corresponding thing itself. Without referential transparency, such difference can be easily made and utilized in programs.

[edit] Command-Query Separation principle

The Eiffel method, although based on an imperative programming language, enforces a strict separation between commands, which can produce side effects, and queries, which must be referentially transparent: they return a result but do not change the environment. This rule is known as the Command-Query Separation principle and results in a style that clearly separates the referentially transparent parts. For example, in manipulating lists:

my_list.finish        -- Move cursor to the end of the list
value := my_list.item -- Get value at cursor position: referentially transparent

This even affects such strongly imperative features as input:

my_file.read_integer          -- Get next integer; side effect, but no return value
value := my_file.last_integer -- Get last integer read: referentially transparent

Calling last_integer several times in a row is guaranteed to yield the same result each time.

[edit] Other example

As an example, let's use two functions, one which is referentially opaque, and the other which is referentially transparent:

globalValue = 0;
integer function rq(integer x)
begin
  globalValue = globalValue + 1;
  return x + globalValue;
end
integer function rt(integer x)
begin
  return x + 1;
end

Now, rt is referentially transparent, which means that rt(x) = rt(x) as long as x is the same value. For instance, rt(6) = rt(6) = 7, rt(4) = rt(3+1) = 5, and so on. However, we can't say any such thing for rq because it uses a global value which it modifies.

So, how is this a bad thing? Well let's say we want to do some reasoning about the following chunk of code:

integer p = rq(x) + rq(y) * (rq(x) - rq(x));

Now, right off-hand, one would be tempted to simplify this line of code to:

integer p = rq(x) + rq(y) * (0) = 
integer p = rq(x) + 0           = 
integer p = rq(x);

However, this will not work for rq() because rq(x) does not equal rq(x)! Remember, that the return value of rq is based on a global value which isn't passed in and which gets modified all over the place. This goes against common sense since anything minus itself should be zero.

This however will work for rt, because it is a referentially transparent function.

Therefore we can reason about our code which will lead to more robust programs, the possibility of finding bugs that we couldn't hope to find by testing, and even the possibility of seeing opportunities for optimization.

[edit] See also