Referential transparency (computer science)

From Wikipedia, the free encyclopedia

Referential transparency and referential opaqueness are properties of parts of computer programs. An expression is said to be referentially transparent if it can be replaced with its value without changing the program (in other words, yielding a program that has the same effects and output on the same input). The opposite term is referentially opaque.

While in mathematics all function applications are referentially transparent, in programming this is not always the case. The importance of referential transparency is that it allows a programmer (or compiler) to reason about program behavior. This can help in proving correctness, finding bugs that could not be found through testing, simplifying an algorithm, assisting in modifying code without breaking it, or optimizing code by means of memoization, common subexpression elimination or parallelization.

Referential transparency is one of the principles of functional programming; only referentially transparent functions can be memoized (transformed into equivalent functions which cache results). Some programming languages provide means to guarantee referential transparency. Some functional programming languages enforce referential transparency for all functions.

As referential transparency requires the same results for a given set of inputs at any point in time, a referentially transparent expression is therefore deterministic by definition.

Contents

[edit] Examples and counterexamples

If all functions involved in the expression are pure functions, then the expression is referentially transparent. Also, some impure functions can be included in the expression if their values are discarded and their side effects are insignificant.

Take a function that takes no parameters and returns input from the keyboard. A call to this function might be GetInput(). The return value of GetInput() depends on what the user types in, so multiple calls to GetInput() with identical parameters (the empty list) may return different results. Therefore, GetInput() is neither determined nor referentially transparent.

A more subtle example is that of a function that uses a global variable (or a dynamically scoped variable, or a lexical closure) to help it compute its results. Since this variable is not passed as a parameter but can be altered, the results of subsequent calls to the function can differ even if the parameters are identical. (In pure functional programming, destructive assignment is not allowed; thus a function that uses global (or dynamically scoped) variables is still referentially transparent, since these variables cannot change.)

Arithmetic operations are referentially transparent: 5*5 can be replaced by 25, for instance. In fact, all functions in the mathematical sense are referentially transparent: sin(x) is transparent, since it will always give the same result for each particular x.

Assignments are not transparent. For instance, the C++ expression ++x, which evaluates to x+1, is not transparent, since it changes the value assigned to the variable x. However, calling a function such as int plusone(int x) {return x+1;} is transparent, as it has no such side effects.

In most languages, print( "Hello world" ) is not transparent, as replacing it by its value (say, 0) changes the behavior of the program, as "Hello world" isn't printed.

today() is not transparent, as if you evaluate it and replace it by its value (say, "Jan 1, 2001"), you don't get the same result as you will if you run it tomorrow. This is because it depends on a state (the time).

[edit] Contrast to imperative programming

If the substitution of an expression with its value is valid only at a certain point in the execution of the program, then the expression is not referentially transparent. The definition and ordering of these sequence points are the theoretical foundation of imperative programming, and part of the semantics of an imperative programming language.

However, because a referentially transparent expression can be evaluated at any time, it is not necessary to define sequence points nor any guarantee of the order of evaluation at all. Programming done without these considerations is called purely functional programming.

The chief advantage of writing code in a referentially transparent style is that given an intelligent compiler, static code analysis is easier and better code-improving transformations are possible automatically. For example, when programming in C, there will be a performance penalty for including a call to an expensive function inside a loop, even if the function call could be moved outside of the loop without changing the results of the program. The programmer would be forced to perform manual code motion of the call, possibly at the expense of source code readability. However, if the compiler is able to determine that the function call is referentially transparent, it can perform this transformation automatically.

The primary disadvantage of languages which enforce referential transparency is that it makes the expression of operations that naturally fit a sequence-of-steps imperative programming style more awkward and less concise. Such languages often incorporate mechanisms to make these tasks easier while retaining the purely functional quality of the language, such as definite clause grammars and monads.

With referential transparency, no difference is made or recognized between a reference to a thing and the corresponding thing itself. Without referential transparency, such difference can be easily made and utilized in programs.

[edit] Command-Query Separation principle

The Eiffel method, although based on an imperative programming language, enforces a strict separation between commands, which can produce side effects, and queries, which must be referentially transparent: they return a result but do not change the environment. This rule is known as the Command-Query Separation principle and results in a style that clearly separates the referentially transparent parts. For example, in manipulating lists:

my_list.finish        -- Move cursor to the end of the list
value := my_list.item -- Get value at cursor position: referentially transparent

This even affects such strongly imperative features as input:

my_file.read_integer          -- Get next integer; side effect, but no return value
value := my_file.last_integer -- Get last integer read: referentially transparent

Calling last_integer several times in a row is guaranteed to yield the same result each time.

[edit] Other example

As an example, let's use two functions, one which is referentially opaque, and the other which is referentially transparent:

 globalValue = 0;

 integer function rq(integer x)
 begin
   globalValue = globalValue + 1;
   return x + globalValue;
 end

 integer function rt(integer x)
 begin
   return x + 1;
 end

Now, rt is referentially transparent, which means that rt(x) = rt(y) as long as x has the same value as y. For instance, rt(6) = 6 + 1 = 7, rt(4) = 4 + 1 = 5, and so on. However, we can't say any such thing for rq because it uses a global value which it modifies.

So, how is this a bad thing? Well let's say we want to do some reasoning about the following chunk of code:

integer p = rq(x) + rq(y) * (rq(x) - rq(x));

Now, right off-hand, one would be tempted to simplify this line of code to:

integer p = rq(x) + rq(y) * (0) = 
integer p = rq(x) + 0           = 
integer p = rq(x);

However, this will not work for rq() because each occurrence of rq(x) evaluates to a different value. Remember, that the return value of rq is based on a global value which isn't passed in and which gets modified all over the place. This goes against common sense since anything minus itself should be zero.

This however will work for rt, because it is a referentially transparent function.

Therefore we can reason about our code which will lead to more robust programs, the possibility of finding bugs that we couldn't hope to find by testing, and even the possibility of seeing opportunities for optimization.

[edit] References

  • Harald Sondergaard and Peter Sestoft, Referential transparency, definiteness and unfoldability, Acta Informatica, 1990, Volume 27 , Issue 6, Pages: 505 - 517.

[edit] See also