Data-flow analysis

From Wikipedia, the free encyclopedia

Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program. A program's control flow graph is used to determine those parts of a program to which a particular value assigned to a variable might propagate. The information gathered is often used by compilers when optimizing a program. A canonical example of a data-flow analysis is reaching definitions.

A simple way to perform dataflow analysis of programs is to set up dataflow equations for each node of the control flow graph and solve them by repeatedly calculating the output from the input locally at each node until the whole system stabilizes, i.e., it reaches a fixpoint. To guarantee termination it is required that the data flow equations form a fixed-point data-flow analysis, i.e., the local updates by the equations are monotone. This general approach was developed by Gary Kildall while teaching at the Naval Postgraduate School.[1]

Contents

[edit] An iterative algorithm

The fixpoint of the dataflow equations can be calculated using the round-robin iterative algorithm:

\texttt{for}\ i\ \leftarrow\ 1\ \texttt{to}\ N
    \mathit{initialize\ node}\ i
\texttt{while}\ (\mathit{sets\ are\ still\ changing}) \texttt{for}\ i\ \leftarrow\ 1\ \texttt{to}\ N \mathit{recompute\ sets\ at\ node}\ i

[edit] The order matters

The efficiency of iteratively solving data-flow equations is influenced by the order at which local nodes are visited. For the above round-robin iterative algorithm it depends in which order nodes are selected from the set of nodes. Furthermore, it depends, whether the data-flow equations are used for forward or backward data-flow analysis over the CFG. In the following, a few iteration orders for solving data-flow equations are discussed (a related concept to iteration order of a CFG is tree traversal of a tree).

  • random order This iteration order is not aware whether the data-flow equations solve a forward or backward data-flow problem. Therefore, the performance is relatively poor compared to specialized iteration orders.
  • postorder This is a typical iteration order for backward data-flow problems. In postorder iteration a node is visited after all its successor nodes have been visited. Typically, the postorder iteration is implemented with the depth-first strategy.
  • reverse postorder This is a typical iteration order for forward data-flow problems. In reverse-postorder iteration a node is visited before all its successor nodes have been visited, except when the successor is reached by a back edge. (Note that this is not the same as preorder.)

[edit] Examples

The following are examples of properties of computer programs that can be calculated by data-flow analysis. Note that the properties calculated by data-flow analysis are typically only approximations of the real properties. This is because data-flow analysis operates on the syntactical structure of the CFG without simulating the exact control flow of the program. However, to be still useful in practice, a data-flow analysis algorithm is typically designed to calculate an upper respectively lower approximation of the real program properties.

[edit] Forward Analysis

  • reaching definitions

The reaching definition analysis calculates for each program point the set of variable values that may potentially reach this program point.

1: if b==4 then
2:    a = 5;
3: else 
4:    a = 3;
5: endif
6:
7: if  a < 4 then
8:    ...

The reaching definition of variable "a" at line 7 is the set {3,5}.

[edit] Backward Analysis

  • live variables

The live variable analysis calculates for each program point the variables that may be potentially read afterwards before their next write update.

1: a = 3;
2: b = 5;
3: c = a + b;

The set of live variables at line 2 is {a,b}, but the set of live variables at line 1 is only {a} since variable "b" is updated in line 2.

[edit] Sensitivities

Data-flow analysis is inherently flow-sensitive. Data-flow analysis is typically path-insensitive, though it is possible to define data-flow equations that yield a path-sensitive analysis.

These terms aren't specific to data-flow analysis. The following defintions should be moved somewhere else and fleshed out with more verbose examples. Wikipedia's inter-document structure for static analysis could use some rework.

  • A flow-sensitive analysis takes into account the order of statements in a program. For example, a flow-insensitive pointer alias analysis may determine "variables x and y may refer to the same location", while a flow-sensitive analysis may determine "after statement 20, variables x and y may refer to the same location".
  • A path-sensitive analysis only considers valid paths through the program. For example, if two operations at different parts of a function are guarded by equivalent predicates, the analysis must only consider paths where both operations execute or neither operation executes. Path-sensitive analyses are necessarily flow-sensitive.
  • A context-sensitive analysis is an interprocedural analysis that takes the calling context into account when analyzing the target of a function call. For example, consider a function that accepts a file handle and a boolean parameter that determines whether the file handle should be closed before the function returns. A context-sensitive analysis of any callers of the function should take into account the value of the boolean parameter to determine whether the file handle will be closed when the function returns.

[edit] Notes

  1. ^ Kildall, Gary (1973). "A Unified Approach to Global Program Optimization". Proceedings of the 1st Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. Retrieved on 2006-11-20.

[edit] Further reading

  • Aho, Alfred V. Sethi, Ravi. Ullman, Jeffrey D. Compilers: Principles, Techniques and Tools. Addison Wesley. 1986.
  • Appel, Andrew W. Modern Compiler Implementation in ML. Cambridge University Press. 1999.
  • Cooper, Keith D. and Torczon, Linda. Engineering a Compiler. Morgan Kaufmann. 2005.
  • Muchnick, Steven S. Advanced Compiler Design and Implementation. Morgan Kaufmann. 1997.
  • Hecht, Matthew S. Flow Analysis of Computer Programs. Elsevier North-Holland Inc. 1977.
In other languages