Data dependency

From Wikipedia, the free encyclopedia

A data dependency in computer science is a situation in which a program statement(instruction) refers to the data of a preceding statement(instruction). In compiler theory, the technique used to discover data dependencies among statements (or instructions) is called Dependence analysis.

There are two types of dependencies: data and control.

Contents

[edit] Data dependencies

Assuming statement S1 and S2. S2 depends on S1 if

 (I (S1) ∩ O(S2)) ∪ (O(S1) ∩ I(S2)) ∪ (O(S1) ∩ O(S2)) ≠ Φ

Where

  • I(Si) is the set of memory locations read by Si and
  • O(Sj) is the set of memory locations written by Sj
  • and there is a feasible run-time execution path from S1 to S2

Three cases exist

  • true (data) dependence: O(S1) ∩ I (S2) , S1-> S2 and S1 writes something read by S2
  • anti dependence: I(S1) ∩ O(S2) , mirror relationship of true dependence
  • output dependence: O(S1) ∩ O(S2), S1->S2 and both write the same memory location.

[edit] True dependency

A true dependency, also known as a data dependency, occurs when an instruction depends on the result of a previous instruction:

1. A = 3
2. B = A
3. C = B

Instruction 3 is truly dependent on instruction 2, as the final value of C depends on the instruction updating B. Instruction 2 is truly dependent on instruction 1, as the final value of B depends on the instruction updating A. Since instruction 3 is truly dependent upon instruction 2 and instruction 2 is truly dependent on instruction 1, instruction 3 is also truly dependent on instruction 1. Instruction level parallelism is therefore not an option in this example. [1]

[edit] Anti-dependency

An anti-dependency occurs when an instruction requires a value that is later updated. In the following example, instruction 3 anti-depends on instruction 2 - the ordering of these instructions cannot be changed, nor can they be executed in parallel (possibly changing the instruction ordering), as this would affect the final value of A.

1. B = 3
2. A = B + 1
3. B = 7

An anti-dependency is an example of a name dependency. That is, renaming of variables could remove the dependency, as in the next example:

1. B = 3
N. B2 = B
2. A = B2 + 1
3. B = 7

A new variable, B2, has been declared as a copy of B in a new instruction, instruction N. The anti-dependency between 2 and 3 has been removed, meaning that these instructions may now be executed in parallel. However, the modification has introduced a new dependency: instruction 2 is now truly dependent on instruction N, which is truly dependent upon instruction 1. As true dependencies, these new dependencies are impossible to safely remove. [1]

[edit] Output dependency

An output dependency occurs when the ordering of instructions will affect the final output value of a variable. In the below example, there is an output dependency between instructions 3 and 1 - changing the ordering of instructions in this example will change the final value of A, thus these instructions cannot be executed in parallel.

1 A = 2 * X
2 B = A / 3
3 A = 9 * Y

As with anti-dependencies, output dependencies are name dependencies. That is, they may be removed through renaming of variables, as in the below modification of the above example:

1 A2 = 2 * X
2 B = A2 /3
3 A = 9 * Y

A commonly used naming convention for data dependencies is the following: Read-after-Write (true dependency), Write-after-Write (output dependency), and Write-After-Read (anti-dependency).

An instruction B is control dependent on a preceding instruction A if the latter determines whether B should execute or not. In the following example, instruction 2 is control dependent on instruction 1.

1.         if a == b goto AFTER
2.         A = 2 * X
3. AFTER:

[1]

[edit] Control Dependence

Intuitively, there is control dependence between two statements S1 and S2 if

  • S1 could be possibly executed before S2
  • The outcome of S1 execution will determine whether S2 will be executed.

A typical example is that there is control dependence between if statement's condition part and the statements in the corresponding true/false bodies.

A formal definition of control dependence can be presented as follows:

A statement S2 is said to be control dependent on another statement S1 iff

  • there exists a path P from S1 to S2 such that every statement Si ≠ S1 within P will be followed by S2 in each possible path to the end of the program and
  • S1 will not necessarily be followed by S2, i.e. there is an execution path from S1 to the end of the program that does not go through S2.

Expressed with the help of (post-)dominance the two conditions are equivalent to

  • S2 post-dominates all Si
  • S2 does not post-dominate S1

[edit] Implications

Conventional programs are written assuming the sequential execution model. Under this model, instructions execute one after the other, atomically (i.e., at any given point of time only one instruction is executed) and in the order specified by the program.

However, dependencies among statements or instructions may hinder parallelism - parallel execution of multiple instructions, either by a parallelizing compiler or by a processor exploiting instruction level parallelism. Recklessly executing multiple instructions without considering related dependences may cause danger of getting wrong results, namely hazard.

[edit] References

  1. ^ a b c John L. Hennessy; David A. Patterson (2003). Computer Architecture: a quantitative approach (3rd ed). Morgan Kaufman. ISBN 1-55860-724-2.