Differential execution
From Wikipedia, the free encyclopedia
Differential execution refers to a method of executing a computer subroutine (See control flow) in such a way that differences from prior executions can be detected and acted upon. If the subroutine is one that walks through a data structure, differential execution can be used to detect changes in the data structure and produce notifications or take actions so as to communicate the changes. This has applications in algorithm animation or graphical user interfaces where a remote image must be kept in agreement with the data structure via a low-bandwidth connection. It has other uses as well.
Contents |
[edit] Brief Intro
This image is a brief intro to differential execution. It combines the ideas of Serialization and SIMD processor enablement masking.
Suppose there is a data structure consisting of
- numeric variable A,
- boolean variable B,
- numeric variable C that only exists if B equals 1,
- numeric variable D that only exists if B equals 0,
- and numeric variable E.
Suppose there are two processors, (1) for reading the data structure, and (2) for writing it. The image illustrates
- writing it when B=1
- reading it
- writing it when B=0
- reading it
If the middle two steps are performed together in parallel, values can be compared and differences noted. Where the two processors are run in parallel, they function as a two-processor SIMD machine, where they achieve conditional execution by selectively enabling each processor.
Differential execution is just a software simulation, in any language, of this two-processor SIMD machine.
[edit] Basic outline
The basic outline of the method is simple. It consists of repeatedly reading the data structure and comparing it against values saved in a sequential file. However, this basic method becomes more elaborate when conditional statements are added.
To illustrate the method, assume a normal programming language, in this case C. The statements of that language are shown in lower case, such as if and while. Further assume that there exists a pair of sequential files, one from which numbers are read (fOld), and one to which numbers are written (fNew), and a pair of boolean variables (initially false) read and write. Reading from fOld only occurs if read is true, and writing to fNew only occurs if write is true.
A particular subroutine (toplevel) can be written to scan the data structure of interest, and it is used in a series of repetitions:
write = true; open fNew as a new file toplevel(); copy fNew to fOld read = true; allow the data structure to change open fNew as a new file toplevel(); copy fNew to fOld repeat the above 4 lines as many times as desired write = false; toplevel(); read = false;
In other words, toplevel is invoked a number of times. write is true in all invocations but the last. read is true in all invocations but the first. Between any two invocations, fNew is copied to fOld, and fNew is opened as a new file.
Suppose there is a subroutine scannumber(int n) written as follows:
void scannumber(int n){ int oldn; if (read) fread( &oldn, fOld); if (write) fwrite( n, fNew); if (!read && write){ n has come into existence. take appropriate action. } if (read && !write){ n has gone out of existence. take appropriate action. } if (read && write && n != oldn){ n has changed. take appropriate action. } }
The purpose of this routine is to be called from toplevel and to detect changes in numbers. Suppose the "data structure" simply consists of global integer variables a, b, c. Then toplevel could be written as:
toplevel(){ scannumber(a); scannumber(b); scannumber(c); }
[edit] Conditional statements
Suppose there is a new boolean global variable bValid indicating whether or not integer b exists. if bValid is false, the scan should skip b. bValid can also change between invocations of toplevel. This produces a complication that the file can have either 2 numbers in it, or 3, and how should this be handled?
The conditional is handled by also scanning the value of bValid, and using that to determine whether to read b, write b, or both. This requires a new boolean-valued function and some structured code. The boolean-valued function is called ifUtil. (Note that boolean values are treated as integers.)
bool ifUtil(bool test){ bool oldtest; if (read) fread( &oldtest, fOld); if (write) fwrite( test, fNew); read &= oldtest; write &= test; return read || write; }
Then the call to scannumber(b) is enclosed in the following structured conditional statement:
{bool svread = read, svwrite = write; if ( ifUtil( bValid )){ scannumber(b); }read = svread; write = svwrite;}
The effect of this is to read/write the value of bValid in the files, and use it to control the execution of scannumber(b). Since ifUtil can optionally turn off either the read or write flag, the flags must be restored when leaving the body of the if statement.
So what does this do? Consider the case where read and write are both true (i.e. all invocations but the first and last). There are four cases:
- bValid is unchanged and false. The call to scannumber(b) is skipped.
- bValid is unchanged and true. The call to scannumber(b) is performed.
- bValid is changed to false. write is turned off, and the call to scannumber(b) is performed, which notes that b has gone out of existence.
- bValid is changed to true. read is turned off, and the call to scannumber(b) is performed, which notes that b has come into existence.
Thus, if bValid was false last time, the b is not read. If bValue is false this time, then b is not written. In this way, the files grow or shrink as needed.
It should be obvious that the body of the conditional statement can contain any number of scannumber statements. It can also contain further conditional statements nested within it as the need arises. It should also be clear that further subroutines can be called from within toplevel, as long as they only call routines that are legal to call from toplevel. For example, to walk a tree structure, a recursive routine can be used.
It should be clear that this method can be generalized to walk any data structure. For example, if instead of bValid being a boolean, there could be a pointer p to a structure, and if that pointer is not null, then the structure can be walked.
Since the conditional as written above is tiresome to write, it can be turned into a pair of macros, as in:
IF(bValid) statements END
Looping is accomplished with a slight variation on the macro above. Simply replace the if statement with while! This can also be turned into a macro, as in:
WHILE( test ) statements END
In order to use this to index over an array of n elements (where n can change), an additional concept is needed, the idea of protected computations.
- Protected computations If any significant computations are to be executed under toplevel but above the level of primitive such as scannumber, they must not be executed when write is false.
In this case it is necessary to initialize a counter variable i and increment it within the loop, but those should only be done it write is true. Fortunately in the C language this is easily done:
(write ? (i = 0) : 0); WHILE( i < n) statements (write ? (i++) : 0); END
The reason for protecting computations is that, when write is false it is because data is going out of existence, so computations on nonexistent data can do unpredictable things. Division by zero, indexing out of arrays, following garbage pointers, and other more subtle errors are possible when computations are not protected.
Another rule is important:
- Data structure cannot change during the execution of toplevel.
The reason for this should be obvious. Any program that walks a data structure generally assumes that the data structure does not change beneath its feet.
When these rules are followed, differential execution is very robust.
Many variations on the theme of differential execution have been tried, such as additional modes for handling user input in graphical user interfaces.
[edit] Discussion
In the example above scannumber is an example of a routine that examines a number to see if it has changed. It is a primitive of differential execution, since it can only be called beneath toplevel and it must respect the convention of reading and writing. Another example could be in the computer graphics domain, where one could have a box(int x, int y, int w, int h) to draw a box on a remote graphic screen. By looking at the history of its arguments, it can decide if it is necessary to draw, erase, redraw the box, or leave it alone. If the files are stored in memory, the computational cost of reading and writing the values is insignificant compared to the cost of drawing the box.
This method works no matter how much the data structure changes between invocations of toplevel, meaning it is not necessary to check the data structure frequently for changes, but only when needed.
This technique is related to the software design methodology called the Linguistic Method.
[edit] Dynamic Dialog
Differential execution has been used as the basis for class called Dynamic Dialog for presenting controls and forms under Microsoft Windows. In this application, each control, such as a button, is a primitive of differential execution. The following animation shows a single routine being used to show, update, and erase a set of four buttons, of which buttons 2 and 3 are conditional on a boolean called 'bMyBool'. Showing is done by setting 'write to true, updating is done by setting 'write' and 'read' to true, and erasing is done by setting only 'read' to true. The two files are shown at the bottom.
This procedure has the property that at the end of every pass (but the last) the set of controls visible is exactly the same as if that had been the first pass. Much more complicated forms can be displayed by calling subroutines within the procedure. Controls that are not visible are not simply invisible, but they actually do not exist. This means that the number of potentially visible controls is not limited by available storage. It is not necessary to perform an update pass after every minor change to the user's data structure (represented here by the variable 'bMyBool'. Updates can be performed as seldom as desired, or as often as desired. This example does not show how user-input events are handled. Briefly, each control primitive is treated as a boolean function, i.e.
if(BUTTON(30, 9, "My Caption")){ statements to do if button is pressed }
This retains locality between the control and its action code, and does not require the programmer to provide a name for the control, which permits an unlimited number of potential controls.
[edit] References
- Dunlavey, "Differential Evaluation: a Cache-based Technique for Incremental Update of Graphical Displays of Structures" Software Practice and Experience 23(8):871-893, August 1993.
[edit] Why it works
Informally, it works for the same reason that binary data Serialization works, such as in the Microsoft Foundation Class Library, in which, for each serializable class, there is a common Serialize function, which can either write or read the data, depending on a boolean bSerializing. Since the order of reading the data and writing the data are the same, to a large extent common code can be used to do both. This ensures that if an instance variable is added to or removed from the class, it is easy to modify the Serialize function accordingly.
If the class contains an array of variable length, when that array is written, first its length is written. When it is read, first the length is read, which tells the routine how many array elements to read.
If the class contains a pointer to a subordinate data structure, where the pointer may be null, obviously the pointer is not written. Rather a boolean value is written indicating if the pointer is non-null. Then if the pointer is non-null the subordinate data structure is serialized. When the function is reading, first it reads the boolean, telling it whether or not to read the subordinate data structure.
If there is a component of the class which either exists or not, depending on a boolean, first the boolean is written, and then if the boolean is true, the component is written. Upon reading, the boolean is read, which tells the reader if the component should be read.
In this way, the sequence of reading operations when deserializing is the same as the sequence of writing operations when serializing.
In serializing, the serialization function writes if bSerializing is true, and reads if it is false. In Differential Execution, there are two booleans bRead and bWrite, controlling the reading and writing independently at the same time, like a Serialize function that can do both at once.
Conceptually, in such a Serialize function a) if it is writing, it is because it has the current data structure, and b) if it is reading, it is getting a copy of the past data structure. Then code to note the differences between the two can be inserted into the Serialize routine. If the sole reason for doing this is to detect those differences, it is not necessary to actually retain the past data structure, but only to read it. In the same way, the writing of the current data does not imply that a current data structure actually has to exist. It can write anything, as long as it is also able to read it.
Consider the component of the class that exists or not based on a boolean variable. The Serialization function contains a conditional statement based on that variable. If it is writing, it writes the value of that variable. If it is reading, it reads the old value of that variable and stores it in a temporary location. If it is writing and the value of the variable is true, it makes sure to write the component. If it reading and the old value of the variable is true, it makes sure to read the component. It can do these independently, or both at the same time. Suppose it does them both at the same time. If it is only writing, it "knows" the object has come into existence. If it is only reading, it "knows" the prior object has gone out of existence. If both, then it can detect changes between current and prior objects. If neither, then it simply skips.
That is what the IF statement in Differential Execution does.
Additional statements are variations on this theme.
The operation of differential execution has been likened to the mechanical operation of the Knitting machine, which has a carriage that passes along a row of hooks containing the previous row of knots, and adds a new row. Different patterns can be created by conditionally altering the settings of the hooks.