Symbolic execution

In computer science, symbolic execution (also symbolic evaluation) is a means of analyzing a program to determine what inputs cause each part of a program to execute. An interpreter follows the program, assuming symbolic values for inputs rather than obtaining actual inputs as normal execution of the program would, a case of abstract interpretation. It thus arrives at expressions in terms of those symbols for expressions and variables in the program, and constraints in terms of those symbols for the possible outcomes of each conditional branch.

The field of symbolic simulation applies the same concept to hardware. Symbolic computation applies the concept to the analysis of mathematical expressions.

Example

Consider the program below, which reads in a value and fails if the input is 6.

y = read()
y = 2 * y
if (y == 12)
   fail()
print("OK")

When a program analyzer symbolically executes this program, it does not have a concrete number for the input value the result of read() so assigns the symbol 's' to it. The statement "y = read()" thus assigns the value 's' to program variable y. Then the statement "y = 2 * y" assigns the value '2 * s' to y. The next statement has two possible control flows: the true branch and the false branch. Which gets taken when the program executes for real depends upon the input value ('s'). The analyzer associates the constraint '2 * s == 12' with the true branch, which means it recognizes that the "fail()" statement executes in real execution if and only if '2 * s == 12' is true. The constraint "2 * s == 12 is true" is known as a path constraint.

Assume the goal of the analysis is to determine what inputs cause the "fail()" statement to execute. The analyzer uses a constraint solver to determine what values of 's' make '2 * s == 12' true, and thus determines that '6' is the answer.

Limitations

Path Explosion

Symbolically executing all feasible program paths does not scale to large programs. The number of feasible paths in a program grows exponentially with an increase in program size and can even be infinite in the case of programs with unbounded loop iterations.[1] Solutions to the path explosion problem generally use either heuristics for path-finding to increase code coverage,[2] or reduce execution time by parallelizing independent paths.[3]

Program-Dependent Efficacy

Symbolic execution is used to reason about a program path-by-path which is an advantage over reasoning about a program input-by-input as other testing paradigms use (e.g. Dynamic program analysis). However, if few inputs take the same path through the program, there is little savings over testing each of the inputs separately.

Tools

Tool It can analyze Arch/Lang url Can anybody use it/ Open source/ Downloadable
KLEE LLVM http://klee.github.io/ yes
Clang LLVM http://clang.llvm.org/ yes
FuzzBALL VineIL/native http://bitblaze.cs.berkeley.edu/fuzzball.html yes
JPF java http://babelfish.arc.nasa.gov/trac/jpf yes
jCUTE java https://github.com/osl/jcute yes
janala2 java https://github.com/ksen007/janala2 yes
S2E llvm/qemu/x86 http://dslab.epfl.ch/proj/s2e yes
Pathgrind[4] native 32bit valgrind based https://github.com/codelion/pathgrind yes
Mayhem binary http://forallsecure.com/mayhem.html no
Otter C https://bitbucket.org/khooyp/otter/overview yes
SymDroid Dalvik bytecode http://www.cs.umd.edu/~jfoster/papers/symdroid.pdf no
Rubyx Ruby http://www.cs.umd.edu/~avik/papers/ssarorwa.pdf no
Pex .NET Framework http://research.microsoft.com/en-us/projects/pex/ no
Jalangi JavaScript https://github.com/SRA-SiliconValley/jalangi yes
Kite llvm http://www.cs.ubc.ca/labs/isd/Projects/Kite/ yes
pysymemu amd64/native https://github.com/feliam/pysymemu/ yes

History

The concept of symbolic execution was introduced academically with descriptions of: the Select system ,[5] the EFFIGY system ,[6] the DISSECT system ,[7] and Clarke's system .[8] See a bibliography of more technical papers published on symbolic execution.

References

  1. Anand, Saswat; Patrice Godefroid; Nikolai Tillmann (2008). "Demand-Driven Compositional Symbolic Execution". Tools and Algorithms for the Construction and Analysis of Systems, Lecture Notes in Computer Science 4963: 367–381. Retrieved 03/04/2013. Check date values in: |accessdate= (help)
  2. Ma, Kin-Keng; Khoo Yit Phang; Jeffrey S. Foster; Michael Hicks (2011). "Directed Symbolic Execution". Proceedings of the 18th International Conference on Statis Analysis: 95–111. Retrieved 03/04/2013. Check date values in: |accessdate= (help)
  3. Staats, Matt; Corina Pasareanu (2010). "Parallel symbolic execution for structural test generation". Proceedings of the 19th international symposium on Software testing and analysis: 183–194. Retrieved 03/04/2011. Check date values in: |accessdate= (help)
  4. "Exploiting Undefined Behaviors for Efficient Symbolic Execution". ICSE 2014. Retrieved 2014-12-02.
  5. Robert S. Boyer and Bernard Elspas and Karl N. Levitt SELECT--a formal system for testing and debugging programs by symbolic execution, Proceedings of the International Conference on Reliable Software, 1975,page 234--245, Los Angeles, California
  6. James C. King,Symbolic execution and program testing, Communications of the ACM, volume 19, number 7, 1976, 385--394
  7. William E. Howden, Experiments with a symbolic evaluation system, Proceedings, National Computer Conference, 1976.
  8. Lori A. Clarke, A program testing system, ACM 76: Proceedings of the Annual Conference, 1976, pages 488-491, Houston, Texas, United States

See also

External links