Symbolic execution

In computer science, symbolic execution (also symbolic evaluation) is a means of analyzing a program to determine what inputs cause each part of a program to execute. An interpreter follows the program, assuming symbolic values for inputs rather than obtaining actual inputs as normal execution of the program would, a case of abstract interpretation. It thus arrives at expressions in terms of those symbols for expressions and variables in the program, and constraints in terms of those symbols for the possible outcomes of each conditional branch.

The field of symbolic simulation applies the same concept to hardware. Symbolic computation applies the concept to the analysis of mathematical expressions.

Example

Consider the program below, which reads in a value and fails if the input is 6.

y = read()
y = 2 * y
if (y == 12)
   fail()
print("OK")

When a program analyzer symbolically executes this program, it does not have a concrete number for the input value — the result of read() — so assigns the symbol 's' to it. The statement "y = read()" thus assigns the value 's' to program variable y. Then the statement "y = 2 * y" assigns the value '2 * s' to y. The next statement has two possible control flows: the true branch and the false branch. Which gets taken when the program executes for real depends upon the input value ('s'). The analyzer associates the constraint '2 * s == 12' with the true branch, which means it recognizes that the "fail()" statement executes in real execution if and only if '2 * s == 12' is true. The constraint "2 * s == 12 is true" is known as a path constraint.

Assume the goal of the analysis is to determine what inputs cause the "fail()" statement to execute. The analyzer uses a constraint solver to determine what values of 's' make '2 * s == 12' true, and thus determines that '6' is the answer.

Limitations

Path Explosion

Symbolically executing all feasible program paths does not scale to large programs. The number of feasible paths in a program grows exponentially with an increase in program size and can even be infinite in the case of programs with unbounded loop iterations.^[1] Solutions to the path explosion problem generally use either heuristics for path-finding to increase code coverage,^[2] or reduce execution time by parallelizing independent paths.^[3]

Program-Dependent Efficacy

Symbolic execution is used to reason about a program path-by-path which is an advantage over reasoning about a program input-by-input as other testing paradigms use (e.g. Dynamic program analysis). However, if few inputs take the same path through the program, there is little savings over testing each of the inputs separately.

Tools

Tool	It can analyze Arch/Lang	url	Can anybody use it/ Open source/ Downloadable
KLEE	LLVM	http://klee.github.io/	yes
Clang	LLVM	http://clang.llvm.org/	yes
FuzzBALL	VineIL/native	http://bitblaze.cs.berkeley.edu/fuzzball.html	yes
JPF	java	http://babelfish.arc.nasa.gov/trac/jpf	yes
jCUTE	java	https://github.com/osl/jcute	yes
janala2	java	https://github.com/ksen007/janala2	yes
S2E	llvm/qemu/x86	http://dslab.epfl.ch/proj/s2e	yes
Pathgrind^[4]	native 32bit valgrind based	https://github.com/codelion/pathgrind	yes
Mayhem	binary	http://forallsecure.com/mayhem.html	no
Otter	C	https://bitbucket.org/khooyp/otter/overview	yes
SymDroid	Dalvik bytecode	http://www.cs.umd.edu/~jfoster/papers/symdroid.pdf	no
Rubyx	Ruby	http://www.cs.umd.edu/~avik/papers/ssarorwa.pdf	no
Pex	.NET Framework	http://research.microsoft.com/en-us/projects/pex/	no
Jalangi	JavaScript	https://github.com/SRA-SiliconValley/jalangi	yes
Kite	llvm	http://www.cs.ubc.ca/labs/isd/Projects/Kite/	yes
pysymemu	amd64/native	https://github.com/feliam/pysymemu/	yes

History

The concept of symbolic execution was introduced academically with descriptions of: the Select system ,^[5] the EFFIGY system ,^[6] the DISSECT system ,^[7] and Clarke's system .^[8] See a bibliography of more technical papers published on symbolic execution.

References

↑ Anand, Saswat; Patrice Godefroid; Nikolai Tillmann (2008). "Demand-Driven Compositional Symbolic Execution". Tools and Algorithms for the Construction and Analysis of Systems, Lecture Notes in Computer Science 4963: 367–381. Retrieved 03/04/2013. Check date values in: |accessdate= (help)
↑ Ma, Kin-Keng; Khoo Yit Phang; Jeffrey S. Foster; Michael Hicks (2011). "Directed Symbolic Execution". Proceedings of the 18th International Conference on Statis Analysis: 95–111. Retrieved 03/04/2013. Check date values in: |accessdate= (help)
↑ Staats, Matt; Corina Pasareanu (2010). "Parallel symbolic execution for structural test generation". Proceedings of the 19th international symposium on Software testing and analysis: 183–194. Retrieved 03/04/2011. Check date values in: |accessdate= (help)
↑ "Exploiting Undefined Behaviors for Efficient Symbolic Execution". ICSE 2014. Retrieved 2014-12-02.
↑ Robert S. Boyer and Bernard Elspas and Karl N. Levitt SELECT--a formal system for testing and debugging programs by symbolic execution, Proceedings of the International Conference on Reliable Software, 1975,page 234--245, Los Angeles, California
↑ James C. King,Symbolic execution and program testing, Communications of the ACM, volume 19, number 7, 1976, 385--394
↑ William E. Howden, Experiments with a symbolic evaluation system, Proceedings, National Computer Conference, 1976.
↑ Lori A. Clarke, A program testing system, ACM 76: Proceedings of the Annual Conference, 1976, pages 488-491, Houston, Texas, United States

External links

Symbolic Execution for finding bugs