Automatic bug fixing

Automatic bug-fixing is the automatic repair of software bugs without the intervention of a human programmer. [1][2] It is also commonly referred to as automatic patch generation, automatic bug repair, or automatic program repair. The typical goal of such techniques is to automatically generate correct patches to eliminate bugs in software programs without causing software regression.[3]

Basic concept

At a high level, automatic bug-fixing techniques generate a patch for a program in two steps. The first step is patch generation, which analyzes the original program and produces one or more candidate patches with a variety of techniques such search-based program mutation, machine learning, and genetic programming. The second step is patch validation, which validates the produced candidate patches from the previous step with either a formal specification or a test suite of the program. [4] [5] [6] [7] [8]

Generate-and-validate techniques

The generate-and-validate techniques are arguably the standard approach to generate repairs for a bug of a program. Such a technique typically starts with a test suite of the program, i.e., a set of test cases, at least one of which exposes the bug.[5][8][9][10][11] The technique assumes that the bug can be fixed with a series of mutation operations on the source code of the program. It first uses fault localization techniques to identify suspicious statements in the program. It then applies the mutation operations to the identified statements to generate a space of candidate patches and searches this space to find validated patches that produce correct outputs for all cases in the test suite.[5][6][7][8][9][11]

Generate-and-validate techniques for automatic bug-fixing are still in their infancy. Two earliest generate-and-validate bug-fixing systems are GenProg[5] and ClearView.[4] The effectiveness of generate-and-validate techniques remains controversial,[7][12] because they typically do not provide patch correctness guarantees. Nevertheless, the reported results of recent state-of-the-art techniques are generally promising. For example, on systematically collected 69 real world bugs in eight large C software programs, the state-of-the-art bug-fixing system Prophet generates correct patches for 18 out of the 69 bugs.[8]

Targeted techniques

Targeted automatic bug-fixing techniques generate repairs for specific classes of errors such as null pointer exception [13] [14] ,[15] integer overflow ,[16] buffer overflow ,[16] memory leak ,[17] etc.. Such techniques often use empirical fix templates to fix bugs in the targeted scope. For example, insert a conditional statement to check whether the value of a variable is null[15] or insert missing memory deallocation statements.[17] Comparing to generate-and-validate techniques, targeted techniques tend to have better bug-fixing accuracy but a much narrowed scope.[7][17]

Patch generation

At the patch generation step, automatic bug-fixing techniques use one or more of the followings to generate a set of candidate patches.

Program mutation

One way to generate candidate patches is to apply mutation operators on the original program. Mutation operators manipulate the original program, potentially via its abstract syntax tree representation, or a more coarse-grained representation such as operating at the statement-level or block-level. Earlier genetic improvement approaches operate at the statement level and carry out simple delete/replace operations such as deleting an existing statement or replacing an existing statement with another statement in the same source file.[5][6] Recent machine learning and solver approaches use more fine-grained operators at the abstract syntax tree level to generate more diverse set of candidate patches.[8][9][11][18]

Fix template

Using fix templates is an alternative way to generate candidate patches. Fix templates are typically predefined program mutation rules for fixing specific classes of bugs.[15] Examples of fix templates include inserting a conditional statement to check whether the value of a variable is null to fix null pointer exception,[15] inserting memory deallocation statement to fix memory leaks,[17] and changing a integer constant by one to fix off-by-one errors.[15] Comparing to program mutation techniques, fix templates tend to achieve better candidate patches for bugs within its scope.[4][15] Fix templates are therefore often adopted by targeted techniques.[4][13][15][17]

Code learning and transfer

Machine learning techniques can improve the effectiveness of automatic bug-fixing systems.[8] One example of such techniques learns from past successful patches from human developers collected from open source repositories in GitHub and SourceForge.[8] It then use the learned information to recognize and prioritize potentially correct patches among all generated candidate patches.[8] Alternatively, patches can be directly mined from existing sources. Example approaches include mining patches from donor applications[16] or from QA web sites.[19]

SMT Solver

Another way to generate candidate patches is to use solvers for satisfiability modulo theories (SMT). Solver-based techniques convert a program into SMT formulas. They then query SMT solvers to solve the converted SMT formulas to find candidate patches that allow the patched program to pass all supplied test cases.[9][18] The benefit of this approach is that SMT solvers can quickly find patches passing test cases for small to medium sized formulas. The limitation of this approach is that real world programs are often converted to intractably large formulas especially for modifying statements with side effects.[9][18]

Patch validation

At the patch validation step, automatic bug-fixing techniques rely on either formal specifications or a test suite to validate generated candidate patches. Formal specifications specify the correct behavior of a program and are often only available for specific classes of errors in practice. Therefore, standard generate-and-validate techniques typically rely on the availability of a test suite.

Validation with a test suite

A test-suite - the input/output pairs specify the functionality of the program, possibly captured in assertions can be used as an oracle to validate candidate patches. This oracle can in fact be divided between the bug oracle that exposes the faulty behavior, and the regression oracle, which encapsulates the functionality any program repair method must preserve. Generate-and-validate approaches use automated testing techniques to compile and test each candidate patch produced in the patch generation stop and collect all validated patches that produce expected outputs for all inputs in the test suite .[5][7][8]

Note that a test suite is typically incomplete and does not cover all possible cases. Therefore it is often possible for a validated patch to produce expected outputs for all inputs in the test suite but incorrect outputs for other inputs.[7] The existence of such validated but incorrect patches is a major challenge for generate-and-validate techniques.[7] Recent successful automatic bug-fixing techniques often rely on additional information other than the test suite, such as information learned from previous human patches, to further identify correct patches among validated patches.[8]

Validation with formal specifications

Another way to validate candidate patch is to verify the patched program against formal specifications which specify the correct behavior of the program [20] .[21] Verification against full specifications that specify the whole program behavior including functionalities is less common because such specifications are typically not available in practice and the computation cost of such verification is prohibitive. For specific classes of errors, however, implicit partial specifications are often available. For example, there are targeted bug-fixing techniques validating that the patched program can no longer trigger overflow errors in the same execution path.[16]

Limitations of automatic bug-fixing

Automatic bug-fixing techniques that rely on a test suite do not provide patch correctness guarantees, because the test suite is incomplete and does not cover all cases.[7] A weak test suite may cause generate-and-validate techniques to produce validated but incorrect patches that have negative effects such as eliminating desirable functionalities, causing memory leaks, and introducing security vulnerabilities.[7] This weak test suite issue is alternatively called as "overfitting" of bug-fixing systems .[22] For example, a recent study shows that for 52 out of 55 cases that the previous bug-fixing system GenProg reports to generate repairs, all of the generated patches are incorrect.[7] Recent state-of-the-art systems with machine learning or SMT solver techniques are able to generate correct patches for much more cases on the same benchmark set, but there are still plenty of validated but incorrect patches generated by these systems.[8][18]

Another limitation of generate-and-validate systems is the search space explosion.[23] For a program, there are a large number of statements to change and for each statement there are a large number of possible modifications. State-of-the-art systems often explicitly or implicitly assume that a small modification is enough for fixing a bug. As a result, such systems can currently repair a small proportion of bugs in real world programs.[8][15][18]

Example tools

Automatic bug-fixing is an active research topic in computer science. There are many implementations of various bug-fixing techniques especially for C and Java programs. Note that most of these implementations are research prototypes for demonstrating their techniques, i.e., it is unclear whether their current implementations are ready for industrial usage or not.

GenProg[5][6] and ClearView[4] are two earliest generate-and-validate bug-fixing tools published at 2009. The benchmark collected by GenProg authors contains 69 real world defects and it is widely used to evaluate many other bug-fixing tools in C as well.[6][8][11][18] The state-of-the-art tool evaluated on GenProg benchmark is Prophet,[8] generating correct patches for 18 out of 69 defects.

Tools for C

Here is a list of bug-fixing tools for C programs ordered by the published year.

ClearView:[4] A generate-and-validate tool of generating binary patches for deployed systems. It is evaluated on 10 security vulnerability cases. A later study shows that it generates correct patches for at least 4 of the 10 cases.[7]

GenProg:[5][6] A genetic improvement generate-and-validate bug-fixing tool. It is evaluated on 105 cases and reported to generate repairs for 55 cases. A later study shows that only 69 cases out of the 105 evaluated cases correspond to bugs and correct patches for only 2 cases out of the 69 cases are generated.[7]

SemFix:[9] The first solver-based bug-fixing tool for C.

CodePhage:[16] The first bug-fixing tool that directly transfer code across programs to generate patch for C program. Note that although it generates C patches, it can extract code from binary programs without source code.[16]

LeakFix:[17] A tool that automatically fixes memory leaks in C programs.

Prophet:[8] The first generate-and-validate tool that uses machine learning techniques to learn useful knowledge from past human patches to recognize correct patches. It is evaluated on the same benchmark as GenProg and generate correct patches (i.e., equivalent to human patches) for 18 out of 69 cases.[8]

Angelix:[18] An improved solver-based bug-fixing tool. It is evaluated on the GenProg benchmark. For 10 out of the 69 cases, it generate patches that is equivalent to human patches.

Tools for Java

PAR:[15] A generate-and-validate tool that uses a set of manually defined fix templates. A later study raised concerns about the generalizability of the fix templates in PAR.[12]

NOPOL:[24] A solver-based tool focusing on modifying condition statements.

QACrashFix:[19] A tool that fixes Java crash bugs by mining fixes from Q&A web site.

Astor:[25] An automatic repair library for Java, containing jGenProg, a Java implementation of GenProg.

Tools for other languages

AutoFixE:[20] A bug-fixing tool for Eiffel language. It relies the contracts (i.e., a form of formal specification) in Eiffel programs to validate generated patches.

Research tools

Datasets, tools, etc., related to automated program repair research http://program-repair.org/

References

  1. Patching program errors (CACM 2008) doi:10.1145/1409360.1409381
  2. Automated Patching Techniques: The Fix Is In (CACM 2010) doi:10.1145/1735223.1735248
  3. Tan, Shin Hwei; Roychoudhury, Abhik (2015). "relifix: Automated repair of software regressions". Proceedings of the 37th International Conference on Software Engineering-Volume 1. pp. 471–482.
  4. 1 2 3 4 5 6 Perkins, Jeff H.; Kim, Sunghun; Larsen, Sam; Amarasinghe, Saman; Bachrach, Jonathan; Carbin, Michael; Pacheco, Carlos; Sherwood, Frank; Sidiroglou, Stelios; Sullivan, Greg (2009). "Automatically patching errors in deployed software". Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles. SOSP '09. ACM. pp. 87–102. ISBN 978-1-60558-752-3. doi:10.1145/1629575.1629585.
  5. 1 2 3 4 5 6 7 8 Weimer, Westley; Nguyen, ThanhVu; Le Goues, Claire; Forrest, Stephanie (2009). "Automatically finding patches using genetic programming". Proceedings of the 31st International Conference on Software Engineering. pp. 364–374.
  6. 1 2 3 4 5 6 Le Goues, Claire; Dewey-Vogt, Michael; Forrest, Stephanie; Weimer, Westley (2012). "A Systematic Study of Automated Program Repair: Fixing 55 out of 105 Bugs for \$8 Each". Proceedings of the 2012 International Conference on Software Engineering. ICSE 2012. IEEE Press. pp. 3–13. ISBN 978-1-4673-1067-3.
  7. 1 2 3 4 5 6 7 8 9 10 11 12 Qi, Zichao; Long, Fan; Achour, Sara; Rinard, Martin (2015). "An Anlysis of Patch Plausibility and Correctness for Generate-and-Validate Patch Generation Systems". Proceedings of the ACM/SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2015.
  8. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 Long, Fan; Rinard, Martin. "Automatic patch generation by learning correct code". Proceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 2016, St. Petersburg, FL, USA, January 20 - 22, 2016. pp. 298–312.
  9. 1 2 3 4 5 6 Nguyen, Hoang Duong Thien; Qi, Dawei; Roychoudhury, Abhik; Chandra, Satish (2013). "SemFix: Program Repair via Semantic Analysis". Proceedings of the 2013 International Conference on Software Engineering. ICSE '13'. Piscataway, NJ, USA: IEEE Press. pp. 772–781. ISBN 978-1-4673-3076-3.
  10. Qi, Yuhua; Mao, Xiaoguang; Lei, Yan; Dai, Ziying; Wang, Chengsong (2014). "The Strength of Random Search on Automated Program Repair". Proceedings of the 36th International Conference on Software Engineering. ICSE 2014. New York, NY, USA: ACM. pp. 254–265. ISBN 978-1-4503-2756-5. doi:10.1145/2568225.2568254.
  11. 1 2 3 4 Long, Fan; Rinard, Martin (2015). "Staged Program Repair with Condition Synthesis". Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. ESEC/FSE 2015. New York, NY, USA: ACM. pp. 166–178. ISBN 978-1-4503-3675-8. doi:10.1145/2786805.2786811.
  12. 1 2 Monperrus, Martin (2014). "A Critical Review of "Automatic Patch Generation Learned from Human-written Patches": Essay on the Problem Statement and the Evaluation of Automatic Software Repair". Proceedings of the 36th International Conference on Software Engineering. ICSE 2014. New York, NY, USA: ACM. pp. 234–242. ISBN 978-1-4503-2756-5. doi:10.1145/2568225.2568324.
  13. 1 2 Long, Fan; Sidiroglou-Douskos, Stelios; Rinard, Martin (2014). "Automatic Runtime Error Repair and Containment via Recovery Shepherding". Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI '14'. New York, NY, USA: ACM. pp. 227–238. ISBN 978-1-4503-2784-8. doi:10.1145/2594291.2594337.
  14. Dobolyi, Kinga; Weimer, Westley (2008). "Changing Java's Semantics for Handling Null Pointer Exceptions". 2013 IEEE 24th International Symposium on Software Reliability Engineering (ISSRE). 0: 47–56.
  15. 1 2 3 4 5 6 7 8 9 Kim, Dongsun; Nam, Jaechang; Song, Jaewoo; Kim, Sunghun (2013). "Automatic Patch Generation Learned from Human-written Patches". Proceedings of the 2013 International Conference on Software Engineering. ICSE '13'. IEEE Press. pp. 802–811. ISBN 978-1-4673-3076-3.
  16. 1 2 3 4 5 6 Sidiroglou, Stelios; Lahtinen, Eric; Long, Fan; Rinard, Martin (2015). "Automatic Error Elimination by Multi-Application Code Transfer". Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation.
  17. 1 2 3 4 5 6 Gao, Qing; Xiong, Yingfei; Mi, Yaqing; Zhang, Lu; Yang, Weikun; Zhou, Zhaoping; Xie, Bing; Mei, Hong (2015). "Safe Memory-leak Fixing for C Programs". Proceedings of the 37th International Conference on Software Engineering - Volume 1. ICSE '15'. Piscataway, NJ, USA: IEEE Press. pp. 459–470. ISBN 978-1-4799-1934-5.
  18. 1 2 3 4 5 6 7 Mechtaev, Sergey; Yi, Jooyong; Roychoudhury, Abhik (2016). "Angelix: scalable multiline program patch synthesis via symbolic analysis". Proceedings of the 38th International Conference on Software Engineering, ICSE 2016, Austin, TX, USA, May 14-22, 2016. pp. 691–701.
  19. 1 2 Gao, Qing; Zhang, Hansheng; Wang, Jie; Xiong, Yingfei; Zhang, Lu; Mei, Hong (2015). "Fixing Recurring Crash Bugs via Analyzing Q&A Sites". Proceedings of 30th IEEE/ACM International Conference on Automated Software Engineering. ASE 2015.
  20. 1 2 Pei, Yu; Furia, Carlo A.; Nordio, Martin; Wei, Yi; Meyer, Bertrand; Zeller, Andreas (May 2014). "Automated Fixing of Programs with Contracts". IEEE Trans. Softw. Eng. 40 (5): 427–449. doi:10.1109/TSE.2014.2312918.
  21. Contract-based Data Structure Repair Using Alloy http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.182.4390&rep=rep1&type=pdf
  22. Smith, Edward K.; Barr, Earl T.; Le Goues, Claire; Brun, Yuriy (2015). "Is the Cure Worse Than the Disease? Overfitting in Automated Program Repair". Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering. ESEC/FSE 2015. New York, NY, USA: ACM. pp. 532–543. ISBN 978-1-4503-3675-8. doi:10.1145/2786805.2786825.
  23. Long, Fan; Rinard, Martin (2016). "An Analysis of the Search Spaces for Generate and Validate Patch Generation Systems". Proceedings of the 38th International Conference on Software Engineering. ICSE '16. New York, NY, USA: ACM. pp. 702–713. ISBN 978-1-4503-3900-1. doi:10.1145/2884781.2884872.
  24. Xuan, Jifeng; Martinez, Matias; DeMarco, Favio; Clément, Maxime; Lamelas, Sebastian; Durieux, Thomas; Le~Berre, Daniel; Monperrus, Martin (2016). "Nopol: Automatic Repair of Conditional Statement Bugs in Java Programs". {IEEE Transactions on Software Engineering}. doi:10.1109/TSE.2016.2560811.
  25. Martinez, Matias; Monperrus, Martin (2016). "ASTOR: A Program Repair Library for Java". Proceedings of ISSTA, Demonstration Track. ISSTA 2016.
This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.