SHF: Medium: Collaborative Research: Semi and Fully Automated Program Repair and Synthesis via Semantic Code Search

Date:

Abstract: In software development, regular expressions are a common programming construct used for many purposes, including querying databases, searching documents, validating user input, and parsing files. Most programming languages provide regular expression processing through standard libraries or built-in support. Although they appear frequently in software development activities, regular expressions are prone to programming errors. When a regular expression is responsible for a software bug, the impact can be severe, possibly resulting in corrupted data, security vulnerabilities, denial-of-service attacks, or website outages. This research develops new techniques to test, understand, reuse, and maintain regular expressions, in an effort to improve developer comprehension and reduce related bugs.

The approach is to develop coverage criteria for test suites, similarity metrics, and semantics-preserving transformations for regular expressions. The coverage criteria apply to the automaton representation of a regular expression and are used to automatically generate test inputs that help developers test regular expressions adequately. The similarity metrics allow developers to find regular expressions that are similar to a buggy regular expression and to explain how their behavior differs. The semantics-preserving transformations enhance comprehension and maintenance, and also support the migration of regular expressions between languages. The broader impacts come primarily from the goal of reducing bugs related to regular expressions, which creates more reliable software for all.
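
As a rough illustration of the similarity and transformation ideas, the following minimal Python sketch compares two regular expressions by the strings they accept. It is not the project's method: the abstract does not define its metrics, and its coverage criteria operate on automata, which Python's `re` module does not expose. The sketch assumes a behavioral notion of similarity (Jaccard overlap of accepted strings) measured over every string up to a small length bound, so agreement here is evidence of equivalence rather than proof. The helper names (`bounded_strings`, `accepted`, `jaccard_similarity`, `differing_inputs`) and the alphabet and bound are hypothetical choices made for the example.

```python
import re
from itertools import product


def bounded_strings(alphabet, max_len):
    """Yield every string over `alphabet` with length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            yield "".join(chars)


def accepted(pattern, inputs):
    """Return the subset of `inputs` that `pattern` matches in full."""
    compiled = re.compile(pattern)
    return {s for s in inputs if compiled.fullmatch(s)}


def jaccard_similarity(pattern_a, pattern_b, inputs):
    """Assumed metric: overlap of the two accepted sets, from 0.0 to 1.0."""
    a = accepted(pattern_a, inputs)
    b = accepted(pattern_b, inputs)
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def differing_inputs(pattern_a, pattern_b, inputs):
    """Inputs accepted by exactly one pattern; these explain the difference."""
    a = accepted(pattern_a, inputs)
    b = accepted(pattern_b, inputs)
    return sorted(a ^ b)


# Hypothetical test universe: all strings over {a, b} of length 0 to 4.
INPUTS = list(bounded_strings("ab", 4))

# A candidate semantics-preserving rewrite: a{2,} and aaa* agree on every input.
print(jaccard_similarity(r"a{2,}", r"aaa*", INPUTS))  # 1.0
print(differing_inputs(r"a{2,}", r"aaa*", INPUTS))    # []

# A near miss: a*b also accepts "b", which a+b rejects.
print(jaccard_similarity(r"a+b", r"a*b", INPUTS))     # 0.75
print(differing_inputs(r"a+b", r"a*b", INPUTS))       # ['b']
```

The list of differing inputs plays the role of the explanation mentioned above: it names concrete strings on which a candidate replacement behaves differently from the original expression. Exhaustive enumeration grows exponentially with the length bound, so this sketch suits only small alphabets and short strings.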