CAREER: On the Foundations of Semantic Code Search

Date:

During software development, programmers frequently use search to find code to reuse. At minimum, code reuse requires that the behavior of the found code satisfies the needs of the programmer. Current search tools consume a textual description of the desired code as a query, which ignores the behavior of the source code. Semantic code search finds code based on behavior, and recent research has demonstrated its potential to find source code to reuse code as well as repair software faults. Challenges arise when 1) the desired code does not exist; 2) there are too many results to navigate efficiently; or 3) it is difficult to differentiate between similar code snippets. These challenges are especially pronounced for programmers in languages that are less supported, such as those used by end-user programmers.

This research uses an approach to semantic search that leverages a constraint solver as the matching engine. Code fragments are indexed using symbolic analysis to obtain a constraint representation of the code behavior. Given a query in the form of input and output behavior examples, and constraints that represent the code’s behavior, the solver determines if the code satisfies the query. This research develops novel techniques to 1) find approximate solutions to semantic queries; 2) enable richer query models; 3) use the constraints to characterize the differences and similarities in behavior between code snippets; and 4) efficiently navigate the space of potential solutions. The broader impact of this research is on the millions of end-user programmers and professional programmers, allowing them to more effectively reuse code.

This award reflects NSF’s statutory mission and has been deemed worthy of support through evaluation using the Foundation’s intellectual merit and broader impacts review criteria.