Kathryn T. Stolee

Active Research Projects

Semantic Code Search | MSR 2018 | FSE 2015 | JSS 2015 | TOSEM 2014 | PhD Thesis | FSE NIER 2012 | ICSE 2012 DS

Funding: CAREER: On the Foundations of Semantic Code Search

Students: Kai Presler-Marshall (PhD in progress), Rui Bai (PhD in progress), Joshua Kayani (REU 2016), James Saylor (Ugrad, 2014)

Searching for code is a common task among programmers, with the ultimate goal of reuse. While the process of searching for code – issuing a query and selecting a relevant match – is straightforward, several costs must be balanced: the cost of specifying the query, the cost of examining the results to find the desired code, and the cost of not finding a relevant result. For syntactic searches, the query cost is quite low, but the results are often irrelevant, so the examination cost is high and matches may be missed. Semantic searches may return more relevant results, but current techniques that involve writing complex specifications or executing code against test cases are costly to the developer, and close matches cannot be easily identified. We have developed an approach for semantic search in which developers write lightweight specifications and an SMT solver identifies matching programs from a repository. The program repository is automatically encoded offline, so the search for programs is efficient. Programs are also encoded at various abstraction levels to enable partial matches when few or no exact matches exist.
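As a rough illustration of searching with lightweight specifications, the sketch below treats a query as a few input/output examples and checks each repository fragment against them. It is a minimal sketch under stated assumptions, not the project's implementation: the repository contents and the names `REPO`, `score`, and `search` are hypothetical, direct execution of Python callables stands in for the offline SMT encoding and solver, and partial matches are ranked by how many examples a fragment satisfies, a much simpler stand-in for matching at different abstraction levels.

```python
from typing import Callable, Dict, List, Tuple

# Lightweight specification: (input, expected output) pairs.
Spec = List[Tuple[int, int]]

# Hypothetical repository. In the approach above, fragments are encoded offline
# as SMT constraints; here plain Python callables stand in for that encoding.
REPO: Dict[str, Callable[[int], int]] = {
    "absolute_value": lambda x: x if x >= 0 else -x,
    "square": lambda x: x * x,
    "double": lambda x: 2 * x,
    "increment": lambda x: x + 1,
}


def score(fragment: Callable[[int], int], spec: Spec) -> int:
    """Count how many input/output examples the fragment satisfies."""
    return sum(1 for given, expected in spec if fragment(given) == expected)


def search(repo: Dict[str, Callable[[int], int]], spec: Spec) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Return exact matches plus partial matches ranked by examples satisfied."""
    scores = {name: score(fn, spec) for name, fn in repo.items()}
    exact = [name for name, s in scores.items() if s == len(spec)]
    partial = sorted(
        ((name, s) for name, s in scores.items() if 0 < s < len(spec)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return exact, partial


exact, partial = search(REPO, [(2, 4), (-3, 9)])
print(exact, partial)  # ['square'] [('double', 1)]
```

Here `square` satisfies both examples, while `double` satisfies only the first one and is reported as a partial match rather than being discarded.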

Semantic Code Search Driven Program Repair | ICSE NIER 2017 | ASE 2015

Funding: SHF: EAGER: Collaborative Research: Demonstrating the Feasibility of Automatic Program Repair Guided by Semantic Code Search

                 SHF: Medium: Collaborative Research: Semi and Fully Automated Program Repair and Synthesis via Semantic Code Search

Students: George Mathew (PhD in progress), Andrew Hill (M.S. 2018), Yalin Ke (M.S. 2015)

Automated program repair can potentially reduce debugging costs and improve software quality, but recent studies have drawn attention to shortcomings in the quality of automatically generated repairs. We propose a new kind of repair that uses the large body of existing open-source code to find potential fixes. The key challenges lie in efficiently finding code that is semantically similar (but not identical) to defective code and then appropriately integrating that code into a buggy program. We present SearchRepair, a repair technique that addresses these challenges by (1) encoding a large database of human-written code fragments as SMT constraints on input-output behavior, (2) localizing a given defect to likely buggy program fragments and deriving the desired input-output behavior for code to replace those fragments, (3) using state-of-the-art constraint solvers to search the database for fragments that satisfy that behavior and replacing the likely buggy code with these potential patches, and (4) validating the patches against the program's test suite to confirm that they repair the bug.
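The four steps of SearchRepair can be traced on a toy example. This is a hedged sketch, not SearchRepair itself: programs are modeled as pipelines of integer functions, the database and the names `DATABASE`, `run`, `passes`, and `repair` are hypothetical, localization is reduced to assuming the last fragment is the buggy one, and executing candidate fragments replaces the SMT encoding and constraint solving.

```python
from typing import Callable, Dict, List, Optional, Tuple

Fragment = Callable[[int], int]
Test = Tuple[int, int]  # (program input, expected program output)

# Hypothetical database of human-written fragments. SearchRepair encodes such
# fragments as SMT constraints; plain callables stand in for that encoding here.
DATABASE: Dict[str, Fragment] = {
    "negate": lambda v: -v,
    "square": lambda v: v * v,
    "add_ten": lambda v: v + 10,
}


def run(program: List[Fragment], x: int) -> int:
    """Execute a toy program modeled as a pipeline of fragments."""
    for fragment in program:
        x = fragment(x)
    return x


def passes(program: List[Fragment], tests: List[Test]) -> bool:
    return all(run(program, given) == expected for given, expected in tests)


def repair(program: List[Fragment], tests: List[Test]) -> Optional[Tuple[str, List[Fragment]]]:
    if passes(program, tests):
        return None  # nothing to repair
    # (1) Localization, simplified: assume the last fragment is the buggy one.
    prefix = program[:-1]
    # (2) Derive the desired input-output behavior of the replacement fragment.
    desired = [(run(prefix, given), expected) for given, expected in tests]
    for name, candidate in DATABASE.items():
        # (3) Search the database for a fragment exhibiting that behavior.
        if all(candidate(i) == o for i, o in desired):
            patched = prefix + [candidate]
            # (4) Validate the candidate patch against the full test suite.
            if passes(patched, tests):
                return name, patched
    return None


buggy = [abs, lambda v: v + v]  # the second fragment should square, but doubles
tests = [(3, 9), (-2, 4)]
fix = repair(buggy, tests)
print(fix[0] if fix else "no repair found")  # square
```

Deriving the desired behavior of the suspect fragment (inputs 3 and 2 must map to 9 and 4) is what lets the search pick `square` even though the original program passes one of the two tests.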

Regular Expression Analysis | FSE 2018 | ASE 2017 | ISSTA 2016

Funding: SHF: Small: Supporting Regular Expression Testing, Search, Repair, Comprehension, and Maintenance

Students: Peipei Wang (PhD in progress), Carl Chapman (M.S. 2016)

Due to the popularity and pervasive use of regular expressions, researchers have created tools to support their creation, validation, and use. However, little is known about the context in which regular expressions are used, the features that are most common, and how behaviorally similar regular expressions are to one another. In this project, we explore the context in which regular expressions are used through a combination of developer surveys and repository analysis. This is the first rigorous examination of regex usage and it provides empirical evidence to support design decisions by regex tool builders.
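To make the two questions above concrete, the sketch below counts how many patterns in a small corpus use each of a few regex features, and estimates how behaviorally similar two regexes are by the fraction of sample strings on which they agree. It is an illustrative sketch under stated assumptions, not the project's analysis: the corpus, the feature names, and the functions `feature_usage` and `agreement` are hypothetical, the detectors are heuristics that ignore some escaping corner cases, and sample-based agreement is only one simple way to approximate behavioral similarity.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, heuristic feature detectors. Each looks for an unescaped token;
# they are approximations (e.g. they ignore escapes inside character classes).
FEATURES: Dict[str, str] = {
    "kleene star": r"(?<!\\)\*",
    "one or more": r"(?<!\\)\+",
    "character class": r"(?<!\\)\[",
    "any character": r"(?<!\\)\.",
    "lookahead": r"\(\?[=!]",
    "backreference": r"\\[1-9]",
}


def feature_usage(corpus: List[str]) -> Counter:
    """Count how many patterns in the corpus use each feature at least once."""
    usage: Counter = Counter()
    for pattern in corpus:
        for feature, detector in FEATURES.items():
            if re.search(detector, pattern):
                usage[feature] += 1
    return usage


def agreement(a: str, b: str, samples: List[str]) -> float:
    """Fraction of sample strings on which two regexes agree (both match or both reject)."""
    ra, rb = re.compile(a), re.compile(b)
    same = sum(1 for s in samples if bool(ra.fullmatch(s)) == bool(rb.fullmatch(s)))
    return same / len(samples)


corpus = [r"^\d+$", r"[a-z]*@example\.com", r"(?=.*\d).{8,}", r"(\w)\1"]
print(feature_usage(corpus).most_common())
print(agreement(r"\d+", r"\d*", ["42", "x", ""]))
```

`\d+` and `\d*` disagree only on the empty string, so they agree on two of the three samples; a larger, more varied sample set would give a finer-grained similarity estimate.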

Inactive Research Projects

Crowdsourcing and Software Engineering | CSI-SE 2016 | CSI-SE 2015 | ESEM 2015 | ESEM 2010

Past Student: Peng Sun (PhD in progress)

Crowdsourcing is a compelling approach for accomplishing tasks that require opinions or work from a large number of people. I am interested in techniques and approaches that help researchers and practitioners best leverage crowdsourcing to conduct software engineering tasks and to evaluate software engineering research.

Code Smells and Refactoring for End-User Programs | VL/HCC 2016 | TSE 2013 | ICSE 2011 | ESEM 2010 | MS Thesis

One of the most popular end-user programming domains is mashups. Mashup programming environments are emerging to help end users create mashups that tailor and individualize data streams, which puts the power of creation in the hands of the end user. However, the mashups created by end users are often littered with deficiencies that make them error-prone and hard to understand. Further, users often 'reinvent the wheel' by creating mashups with the same functionality as mashups created by other users.

Our work with web mashups develops refactoring techniques to reduce complexity, increase abstraction, update broken or outdated components, and standardize the programs to fit community development patterns. Results from our empirical study of refactoring 8,051 Yahoo! Pipes programs and details regarding the manipulation infrastructure can be found here.

End-User Programmers and Their Communities | ESEM 2011

End-user programmers outnumber professional programmers, write software that matters to an increasingly large number of users, and face software engineering challenges similar to those of their professional counterparts. Yet we know little about how these end-user programmers create and share artifacts as part of a community. To better understand these issues, we performed an artifact-based community analysis of 32,887 mashups from the Yahoo! Pipes repository. We observed that, as in other online communities, there is a great deal of attrition, but authors who persevere tend to improve over time, creating pipes that are more configurable, diverse, complex, and popular. We also discovered, however, that end-user programmers employ the repository in different ways than professionals, do not effectively reuse existing programs, and in most cases are not aware of the community.

We make the data used in our analysis available here.

Copy and Paste Habits of End Users | VL/HCC 2009

By observing the clipboard as a mode of data transfer in the desktop environment, we are searching for patterns in end users' usage history to identify areas in which users transfer data inefficiently and could benefit from automation and validation of copy-and-paste activities.

Robofox | FSE 2008

Robofox uses automatically generated assertions to improve the robustness of Web macros. For details, visit the website.