ResearchProjects / Model-inference-Assisted Concolic Execution

Revolutionizing Software Security and Reliability with Grammar Inference

Decision procedures have revolutionized software verification and security. Nowadays, it's difficult to find a software security paper not using a decision procedure at a premiere security conference. However, there are many other formal reasoning techniques that could improve reasoning about software, so what is next?

Key Ideas

My observation is that most existing test generation techniques (e.g., DART, concolic,...) lack global understanding of program's structure. Grammar inference is capable of recovering (i.e., learning) such high-level info and encode it compactly as a suitable automaton.
Besides recovering the program structure, the approach can be used to reverse-engineer program's communication with the environment.

Published Work

My first paper on this topic appeared at CCS'2010. In that paper, my coauthors and I improved on the state-of-the-art automatic finite-state-machine inference, and applied it to automatic inference of proprietary botnet protocols. We managed not only to infer the protocol, but we also (1) discovered flaws enabling us to milk the spam templates, meaning that we could update spam filters even before the first spam hits the network, (2) proved existence of background communication channels even though we could neither access nor eavesdrop those channels, and (3) identified differences between the botnet SMTP server and the standard SMTP implementations (e.g., Postfix), which means that we could detect active bots on a live network. Our work even got some press coverage in an Information Week's article.

The follow-up paper focuses on a new grammar inference technique that works in a lock-step with concolic execution so as to explore the program's state space more effectively. We found a number of new vulnerabilities and showed a significant improvement in the coverage and exploration depth.