
Malware Analysis with Tree Automata Inference

CAV version: [PDF] [PS] [PS.BZ2] [VIEW]
Full version: [PDF] [PS] [PS.BZ2] [VIEW] Note: The full version also fixes a couple of typos.

BibTeX:

@inproceedings{babic11malware,
  author = {Domagoj Babi\'c and Daniel Reynaud and Dawn Song},
  title = {{Malware Analysis with Tree Automata Inference}},
  booktitle = {CAV'11: Proceedings of the 23rd Int. Conference on
    Computer Aided Verification},
  year = {2011},
  publisher = {Springer},
  series = {Lecture Notes in Computer Science},
  volume = {6806},
  pages = {116--131},
  location = {Cliff Lodge, Snowbird, Utah, USA},
}

Abstract: The underground malware-based economy is flourishing, and it is evident that the classical ad-hoc signature detection methods are becoming insufficient. Malware authors seem to share some source code, and malware samples often feature similar behaviors, but such commonalities are difficult to detect with signature-based methods because of the increasing use of numerous freely available randomized obfuscation tools. To address this problem, the security community is actively researching behavioral detection methods that attempt to understand and differentiate how malware behaves, as opposed to just detecting syntactic patterns. We continue that line of research in this paper and explore how formal methods and tools of the verification trade could be used for malware detection and analysis. We propose a new approach to learning and generalizing from observed malware behaviors based on tree automata inference. In particular, we develop an algorithm for inferring k-testable tree automata from system call dataflow dependency graphs and discuss the use of inferred automata in malware recognition and classification.
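The abstract does not spell out the inference procedure. As a rough illustration only, the Python sketch below shows one common textbook formulation of learning a k-testable tree language from positive examples. For every sample tree it records the top k-1 levels (the "root"), every depth-k window over a subtree (a "fork"), and every subtree shorter than k. It then accepts any tree whose features were all seen before. This is not the authors' algorithm. The tuple tree encoding, the names `KTestable`, `features` and `truncate`, and the exact truncation conventions are assumptions made for this example. The sketch also does not derive trees from system call dataflow dependency graphs, which the paper's approach would require.

```python
from typing import Hashable, Iterator, Tuple

# Hypothetical encoding: a tree is (label, children); a leaf is (label, ()).
# In a malware setting, labels might be system-call names (assumption).
Tree = Tuple[str, tuple]


def height(t: Tree) -> int:
    """Height of a tree; a leaf has height 1."""
    _, children = t
    return 1 + max((height(c) for c in children), default=0)


def truncate(t: Tree, d: int) -> Hashable:
    """Keep the top d levels of t; nodes at depth d are reduced to their label."""
    label, children = t
    if d == 1:
        return label
    return (label, tuple(truncate(c, d - 1) for c in children))


def subtrees(t: Tree) -> Iterator[Tree]:
    """Yield t and every subtree of t in pre-order."""
    yield t
    for c in t[1]:
        yield from subtrees(c)


def features(t: Tree, k: int):
    """Return (root of depth k-1, set of k-forks, set of subtrees shorter than k)."""
    root = truncate(t, k - 1)
    forks = {truncate(s, k) for s in subtrees(t) if height(s) >= k}
    small = {s for s in subtrees(t) if height(s) < k}
    return root, forks, small


class KTestable:
    """Hypothetical learner for a k-testable tree language from positive samples."""

    def __init__(self, k: int):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self.roots: set = set()
        self.forks: set = set()
        self.small: set = set()

    def learn(self, t: Tree) -> None:
        """Add the local features of one positive example."""
        root, forks, small = features(t, self.k)
        self.roots.add(root)
        self.forks |= forks
        self.small |= small

    def accepts(self, t: Tree) -> bool:
        """Accept t iff all of its local features were observed during learning."""
        root, forks, small = features(t, self.k)
        return root in self.roots and forks <= self.forks and small <= self.small


# Usage: learn from g(a) and g(g(a)), then test unseen trees.
a: Tree = ("a", ())
b: Tree = ("b", ())


def g(x: Tree) -> Tree:
    return ("g", (x,))


m = KTestable(k=2)
m.learn(g(a))
m.learn(g(g(a)))
print(m.accepts(g(g(g(a)))))  # True: every fork and leaf was seen before
print(m.accepts(g(b)))        # False: fork g(b) and leaf b were never observed
```

The learner generalizes because it only checks local windows of the tree. After seeing g(a) and g(g(a)), it accepts any g(g(...g(a)...)). Choosing a larger k generally makes the learned language more conservative. The repeated `height` calls keep the sketch short at the cost of quadratic time.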