Malware Analysis with Tree Automata Inference
CAV version: [PDF] [PS] [PS.BZ2] [VIEW]
Full version: [PDF] [PS] [PS.BZ2] [VIEW] Note: The full version also fixes a couple of typos.
Bibtex:
@inproceedings{babic11malware,
  author    = {Domagoj Babi\'c and Daniel Reynaud and Dawn Song},
  title     = {{Malware Analysis with Tree Automata Inference}},
  booktitle = {CAV'11: Proceedings of the 23rd Int. Conference on
               Computer Aided Verification},
  year      = {2011},
  publisher = {Springer},
  series    = {Lecture Notes in Computer Science},
  volume    = {6806},
  pages     = {116--131},
  location  = {Cliff Lodge, Snowbird, Utah, USA},
}
Abstract:
The underground malware-based economy is flourishing, and it is evident
that the classical ad-hoc signature detection methods are becoming
insufficient. Malware authors seem to share some source code, and malware
samples often feature similar behaviors, but such commonalities are
difficult to detect with signature-based methods because of an
increasing use of numerous freely available randomized obfuscation
tools. To address this problem, the security community is actively
researching behavioral detection methods that commonly attempt to
understand and differentiate how malware behaves, as opposed to just
detecting syntactic patterns. We continue that line of research in this
paper and explore how formal methods and tools of the verification trade
could be used for malware detection and analysis. We propose a new
approach to learning and generalizing from observed malware behaviors
based on tree automata inference. In particular, we develop an
algorithm for inferring k-testable tree automata from system call
dataflow dependency graphs and discuss the use of inferred automata in
malware recognition and classification.
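
Illustration (not from the paper): the idea of generalizing from observed
trees by their local structure can be sketched roughly as follows. This is
a simplified stand-in, not the inference algorithm developed in the paper:
it records the depth-k truncation of every subtree seen in training and
accepts a new tree only if its root pattern and all of its local patterns
were observed. The tuple encoding of trees, the function names, and the
call labels ("send", "read", "open") are assumptions made for this sketch.

```python
# Simplified local-pattern learner in the spirit of k-testable tree
# languages. A tree is a tuple: (label, child_1, ..., child_n).
# All names and labels here are hypothetical, chosen for illustration.

def truncate(tree, k):
    """Keep only the top k levels of the tree (k >= 1)."""
    label, *children = tree
    if k <= 1:
        return (label,)
    return (label, *(truncate(child, k - 1) for child in children))

def subtrees(tree):
    """Yield the tree itself and every subtree below it."""
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def patterns(tree, k):
    """Depth-k truncations of all subtrees of the tree."""
    return {truncate(s, k) for s in subtrees(tree)}

def infer(samples, k):
    """Collect the root patterns and local patterns seen in the samples."""
    roots = {truncate(t, k) for t in samples}
    local = set()
    for t in samples:
        local |= patterns(t, k)
    return roots, local

def accepts(model, tree, k):
    """Accept iff the root pattern and every local pattern were observed."""
    roots, local = model
    return truncate(tree, k) in roots and patterns(tree, k) <= local

# Two observed dependency trees (hypothetical call labels).
samples = [
    ("send", ("read", ("open",))),
    ("send", ("read", ("read", ("open",)))),
]
model = infer(samples, k=2)

# An unseen tree built only from observed local patterns is accepted,
# while a tree with an unobserved root pattern is rejected.
unseen = ("send", ("read", ("read", ("read", ("open",)))))
different = ("send", ("open",))
print(accepts(model, unseen, k=2), accepts(model, different, k=2))
```

In this sketch, k controls the trade-off: a smaller k generalizes more
aggressively from the samples, while a larger k stays closer to the trees
actually observed.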