AVAudit Part 1: Introduction
The code for AVAudit is available at
https://github.com/calebcheng00/AVAudit
Fingerprinting antivirus emulators through black box analysis
Consumer antiviruses have evolved tremendously in both complexity and effectiveness over the past decade in response to the increasing prevalence of sophisticated malware. Antiviruses initially relied on static techniques to detect suspicious files, but as virus authors began to encrypt and obfuscate their programs, static detection became far less effective. Now, malicious binaries are mostly detected at runtime, through their interactions with their environment and user data.
One way to gauge the intentions of an unknown program without disturbing the real environment is to run the binary in an emulator. An emulator is an artificial environment, like a “mini computer” running on your PC. The program is given free rein in this environment, but everything it does is contained and monitored by the emulator’s owner (such as an antivirus).
Of course, virus authors realise what antivirus authors are doing, and have found ways to bypass emulation. If a program can detect whether it is running in an emulator, it can act normally (or simply exit) inside the emulator and only do its malicious work on a real machine, avoiding a malicious classification.
So how do you find out what an emulator looks like? Emulators can be described as a “black box”: you can only see what goes in and what comes out.
The only thing that comes out of the emulator is whether the program was detected or not, meaning you can’t just call MessageBox(0, GetUsername(), 0, 0) to read the emulator’s username. One surefire way to analyse the emulator would be to reverse-engineer it and look for hardcoded return values and other flaws, but reverse-engineering complex programs like antiviruses comes with a number of complications:
- You need expert reversing skills and lots of experience with similar programs
- The binaries can be huge, taking hours to days to decompile
- Antiviruses are written in a mixture of languages
- Time & energy spent on one antivirus can’t be generalised to other antiviruses
Instead of reverse engineering, we can employ a technique called black box analysis, in which we abstract the emulator to a set of inputs and outputs and infer what happens in between.
Previous work
All of the following research was inspired by Alexei Bulazel’s previous work on black box analysis of antiviruses and on reverse-engineering Windows Defender. Please check out the following links; they are really interesting and worth a read!
Black box analysis of consumer antivirus
Black box analysis
Let’s say we want to dump a single byte of information from the emulator (like the first byte of the username, the MAC address, or any other identifier that might be hardcoded). We can create a dictionary covering every possible value of that byte (0 to 255) and assign each value a unique virus identifier, i.e. the detection name of one known sample.
{
  "Worm:Win32/Nodoom.A@mm": {
    "value": 0
  },
  "Worm:Win32/Cult.D@mm": {
    "value": 1
  },
  ...
  "Worm:Win32/Azag.A": {
    "value": 255
  }
}
Here is the theory behind all this: if we bundle all the samples into the test binary at the same time, and have it drop only the sample that matches the byte we want to leak, the antivirus will report exactly one detection. We can then look that detection up in our dictionary to find out what the value was.
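A minimal sketch of building that dictionary, assuming you already have a list of 256 unique sample detection names. Only `Worm:Win32/Nodoom.A@mm` (0), `Worm:Win32/Cult.D@mm` (1) and `Worm:Win32/Azag.A` (255) come from the example above; the `Placeholder:Sample/...` names are hypothetical fillers. It also inverts the mapping, because at analysis time we go from a detection name back to a byte value.

```python
def build_value_map(detection_names):
    """Assign each of the 256 byte values to one unique detection name."""
    if len(detection_names) != 256:
        raise ValueError("need exactly one sample per byte value (256)")
    if len(set(detection_names)) != 256:
        raise ValueError("detection names must be unique")
    # Same shape as the JSON dictionary above: name -> {"value": byte}
    return {name: {"value": value} for value, name in enumerate(detection_names)}


def invert(value_map):
    """Map detection name -> byte value, for decoding antivirus output."""
    return {name: entry["value"] for name, entry in value_map.items()}


# Hypothetical sample list: first, second and last names are from the example.
names = (["Worm:Win32/Nodoom.A@mm", "Worm:Win32/Cult.D@mm"]
         + [f"Placeholder:Sample/{i}" for i in range(2, 255)]
         + ["Worm:Win32/Azag.A"])

value_map = build_value_map(names)
lookup = invert(value_map)
print(lookup["Worm:Win32/Cult.D@mm"])  # 1
```

Validating uniqueness up front matters: if two byte values shared a detection name, a single detection could no longer be decoded unambiguously.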
Pseudo code: emulateme.c
char* username = GetUsername(); // "JohnDoe"
for (size_t i = 0; i < strlen(username); i++) {
    dropFile(username[i]); // drops the sample mapped to this byte, e.g. 'J' -> 74
}
AV output detects:
"DoS:Win32/Jolt2" -> 74 = J
"HackTool:Win32/Auha.A" -> 111 = o
"Worm:Win32/Energy.G@mm" -> 104 = h
... -> n
... -> D
... -> o
"Worm:Win32/Alcaul.R@mm" -> 101 = e
avleak: dumped string: "JohnDoe"
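The decoding step can be sketched like this, assuming one detection per run, runs kept in the order the bytes were leaked, and a 0 byte marking the end of the string (one detection per run matters because a repeated character such as the second 'o' maps to the same detection name). The J/o/h/e names come from the example output; `Placeholder:n` and `Placeholder:D` are hypothetical stand-ins for the elided ones.

```python
def decode_runs(run_detections, lookup):
    """Rebuild a leaked string from one detection name per run.

    run_detections: detection names, in the order the bytes were leaked.
    lookup: detection name -> byte value.
    """
    out = bytearray()
    for name in run_detections:
        value = lookup[name]
        if value == 0:  # treat a 0 byte as the string terminator
            break
        out.append(value)
    # latin-1 maps every byte 0-255 to a character, so decoding never fails
    return out.decode("latin-1")


lookup = {
    "DoS:Win32/Jolt2": 74,          # 'J'
    "HackTool:Win32/Auha.A": 111,   # 'o'
    "Worm:Win32/Energy.G@mm": 104,  # 'h'
    "Placeholder:n": ord("n"),      # hypothetical
    "Placeholder:D": ord("D"),      # hypothetical
    "Worm:Win32/Alcaul.R@mm": 101,  # 'e'
}
runs = ["DoS:Win32/Jolt2", "HackTool:Win32/Auha.A", "Worm:Win32/Energy.G@mm",
        "Placeholder:n", "Placeholder:D", "HackTool:Win32/Auha.A",
        "Worm:Win32/Alcaul.R@mm"]
print(decode_runs(runs, lookup))  # JohnDoe
```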
AVAudit
AVAudit is a framework I have written in Python for the black box analysis of antivirus emulators.
It exposes an abstracted API for developing test cases, called “fingerprints”, that can be reused across multiple antiviruses. Unlike reverse-engineering, where you have to start from scratch for every emulator, a new antivirus can be added in mere minutes.
Here is an example fingerprint file that dumps all possible byte values.
dump_all.c
#include <windows.h>
#include "leaker.h"

void leakResource(int num);
void* readConfig(void);
void * memcpymyas(void* dst, const void* src, unsigned int cnt);
size_t mystrlen(const char * str);

int WINAPI WinMain(HINSTANCE a, HINSTANCE b, LPSTR c, int d){
    int* data = (int*)readConfig(); // first int is the start index, second is the end index

    /* BEGIN FINGERPRINTING */
    unsigned char all_bytes[256];
    for (int i = 1; i < 256; i++){ // skip the null terminator
        all_bytes[i-1] = i;
    }
    all_bytes[255] = '\0'; // terminate the buffer so the loop below stops here
    for(int i = data[0]; i < data[1]; i++){
        if(all_bytes[i] == '\0'){
            return 0;
        }
        leakResource(all_bytes[i]); // drop the sample mapped to this byte value
    }
    /* END FINGERPRINTING */

    ExitProcess(0);
    return 0;
}
The readConfig() function tells the fingerprint which bytes to dump in the current run (a start and an end index), because each antivirus is different in how many bytes it can leak per run.
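One way to picture what readConfig() consumes: the framework splits the byte range into per-run windows and hands each run a start/end pair. The sketch below assumes the config is stored as two little-endian 32-bit ints (matching `int* data` on x86/x64 Windows) and that `batch_size` is a per-antivirus setting; both the packing and the parameter name are assumptions, not AVAudit’s actual format.

```python
import struct


def run_windows(total, batch_size):
    """Split indices [0, total) into half-open [start, end) windows."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    return [(start, min(start + batch_size, total))
            for start in range(0, total, batch_size)]


def pack_config(start, end):
    """Pack (start, end) as two little-endian 32-bit signed ints (assumed format)."""
    return struct.pack("<ii", start, end)


# dump_all.c fills 255 non-zero values, so indices 0..254 are meaningful.
windows = run_windows(255, 100)
print(windows)  # [(0, 100), (100, 200), (200, 255)]
blobs = [pack_config(s, e) for s, e in windows]
```

Half-open windows line up directly with the `for(int i = data[0]; i < data[1]; i++)` loop in the fingerprint, so no index is leaked twice or skipped.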
Framework quickstart
d = defender.Defender("./antiviruses/defender/samples/", logging.INFO)
The first step is to instantiate an antivirus class (in this case Windows Defender), passing it a folder containing 256 uniquely identified samples. An antivirus object essentially defines the filter functions for that antivirus’s output and how to invoke it from the command line. This is the first time I’ve written a multi-file Python program, so I would appreciate feedback on whether this is the right way to structure it.
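A minimal sketch of that split, assuming an abstract base class with two hooks: one that builds the scanner command line and one that filters the scanner output down to detection names. The class names, the `example-scanner` binary and its `DETECTED: <name>` output format are all hypothetical, chosen only to illustrate the structure.

```python
import re
import subprocess
from abc import ABC, abstractmethod


class Antivirus(ABC):
    def __init__(self, samples_dir):
        self.samples_dir = samples_dir  # folder with the 256 uniquely identified samples

    @abstractmethod
    def scan_command(self, path):
        """Return the argv list used to scan `path`."""

    @abstractmethod
    def filter_output(self, text):
        """Return the detection names found in the scanner's output."""

    def scan(self, path):
        # Run the scanner, then let the subclass pick detections out of stdout.
        result = subprocess.run(self.scan_command(path),
                                capture_output=True, text=True)
        return self.filter_output(result.stdout)


class ExampleScanner(Antivirus):
    """Hypothetical scanner that prints one 'DETECTED: <name>' line per hit."""

    def scan_command(self, path):
        return ["example-scanner", "--scan", path]

    def filter_output(self, text):
        return re.findall(r"^DETECTED: (\S+)$", text, flags=re.MULTILINE)


av = ExampleScanner("./samples/")
print(av.filter_output("scanning...\nDETECTED: DoS:Win32/Jolt2\ndone\n"))
# ['DoS:Win32/Jolt2']
```

Keeping the command line and the output filter in the subclass is what makes adding another antivirus cheap: the shared leak and decode logic never has to know how a particular product reports detections.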
d.leak("fingerprints\\GetUsernameA.c", ["-ladvapi32"])
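All antivirus classes expose a `leak` method. In this example it takes the path to a fingerprint source file and a list of extra build flags; `-ladvapi32` links against Advapi32, which is where GetUserNameA lives.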
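Since `-ladvapi32` is a MinGW/GCC-style linker flag, one plausible shape for the compile step behind `leak` is sketched below. The compiler name `x86_64-w64-mingw32-gcc`, the `-mwindows` flag (GUI subsystem, so the `WinMain` entry point is used) and the output naming are assumptions for illustration, not necessarily what AVAudit does.

```python
import os


def compile_command(source, extra_flags, compiler="x86_64-w64-mingw32-gcc"):
    """Build the argv list that compiles one fingerprint into an .exe."""
    # Swap the .c extension for .exe, keeping the fingerprint's directory.
    output = os.path.splitext(source)[0] + ".exe"
    # Extra flags such as -ladvapi32 go after the source file so the linker
    # can resolve the Win32 functions the fingerprint calls.
    return [compiler, source, "-o", output, "-mwindows", *extra_flags]


print(compile_command("fingerprints/GetUsernameA.c", ["-ladvapi32"]))
```

Returning an argv list rather than a shell string avoids quoting problems when fingerprint paths contain spaces.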
There is also a simple template system, which just substitutes strings in the fingerprint file. This is useful when we want to iterate over a number of different environment variables.
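A minimal sketch of that kind of string-substitution template, assuming a hypothetical `{{NAME}}` placeholder in the fingerprint source (AVAudit’s real placeholder syntax may differ). Each environment variable name yields its own fingerprint source that calls GetEnvironmentVariableA.

```python
TEMPLATE = """\
char buf[256];
DWORD n = GetEnvironmentVariableA("{{NAME}}", buf, sizeof(buf));
"""


def render(template, substitutions):
    """Replace every {{KEY}} placeholder with its value (plain string substitution)."""
    for key, value in substitutions.items():
        template = template.replace("{{" + key + "}}", value)
    return template


variables = ["USERNAME", "COMPUTERNAME", "PROCESSOR_IDENTIFIER"]
sources = {var: render(TEMPLATE, {"NAME": var}) for var in variables}
print(sources["COMPUTERNAME"].splitlines()[1])
# DWORD n = GetEnvironmentVariableA("COMPUTERNAME", buf, sizeof(buf));
```

Plain `str.replace` is deliberate here: C source is full of braces, so Python’s own `str.format` would need every literal `{` escaped.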
That’s pretty much it for the introduction. The rest of the series will be dedicated to comparing antivirus fingerprints and seeing if we can’t bypass any of them :)
Part II will explore what we can dump from Windows Defender’s emulator, for the first time ever!