[HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector - Deque - 06-20-2014
MalDet: An Anomaly-Statistics Based PE Malware Detector
What does it do?
MalDet calculates a probability for a file to be malicious based on anomalies in the Portable Executable format.
How does it do it?
Certain anomalies are more prevalent in malware than in normal files. MalDet uses statistical information about the occurrence of anomalies in malicious and non-malicious files to assign a probability.
Usage
_ _ _ _ _
| | | | | | (_) |
| |__| | __ _ ___| | _____ ___ _ __ ___ _ __ ___ _ _ _ __ _| |_ _ _
| __ |/ _` |/ __| |/ / __/ _ \| '_ ` _ \| '_ ` _ \| | | | '_ \| | __| | | |
| | | | (_| | (__| < (_| (_) | | | | | | | | | | | |_| | | | | | |_| |_| |
|_| |_|\__,_|\___|_|\_\___\___/|_| |_| |_|_| |_| |_|\__,_|_| |_|_|\__|\__, |
__/ |
|___/
MalDet v0.2
-----------
Please note:
MalDet uses statistical information about file anomalies to assign a probability to a file for being malicious.
A probability of 50% means there is no knowledge about the file.
Files with 90% probability may still be non-malicious and vice versa for files with 10% probability.
MalDet is still experimental and not a substitute for any antivirus software!
MalDet is made with PortEx: https://github.com/katjahahn/PortEx
Code:
import java.io.File
import scala.collection.JavaConverters._
import Function.tupled
// Anomaly, AnomalySubType, PELoader, PEAnomalyScanner and IOUtil come from PortEx.

/**
 * Provides detection heuristics based on statistical information about PE files.
 * Only anomaly statistics are used at present.
 *
 * @author Deque
 */
class DetectionHeuristic(
  private val anomalies: List[Anomaly],
  private val probabilities: Map[AnomalySubType, AnomalyProb]) {

  /**
   * Calculates the probability for a file to be malicious based on the
   * anomalies found in the file.
   *
   * @return probability P(BAD|Anomalies)
   */
  def malwareProbability(): Double = {
    // each anomaly subtype counts only once per file
    val subtypes = anomalies.map(a => a.subtype).distinct
    // subtypes without statistics are skipped
    val probs = subtypes.map(subtype => probabilities.get(subtype)).flatten
    // products of P(Anomaly|BAD) and P(Anomaly|GOOD) over all found subtypes
    val allBad = probs.foldRight(1.0) { (p, bad) => p.bad * bad }
    val allGood = probs.foldRight(1.0) { (p, good) => p.good * good }
    // Bayes' rule with equal priors P(BAD) = P(GOOD) = 0.5
    val bayes = allBad * 0.5 / (allGood * 0.5 + allBad * 0.5)
    bayes
  }
}

/**
 * Represents the percentage of the two file sets, good and bad, to have one or
 * several certain anomalies.
 * This is equal to P(Anomaly|BAD) and P(Anomaly|GOOD)
 */
case class AnomalyProb(bad: Double, good: Double)

object DetectionHeuristic {

  val threshold = 500
  lazy val probabilities = readProbabilities()

  private val version = """version: 0.2
    |author: Deque
    |last update: 21.Jun 2014""".stripMargin

  private val title = """MalDet v0.2
    |-----------
    |Please note:
    |MalDet uses statistical information about file anomalies to assign a probability to a file for being malicious.
    |A probability of 50% means there is no knowledge about the file.
    |Files with 99% probability may still be non-malicious and vice versa for files with 1% probability.
    |MalDet is still experimental and not a substitute for any antivirus software!
    |""".stripMargin

  def apply(file: File): DetectionHeuristic = {
    val data = PELoader.loadPE(file)
    val scanner = PEAnomalyScanner.newInstance(data)
    val list = scanner.getAnomalies.asScala.toList
    new DetectionHeuristic(list, probabilities)
  }

  private def clean(bad: Map[String, Array[String]],
    good: Map[String, Array[String]]): (Map[String, Double], Map[String, Double]) = {
    val newBad = scala.collection.mutable.Map[String, Double]()
    val newGood = scala.collection.mutable.Map[String, Double]()
    for ((key, arr) <- bad) {
      val goodArr = good.getOrElse(key, Array("0", "0.0"))
      val goodNr = goodArr(0).toInt
      val goodProb = goodArr(1).toDouble
      val badNr = arr(0).toInt
      val badProb = arr(1).toDouble
      if (goodNr + badNr >= threshold) {
        newGood += (key -> goodProb)
        newBad += (key -> badProb)
      }
    }
    (newBad.toMap, newGood.toMap)
  }

  /**
   * Reads the probability statistics files for malware and non-malicious programs.
   * Cleans the probabilities from insignificant values based on the threshold.
   */
  private def readProbabilities(): Map[AnomalySubType, AnomalyProb] = {
    val rawMalprobs = IOUtil.readMap("malwareanomalystats").asScala.toMap
    val rawGoodprobs = IOUtil.readMap("goodwareanomalystats").asScala.toMap
    val (malprobs, goodprobs) = clean(rawMalprobs, rawGoodprobs)
    (malprobs map tupled { (key: String, malicious: Double) =>
      val subtype = AnomalySubType.valueOf(key)
      val good = goodprobs.getOrElse(key, 0.5)
      val prob = AnomalyProb(malicious / 100.0, good / 100.0)
      (subtype, prob)
    }).toMap ++
      (goodprobs.filterNot(t => malprobs.contains(t._1)) map tupled { (key, good) =>
        val subtype = AnomalySubType.valueOf(key)
        val malicious = malprobs.getOrElse(key, 0.5)
        val prob = AnomalyProb(malicious / 100.0, good / 100.0)
        (subtype, prob)
      }).toMap
  }
}
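To make the calculation in malwareProbability() easier to follow, here is a minimal standalone sketch of the same Bayes step. It assumes equal priors of 0.5 and treats the anomalies as independent, as the code above does. The name Likelihood is a hypothetical stand-in for AnomalyProb, and the numbers are made up for illustration; they are not taken from MalDet's statistics files.

```scala
// Hypothetical stand-in for AnomalyProb: P(Anomaly|BAD) and P(Anomaly|GOOD)
final case class Likelihood(bad: Double, good: Double)

object BayesSketch {

  /** P(BAD|Anomalies) with equal priors P(BAD) = P(GOOD) = 0.5,
    * multiplying the per-anomaly likelihoods as if they were independent. */
  def malwareProbability(found: Seq[Likelihood]): Double = {
    val allBad = found.map(_.bad).product   // product over an empty list is 1.0
    val allGood = found.map(_.good).product
    allBad * 0.5 / (allGood * 0.5 + allBad * 0.5)
  }

  def main(args: Array[String]): Unit = {
    // Made-up likelihoods for two anomalies found in one file
    val found = Seq(
      Likelihood(bad = 0.30, good = 0.05),
      Likelihood(bad = 0.10, good = 0.20))
    println(malwareProbability(found))     // roughly 0.75
    println(malwareProbability(Seq.empty)) // 0.5, i.e. no knowledge about the file
  }
}
```

With no known anomalies both products are 1.0, so the result falls back to the prior of 50%, which matches the note in the usage text.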
RE: [Release] MalDet: An Anomaly-Statistics Based PE Malware Detector - Inori - 06-23-2014
Holy crap.. This is mindblowing..
Approx. how much data did you go through to get these stats so precise?
(I'm guessing somewhere in the EB range)
RE: [Release] MalDet: An Anomaly-Statistics Based PE Malware Detector - Deque - 06-24-2014
(06-23-2014, 08:43 PM)Inori Wrote: Holy crap.. This is mindblowing..
Approx. how much data did you go through to get these stats so precise?
(I'm guessing somewhere in the EB range)
Gigabyte range only.
It is more about the numbers than the size of the data.
I've got 103277 pieces of malware (75 GB) and 33178 pieces of clean samples (16 GB).
I am about to collect more clean samples, though. I will update the program when I have more data.
Here are some accuracy results of the current setup.
I think it can get better.
False positives for good files. E.g. if I decide files above 50% probability are malicious, 8.7% get a false positive.
Code:
files read: 33178
malicious by threshold 0.99: 174 ratio 0.005244439086141419
malicious by threshold 0.80: 1088 ratio 0.03279281451564289
malicious by threshold 0.50: 2883 ratio 0.08689493037555006
Bad files detection ratio. E.g. if I decide files above 50% are malicious, 94% of all malicious files are detected as malicious.
Code:
files read: 39000
malicious by threshold 0.99: 13888 ratio 0.3561025641025641
malicious by threshold 0.80: 21795 ratio 0.5588461538461539
malicious by threshold 0.50: 36657 ratio 0.939923076923077
(Note: I stopped here at 39000, because the ratio didn't change much.)
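The percentages quoted above follow directly from the counts in the two listings. A small sketch of the arithmetic for the 0.50 threshold, using only the numbers posted here (the object and method names are made up for illustration):

```scala
object ThresholdRates {

  /** Share of files flagged as malicious at a given threshold. */
  def ratio(flagged: Int, total: Int): Double = flagged.toDouble / total

  def main(args: Array[String]): Unit = {
    // clean files flagged at threshold 0.50: false positives
    val falsePositiveRate = ratio(2883, 33178)
    // malicious files flagged at threshold 0.50: detections
    val detectionRate = ratio(36657, 39000)
    println(falsePositiveRate) // about 0.087
    println(detectionRate)     // about 0.94
  }
}
```

Raising the threshold to 0.99 lowers both numbers: 174 of 33178 clean files and 13888 of 39000 malicious files are flagged, so fewer false alarms come at the cost of more missed malware.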
RE: [Release] MalDet: An Anomaly-Statistics Based PE Malware Detector - Aut•ono•mous - 06-24-2014
Wow, this is impressive. Great work, I'm going to play around with it a bit now.
RE: [HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector - Deque - 08-01-2014
MalDet v0.2 published and HC-specific version created. --> Moved to HC Official Tools.
RE: [HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector - bluedog.tar.gz - 08-01-2014
That is sweet!
Great release!
How does it know whether a file is malicious though?
I mean it checks the statistical information but how does it flag it?
Keep it up.
Offtopic: In your repo 'papers' I found crypters.pdf. Your explanation of the term 'hackers' is absolutely well written. I'd actually like to have this in the urban dictionary.
"The term hacker is probably the most misunderstood word. Not only by the people
who only hear the term in the media that likes to portrait hackers as cybercriminal
teenagers who sit all day in front the computer and compromise other people’s accounts;
but also by the people who want to become hackers and know that there is not only the
cybercriminal one.
Most people don’t realize that hacking has nothing to do with IT-security, programming
or computers at all. You can be a hacker without ever touching an electronic device
once in your live, you just might not know that you are one.
Definition 1. Hacking is innovatively modifying and using the things around you in a
way they weren’t meant to be.
"
RE: [HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector - Deque - 08-01-2014
Thank you for the feedback, @bluedog.tar.gz.
Quote:How does it know whether a file is malicious though?
I mean it checks the statistical information but how does it flag it?
It doesn't show that a file is malicious, it only shows the probability that it is. So there is no definite flag set. You would have to set a threshold yourself, like: "I won't touch this file if the probability is more than 50%."
It knows the probability, because I collected the statistical data about anomaly prevalence in malicious and in non-malicious files and these differ a lot from each other. Malicious files have certain anomalies to avoid detection by antivirus scanners.
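A minimal sketch of what setting such a threshold yourself could look like. The helper below is hypothetical and not part of MalDet; it only compares a reported probability against a cut-off the user picks.

```scala
object ThresholdSketch {

  /** True if the reported probability is above the user's own cut-off.
    * The default of 0.5 is just the example value from this post. */
  def treatAsSuspicious(probability: Double, threshold: Double = 0.5): Boolean =
    probability > threshold

  def main(args: Array[String]): Unit = {
    println(treatAsSuspicious(0.94))       // true
    println(treatAsSuspicious(0.94, 0.99)) // false: stricter cut-off
    println(treatAsSuspicious(0.5))        // false: 50% means no knowledge
  }
}
```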
RE: [HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector - 18Xray - 08-14-2014
Impressive tools. Keep up the work; hopefully it pays off for you as much as it does for us.
RE: [HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector - lady_godiva - 08-14-2014
I have a question. How did you assign the probabilities? I mean, did you follow some machine learning approach, analyzing different samples and then using a classifier? Or did you use some other approach?
I'm sorry, but I haven't had time to look at the code yet; reading it would probably give me an answer.