(08-14-2014, 03:42 PM) lady_godiva Wrote: I have a question. How did you assign probabilities? I mean, did you follow some machine learning approach, analyzing different samples and then using a classifier? Or did you use some other approach?
I'm sorry, but I haven't had time to look at the code yet; reading it would probably answer my question.
No classifier, no machine learning, just plain statistics and probability theory. I collected statistical information and, based on it, used Bayes' theorem to calculate the conditional probability of a file being malicious. I can explain the details to you; however, I am not sure whether my approach is scientifically correct. I discussed it with a professor in this field, who said he would try to help me, but I haven't heard back from him so far. It seems this isn't as easy as I thought it would be. I have made assumptions, e.g., that the occurrences of different anomalies are independent of each other, which are probably not correct.
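Written out, the calculation is Bayes' theorem with the independence assumption applied to the likelihoods, which has the same form as the naive Bayes model. Here A_1, ..., A_n stand for the distinct anomaly subtypes found in a file; the equal priors of 0.5 are taken from the code in this post, and reading the independence assumption as independence within each class (BAD or GOOD) is an interpretation, not something stated explicitly above. The two products correspond to `allBad` and `allGood` in the code.

```latex
% assumed: anomalies occur independently of each other within each class C
P(A_1,\dots,A_n \mid C) \approx \prod_{i=1}^{n} P(A_i \mid C),
  \qquad C \in \{\mathrm{BAD}, \mathrm{GOOD}\}

% Bayes' theorem, denominator expanded by the law of total probability
P(\mathrm{BAD} \mid A_1,\dots,A_n)
  = \frac{P(\mathrm{BAD}) \prod_{i} P(A_i \mid \mathrm{BAD})}
         {P(\mathrm{BAD}) \prod_{i} P(A_i \mid \mathrm{BAD})
          + P(\mathrm{GOOD}) \prod_{i} P(A_i \mid \mathrm{GOOD})}

% with equal priors P(BAD) = P(GOOD) = 0.5 the priors cancel
P(\mathrm{BAD} \mid A_1,\dots,A_n)
  = \frac{\prod_{i} P(A_i \mid \mathrm{BAD})}
         {\prod_{i} P(A_i \mid \mathrm{BAD}) + \prod_{i} P(A_i \mid \mathrm{GOOD})}
```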
So, basically, I created something that works well enough in practice, but its scientific justification is not yet sufficient.
It is part of my master's thesis, and I would like you to wait until December for more details. My thesis will be made public after my graduation.
Thank you for your interest in my work.
Edit: By the way, the code that does the main calculation is just a few lines. I added comments to explain the details.
```scala
/**
 * Calculates the probability for a file to be malicious based on the
 * anomalies found in the file.
 *
 * @return probability P(BAD|Anomalies)
 */
def malwareProbability(): Double = {
  // distinct anomaly subtypes found in this file
  val subtypes = anomalies.map(a => a.subtype).distinct
  // fetch the probabilities for every anomaly subtype
  // (subtypes without a stored probability are dropped by flatten);
  // such a probability contains two values, bad and good:
  // bad == probability of a malicious file to have that single anomaly
  // good == probability of a harmless file to have that single anomaly
  val probs = subtypes.map(subtype => probabilities.get(subtype)).flatten
  // allBad == probability for a malicious file to have all anomalies that were found,
  // assuming the anomalies occur independently of each other
  // this is the probability P(Anomalies | BAD)
  val allBad = probs.foldRight(1.0) { (p, bad) => p.bad * bad }
  // allGood == probability for a harmless file to have all anomalies that were found
  // this is the probability P(Anomalies | GOOD)
  val allGood = probs.foldRight(1.0) { (p, good) => p.good * good }
  // calculates the probability for the file to be malicious with Bayes' theorem,
  // using equal priors P(BAD) = P(GOOD) = 0.5
  // this is the probability P(BAD | Anomalies)
  // note: the result is NaN if both products are 0.0
  val bayes = allBad * 0.5 / (allGood * 0.5 + allBad * 0.5)
  bayes
}
```
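To try the calculation outside its class, here is a minimal self-contained sketch. The types `Anomaly` and `AnomalyProb`, the object `BayesSketch`, and the subtype names and values in `main` are hypothetical stand-ins, since the post does not show how `anomalies` and `probabilities` are declared. The second method, `malwareProbabilityLog`, is an addition that is not in the original: it computes the same posterior in log space, a standard way to avoid floating-point underflow when many small factors are multiplied.

```scala
// Hypothetical stand-ins for the fields used by malwareProbability();
// the original declarations are not shown in the post.
final case class Anomaly(subtype: String)
final case class AnomalyProb(bad: Double, good: Double)

object BayesSketch {

  /** P(BAD | anomalies) with equal priors and the independence assumption,
    * mirroring the calculation in malwareProbability(). */
  def malwareProbability(
      anomalies: Seq[Anomaly],
      probabilities: Map[String, AnomalyProb]
  ): Double = {
    // distinct subtypes; subtypes without statistics are skipped
    val probs = anomalies.map(_.subtype).distinct.flatMap(probabilities.get)
    val allBad = probs.map(_.bad).product   // P(Anomalies | BAD)
    val allGood = probs.map(_.good).product // P(Anomalies | GOOD)
    allBad * 0.5 / (allGood * 0.5 + allBad * 0.5)
  }

  /** Same posterior computed from sums of logarithms instead of products. */
  def malwareProbabilityLog(
      anomalies: Seq[Anomaly],
      probabilities: Map[String, AnomalyProb]
  ): Double = {
    val probs = anomalies.map(_.subtype).distinct.flatMap(probabilities.get)
    val logBad = probs.map(p => math.log(p.bad)).sum
    val logGood = probs.map(p => math.log(p.good)).sum
    // with equal priors: bad / (bad + good) == 1 / (1 + good / bad)
    1.0 / (1.0 + math.exp(logGood - logBad))
  }

  def main(args: Array[String]): Unit = {
    // made-up statistics for two hypothetical anomaly subtypes
    val probabilities = Map(
      "uncommonSectionName" -> AnomalyProb(bad = 0.6, good = 0.1),
      "invalidChecksum" -> AnomalyProb(bad = 0.3, good = 0.3)
    )
    val found = Seq(
      Anomaly("uncommonSectionName"),
      Anomaly("invalidChecksum"),
      Anomaly("uncommonSectionName") // duplicate, counted once
    )
    println(malwareProbability(found, probabilities))
    println(malwareProbabilityLog(found, probabilities))
  }
}
```

With equal priors the 0.5 factors cancel, so the posterior depends only on the ratio of the two products; in the log-space form that ratio becomes the difference `logGood - logBad`. A file with no known anomalies gets 0.5 from both methods.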
The real legwork was detecting and collecting the file anomalies.
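The post does not say how the per-subtype `bad` and `good` values were derived from the collected statistics. One common way, sketched here purely as an assumption and not as the author's method, is to count, in a labeled set of files, the fraction of malicious and of harmless files that exhibit each anomaly subtype. The names `Sample`, `SubtypeProb`, `EstimateSketch`, and `estimate`, and the use of add-one (Laplace) smoothing, are hypothetical choices; the smoothing keeps a subtype that never appears in one class from contributing an exact 0 to the product.

```scala
// Hypothetical labeled sample: the anomaly subtypes found in one file
// and whether that file is known to be malicious.
final case class Sample(subtypes: Set[String], malicious: Boolean)
final case class SubtypeProb(bad: Double, good: Double)

object EstimateSketch {

  /** Estimates P(subtype | BAD) and P(subtype | GOOD) as the fraction of
    * malicious / harmless samples containing the subtype, with add-one
    * (Laplace) smoothing so that no estimate is exactly 0 or 1. */
  def estimate(samples: Seq[Sample]): Map[String, SubtypeProb] = {
    val (bad, good) = samples.partition(_.malicious)
    val allSubtypes = samples.flatMap(_.subtypes).toSet
    allSubtypes.iterator.map { st =>
      val badHits = bad.count(_.subtypes.contains(st))
      val goodHits = good.count(_.subtypes.contains(st))
      st -> SubtypeProb(
        bad = (badHits + 1.0) / (bad.size + 2.0),
        good = (goodHits + 1.0) / (good.size + 2.0)
      )
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // made-up corpus of four labeled files
    val samples = Seq(
      Sample(Set("uncommonSectionName", "invalidChecksum"), malicious = true),
      Sample(Set("uncommonSectionName"), malicious = true),
      Sample(Set("invalidChecksum"), malicious = false),
      Sample(Set.empty, malicious = false)
    )
    estimate(samples).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Whether smoothing is appropriate depends on the size of the corpus; with many samples per class its effect on the estimates becomes negligible.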