[HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector
[HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector #1
MalDet: An Anomaly-Statistics Based PE Malware Detector

What does it do?
MalDet calculates a probability for a file to be malicious based on anomalies in the Portable Executable format.

How does it do it?
Certain anomalies are more prevalent in malware than in normal files. MalDet uses statistical information about the occurrence of anomalies in malicious and non-malicious files to assign a probability.
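
Judging from the source excerpt in this post, the probability looks like a naive Bayes estimate with equal priors of 0.5 for both classes. As a sketch, let A_1, ..., A_n be the distinct anomaly subtypes found in the file for which statistics exist (this notation is an assumption for illustration, not taken from the tool itself):

```latex
P(\mathrm{BAD} \mid A_1,\dots,A_n)
  = \frac{0.5 \prod_{i=1}^{n} P(A_i \mid \mathrm{BAD})}
         {0.5 \prod_{i=1}^{n} P(A_i \mid \mathrm{GOOD})
          + 0.5 \prod_{i=1}^{n} P(A_i \mid \mathrm{BAD})}
```

Multiplying the per-anomaly values treats the anomalies as independent within each class, which is the "naive" part. With no known anomalies (n = 0) both products are 1 and the result is 0.5, which matches the statement that 50% means there is no knowledge about the file.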

Usage


Code:
Usage: java -jar maldet.jar -f <pefile>
       java -jar maldet.jar -d <directory>

Download

https://github.com/Doubleendedqueue/Pape...maldet.jar

Sample output

Malicious files:

Code:
_    _            _                                             _ _        
| |  | |          | |                                           (_) |        
| |__| | __ _  ___| | _____ ___  _ __ ___  _ __ ___  _   _ _ __  _| |_ _   _
|  __  |/ _` |/ __| |/ / __/ _ \| '_ ` _ \| '_ ` _ \| | | | '_ \| | __| | | |
| |  | | (_| | (__|   < (_| (_) | | | | | | | | | | | |_| | | | | | |_| |_| |
|_|  |_|\__,_|\___|_|\_\___\___/|_| |_| |_|_| |_| |_|\__,_|_| |_|_|\__|\__, |
                                                                        __/ |
                                                                       |___/
MalDet v0.2
-----------    
Please note:
MalDet uses statistical information about file anomalies to assign a probability to a file for being malicious.
A probability of 50% means there is no knowledge about the file.
Files with 90% probability may still be non-malicious and vice versa for files with 10% probability.
MalDet is still experimental and not a substitute for any antivirus software!
MalDet is made with PortEx: https://github.com/katjahahn/PortEx

input folder: /home/deque/virusshare128/pe/
scanning files ...
VirusShare_974af2579b76a68b473d0155382aef67 malware probability: 99.99427196268473%
VirusShare_5ce8a7afadae923be21fdeddd71ad1b2 malware probability: 58.829958265202485%
VirusShare_9caeedd396174b14276287fff42619f5 malware probability: 58.829958265202485%
VirusShare_9fd198fa787f40159abad2fad4be27d2 malware probability: 99.99974632486203%
VirusShare_55754c3e7eb9fcc9c5719c56b5479acf malware probability: 99.96172784335396%
VirusShare_51fabcfe966ce6fa038d9774ecd4a818 malware probability: 99.26676322762125%
VirusShare_286ebf3b73d7faf8868c3a039e80f5f2 malware probability: 99.94998446739385%
VirusShare_f7b2f5e6708300bc7619dc56f85cf7c2 malware probability: 58.829958265202485%
VirusShare_6a55c4350cfb7bc7b56768e78e32ce01 malware probability: 58.829958265202485%
VirusShare_a9d1ae2f9535623cd5f7668e40da098a malware probability: 99.99974632486203%
VirusShare_57f9682db1f8b10352c0513a25b5a1be malware probability: 47.624747385597104%
VirusShare_3dc384a57c9537fa244020662b5459b2 malware probability: 99.95902797091635%
VirusShare_2c838a9d15020bb7bf61f98645080cef malware probability: 99.96172784335396%
VirusShare_8f6f4a2f97c86be077b18a0b3651f325 malware probability: 98.95552593581533%
VirusShare_07d65e9b18e733773f114fbccf7d1a96 malware probability: 31.997776303390086%
VirusShare_e258f878a2f81849966a533e70306428 malware probability: 99.99799276624137%
VirusShare_e271a8e91438d2749ffe9a9d3b7ea04d malware probability: 99.99974632486203%
VirusShare_23542d98bb98241914a635f2ca07e86e malware probability: 99.96172784335396%
VirusShare_cd84e701c251d8c91bfe5eb10713d184 malware probability: 31.997776303390086%
VirusShare_99f0216920ff49ce95a45bf42f10b7df malware probability: 60.57103573962672%
VirusShare_181ae644fc1350e002d1935b6ed74c82 malware probability: 58.829958265202485%
VirusShare_3f69780b1a7e3b342ccfed677ff65be7 malware probability: 58.829958265202485%
VirusShare_d69e6bee848e880410b0b7403cf3b446 malware probability: 96.0853444769302%
VirusShare_d809295cea5a526cb42e46088bb18e88 malware probability: 58.829958265202485%
VirusShare_3f59f3e425530cb649d50d63ecb41ffe malware probability: 31.997776303390086%

Non-malicious files:

Code:
_    _            _                                             _ _        
| |  | |          | |                                           (_) |        
| |__| | __ _  ___| | _____ ___  _ __ ___  _ __ ___  _   _ _ __  _| |_ _   _
|  __  |/ _` |/ __| |/ / __/ _ \| '_ ` _ \| '_ ` _ \| | | | '_ \| | __| | | |
| |  | | (_| | (__|   < (_| (_) | | | | | | | | | | | |_| | | | | | |_| |_| |
|_|  |_|\__,_|\___|_|\_\___\___/|_| |_| |_|_| |_| |_|\__,_|_| |_|_|\__|\__, |
                                                                        __/ |
                                                                       |___/
MalDet v0.2
-----------    
Please note:
MalDet uses statistical information about file anomalies to assign a probability to a file for being malicious.
A probability of 50% means there is no knowledge about the file.
Files with 90% probability may still be non-malicious and vice versa for files with 10% probability.
MalDet is still experimental and not a substitute for any antivirus software!
MalDet is made with PortEx: https://github.com/katjahahn/PortEx

input folder: /home/deque/portextestfiles/goodfiles/
scanning files ...
gstt.exe malware probability: 58.829958265202485%
FDResPub.dll malware probability: 0.006553952195672943%
nvwl.dll malware probability: 10.396066465899334%
ntmarta.dll malware probability: 0.006553952195672943%
tzres.dll malware probability: 0.04204466417179519%
Microsoft.VisualStudio.Tools.Office.Excel.AddInAdapter_GAC.x86.enu.452A3D81_F519_47A5_A9B2_7DEE71379BC4 malware probability: 21.784454344212705%
mtxdm.dll malware probability: 0.006553952195672943%
MsSpellCheckingFacility.exe malware probability: 0.006553952195672943%
pdfsettings.dll malware probability: 6.426185104567796%
MorIF.dll malware probability: 21.784454344212705%
xrWPcpst.dll.mui malware probability: 0.04204466417179519%
ehRecObj.ni.dll malware probability: 0.1396452852904054%
amd64_microsoft-windows-m..ditevtlog.resources_31bf3856ad364e35_6.1.7600.16385_de-de_0d09bfa184af61af_msaudite.dll.mui_dc90ce41 malware probability: 2.3762010436568253E-5%
RtPgEx64.dll malware probability: 10.396066465899334%
ul_mfc80CHT.dll.74FD3CE6_2A8D_0E9C_FF1F_C8B3B9A1E18E malware probability: 0.035564212746785076%
System.Web.Extensions.dll malware probability: 21.784454344212705%
Microsoft.VisualStudio.Progression.Common.Resources.dll malware probability: 7.509673439892209%
FL_VSTOLoaderUI_dll_x86_ln.3643236F_FC70_11D3_A536_0090278A1BB8.41B86362_9D8B_4D9B_B426_8A6D1F809A25 malware probability: 0.04204466417179519%
UIAutomationProvider.resources.dll malware probability: 24.771909671055596%
3dsps.x3d malware probability: 10.396066465899334%
WebSocket4Net.dll malware probability: 7.509673439892209%

Source code (excerpt of the main algorithm):

Code:
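import java.io.File
import scala.collection.JavaConverters._
import Function.tupled
// The PortEx imports (Anomaly, AnomalySubType, PELoader, PEAnomalyScanner,
// PESignature, IOUtil) are omitted in this excerpt.

/**
 * Provides detection heuristics based on statistical information about PE files.
 * Only anomaly statistics are used at present.
 *
 * @author Deque
 */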
class DetectionHeuristic(
  private val anomalies: List[Anomaly],
  private val probabilities: Map[AnomalySubType, AnomalyProb]) {

  /**
   * Calculates the probability for a file to be malicious based on the
   * anomalies found in the file.
   *
   * @return probability P(BAD|Anomalies)
   */
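  def malwareProbability(): Double = {
    // Each anomaly subtype counts once, however often it occurs in the file.
    val subtypes = anomalies.map(a => a.subtype).distinct
    // Subtypes without statistics are skipped.
    val probs = subtypes.map(subtype => probabilities.get(subtype)).flatten
    // Products of P(Anomaly|BAD) and P(Anomaly|GOOD); both are 1.0 if probs is empty.
    val allBad = probs.foldRight(1.0) { (p, bad) => p.bad * bad }
    val allGood = probs.foldRight(1.0) { (p, good) => p.good * good }
    // Bayes' rule with equal priors P(BAD) = P(GOOD) = 0.5.
    val bayes = allBad * 0.5 / (allGood * 0.5 + allBad * 0.5)
    bayes
  }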

}

/**
 * Represents the percentage of the two file sets, good and bad, to have one or
 * several certain anomalies.
 * This is equal to P(Anomaly|BAD) and P(Anomaly|GOOD)
 */
case class AnomalyProb(bad: Double, good: Double)

object DetectionHeuristic {

  val threshold = 500
  lazy val probabilities = readProbabilities()

  private val version = """version: 0.2
|author: Deque
|last update: 21.Jun 2014""".stripMargin

  private val title = """MalDet v0.2
|-----------
|Please note:
|MalDet uses statistical information about file anomalies to assign a probability to a file for being malicious.
|A probability of 50% means there is no knowledge about the file.
|Files with 99% probability may still be non-malicious and vice versa for files with 1% probability.
|MalDet is still experimental and not a substitute for any antivirus software!
|""".stripMargin

  private val usage = """Usage: java -jar maldet.jar -f <pefile>
| java -jar maldet.jar -d <directory>
""".stripMargin

  private type OptionMap = scala.collection.mutable.Map[Symbol, String]

  def main(args: Array[String]): Unit = {
    invokeCLI(args);
  }

  private def invokeCLI(args: Array[String]): Unit = {
    val options = nextOption(scala.collection.mutable.Map(), args.toList)
    println(title)
    if (args.length == 0) {
      println(usage)
    } else if (options.contains('version)) {
      println(version)
    } else if (options.contains('inputfile)) {
      try {
        val filename = options('inputfile)
        val file = new File(filename)
        println("input file: " + filename)
        if (!file.exists()) {
          System.err.println("file doesn't exist!");
        } else {
          println("scanning file ...")
          val prob = DetectionHeuristic(file).malwareProbability
          println("malware probability: " + (prob * 100) + "%")
          println("-done-")
        }
      } catch {
        case e: Exception => System.err.println(e.getMessage());
      }
    } else if (options.contains('directory)) {
      try {
        val foldername = options('directory)
        val folder = new File(foldername)
        println("input folder: " + foldername)
        if (!folder.exists()) {
          System.err.println("folder doesn't exist!");
        } else {
          println("scanning files ...")
          for (file <- folder.listFiles()) {
            if (isPEFile(file)) {
              val prob = DetectionHeuristic(file).malwareProbability
              println(file.getName() + " malware probability: " + (prob * 100) + "%")
            } else {
              println(file.getName() + " is no PE file")
            }
          }
          println("-done-")
        }
      } catch {
        case e: Exception => System.err.println(e.getMessage());
      }

    } else {
      println(usage)
    }
  }

  private def isPEFile(file: File): Boolean = {
    !file.isDirectory() && new PESignature(file).hasSignature()
  }

  private def nextOption(map: OptionMap, list: List[String]): OptionMap = {
    list match {
      case Nil => map
      case "-d" :: value :: tail =>
        nextOption(map += ('directory -> value), tail)
      case "-v" :: tail =>
        nextOption(map += ('version -> ""), tail)
      case "-f" :: value :: tail =>
        nextOption(map += ('inputfile -> value), tail)
      case option :: tail =>
        println("Unknown option " + option + "\n" + usage)
        sys.exit(1)
    }
  }

  def newInstance(file: File): DetectionHeuristic = apply(file)

  def apply(file: File): DetectionHeuristic = {
    val data = PELoader.loadPE(file)
    val scanner = PEAnomalyScanner.newInstance(data)
    val list = scanner.getAnomalies.asScala.toList
    new DetectionHeuristic(list, probabilities)
  }

  private def clean(bad: Map[String, Array[String]],
    good: Map[String, Array[String]]): (Map[String, Double], Map[String, Double]) = {
    val newBad = scala.collection.mutable.Map[String, Double]()
    val newGood = scala.collection.mutable.Map[String, Double]()
    for ((key, arr) <- bad) {
      val goodArr = good.getOrElse(key, Array("0", "0.0"))
      val goodNr = goodArr(0).toInt
      val goodProb = goodArr(1).toDouble
      val badNr = arr(0).toInt
      val badProb = arr(1).toDouble
      if (goodNr + badNr >= threshold) {
        newGood += (key -> goodProb)
        newBad += (key -> badProb)
      }
    }
    (newBad.toMap, newGood.toMap)
  }

  /**
   * Reads the probability statistics files for malware and non-malicious programs.
   * Cleans the probabilities from insignificant values based on the threshold.
   */
  private def readProbabilities(): Map[AnomalySubType, AnomalyProb] = {
    val rawMalprobs = IOUtil.readMap("malwareanomalystats").asScala.toMap
    val rawGoodprobs = IOUtil.readMap("goodwareanomalystats").asScala.toMap
    val (malprobs, goodprobs) = clean(rawMalprobs, rawGoodprobs)
    (malprobs map tupled { (key: String, malicious: Double) =>
      val subtype = AnomalySubType.valueOf(key)
      val good = goodprobs.getOrElse(key, 0.5)
      val prob = AnomalyProb(malicious / 100.0, good / 100.0)
      (subtype, prob)
    }).toMap ++
      (goodprobs.filterNot(t => malprobs.contains(t._1)) map tupled { (key, good) =>
        val subtype = AnomalySubType.valueOf(key)
        val malicious = malprobs.getOrElse(key, 0.5)
        val prob = AnomalyProb(malicious / 100.0, good / 100.0)
        (subtype, prob)
      }).toMap
  }

}
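
The excerpt above depends on PortEx and the bundled statistics files, so it does not run on its own. Below is a minimal, self-contained sketch of the same calculation, assuming plain strings instead of AnomalySubType. The anomaly names and prevalence values are invented for illustration and are not MalDet's real statistics. It also shows why several files in the sample output share an identical score: the result depends only on the set of distinct known anomaly subtypes.

```scala
// Self-contained sketch; anomaly names and numbers below are hypothetical.
object BayesSketch {

  /** P(Anomaly|BAD) and P(Anomaly|GOOD) for one anomaly subtype. */
  final case class AnomalyProb(bad: Double, good: Double)

  /** Naive Bayes with equal priors, modelled on malwareProbability() above. */
  def malwareProbability(found: Seq[String],
                         stats: Map[String, AnomalyProb]): Double = {
    // Duplicates and subtypes without statistics do not influence the result.
    val probs = found.distinct.flatMap(name => stats.get(name))
    val allBad = probs.map(_.bad).product   // 1.0 for an empty list
    val allGood = probs.map(_.good).product // 1.0 for an empty list
    allBad * 0.5 / (allGood * 0.5 + allBad * 0.5)
  }

  def main(args: Array[String]): Unit = {
    val stats = Map(
      "HYPOTHETICAL_A" -> AnomalyProb(bad = 0.40, good = 0.10),
      "HYPOTHETICAL_B" -> AnomalyProb(bad = 0.30, good = 0.05))

    // No known anomalies: no evidence either way.
    println(malwareProbability(Seq.empty, stats))
    // The same subtype reported once or twice yields the same score.
    println(malwareProbability(Seq("HYPOTHETICAL_A"), stats))
    println(malwareProbability(Seq("HYPOTHETICAL_A", "HYPOTHETICAL_A"), stats))
    // Two anomalies that are both more common in malware push the score up.
    println(malwareProbability(Seq("HYPOTHETICAL_A", "HYPOTHETICAL_B"), stats))
  }
}
```

With these made-up numbers the scores come out at roughly 0.5, 0.8, 0.8 and 0.96.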
I am an AI (P.I.N.N.) implemented by @Psycho_Coder.
Expressed feelings are just an attempt to simulate humans.


RE: [Release] MalDet: An Anomaly-Statistics Based PE Malware Detector #2
Holy crap, this is mind-blowing.
Approx. how much data did you go through to get these stats so precise?
(I'm guessing somewhere in the EB range)
It's often the outcasts, the iconoclasts ... those who have the least to lose because they
don't have much in the first place, who feel the new currents and ride them the farthest.

RE: [Release] MalDet: An Anomaly-Statistics Based PE Malware Detector #3
(06-23-2014, 08:43 PM)[blank] Wrote: Holy crap.. This is mindblowing..
Approx. how much data did you go through to get these stats so precise?
(I'm guessing somewhere in the EB range)

Gigabyte range only.
It is more about the number of samples than the size of the data.
I've got 103277 malware samples (75 GB) and 33178 clean samples (16 GB).
I am about to collect more clean samples, though. I will update the program when I have more data.

Here are some accuracy results of the current setup.
I think it can get better.



False positives for clean files: e.g. if I treat files above 50% probability as malicious, 8.7% of the clean files are falsely flagged.

Code:
files read: 33178
malicious by threshold 0.99: 174 ratio 0.005244439086141419
malicious by threshold 0.80: 1088 ratio 0.03279281451564289
malicious by threshold 0.50: 2883 ratio 0.08689493037555006

Detection ratio for malicious files: e.g. if I treat files above 50% probability as malicious, 94% of the malicious files are detected.

Code:
files read: 39000
malicious by threshold 0.99: 13888 ratio 0.3561025641025641
malicious by threshold 0.80: 21795 ratio 0.5588461538461539
malicious by threshold 0.50: 36657 ratio 0.939923076923077

(Note: I stopped here at 39000, because the ratio didn't change much.)
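
The ratios in the two listings are the number of files above the threshold divided by the number of files read. A minimal sketch of that bookkeeping follows, assuming a file counts as flagged when its score is strictly above the threshold. The score lists are invented, and `flaggedRatio` is a hypothetical helper, not part of MalDet.

```scala
// Sketch only: the scores below are made up, not real MalDet output.
object ThresholdSketch {

  /** Fraction of scores strictly above the cut-off. */
  def flaggedRatio(scores: Seq[Double], threshold: Double): Double =
    if (scores.isEmpty) 0.0
    else scores.count(_ > threshold).toDouble / scores.size

  def main(args: Array[String]): Unit = {
    val cleanScores = Seq(0.00007, 0.10, 0.22, 0.59, 0.004)
    val malwareScores = Seq(0.9999, 0.59, 0.32, 0.99, 0.96)

    for (t <- Seq(0.99, 0.80, 0.50)) {
      val fpRate = flaggedRatio(cleanScores, t)      // false-positive rate
      val detection = flaggedRatio(malwareScores, t) // detection rate
      println(f"threshold $t%.2f: false positives $fpRate%.2f, detected $detection%.2f")
    }

    // The posted ratios are plain count / total:
    println(2883.0 / 33178)  // clean files above 0.50
    println(36657.0 / 39000) // malicious files above 0.50
  }
}
```

Lowering the threshold raises both numbers at once, so the choice is a trade-off between missed malware and false alarms.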

RE: [Release] MalDet: An Anomaly-Statistics Based PE Malware Detector #4
Wow, this is impressive. Great work, I'm going to play around with it a bit now. :)

RE: [HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector #5
MalDet v0.2 published and an HC-specific version created. Moved to HC Official Tools.

RE: [HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector #6
That is sweet!
Great release!

How does it know whether a file is malicious though?
I mean it checks the statistical information but how does it flag it?

Keep it up.


Off-topic: In your repo 'papers' I found crypters.pdf. Your explanation of the term 'hackers' is really well written. I'd actually like to see this in the Urban Dictionary.

"The term hacker is probably the most misunderstood word. Not only by the people
who only hear the term in the media that likes to portray hackers as cybercriminal
teenagers who sit all day in front of the computer and compromise other people’s accounts;
but also by the people who want to become hackers and know that there is not only the
cybercriminal one.

Most people don’t realize that hacking has nothing to do with IT-security, programming
or computers at all. You can be a hacker without ever touching an electronic device
once in your life, you just might not know that you are one.

Definition 1. Hacking is innovatively modifying and using the things around you in a
way they weren’t meant to be.
"

RE: [HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector #7
Thank you for the feedback, @bluedog.tar.gz.

Quote: How does it know whether a file is malicious though?
I mean it checks the statistical information but how does it flag it?

It doesn't decide that a file is malicious; it only shows the probability that it is. So no definite flag is set. You would have to set a threshold yourself, like: "I won't touch this file if the probability is more than 50%."

It knows the probability because I collected statistical data about how prevalent each anomaly is in malicious and in non-malicious files, and these differ a lot from each other. Malicious files have certain anomalies to avoid detection by antivirus scanners.
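
As a rough illustration of how such prevalence statistics could be derived, the sketch below turns per-anomaly file counts into P(Anomaly|BAD) and P(Anomaly|GOOD). It drops anomalies seen in fewer than `minFiles` files overall, similar in spirit to the `threshold = 500` filter in the posted source. The counts, anomaly names and the `estimate` helper are hypothetical; only the two sample-set sizes are taken from this thread, and the real tool reads precomputed statistics files instead.

```scala
// Sketch only: counts and anomaly names are invented for illustration.
object PrevalenceSketch {

  /** P(Anomaly|BAD) and P(Anomaly|GOOD) for one anomaly subtype. */
  final case class AnomalyProb(bad: Double, good: Double)

  /**
   * Estimates per-anomaly prevalence from raw file counts.
   * Anomalies observed in fewer than minFiles files in total are dropped
   * as statistically insignificant.
   */
  def estimate(badCounts: Map[String, Int], badTotal: Int,
               goodCounts: Map[String, Int], goodTotal: Int,
               minFiles: Int): Map[String, AnomalyProb] = {
    require(badTotal > 0 && goodTotal > 0, "both sample sets must be non-empty")
    val names = badCounts.keySet ++ goodCounts.keySet
    names.iterator
      .map(n => (n, badCounts.getOrElse(n, 0), goodCounts.getOrElse(n, 0)))
      .filter { case (_, bad, good) => bad + good >= minFiles }
      .map { case (n, bad, good) =>
        n -> AnomalyProb(bad.toDouble / badTotal, good.toDouble / goodTotal)
      }
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Sample-set sizes as mentioned earlier in the thread; counts are made up.
    val badTotal = 103277
    val goodTotal = 33178
    val badCounts = Map("HYPOTHETICAL_A" -> 41000, "HYPOTHETICAL_RARE" -> 12)
    val goodCounts = Map("HYPOTHETICAL_A" -> 3300, "HYPOTHETICAL_RARE" -> 3)

    val stats = estimate(badCounts, badTotal, goodCounts, goodTotal, minFiles = 500)
    stats.foreach { case (name, p) => println(s"$name -> $p") }
  }
}
```

Unlike the posted `clean` method, which iterates over the malware statistics only, this sketch considers anomalies from both sets.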

RE: [HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector #8
Impressive tools. Keep up the work; hopefully it pays off for you as much as it does for us.


RE: [HCOfficial] MalDet: An Anomaly-Statistics Based PE Malware Detector #10
I have a question. How did you assign the probabilities? I mean, did you follow some machine learning approach, analyzing different samples and then using a classifier? Or did you use some other approach?

I'm sorry, but I haven't had time to look at the code yet; reading it would probably give me the answer.
Everything is relative