AV vendors use various tools and techniques to identify newly launched malware from an advisory. Before diving into reversing a sample or running dynamic analysis, AV vendors often start with fuzzy hashing techniques (machine learning) to quickly decide whether a file is malicious or legitimate.
What is SSDEEP hashing?
ssdeep identifies identical or nearly identical files using the context triggered piecewise hashing (CTPH) technique. It is effective for content-similarity searches against an existing database of well-known malware.
Example: This is ransomware.
This is also a new ransomware.
The highlighted text (the part the two sentences share) shows that ssdeep can find nearly identical content and compare it with existing data.
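To make the idea of "context triggered" pieces more concrete, here is a minimal toy sketch in Python. It is not ssdeep's real algorithm: the function names (toy_ctph, similarity), the window size, the trigger rule and the scoring via difflib are all my own simplifications, and the random bytes merely stand in for file contents. The point is only that a piece boundary depends on the nearby bytes, so a local edit disturbs a few pieces and leaves the rest of the signature intact.

```python
import hashlib
import random
from difflib import SequenceMatcher

B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def toy_ctph(data: bytes, window: int = 4, modulus: int = 8) -> str:
    """Toy context-triggered piecewise hash: one character per piece."""
    signature = []
    piece_start = 0
    for i in range(len(data)):
        # The "context" is the last `window` bytes ending at position i.
        context = data[max(0, i - window + 1): i + 1]
        # Trigger: close the current piece when the context sum hits a fixed residue.
        if sum(context) % modulus == modulus - 1:
            piece = data[piece_start: i + 1]
            signature.append(B64[hashlib.sha256(piece).digest()[0] % 64])
            piece_start = i + 1
    if piece_start < len(data):
        # Hash whatever is left after the last trigger.
        tail = data[piece_start:]
        signature.append(B64[hashlib.sha256(tail).digest()[0] % 64])
    return "".join(signature)


def similarity(sig_a: str, sig_b: str) -> int:
    """Score from 0 to 100 based on the matching blocks of two signatures."""
    matcher = SequenceMatcher(None, sig_a, sig_b, autojunk=False)
    return round(100 * matcher.ratio())


# Stand-in data: random bytes instead of real files.
rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
modified = bytearray(original)
modified[2000] ^= 0xFF  # flip a single byte in the middle
modified = bytes(modified)
unrelated = bytes(rng.getrandbits(8) for _ in range(4096))

print("original vs modified :", similarity(toy_ctph(original), toy_ctph(modified)))
print("original vs unrelated:", similarity(toy_ctph(original), toy_ctph(unrelated)))
```

The real ssdeep additionally adapts its block size to the file length so that signatures stay short, and it scores matches with a weighted edit distance; this sketch skips both.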
Let’s do some practical work both as offender and defender to understand things better.
Offender Mindset:
Assume you are a skilled programmer who can reverse an existing malware binary, add extra malicious code so that the file's integrity hash changes, and apply some code obfuscation.
Here we have a well-known Trojan called njRAT, a RAT (Remote Access Trojan) that lets an attacker control a compromised machine.
Above is the program, which I opened in a hex editor and changed at a few specific lines without affecting the software's functionality.
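Why does such a small edit matter? A minimal sketch with Python's hashlib shows that changing a single bit produces a completely different cryptographic hash, which is why an exact hash lookup stops matching after a trivial change. The byte pattern below is a harmless stand-in for a file's contents, and the variable names are only for illustration.

```python
import hashlib

# Stand-in for the bytes of a file (not a real binary).
original = bytes(range(256)) * 16

# Flip one bit, the kind of tiny change a hex editor makes.
patched = bytearray(original)
patched[100] ^= 0x01
patched = bytes(patched)

# Count how many byte positions actually differ between the two versions.
changed_bytes = sum(a != b for a, b in zip(original, patched))

print("bytes changed   :", changed_bytes)
print("original sha256 :", hashlib.sha256(original).hexdigest())
print("patched sha256  :", hashlib.sha256(patched).hexdigest())
```

The two digests share nothing recognizable even though only one byte differs, so a defender who relies on exact hashes alone would treat the patched file as unknown.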
Then I used some online tools to obfuscate the code, making it harder for a third party (such as AV vendors) to read it.
Now the real malicious files are ready for the target. Let us test both files, before and after obfuscation, with VirusTotal.
Before obfuscation, most AV vendors flag the file as malicious: 57 of 70.
After Obfuscation:
After rewriting the code with various tools (a hex editor, online code obfuscators) and techniques, only 29 of 69 vendors flag the file as malicious; the others report it as clean.
A malware analyst can still find the rabbit holes, so let's turn to the machine learning technique (ssdeep).
AV vendors keep a list of well-known malware along with each sample's ssdeep hash. Here I have calculated the ssdeep hash for the original file, njRAT v0.7d.exe (before obfuscation). Context triggered piecewise hashing (CTPH) is now ready to use.
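It helps to know what such a hash looks like. An ssdeep signature has the form blocksize:hash1:hash2, optionally followed by a comma and the file name in the tool's output, and two signatures are only comparable when their block sizes are equal or differ by a factor of two. The sketch below parses that format in Python; the helper names (SsdeepSignature, parse_signature, comparable) and the sample strings are made up for illustration and are not hashes of any real file.

```python
from typing import NamedTuple


class SsdeepSignature(NamedTuple):
    block_size: int
    chunk: str         # hash computed at block_size
    double_chunk: str  # hash computed at 2 * block_size


def parse_signature(line: str) -> SsdeepSignature:
    """Parse 'blocksize:hash1:hash2' with an optional ',filename' suffix."""
    block_size, chunk, rest = line.strip().split(":", 2)
    # The hash alphabet has no comma, so anything after the first comma is the file name.
    double_chunk = rest.partition(",")[0]
    return SsdeepSignature(int(block_size), chunk, double_chunk)


def comparable(a: SsdeepSignature, b: SsdeepSignature) -> bool:
    """Signatures can only be scored if block sizes are equal or one is double the other."""
    return (
        a.block_size == b.block_size
        or a.block_size == 2 * b.block_size
        or b.block_size == 2 * a.block_size
    )


# Hypothetical signatures, for illustration only.
known = parse_signature('96:abcDEF123ghi:xyz789QRS,"njRAT v0.7d.exe"')
candidate = parse_signature("192:xyz789QRT:mno456")
too_far = parse_signature("768:aaaBBBccc:dddEEE")

print(known)
print(comparable(known, candidate), comparable(known, too_far))
```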
Comparing the Obfuscated Executable with the SSDEEP Hash
The figure above shows a list of ssdeep hashes for existing well-known malware being compared with the newly launched malware.
As a result, ssdeep tells us that 99% of the content matches an existing malware sample. We can now mark this unknown file as malicious.
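The lookup itself can be sketched in a few lines of Python. This is a minimal illustration, not how any particular AV product works: match_known_malware, scan_file and the threshold of 80 are my own assumptions. The scan_file helper assumes the third-party ssdeep Python binding is installed and uses its hash_from_file and compare functions; the matching logic takes the compare function as a parameter so it can be tried without the library.

```python
from typing import Callable, Dict, List, Tuple


def match_known_malware(
    candidate_sig: str,
    known_sigs: Dict[str, str],
    compare: Callable[[str, str], int],
    threshold: int = 80,  # assumed cut-off; tune it for your own data
) -> List[Tuple[str, int]]:
    """Return (sample name, score) pairs at or above the threshold, best first."""
    hits = []
    for name, known_sig in known_sigs.items():
        score = compare(known_sig, candidate_sig)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def scan_file(path: str, known_sigs: Dict[str, str], threshold: int = 80):
    """Hash a file with the ssdeep binding and look it up in the known list."""
    import ssdeep  # third-party package, imported only when this helper is called

    candidate_sig = ssdeep.hash_from_file(path)
    return match_known_malware(candidate_sig, known_sigs, ssdeep.compare, threshold)
```

With the binding installed, the known list could be built as {"njRAT v0.7d.exe": ssdeep.hash_from_file("njRAT v0.7d.exe")} and the obfuscated file passed to scan_file; any entry it returns is a candidate for closer analysis.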
Machine learning is valuable across cybersecurity. Still, with highly obfuscated malware this kind of ML can struggle and sometimes fail, so reversing an unknown sample remains helpful in most cases.