String-based representation techniques for malware detection model

Date
2022
Publisher
Universiti Teknologi Malaysia
Abstract
The current literature has given limited attention to the intrinsic trade-offs among malware detection techniques, even as the techniques that malware creators apply to drive up the number of malware variants necessitate more intensive detection efforts in the security field. Recent research has examined malware detection techniques, but the major problems of the existing models are poor matching algorithms and NP-complete problems, which lead to low accuracy and poor detection rates. To address these issues, among others, this research used a String-Based Malware Representation Technique (SBMRT) that generates a string-based malware signature. This signature is a compact representation of the malware that retains only the set of Application Programming Interface (API) calls representing the actual malware behaviour, in support of a better and more effective String-Based Malware Detection Model (SBMDM). The work comprised three main phases. First, a string-based malware representation was designed to present the malware parameters and their corresponding functions, building an integrated parameter list alongside the function list. Second, an initial malware signature method was designed to identify the standard parameters repeated in each malware sample and to detect the initial malware. Third, the SBMDM was designed on the basis of the SBMRT to detect unknown malware files accurately. A dynamic Longest Common Subsequence (LCS) algorithm was designed as the malware matching method to improve the SBMDM, which was then evaluated using 550 files, with 98 benign files and 417 malware samples: 117 Trojans, 135 worms, and 165 viruses, grouped into 15, 19, and 21 families, respectively.
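As an illustration of LCS-based matching over API call sequences, the following is a minimal sketch and not the implementation from the thesis. It uses the textbook dynamic-programming LCS; the function names, the API call lists, and the normalisation by signature length are all assumptions made for this example.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences.

    Textbook dynamic programming, keeping only the previous row,
    so memory is O(len(b)) and time is O(len(a) * len(b)).
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(candidate, signature):
    """Fraction of the signature recovered, in order, from the candidate.

    Hypothetical scoring rule: the thesis does not state how the
    LCS result is turned into a match decision.
    """
    if not signature:
        return 0.0
    return lcs_length(candidate, signature) / len(signature)


# Hypothetical API call sequences, for illustration only.
signature = ["CreateFileA", "WriteFile", "RegSetValueExA", "CreateProcessA"]
candidate = ["GetTickCount", "CreateFileA", "WriteFile", "Sleep", "CreateProcessA"]

score = lcs_similarity(candidate, signature)
print(f"LCS similarity: {score:.2f}")
```

Here three of the four signature calls appear in the candidate in the same order, so the score is 0.75. A threshold on such a score would be one way to flag a file, though the threshold and the scoring rule would have to come from the thesis itself.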
The experimental evaluation demonstrated that the SBMRT used by the SBMDM improved the Detection Rate (DR) for viruses from 94.6% to 99.8% compared with the N-gram model, with a higher DR of 98.9% for Trojans and 98.4% for worms, and the highest Accuracy (ACC) of 100% for viruses, 98.42% for Trojans, and 98.12% for worms. The model achieved an average DR of 99.03%, an average ACC of 98.84%, and a false-positive rate of zero, with significant improvements in precision, recall, and F-measure. The SBMDM achieved an area under the ROC curve (AUC) of 1, with a True Positive (TP) rate of 1 and a False Positive (FP) rate of 0, reflecting a significant improvement in the TP rate. These results demonstrate the ability of the SBMDM to improve the detection and accuracy rates for unknown malware files compared with existing models.
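For readers unfamiliar with the reported measures, the sketch below computes them from confusion-matrix counts. It assumes the usual definitions (DR as the true-positive rate, and so on), which the abstract does not spell out, and the counts are hypothetical and are not taken from the thesis.

```python
def detection_metrics(tp, fp, tn, fn):
    """Common detection metrics from confusion-matrix counts.

    Assumes the standard definitions; the thesis may define
    some of these measures differently.
    """
    def ratio(num, den):
        # Guard against empty classes instead of dividing by zero.
        return num / den if den else 0.0

    recall = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "detection_rate": recall,  # DR, taken here as the true-positive rate
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "false_positive_rate": ratio(fp, fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts, for illustration only.
example = detection_metrics(tp=99, fp=0, tn=50, fn=1)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```

With no false positives, precision is 1 and the false-positive rate is 0, while a single missed sample keeps the detection rate and the F-measure just below 1.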
Description
Thesis (PhD. (Computer Science))
Keywords
Computer security—Research, Malware (Computer software)—Prevention, Debugging in computer science