This repository contains the code and supplementary information for order-invariant, API-based malware detection. It is built on a publicly available dataset of traced API calls to ntdll.dll on Windows. Traced_Functions.md lists the names of the traced functions. shas_by_families.json lists the file names together with the malware family each file belongs to.
The dataset is available at https://zenodo.org/records/11079764
Malware_detection_RF.ipynb contains the random forest approach to detecting malware. plot.ipynb contains the plots produced with a maximum API call length of 2500. Results.ipynb contains the plot of the F1-score at different lengths. The utils folder contains the supporting functions for the notebooks, including the code that creates the feature vectors.
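To illustrate what an order-invariant feature vector can look like, here is a minimal sketch that counts how often each API name occurs in the first 2500 calls of a trace. It is not the code from the utils folder: the function name `order_invariant_vector`, the four-entry `VOCAB` list, and the choice of plain occurrence counts are all assumptions made for this example. The actual traced functions are listed in Traced_Functions.md.

```python
from collections import Counter

# Hypothetical vocabulary for illustration only; the real list of traced
# functions is in Traced_Functions.md.
VOCAB = ["NtCreateFile", "NtReadFile", "NtWriteFile", "NtClose"]

# Maximum number of API calls considered per trace (value used for the plots).
MAX_LEN = 2500


def order_invariant_vector(trace, vocab=VOCAB, max_len=MAX_LEN):
    """Count occurrences of each API in `vocab` within the first `max_len` calls.

    Assumption: plain counts are used as features. Because only counts are
    kept, any reordering of the same calls yields the same vector.
    """
    counts = Counter(trace[:max_len])
    return [counts[name] for name in vocab]


# Two traces with the same calls in a different order.
trace_a = ["NtCreateFile", "NtReadFile", "NtReadFile", "NtClose"]
trace_b = ["NtReadFile", "NtClose", "NtCreateFile", "NtReadFile"]

vec_a = order_invariant_vector(trace_a)
vec_b = order_invariant_vector(trace_b)

print(vec_a)           # [1, 2, 0, 1]
print(vec_a == vec_b)  # True
```

Vectors of this kind, one per sample, could then be passed to a random forest classifier together with the labels, as is done in spirit in Malware_detection_RF.ipynb.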