EEMDS: Efficient and Effective Malware Detection System with Hybrid Model based on XceptionCNN and LightGBM Algorithm

Authors

  • Monday Onoja (Federal University of Health Sciences, Nigeria)
  • Abayomi Jegede (University of Jos, Nigeria)
  • Nachamada Blamah (University of Jos, Nigeria)
  • Olawale Victor Abimbola (Creative Advanced Technologies, Dubai)
  • Temidayo Oluwatosin Omotehinwa (Federal University of Health Sciences, Nigeria)

DOI:

https://doi.org/10.33736/jcsi.4739.2022

Keywords:

Anomaly-based Detection, Machine Learning, Malware Detection, XceptionCNN, LightGBM

Abstract

The security threats posed by malware make it imperative to build a model that classifies malware by family efficiently and effectively, irrespective of the variant. Preliminary experiments demonstrate that the generic LightGBM algorithm is suitable for Windows malware, and they establish its baseline in terms of detection accuracy, training accuracy, prediction time and training time. On the Malimg dataset, the prediction time of the generic LightGBM is 0.08s for binary classification and 0.40s for multi-class classification, and its classification accuracy is 99% True Positive Rate (TPR). Its training accuracy is 99.80% for binary classification and 96.87% for multi-class classification, while the training time is 179.51s and 2224.77s respectively. This performance leaves room for improvement: higher classification and training accuracy would support more effective decision making, and shorter prediction and training times would improve efficiency, particularly on larger samples. The goal is therefore to enhance the detection accuracy and reduce the prediction time, since faster prediction allows malware to be detected before it damages files stored in computer systems. The proposed model is a hybrid that integrates XceptionCNN with the LightGBM algorithm for Windows malware classification in the Google Colab environment. It is evaluated on the Malimg malware dataset, a benchmark dataset for Windows malware image classification. It contains 9,339 malware samples from 25 families, structured as grayscale images, and 1,042 benign Windows executable files extracted from Windows environments. Performance evaluation on this dataset demonstrates the effectiveness and efficiency of the hybrid model.
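Datasets of malware "structured as grayscale images" are commonly produced by reading a binary as a stream of 8-bit values and laying those values out as pixel rows. The sketch below illustrates that idea only. The function name, the fixed row width of 256 and the zero padding of the last row are assumptions made for illustration and are not taken from the authors' preprocessing.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 256) -> List[List[int]]:
    """Lay out raw bytes as rows of `width` grayscale pixels (values 0-255).

    Hypothetical helper: the width and the padding rule are assumptions.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Zero-pad the tail so that the final row has exactly `width` pixels.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Each byte becomes one pixel; each slice of `width` bytes becomes one row.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


# Tiny demonstration on ten synthetic bytes instead of a real executable.
sample = bytes(range(10))
image = bytes_to_grayscale(sample, width=4)
print(len(image), "rows of", len(image[0]), "pixels")
```

A real pipeline would typically resize the resulting image to the input size expected by the convolutional network before feature extraction.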
The proposed XceptionCNN-LightGBM technique improves the classification accuracy to 100% TPR, with prediction times of 0.08s for binary classification and 0.37s for multi-class classification. The generic LightGBM, by comparison, takes 0.08s and 0.40s respectively, so the multi-class prediction time is reduced while the binary prediction time is unchanged. The training accuracy increased to 99.85% for binary classification and 97.40% for multi-class classification, and the training time fell to 29.97s and 447.75s respectively, down from 179.51s and 2224.77s for the generic LightGBM model. This significant reduction in training time allows the model to converge quickly and to train on a large volume of data within a relatively short period. Overall, the reduction in detection time and the improvement in detection accuracy will minimize damage to files stored in computer systems in the event of a malware attack.
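The size of these improvements is easier to judge as relative changes. The following sketch recomputes them from the timings quoted in the abstract. The dictionary layout and the helper names are illustrative assumptions; only the numbers come from the text.

```python
# Timings in seconds as quoted in the abstract: (generic LightGBM, hybrid model).
REPORTED = {
    "training, binary": (179.51, 29.97),
    "training, multi-class": (2224.77, 447.75),
    "prediction, binary": (0.08, 0.08),
    "prediction, multi-class": (0.40, 0.37),
}


def percent_reduction(baseline: float, improved: float) -> float:
    """Share of the baseline time that was saved, in percent."""
    return 100.0 * (baseline - improved) / baseline


def speedup(baseline: float, improved: float) -> float:
    """How many times faster the improved run is than the baseline."""
    return baseline / improved


for name, (baseline, hybrid) in REPORTED.items():
    print(
        f"{name}: {percent_reduction(baseline, hybrid):.1f}% less time, "
        f"{speedup(baseline, hybrid):.2f}x speedup"
    )
```

On these figures the training time drops by about 83% for binary classification and about 80% for multi-class classification, while the multi-class prediction time drops by about 7.5% and the binary prediction time stays the same.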

Published

2022-10-28

How to Cite

Onoja, M., Jegede, A., Blamah, N., Abimbola, O. V., & Omotehinwa, T. O. (2022). EEMDS: Efficient and Effective Malware Detection System with Hybrid Model based on XceptionCNN and LightGBM Algorithm. Journal of Computing and Social Informatics, 1(2), 42–57. https://doi.org/10.33736/jcsi.4739.2022