EEMDS: Efficient and Effective Malware Detection System with Hybrid Model based on XceptionCNN and LightGBM Algorithm
DOI:
https://doi.org/10.33736/jcsi.4739.2022
Keywords:
Anomaly-based Detection, Machine Learning, Malware Detection, XceptionCNN, LightGBM
Abstract
The security threats posed by malware make it imperative to build a model that classifies malware by family efficiently and effectively, irrespective of the variant. Preliminary experiments demonstrate the suitability of the generic LightGBM algorithm for Windows malware, as well as its effectiveness and efficiency in terms of detection accuracy, training accuracy, prediction time and training time. On the Malimg dataset, the prediction time of the generic LightGBM is 0.08s for binary classification and 0.40s for multi-class classification, and its classification accuracy is 99% True Positive Rate (TPR). Its training accuracy is 99.80% for binary and 96.87% for multi-class classification, while the training time is 179.51s and 2224.77s respectively. This performance leaves room for improvement: higher classification and training accuracy are needed for effective decision making, and lower prediction and training times are needed for efficiency, particularly on larger samples. The goal is therefore to enhance detection accuracy and reduce prediction time, since faster prediction allows malware to be detected early, before it damages files stored in computer systems. The proposed model is a hybrid that integrates XceptionCNN with the LightGBM algorithm for Windows malware classification in the Google Colab environment. It uses the Malimg malware dataset, a benchmark dataset for Windows malware image classification. It contains 9,339 malware samples, structured as grayscale images and spanning 25 families, and 1,042 benign Windows executable files extracted from Windows environments. Performance evaluation on this dataset demonstrates the effectiveness and efficiency of the hybrid model.
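Because the accuracy figures in this abstract are stated as True Positive Rate, a minimal sketch of how TPR can be computed per class may help in reading them. It assumes the usual definition TPR = TP / (TP + FN); the function name `per_class_tpr`, the macro-averaging step, and the toy labels are illustrative assumptions, not the authors' evaluation code or Malimg results.

```python
from collections import Counter


def per_class_tpr(y_true, y_pred):
    """Return {label: TP / (TP + FN)} for every label that occurs in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Number of true samples per class, i.e. TP + FN.
    support = Counter(y_true)
    # Number of correctly predicted samples per class, i.e. TP.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in support.items()}


# Hypothetical toy labels for illustration only (not taken from the paper).
y_true = ["family_a", "family_a", "family_b", "benign", "benign", "benign"]
y_pred = ["family_a", "family_b", "family_b", "benign", "benign", "benign"]

tpr = per_class_tpr(y_true, y_pred)
# Unweighted mean over classes; one common way to summarise multi-class TPR.
macro_tpr = sum(tpr.values()) / len(tpr)
print(tpr, round(macro_tpr, 4))
```

In the binary setting (malware versus benign) the same function yields the TPR of the malware class directly. How the paper aggregates TPR across the 25 families is not stated in the abstract, so the macro average here is only one possible choice.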
The proposed XceptionCNN-LightGBM technique improves classification accuracy to 100% TPR, with prediction times of 0.08s for binary and 0.37s for multi-class classification. Compared with the generic LightGBM (0.08s and 0.40s respectively), the multi-class prediction time is lower, while the binary prediction time is the same at the reported precision. The training accuracy increased to 99.85% for binary classification and 97.40% for multi-class classification, and the training time fell to 29.97s and 447.75s respectively, down from 179.51s and 2224.77s for the generic LightGBM model. This significant reduction in training time allows the model to converge quickly and to train on a large amount of data within a relatively short period. Overall, the reduction in detection time and the improvement in detection accuracy will minimize damage to files stored in computer systems in the event of a malware attack.
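The timing figures above can be put on a common scale with a short calculation. The sketch below uses only the numbers reported in this abstract; the helper name `relative_reduction` and the dictionary layout are illustrative assumptions.

```python
def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# (generic LightGBM, XceptionCNN-LightGBM) times in seconds, as reported above.
reported = {
    "training, binary": (179.51, 29.97),
    "training, multi-class": (2224.77, 447.75),
    "prediction, binary": (0.08, 0.08),
    "prediction, multi-class": (0.40, 0.37),
}

for name, (generic, hybrid) in reported.items():
    cut = relative_reduction(generic, hybrid)
    print(f"{name}: {cut:.1%} lower, {generic / hybrid:.2f}x speed-up")
```

On these figures the training time drops by roughly 83% for binary and 80% for multi-class classification, while the multi-class prediction time drops by 7.5% and the binary prediction time is unchanged.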
License
Copyright Transfer Statement for Journal
1) In signing this statement, the author(s) grant UNIMAS Publisher an exclusive license to publish their original research papers. The author(s) also grant UNIMAS Publisher permission to reproduce, recreate, translate, extract or summarise, and to distribute and display in any form, format, and medium. The author(s) can reuse their papers in their future printed work without first requiring permission from UNIMAS Publisher, provided that the author(s) acknowledge and reference the publication in the Journal.
2) For open access articles, the author(s) agree that their articles published under UNIMAS Publisher are distributed under the terms of the CC-BY-NC-SA (Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License), which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original work of the author(s) is properly cited.
3) For subscription articles, the author(s) agree that UNIMAS Publisher holds copyright, or an exclusive license to publish. Readers or users may view, download, print, and copy the content, for academic purposes, subject to the following conditions of use: (a) any reuse of materials is subject to permission from UNIMAS Publisher; (b) archived materials may only be used for academic research; (c) archived materials may not be used for commercial purposes, which include but are not limited to monetary compensation by means of sale, resale, license, transfer of copyright, loan, etc.; and (d) archived materials may not be re-published in any part, either in print or online.
4) The author(s) is/are responsible for ensuring that the submitted work is original and does not infringe any existing copyright, trademark, patent, statutory right, or proprietary right of others. The corresponding author(s) has (have) obtained permission from all co-authors prior to submission to the journal. Upon submission of the manuscript, the author(s) agree that no similar work has been or will be submitted or published elsewhere in any language. If the submitted manuscript includes materials from others, the author(s) have obtained permission from the copyright owners.
5) In signing this statement, the author(s) declare(s) that the research they have conducted complies with the current laws of the respective country and the UNIMAS Journal Publication Ethics Policy. Any experimentation or research involving humans or the use of animal samples must obtain approval from the Human or Animal Ethics Committee in their respective institutions. The author(s) agree and understand that UNIMAS Publisher is not responsible for any compensation claims or failure caused by the author(s) in fulfilling the above-mentioned requirements. The author(s) must accept the responsibility for releasing their materials upon request by the Chief Editor or UNIMAS Publisher.
6) The author(s) should have participated sufficiently in the work and ensured the appropriateness of the content of the article. The author(s) should also agree that they have no commercial attachments (e.g. patent or license arrangement, equity interest, consultancies, etc.) that might pose any conflict of interest with the submitted manuscript. The author(s) also agree to make any relevant materials and data available upon request by the editor or UNIMAS Publisher.