K-Means Clustering and Naive Bayes Classification for Intrusion Detection

Authors

Z. Muda, Faculty of Computer Science and Information Technology, University Putra Malaysia
W. Yassin, Faculty of Computer Science and Information Technology, University Putra Malaysia
M.N. Sulaiman, Faculty of Computer Science and Information Technology, University Putra Malaysia
N.I. Udzir, Faculty of Computer Science and Information Technology, University Putra Malaysia

DOI:

https://doi.org/10.33736/jita.45.2014

Keywords:

Intrusion Detection System, Anomaly Detection, Hybrid Learning, Clustering, Classification

Abstract

Intrusion detection systems (IDS) complement other security mechanisms by detecting malicious activity on a computer or network, and the field is developing rapidly. Anomaly-based IDS, which rely on learning algorithms, can detect previously unknown attacks. The major challenge of this approach is to minimize false alarms while maximizing detection and accuracy rates. To address this problem, we propose a hybrid learning approach that combines K-Means clustering with Naïve Bayes classification. K-Means clustering groups all data according to behavior, i.e. malicious and non-malicious, while the Naïve Bayes classifier assigns the clustered data to the correct categories, i.e. R2L, U2R, Probe, DoS and Normal. Experiments were carried out on the KDD Cup ’99 dataset to evaluate the performance of the proposed approach. The results show that the proposed approach significantly improves accuracy and detection rate, to as much as 99.6% and 99.8%, respectively, while reducing false alarms to 0.5%.
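
The two-stage idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the clustering output is passed to the classifier, so the sketch assumes one Gaussian Naive Bayes model is trained per K-Means cluster. The toy records, the choice of k = 2, and all function names are hypothetical. The metric formulas are common textbook definitions and may differ from those used in the paper.

```python
import math
import random
from collections import defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns the final centroids."""
    centroids = random.Random(seed).sample(points, k)
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        centroids = [
            [sum(col) / len(groups[i]) for col in zip(*groups[i])]
            if groups[i] else centroids[i]
            for i in range(k)
        ]
    return centroids


def fit_gaussian_nb(points, labels, eps=1e-6):
    """Per-class prior, feature means and (smoothed) variances."""
    by_class = defaultdict(list)
    for p, y in zip(points, labels):
        by_class[y].append(p)
    model = {}
    for y, rows in by_class.items():
        means = [sum(col) / len(rows) for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + eps
            for col, m in zip(zip(*rows), means)
        ]
        model[y] = (len(rows) / len(points), means, variances)
    return model


def predict_gaussian_nb(model, point):
    """Class with the highest log-posterior under the independence assumption."""
    def log_posterior(y):
        prior, means, variances = model[y]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(point, means, variances)
        )
    return max(model, key=log_posterior)


def fit_hybrid(points, labels, k):
    """Stage 1: cluster. Stage 2 (assumed coupling): one Naive Bayes model per cluster."""
    centroids = kmeans(points, k)
    rows, tags = defaultdict(list), defaultdict(list)
    for p, y in zip(points, labels):
        c = nearest(p, centroids)
        rows[c].append(p)
        tags[c].append(y)
    models = {c: fit_gaussian_nb(rows[c], tags[c]) for c in rows}
    fallback = fit_gaussian_nb(points, labels)  # used only if a cluster ended up empty
    return centroids, models, fallback


def predict_hybrid(fitted, point):
    centroids, models, fallback = fitted
    return predict_gaussian_nb(models.get(nearest(point, centroids), fallback), point)


# Toy two-feature records (hypothetical values, not KDD Cup '99 data).
X = ([[1.0 + 0.1 * i, 1.0 + 0.05 * i] for i in range(10)]
     + [[9.0 + 0.1 * i, 9.0 - 0.05 * i] for i in range(10)]
     + [[9.0 + 0.1 * i, 5.0 + 0.05 * i] for i in range(10)])
y = ["Normal"] * 10 + ["DoS"] * 10 + ["Probe"] * 10

fitted = fit_hybrid(X, y, k=2)
predicted = [predict_hybrid(fitted, p) for p in X]

# Binary view (attack vs. Normal) for the three metrics named in the abstract.
tp = sum(t != "Normal" and p != "Normal" for t, p in zip(y, predicted))
fn = sum(t != "Normal" and p == "Normal" for t, p in zip(y, predicted))
fp = sum(t == "Normal" and p != "Normal" for t, p in zip(y, predicted))
tn = sum(t == "Normal" and p == "Normal" for t, p in zip(y, predicted))
accuracy = (tp + tn) / len(y)
detection_rate = tp / (tp + fn)
false_alarm_rate = fp / (fp + tn)
print(accuracy, detection_rate, false_alarm_rate)
```

In this sketch the clustering stage roughly separates malicious from non-malicious records, and the per-cluster classifier then only has to distinguish the categories that actually occur inside each group. The printed numbers describe the toy records only and say nothing about performance on KDD Cup ’99.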

Published

2016-04-21

How to Cite

Muda, Z., Yassin, W., Sulaiman, M., & Udzir, N. (2016). K-Means Clustering and Naive Bayes Classification for Intrusion Detection. Journal of IT in Asia, 4(1), 13–25. https://doi.org/10.33736/jita.45.2014

Issue

Vol. 4 No. 1 (2014)

Section

Articles
