K-Means Clustering and Naive Bayes Classification for Intrusion Detection

  • Z. Muda, Faculty of Computer Science and Information Technology, University Putra Malaysia
  • W. Yassin, Faculty of Computer Science and Information Technology, University Putra Malaysia
  • M.N. Sulaiman, Faculty of Computer Science and Information Technology, University Putra Malaysia
  • N.I. Udzir, Faculty of Computer Science and Information Technology, University Putra Malaysia
Keywords: Intrusion Detection System, Anomaly Detection, Hybrid Learning, Clustering, Classification

Abstract

Intrusion detection systems (IDS) complement other security mechanisms by detecting malicious activity on a computer or network, and the field is developing rapidly. Anomaly-based IDS, which rely on learning algorithms, can detect previously unknown attacks. The major challenge of this approach is to minimize false alarms while maximizing detection and accuracy rates. To address this, we propose a hybrid learning approach that combines K-Means clustering with Naïve Bayes classification. K-Means clustering first groups all data according to behavior, i.e. malicious and non-malicious, and the Naïve Bayes classifier then assigns the clustered data to the correct categories, i.e. R2L, U2R, Probe, DoS and Normal. Experiments were carried out on the KDD Cup ’99 dataset to evaluate the proposed approach. The results show that the approach significantly improves accuracy and detection rate, to as much as 99.6% and 99.8% respectively, while reducing false alarms to 0.5%.
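
The two-stage idea in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how the cluster output is passed to the classifier, so the sketch assumes one Gaussian Naïve Bayes model is trained per K-Means cluster. The function names, the two numeric features, the initial centroids and the toy records are all hypothetical, and the toy data stands in for KDD Cup ’99 records.

```python
import math
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])))

def kmeans(points, init_centroids, iters=20):
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    centroids = [list(c) for c in init_centroids]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        new = []
        for i, old in enumerate(centroids):
            rows = groups[i]
            # An empty cluster keeps its previous centroid.
            new.append([sum(col) / len(rows) for col in zip(*rows)] if rows else old)
        if new == centroids:
            break
        centroids = new
    return centroids

def fit_gaussian_nb(points, labels, eps=1e-3):
    """Per-class prior, feature means and variances (eps avoids zero variance)."""
    by_class = defaultdict(list)
    for p, y in zip(points, labels):
        by_class[y].append(p)
    model = {}
    for y, rows in by_class.items():
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + eps
                     for col, m in zip(cols, means)]
        model[y] = (len(rows) / len(points), means, variances)
    return model

def predict_nb(model, point):
    """Class with the highest log posterior under the independence assumption."""
    def log_posterior(y):
        prior, means, variances = model[y]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(point, means, variances))
    return max(model, key=log_posterior)

def fit_hybrid(points, labels, init_centroids):
    """Stage 1: cluster. Stage 2 (assumed coupling): one Naive Bayes model per cluster."""
    centroids = kmeans(points, init_centroids)
    per_cluster = defaultdict(lambda: ([], []))
    for p, y in zip(points, labels):
        xs, ys = per_cluster[nearest(p, centroids)]
        xs.append(p)
        ys.append(y)
    models = {c: fit_gaussian_nb(xs, ys) for c, (xs, ys) in per_cluster.items()}
    return centroids, models

def predict_hybrid(centroids, models, point):
    """Route the point to its nearest cluster, then classify within that cluster.
    Assumes every cluster received at least one training point."""
    return predict_nb(models[nearest(point, centroids)], point)

# Hypothetical toy records with two numeric features each.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (1.1, 1.3),
           (8.0, 8.0), (8.2, 7.9), (7.8, 8.3), (8.1, 8.1),
           (5.0, 9.0), (5.2, 9.1), (4.9, 8.8), (5.1, 9.2)]
train_y = ["Normal"] * 4 + ["DoS"] * 4 + ["Probe"] * 4

# Two clusters, seeded with one benign-looking and one attack-looking record.
centroids, models = fit_hybrid(train_x, train_y, [(1.0, 1.0), (8.0, 8.0)])
for record in [(1.0, 0.9), (8.0, 8.2), (5.0, 9.0)]:
    print(record, predict_hybrid(centroids, models, record))
```

In this sketch the clustering stage separates benign-looking from attack-looking records, so each Naïve Bayes model only has to discriminate among the classes that actually occur in its cluster. Other couplings, such as feeding the cluster index to a single classifier as an extra feature, are equally consistent with the abstract.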

Published
2016-04-21
How to Cite
Muda, Z., Yassin, W., Sulaiman, M., & Udzir, N. (2016). K-Means Clustering and Naive Bayes Classification for Intrusion Detection. Journal of IT in Asia, 4(1), 13-25. https://doi.org/10.33736/jita.45.2014