Improved Feature Selection Based on Mutual Information for Regression Tasks

  • Muhammad A. Sulaiman
  • Jane Labadin, Universiti Malaysia Sarawak
Keywords: feature selection, filter, estimation, mutual information, machine learning

Abstract

Mutual information (MI) is an information-theoretic concept that has been widely used in recent years as a criterion for feature selection methods, owing to its ability to capture both linear and non-linear dependencies between two variables. In theory, mutual information is formulated in terms of the probability density functions (pdfs) or entropies of the two variables. In most machine learning applications, mutual information estimation is formulated for classification problems (that is, data with labeled output). This study investigates the use of mutual information estimation as a feature selection criterion for regression tasks and introduces an enhancement for selecting the optimal feature subset based on previous works. Specifically, while focusing on regression tasks, it builds on previous work in which a scientifically sound stopping criterion for greedy feature selection algorithms was proposed. Four real-world regression datasets were used in this study: three are public datasets obtained from the UCI Machine Learning Repository, and the remaining one is a private well log dataset. Two machine learning models, namely multiple regression and artificial neural networks (ANN), were used to test the performance of the proposed method, IFSMIR. The results obtained demonstrate the effectiveness of the proposed method.
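To illustrate why MI is attractive as a filter criterion for regression, the following is a minimal sketch and not the authors' IFSMIR procedure. It assumes a simple equal-width histogram (plug-in) estimate of I(X; Y) = sum over (x, y) of p(x, y) log[p(x, y) / (p(x) p(y))]; the estimator actually used in the study is not specified in this abstract. The function names (binned, mutual_information, pearson, rank_features), the bin count, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def binned(values, bins):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for a constant column
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=10):
    """Histogram (plug-in) estimate of I(X; Y) in nats. Assumed estimator."""
    n = len(x)
    bx, by = binned(x, bins), binned(y, bins)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # p(i, j) * log(p(i, j) / (p(i) * p(j))), written with raw counts
    return sum(
        (c / n) * math.log(c * n / (px[i] * py[j]))
        for (i, j), c in joint.items()
    )


def pearson(x, y):
    """Pearson correlation coefficient, for comparison with MI."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_features(features, target, bins=10):
    """Score each feature by its MI with the target, highest first."""
    scores = {
        name: mutual_information(column, target, bins)
        for name, column in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic regression data (hypothetical): the target depends on x_quad
# non-linearly, on x_lin linearly, and not at all on x_noise.
rng = random.Random(0)
n = 2000
x_quad = [rng.uniform(-1, 1) for _ in range(n)]
x_lin = [rng.uniform(-1, 1) for _ in range(n)]
x_noise = [rng.uniform(-1, 1) for _ in range(n)]
y = [a * a + 0.5 * b + rng.gauss(0, 0.05) for a, b in zip(x_quad, x_lin)]

features = {"x_quad": x_quad, "x_lin": x_lin, "x_noise": x_noise}
ranking = rank_features(features, y)

for name, score in ranking:
    print(f"{name}: MI = {score:.3f}, Pearson r = {pearson(features[name], y):.3f}")
```

In this synthetic setup the quadratic feature has a Pearson correlation close to zero with the target yet a clearly non-zero MI score, while the irrelevant feature scores close to zero on both. A histogram estimator is biased upward on small samples and scales poorly with dimension, so this sketch scores one feature at a time only.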

Published
2016-12-21
How to Cite
Sulaiman, M. A., & Labadin, J. (2016). Improved Feature Selection Based on Mutual Information for Regression Tasks. Journal of IT in Asia, 6(1), 11-24. https://doi.org/10.33736/jita.330.2016