The Impact of Scaling Techniques on Breast Cancer Prediction Algorithms

Authors

  • Oluwaseyi Ezekiel Olorunshola, Computer Science Department, Air Force Institute of Technology, Kaduna, Nigeria
  • Okeh Dominic Ebuka, Computer Science Department, Air Force Institute of Technology, Kaduna, Nigeria
  • Adeniran Kolade Ademuwagun, Computer Science Department, Air Force Institute of Technology, Kaduna, Nigeria
  • Fatimah Adamu-Fika, Cyber Security Department, Air Force Institute of Technology, Kaduna, Nigeria
  • Muhammad Abdullahi Kabir, Cyber Security Department, Air Force Institute of Technology, Kaduna, Nigeria

DOI:

https://doi.org/10.33736/jcsi.8449.2025

Keywords:

Breast cancer, Benign, Malignant, Hyperparameter tuning, Ensemble

Abstract

Breast cancer develops when the genetic material of breast cells undergoes mutations, causing the cells to grow uncontrollably and form tumors. Efforts have been made to combat it by developing machine learning models that help clinicians with early detection. This study investigates the impact of scaling techniques on the performance of algorithms used for breast cancer prediction. Models trained with two scaling approaches were compared against models trained on the raw, unscaled data. The results revealed that the different scaling techniques had minimal effect on prediction performance after hyperparameter tuning. This suggests that, for the specific dataset and algorithms used, the classifiers adapted their internal parameters to compensate for differences in feature scaling; potential sources of bias were also analyzed. Model performance was evaluated using four metrics (Accuracy, Recall, Precision, and F1-score) under 5-fold cross-validation. The results showed that Random Forest, an ensemble model, outperformed all the individual classifiers after hyperparameter tuning, achieving an Accuracy of 0.9578, a Recall of 0.9297, a Precision of 0.9571, and an F1-score of 0.9425.
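The comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not name the two scaling approaches or the dataset, so the code below assumes min-max normalization and z-score standardization, and uses made-up toy values for a single feature. The mean-threshold rule is a hypothetical stand-in for one split of a tree-based model, not the authors' classifier, and all function names are illustrative.

```python
from statistics import mean, pstdev


def min_max_scale(column):
    """Rescale values linearly to the range [0, 1] (assumed scaling approach)."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


def standardize(column):
    """Shift to zero mean and divide by the population standard deviation
    (assumed scaling approach)."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def threshold_rule(column):
    """Toy classifier: predict 1 when a value exceeds the column mean."""
    cut = mean(column)
    return [1 if x > cut else 0 for x in column]


def classification_metrics(y_true, y_pred, positive=1):
    """Compute the four metrics named in the abstract from label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Made-up toy data: one feature, with 1 standing for "malignant".
feature = [10.0, 12.0, 14.0, 18.0, 20.0, 22.0]
labels = [0, 0, 1, 0, 1, 1]

variants = {
    "raw": feature,
    "min_max": min_max_scale(feature),
    "z_score": standardize(feature),
}
predictions = {name: threshold_rule(col) for name, col in variants.items()}
scores = {
    name: classification_metrics(labels, pred)
    for name, pred in predictions.items()
}

for name, result in scores.items():
    print(name, result)
```

In this toy setting the three variants yield identical predictions, because both rescalings preserve the ordering of the values and move the threshold along with them. This is offered only as one plausible illustration of why scaling can have little effect on threshold-based models; it does not reproduce the study's 5-fold cross-validation or hyperparameter tuning.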

Published

2025-05-04

How to Cite

Olorunshola, O. E., Ebuka, O. D., Ademuwagun, A. K., Adamu-Fika, F., & Kabir, M. A. (2025). The Impact of Scaling Techniques on Breast Cancer Prediction Algorithms. Journal of Computing and Social Informatics, 4(2), 17–27. https://doi.org/10.33736/jcsi.8449.2025