The Impact of Scaling Techniques on Breast Cancer Prediction Algorithms

Authors

Oluwaseyi Ezekiel Olorunshola, Computer Science Department, Air Force Institute of Technology, Kaduna, Nigeria
Okeh Dominic Ebuka, Computer Science Department, Air Force Institute of Technology, Kaduna, Nigeria
Adeniran Kolade Ademuwagun, Computer Science Department, Air Force Institute of Technology, Kaduna, Nigeria
Fatimah Adamu-Fika, Cyber Security Department, Air Force Institute of Technology, Kaduna, Nigeria
Muhammad Abdullahi Kabir, Cyber Security Department, Air Force Institute of Technology, Kaduna, Nigeria

DOI:

https://doi.org/10.33736/jcsi.8449.2025

Keywords:

Breast cancer, Benign, Malignant, Hyperparameter tuning, Ensemble

Abstract

Breast cancer develops when the genetic material of breast cells undergoes mutations, causing the cells to grow uncontrollably and form tumors. However, efforts have been made to combat it by developing machine learning models that help clinicians with early detection. This study investigates the impact of scaling techniques on the performance of algorithms used for breast cancer prediction. Two scaling approaches were compared with models using the raw, unscaled data. The results revealed that the choice of scaling technique had minimal effect on prediction performance after hyperparameter tuning. Potential sources of bias were analyzed, and the findings suggest that, for the specific dataset and algorithms used, the classifiers adapted their internal parameters to compensate for differences in feature scaling. Model performance was evaluated with four metrics (Accuracy, Recall, Precision, and F1-score) using 5-fold cross-validation. After hyperparameter tuning, Random Forest, an ensemble model, outperformed all the other individual classifiers, achieving an Accuracy of 0.9578, a Recall of 0.9297, a Precision of 0.9571, and an F1-score of 0.9425.
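
The abstract does not name its two scaling approaches or its exact evaluation code. As a minimal sketch, assuming min-max normalization and z-score standardization (two common choices) alongside an unscaled baseline, the Python below shows how such a comparison can be set up with 5-fold cross-validation and the four reported metrics. The one-feature decision stump (`fit_stump`) is a hypothetical stand-in for tree-based models such as Random Forest: because it splits on a threshold, any strictly increasing per-feature transform moves the learned threshold but leaves the predictions unchanged, which is one way a classifier can "adapt its internal parameters" to the feature scale. All function names and the toy data are illustrative assumptions, not the study's dataset or implementation.

```python
from statistics import mean, pstdev


def fit_identity(column):
    """Raw, unscaled baseline: returns each value unchanged."""
    return lambda x: x


def fit_minmax(column):
    """Assumed scaler 1, min-max normalization fitted on training values: (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # guard against a constant feature
    return lambda x: (x - lo) / span


def fit_zscore(column):
    """Assumed scaler 2, z-score standardization fitted on training values: (x - mean) / std."""
    mu, sigma = mean(column), pstdev(column)
    sigma = sigma or 1.0  # guard against zero variance
    return lambda x: (x - mu) / sigma


def kfold_indices(n, k=5):
    """Yield (train, test) index lists for k contiguous, unshuffled folds."""
    start = 0
    for fold in range(k):
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = [i for i in range(n) if not start <= i < start + size]
        yield train, test
        start += size


def fit_stump(xs, ys):
    """Hypothetical one-feature decision stump: predict 1 when x > threshold."""
    best_thr, best_correct = None, -1
    for thr in sorted(set(xs)):
        correct = sum((x > thr) == (y == 1) for x, y in zip(xs, ys))
        if correct > best_correct:  # ties keep the smallest threshold
            best_thr, best_correct = thr, correct
    return best_thr


def cross_validate(xs, ys, fit_scaler, k=5):
    """Out-of-fold predictions; the scaler is fitted on each training fold only."""
    preds = [0] * len(xs)
    for train, test in kfold_indices(len(xs), k):
        scale = fit_scaler([xs[i] for i in train])  # no statistics from the test fold
        thr = fit_stump([scale(xs[i]) for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = int(scale(xs[i]) > thr)
    return preds


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, Recall, Precision and F1-score with `positive` (e.g. malignant) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "recall": recall,
            "precision": precision, "f1": f1}


if __name__ == "__main__":
    # Toy single-feature data, made up for illustration (NOT the study's dataset).
    xs = [2.0, 3.5, 1.0, 8.0, 9.5, 7.0, 2.5, 8.5, 1.5, 9.0]
    ys = [0, 0, 0, 1, 1, 1, 0, 1, 0, 1]
    for name, fitter in [("raw", fit_identity), ("min-max", fit_minmax), ("z-score", fit_zscore)]:
        preds = cross_validate(xs, ys, fitter)
        print(name, binary_metrics(ys, preds))
```

Running the script prints identical metrics for the three variants, since the stump's decisions depend only on the ordering of feature values. Distance- or gradient-based classifiers do not share this property, so results for them may differ. The sketch pools the out-of-fold predictions before computing the metrics; the abstract does not say whether its reported values are pooled or averaged over the five folds, and averaging per-fold scores generally gives a slightly different F1-score than applying the F1 formula to averaged Precision and Recall.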

Published

2025-05-04

How to Cite

Olorunshola, O. E., Ebuka, O. D., Ademuwagun, A. K., Adamu-Fika, F., & Kabir, M. A. (2025). The Impact of Scaling Techniques on Breast Cancer Prediction Algorithms. Journal of Computing and Social Informatics, 4(2), 17–27. https://doi.org/10.33736/jcsi.8449.2025

Issue

Vol. 4 No. 2 (2025): Continuous Publication (in progress)

Section

Articles

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Copyright Transfer Statement for Journal

1) In signing this statement, the author(s) grant UNIMAS Publisher an exclusive license to publish their original research papers. The author(s) also grant UNIMAS Publisher permission to reproduce, recreate, translate, extract or summarise, and to distribute and display in any forms, formats, and media. The author(s) can reuse their papers in their future printed work without first requiring permission from UNIMAS Publisher, provided that the author(s) acknowledge and reference publication in the Journal.

2) For open access articles, the author(s) agree that their articles published under UNIMAS Publisher are distributed under the terms of the CC-BY-NC-SA (Creative Commons Attribution-Non Commercial-Share Alike 4.0 International License) which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original work of the author(s) is properly cited.

3) For subscription articles, the author(s) agree that UNIMAS Publisher holds copyright, or an exclusive license to publish. Readers or users may view, download, print, and copy the content for academic purposes, subject to the following conditions of use: (a) any reuse of materials is subject to permission from UNIMAS Publisher; (b) archived materials may only be used for academic research; (c) archived materials may not be used for commercial purposes, which include but are not limited to monetary compensation by means of sale, resale, license, transfer of copyright, loan, etc.; and (d) archived materials may not be republished in any part, either in print or online.

4) The author(s) is/are responsible for ensuring that their submitted work is original and does not infringe any existing copyright, trademark, patent, statutory right, or proprietary right of others. The corresponding author(s) has (have) obtained permission from all co-authors prior to submission to the journal. Upon submission of the manuscript, the author(s) agree that no similar work has been or will be submitted or published elsewhere in any language. If the submitted manuscript includes materials from others, the author(s) have obtained permission from the copyright owners.

5) In signing this statement, the author(s) declare(s) that the research they have conducted complies with the current laws of the respective country and the UNIMAS Journal Publication Ethics Policy. Any experimentation or research involving human subjects or animal samples must obtain approval from the Human or Animal Ethics Committee of the respective institution. The author(s) agree and understand that UNIMAS Publisher is not responsible for any compensation claims, or for any failure by the author(s) to fulfil the above-mentioned requirements. The author(s) must accept responsibility for releasing their materials upon request by the Chief Editor or UNIMAS Publisher.

6) The author(s) should have participated sufficiently in the work and ensured the appropriateness of the content of the article. The author(s) should also agree that they have no commercial attachments (e.g. patent or license arrangement, equity interest, consultancies, etc.) that might pose any conflict of interest with the submitted manuscript. The author(s) also agree to make any relevant materials and data available upon request by the editor or UNIMAS Publisher.