Using Entropy to Measure Text Readability in Bahasa Malaysia for Year One Students


  • Mohamad Hardyman Barawi, Universiti Malaysia Sarawak
  • Siti Nabilah Mohamed Osman, Universiti Malaysia Sarawak
  • Noor Fazilla Abd Yusof, Universiti Teknikal Malaysia Melaka
  • Ebuka Ibeke, Robert Gordon University
  • Muhibuddin Fadhli, Universitas Negeri Malang



Keywords: readability, reading, text analysis, text difficulty


Text readability is essential for effective learning and communication, especially for beginner readers. However, there is no known measure for calculating the readability of texts in Bahasa Malaysia, the national language of Malaysia. This research proposes a new method based on entropy, a measure of information and uncertainty, to assess the readability of Bahasa Malaysia texts for Year One students. An experiment was conducted with six Year One students to determine the relationship between entropy and readability. The results indicated a positive correlation between entropy and text difficulty, suggesting that higher entropy values corresponded with lower readability for this age group. The study also revealed that attention to text difficulty level is needed to enhance learning for beginner readers.
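
As a rough illustration of the idea of entropy as a text measure, the following minimal sketch computes Shannon entropy over word frequencies. It assumes a word-level unigram model and whitespace tokenization; the abstract does not state which unit or estimator the study used, so the function names (`shannon_entropy`, `word_entropy`) and the two sample strings are hypothetical and are not the authors' method or data.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Return the Shannon entropy, in bits per token, of a token sequence."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = -sum(p * log2(p)) over the observed token probabilities.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def word_entropy(text):
    """Word-level entropy using naive lowercase whitespace tokenization.

    Assumption: a real pipeline would need a proper Bahasa Malaysia tokenizer.
    """
    return shannon_entropy(text.lower().split())


# Made-up sample strings, for illustration only.
repetitive = "ini buku saya ini buku saya"  # three distinct words, each used twice
varied = "ini buku saya itu bola kamu"      # six distinct words, each used once

print(f"repetitive: {word_entropy(repetitive):.3f} bits/word")
print(f"varied:     {word_entropy(varied):.3f} bits/word")
```

Under these assumptions, the text with more varied vocabulary yields the higher entropy, which is the direction the abstract associates with lower readability.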



How to Cite

Barawi, M. H., Mohamed Osman, S. N., Abd Yusof, N. F., Ibeke, E., & Fadhli, M. (2024). Using Entropy to Measure Text Readability in Bahasa Malaysia for Year One Students. Journal of Cognitive Sciences and Human Development, 10(1), 103–123.