Using Entropy to Measure Text Readability in Bahasa Malaysia for Year One Students

Authors

Mohamad Hardyman Barawi Universiti Malaysia Sarawak
Siti Nabilah Mohamed Osman Universiti Malaysia Sarawak
Noor Fazilla Abd Yusof Universiti Teknikal Malaysia Melaka
Ebuka Ibeke Robert Gordon University
Muhibuddin Fadhli Universitas Negeri Malang

DOI:

https://doi.org/10.33736/jcshd.6817.2024

Keywords:

readability, reading, text analysis, text difficulty

Abstract

Text readability is essential for effective learning and communication, especially for beginner readers. However, there are no known measures for calculating the readability of texts in Bahasa Malaysia, the national language of Malaysia. This research proposes a new method based on entropy, a measure of information and uncertainty, to assess the readability of Bahasa Malaysia texts for Year One students. An experiment was conducted with six Year One students to determine the relationship between entropy and readability. The results indicated a positive correlation between entropy and reading difficulty, suggesting that higher entropy values corresponded with lower readability for this age group. The study also revealed the need to attend to text difficulty level for beginner readers in order to enhance learning.
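As an illustration of the kind of quantity the abstract refers to, the sketch below computes the Shannon entropy of a text, H = -Σ p(w) log2 p(w), over its word frequencies. The abstract does not say whether the authors compute entropy over characters, syllables, or words, nor how they tokenise, so the word-level unigram model, the whitespace tokeniser, the function name `shannon_entropy`, and the two sample sentences are all assumptions made here for illustration, not the method of the paper.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Word-level Shannon entropy in bits per token (assumed unigram model)."""
    # Assumption: lowercase the text and split on whitespace; the paper's
    # actual tokenisation is not stated in the abstract.
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    total = len(tokens)
    # H = sum over word types of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical sample sentences, not taken from the study's materials.
repetitive = "ini bola ini bola ini bola"
varied = "ini bola merah saya suka main"

print(f"repetitive: {shannon_entropy(repetitive):.3f} bits per word")
print(f"varied:     {shannon_entropy(varied):.3f} bits per word")
```

Under this assumed model, a text that repeats a few words scores lower than a text of the same length in which every word is different, which matches the direction of the relationship reported in the abstract (higher entropy, lower readability).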

Published

2024-03-31

How to Cite

Barawi, M. H., Mohamed Osman, S. N., Abd Yusof, N. F., Ibeke, E., & Fadhli, M. (2024). Using Entropy to Measure Text Readability in Bahasa Malaysia for Year One Students. Journal of Cognitive Sciences and Human Development, 10(1), 103–123. https://doi.org/10.33736/jcshd.6817.2024

Issue

Vol. 10 No. 1 (2024): Journal of Cognitive Sciences and Human Development

Section

Articles

License

Copyright Transfer Statement for Journal

1) In signing this statement, the author(s) grant UNIMAS Publisher an exclusive license to publish their original research papers. The author(s) also grant UNIMAS Publisher permission to reproduce, recreate, translate, extract or summarize, and to distribute and display the papers in any form, format, or medium. The author(s) can reuse their papers in their future printed work without first requiring permission from UNIMAS Publisher, provided that the author(s) acknowledge and reference the publication in the Journal.

2) For open access articles, the author(s) agree that their articles published under UNIMAS Publisher are distributed under the terms of the CC BY-NC-SA license (Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License), which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original work of the author(s) is properly cited.

3) The author(s) are responsible for ensuring that their submitted work is original and does not infringe any existing copyright, trademark, patent, statutory right, or proprietary right of others. The corresponding author(s) have obtained permission from all co-authors prior to submission to the journal. Upon submission of the manuscript, the author(s) agree that no similar work has been or will be submitted or published elsewhere in any language. If the submitted manuscript includes materials from others, the author(s) have obtained permission from the copyright owners.

4) In signing this statement, the author(s) declare that the research they have conducted complies with the current laws of the respective country and the UNIMAS Journal Publication Ethics Policy. Any experimentation or research involving human subjects or the use of animal samples must obtain approval from the Human or Animal Ethics Committee of the respective institution. The author(s) agree and understand that UNIMAS Publisher is not responsible for any compensation claims or failure caused by the author(s) in fulfilling the above-mentioned requirements. The author(s) must accept the responsibility for releasing their materials upon request by the Chief Editor or UNIMAS Publisher.

5) The author(s) should have participated sufficiently in the work and ensured the appropriateness of the content of the article. The author(s) should also agree that they have no commercial attachments (e.g. patent or license arrangement, equity interest, consultancies, etc.) that might pose any conflict of interest with the submitted manuscript. The author(s) also agree to make any relevant materials and data available upon request by the editor or UNIMAS Publisher.