Using Classical Test and Item Response Theories to Evaluate Psychometric Quality of Teacher-Made Test in Ghana

  • Paul Kwame Butakor, University of Ghana, Ghana
Keywords: Item difficulty, discrimination indices, teacher-made test, classical test theory, item response theory

Abstract

Teaching, learning, and assessment are key concepts in education, and their relationship can be likened to that of participants in a three-legged race. In this regard, classroom assessment practices such as teacher-made tests are important and meaningful when they support students' learning. The purpose of this study was to establish the psychometric quality of a teacher-made mathematics test used in one of the Senior High Schools in Ghana. The study employed a quantitative descriptive design in which the responses of 400 selected students to a teacher-made Mathematics test were collected and analyzed through various psychometric techniques. The results showed that the Mathematics test had a low but acceptable reliability coefficient of 0.61. Of the 40 multiple-choice items, 26 had satisfactory difficulty levels, only one item was found to be too difficult, and three items were too easy. The discrimination indices showed that 25 items discriminated poorly or weakly, and four items had negative discrimination indices. The study further indicated that 30.8 percent of the options were functioning distractors, whereas the majority of the options (69.2%) were non-functioning distractors. It is therefore recommended that regular in-service training on effective ways of developing test items be organized for teachers to help improve the quality of teacher-made tests across Senior High Schools in Ghana.
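
The abstract does not state the software or exact procedures used for the item analysis. The sketch below is a minimal, illustrative Python example (NumPy only) of the classical test theory indices mentioned above: item difficulty as the proportion of correct responses, a corrected point-biserial discrimination index, and KR-20 reliability. The function name item_analysis and the simulated data are hypothetical and only show how such indices are typically computed; they are not the author's actual analysis or data set.

```python
import numpy as np

def item_analysis(responses):
    """CTT item analysis for dichotomously scored (0/1) items.

    responses: 2-D array of shape (examinees, items).
    Returns item difficulty (proportion correct), corrected point-biserial
    discrimination, and KR-20 reliability for the whole test.
    """
    responses = np.asarray(responses, dtype=float)
    n_items = responses.shape[1]
    total = responses.sum(axis=1)

    # Difficulty (p-value): proportion of examinees answering each item correctly.
    difficulty = responses.mean(axis=0)

    # Corrected point-biserial: correlate each item with the total score
    # excluding that item, so the item does not inflate its own index.
    discrimination = np.array([
        np.corrcoef(responses[:, j], total - responses[:, j])[0, 1]
        for j in range(n_items)
    ])

    # KR-20 reliability (Cronbach's alpha for dichotomous items).
    p, q = difficulty, 1.0 - difficulty
    kr20 = (n_items / (n_items - 1)) * (1.0 - (p * q).sum() / total.var(ddof=1))

    return difficulty, discrimination, kr20

# Illustrative use with simulated 0/1 responses for 400 examinees and 40 items
# (hypothetical data, not the study's data set).
rng = np.random.default_rng(0)
simulated = (rng.random((400, 40)) < rng.uniform(0.2, 0.9, size=40)).astype(int)
difficulty, discrimination, reliability = item_analysis(simulated)
print(reliability, difficulty[:5], discrimination[:5])
```

In practice, items with difficulty values near 0 or 1 and items with low or negative point-biserial values would be flagged for review, which mirrors the kind of screening the study reports.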

Published
2022-01-31
How to Cite
Butakor, P. K. (2022). Using Classical Test and Item Response Theories to Evaluate Psychometric Quality of Teacher-Made Test in Ghana. European Scientific Journal, ESJ, 18(1), 139. https://doi.org/10.19044/esj.2022.v18n1p139
Section
ESJ Social Sciences