MODEL COMPARISONS AMONG TESTLET RESPONSE THEORIES (TRT) ON A READING COMPREHENSION TEST

Kim, Kyungtae

Files

Kim_mtsu_0170E_10390.pdf (2.18 MB)

Date

2015-04-08

Authors

Kim, Kyungtae

Publisher

Middle Tennessee State University

Abstract

The purpose of this study was to evaluate the strengths and weaknesses of psychometric models such as Classical Test Theory (CTT), Item Response Theory (IRT), and Testlet Response Theory (TRT), as well as the items of a fifth-grade reading comprehension test, using a large data set (N = 10,897). The test contained 22 items across 7 passages and covered 4 areas of the Common Core State Standards (CCSS): reading standards of literature (RL), reading standards of informational text (RI), reading standards of foundation skills (RF), and language standards (L). The 22-item test showed good internal consistency reliability, with a Cronbach's alpha of .79. An exploratory factor analysis (EFA) yielded a unidimensional solution, which confirmed that the data could be analyzed with traditional IRT models. The model comparison criteria (-2LL, AIC, and BIC) showed that the three-parameter logistic model (3PLM) fit the data better than the 1PLM and the 2PLM. Comparing the CTT and 3PLM results highlighted the advantages of IRT over CTT: IRT provides more item information (a-, b-, and c-parameter estimates) and a more detailed understanding of the item parameters at specific student ability levels. The -2LL, AIC, and BIC values indicated that local item dependence (LID) among the test items was minimal, so unidimensional IRT was more appropriate than the TRT models. However, several testlet variances from the generalized TRT model indicated that the testlet effects were not negligible. The 3PLM, constrained TRT, and generalized TRT models provided consistent ability estimates, with a mean of 0.00 and a standard deviation of 1.00. The a- and c-parameter estimates, but not the item difficulty (b-parameter) estimates, were highly correlated among the 3PLM and the two TRT models. The b-parameters were associated with the estimated testlet mean.
The comparisons of psychometric models and item parameters among CTT, IRT, and TRT in this study are meaningful for both researchers and practitioners who seek a precise evaluation of a reading comprehension test.
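
For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of the 3PL item response function, a common testlet extension of it, and the AIC and BIC criteria computed from -2LL. It is an illustration under stated assumptions, not the dissertation's own code: the function names are hypothetical, the testlet form (a person-specific effect `gamma` subtracted inside the logit) is one widely used parameterization that may differ from the constrained and generalized TRT models fitted in the study, and no fitted values from the study are reproduced.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def p_testlet_3pl(theta, a, b, c, gamma):
    """3PL with a testlet effect (assumed parameterization).

    gamma is the examinee's effect for the passage the item belongs to.
    With gamma = 0 this reduces to the ordinary 3PL model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b - gamma)))


def aic(neg2ll, n_params):
    """Akaike information criterion from -2 log-likelihood."""
    return neg2ll + 2 * n_params


def bic(neg2ll, n_params, n_obs):
    """Bayesian information criterion from -2 log-likelihood."""
    return neg2ll + n_params * math.log(n_obs)


if __name__ == "__main__":
    # Hypothetical item parameters, for illustration only.
    for theta in (-2.0, 0.0, 2.0):
        print(theta, round(p_3pl(theta, a=1.2, b=0.5, c=0.2), 3))

    # Penalty terms for a 22-item test with N = 10,897, assuming
    # 3 free parameters per item under the 3PL (66 in total).
    print("AIC penalty:", 2 * 66)
    print("BIC penalty:", 66 * math.log(10897))
```

Lower AIC or BIC indicates a better trade-off between fit and complexity, so a model with more parameters is preferred only when its reduction in -2LL exceeds the added penalty.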

Keywords

CTT, IRT, Item difficulty, Item discrimination, Psychometric models, TRT

URI

http://jewlscholar.mtsu.edu/handle/mtsu/4452

Collections

Doctoral Dissertations
