ISSN : 1229-067X
This study compares three estimation algorithms for recovering item parameters in compensatory multidimensional IRT (MIRT) models. Two- and four-dimensional models were investigated under different degrees of correlation between latent traits. Bias, standard error, and root mean square error were used as criteria to evaluate item parameter recovery for each algorithm. The results indicated that in most conditions, Metropolis-Hastings Robbins-Monro (MH-RM) outperformed full information item factor analysis (FIIFA) and bivariate information item factor analysis (BIIFA) for a-parameters, except in the independent and very low inter-trait correlation conditions, where BIIFA outperformed the other algorithms. However, the MH-RM algorithm consistently produced the highest empirical standard errors of the three methods in all conditions. FIIFA recovered a-parameters better than BIIFA when the latent traits were moderately correlated. BIIFA is more suitable for a-parameters when the latent traits are independent or only very weakly correlated, and it is more suitable for d-parameters regardless of inter-trait correlation in the four-dimensional models. Overall, all three estimation methods provided more accurate a- and d-parameter estimates as the number of examinees increased, and a-parameter estimates became less accurate as the inter-trait correlation increased. The inter-trait correlation did not have a dramatic impact on the recovery of d-parameters for any of the three algorithms.
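The three recovery criteria named above can be illustrated with a minimal sketch. The abstract does not give the study's exact formulas, so the definitions below are the conventional ones and are an assumption: bias is the mean estimate minus the true value, the empirical standard error is the standard deviation of the estimates across replications (population form, dividing by the number of replications), and RMSE is the root of the mean squared deviation from the true value. The function name `recovery_stats` and the sample numbers are hypothetical and are not taken from the study.

```python
import math
import statistics


def recovery_stats(estimates, true_value):
    """Return (bias, empirical SE, RMSE) for replicated estimates of one parameter.

    Assumed conventional definitions; not necessarily the study's own formulas.
    """
    estimates = list(estimates)
    mean_estimate = statistics.fmean(estimates)
    # Bias: average estimate minus the generating (true) value.
    bias = mean_estimate - true_value
    # Empirical SE: spread of the estimates around their own mean
    # (population form, so that RMSE^2 = bias^2 + SE^2 holds exactly).
    se = statistics.pstdev(estimates)
    # RMSE: spread of the estimates around the true value.
    squared_errors = [(e - true_value) ** 2 for e in estimates]
    rmse = math.sqrt(statistics.fmean(squared_errors))
    return bias, se, rmse


# Hypothetical a-parameter estimates from five replications, true value 1.0.
bias, se, rmse = recovery_stats([1.1, 0.9, 1.3, 1.0, 1.2], true_value=1.0)
print(f"bias={bias:.4f}  SE={se:.4f}  RMSE={rmse:.4f}")
```

Under these definitions the squared RMSE equals the squared bias plus the squared empirical standard error, which is why an algorithm can show small bias and still have a comparatively large standard error.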