ISSN : 1229-067X
The purpose of this Monte Carlo study was to evaluate the performance of multiple indicators and multiple causes (MIMIC) confirmatory factor analysis (CFA) in detecting differential item functioning (DIF). Specifically, the study compared application strategies that combined two conventional testing approaches (forward-inclusion and backward-elimination) with five decision criteria (uncorrected or Bonferroni-corrected likelihood ratio [LR] test, ΔCFI of 0.01 or 0.002, and ΔSRMR of 0.005) across conditions that varied item type, test length, sample size, impact, and the type and size of DIF in the target item and in the anchor set. In addition, the author proposed an alternative testing approach (effects-coded backward-elimination) as a potential solution to the arbitrary choice of a DIF-free anchor set. Simulation results indicated that when the anchor set was truly biased, only the proposed approach performed adequately under several conditions. As the DIF contamination rate in a scale decreased, false positive rates were controlled at the nominal alpha level (with the Bonferroni-corrected LR) or were slightly inflated (with the uncorrected LR).
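The five decision criteria named above can be illustrated with a minimal Python sketch. It is not the study's implementation. It assumes that each DIF test compares two nested models differing by one parameter (a 1-df LR test), that a fit-index change flags DIF when its absolute value reaches the cutoff, and that the Bonferroni correction divides alpha by the number of items tested. The function and key names are hypothetical.

```python
import math


def lr_pvalue_df1(chi2_diff: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For df = 1, P(X > x) = erfc(sqrt(x / 2)), so no external library is needed.
    """
    return math.erfc(math.sqrt(max(chi2_diff, 0.0) / 2.0))


def flag_dif(chi2_diff: float, delta_cfi: float, delta_srmr: float,
             n_tests: int, alpha: float = 0.05) -> dict:
    """Apply the five criteria to one nested-model comparison.

    Assumptions (not taken from the source): df = 1, cutoffs are compared
    against absolute changes, and alpha is split evenly over n_tests.
    """
    p = lr_pvalue_df1(chi2_diff)
    return {
        "lr_uncorrected": p < alpha,
        "lr_bonferroni": p < alpha / n_tests,
        "delta_cfi_0.01": abs(delta_cfi) >= 0.01,
        "delta_cfi_0.002": abs(delta_cfi) >= 0.002,
        "delta_srmr_0.005": abs(delta_srmr) >= 0.005,
    }


# Hypothetical comparison for one item on a 10-item scale.
result = flag_dif(chi2_diff=6.0, delta_cfi=0.004, delta_srmr=0.006, n_tests=10)
print(result)
```

In this made-up case the criteria disagree: the uncorrected LR test and the stricter ΔCFI cutoff of 0.002 flag the item, while the Bonferroni-corrected LR test and the ΔCFI cutoff of 0.01 do not. Disagreement of this kind is what a comparison of criteria is meant to examine.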