Korean Journal of Psychology: General

A Comparison of Full Information Maximum Likelihood, Multiple Imputation, and Bayesian Approach in Overall Goodness of Fit Assessment of Structural Equation Modeling with Missing Data

Korean Journal of Psychology: General, (P)1229-067X; (E)2734-1127
2014, v.33 no.2, pp.507-533
(University of Oklahoma)

Abstract

In practical applications of statistical modeling, including structural equation modeling (SEM), virtually every data set contains missing values. It is well known that improper handling of missing data can harm subsequent statistical inferences in a variety of ways and to varying degrees. In the context of SEM, full information maximum likelihood (FIML) has arguably been the most popular method for addressing missing data. Although they are still less widely known among applied researchers, multiple imputation (MI) procedures and Bayesian approaches have recently begun to emerge as flexible and viable alternatives to FIML. One important objective of this article is to introduce these methods to applied researchers in an accessible manner, using SEM as the context. SEM involves proposing, estimating, and evaluating the researcher's hypothesis about the process believed to have generated the observed data. It is therefore essential to evaluate the overall goodness of fit of the posited model in any given application. For assessing data-model fit, FIML yields the chi-square statistic, MI yields two MI-based test statistics, and the Bayesian approach yields the posterior predictive model checking (PPMC) p-value. Another important objective of this article is to study the performance of these model evaluation tools in the context of SEM, comparing them with respect to their Type I error rates and power. Except for the chi-square statistic, the performance of these tools has not previously been evaluated or compared within the context of SEM.
The initial results provided in the present article are expected not only to enhance the knowledge base regarding the characteristics of these assessment tools under missing data, but also to provide an initial guideline for their proper use in real-world data analysis, especially in applications of SEM with missing data.
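
Fit assessment under MI has to combine the chi-square statistics obtained from the separately analyzed imputed data sets. The following is a minimal sketch of one such combining rule, assuming the procedure commonly called D2 (Li, Meng, Raghunathan, & Rubin, 1991); whether this matches the statistics examined in the article is an assumption, and the function name `pool_chi_square_d2` is hypothetical.

```python
import math
from statistics import mean, variance


def pool_chi_square_d2(chi_squares, df):
    """Pool chi-square statistics from m imputed data sets (D2 rule).

    chi_squares: model chi-square values, one per imputed data set.
    df: degrees of freedom of the tested model.
    Returns (d2, nu): the pooled statistic and its denominator degrees of
    freedom; d2 is referred to an F distribution with (df, nu) df.
    """
    m = len(chi_squares)
    if m < 2:
        raise ValueError("at least two imputed data sets are required")

    # Between-imputation variability is measured on the square-root scale.
    roots = [math.sqrt(d) for d in chi_squares]
    r = (1 + 1 / m) * variance(roots)  # relative increase in variance

    d2 = (mean(chi_squares) / df - (m + 1) / (m - 1) * r) / (1 + r)

    # With no between-imputation variability the reference distribution
    # has infinite denominator degrees of freedom.
    nu = math.inf if r == 0 else df ** (-3 / m) * (m - 1) * (1 + 1 / r) ** 2
    return d2, nu


# Hypothetical chi-square values from three imputed data sets, model df = 2.
d2, nu = pool_chi_square_d2([4.0, 9.0, 16.0], df=2)
print(d2, nu)
```

A p-value would then be taken from the upper tail of the F distribution with `df` and `nu` degrees of freedom, for example with `scipy.stats.f.sf(d2, df, nu)` if SciPy is available. The sketch keeps to the standard library so that the pooling arithmetic itself stays visible.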

Keywords

missing data, structural equation modeling (covariance structure modeling), maximum likelihood, full information maximum likelihood, multiple imputation, Bayesian
