ISSN : 1229-067X
The development of new technologies such as big data, machine learning, and artificial intelligence is changing human behavior and thought. Increased use of the internet makes it possible to observe human activities that were not observable before, and huge amounts of data about these activities are being stored online. Analyzing this information can extend the scope of our understanding of human behavior and psychology. The present paper explores ways of applying these new technologies to psychological studies, focusing on what big data are and how they can be used in psychological research. First, the paper reviews the characteristics of big data and their role in psychological research. In this context, it discusses the problems of the data-driven analytic approach that big data analysis adopts and the possibility of applying such an approach to psychological research. Second, it reviews the data analytic techniques used in big data analyses. These techniques must be able to handle large, unorganized data sets as well as unstructured data such as pictures, video clips, and texts. Specifically, the paper reviews the basic principles of topic modeling, ridge and lasso regression, support vector machines, neural networks, and deep learning, and their application to psychological data. Third, it discusses the limitations of using big data in psychological research. Finally, it proposes ways of applying big data technology to psychological research.