Text Mining Driven Content Analysis of Ebola on News Media and Scientific Publications

Journal of the Korean Society for Library and Information Science, (P)1225-598X; (E)2982-6292
2016, v.50 no.2, pp.289-307
https://doi.org/10.4275/KSLIS.2016.50.2.289




Abstract

Infectious diseases such as Ebola virus disease become social issues and draw public attention, making them major topics in both news and research. As a result, many studies have applied text-mining techniques to infectious diseases. However, no study has compared the content of two media channels with distinct characteristics. Accordingly, this study conducts a topic analysis of news articles (representing a social perspective) and academic research papers (representing the perspective of bio-professionals). Topic modeling is applied to extract the topics in each type of material, and a word co-occurrence map based on selected bio-entities is used to compare the perspectives of the two in detail. For network analysis, a topic map is built using Gephi. These approaches revealed differences in topics between the two materials, as well as the characteristics of each. In the word co-occurrence map, however, most entities are shared by both materials. These results indicate that there are both differences and commonalities between social and academic materials.

Keywords
Ebola virus, Text mining, Epidemics, Media analysis, Topic modeling, Co-occurrence network, Topic map
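
The entity-based co-occurrence comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy documents, the entity list, and the helper name `cooccurrence` are all hypothetical, and co-occurrence is counted per document here, which may differ from the unit used in the study.

```python
from collections import Counter
from itertools import combinations
import re


def cooccurrence(docs, entities):
    """Count how often each pair of entities appears in the same document.

    Hypothetical helper for illustration; pairs are stored as
    alphabetically sorted tuples so (a, b) and (b, a) are the same edge.
    """
    entity_set = {e.lower() for e in entities}
    pairs = Counter()
    for doc in docs:
        # Crude tokenization: lowercase alphabetic runs only.
        tokens = set(re.findall(r"[a-z]+", doc.lower()))
        present = sorted(tokens & entity_set)
        pairs.update(combinations(present, 2))
    return pairs


# Made-up example texts standing in for the two kinds of material.
news = [
    "Ebola outbreak: fever cases rise as vaccine trials begin",
    "Officials say the Ebola virus causes fever and bleeding",
]
papers = [
    "Glycoprotein of the Ebola virus as a vaccine target",
    "Ebola virus infection: fever and glycoprotein expression",
]
# Assumed entity list; a real study would select these from the corpus.
entities = ["ebola", "virus", "fever", "vaccine", "glycoprotein"]

news_pairs = cooccurrence(news, entities)
paper_pairs = cooccurrence(papers, entities)

# Edges present in both networks versus edges unique to one of them.
shared = set(news_pairs) & set(paper_pairs)
news_only = set(news_pairs) - set(paper_pairs)
paper_only = set(paper_pairs) - set(news_pairs)

for pair in sorted(shared):
    print(pair, news_pairs[pair], paper_pairs[pair])
```

Each counted pair corresponds to a weighted edge, so the two `Counter` objects could be written out as edge lists and loaded into a network tool such as Gephi for visual comparison.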

