
Network Analysis between Uncertainty Words based on Word2Vec and WordNet

Journal of the Korean Society for Library and Information Science, ISSN 1225-598X (print), 2982-6292 (online)
2019, Vol. 53, No. 3, pp. 247-271
https://doi.org/10.4275/KSLIS.2019.53.3.247
Go Eun Heo (Yonsei University)

Abstract

Uncertainty in scientific knowledge refers to a state in which a proposition is, at present, neither true nor false. Existing studies have analyzed propositions in the academic literature to manually compile words that express uncertainty, and have evaluated rule-based and machine-learning-based approaches on the resulting corpora. Although the importance of compiling uncertainty words is recognized, few attempts have been made to expand them automatically by analyzing their meaning. Meanwhile, network analyses built on bibliometric and text mining techniques are widely used across disciplines to identify intellectual structures and relationships. In this study, we therefore applied Word2Vec to an existing set of uncertainty words to analyze their semantic relations, and used WordNet, an English lexical database and thesaurus, to perform a network analysis based on the hypernym, hyponym, and synonym relations linked to those words. This structurally identified the semantic and lexical relationships among uncertainty words and points to the possibility of expanding uncertainty word lists automatically.

Keywords
Text Mining, Bibliometrics, Uncertainty, Word2Vec, WordNet, Network Analysis
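The abstract describes two steps: using Word2Vec similarity to relate uncertainty words (and, potentially, to find new ones), and building a network from WordNet hypernym, hyponym, and synonym links to study its structure. The Python sketch below illustrates both ideas under loud assumptions; it is not the author's pipeline. The vectors are invented 3-dimensional stand-ins for trained Word2Vec embeddings (with a real model one would typically query something like gensim's `KeyedVectors.most_similar`), the relation triples are hand-written stand-ins for WordNet links (in practice they could be read through NLTK's WordNet reader, e.g. `Synset.hypernyms()`, `Synset.hyponyms()`, `Synset.lemmas()`), and the helper names `expand_seeds`, `build_graph`, `degree_centrality`, and `components`, as well as the 0.9 similarity threshold, are hypothetical choices made for illustration.

```python
import math
from collections import defaultdict

# HYPOTHETICAL toy vectors: invented 3-d stand-ins for trained Word2Vec
# embeddings. They are not real model output.
TOY_VECTORS = {
    "suggest": [0.9, 0.1, 0.0],
    "indicate": [0.8, 0.2, 0.1],
    "propose": [0.85, 0.15, 0.05],
    "possible": [0.1, 0.9, 0.1],
    "likely": [0.2, 0.8, 0.2],
    "protein": [0.0, 0.1, 0.95],
}

# HYPOTHETICAL hand-written (word, relation, word) triples standing in for
# WordNet synonym/hypernym links. They were not extracted from WordNet.
TOY_RELATIONS = [
    ("suggest", "synonym", "propose"),
    ("suggest", "synonym", "indicate"),
    ("indicate", "hypernym", "show"),
    ("possible", "synonym", "likely"),
    ("likely", "synonym", "probable"),
]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_seeds(seeds, vectors, threshold):
    """Non-seed words whose highest cosine similarity to any seed word is at
    least `threshold`, ordered from most to least similar.
    Assumes every seed has a vector and `seeds` is non-empty."""
    scores = {}
    for word, vec in vectors.items():
        if word in seeds:
            continue
        best = max(cosine(vec, vectors[s]) for s in seeds)
        if best >= threshold:
            scores[word] = best
    return sorted(scores, key=scores.get, reverse=True)


def build_graph(triples):
    """Undirected graph as adjacency sets; relation labels are ignored."""
    graph = defaultdict(set)
    for head, _relation, tail in triples:
        graph[head].add(tail)
        graph[tail].add(head)
    return graph


def degree_centrality(graph):
    """Degree of each node normalised by (n - 1), n = number of nodes."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def components(graph):
    """Connected components found with an iterative depth-first search."""
    seen, result = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.add(node)
            stack.extend(graph[node] - seen)
        result.append(comp)
    return result


if __name__ == "__main__":
    print(expand_seeds({"suggest"}, TOY_VECTORS, threshold=0.9))
    g = build_graph(TOY_RELATIONS)
    print(degree_centrality(g))
    print(components(g))
```

Run as a script on the toy data, the expansion step returns `['propose', 'indicate']`, and the relation graph splits into two components, {suggest, propose, indicate, show} and {possible, likely, probable}. Degree centrality and connected components are used here only as the simplest structural measures; a fuller analysis would more likely rely on a graph library and richer measures.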
