An Experimental Study on the Relation Extraction from Biomedical Abstracts using Machine Learning

최성필

doi:10.4275/KSLIS.2016.50.2.309

P-ISSN1225-598X
E-ISSN2982-6292


Vol.50 No.2


기계 학습을 이용한 바이오 분야 학술 문헌에서의 관계 추출에 대한 실험적 연구


한국문헌정보학회지 / Journal of the Korean Society for Library and Information Science, (P)1225-598X; (E)2982-6292

2016, v.50 no.2, pp.309-336

https://doi.org/10.4275/KSLIS.2016.50.2.309

최성필 (Kyonggi University)

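최성필. (2016). An Experimental Study on the Relation Extraction from Biomedical Abstracts using Machine Learning. Journal of the Korean Society for Library and Information Science, 50(2), 309-336. https://doi.org/10.4275/KSLIS.2016.50.2.309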


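Abstract (translated from the Korean)

This paper proposes a biomedical relation extraction system that uses a machine learning module based on Support Vector Machines (SVM) to automatically identify and classify the relation between two entities within a given sentence. The distinguishing feature of the system is that it includes a variety of functions that extract rich linguistic features from sentences containing the entities and use them in training to maximize performance. To measure the performance of the proposed system, in-depth experiments were carried out on three standard biomedical relation extraction collections that are widely used around the world; the system achieved high performance on all three, which demonstrates its effectiveness. In conclusion, the broad and in-depth experimental study of biomedical relation extraction conducted in this paper is expected to offer many implications for future research on machine-learning-based biomedical text analysis.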

keywords: Relation Extraction, Support Vector Machines, Protein-Protein Interaction Extraction, Text Mining, Machine Learning

Abstract

This paper introduces a relation extraction system that identifies and classifies semantic relations between biomedical entities in scientific texts using machine learning methods such as Support Vector Machines (SVM). The system provides functions that extract a variety of linguistic features from sentences containing a pair of biomedical entities and use those features to train relation extraction models, with the aim of maximizing their performance. Experiments on three widely used benchmark collections in the biomedical domain show that the system performs well on all of them. The experimental study reported here is therefore expected to provide a useful foundation for further research on machine-learning-based biomedical text analysis.
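The abstract does not list the linguistic features the system extracts, so the following is only a minimal sketch of the general idea, under stated assumptions. It takes a tokenized sentence with two marked entity mentions, builds simple bag-of-words features from the words before, between, and after the pair, and writes them in the sparse `<label> <index>:<value>` text format read by SVM tools such as LIBSVM. The function names `pair_features` and `to_libsvm_line`, the feature set, and the `PROT1`/`PROT2` placeholders are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def pair_features(tokens, e1, e2):
    """Build illustrative lexical features for one candidate entity pair.

    tokens: list of word tokens for the sentence.
    e1, e2: token indices of the two entity mentions, with e1 < e2.
    The feature set here is an assumption, not the paper's.
    """
    feats = Counter()
    for tok in tokens[:e1]:
        feats["before=" + tok.lower()] += 1
    for tok in tokens[e1 + 1:e2]:
        feats["between=" + tok.lower()] += 1
    for tok in tokens[e2 + 1:]:
        feats["after=" + tok.lower()] += 1
    # Number of tokens separating the two mentions.
    feats["distance"] = e2 - e1 - 1
    return feats


def to_libsvm_line(label, feats, vocab):
    """Render one example as '<label> <index>:<value> ...'.

    vocab maps feature names to 1-based indices and grows as new
    feature names are seen. Zero-valued features are omitted.
    """
    indexed = {}
    for name, value in feats.items():
        if value == 0:
            continue
        if name not in vocab:
            vocab[name] = len(vocab) + 1
        indexed[vocab[name]] = value
    # Sparse SVM formats expect indices in ascending order.
    body = " ".join(f"{i}:{v}" for i, v in sorted(indexed.items()))
    return f"{label} {body}"


# Usage: one positive example (label 1) for a hypothetical interacting pair.
vocab = {}
sentence = ["PROT1", "binds", "to", "PROT2", "in", "vitro"]
print(to_libsvm_line(1, pair_features(sentence, 0, 3), vocab))
# prints "1 1:1 2:1 3:1 4:1 5:2"
```

Lines in this format, one per candidate entity pair, could then be passed to an SVM trainer. Richer features such as part-of-speech tags or dependency paths would be added to the same dictionary.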


