An Experimental Study on the Relation Extraction from Biomedical Abstracts using Machine Learning

Journal of the Korean Society for Library and Information Science, (P)1225-598X; (E)2982-6292
2016, v.50 no.2, pp.309-336
https://doi.org/10.4275/KSLIS.2016.50.2.309

Abstract

This paper introduces a relation extraction system that identifies and classifies semantic relations between biomedical entities in scientific texts using machine learning methods such as Support Vector Machines (SVM). The system provides functions for extracting various linguistic features from sentences that contain a pair of biomedical entities and for applying those features to the training of relation extraction models in order to maximize their performance. Experiments on three globally representative biomedical collections demonstrate the system's superior performance across various biomedical domains. The intensive experimental study conducted in this paper is therefore likely to provide a meaningful foundation for research on machine-learning-based bio-text analysis.

Keywords
Relation Extraction, Support Vector Machines, Protein-Protein Interaction Extraction, Text Mining, Machine Learning
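
The abstract describes extracting linguistic features from sentences that contain a pair of biomedical entities. The following is a minimal sketch of what such a step might look like, not the paper's implementation: the function name `pair_features`, the feature names, the context window size, and the example sentence with the placeholder entities `ProtA` and `ProtB` are all assumptions made for illustration. The abstract does not list the actual feature set.

```python
from collections import Counter


def pair_features(tokens, e1, e2, window=2):
    """Build a sparse feature dict for one candidate entity pair.

    tokens: list of word strings for one sentence.
    e1, e2: token indices of the two entity mentions (e1 < e2).
    window: number of context tokens kept on each side (assumed value).
    """
    if not 0 <= e1 < e2 < len(tokens):
        raise ValueError("expected 0 <= e1 < e2 < len(tokens)")
    feats = Counter()
    # Words strictly between the two mentions often carry the relation cue.
    for tok in tokens[e1 + 1:e2]:
        feats["between=" + tok.lower()] += 1
    # A small window of context before the first and after the second mention.
    for tok in tokens[max(0, e1 - window):e1]:
        feats["before=" + tok.lower()] += 1
    for tok in tokens[e2 + 1:e2 + 1 + window]:
        feats["after=" + tok.lower()] += 1
    # Number of tokens separating the mentions, as one numeric feature.
    feats["distance"] = e2 - e1 - 1
    return dict(feats)


# Hypothetical sentence; ProtA and ProtB stand in for recognized entities.
sentence = "We show that ProtA directly binds ProtB in vitro".split()
features = pair_features(sentence, e1=3, e2=6)
```

Feature dictionaries of this kind could then be mapped to numeric vectors and passed to an SVM trainer, with one labeled vector per candidate entity pair. Which features, kernels, and toolkits the system actually uses is not stated in the abstract.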

