A Study on the Identification and Classification of Relation Between Biotechnology Terms Using Semantic Parse Tree Kernel

최성필; 정창후; 전홍우; 조현양

doi:10.4275/KSLIS.2011.45.2.251

Journal of the Korean Society for Library and Information Science

P-ISSN 1225-598X
E-ISSN 2982-6292

Vol.45 No.2

2011, v.45 no.2, pp.251-275

https://doi.org/10.4275/KSLIS.2011.45.2.251

최성필, 정창후, 전홍우, & 조현양 (2011). A Study on the Identification and Classification of Relation Between Biotechnology Terms Using Semantic Parse Tree Kernel. Journal of the Korean Society for Library and Information Science, 45(2), 251-275. https://doi.org/10.4275/KSLIS.2011.45.2.251

Abstract

In this paper, we propose a novel kernel, the semantic parse tree kernel, which extends the parse tree kernel that was previously studied for extracting protein-protein interactions (PPIs) and showed notable results. One drawback of the existing parse tree kernel is that it can degrade the overall performance of PPI extraction: because its comparison mechanism handles only the surface forms of the constituent words, the kernel function may produce a value for two sentences that is lower than their actual similarity. The new kernel computes the lexical semantic similarity as well as the syntactic similarity between the parse trees of two target sentences. To calculate the lexical semantic similarity, it incorporates context-based word sense disambiguation, which outputs WordNet synsets that can in turn be generalized to more abstract concepts. In the experiments, in addition to the conventional SVM regularization factor, we introduced two new parameters that can accelerate the optimization of PPI extraction performance: the tree kernel decay factor and the degree of lexical concept abstraction. These multi-strategy experiments confirmed the pivotal role of the newly applied parameters. The experimental results also showed that the semantic parse tree kernel is superior to conventional kernels, especially in PPI classification tasks.
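
The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard convolution tree kernel recursion with a decay factor, and it replaces word sense disambiguation and WordNet with a small hypothetical lookup table (`HYPERNYMS`). All names here (`Node`, `abstract_word`, `tree_kernel`, the `level` parameter) are illustrative assumptions, as are the toy sentences.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    label: str
    children: Tuple["Node", ...] = ()


# Hypothetical stand-in for WordNet: each word maps to a chain of
# increasingly general concepts (NOT real synsets).
HYPERNYMS: Dict[str, List[str]] = {
    "binds": ["attach", "connect"],
    "attaches": ["attach", "connect"],
    "inhibits": ["suppress", "control"],
}


def abstract_word(word: str, level: int) -> str:
    """Return the concept `level` steps up the toy chain (0 keeps the word)."""
    chain = HYPERNYMS.get(word, [])
    if level == 0 or not chain:
        return word
    return chain[min(level, len(chain)) - 1]


def is_preterminal(n: Node) -> bool:
    """A preterminal has exactly one child, and that child is a leaf (a word)."""
    return len(n.children) == 1 and not n.children[0].children


def production(n: Node, level: int) -> Tuple[str, ...]:
    """Grammar production at n; words are compared after abstraction."""
    if is_preterminal(n):
        return (n.label, abstract_word(n.children[0].label, level))
    return (n.label,) + tuple(c.label for c in n.children)


def internal_nodes(t: Node) -> List[Node]:
    """All nodes that have at least one child."""
    if not t.children:
        return []
    nodes = [t]
    for c in t.children:
        nodes.extend(internal_nodes(c))
    return nodes


def common_subtrees(n1: Node, n2: Node, decay: float, level: int) -> float:
    """Decayed count of common subtrees rooted at n1 and n2."""
    if is_preterminal(n1) != is_preterminal(n2):
        return 0.0
    if production(n1, level) != production(n2, level):
        return 0.0
    score = decay
    if not is_preterminal(n1):
        for c1, c2 in zip(n1.children, n2.children):
            score *= 1.0 + common_subtrees(c1, c2, decay, level)
    return score


def tree_kernel(t1: Node, t2: Node, decay: float = 0.5, level: int = 0) -> float:
    """Sum the decayed common-subtree counts over all pairs of internal nodes."""
    return sum(
        common_subtrees(a, b, decay, level)
        for a in internal_nodes(t1)
        for b in internal_nodes(t2)
    )


def normalized_kernel(t1: Node, t2: Node, decay: float = 0.5, level: int = 0) -> float:
    """Kernel value scaled by the self-similarities of both trees."""
    k12 = tree_kernel(t1, t2, decay, level)
    return k12 / sqrt(tree_kernel(t1, t1, decay, level) * tree_kernel(t2, t2, decay, level))


def sentence(verb: str) -> Node:
    """Toy parse tree: (S (NP (NN proteinA)) (VP (VBZ verb) (NP (NN proteinB))))."""
    np_a = Node("NP", (Node("NN", (Node("proteinA"),)),))
    np_b = Node("NP", (Node("NN", (Node("proteinB"),)),))
    return Node("S", (np_a, Node("VP", (Node("VBZ", (Node(verb),)), np_b))))


t1, t2 = sentence("binds"), sentence("attaches")
surface = normalized_kernel(t1, t2, decay=0.5, level=0)   # words compared as-is
semantic = normalized_kernel(t1, t2, decay=0.5, level=1)  # words abstracted one step
print(surface, semantic)
```

With these toy trees, the two sentences differ only in the verb. At abstraction level 0 the verbs do not match, so every subtree containing them is lost and the normalized value falls below 1. At level 1 both verbs collapse to the same toy concept and the value rises to 1. This mirrors the drawback and the remedy stated in the abstract; the decay factor and the abstraction level correspond to the two parameters the experiments tune, under the assumptions named above.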

Keywords: Relation Extraction, Kernel-based Approaches, Parse Tree Kernels, Semantic Parse Tree Kernels, Word Sense Disambiguation
