A Rule-based Approach to Identifying Citation Text from Korean Academic Literature

Abstract

Citing sentences found in the full text of academic articles enable a range of academic information services, such as citation-based automatic summarization, automatic generation of review articles, sentiment analysis of citing statements, and information retrieval based on citation contexts. Such services require that citing sentences first be identified automatically in the full text. This is not easy, because implicit citing sentences carry no explicit citation marker. While several methods have recently been proposed for English, automatic methods for Korean academic literature are hard to find. This article presents a rule-based approach to identifying Korean citing sentences and compares its identification performance with various baseline methods. In our experiments, the proposed method found 30% of all implicit citing sentences in the test set at a precision of about 70%.

keywords: citing sentences, citing sentence identification, implicit citing sentences, rules for identifying citing sentences, cue phrases for citing sentences
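To make the task and the reported figures concrete, the following is a minimal sketch of rule-based citing sentence identification together with the precision and recall computation behind numbers such as "30% of implicit citing sentences at about 70% precision". It is not the rule set of this article, which is not reproduced here. The marker pattern, the English cue phrases (standing in for Korean cue phrases), and the names `EXPLICIT_MARKER`, `CUE_PHRASES`, `label_sentences`, and `precision_recall` are all assumptions made for illustration.

```python
import re

# Explicit citation markers such as "(Kim, 2012)" or "[3]".
# Illustrative assumption only, not the article's rules.
EXPLICIT_MARKER = re.compile(
    r"\([^()]*\b(?:19|20)\d{2}[a-z]?\)|\[\d+(?:\s*,\s*\d+)*\]"
)

# Hypothetical cue phrases suggesting that a sentence continues
# the citation made in the preceding sentence.
CUE_PHRASES = ("they ", "their ", "this method", "this approach", "the authors")


def is_explicit(sentence):
    """True if the sentence carries an explicit citation marker."""
    return EXPLICIT_MARKER.search(sentence) is not None


def label_sentences(sentences):
    """Label each sentence as 'explicit', 'implicit', or 'none'.

    A sentence is labeled 'implicit' when it has no marker, directly
    follows a citing sentence, and starts with a cue phrase.
    """
    labels = []
    for sentence in sentences:
        if is_explicit(sentence):
            labels.append("explicit")
        elif (labels and labels[-1] != "none"
              and sentence.lower().startswith(CUE_PHRASES)):
            labels.append("implicit")
        else:
            labels.append("none")
    return labels


def precision_recall(predicted, gold):
    """Precision and recall over sets of sentence indices."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


example = [
    "A prior study [3] proposed a summarization method.",
    "Their method ranks sentences by similarity.",
    "This approach was tested on English papers.",
    "We propose rules for Korean text.",
    "They are described in the next section.",
]
print(label_sentences(example))
# ['explicit', 'implicit', 'implicit', 'none', 'none']
```

In this toy input, the last sentence begins with a cue phrase but does not follow a citing sentence, so it is left unlabeled. Rules of this kind tend to favor precision over recall, which is consistent in spirit with the trade-off reported in the abstract.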



Vol.29 No.4


정보관리학회지 (Journal of the Korean Society for Information Management)