  • P-ISSN 1013-0799
  • E-ISSN 2586-2073
  • KCI

Application of Machine Learning Techniques for Resolving Korean Author Names

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073
2008, v.25 no.3, pp.27-39
https://doi.org/10.3743/KOSIM.2008.25.3.027

Abstract

In bibliographic data, identifying authors by personal name makes it difficult to pinpoint a particular author, because many authors share the same name. Resolving same-name author instances into distinct individuals is called author resolution, and it consists of two steps: calculating author similarities, and then clustering same-name author instances into groups that each correspond to one person. Author similarities are computed from the similarities of author-related bibliographic features such as coauthors, paper titles, and publication information, using supervised or unsupervised methods. Supervised approaches employ machine learning techniques to automatically learn the author similarity function from author-resolved training samples. So far, however, only a few machine learning methods have been investigated for author resolution. This paper provides a comparative evaluation of a variety of recent high-performing machine learning techniques for author disambiguation, and compares several methods of processing author disambiguation features such as coauthors and paper titles.
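The two-step process described in the abstract (pairwise author similarity, then clustering) can be illustrated with a minimal sketch. This is not the method evaluated in the paper: the Jaccard measure, the fixed feature weights, the threshold, the single-link clustering, and the sample records are all assumptions chosen for illustration. A supervised approach would instead learn the similarity function from author-resolved training pairs.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def author_similarity(rec1, rec2, w_coauthor=0.6, w_title=0.4):
    """Step 1: combine per-feature similarities into one score.

    The fixed weights are hypothetical; a supervised method would
    learn this function from author-resolved training samples.
    """
    co = jaccard(rec1["coauthors"], rec2["coauthors"])
    ti = jaccard(rec1["title_words"], rec2["title_words"])
    return w_coauthor * co + w_title * ti


def cluster_same_name(records, threshold=0.2):
    """Step 2: single-link clustering of same-name records via union-find."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair whose similarity reaches the (assumed) threshold.
    for i, j in combinations(range(len(records)), 2):
        if author_similarity(records[i], records[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect record indices by the root of their linked group.
    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical records that all carry the same author name.
records = [
    {"coauthors": {"lee", "park"},
     "title_words": {"author", "disambiguation", "features"}},
    {"coauthors": {"lee"},
     "title_words": {"email", "author", "disambiguation"}},
    {"coauthors": {"choi", "jung"},
     "title_words": {"protein", "folding", "simulation"}},
]
clusters = cluster_same_name(records)
print(clusters)
```

Here the first two records share a coauthor and two title words, so they are grouped as one person, while the third record shares nothing with either and stays in its own group.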

Keywords
author disambiguation, same-name authors, machine learning techniques
