A Study on an Algorithm for Verifying Duplicate Records in Union Catalogs

조순영

Journal of the Korean Society for Library and Information Science

P-ISSN 1225-598X
E-ISSN 2982-6292

Vol.37 No.4

2003, v.37 no.4, pp.69-88

조순영 (2003). A Study on an Algorithm for Verifying Duplicate Records in Union Catalogs. Journal of the Korean Society for Library and Information Science, 37(4), 69-88.

Abstract

This study develops a new duplicate detection algorithm to improve database quality. The algorithm analyzes records by language and bibliographic type, and it checks individual elements of the bibliographic data rather than MARC fields alone. It computes a degree of similarity and weight values so that records are not wrongly eliminated because of simple input errors. The study tested 7,649 records newly uploaded over the preceding year against a sample master database of 210,000 records. The findings show that the new algorithm improved the duplicate recall rate by 36.2%.
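
The idea of combining per-element similarity with weight values can be illustrated with a minimal sketch. It is not the algorithm from the paper: the element names, the weights, the threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions chosen for illustration, since the abstract does not specify them.

```python
from difflib import SequenceMatcher

# Hypothetical element weights and cut-off; the abstract does not give
# the values used in the study.
WEIGHTS = {"title": 0.5, "author": 0.3, "publisher": 0.1, "year": 0.1}
THRESHOLD = 0.85


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so formatting differences are ignored."""
    return " ".join(value.lower().split())


def element_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] for one bibliographic element."""
    a, b = normalize(a), normalize(b)
    if not a and not b:
        # Assumption: an element missing from both records counts as a match.
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def record_similarity(rec_a: dict, rec_b: dict) -> float:
    """Weighted average of element similarities across the two records."""
    total = sum(WEIGHTS.values())
    score = sum(
        weight * element_similarity(rec_a.get(name, ""), rec_b.get(name, ""))
        for name, weight in WEIGHTS.items()
    )
    return score / total


def is_duplicate(rec_a: dict, rec_b: dict, threshold: float = THRESHOLD) -> bool:
    """Flag a pair as a likely duplicate when the weighted score reaches the threshold."""
    return record_similarity(rec_a, rec_b) >= threshold
```

Because the score is graded rather than an exact match, a record whose title contains a single typing error still scores close to its counterpart in the master database, which is the behavior the abstract describes as avoiding elimination of records by simple input error.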

keywords: Union Catalog, Error Data, Duplicate Data, Data Quality Control, Duplicate Detection Algorithm, MARC
