Journal of the Korean Society for Library and Information Science, (P)1225-598X; (E)2982-6292
2003, v.37 no.4, pp.69-88

Abstract

This study develops a new duplicate detection algorithm to improve database quality. The algorithm analyzes records by language and bibliographic type, and it checks the elements within the bibliographic data rather than MARC fields alone. It computes a degree of similarity and weight values so that records are not eliminated because of simple input errors. The study tested 7,649 records newly uploaded over the past year against a sample master database of 210,000 records. The findings show that the new algorithm improved the duplicate recall rate by 36.2%.
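
The abstract does not give the elements compared, the weight values, or the decision threshold. Purely as an illustration of the general idea of weighted similarity matching, the following minimal Python sketch scores two records element by element and combines the scores with weights. The element names, the weights in `WEIGHTS`, the cut-off `THRESHOLD`, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for this example, not the paper's method.

```python
from difflib import SequenceMatcher

# Hypothetical element weights (assumed for illustration; not from the paper).
WEIGHTS = {"title": 0.5, "author": 0.2, "publisher": 0.15, "year": 0.15}
# Hypothetical cut-off above which two records are treated as duplicates.
THRESHOLD = 0.85


def normalize(value):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


def element_similarity(a, b):
    """Return a similarity score in [0, 1] for two element values."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def record_similarity(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average of per-element similarities.

    Missing elements are compared as empty strings.
    """
    total = sum(weights.values())
    score = sum(
        w * element_similarity(rec_a.get(name, ""), rec_b.get(name, ""))
        for name, w in weights.items()
    )
    return score / total


def is_duplicate(rec_a, rec_b, threshold=THRESHOLD):
    """Flag a pair as duplicates when the weighted score reaches the threshold."""
    return record_similarity(rec_a, rec_b) >= threshold


master = {"title": "Introduction to Library Science", "author": "Kim, S.",
          "publisher": "Library Press", "year": "2001"}
# Same work, but the title was keyed in with a typing error.
typo = dict(master, title="Introdution to Library Science")
other = {"title": "Organic Chemistry", "author": "Park, J.",
         "publisher": "Science Press", "year": "1987"}

print(is_duplicate(master, typo))   # the typo alone should not hide the match
print(is_duplicate(master, other))  # unrelated records should not match
```

Because the score is graded rather than an exact-match test, a single keying error in one element lowers the total only slightly, which is the behavior the abstract attributes to using similarity degrees and weights.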

Keywords
Union Catalog, Error Data, Duplicate Data, Data Quality Management, Duplicate Detection Algorithm, MARC
