A Study on Improving Duplicate Verification Algorithm for Public Library MARC Data: Focusing on the Case of M Library in Busan

Song Min-geon; 송민건; Lee Soo-Sang; 이수상

doi:10.16981/PublicLibrary,CatalogData,MARC,DuplicateVerification,IntegratedLibrary

P-ISSN 2466-2542
KCI




Vol.56 No.1


공공도서관 MARC 데이터 중복검증 알고리즘 개선 방안 연구: 부산 지역 M도서관 사례를 중심으로

A Study on Improving Duplicate Verification Algorithm for Public Library MARC Data: Focusing on the Case of M Library in Busan

한국도서관·정보학회지 / Journal of Korean Library and Information Science Society, P-ISSN 2466-2542

2025, v.56 no.1, pp.289-305

https://doi.org/10.16981/PublicLibrary,CatalogData,MARC,DuplicateVerification,IntegratedLibrary

송민건 (Min-geon Song), Pusan National University
이수상 (Soo-Sang Lee), Pusan National University

송민건, & 이수상. (2025). 공공도서관 MARC 데이터 중복검증 알고리즘 개선 방안 연구: 부산 지역 M도서관 사례를 중심으로. 한국도서관·정보학회지, 56(1), 289-305. https://doi.org/10.16981/PublicLibrary,CatalogData,MARC,DuplicateVerification,IntegratedLibrary


초록

본 논문은 본 연구자가 기존에 수행한 중복검증 알고리즘의 적용 연구의 한계점을 보완하고자 수행한 후속 연구에 대한 논문이다. 부산 지역의 M도서관으로부터 직접 MARC 데이터를 제공받아 KERIS의 중복검증 알고리즘을 Python으로 구현하여 적용하였다. 도서기호가 일치하는 레코드 쌍을 추출하고 이를 별치기호와 권․연차기호를 기준으로 동일 집단과 불일치 집단으로 나누어 알고리즘 적용 결과를 비교하였다. 동일 집단은 98.10%가, 불일치 집단은 0.43%만이 동일 자료로 판정되었다. 알고리즘 적용 결과 불일치로 판정된 중복레코드 쌍을 분석하여 알고리즘의 개선 방안을 다음과 같이 3가지로 제안하였다. 첫째, 세트(SET) ISBN을 제거하고 판정. 둘째, 발행처 항목 판정에서 전방 또는 후방일치는 일치로 간주. 셋째, 저자 항목 판정에서 전방 또는 후방일치는 일치로 간주. 알고리즘 개선 결과 동일 집단에서는 동일 판정이 98.29%로 상승하였고, 불일치 집단에서는 동일 판정의 변화 없이 불일치 판정이 93.40%에서 93.63%로 상승하였다. 이에 따라 개선 방안이 다른 자료를 중복 자료로 판정하는 오류를 억제하면서 알고리즘 성능을 높일 수 있음을 확인하였다.

keywords: 공공도서관, 목록데이터, MARC, 중복검증, 통합도서관

Abstract

This paper is a follow-up study that addresses the limitations of the authors' earlier research on applying a duplicate verification algorithm. MARC data were obtained directly from M Library in Busan, and the KERIS duplicate verification algorithm was implemented in Python and applied to them. We extracted pairs of records with matching book numbers and divided them into a 'same group' and a 'mismatch group' according to whether their location symbols and volume/year symbols also matched, then compared the algorithm's results for the two groups. The algorithm judged 98.10% of the 'same group' but only 0.43% of the 'mismatch group' to be the same material. By analyzing the duplicate record pairs that the algorithm judged to be mismatches, we propose three improvements. First, remove set (SET) ISBNs before comparison. Second, treat a forward (prefix) or backward (suffix) match in the publisher field as a match. Third, treat a forward or backward match in the author field as a match. With these improvements, the share of identical judgments in the 'same group' rose to 98.29%, while in the 'mismatch group' the share of identical judgments did not change and the share of mismatch judgments rose from 93.40% to 93.63%. This confirms that the improvements can raise the algorithm's performance while suppressing the error of judging different materials to be duplicates.

keywords: Public Library, Catalog Data, MARC, Duplicate Verification, Integrated Library
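
The grouping step and the three proposed improvements described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation or the KERIS algorithm itself. The record layout (plain dicts with `book_number`, `location`, and `volume` keys), the function names, the whitespace and case normalization, and the assumption that a set ISBN is recognizable by the text "SET" or "세트" in the ISBN string are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from itertools import combinations


def split_pairs(records):
    """Pair records that share a book number, then split the pairs by
    whether location symbol and volume/year symbol also agree.
    The dict keys used here are hypothetical, not MARC field tags."""
    by_book_number = defaultdict(list)
    for rec in records:
        by_book_number[rec["book_number"]].append(rec)

    same_group, mismatch_group = [], []
    for group in by_book_number.values():
        for a, b in combinations(group, 2):
            key_a = (a["location"], a["volume"])
            key_b = (b["location"], b["volume"])
            (same_group if key_a == key_b else mismatch_group).append((a, b))
    return same_group, mismatch_group


def strip_set_isbns(isbns):
    """Improvement 1 (sketch): drop set ISBNs before comparing.
    Assumes a set ISBN carries the text 'SET' or '세트'."""
    return [i for i in isbns if "SET" not in i.upper() and "세트" not in i]


def normalize(text):
    """Assumed normalization: remove whitespace and lowercase."""
    return re.sub(r"\s+", "", text).lower()


def affix_match(a, b):
    """Improvements 2 and 3 (sketch): treat publisher or author strings
    as matching when one is a prefix (forward match) or a suffix
    (backward match) of the other after normalization."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return False
    shorter, longer = sorted((a, b), key=len)
    return longer.startswith(shorter) or longer.endswith(shorter)
```

For example, under these assumptions `affix_match("Munhakdongne", "Munhakdongne Publishing")` is treated as a match, whereas an exact comparison of the two publisher strings would report a mismatch. Because an empty field never matches, a missing publisher or author cannot by itself turn two different records into duplicates.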

Submission Date: 2025-02-24

Revised Date

Accepted Date: 2025-03-11
