Research on Minimizing Access to RDF Triple Store for Efficiency in Constructing Massive Bibliographic Linked Data

Journal of Korean Library and Information Science Society, (P)2466-2542;
2017, v.48 no.3, pp.233-257
https://doi.org/10.16981/kliss.48.3.201709.233



Abstract

In this paper, we propose an efficient method for converting MEDLINE, the world's largest biomedical bibliographic database, into linked data. We first derive an appropriate RDF schema by analyzing the MEDLINE record structure in detail, and then convert each record into a valid RDF file that conforms to the derived schema. When merging all of the record-level RDF files into a single RDF triple store, we apply a dual batch registration method to streamline the check for duplicate subject URIs. With this method, the number of RDF triple store accesses needed for subject URI duplicate checking drops from 26,597,850 to 2,400, compared with building the linked data sequentially, one RDF file at a time. We therefore expect the results of this study to help eliminate inefficiency in converting large bibliographic record sets into linked data, and to make such conversion prompt and timely.
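
The abstract does not spell out how dual batch registration works, so the following Python sketch only illustrates the general idea it relies on: checking subject URIs for duplicates once per batch of RDF files instead of once per file. Everything in it is an assumption made for illustration, including the in-memory `CountingStore` stand-in for a triple store, the function names, and the choice to count one store access per duplicate-check call. It is not the authors' implementation.

```python
from typing import List, Set


class CountingStore:
    """Hypothetical in-memory stand-in for an RDF triple store.

    It only tracks subject URIs and counts duplicate-check accesses.
    """

    def __init__(self) -> None:
        self.subjects: Set[str] = set()
        self.accesses = 0

    def filter_existing(self, uris: Set[str]) -> Set[str]:
        """Return the URIs already stored; counts as one store access."""
        self.accesses += 1
        return uris & self.subjects

    def add_all(self, uris: Set[str]) -> None:
        """Register new subject URIs (writes are not counted here)."""
        self.subjects.update(uris)


def load_sequential(store: CountingStore, files: List[List[str]]) -> None:
    """Check duplicates once per RDF file (one access per file)."""
    for subjects in files:
        candidates = set(subjects)
        existing = store.filter_existing(candidates)
        store.add_all(candidates - existing)


def load_batched(store: CountingStore, files: List[List[str]],
                 batch_size: int) -> None:
    """Check duplicates once per batch of RDF files."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(files), batch_size):
        pending: Set[str] = set()
        for subjects in files[start:start + batch_size]:
            pending.update(subjects)  # duplicates inside the batch merge in memory
        existing = store.filter_existing(pending)
        store.add_all(pending - existing)
```

Under these assumptions, the sequential loader makes as many duplicate-check accesses as there are files, while the batched loader makes one per batch, and both leave the same set of subject URIs in the store. The batch size that yields the 2,400 accesses reported above is not given in the abstract.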

Keywords
MEDLINE, Linked data, RDF Schema, RDF triple store, Dual batch registration

