기계학습에 유효한 데이터 요건 및 선별: 공공데이터포털 제공 데이터 사례를 통해

Valid Data Conditions and Discrimination for Machine Learning: Case study on Dataset in the Public Data Portal

사물인터넷융복합논문지 / Journal of The Korea Internet of Things Society, (P)2799-4791;
2022, v.8 no.1, pp.37-43
https://doi.org/10.20465/kiots.2022.8.1.037
Hyo-Jung Oh (오효정, Jeonbuk National University)
Bo-Hyun Yun (윤보현, Mokwon University)

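Abstract (translated from Korean)

The foundation of artificial intelligence technology is learnable data. The variety and volume of data collected and produced by the government and private companies have recently grown exponentially, but this growth has not yet translated into data that can actually be used for machine learning. This study therefore discusses the conditions that data must satisfy to be usable for machine learning, and identifies the factors that degrade data quality through case studies. To this end, we selected representative cases in which prediction models were developed using public big data, collected the data needed to solve the actual problems from the Public Data Portal, and checked its quality. On this basis, we show how the data differs after valid-data selection criteria are applied and post-processing is performed. The ultimate aim of this study is to lay the groundwork for managing data quality, which must be settled before machine learning technology, the core of artificial intelligence, is developed, and for accumulating valid data.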

Keywords
Valid Data, Machine Learning, Data Discrimination, Quality of Data, Public Big Data

Abstract

The fundamental basis of AI technology is learnable data. The types and amounts of data collected and produced by the government and private companies have recently been increasing exponentially; however, this growth has not yet led to verified data that can actually be used for machine learning. This study discusses the conditions that data must meet to be usable for machine learning, and identifies factors that degrade data quality through case studies. To this end, two representative cases of developing a prediction model using public big data were selected, and data for actual problem solving was collected from the public data portal and checked for quality. We then show how the data differs after valid-data screening criteria are applied and post-processing is performed. The ultimate purpose of this study is to argue for the importance of data quality management, which must precede the development of machine learning technology, the core of artificial intelligence, and to lay the groundwork for accumulating valid data.


