Valid Data Conditions and Discrimination for Machine Learning: Case study on Dataset in the Public Data Portal

Journal of The Korea Internet of Things Society, (P)2799-4791;
2022, v.8 no.1, pp.37-43
https://doi.org/10.20465/kiots.2022.8.1.037


Abstract

The foundation of AI technology is learnable data. The types and volume of data collected and produced by the government or private companies are growing exponentially, yet this growth has not translated into verified data that can actually be used for machine learning. This study discusses the conditions that data must meet to be usable for machine learning and identifies, through case studies, the factors that degrade data quality. To this end, two representative cases of developing a prediction model from public big data were selected, and the data needed to solve the actual problems was collected from the public data portal. The case studies show how the results differ once valid-data screening criteria and post-processing are applied. The ultimate purpose of this study is to argue for the importance of data quality management and the accumulation of valid data, which must precede the development of machine learning technology, the core of artificial intelligence.

Keywords
Valid Data, Machine Learning, Data Discrimination, Quality of Data, Public Big Data

