A Proposal of Evaluation of Large Language Models Built Based on Research Data

Han Na-eun; 한나은; Seo Sujeong; 서수정; Um Jung-ho; 엄정호

doi:10.3743/KOSIM.2023.40.3.077

P-ISSN 1013-0799
E-ISSN 2586-2073
KCI

Journal of the Korean Society for Information Management, (P)1013-0799; (E)2586-2073

2023, v.40 no.3, pp.77-98

https://doi.org/10.3743/KOSIM.2023.40.3.077

Na-eun Han (KISTI)
Sujeong Seo (KISTI)
Jung-ho Um (KISTI)

Han, N., Seo, S., & Um, J. (2023). A Proposal of Evaluation of Large Language Models Built Based on Research Data. Journal of the Korean Society for Information Management, 40(3), 77-98. https://doi.org/10.3743/KOSIM.2023.40.3.077

Abstract

Large Language Models (LLMs) are becoming the major trend in natural language processing. These models are built on research data, but the types of research data used, their limitations, and the risks of using them remain unknown. This study presents a way to analyze and evaluate, from the perspective of research data, LLMs that were built with such data: LLaMA and LLaMA-based models such as Stanford's Alpaca and Vicuna from the Large Model Systems Organization, as well as OpenAI's ChatGPT. The quality evaluation focuses on the validity, functionality, and reliability criteria of Data Quality Management (DQM). We also examine the Holistic Evaluation of Language Models (HELM) to understand its evaluation criteria and discuss its limitations. On this basis, the study proposes quality evaluation criteria for LLMs built on research data and outlines directions for future development.

Keywords: Large Language Model (LLM), Quality Evaluation for LLM, Research Data Quality Management (DQM), Evaluation Criteria for LLM

Submission Date: 2023-08-16

Revised Date: 2023-09-04

Accepted Date: 2023-09-18
