This essay is a prompt response to Matthew Kirschenbaum and Rita Raley’s essay “AI and the University as a Service,” published in the most recent issue of PMLA. I diagnose the anticipated problem of higher education in the era of AI as the “global digitalization of the university and language.” By the digitalization of the university, I mean the online circulation of recorded courses. By the digitalization of language, I refer not only to the proliferation of online resources but also to the decomposition of language into machine-readable units and their redistribution. My positionality in interpreting these forms of digitalization is global, involving multilingual and multicultural contexts. The possibilities and challenges concern the use of LLMs for language education and writing pedagogy in higher education in two respects: 1) whether it is justifiable to train students on the language of LLM outputs, which is based on statistical probability rather than communicative intent; and 2) how to decolonize the English monopoly on language education and data structures. My suggestions are open-ended, reminding humanists of the widely accessible Korean resources and of a co-evolutionary approach to writing with AI. All we need in future higher education may be APT (AI Personal Training).
Most network analyses of narrative texts have focused on character interactions, often limiting their scope to social relationships as envisioned by social network analysis. This paper, however, presents a network analysis of the narrator and the main character in F. Scott Fitzgerald's The Great Gatsby, expanding on my previous research that examined characters as networks of words within a dramatic narrative. I conceptualize the narrator and characters as lexical networks derived from the novel's dialogues and narration. A "symptomatic reading" of a character's speech network uncovers hidden aspects of that character, such as Gatsby's obsessive desire for Daisy and fixation on the lost past. Furthermore, analyzing a character's ego network within the narrator's narration reveals how the narrative voice understands and portrays that character. Specifically, Gatsby's ego network exposes the narrator's preoccupation with physical appearances, his subtle male gaze, his speculation about Gatsby's mysterious past, and his narrative strategy to mythologize Gatsby through temporal and spatial movements. Finally, the bipartite network between the narrator and the character, mediated through shared words, illustrates points of convergence and divergence, emphasizing the stark contrast between Gatsby as a character and Nick as the narrator. This study demonstrates how computational literary criticism can contribute to digital humanities by providing a refined examination of literary texts while creatively employing digital methodologies.
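The abstract does not specify how the lexical networks were implemented; below is a minimal sketch, assuming sentence-level co-occurrence as the linking rule, of the three constructions it names: a character's speech network, an ego network inside the narrator's network, and a bipartite graph joining the two voices through shared words. The toy token lists and the helper name speech_network are illustrative stand-ins, not the study's data or code.

```python
# Illustrative sketch only: builds simple lexical networks of the kind the
# abstract describes (speech network, ego network, bipartite graph).
# The sentence-level co-occurrence window and the toy data are assumptions.
from itertools import combinations
import networkx as nx

def speech_network(sentences):
    """Build a word co-occurrence network from tokenized sentences."""
    g = nx.Graph()
    for tokens in sentences:
        for a, b in combinations(sorted(set(tokens)), 2):
            w = g.get_edge_data(a, b, {}).get("weight", 0)
            g.add_edge(a, b, weight=w + 1)
    return g

# Toy stand-ins for Gatsby's dialogue and Nick's narration (not the real corpus).
gatsby_speech = [["daisy", "voice", "money"], ["past", "repeat", "daisy"]]
nick_narration = [["gatsby", "smile", "past"], ["gatsby", "daisy", "green", "light"]]

g_speech = speech_network(gatsby_speech)
n_narr = speech_network(nick_narration)

# Ego network of a focal word inside the narrator's network.
ego = nx.ego_graph(n_narr, "gatsby", radius=1)

# Bipartite graph linking the two voices through the words they share.
shared = set(g_speech.nodes) & set(n_narr.nodes)
bi = nx.Graph()
bi.add_nodes_from(["GATSBY_SPEECH", "NICK_NARRATION"], bipartite=0)
bi.add_nodes_from(shared, bipartite=1)
bi.add_edges_from(("GATSBY_SPEECH", w) for w in shared)
bi.add_edges_from(("NICK_NARRATION", w) for w in shared)

print(sorted(ego.nodes), sorted(shared))
```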
This paper analyzes the discourse and meaning of ‘sovereignty’ in the First Republic of Korea using text-mining methods, based on 859 sovereignty-related articles published in Kyunghyang Shinmun, Dong-A Ilbo, and Chosun Ilbo during that era. Topic modeling, a natural language processing (NLP) technique that incorporates machine learning, was used to uncover the discourse surrounding the concept of sovereignty at the macro level, while co-word network analysis was employed to identify its meaning at the micro level. With respect to ‘sovereignty,’ the First Republic could be divided into two phases, an earlier period (1948-1955) and a later period (1956-1960). The discourse analysis yielded four domestic discourses (constitutional amendment, politics, social conditions, and elections) and four international discourses (maritime sovereignty, world-Cold War, world-independence, and the division of Korea). The focus of the discourse shifted from the international in the earlier period to the domestic in the later period, and the later change was concentrated in the discourse of elections. The meaning analysis showed, first, that both periods reflected an understanding of sovereignty as popular sovereignty alongside a recognition of its deficiencies. In the earlier period, ‘sovereignty’ was perceived as something to be transferred and restored, yet also as something infringed upon and restricted, especially in terms of territory. In the later period, ‘sovereignty’ became an object of defense and struggle, with elections presented as the means of its exercise, thereby lending the concept an active character.
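The abstract names topic modeling and co-word network analysis but not the toolchain; as a rough sketch of how the two levels of analysis can be combined in Python, the example below runs LDA with gensim and builds an article-level co-word network with networkx. The tiny pre-tokenized articles, the topic count, and the library choices are assumptions, not the study's actual configuration.

```python
# Illustrative sketch: LDA topic modeling plus a co-word network over the same
# articles. Library choice (gensim, networkx), the topic count, and the
# pre-tokenized input are assumptions, not the study's pipeline.
from itertools import combinations
from gensim import corpora, models
import networkx as nx

# Each article is assumed to be pre-tokenized into content words (e.g., nouns).
articles = [
    ["주권", "선거", "국민", "헌법"],
    ["주권", "영해", "어업", "침해"],
    ["주권", "독립", "냉전", "세계"],
]

# Macro level: topic modeling with LDA.
dictionary = corpora.Dictionary(articles)
corpus = [dictionary.doc2bow(doc) for doc in articles]
lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10, random_state=42)
for topic_id, words in lda.show_topics(num_topics=2, num_words=4, formatted=False):
    print(topic_id, [w for w, _ in words])

# Micro level: co-word network linking words that appear in the same article.
g = nx.Graph()
for doc in articles:
    for a, b in combinations(sorted(set(doc)), 2):
        w = g.get_edge_data(a, b, {}).get("weight", 0)
        g.add_edge(a, b, weight=w + 1)

# Words most strongly tied to '주권' (sovereignty) in the network.
neighbors = sorted(g["주권"].items(), key=lambda kv: kv[1]["weight"], reverse=True)
print([n for n, _ in neighbors])
```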
This paper aims to give a detailed account of the design and construction of two datasets built from the short stories of Yi Sang (李箱): the Yi Sang Short Story Basic Dataset and the Yi Sang Short Story Sense Dataset. It describes the structure, design intent, and uses of the two datasets, centering on the Basic Dataset, built from 13 of Yi Sang's short stories, and the Sense Dataset, in which the researcher labeled the sensory information found in the texts. The Basic Dataset is a machine-readable resource in which metadata are annotated at the sentence level, based on the Mineumsa and Somyeong Publishing editions. The Sense Dataset follows a researcher-designed sense classification model that divides sensory information broadly into physical and psychological senses and subdivides it into four hierarchical levels in total, with labels assigned at the sentence level. The constructed datasets provide a practical foundation for computational analysis of sensory patterns in Yi Sang's short stories and for other analytical methodologies such as emotion analysis, and they further open up the possibility of distant reading.
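The abstract specifies the shape of the annotation (sentence-level records, a top-level split between physical and psychological senses, four hierarchical levels) without naming a file format; the sketch below renders one plausible record layout as a Python dictionary. Every field name, label value, and the sample sentence is hypothetical, chosen only to make the described structure concrete.

```python
# Hypothetical record layout for the sentence-level sense annotations described
# in the abstract. Field names, label values, and the sample sentence are
# invented for illustration; they are not the dataset's actual schema.
import json

record = {
    "work": "날개",                  # story title (metadata from the Basic Dataset)
    "edition": "Mineumsa",           # source edition: Mineumsa or Somyeong Publishing
    "sentence_id": 17,               # sentence index within the story
    "sentence": "예시 문장입니다.",   # placeholder text, not a real quotation
    "sense": {                       # four-level sense hierarchy (labels are examples)
        "level1": "신체 감각",        # top level: physical vs. psychological sense
        "level2": "시각",
        "level3": "색채",
        "level4": "명도",
    },
}

print(json.dumps(record, ensure_ascii=False, indent=2))
```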
This study proposes the design and application of a subword-based morphological analyzer for the automated analysis of modern Korean texts written in the mixed Sino-Korean script (gukhanmun honyongche). The large-scale databases of modern documents built to date are difficult to process effectively with existing morphological analyzers because they mix Sino-Korean vocabulary with archaic Hangul. To address this problem, we present a new approach based on subword tokenization, using the sw_tokenizer of the kiwipiepy library. Using approximately 2.3 million newspaper and magazine articles from the 1890s to the 1940s (about 771.5 million syllables) as training data, we implemented models with three different vocab_size settings (32000, 48000, and 64000) and compared their performance. The experiments show that larger vocabulary sizes better preserve the semantic units of compound Sino-Korean words, and that researchers can select the unit of analysis appropriate to their research purposes. By providing a practical tool for the automated analysis of modern mixed-script materials, this study contributes to digital humanities research and suggests new directions for the field.
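The abstract names kiwipiepy's sw_tokenizer and the three vocab_size settings but includes no code; the sketch below outlines how such tokenizers might be trained and compared. It assumes the SwTokenizer / SwTokenizerConfig interface of the kiwipiepy sw_tokenizer module as documented; exact argument names, return values, and defaults may differ across versions, and the corpus path and sample sentence are placeholders.

```python
# Sketch (not the study's code): train subword tokenizers at three vocabulary
# sizes and compare how the same mixed-script sentence is segmented.
# The SwTokenizer.train() call and loading via SwTokenizer(path) are assumed
# from kiwipiepy's documented sw_tokenizer interface; "corpus.txt" and the
# sample sentence are placeholders.
from kiwipiepy.sw_tokenizer import SwTokenizer, SwTokenizerConfig

def iter_corpus(path="corpus.txt"):
    """Yield one document per line from a plain-text corpus file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield line

config = SwTokenizerConfig()  # default settings; the study's configuration is not given

for vocab_size in (32000, 48000, 64000):
    save_path = f"sw_tokenizer_{vocab_size}.json"
    SwTokenizer.train(save_path, iter_corpus(), config, vocab_size=vocab_size)
    tokenizer = SwTokenizer(save_path)  # load the tokenizer that train() saved
    ids = tokenizer.encode("國漢文混用體로 쓰인 新聞 記事")
    print(vocab_size, len(ids), tokenizer.decode(ids))
```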