Analyzing the Phenomena of Hate in Korea by Text Mining Techniques

김혜진

doi:10.4275/KSLIS.2022.56.4.431

P-ISSN1225-598X
E-ISSN2982-6292

Vol.56 No.4

Korean title: 텍스트마이닝 기법을 이용한 한국 사회의 혐오 양상 분석

Journal of the Korean Society for Library and Information Science (한국문헌정보학회지), P-ISSN 1225-598X; E-ISSN 2982-6292

2022, v.56 no.4, pp.431-453

https://doi.org/10.4275/KSLIS.2022.56.4.431

김혜진 (Kongju National University)

김혜진. (2022). Analyzing the Phenomena of Hate in Korea by Text Mining Techniques. Journal of the Korean Society for Library and Information Science, 56(4), 431-453. https://doi.org/10.4275/KSLIS.2022.56.4.431

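Abstract (translated from the Korean)

Hate is the collective expression of exclusivity toward others, and it is produced and reproduced through mistaken public perceptions. To explore at a macro level the patterns of ‘hate’ discussed in Korean society, this study applied text mining techniques to 17,867 news items published from 1990 to 2020 and carried out keyword network and cluster analyses. Before extracting words, the articles were first split into sentences in a preprocessing step. A total of 52,520 sentences containing the words ‘혐오’ (hate), ‘편견’ (prejudice), or ‘차별’ (discrimination) were extracted and used to build a keyword network made up of words adjacent to the word ‘hate’. The word co-occurrence frequency analysis of the collected news data showed that the targets appearing most often in connection with hate in Korean society were women, race, and sexual minorities, and that the related issues were laws and crimes concerning these groups. Cluster analysis of the keyword network found six hate clusters: gender (41.4%), minorities (28.7%), race and ethnicity (15.1%), selective and interest-based (8.5%), political and ideological (5.7%), and environmental and survival-related (0.3%) hate. The discussion extracts and analyzes all hate targets that did not emerge concretely from the cluster analysis.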
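
The preprocessing step described above (splitting articles into sentences, then keeping only sentences that contain a target word) can be sketched as follows. This is a minimal illustration, not the study's implementation: the regex-based splitter, the function names `split_sentences` and `filter_sentences`, and the sample articles are all assumptions, since the abstract does not say which tools were used.

```python
import re

# Target words named in the abstract: hate, prejudice, discrimination.
TARGET_WORDS = ("혐오", "편견", "차별")


def split_sentences(article: str) -> list[str]:
    """Naive splitter (an assumption): break on whitespace that follows . ! or ?"""
    parts = re.split(r"(?<=[.!?])\s+", article.strip())
    return [p for p in parts if p]


def filter_sentences(articles: list[str], targets=TARGET_WORDS) -> list[str]:
    """Keep only sentences that contain at least one target word as a substring."""
    kept = []
    for article in articles:
        for sentence in split_sentences(article):
            if any(word in sentence for word in targets):
                kept.append(sentence)
    return kept


# Made-up sample articles, for illustration only.
articles = [
    "온라인에서 혐오 표현이 늘고 있다. 날씨는 맑았다.",
    "차별을 금지하는 법안이 논의되었다! 회의는 오후에 끝났다.",
]
selected = filter_sentences(articles)
```

Substring matching is used here because Korean particles attach directly to nouns (for example ‘차별을’), so an exact token match would miss many sentences. A real pipeline would likely rely on a morphological analyzer instead.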

keywords: Co-word analysis, Gerontophobia, LGBTQ hate, Misogyny, Xenophobia, Text-mining

Abstract

Hate is a collective expression of exclusivity toward others, and it is fostered and reproduced through false public perceptions. This study explores the targets and issues of hate discussed in Korean society using text mining techniques. To this end, we collected 17,867 news items published from 1990 to 2020 and performed co-word network and cluster analyses. To derive a co-word network closely related to hate, we split the articles into sentences in the preprocessing phase and extracted a total of 52,520 sentences containing the words ‘hate’, ‘prejudice’, or ‘discrimination’. The word co-occurrence frequency analysis of the collected news data showed that the targets appearing most frequently in relation to hate were women, race, and sexual minorities, and that the related issues were laws and crimes concerning these groups. Cluster analysis of the co-word network identified six hate-related clusters. The largest was ‘genderphobic’, accounting for 41.4% of the total, followed by ‘sexual minority hatred’ at 28.7%, ‘racial hatred’ at 15.1%, ‘selective hatred’ at 8.5%, ‘political hatred’ at 5.7%, and ‘environmental hatred’ at 0.3%. In the discussion, we extracted and analyzed all specific hate targets in the collected news data that the cluster analysis did not explicitly reveal.
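
To make the co-word network idea concrete, the sketch below counts how often two words appear in the same sentence, which yields the weighted edges of such a network. It is a hedged illustration only: the function name `coword_counts`, the choice to count each pair once per sentence, and the pre-tokenized sample sentences are assumptions, and the study's actual weighting and clustering method is not specified in the abstract.

```python
from collections import Counter
from itertools import combinations


def coword_counts(tokenized_sentences):
    """Count how often each unordered word pair occurs within the same sentence."""
    pair_counts = Counter()
    for tokens in tokenized_sentences:
        # Deduplicate and sort so each pair is counted once per sentence
        # and always stored in the same (alphabetical) order.
        unique = sorted(set(tokens))
        pair_counts.update(combinations(unique, 2))
    return pair_counts


# Made-up, pre-tokenized sentences, for illustration only.
sentences = [
    ["hate", "women", "crime"],
    ["hate", "women", "law"],
    ["hate", "race", "crime"],
]
edges = coword_counts(sentences)

# Edges that touch the word "hate", i.e. its neighbors in the network.
hate_edges = {pair: n for pair, n in edges.items() if "hate" in pair}
```

Each key in `edges` is a word pair and each value is its co-occurrence count. A clustering or community detection step would then run on this weighted edge list to group related words.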
