A Study on Frequency of Subject on Content of Thesis in Field of Science and Technology

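Subject terms, such as those used in subject indexing, are commonly used to search for and access documents. It is therefore reasonable to expect a close relationship between a document's content and its subject terms. Starting from this question, this study examined how subject terms are distributed across the full text of theses. The corpus was limited to master's theses, because their body text is relatively well structured and standardized as digital content. The subject terms assigned by each thesis's author were used. Each full text was divided into six content locations: ‘Contents’, ‘Introduction’, ‘Theoretical Background’, ‘Main Body’, ‘Conclusion’, and ‘References’. The occurrence rate of subject terms was then measured at each location. On average, the full text of a thesis in the corpus contained 1,226.3 terms and 5,152.3 term occurrences, and the author-assigned subject terms numbered 12 to 13. The occurrence rate of subject terms was highest in ‘Contents’ (11.4%) and ‘Introduction’ (11.2%), both about 11%, followed by ‘Conclusion’ (9.8%).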
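The per-location measurement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's procedure. The section headings are detected as exact heading lines. Terms are simple lowercase word tokens. The "occurrence rate" is assumed to mean subject-term occurrences divided by all term occurrences within a location, because the abstract does not define it. The helper names `split_sections` and `occurrence_rates` are hypothetical.

```python
import re

# The six content locations used in the study. Detecting them by exact
# heading lines is a simplifying assumption for this sketch.
SECTIONS = ("Contents", "Introduction", "Theoretical Background",
            "Main Body", "Conclusion", "References")


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens stand in for the study's
    # term extraction, which the abstract does not describe.
    return re.findall(r"[a-z0-9]+", text.lower())


def split_sections(full_text):
    """Group lines under the most recent heading that exactly matches a SECTIONS label."""
    sections = {}
    current = None
    for line in full_text.splitlines():
        heading = line.strip()
        if heading in SECTIONS:
            current = heading
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines) for name, lines in sections.items()}


def occurrence_rates(full_text, subject_terms):
    """Per section: subject-term occurrences divided by all term occurrences.

    This ratio is an assumed definition of the occurrence rate. Only
    single-word subject terms are matched in this sketch.
    """
    terms = {t.lower() for t in subject_terms}
    rates = {}
    for name, body in split_sections(full_text).items():
        tokens = tokenize(body)
        hits = sum(1 for tok in tokens if tok in terms)
        rates[name] = hits / len(tokens) if tokens else 0.0
    return rates


# Toy thesis text (hypothetical) with three of the six locations present.
thesis = """Contents
Introduction to metadata indexing
Introduction
Metadata supports indexing and retrieval of documents
Conclusion
Indexing quality depends on metadata
"""
rates = occurrence_rates(thesis, ["metadata", "indexing"])
for name, rate in rates.items():
    print(f"{name}: {rate:.1%}")
```

Multi-word subject terms would need phrase matching, for example over token n-grams. For Korean text, a morphological analyzer would replace the regex tokenizer.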

Keywords: digital content, table of contents, subject term, frequency of subject terms, distribution of subject terms, division of full text, content location within the full text

Abstract

Subject terms, such as those used in subject indexing, are generally used to search for and access documents. It follows that there should be some relationship between a document's full text and its subject terms, and this study starts from that question. Master's theses in the field of science and technology were used because their full text is relatively well formatted. The study examines where subject terms occur in a thesis. It looks at the distribution patterns of subject terms across the content locations of the full text: ‘Contents’, ‘Introduction’, ‘Theory’, ‘Main subject’, ‘Conclusion’, and ‘References’. On average, a thesis was composed of 1,226.3 terms, and its subject terms numbered 12 to 13. The results show that ‘Contents’ and ‘Introduction’ had the highest frequency of subject terms.
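The corpus-level averages reported above are means of per-thesis counts. A minimal sketch of that calculation follows. It assumes simple regex tokenization rather than the study's term extraction. It counts a thesis's "terms" as distinct word types and its "term occurrences" as all tokens. The function name `corpus_averages` and the sample data are hypothetical.

```python
import re
from statistics import mean


def tokenize(text):
    # Assumption: lowercase alphanumeric tokens stand in for the study's terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def corpus_averages(theses, subject_term_lists):
    """Return mean distinct terms, mean term occurrences, and mean subject terms per thesis."""
    token_lists = [tokenize(text) for text in theses]
    avg_terms = mean(len(set(tokens)) for tokens in token_lists)   # types
    avg_occurrences = mean(len(tokens) for tokens in token_lists)  # tokens
    avg_subject_terms = mean(len(terms) for terms in subject_term_lists)
    return avg_terms, avg_occurrences, avg_subject_terms


# Hypothetical two-thesis corpus with author-assigned subject terms.
theses = ["metadata indexing metadata", "text categorization feature selection"]
subject_terms = [
    ["metadata", "indexing"],
    ["text categorization", "feature selection", "mutual information"],
]
averages = corpus_averages(theses, subject_terms)
print(averages)
```

Fractional figures such as 1,226.3 arise naturally here, because each value is a mean over all theses in the corpus.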


Published in 정보관리학회지 (Journal of the Korean Society for Information Management), Vol. 25, No. 1.