In this paper, we propose methodologies for classifying patent literature by examining feature extraction methods together with machine learning and deep learning models, and we identify the best-performing combination through experiments. For feature extraction, we compared the traditional bag-of-words (BoW) method with a distributed representation method (word embedding vectors). For constructing the document collection, we compared morphological analysis with multi-gram. Classification performance was then verified with a traditional machine learning model and a deep learning model. The experimental results show that the best performance is achieved when the deep learning model is applied with feature extraction based on distributed representations and morphological analysis. In the Section, Class, and Subclass classification experiments, performance improved by 5.71%, 18.84%, and 21.53%, respectively, compared with traditional classification methods.
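As an illustration of the BoW baseline mentioned above, the following is a minimal sketch, not the implementation used in the experiments. It assumes that "multi gram" means combining word n-grams of several sizes, and it uses whitespace tokenization in place of a morphological analyzer. The function names (`word_ngrams`, `build_vocabulary`, `bow_vector`) and the two sample documents are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n_values=(1, 2)):
    """Return word n-grams of every size in n_values (assumed reading of 'multi gram')."""
    words = text.split()  # whitespace split; a morphological analyzer would go here
    grams = []
    for n in n_values:
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams


def build_vocabulary(token_lists):
    """Map each distinct token to a column index, in sorted order."""
    distinct = sorted({tok for tokens in token_lists for tok in tokens})
    return {tok: idx for idx, tok in enumerate(distinct)}


def bow_vector(tokens, vocab):
    """Build a count vector over the vocabulary; unknown tokens are ignored."""
    counts = Counter(tokens)
    vector = [0] * len(vocab)
    for tok, idx in vocab.items():
        vector[idx] = counts.get(tok, 0)
    return vector


# Hypothetical sample documents, for illustration only.
docs = ["fuel cell stack", "fuel cell membrane"]
token_lists = [word_ngrams(d) for d in docs]
vocab = build_vocabulary(token_lists)
vectors = [bow_vector(tokens, vocab) for tokens in token_lists]
```

Each document becomes a sparse count vector whose length equals the vocabulary size. A distributed representation would instead map each token to a dense, learned vector, which is the contrast the experiments evaluate.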