Malicious URL Detection by Visual Characteristics with Machine Learning: Roles of HTTPS

HONG Sung-Won; 홍성원; KANG Min-Soo; 강민수

doi:10.24225/jkaia.2023.1.2.1

E-ISSN 3022-5388

Vol.1, No.2

Journal of Korean Artificial Intelligence Association, (E)3022-5388

2023, v.1 no.2, pp.1-9

https://doi.org/10.24225/jkaia.2023.1.2.1

Sung-Won HONG (Eulji University)
Min-Soo KANG (Eulji University)

HONG, S., & KANG, M. (2023). Malicious URL Detection by Visual Characteristics with Machine Learning: Roles of HTTPS. Journal of Korean Artificial Intelligence Association, 1(2), 1-9, https://doi.org/10.24225/jkaia.2023.1.2.1

Abstract

In this paper, we present a new method for classifying malicious URLs that is intended to lower the learning barrier created by unfamiliar and difficult information-security terminology. We extract only visually distinguishable features from the URL structure, compare supervised learning algorithms on those features, and then examine the feature contributions of the best-performing algorithm to identify the features that have the greatest impact on classifying malicious URLs. The research data come from a Kaggle dataset containing 7,046 malicious URLs and 7,046 normal URLs. Among the three supervised learning algorithms used (Decision Tree, Support Vector Machine, and Logistic Regression), the Decision Tree algorithm showed the best performance, with 83% accuracy, 83.1% F1-score, and 83.6% recall. The feature contributions of the Decision Tree confirmed that, among the visually distinguishable features (use of HTTPS, subdomain, and prefix and suffix), the use of HTTPS contributes the most. Because learning unfamiliar and difficult terms has been an obstacle so far, this study is expected to provide an intuitive judgment method that requires no explanation of such terms and to prove useful in the field of malicious URL detection.
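
To make the idea of visually distinguishable features concrete, the following is a minimal Python sketch of how the three features named in the abstract might be read off a URL. It is not the authors' implementation: the function name `visual_features`, the feature keys, and all three feature definitions are assumptions made for illustration. In particular, the sketch assumes that "prefix and suffix" means a hyphen in the host name, that a leading "www" does not count as a subdomain, and that a host with more than two labels has a subdomain (a heuristic that misreads hosts such as example.co.uk).

```python
from urllib.parse import urlsplit


def visual_features(url: str) -> dict:
    """Return three 0/1 features that a person could read off the URL itself.

    All definitions below are illustrative assumptions, not the paper's.
    """
    parts = urlsplit(url)
    # hostname is None when the URL has no scheme or "//" (e.g. "example.com/x"),
    # in which case every feature below evaluates to 0.
    host = (parts.hostname or "").lower()
    labels = [label for label in host.split(".") if label]

    # Assumption: a leading "www" is not counted as a subdomain.
    if labels and labels[0] == "www":
        labels = labels[1:]

    return {
        # 1 if the scheme is HTTPS, else 0.
        "uses_https": int(parts.scheme.lower() == "https"),
        # Assumption: more than two labels (a.example.com) implies a subdomain.
        "has_subdomain": int(len(labels) > 2),
        # Assumption: "prefix and suffix" means a hyphen in the host name.
        "has_prefix_suffix": int("-" in host),
    }


if __name__ == "__main__":
    # Hypothetical example URLs, not taken from the paper's dataset.
    for example in ("https://www.example.com/login",
                    "http://secure-login.pay.example.com/"):
        print(example, visual_features(example))
```

Feature dictionaries of this kind could then be turned into rows of a feature matrix and passed to any supervised classifier, such as the Decision Tree, Support Vector Machine, and Logistic Regression models compared in the study.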