E-ISSN : 2508-7894
Indonesia ranks fifth as a country of origin for spammers. Tackling spam, especially in Bahasa Indonesia (the Indonesian language), therefore needs urgent attention, and building an effective spam detection model is one way to address it. This study aims to compare machine learning models for spam detection, model the topics of spam emails, and design an implementation as a REST API. Spam detection is carried out using six machine learning algorithms, namely Long Short-Term Memory (LSTM), K-Nearest Neighbours (KNN), Naive Bayes, Random Forest, AdaBoost, and Support Vector Machine (SVM), combined with two slang preprocessing steps: convert and translate. Latent Dirichlet Allocation (LDA) is then used for topic modeling of the spam emails. The results show that the slang convert and translate steps can improve accuracy and F1-score, and that LSTM was the best method, with an accuracy of 93.15% and an F1-score of 93.01%. In addition, five main topics were found in the data categorized as spam: promotions, job vacancies, educational offers, bulletins and news, and investment and finance. A REST API model was successfully developed to separate spam into promotional and other topic categories.
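To make the pipeline concrete, the following is a minimal Python sketch of a slang "convert" step feeding a Naive Bayes spam classifier. It is an illustration under stated assumptions, not the study's implementation: the `SLANG_MAP` entries, the `normalize` helper, the `NaiveBayes` class, and the four training messages are all hypothetical, and the classifier is a simple standard-library multinomial Naive Bayes with add-one smoothing rather than the models evaluated in the study.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical slang dictionary: illustrative entries only, not the study's lexicon.
SLANG_MAP = {"gratisss": "gratis", "yg": "yang", "utk": "untuk", "dgn": "dengan"}


def normalize(text):
    """Lowercase, tokenize, and replace slang tokens with their standard form."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SLANG_MAP.get(tok, tok) for tok in tokens]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(normalize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for tok in normalize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented purely for illustration.
train_texts = [
    "promo gratis utk anda, klik sekarang",
    "diskon besar promo gratisss hari ini",
    "rapat tim dijadwalkan besok pagi",
    "laporan proyek sudah saya kirim",
]
train_labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(train_texts, train_labels)
```

Because slang variants such as "gratisss" and "utk" are mapped to their standard forms before counting, they contribute to the same token statistics as "gratis" and "untuk", which is one plausible reason such a step could help a classifier.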