Research Article | Volume 3 Issue 1 - 2026
A Comparative Study of Arabic Text Classification using k-NN, SVM, and Naive Bayes
Salih Saad Qarash*
General Electricity Company, Libyan Arab Jamahiriya
*Corresponding Author: Salih Saad Qarash, General Electricity Company, Libyan Arab Jamahiriya.
Abstract
Arabic text classification is a critical task in natural language processing, yet it remains challenging because of the language’s morphological complexity and the scarcity of annotated datasets. This study presents a comparative evaluation of three classical machine learning algorithms, k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), and Naive Bayes, for multi-category Arabic text classification. We employ a curated dataset of 700 articles from Al-Hayat newspaper, evenly distributed across seven categories: Technology, Economy, Sports, General News, Science, Culture, and Politics. The texts, written in Modern Standard Arabic, undergo standard preprocessing: normalization, tokenization, stopword removal, and light stemming. Models are evaluated on accuracy, precision, recall, and F1-score. Experimental results show that SVM achieves the highest performance, with 89.3% accuracy and an 88.8% F1-score, followed by Naive Bayes (86.4% accuracy) and k-NN (79.3% accuracy). The findings confirm SVM as the most effective of the three classical models for this task, while Naive Bayes offers a computationally efficient alternative. k-NN underperforms, particularly in high-dimensional spaces. This work provides a reproducible benchmark for Arabic text classification and highlights the importance of preprocessing and feature representation. The results serve as a foundation for future research, including the integration of deep learning models and expansion to dialectal Arabic content.
Keywords: Text Classification; KNN; SVM; Naive Bayes; TF-IDF; Machine Learning
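The four preprocessing steps named in the abstract (normalization, tokenization, stopword removal, light stemming) can be illustrated with a minimal Python sketch. The abstract does not specify the exact rules used, so everything below is an assumption chosen for illustration: the normalization rules, the short stopword list, the affix lists, and the function names `normalize`, `light_stem`, and `preprocess` are hypothetical and are not taken from the article.

```python
import re

# All rules below are illustrative assumptions, not the article's actual pipeline.
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")    # short vowels, shadda, sukun, tatweel
ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")  # alef with madda / hamza above / hamza below
TOKEN = re.compile(r"[\u0621-\u064A]+")              # runs of Arabic letters


def normalize(text: str) -> str:
    """Strip diacritics and tatweel, then unify common letter variants."""
    text = DIACRITICS.sub("", text)
    text = ALEF_VARIANTS.sub("\u0627", text)          # all alef forms -> bare alef
    text = text.replace("\u0649", "\u064A")           # alef maqsura -> ya
    return text.replace("\u0629", "\u0647")           # ta marbuta -> ha


# Tiny example stopword list, stored in normalized form so lookups match tokens.
STOPWORDS = frozenset(
    normalize(w) for w in ("في", "من", "على", "إلى", "عن", "أن", "هذا", "هذه")
)

# Affixes are tried in order, longest first; at most one prefix and one suffix are removed.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ه", "ي")


def light_stem(word: str, min_len: int = 3) -> str:
    """Remove one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


def preprocess(text: str) -> list[str]:
    """Normalize, tokenize, drop stopwords, and light-stem the remaining tokens."""
    tokens = TOKEN.findall(normalize(text))
    return [light_stem(t) for t in tokens if t not in STOPWORDS]
```

Calling `preprocess` on an article body returns a list of stemmed tokens that could then be passed to a feature extractor such as TF-IDF, which the keywords mention. A production pipeline would likely use a full stopword lexicon and an established light stemmer rather than these short lists.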
Citation
Salih Saad Qarash. “A Comparative Study of Arabic Text Classification using k-NN, SVM, and Naive Bayes”. Clareus Scientific Science and Engineering 3.1 (2026): 29-37.
Copyright
© 2026 Salih Saad Qarash. Licensee Clareus Scientific Publications. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.