Comparative Study of Supervised Classification Techniques With a Modified KNN Algorithm

Noah Owusu, University of Texas at El Paso

Abstract

The goal of classification is to develop a model that accurately assigns new observations to labeled classes based on patterns learned from training data. The K-nearest neighbors (KNN) algorithm is popular and widely used for classification; however, its performance can be adversely affected by outliers in a dataset. In this study, we modify the existing KNN algorithm to alleviate the effect of outliers and thereby improve its performance. We compare the Modified KNN method with the existing KNN algorithm and six other machine learning algorithms: Naive Bayes, Random Forest, Support Vector Machine (SVM Linear), Logistic Regression (logit), Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis (QDA). Using simulated data and the HCV dataset available from the UCI Machine Learning Repository (HCV data 2020), we compare these techniques in terms of F1 score, AUC-ROC, and accuracy. The simulation study shows that the Modified KNN method outperforms the existing KNN algorithm on simulated datasets containing different proportions of outliers. On the real data, the Modified KNN method also outperforms the existing KNN algorithm in predicting Hepatitis C. The performance evaluations confirm the validity of the Modified KNN method.
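
For readers unfamiliar with the three evaluation measures named in the abstract, the following is a minimal Python sketch of how accuracy, F1 score, and AUC-ROC can be computed for a binary problem. It is not taken from the dissertation: the function names, the choice of label 1 as the positive class, and the toy labels and scores are all illustrative assumptions, and the AUC is computed with the standard pairwise-ranking definition rather than any particular library routine.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of observations whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there are no positives at all.
    return 2 * tp / denom if denom else 0.0


def auc_roc(y_true: Sequence[int], scores: Sequence[float], positive: int = 1) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and classifier scores for six observations.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.7]
# Hard predictions from an assumed decision threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(f"accuracy={acc:.3f}  F1={f1:.3f}  AUC-ROC={auc:.3f}")
```

Accuracy and F1 depend on the hard predictions, and so on the chosen threshold, whereas AUC-ROC depends only on how the scores rank positives against negatives.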

Subject Area

Statistics|Computer science|Computer Engineering

Recommended Citation

Owusu, Noah, "Comparative Study of Supervised Classification Techniques With a Modified KNN Algorithm" (2023). ETD Collection for University of Texas, El Paso. AAI30575815.
https://scholarworks.utep.edu/dissertations/AAI30575815
