Date of Award

2023-08-01

Degree Name

Master of Science

Department

Mathematical Sciences

Advisor(s)

Abhijit M. Mandal

Abstract

The goal of classification is to develop a model that accurately assigns new observations to labeled classes based on patterns learned from training data. The K-nearest neighbors (KNN) algorithm is a popular and widely used classification algorithm; however, its performance can be adversely affected by the presence of outliers in a dataset. In this study, we modified the existing KNN algorithm to alleviate the effect of outliers in a dataset, thereby improving its performance. We compared the performance of the Modified KNN method with that of the existing KNN algorithm and six other machine learning algorithms: Naive Bayes, Random Forest, Support Vector Machine (SVM Linear), Logistic Regression (logit), Linear Discriminant Analysis (LDA), and Quadratic Discriminant Analysis (QDA). Using simulated data and the HCV dataset available from the UCI Machine Learning Repository (HCV data2020), we compared these techniques in terms of F1 score, AUC-ROC, and accuracy. The simulation study revealed that the Modified KNN method outperforms the existing KNN algorithm when applied to simulated datasets containing different proportions of outliers. With the real data, the Modified KNN method also outperforms the existing KNN algorithm in predicting Hepatitis C. The performance evaluations confirm the validity of the Modified KNN method.

Language

en

Provenance

Received from ProQuest

File Size

63 p.

File Format

application/pdf

Rights Holder

Noah Owusu
