Date of Award

2023-05-01

Degree Name

Master of Science

Department

Mathematical Sciences

Advisor(s)

Abhijit Mandal

Abstract

Owing to advances in technology and increasingly diverse sources of data collection, high-dimensional data are now available in many fields, including healthcare, bioinformatics, medicine, epidemiology, economics, finance, sociology, and climatology. Outliers are common in such datasets because of technical errors, heterogeneous sources, or the effect of confounding variables. Because outliers are often difficult to detect in high-dimensional data, standard approaches may fail to model such data and may produce misleading results. In this thesis, we studied Huber's and Tukey's M-estimators for linear regression, which automatically down-weight outliers and provide a good fit. We also investigated two variable selection methods: LASSO and LAD-LASSO. In addition, we performed a simulation study to compare the different estimators on pure and contaminated data. Finally, we analyzed cardiovascular data to model systolic and diastolic blood pressure. The results show that Huber's and Tukey's M-estimators perform better for this dataset.

Language

en

Provenance

Received from ProQuest

File Size

p.

File Format

application/pdf

Rights Holder

Jagannath Das
