Comparison of Different Robust Methods in Linear Regression and Applications in Cardiovascular Data

Jagannath Das, University of Texas at El Paso

Abstract

Advances in technology and the growing range of data sources have made high-dimensional data available in many fields, including healthcare, bioinformatics, medicine, epidemiology, economics, finance, sociology, and climatology. Outliers are common in such datasets and can arise from technical errors, heterogeneous sources, or the effects of confounding variables. Because outliers are often difficult to detect in high-dimensional data, standard approaches may fail to model such data and can produce misleading results. In this thesis, we studied the Huber and Tukey M-estimators for linear regression, which automatically down-weight outliers and therefore give a fit that is less sensitive to them. We also investigated two variable selection methods, LASSO and LAD-LASSO. In addition, we performed a simulation study to compare the estimators on pure and contaminated data. Finally, we analyzed cardiovascular data to model systolic and diastolic blood pressure. The results show that the Huber and Tukey M-estimators perform better for this dataset.
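To illustrate how an M-estimator down-weights outliers, the following is a minimal sketch and not the implementation used in the thesis. It assumes a single predictor, iteratively reweighted least squares started from the ordinary least-squares fit, a MAD-based scale estimate, and the commonly used tuning constants 1.345 (Huber) and 4.685 (Tukey bisquare). All function names and the toy data are hypothetical.

```python
import statistics


def huber_weight(u, c=1.345):
    """Huber weight: 1 for |u| <= c, c/|u| beyond that."""
    a = abs(u)
    return 1.0 if a <= c else c / a


def tukey_weight(u, c=4.685):
    """Tukey bisquare weight: (1 - (u/c)^2)^2 for |u| < c, 0 beyond that."""
    return (1.0 - (u / c) ** 2) ** 2 if abs(u) < c else 0.0


def weighted_line(x, y, w):
    """Weighted least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    sw = sum(w)
    xb = sum(wi * xi for wi, xi in zip(w, x)) / sw
    yb = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xb) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xb) * (yi - yb) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return yb - b1 * xb, b1


def m_estimate_line(x, y, weight_fn, n_iter=50):
    """M-estimate of a line by iteratively reweighted least squares.

    Assumption: starts from the OLS fit (all weights 1) and re-estimates
    the residual scale in every iteration as 1.4826 * MAD.
    """
    w = [1.0] * len(x)
    b0, b1 = weighted_line(x, y, w)
    for _ in range(n_iter):
        res = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]
        med = statistics.median(res)
        scale = 1.4826 * statistics.median([abs(r - med) for r in res])
        if scale == 0:  # residuals carry no spread; nothing to reweight
            break
        w = [weight_fn(r / scale) for r in res]
        b0, b1 = weighted_line(x, y, w)
    return b0, b1


# Toy data (hypothetical): y = 2 + 3x plus small noise, last point contaminated.
x = [float(i) for i in range(10)]
noise = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05, 0.1, 0.0]
y = [2.0 + 3.0 * xi + ei for xi, ei in zip(x, noise)]
y[9] = 60.0  # outlier; the uncontaminated value would be 29

ols_fit = weighted_line(x, y, [1.0] * len(x))
huber_fit = m_estimate_line(x, y, huber_weight)
tukey_fit = m_estimate_line(x, y, tukey_weight)
print("OLS  :", ols_fit)
print("Huber:", huber_fit)
print("Tukey:", tukey_fit)
```

In this toy example the single contaminated point pulls the least-squares slope well away from 3, while both robust fits stay close to it. The Huber weight shrinks the outlier's influence gradually, whereas the Tukey weight drops to zero for sufficiently large residuals, so the point is ignored entirely.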

Subject Area

Statistics, Information Technology, Applied Mathematics

Recommended Citation

Das, Jagannath, "Comparison of Different Robust Methods in Linear Regression and Applications in Cardiovascular Data" (2023). ETD Collection for University of Texas, El Paso. AAI30521760.
https://scholarworks.utep.edu/dissertations/AAI30521760
