Date of Award
Master of Science
Amid the rapid growth of big data, managing high-dimensional datasets across diverse fields and industries has become a significant challenge. Conventional statistical methods struggle with this complexity, which complicates analysis. In response, we have formulated a robust estimator designed to withstand outliers and heavy-tailed errors. Our approach integrates the SCAD penalty into the Density Power Divergence (DPD) method, shrinking insignificant coefficients to exactly zero, which improves the precision of the analysis and the reliability of the results. We benchmark our robust, penalized model against existing techniques such as Huber, Tukey, LASSO, LAD, and LAD-LASSO. Using both simulated data and datasets from the UCI Machine Learning Repository, we assess each method's performance by RMPE, sensitivity, specificity, and mean dimension reduction. In simulations, BIC(DPD) and EBIC(DPD) consistently yielded the lowest RMPE values across outlier proportions (0%, 5%, 10%) and signal-to-noise ratios (0.5, 1, 5) as the sample size increased from 100 to 500. Cp(DPD) exhibited strong sensitivity and surpassed LASSO and LAD-LASSO in achieving dimension reduction in high-dimensional data. Because of computational complexity, the number of predictors our model could include was limited. Future research should expand this aspect and validate established methods against our proposal, the Robust Penalized Density Power Divergence Regression with SCAD penalty.
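To illustrate how a SCAD penalty can be combined with a Density Power Divergence loss, the following minimal Python sketch may help. It is not taken from the thesis. It assumes the standard SCAD form of Fan and Li (with the conventional default a = 3.7) and the DPD objective for a linear model with normal errors. The function names, the tuning parameter alpha, and the unscaled sum of penalties are illustrative assumptions, and the thesis may define or scale these terms differently.

```python
import math


def scad_penalty(b, lam, a=3.7):
    """SCAD penalty for one coefficient b (standard Fan-Li form; a=3.7 is the usual default)."""
    t = abs(b)
    if t <= lam:
        # Small coefficients: same linear penalty as LASSO.
        return lam * t
    if t <= a * lam:
        # Intermediate coefficients: quadratic transition.
        return (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    # Large coefficients: constant penalty, so they are not shrunk further.
    return lam**2 * (a + 1) / 2


def dpd_loss(beta, sigma, X, y, alpha):
    """DPD objective for y = X beta + e, e ~ N(0, sigma^2), with alpha > 0 (assumed model)."""
    c = (2 * math.pi) ** (-alpha / 2) * sigma ** (-alpha)
    total = 0.0
    for xi, yi in zip(X, y):
        r = yi - sum(xj * bj for xj, bj in zip(xi, beta))
        # Large residuals get exponentially small weight, which limits outlier influence.
        total += math.exp(-alpha * r * r / (2 * sigma**2))
    return c / math.sqrt(1 + alpha) - (1 + alpha) / alpha * c * total / len(y)


def penalized_objective(beta, sigma, X, y, alpha, lam, a=3.7):
    """Hypothetical combined objective: DPD loss plus the sum of SCAD penalties."""
    return dpd_loss(beta, sigma, X, y, alpha) + sum(scad_penalty(b, lam, a) for b in beta)
```

In this sketch, an estimator would be obtained by minimizing `penalized_objective` over `beta` (and possibly `sigma`) for a chosen `lam`. A criterion such as Cp, BIC, or EBIC, as named in the abstract, could then be used to select `lam`.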
Received from ProQuest
Maxwell Kwesi Mac-Ocloo
Mac-Ocloo, Maxwell Kwesi, "Robust Penalized Density Power Divergence Regression With Scad Penalty For High Dimensional Data Analysis" (2023). Open Access Theses & Dissertations. 3920.