Robust Penalized Density Power Divergence Regression With SCAD Penalty for High Dimensional Data Analysis

Maxwell Kwesi Mac-Ocloo, University of Texas at El Paso


With the rapid growth of big data, managing high-dimensional datasets has become a significant challenge across many fields and industries. Conventional statistical methods struggle with the complexity of such data, particularly when outliers or heavy-tailed errors are present. In response, we developed a robust estimator designed to withstand outliers and heavy-tailed errors. Our approach integrates the SCAD penalty into the Density Power Divergence (DPD) method, shrinking the coefficients of irrelevant predictors to exactly zero and thereby performing variable selection alongside robust estimation. We benchmark our robust penalized model against existing techniques, namely Huber, Tukey, LASSO, LAD, and LAD-LASSO. Using both simulated data and datasets from the UCI Machine Learning Repository, we assess performance in terms of RMPE, sensitivity, specificity, and mean dimension reduction. In simulations, BIC(DPD) and EBIC(DPD) consistently yielded the lowest RMPE values across outlier proportions (0%, 5%, 10%) and signal-to-noise ratios (0.5, 1, 5) as the sample size increased from 100 to 500. Cp(DPD) exhibited strong sensitivity and surpassed LASSO and LAD-LASSO in achieving dimension reduction in high-dimensional data. Because of the method's computational complexity, the number of predictors we could include in our model was limited. Future research should extend the method to larger sets of predictors and further compare established methods against the proposed Robust Penalized Density Power Divergence Regression with SCAD penalty.
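As a rough illustration of the kind of objective described above, the following is a minimal sketch that adds a SCAD penalty to a density power divergence loss for a linear model with normal errors. The abstract does not state the exact formulation, tuning constants, or optimization algorithm used in the dissertation, so everything here is an assumption: the function names (`scad_penalty`, `dpd_loss`, `penalized_objective`) are hypothetical, the SCAD form with a = 3.7 is the commonly used textbook version, and the DPD loss is the standard normal-model expression with robustness parameter `alpha` and a fixed scale `sigma`.

```python
import numpy as np


def scad_penalty(beta, lam, a=3.7):
    """Elementwise SCAD penalty (standard three-piece form; a = 3.7 is an assumed default)."""
    b = np.abs(np.asarray(beta, dtype=float))
    small = lam * b                                            # |b| <= lam: behaves like LASSO
    mid = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))    # lam < |b| <= a*lam: quadratic taper
    large = np.full_like(b, lam**2 * (a + 1) / 2)              # |b| > a*lam: constant, no extra shrinkage
    return np.where(b <= lam, small, np.where(b <= a * lam, mid, large))


def dpd_loss(beta, X, y, sigma=1.0, alpha=0.5):
    """DPD loss for y = X beta + N(0, sigma^2) errors, with alpha > 0 (assumed normal model)."""
    r = y - X @ np.asarray(beta, dtype=float)
    c = (2 * np.pi) ** (-alpha / 2) * sigma ** (-alpha)
    # Integral of f^(1+alpha) minus (1 + 1/alpha) times the sample mean of f^alpha.
    # Large residuals contribute almost nothing to the mean, which is the source of robustness.
    return c * ((1 + alpha) ** -0.5
                - (1 + 1 / alpha) * np.mean(np.exp(-alpha * r**2 / (2 * sigma**2))))


def penalized_objective(beta, X, y, lam, sigma=1.0, alpha=0.5, a=3.7):
    """Hypothetical penalized objective: DPD loss plus the summed SCAD penalty."""
    return dpd_loss(beta, X, y, sigma, alpha) + np.sum(scad_penalty(beta, lam, a))
```

In this sketch, larger values of `alpha` downweight large residuals more strongly, while `lam` controls how aggressively small coefficients are pushed to zero. Selecting `lam` by a criterion such as Cp, BIC, or EBIC, as the labels in the abstract suggest, would sit on top of a routine that minimizes `penalized_objective`.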

Subject Area

Statistics|Information science

Recommended Citation

Mac-Ocloo, Maxwell Kwesi, "Robust Penalized Density Power Divergence Regression With SCAD Penalty for High Dimensional Data Analysis" (2023). ETD Collection for University of Texas, El Paso. AAI30635117.