Statistics Technical Reports

Title:On the impact of predictor geometry on the performance on high-dimensional ridge-regularized generalized robust regression estimators
Author(s):El Karoui, Noureddine
Date issued:July 2015 (PDF)
Abstract:We study ridge-regularized generalized robust regression estimators, i.e., $$ \hat{\beta}=\operatorname*{argmin}_{\beta \in \mathbb{R}^p} \frac{1}{n}\sum_{i=1}^n \rho_i(Y_i-X_i^\top \beta)+\frac{\tau}{2}\|\beta\|^2\;, \text{ where } Y_i=\epsilon_i+X_i^\top \beta_0\;, $$ in the situation where $p/n$ tends to a finite non-zero limit. Our study here focuses on the situation where the errors $\epsilon_i$ are heavy-tailed and the $X_i$ have an "elliptical-like" distribution. Our assumptions are quite general and we do not require homoskedasticity of the $\epsilon_i$, for instance. We obtain a characterization of the limit of $\|\hat{\beta}-\beta_0\|$, as well as several other results, including central limit theorems for the entries of $\hat{\beta}$.
Keyword note:El__Karoui__Noureddine
Report ID:826
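
Illustration for Report 826: the estimator defined in the abstract can be sketched numerically as below. This is a minimal sketch under stated assumptions, not the report's method: it uses a single Huber loss for every $\rho_i$, Gaussian predictors as a simple stand-in for an "elliptical-like" design, and plain gradient descent as the solver. The names `huber_psi` and `ridge_huber` and all parameter values are hypothetical.

```python
import numpy as np


def huber_psi(r, delta=1.0):
    """Derivative of the Huber loss: identity on [-delta, delta], clipped outside."""
    return np.clip(r, -delta, delta)


def ridge_huber(X, y, tau=0.5, delta=1.0, n_iter=500):
    """Minimize (1/n) * sum_i rho(y_i - x_i^T beta) + (tau/2) * ||beta||^2.

    Assumption: rho is the same Huber loss for all i (the abstract allows a
    different rho_i per observation). Solver: gradient descent with step 1/L,
    where L bounds the Lipschitz constant of the gradient.
    """
    n, p = X.shape
    # Huber's second derivative is at most 1, so the smooth part has
    # curvature at most ||X||_2^2 / n; the ridge term adds tau.
    L = np.linalg.norm(X, 2) ** 2 / n + tau
    beta = np.zeros(p)
    for _ in range(n_iter):
        residuals = y - X @ beta
        grad = -X.T @ huber_psi(residuals, delta) / n + tau * beta
        beta -= grad / L
    return beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p = 200, 100                        # p/n = 0.5, not close to zero
    X = rng.normal(size=(n, p))            # assumed Gaussian design
    beta0 = rng.normal(size=p) / np.sqrt(p)
    eps = rng.standard_t(2, size=n)        # heavy-tailed errors (t, 2 d.o.f.)
    y = X @ beta0 + eps
    beta_hat = ridge_huber(X, y)
    print("||beta_hat - beta_0|| =", np.linalg.norm(beta_hat - beta0))
```

Because $\tau > 0$ makes the objective strongly convex, the iteration converges to the unique minimizer; the printed quantity is the finite-sample analogue of the norm whose limit the abstract says is characterized.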

Title:Can we trust the bootstrap in high-dimension?
Author(s):El Karoui, Noureddine; Purdom, Elizabeth
Date issued:February 2015 (PDF)
Abstract:We consider the performance of the bootstrap in high dimensions for the setting of linear regression, where $p < n$ but $p/n$ is not close to zero. We consider ordinary least-squares as well as robust regression methods and adopt a minimalist performance requirement: can the bootstrap give us good confidence intervals for a single coordinate of $\beta$, the true regression vector? We show through a mix of numerical and theoretical work that the bootstrap is fraught with problems. Both of the most commonly used methods of bootstrapping for regression – residual bootstrap and pairs bootstrap – give very poor inference on $\beta$ as the ratio $p/n$ grows. We find that the residual bootstrap tends to give anti-conservative estimates (inflated Type I error), while the pairs bootstrap gives very conservative estimates (severe loss of power) as the ratio $p/n$ grows. We also show that the jackknife resampling technique for estimating the variance of $\hat{\beta}$ severely overestimates the variance in high dimensions. We contribute alternative bootstrap procedures based on our theoretical results that mitigate these problems. However, the corrections depend on assumptions regarding the underlying data-generation model, suggesting that in high dimensions it may be difficult to have universal, robust bootstrapping techniques.
Keyword note:Purdom__Elizabeth El__Karoui__Noureddine
Report ID:824
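
Illustration for Report 824: the two standard resampling schemes named in the abstract, residual bootstrap and pairs bootstrap, can be sketched for ordinary least squares as below. This is a textbook-style sketch, not the report's code and not its corrected procedures; the function names, the Gaussian simulation setup, and the choice to summarize each scheme by a bootstrap standard error for one coordinate are assumptions made for illustration.

```python
import numpy as np


def ols(X, y):
    """Least-squares fit; lstsq returns the minimum-norm solution if a
    resampled design happens to be rank-deficient."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def residual_bootstrap_se(X, y, coord=0, n_boot=200, rng=None):
    """Resample fitted residuals, keep X fixed, refit, and return the
    standard deviation of the chosen coefficient across refits."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    fitted = X @ ols(X, y)
    resid = y - fitted
    draws = np.empty(n_boot)
    for b in range(n_boot):
        y_star = fitted + rng.choice(resid, size=n, replace=True)
        draws[b] = ols(X, y_star)[coord]
    return draws.std(ddof=1)


def pairs_bootstrap_se(X, y, coord=0, n_boot=200, rng=None):
    """Resample (x_i, y_i) rows jointly with replacement, refit, and return
    the standard deviation of the chosen coefficient across refits."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = ols(X[idx], y[idx])[coord]
    return draws.std(ddof=1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p = 200, 60                         # p < n, with p/n = 0.3
    X = rng.normal(size=(n, p))            # assumed Gaussian design
    beta = np.zeros(p)
    y = X @ beta + rng.normal(size=n)      # assumed Gaussian errors
    print("residual bootstrap SE:", residual_bootstrap_se(X, y, rng=2))
    print("pairs bootstrap SE:   ", pairs_bootstrap_se(X, y, rng=3))
```

Running both functions on the same simulated data while varying `p` is one way to explore how the two standard-error estimates move as $p/n$ grows, which is the comparison the abstract describes.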