Statistics Technical Reports:Search | Browse by year

Sorted by:
Page: 1 2  Next

Title:Commuting birth-and-death processes
Author(s):Evans, Steven N.; Sturmfels, Bernd; Uhler, Caroline; 
Date issued:December 2008 (PDF)
Abstract:We use methods from combinatorics and algebraic statistics to study analogues of birth-and-death processes that have as their state space a finite subset of the m-dimensional lattice and for which the m matrices that record the transition probabilities in each of the lattice directions commute pairwise. One reason such processes are of interest is that the transition matrix is straightforward to diagonalize, and hence it is easy to compute n step transition probabilities. The set of commuting birth-and-death processes decomposes as a union of toric varieties, with the main component being the closure of all processes whose nearest neighbor transition probabilities are positive. We exhibit an explicit monomial parametrization for this main component, and we explore the boundary components using primary decomposition.
Keyword note:Evans__Steven_N Sturmfels__Bernd Uhler__Caroline
Report ID:769

Title:Dynamics of the time to the most recent common ancestor in a large branching population
Author(s):Evans, Steven N.; Ralph, Peter L.; 
Date issued:December 2008 (PDF)
Abstract:If we follow an asexually reproducing population through time, then the amount of time that has passed since the most recent common ancestor (MRCA) of all current individuals lived will change as time progresses. The resulting stochastic process has been studied previously when the population has a constant large size and evolves via the diffusion limit of standard Wright-Fisher dynamics. We investigate cases in which the population varies in size and evolves according to a class of models that includes suitably conditioned $(1+\beta)$-stable continuous state branching processes (in particular, it includes the conditioned Feller continuous state branching process). We also consider the discrete time Markov chain that tracks the MRCA age just before and after its successive jumps. We find transition probabilities for both the continuous and discrete time processes, determine when these processes are transient and recurrent, and compute stationary distributions when they exist. We also introduce a new family of Markov processes that stand in a relation with respect to the $(1+\beta)$-stable continuous state branching process that is similar to the one between the Bessel-squared diffusions and the Feller continuous state branching process.
Keyword note:Evans__Steven_N Ralph__Peter
Report ID:768

Title:High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence
Author(s):Ravikumar, Pradeep; Wainwright, Martin J.; Raskutti, Garvesh; Yu, Bin; 
Date issued:November 2008 (PDF)
Abstract:Given i.i.d. observations of a random vector $X \in \mathbb{R}^p$, we study the problem of estimating both its covariance matrix $\Sigma^*$, and its inverse covariance or concentration matrix \mbox{$\Theta^* = (\Sigma^*)^{-1}$.} We estimate $\Theta^*$ by minimizing an $\ell_1$-penalized log-determinant Bregman divergence; in the multivariate Gaussian case, this approach corresponds to $\ell_1$-penalized maximum likelihood, and the structure of $\Theta^*$ is specified by the graph of an associated Gaussian Markov random field. We analyze the performance of this estimator under high-dimensional scaling, in which the number of nodes in the graph $p$, the number of edges $s$ and the maximum node degree $d$, are allowed to grow as a function of the sample size $n$. In addition to the parameters $(p,s,d)$, our analysis identifies other key quantities covariance matrix $\Sigma^*$; and (b) the $\ell_\infty$ operator norm of the sub-matrix $\Gamma^*_{S S}$, where $S$ indexes the graph edges, and $\Gamma^* = (\Theta^*)^{-1} \otimes (\Theta^*)^{-1}$; and (c) a mutual incoherence or irrepresentability measure on the matrix $\Gamma^*$ and (d) the rate of decay $1/\sctail(\numobs,\scdelta)$ on the probabilities $ \{|\widehat{\Sigma}^n_{ij}- \Sigma^*_{ij}| > \delta \}$, where $\widehat{\Sigma}^n$ is the sample covariance based on $n$ samples. Our first result establishes consistency of our estimate $\widehat{\Theta}$ in the elementwise maximum-norm. This in turn allows us to derive convergence rates in Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degrees $\degmax = o(\sqrt{\spindex})$. In our second result, we show that with probability converging to one, the estimate $\widehat{\Theta}$ correctly specifies the zero pattern of the concentration matrix $\Theta^*$. We illustrate our theoretical results via simulations for various graphs and problem parameters, showing good correspondences between the theoretical predictions and behavior in simulations.
Keyword note:Ravikumar__Pradeep Wainwright__Martin Raskutti__Garvesh Yu__Bin
Report ID:767

Title:Analyzing Data with Graphs: Metagenomic Data and the Phylogenetic Tree
Author(s):Purdom, Elizabeth; 
Date issued:November 2008 (PDF)
Abstract:In biological experiments, researchers often have information in the form of a graph that supplements observed numerical data. Incorporating the knowledge contained in these graphs into an analysis of the numerical data is an important and non trivial task. We look at the example of metagenomic data -- data from a genomic survey of the abundence of different species of bacteria in a sample. Here, the graph of interest is a phylogenetic tree depicting the interspecies relationships among the bacteria species. We demonstrate that analysis of the data in a non-standard inner-product space effectively uses this additional graphical information and produces more meaningful results.
Keyword note:Purdom__Elizabeth
Report ID:766

Title:Message-passing for graph-structured linear programs: Proximal methods and rounding schemes
Author(s):Ravikumar, Pradeep; Agarwal, Alekh; Wainwright, Martin J.; 
Date issued:October 2008 (PDF)
Abstract:The problem of computing a maximum a posteriori (MAP) configuration is a central computational challenge associated with Markov random fields. A line of work has focused on ``tree-based'' linear programming (LP) relaxations for the MAP problem. This paper develops a family of super-linearly convergent algorithms for solving these LPs, based on proximal minimization schemes using Bregman divergences. As with standard message-passing on graphs, the algorithms are distributed and exploit the underlying graphical structure, and so scale well to large problems. Our algorithms have a double-loop character, with the outer loop corresponding to the proximal sequence, and an inner loop of cyclic Bregman divergences used to compute each proximal update. Different choices of the Bregman divergence lead to conceptually related but distinct LP-solving algorithms. We establish convergence guarantees for our algorithms, and illustrate their performance via some simulations. We also develop two classes of graph-structured rounding schemes, randomized and deterministic, for obtaining integral configurations from the LP solutions. Our deterministic rounding schemes use a ``re-parameterization'' property of our algorithms so that when the LP solution is integral, the MAP solution can be obtained even before the LP-solver converges to the optimum. We also propose a graph-structured randomized rounding scheme that applies to iterative LP solving algorithms in general. We analyze the performance of our rounding schemes, giving bounds on the number of iterations required, when the LP is integral, for the rounding schemes to obtain the MAP solution. These bounds are expressed in terms of the strength of the potential functions, and the energy gap, which measures how well the integral MAP solution is separated from other integral configurations. We also report simulations comparing these rounding schemes.
Keyword note:Ravikumar__Pradeep Agarwal__Alekh Wainwright__Martin
Report ID:765

Title:Estimating divergence functionals and the likelihood ratio by convex risk minimization
Author(s):Nguyen, XuanLong; Wainwright, Martin J.; Jordan, Michael I.; 
Date issued:October 2008 (PDF)
Abstract:We develop and analyze M-estimation methods for the divergence functionals and the likelihood ratios of two probability distributions. Our method is based on a non-asymptotic variational characterization of $f$-divergences, which allows the problem of estimating divergences to be tackled via convex risk optimization. The resulting estimators are simple to implement, requiring only the solution of standard convex programs. We present an analysis of consistency and convergence for these estimators. Given conditions only on the ratios of densities, we show that our estimators can achieve optimal minimax rates for the likelihood ratio in certain regimes. Finally, we derive an efficient optimization algorithm for computing our estimates, and illustrate their convergence behavior and practical viability by simulations.
Keyword note:Nguyen__XuanLong Wainwright__Martin Jordan__Michael_I
Report ID:764

Title:Multiway Spectral Clustering: A Margin-based Perspective
Author(s):Zhang, Zhihua; Jordan, Michael I.; 
Date issued:September 2008 (PDF)
Abstract:Spectral clustering is a broad class of clustering procedures in which an intractable combinatorial optimization formulation of clustering is "relaxed" into a tractable eigenvector problem, and in which the relaxed solution is subsequently "rounded" into an approximate discrete solution to the original problem. In this paper we present a novel margin-based perspective on multiway spectral clustering. We show that the margin-based perspective illuminates both the relaxation and rounding aspects of spectral clustering, providing a unified analysis of existing algorithms and guiding the design of new algorithms. We also present connections between spectral clustering and several other topics in statistics, specifically minimum-variance clustering, Procrustes analysis and Gaussian intrinsic autoregression.
Keyword note:Zhang__Zhihua Jordan__Michael_I
Report ID:763

Title:Vital rates from the action of mutation accumulation
Author(s):Wachter, Kenneth W.; Steinsaltz, David R.; Evans, Steven N.; 
Date issued:August 2008 (PDF)
Abstract:New models for evolutionary processes of mutation accumulation allow hypotheses about the age-specificity of mutational effects to be translated into predictions of heterogeneous population hazard functions. We apply these models to questions in the biodemography of longevity, including proposed explanations of Gompertz hazards and mortality plateaus, and use them to explore the possibility of melding evolutionary and functional models of aging.
Keyword note:Wachter__Kenneth Steinsaltz__David Evans__Steven_N
Report ID:762

Title:Union support recovery in high-dimensional multivariate regression
Author(s):Obozinski, Guillaume; Wainwright, Martin J.; Jordan, Michael I.; 
Date issued:August 2008 (PDF)
Abstract:In the problem of multivariate regression, a K-dimensional response vector is regressed upon a common set of p covariates, with a matrix B* of regression coefficients. We study the behavior of the group Lasso using l1/l2 regularization for the union support problem, meaning that the set of s rows for which B* is non-zero is recovered exactly. Studying this problem under high-dimensional scaling, we show that group Lasso recovers the exact row pattern with high probability over the random design and noise for scalings of (n; p; s) such that the sample complexity parameter given by theta(n,p,s)=n/(2 psi(B*) log(p-s)) exceeds a critical threshold. Here n is the sample size, p is the ambient dimension of the regression model, s is the number of non-zero rows, and psi(B*) is a sparsity-overlap function that measures a combination of the sparsities and overlaps of the K-regression coefficient vectors that constitute the model. This sparsity-overlap function reveals that, if the design is uncorrelated on the active rows, block l1/l2 regularization for multivariate regression never harms performance relative to an ordinary Lasso approach, and can yield substantial improvements in sample complexity (up to a factor of K) when the regression vectors are suitably orthogonal. For more general designs, it is possible for the ordinary Lasso to outperform the group Lasso. We complement our analysis with simulations that demonstrate the sharpness of our theoretical results, even for relatively small problems.
Keyword note:Obozinski__Guillaume Wainwright__Martin Jordan__Michael_I
Report ID:761

Title:Data Spectroscopy: Eigenspace Of Convolution Operators And Clustering
Author(s):Shi, Tao; Belkin, Mikhail; Yu, Bin; 
Date issued:July 2008 (PDF)
Abstract:This paper focuses on obtaining clustering information in a distribution when iid data are given. First, we develop theoretical results for understanding and using clustering information contained in the eigenvectors of data adjacency matrices based on a radial kernel function (with a sufficiently fast tail decay). We provide population analyses to give insights into which eigenvectors should be used and when the clustering information for the distribution can be recovered from the data. In particular, we learned that top eigenvectors do not contain all the clustering information. Second, we use heuristics from these analyses to design the Data Spectroscopic clustering (DaSpec) algorithm that uses properly selected top eigenvectors, determines the number of clusters, gives data labels, and provides a classification rule for future data, all based on only one eigen decomposition. Our findings not only extend and go beyond the intuitions underlying existing spectral techniques (e.g. spectral clustering and Kernel Principal Components Analysis), but also provide insights about their usability and modes of failure. Simulation studies and experiments on real world data are conducted to show the promise of our proposed data spectroscopy clustering algorithm relative to k-means and one spectral method. In particular, DaSpec seems to be able to handle unbalanced groups and recover clusters of different shapes better than competing methods.
Keyword note:Shi__Tao Belkin__Mikhail Yu__Bin
Report ID:760

Title:A path following algorithm for Sparse Pseudo-Likelihood Inverse Covariance Estimation (SPLICE)
Author(s):Rocha, Guilherme V.; Zhao, Peng; Yu, Bin; 
Date issued:July 2008 (PDF)
Abstract:Given n observations of a p-dimensional random vector, the covariance matrix and its inverse (precision matrix) are needed in a wide range of applications. Sample covariance (e.g. its eigenstructure) can misbehave when p is comparable to the sample size n. Regularization is often used to mitigate the problem. In this paper, we proposed an l1-norm penalized pseudo-likelihood estimate for the inverse covariance matrix. This estimate is sparse due to the l1-norm penalty, and we term this method SPLICE. Its regularization path can be computed via an algorithm based on the homotopy/LARS-Lasso algorithm. Simulation studies are carried out for various inverse covariance structures for p=15 and n=20,1000. We compare SPLICE with the l1-norm penalized likelihood estimate and a l1-norm penalized Cholesky decomposition based method. SPLICE gives the best overall performance in terms of three metrics on the precision matrix and ROC curve for model selection. Moreover, our simulation results demonstrate that the SPLICE estimates are positive-definite for most of the regularization path even though the restriction is not enforced.
Keyword note:Rocha__Guilherme_V Zhao__Peng Yu__Bin
Report ID:759

Title:Information In The Non-Stationary Case
Author(s):Vu, Vincent Q.; Yu, Bin; Kass, Robert E.; 
Date issued:June 2008 (PDF)
Abstract:Information estimates such as the "direct method" of Strong et al. (1998) sidestep the difficult problem of estimating the joint distribution of response and stimulus by instead estimating the difference between the marginal and conditional entropies of the response. While this is an effective estimation strategy, it tempts the practitioner to ignore the role of the stimulus and the meaning of mutual information. We show here that, as the number of trials increases indefinitely, the direct (or "plug-in") estimate of marginal entropy converges (with probability 1) to the entropy of the time-averaged conditional distribution of the response, and the direct estimate of the conditional entropy converges to the time-averaged entropy of the conditional distribution of the response. Under joint stationarity and ergodicity of the response and stimulus, the difference of these quantities converges to the mutual information. When the stimulus is deterministic or non-stationary the direct estimate of information no longer estimates mutual information, which is no longer meaningful, but it remains a measure of variability of the response distribution across time.
Keyword note:Vu__Vincent_Q Yu__Bin Kass_Robert_E
Report ID:758

Title:The Age-Specific Force of Natural Selection and Walls of Death
Author(s):Wachter, Kenneth W.; Evans, Steven N.; Steinsaltz, David R.; 
Date issued:July 2008 (PDF)
Abstract:W. D. Hamilton's celebrated formula for the age-specific force of natural selection furnishes predictions for senescent mortality due to mutation accumulation, at the price of reliance on a linear approximation. Applying to Hamilton's setting the full non-linear demographic model for mutation accumulation of Evans et al. (2007), we find surprising differences. Non-linear interactions cause the collapse of Hamilton-style predictions in the most commonly studied case, refine predictions in other cases, and allow Walls of Death at ages before the end of reproduction. Haldane's Principle for genetic load has an exact but unfamiliar generalization.
Keyword note:Wachter__Kenneth Evans__Steven_N Steinsaltz__David
Report ID:757

Title:On Model Selection Consistency of the Elastic net When p>>n
Author(s):Jia, Jinzhu; Yu, Bin; 
Date issued:May 2008 (PDF)
Abstract:In this paper, we study the model selection property of the Elastic net. In the classical settings when $p$ (the number of predictors) and $q$ (the number of predictors with non-zero coefficients in the true linear model) are fixed, Yuan and Lin (2007) give a necessary and sufficient condition for the Elastic net to consistently select the true model, which is called the Elastic Irrepresentable Condition (EIC) in this paper. Here we study the general case when $p,q$ and $n$ all go to infinity. For general scalings of $p,q$ and $n$, when Gaussian noise is assumed, sufficient conditions on $p$, $q$ and $n$ are given in this paper such that EIC guarantees the Elastic net's model selection consistency. We show that to make these conditions hold, $n$ should grow at a rate faster than $q*\log(p-q)$. For the classical case, when $p$ and $q$ are fixed, we also study the relationship between EIC and the Irrepresentable Condition (IC) which is necessary and sufficient for the Lasso to select the true model. Through theoretical results and simulation studies, we provide insights into when and why EIC is weaker than IC and when the Elastic net can consistently select the true model even when the Lasso can not.
Keyword note:Jia__Jinzhu Yu__Bin
Report ID:756

Title:Maximum Likelihood Estimation of Cloud Height from Multi-Angle Satellite Imagery.
Author(s):Anderes, E.; Yu, B.; Jovanovic, V.; Moroney, C.; Garay, M.; Braverman, A.; Clothiaux, E.; 
Date issued:May 2008 (PDF)
Abstract:We develop a new estimation technique for recovering depth-of-field from multiple stereo images. Depth-of-field is estimated by determining the shift in image location resulting from different camera viewpoints. When this shift is not divisible by pixel width, the multiple stereo images can be combined to form a super-resolution image. By modeling this super-resolution image as a realization of a random field, one can view the recovery of depth as a likelihood estimation problem. We apply these modeling techniques to the recovery of cloud height from multiple viewing angles provided by the MISR instrument on the Terra Satellite. Our efforts are focused on a two layer cloud ensemble where both layers are relatively planer, the bottom layer is optically thick and textured, and the top layer is optically thin. Our results demonstrate that with relative ease, we get comparable estimates to the M2 stereo matcher which is the same algorithm used in the current MISR standard product. Moreover, our techniques provide the possibility of modeling all of the MISR data in a unified way for cloud height estimation. Future research is underway to extend this framework for fast, quality global estimates of cloud height.
Keyword note:Anderes__Ethan Yu__Bin Jovanovic__V Moroney__C Garay__M Braverman__Amy_J Clothiaux__Eugene_E
Report ID:755

Title:Information-theoretic limits on sparse signal recovery: Dense versus sparse measurement matrices
Author(s):Wang, Wei; Wainwright, Martin J.; Ramchandran, Kannan; 
Date issued:May 2008 (PDF)
Abstract:We study the information-theoretic limits of exactly recovering the support of a sparse signal using noisy projections defined by various classes of measurement matrices. Our analysis is high-dimensional in nature, in which the number of observations $\numobs$, the ambient signal dimension $\pdim$, and the signal sparsity $\kdim$ are all allowed to tend to infinity in a general manner. This paper makes two novel contributions. First, we provide sharper necessary conditions for exact support recovery using general (non-Gaussian) dense measurement matrices. Combined with previously known sufficient conditions, this result yields sharp characterizations of when the optimal decoder can recover a signal for various scalings of the sparsity $\kdim$ and sample size $\numobs$, including the important special case of linear sparsity ($\kdim = \Theta(\pdim)$) using a linear scaling of observations ($\numobs = \Theta(\pdim)$). Our second contribution is to prove necessary conditions on the number of observations $\numobs$ required for asymptotically reliable recovery using a class of $\spgam$-sparsified measurement matrices, where the measurement sparsity $\spgam(\numobs, \pdim, \kdim) \in (0,1]$ corresponds to the fraction of non-zero entries per row. Our analysis allows general scaling of the quadruplet $(\numobs, \pdim, \kdim, \spgam)$, and reveals three different regimes, corresponding to whether measurement sparsity has no effect, a minor effect, or a dramatic effect on the information-theoretic limits of the subset recovery problem.
Keyword note:Wang__Wei Wainwright__Martin Ramchandran__K
Report ID:754

Title:High-dimensional subset recovery in noise: Sparsified measurements without loss of statistical efficiency
Author(s):Omidiran, Dapo; Wainwright, Martin J.; 
Date issued:May 2008 (PDF)
Abstract:We consider the problem of estimating the support of a vector $\beta^* \in \mathbb{R}^{p}$ based on observations contaminated by noise. A significant body of work has studied behavior of $\ell_1$-relaxations when applied to measurement matrices drawn from standard dense ensembles (e.g., Gaussian, Bernoulli). In this paper, we analyze \emph{sparsified} measurement ensembles, and consider the trade-off between measurement sparsity, as measured by the fraction $\gamma$ of non-zero entries, and the statistical efficiency, as measured by the minimal number of observations $n$ required for exact support recovery with probability converging to one. Our main result is to prove that it is possible to let $\gamma \rightarrow 0$ at some rate, yielding measurement matrices with a vanishing fraction of non-zeros per row while retaining the same statistical efficiency as dense ensembles. A variety of simulation results confirm the sharpness of our theoretical predictions.
Keyword note:Omidiran__Dapo Wainwright__Martin
Report ID:753

Title:To what extent does genealogical ancestry imply genetic ancestry?
Author(s):Matsen, Frederick A.; Evans, Steven N.; 
Date issued:May 2008 (PDF)
Abstract:Recent statistical and computational analyses have shown that a genealogical most recent common ancestor may have lived in the recent past. However, coalescent-based approaches show that genetic most recent common ancestors for a given non-recombining locus are typically much more ancient. It is not immediately clear how these two perspectives interact. This paper investigates relationships between the number of descendant alleles of an ancestor allele and the number of genealogical descendants of the individual who possessed that allele for a simple diploid genetic model extending the genealogical model of J.T. Chang.
Keyword note:Matsen__Frederick_A Evans__Steven_N
Report ID:752

Title:Network-based consensus averaging with general noisy channels
Author(s):Rajagopal, Ram; Wainwright, Martin J.; 
Date issued:May 2008 (PDF)
Abstract:This paper focuses on the consensus averaging problem on graphs under general noisy channels. We study a particular class of distributed consensus algorithms based on damped updates, and using the ordinary differential equation method, we prove that the updates converge almost surely to exact consensus for finite variance noise. Our analysis applies to various types of stochastic disturbances, including errors in parameters, transmission noise, and quantization noise. Under a suitable stability condition, we prove that the error is asymptotically Gaussian, and we show how the asymptotic covariance is specified by the graph Laplacian. For additive parameter noise, we show how the scaling of the asymptotic MSE is controlled by the spectral gap of the Laplacian.
Keyword note:Rajagopal__Ram Wainwright__Martin
Report ID:751

Title:High-Dimensional Graphical Model Selection Using $\ell_1$-Regularized Logistic Regression
Author(s):Ravikumar, Pradeep; Wainwright, Martin J.; Lafferty, John D.; 
Date issued:April 2008 (PDF)
Abstract:We consider the problem of estimating the graph structure associated with a discrete Markov random field. We describe a method based on $\ell_1$-regularized logistic regression, in which the neighborhood of any given node is estimated by performing logistic regression subject to an $\ell_1$-constraint. Our framework applies to the high-dimensional setting, in which both the number of nodes $p$ and maximum neighborhood sizes $d$ are allowed to grow as a function of the number of observations $n$. Our main results provide sufficient conditions on the triple $(n, p, d)$ for the method to succeed in consistently estimating the neighborhood of every node in the graph simultaneously. Under certain assumptions on the population Fisher information matrix, we prove that consistent neighborhood selection can be obtained for sample sizes $n = \Omega(d^3 \log p)$, with the error decaying as $\order(\exp(-C n/d^3))$ for some constant $C$. If these same assumptions are imposed directly on the sample matrices, we show that $n = \Omega(d^2 \log p)$ samples are sufficient.
Keyword note:Ravikumar__Pradeep Wainwright__Martin Lafferty__John
Report ID:750

Page: 1 2  Next