**Statistics Technical Reports**

**Term(s):** 2006 **Results:** 25

**Title:**Learning a potential function from a trajectory**Author(s):**Brillinger, David R.; **Date issued:**December 2006

http://nma.berkeley.edu/ark:/28722/bk0005d0p65 (PDF) **Abstract:**This letter concerns the use of stochastic gradient systems in the modeling of the paths of moving particles and the consequent
estimation of a potential function. The work proceeds by setting down a model for the potential function which leads to a
stochastic differential equation. The method is simple, direct and flexible, being based on a linear model and least squares.
The estimated potential function may be used for: simple description, summary, comparison, seeking patterns, simulation, prediction,
and model appraisal. Explanatories, attractors and repellors, may be included in the potential function directly. The large
sample distribution of the estimated potential function is provided. There is an example analyzing the path of an elk. There
are direct extensions to: updating, sliding window, adaptive, robust and real time variants. Index Terms: Mobility model,
monitoring, potential function, stochastic differential equation, stochastic gradient system, surveillance, tracking, waypoint
data.**Keyword note:**Brillinger__David_R**Report ID:**723**Relevance:**100
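
A one-dimensional sketch of the fitting idea in the abstract: if the potential is a linear combination of known functions, the drift $-\nabla H$ of the stochastic gradient system is linear in the unknown coefficients, so an Euler-discretized path can be fit by ordinary least squares. The quadratic potential $H(x) = a x + b x^2/2$, the parameter values and the function names are our own illustrative assumptions; the report's elk example is two-dimensional and uses its own model for $H$.

```python
import random


def simulate_path(a, b, sigma, x0, dt, n_steps, seed=0):
    """Euler scheme for dX = -H'(X) dt + sigma dB with H(x) = a*x + b*x**2/2 (assumed form)."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        drift = -(a + b * x)  # minus the gradient H'(x) = a + b*x
        xs.append(x + drift * dt + sigma * dt ** 0.5 * rng.gauss(0.0, 1.0))
    return xs


def fit_potential(xs, dt):
    """Least-squares fit of -(X[i+1] - X[i]) / dt = a + b * X[i] + noise.

    Returns (a_hat, b_hat); the estimated potential is
    H(x) = a_hat * x + b_hat * x**2 / 2, defined up to an additive constant.
    """
    u = xs[:-1]
    y = [-(x1 - x0) / dt for x0, x1 in zip(xs[:-1], xs[1:])]
    n = len(u)
    mu, my = sum(u) / n, sum(y) / n
    sxx = sum((ui - mu) ** 2 for ui in u)
    sxy = sum((ui - mu) * (yi - my) for ui, yi in zip(u, y))
    b_hat = sxy / sxx
    a_hat = my - b_hat * mu
    return a_hat, b_hat


# Hypothetical example: an attractor at x = -a/b = 0.5.
path = simulate_path(a=-1.0, b=2.0, sigma=0.5, x0=0.0, dt=0.01, n_steps=20000)
a_hat, b_hat = fit_potential(path, dt=0.01)
print(f"a_hat = {a_hat:.2f}, b_hat = {b_hat:.2f}")  # should land near -1 and 2
```

Richer potentials (attractors, repellors, explanatory terms) would add columns to the same regression rather than change the method.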

**Title:**AdaBoost is Consistent**Author(s):**Bartlett, Peter L.; Traskin, Mikhail; **Date issued:**December 2006

http://nma.berkeley.edu/ark:/28722/bk0005d071p (PDF) **Abstract:**The risk, or probability of error, of the classifier produced by the AdaBoost algorithm is investigated. In particular, we
consider the stopping strategy to be used in AdaBoost to achieve universal consistency. We show that provided AdaBoost is
stopped after $n^{1-\varepsilon}$ iterations---for sample size $n$ and $\varepsilon \in (0,1)$---the sequence of risks of
the classifiers it produces approaches the Bayes risk.**Keyword note:**Bartlett__Peter Traskin__Mikhail**Report ID:**722**Relevance:**100
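
The stopping rule in the abstract is easy to state in code. The sketch below runs standard AdaBoost for $\lfloor n^{1-\varepsilon} \rfloor$ rounds; the decision-stump base learner, the one-dimensional noisy data and $\varepsilon = 0.5$ are our own illustrative assumptions. The report's consistency result is asymptotic, so a single small run only shows the mechanics of the rule.

```python
import math
import random


def stopping_time(n, eps):
    """Number of boosting rounds floor(n ** (1 - eps)), at least 1."""
    return max(1, int(n ** (1.0 - eps)))


def stump(x, t, s):
    """Decision stump on the real line: s if x > t, else -s."""
    return s if x > t else -s


def best_stump(xs, ys, w):
    """Threshold and sign minimizing the weighted training error."""
    best = None
    for t in [min(xs) - 1.0] + sorted(set(xs)):
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w) if stump(xi, t, s) != yi)
            if best is None or err < best[0]:
                best = (err, t, s)
    return best


def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, t, s = best_stump(xs, ys, w)
        err = min(max(err, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, t, s))
        # Up-weight misclassified points, down-weight the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump(xi, t, s)) for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, x):
    score = sum(alpha * stump(x, t, s) for alpha, t, s in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical data: Bayes classifier sign(x - 0.3), 10% label noise.
rng = random.Random(1)
n = 200
xs = [rng.random() for _ in range(n)]
ys = [(1 if x > 0.3 else -1) * (-1 if rng.random() < 0.1 else 1) for x in xs]
rounds = stopping_time(n, eps=0.5)  # floor(200 ** 0.5) = 14
model = adaboost(xs, ys, rounds)
grid = [i / 100 for i in range(100)]
agreement = sum(predict(model, x) == (1 if x > 0.3 else -1) for x in grid) / len(grid)
print(rounds, agreement)
```

Because the number of rounds grows sublinearly in $n$, the ensemble cannot keep fitting the label noise indefinitely as the sample grows, which is the intuition behind the stopping strategy.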

**Title:**Probability and real trees**Author(s):**Evans, Steven N.; **Date issued:**December 2006

http://nma.berkeley.edu/ark:/28722/bk0005d073s (PDF) **Abstract:**These are the notes for the lectures I gave at the Saint-Flour probability summer school in 2005.**Keyword note:**Evans__Steven_N**Report ID:**721**Relevance:**100

**Title:**Lasso-type recovery of sparse representations for high-dimensional data**Author(s):**Meinshausen, Nicolai; Yu, Bin; **Date issued:**December 2006

http://nma.berkeley.edu/ark:/28722/bk0005d075w (PDF) **Abstract:**The Lasso (Tibshirani, 1996) is an attractive technique for regularization and variable selection for high-dimensional data,
where the number of predictor variables p is potentially much larger than the number of samples n. However, it was recently
discovered (Zhao and Yu, 2006; Zou, 2005; Meinshausen and Buhlmann, 2006) that the sparsity pattern of the Lasso estimator
can only be asymptotically identical to the true sparsity pattern if the design matrix satisfies the so-called irrepresentable
condition. The latter condition can easily be violated in applications due to the presence of highly correlated variables.
Here we examine the behavior of the Lasso estimators if the irrepresentable condition is relaxed. Even though the Lasso cannot
recover the correct sparsity pattern, we show that the estimator is still consistent in the l_2-norm sense for fixed designs
under conditions on (a) the number s(n) of non-zero components of the vector beta(n) and (b) the minimal singular values of
the design matrices that are induced by selecting on the order of s(n) variables. The results are extended to vectors beta in weak
l_q-balls with 0<q<1. Our results imply that, with high probability, all important variables are selected. The set of selected
variables is a useful (meaningful) reduction of the original set of variables (p(n) > n). Finally, our results are illustrated
with the detection of closely adjacent frequencies, a problem encountered in astrophysics.**Keyword note:**Meinshausen__Nicolai Yu__Bin**Report ID:**720**Relevance:**100
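
To make the two quantities in the abstract concrete (the $\ell_2$ error of the Lasso estimate and the set of selected variables), here is a small Lasso solver. Cyclic coordinate descent with soft thresholding is our choice of algorithm; the report studies the estimator, not a solver. The i.i.d. Gaussian design, the sizes and the penalty level are hypothetical, and this setting does not reproduce the relaxed irrepresentable condition analyzed in the report.

```python
import random


def soft_threshold(z, t):
    """argmin_b 0.5 * (b - z)**2 + t * |b|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, kept up to date
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual that excludes j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Hypothetical setting with p > n: 3 non-zero coefficients among p = 100.
rng = random.Random(0)
n, p = 50, 100
beta_true = [3.0, -2.0, 1.5] + [0.0] * (p - 3)
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [sum(xij * bj for xij, bj in zip(row, beta_true)) + rng.gauss(0.0, 0.5) for row in X]
beta_hat = lasso_cd(X, y, lam=0.2)
l2_error = sum((bh - bt) ** 2 for bh, bt in zip(beta_hat, beta_true)) ** 0.5
selected = [j for j, b in enumerate(beta_hat) if b != 0.0]
print(round(l2_error, 3), selected)
```

The selected set typically contains the three large coefficients plus possibly a few spurious ones, which matches the abstract's reading of the selected set as a useful reduction rather than an exact recovery of the sparsity pattern.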

**Title:**Hierarchical Beta Processes and the Indian Buffet Process**Author(s):**Thibaux, Romain; Jordan, Michael I.; **Date issued:**November 2006

http://nma.berkeley.edu/ark:/28722/bk0004d2b07 (PDF) **Abstract:**We show that the beta process is the de Finetti mixing distribution underlying the Indian buffet process of Griffiths and
Ghahramani (2005). This result shows that the beta process plays the role for the Indian buffet process that the Dirichlet
process plays for the Chinese restaurant process, a parallel that guides us in deriving analogs for the beta process of the many
known extensions of the Dirichlet process. In particular we define Bayesian hierarchies of beta processes and use the connection
to the beta process to develop posterior inference algorithms for the Indian buffet process. We also present an application
to document classification, exploring a relationship between the hierarchical beta process and smoothed naive Bayes models.**Keyword note:**Thibaux__Romain Jordan__Michael_I**Report ID:**719**Relevance:**100
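
For readers unfamiliar with the Indian buffet process, the sketch below simulates it directly from its sequential description (customer $i$ takes each existing dish $k$ with probability $m_k/i$, where $m_k$ counts earlier takers, then tries a Poisson($\alpha/i$) number of new dishes). It does not use the beta process representation established in the report; $\alpha$, the number of customers and the function names are our own illustrative choices.

```python
import math
import random


def poisson(rng, lam):
    """Poisson(lam) draw by Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def sample_ibp(num_customers, alpha, rng):
    """Binary customer-by-dish matrix drawn from the Indian buffet process."""
    counts = []  # counts[k]: how many customers have taken dish k so far
    rows = []
    for i in range(1, num_customers + 1):
        # Take each existing dish k with probability m_k / i ...
        row = [rng.random() < m / i for m in counts]
        for k, took in enumerate(row):
            if took:
                counts[k] += 1
        # ... then try Poisson(alpha / i) new dishes.
        new = poisson(rng, alpha / i)
        row.extend([True] * new)
        counts.extend([1] * new)
        rows.append(row)
    width = len(counts)
    return [[int(k < len(r) and r[k]) for k in range(width)] for r in rows]


rng = random.Random(0)
Z = sample_ibp(10, alpha=2.0, rng=rng)
# Each customer takes Poisson(alpha) dishes marginally, so the expected total is 10 * 2.
draws = [sample_ibp(10, 2.0, rng) for _ in range(500)]
mean_total = sum(sum(map(sum, z)) for z in draws) / len(draws)
print(len(Z), mean_total)
```

Every column of a draw has at least one 1, since a dish only appears when some customer takes it; the report's result identifies the beta process as the de Finetti measure that makes these rows conditionally independent.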

**Title:**Probabilistic Analysis of Linear Programming Decoding**Author(s):**Daskalakis, Constantinos; Dimakis, Alexandros D. G.; Karp, Richard M.; Wainwright, Martin J.; **Date issued:**October 2006

http://nma.berkeley.edu/ark:/28722/bk0004d8424 (PDF) **Abstract:**We initiate the probabilistic analysis of linear programming (LP) decoding of low-density parity-check (LDPC) codes. Specifically,
we show that for a random LDPC code ensemble, the linear programming decoder of Feldman et al. succeeds in correcting a constant
fraction of errors with high probability. The fraction of correctable errors guaranteed by our analysis surpasses all prior
non-asymptotic results for LDPC codes, and in particular exceeds the best previous finite-length result on LP decoding by
a factor greater than ten. This improvement stems in part from our analysis of probabilistic bit-flipping channels, as opposed
to adversarial channels. At the core of our analysis is a novel combinatorial characterization of LP decoding success, based
on the notion of a generalized matching. An interesting by-product of our analysis is to establish the existence of "almost
expansion" in random bipartite graphs, in which one requires only that almost every (as opposed to every) set of a certain
size expands, with expansion coefficients much larger than the classical case.**Keyword note:**Daskalakis__Constantinos Dimakis__Alexandros Karp__Richard_M Wainwright__Martin**Report ID:**718**Relevance:**100

**Title:**A mutation-selection model for general genotypes with recombination**Author(s):**Evans, Steven N.; Steinsaltz, David; Wachter, Kenneth W.; **Date issued:**September 2006

http://nma.berkeley.edu/ark:/28722/bk0004d2b6j (PDF) **Abstract:**A probability model is presented for the dynamics of mutation-selection balance in an infinite-population infinite-sites setting
sufficiently general to cover mutation-driven changes in full age-specific demographic schedules. An earlier work by the same
authors presented a haploid model -- without genetic recombination -- of similar scope. This work complements that model,
adding genetic recombination, based on a well-known general discrete-population genetic model of N. Barton and M. Turelli.
The model with recombination is a flow on Poisson intensities, substantially different from the haploid model. It is shown
that the new model arises from the haploid model when recombination is added, in the limit as generations per unit time go
to infinity, and selection strength and mutation per generation go to 0.**Keyword note:**Evans__Steven_N Steinsaltz__David Wachter__Kenneth**Report ID:**717**Relevance:**100

**Title:**Regularized estimation of large covariance matrices**Author(s):**Bickel, Peter J.; Levina, Elizaveta; **Date issued:**September 2006

http://nma.berkeley.edu/ark:/28722/bk0000n339j (PDF) **Abstract:**This paper considers estimating a covariance matrix of p variables from n observations by either banding the sample covariance
matrix or estimating a banded version of the inverse of the covariance. We show that these estimates are consistent in the
operator norm as long as (log p)^2/n converges to 0, and obtain explicit rates. The results are uniform over some fairly natural
well-conditioned families of covariance matrices. We also introduce an analogue of the Gaussian white noise model and show
that if the population covariance is embeddable in that model and well-conditioned then the banded approximations produce
consistent estimates of eigenvalues and associated eigenvectors of the covariance matrix. The results can be extended to smooth
versions of banding and to non-Gaussian distributions with sufficiently short tails. A resampling approach is proposed for choosing
the banding parameter in practice. This approach is illustrated numerically on both simulated and real data.**Keyword note:**Bickel__Peter_John Levina__Elizaveta**Report ID:**716**Relevance:**100
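
A minimal sketch of the banding estimator, which keeps the entries of the sample covariance with $|i - j| \le k$ and sets the rest to zero, compared with the raw sample covariance in operator norm. The AR(1) covariance $\rho^{|i-j|}$, the sizes and $k = 2$ are our own hypothetical choices; the report selects the banding parameter by resampling, which is not reproduced here.

```python
import random


def ar1_sample(n, p, rho, rng):
    """n draws of a stationary Gaussian AR(1) vector; Cov[i][j] = rho**|i - j|."""
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0)]
        for _ in range(p - 1):
            x.append(rho * x[-1] + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0))
        rows.append(x)
    return rows


def sample_cov(rows):
    n, p = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / n
             for j in range(p)] for i in range(p)]


def band(S, k):
    """Banding operator: keep entries with |i - j| <= k, zero the rest."""
    p = len(S)
    return [[S[i][j] if abs(i - j) <= k else 0.0 for j in range(p)] for i in range(p)]


def minus(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def op_norm_sym(A, iters=300):
    """Spectral norm of a symmetric matrix via power iteration on A @ A."""
    p = len(A)
    v = [1.0 / p ** 0.5] * p
    lam = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(p)) for i in range(p)]
        w = [sum(A[i][j] * w[j] for j in range(p)) for i in range(p)]
        lam = sum(x * x for x in w) ** 0.5  # ||A^2 v|| with ||v|| = 1
        if lam == 0.0:
            return 0.0
        v = [x / lam for x in w]
    return lam ** 0.5


rng = random.Random(0)
n, p, rho, k = 50, 30, 0.5, 2
sigma = [[rho ** abs(i - j) for j in range(p)] for i in range(p)]
S = sample_cov(ar1_sample(n, p, rho, rng))
err_sample = op_norm_sym(minus(S, sigma))
err_banded = op_norm_sym(minus(band(S, k), sigma))
print(round(err_sample, 2), round(err_banded, 2))
```

Banding trades a small bias (the true entries beyond the band are dropped) for removing the many noisy far-from-diagonal entries, which is why it helps when $p$ is comparable to $n$.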

**Title:**Kernel Dimension Reduction in Regression**Author(s):**Fukumizu, Kenji; Bach, Francis R.; Jordan, Michael I.; **Date issued:**September 2006

http://nma.berkeley.edu/ark:/28722/bk0000n337f (PDF) **Abstract:**We present a new methodology for sufficient dimension reduction (SDR). Our methodology derives directly from a formulation
of SDR in terms of the conditional independence of the covariate $X$ from the response $Y$, given the projection of $X$ on
the central subspace (Li, 1991; Cook, 1998). We show that this conditional independence assertion can be characterized in
terms of conditional covariance operators on reproducing kernel Hilbert spaces and we show how this characterization leads
to an M-estimator for the central subspace. The resulting estimator is shown to be consistent under weak conditions; in particular,
we do not have to impose linearity or ellipticity conditions of the kinds that are generally invoked for SDR methods. We
also present empirical results showing that the new methodology is competitive in practice.**Keyword note:**Fukumizu__Kenji Bach__Francis_R Jordan__Michael_I**Report ID:**715**Relevance:**100

**Title:**On Detecting Periodicity in Astronomical Point Processes**Author(s):**Bickel, Peter; Kleijn, Bas; Rice, John; **Date issued:**August 2006

http://nma.berkeley.edu/ark:/28722/bk0000n335b (PDF) **Abstract:**We consider the problem of detecting periodicity in the rate function of a point process or a marked point process, motivated
by the problem of detecting $\gamma$-ray pulsars. The detection problem poses both theoretical and computational challenges.
On the theoretical side, there are no compelling optimality results that dictate the choice of a detection algorithm and the
properties of detection procedures can be quite difficult to analyze. On the computational side, searching over a range of
frequency and frequency drift can be a daunting task, even for a record consisting of only a thousand or so events. We discuss
a class of detection procedures, weighted quadratic test statistics arising from likelihood expressions, whose properties
we can understand and which do not impose excessive computational burdens. We show how knowledge of the point spread function
associated with photon arrivals can be incorporated to improve power. We show that if a search over frequencies is conducted
by discretizing a frequency band, the discretization must be very fine and we discuss the use of integration over frequency
bands as an alternative. We also discuss the use of extreme value theory in conjunction with simulation in assessing statistical
significance for such a search.**Keyword note:**Bickel__Peter_John Kleijn__Bas Rice__John_Andrew**Report ID:**714**Relevance:**100
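
The abstract does not spell out its statistics, so as a stand-in we use the classical $Z^2_m$ statistic (the Rayleigh statistic when $m = 1$), written with optional per-photon weights of the kind a point-spread-function model might supply. The weighting is our own simple choice, not the report's likelihood-derived weights, and the simulation parameters are hypothetical. The grid step is chosen well below $1/T$, in line with the abstract's point that a discretized frequency search must be very fine.

```python
import math
import random


def simulate_arrivals(rate0, amp, f0, T, rng):
    """Poisson process on [0, T] with rate rate0 * (1 + amp * cos(2 pi f0 t)), by thinning."""
    lam_max = rate0 * (1.0 + amp)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(lam_max)
        if t > T:
            return times
        if rng.random() * (1.0 + amp) < 1.0 + amp * math.cos(2.0 * math.pi * f0 * t):
            times.append(t)


def z2_statistic(times, freq, m=2, weights=None):
    """Weighted Z^2_m: 2 / sum(w^2) * sum_{k<=m} |sum_j w_j exp(2 pi i k freq t_j)|^2.

    With no periodic signal it is approximately chi-square with 2m d.o.f. for large samples.
    """
    if weights is None:
        weights = [1.0] * len(times)
    norm = sum(w * w for w in weights)
    total = 0.0
    for k in range(1, m + 1):
        c = sum(w * math.cos(2.0 * math.pi * k * freq * t) for t, w in zip(times, weights))
        s = sum(w * math.sin(2.0 * math.pi * k * freq * t) for t, w in zip(times, weights))
        total += 2.0 * (c * c + s * s) / norm
    return total


rng = random.Random(0)
T, f0 = 2000.0, 0.1234
times = simulate_arrivals(rate0=1.0, amp=0.3, f0=f0, T=T, rng=rng)
# Grid step 1e-4 is a fraction of 1/T = 5e-4, the width of the peak in frequency.
grid = [0.122 + i * 0.0001 for i in range(31)]
scores = [z2_statistic(times, f) for f in grid]
f_best = grid[scores.index(max(scores))]
print(len(times), round(f_best, 4), round(max(scores), 1), round(z2_statistic(times, 0.2), 1))
```

Because the peak has width of order $1/T$, a grid coarser than that can step over the signal entirely; long records therefore force very fine grids, and the number of effectively independent frequencies searched is what the extreme-value correction for significance must account for.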

**Title:**Damage segregation at fissioning may increase growth rates: A superprocess model**Author(s):**Evans, Steven N.; Steinsaltz, David; **Date issued:**August 2006

http://nma.berkeley.edu/ark:/28722/bk0000n3337 (PDF) **Abstract:**A fissioning organism may purge unrepairable damage by bequeathing it preferentially to one of its daughters. We propose a
superprocess model, and show that when damage accumulates deterministically, optimal growth is achieved by unequal division
of damage between the daughters.**Keyword note:**Evans__Steven_N Steinsaltz__David**Report ID:**713**Relevance:**100

**Title:**Measuring Similarity between Gene Expression Profiles with the Consideration of Both Shape and Magnitude**Author(s):**Kim, Kyungpil; Jiang, Keni; Zhang, Shibo; Cai, Li; Lee, In-Beum; Feldman, Lewis; Huang, Haiyan; **Date issued:**June 2006

http://nma.berkeley.edu/ark:/28722/bk0000n3314 (PDF) **Abstract:**Clustering methods have been widely applied to gene expression data in order to group genes sharing common or similar expression
profiles into discrete functional groups. In such analyses, designing an appropriate (dis)similarity measure is critical.
Motivated by the Poisson based similarity measure PoissonC designed for SAGE data (Cai et al., 2004), we explore more generally
applicable similarity measures in clustering analysis that consider both shape and magnitude of the gene expression profile.
Our idea is to model the shape and magnitude information separately and use the estimated shape and magnitude parameters to
define a similarity measure in a new data space, wherein each dimension represents different aspects of an expression profile
shape. We expect our new measure to be more effective than PoissonC at detecting shape changes while retaining the necessary
sensitivity to magnitude. Applications of the new measure to different types of expression data demonstrate the
effectiveness of our method.**Keyword note:**Kim__Ki_Mok Jiang__Keni Zhang__Shibo Cai__Li Lee__In-Beum Feldman__Lewis Huang__Haiyan**Report ID:**712**Relevance:**100

**Title:**A statistical framework to infer functional gene associations from multiple biologically interrelated microarray experiments**Author(s):**Teng, Siew-Leng Melinda; Zhou, Jasmine; Huang, Haiyan; **Date issued:**June 2006

http://nma.berkeley.edu/ark:/28722/bk0000n3291 (PDF) **Abstract:**Inferring functional gene relationships is a major step in understanding biological networks. Microarray data from an
increasing number of biologically interrelated experiments now allow more complete portrayals of functional gene
relationships involved in biological processes. In current studies of gene relationships, the existence of dependencies between
gene expressions from the biologically interrelated experiments, however, has been widely ignored. When not accounted for,
these experimental dependencies can result in inaccurate inferences of functional gene relationships, and hence incorrect
biological conclusions. This article proposes a statistical framework and a novel gene co-expression measure, named Knorm
correlation, to address this problem. The most important aspect of the proposed model is its ability to decompose the interesting
biological variations in gene expressions into two mutually independent components each arising from the genes and the experiments,
in addition to variations due to random noise. As a result, the Knorm correlation can critically de-correlate the experimental
dependencies before estimating the gene relationships, thus leading to improved accuracies in inferring functional gene relationships.
Knorm correlation simplifies to the Pearson coefficient when experiments are uncorrelated. Using simulation studies, a yeast
microarray and a human microarray dataset, we demonstrate the success of the Knorm correlation as a more accurate and reliable
measure, and the adverse impact of experimental dependencies on the Pearson coefficient, in inferring functional gene relationships
from interrelated and interdependent experiments.**Keyword note:**Teng__Siew-Leng_Melinda Zhou__Jasmine Huang__Haiyan**Report ID:**711**Relevance:**100

**Title:**Expectation, Conditional Expectation and Martingales in Local Fields**Author(s):**Evans, Steven N.; Lidman, Tye; **Date issued:**June 2006

http://nma.berkeley.edu/ark:/28722/bk0000n327x (PDF) **Abstract:**We investigate a possible definition of expectation and conditional expectation for random variables with values in a local
field such as the $p$-adic numbers. We define the expectation by analogy with the observation that for real-valued random
variables in $L^2$ the expected value is the orthogonal projection onto the constants. Previous work has shown that the local
field version of $L^\infty$ is the appropriate counterpart of $L^2$, and so the expected value of a local field-valued random
variable is defined to be its "projection" in $L^\infty$ onto the constants. Unlike the real case, the resulting projection
is not typically a single constant, but rather a ball in the metric on the local field. However, many properties of this
expectation operation and the corresponding conditional expectation mirror those familiar from the real-valued case; for example,
conditional expectation is, in a suitable sense, a contraction on $L^\infty$ and the tower property holds. We also define
the corresponding notion of martingale, show that several standard examples of martingales (for example, sums or products
of suitable independent random variables or "harmonic" functions composed with Markov chains) have local field analogues,
and obtain versions of the optional sampling and martingale convergence theorems.**Keyword note:**Evans__Steven_N Lidman__Tye**Report ID:**710**Relevance:**100

**Title:**Sharp thresholds for high-dimensional and noisy recovery**Author(s):**Wainwright, Martin J.; **Date issued:**June 2006

http://nma.berkeley.edu/ark:/28722/bk0000n325t (PDF) **Abstract:**The problem of consistently estimating the sparsity pattern of a vector $\beta^* \in \mathbb{R}^p$ based on observations
contaminated by noise arises in various contexts, including subset selection in regression, structure estimation in graphical
models, sparse approximation, and signal denoising. We analyze the behavior of $\ell_1$-constrained quadratic programming
(QP), also referred to as the Lasso, for recovering the sparsity pattern. Our main result is to establish a sharp relation
between the problem dimension $p$, the number $k$ of non-zero elements in $\beta^*$, and the number of observations
$n$ that are required for reliable recovery. For a broad class of Gaussian ensembles satisfying mutual incoherence conditions,
we establish existence and compute explicit values of thresholds $\theta_\ell$ and $\theta_u$ with the following properties:
for any $\nu > 0$, if $n > 2 \, k (\theta_u + \nu) \log (p - k) + k + 1$, then
the Lasso succeeds in recovering the sparsity pattern with probability converging to one for large problems, whereas for
$n < 2 \, k (\theta_\ell - \nu) \log (p - k) + k + 1$, the probability of successful recovery
converges to zero. For the special case of the uniform Gaussian ensemble, we show that $\theta_\ell = \theta_u = 1$, so that
the threshold is sharp and exactly determined.**Keyword note:**Wainwright__Martin**Report ID:**709**Relevance:**100
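
Plugging numbers into the sample-size conditions makes the scaling concrete. With $p$ the dimension, $k$ the number of non-zero entries, $n$ the number of observations and thresholds $\theta_\ell, \theta_u$, the conditions read $n > 2k(\theta_u + \nu)\log(p-k) + k + 1$ for success and $n < 2k(\theta_\ell - \nu)\log(p-k) + k + 1$ for failure. The helper names are hypothetical, the natural logarithm is assumed, and values of $\theta$ other than 1 depend on the Gaussian ensemble and must come from the report. The regime labels are asymptotic statements, not guarantees for a single finite problem.

```python
import math


def lasso_sample_threshold(p, k, theta, nu=0.0):
    """2 k (theta + nu) log(p - k) + k + 1, with the natural logarithm (assumed)."""
    return 2.0 * k * (theta + nu) * math.log(p - k) + k + 1


def classify_regime(n, p, k, theta_low, theta_up, nu):
    """Which side of the two thresholds a problem of size (n, p, k) falls on."""
    if n > lasso_sample_threshold(p, k, theta_up, nu):
        return "success w.p. -> 1"
    if n < lasso_sample_threshold(p, k, theta_low, -nu):
        return "failure w.p. -> 1"
    return "between thresholds"


# Uniform Gaussian ensemble: theta_low = theta_up = 1.
print(round(lasso_sample_threshold(1000, 10, 1.0), 2))  # 148.95
for n in (100, 150, 200):
    print(n, classify_regime(n, 1000, 10, 1.0, 1.0, 0.1))
```

For $p = 1000$ and $k = 10$, roughly 150 observations suffice, far fewer than $p$; the gap between the two thresholds shrinks to zero as $\nu \to 0$ when $\theta_\ell = \theta_u$.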

**Title:**On optimal quantization rules for some sequential decision problems**Author(s):**Nguyen, Xuanlong; Wainwright, Martin J.; Jordan, Michael I.; **Date issued:**June 2006

http://nma.berkeley.edu/ark:/28722/bk0000n323q (PDF) **Abstract:**We consider the problem of sequential decentralized detection, a problem that entails several interdependent choices: the
choice of a stopping rule (specifying the sample size), a global decision function (a choice between two competing hypotheses),
and a set of quantization rules (the local decisions on the basis of which the global decision is made). In this paper we
resolve an open problem concerning whether optimal local decision functions for the Bayesian formulation of sequential decentralized
detection can be found within the class of stationary rules. We develop an asymptotic approximation to the optimal cost of
stationary quantization rules and show how this approximation yields a negative answer to the stationarity question. We also
consider the class of blockwise stationary quantizers and show that asymptotically optimal quantizers are likelihood-based
threshold rules.**Keyword note:**Nguyen__XuanLong Wainwright__Martin Jordan__Michael_I**Report ID:**708**Relevance:**100

**Title:**Representation of Radon Shape Diffusions via Hyperspherical Brownian Motion**Author(s):**Panaretos, Victor M.; **Date issued:**April 2006

http://nma.berkeley.edu/ark:/28722/bk0000n312n (PDF) **Abstract:**A framework is introduced for the study of general Radon shape diffusions, that is, shape diffusions induced by projections
of randomly rotating shapes [Panaretos, 2006]. This is done via a convenient representation of unoriented Radon shape diffusions
in (unoriented) D.G. Kendall shape space $\widetilde{\Sigma}_n^k$ through a Brownian motion on the hypersphere. This representation
leads to a coordinate system for the generalized version of Radon diffusions since it is shown that shape can be essentially
identified with unoriented shape in the projected case. A bijective correspondence between Brownian motion on real projective
space and Radon shape diffusions is established. Furthermore, equations are derived for the general (unoriented) Radon diffusion
of shape-and-size, and stationary measures are discussed. References: Panaretos, V.M. (June 2006). The diffusion of Radon
shape. Adv. App. Prob. 38 (2), forthcoming.**Keyword note:**Panaretos__Victor**Report ID:**707**Relevance:**100

**Title:**Embracing Statistical Challenges in the Information Technology Age**Author(s):**Yu, Bin; **Date issued:**March 2006

http://nma.berkeley.edu/ark:/28722/bk0000n321m (PDF) **Abstract:**Information Technology is creating an exciting time for statistics. In this article, we review the diverse sources of IT data
in three clusters: IT core, IT systems, and IT fringe. The new data forms, huge data volumes, and high data speeds of IT are
contrasted against the constraints on storage, transmission and computation to point to the challenges and opportunities.
In particular, we describe the impacts of IT on the stages of a typical statistical investigation (data collection, data visualization,
and model fitting), with an emphasis on computation and feature selection. Moreover, two research projects on network tomography
and arctic cloud detection are used throughout the paper to bring the discussions to a concrete level.**Keyword note:**Yu__Bin**Report ID:**706**Relevance:**100

**Title:**Non-equilibrium theory of the allele frequency spectrum**Author(s):**Evans, Steven N.; Shvets, Yelena; Slatkin, Montgomery; **Date issued:**April 2006

http://nma.berkeley.edu/ark:/28722/bk0000n3090 (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n310j (PostScript) **Abstract:**A forward diffusion equation describing the evolution of the allele frequency spectrum is presented. The influx of mutations
is accounted for by imposing a suitable boundary condition. For a Wright-Fisher diffusion with or without selection and varying
population size, the boundary condition is $\lim_{x \downarrow 0} x f(x,t) = \theta \rho(t)$, where $f(\cdot,t)$ is the frequency
spectrum of derived alleles at independent loci at time $t$ and $\rho(t)$ is the relative population size at time $t$. When
population size and selection intensity are independent of time, the forward equation is equivalent to the backwards diffusion
usually used to derive the frequency spectrum, but the forward equation allows computation of the time dependence of the spectrum
both before an equilibrium is attained and when population size and selection intensity vary with time. From the diffusion
equation, we derive a set of ordinary differential equations for the moments of $f(\cdot,t)$ and express the expected spectrum
of a finite sample in terms of those moments. We illustrate the use of the forward equation by considering neutral and selected
alleles in a highly simplified model of human history. For example, we show that approximately 30\% of the expected heterozygosity
of neutral loci is attributable to mutations that arose since the onset of population growth in roughly the last $150,000$
years.**Keyword note:**Evans__Steven_N Shvets__Yelena Slatkin__Montgomery**Report ID:**705**Relevance:**100
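
The step from moments of $f(\cdot,t)$ to the expected spectrum of a finite sample can be illustrated with the standard binomial-sampling formula $E[\xi_i] = \binom{n}{i} \int_0^1 x^i (1-x)^{n-i} f(x)\,dx$, expanding $(1-x)^{n-i}$ so that only moments are needed. We assume the moments are $M_r = \int_0^1 x^r f(x,t)\,dx$; the report's exact normalization may differ. As a check we use the neutral equilibrium spectrum $f(x) = \theta/x$ (constant population size, $\rho \equiv 1$), which satisfies the boundary condition above and should return the classical $\theta/i$.

```python
from math import comb


def sample_spectrum(moment, n):
    """Expected numbers of sites with i = 1..n-1 derived copies in a sample of size n.

    E[xi_i] = C(n, i) * integral of x**i (1 - x)**(n - i) f(x) dx, with
    (1 - x)**(n - i) expanded binomially so that only the moments
    moment(r) = integral of x**r f(x) dx (r >= 1) of the spectrum are needed.
    """
    return [
        comb(n, i) * sum(comb(n - i, j) * (-1) ** j * moment(i + j) for j in range(n - i + 1))
        for i in range(1, n)
    ]


def neutral_equilibrium_moment(r, theta=2.0):
    """Moments of f(x) = theta / x, for which x f(x) -> theta as x -> 0."""
    return theta / r


spec = sample_spectrum(neutral_equilibrium_moment, 10)
print([round(v, 4) for v in spec])
# [2.0, 1.0, 0.6667, 0.5, 0.4, 0.3333, 0.2857, 0.25, 0.2222]
```

In a non-equilibrium setting one would instead feed in time-dependent moments obtained by solving the moment ODEs; the alternating binomial sum is then the only link between those moments and the sample spectrum.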

**Title:**Comparison of MISR aerosol optical thickness with AERONET measurements in Beijing Metropolitan Area**Author(s):**Jiang, Xin; Liu, Yang; Yu, Bin; Jiang, Ming; **Date issued:**March 2006

http://nma.berkeley.edu/ark:/28722/bk0000n319h (PDF) **Abstract:**Aerosol optical thickness (AOT) values retrieved by the Multi-angle Imaging SpectroRadiometer (MISR) from 2002 to 2004 were compared
with AOT measurements from an Aerosol Robotic Network (AERONET) site located in the Beijing urban area. MISR and AERONET AOTs
were highly correlated, with an overall linear correlation coefficient of 0.93 at the 558 nm wavelength. On average, MISR AOT at
558 nm was 30% lower than the AERONET AOT at 558 nm interpolated from 440 nm and 675 nm. A linear regression analysis using
AERONET AOT as the response yielded a slope of 0.58 and an intercept of 0.07 in the green band with similar results in the
other three bands, indicating that MISR substantially underestimates AERONET AOT. After applying a narrower averaging time
window to control for temporal variability, the agreement between MISR and AERONET AOTs was significantly improved, with a
correlation coefficient of 0.97 and a slope of 0.71 in an ordinary linear least squares fit. A weighted linear least squares fit,
which reduces the impact of spatial averaging, yielded a better result, with the slope rising to 0.73. The best agreement
was achieved, with a slope of 0.91, when only the central points were included in the regression analysis. By investigating
the spatial distribution of PM10 in Beijing, we found substantial spatial variations of aerosol loading, which can introduce uncertainty
when validating MISR AOT. Our findings also suggest that the MISR aerosol retrieval algorithm might need to be adjusted for the
extremely high aerosol loadings and substantial spatial variations that it will probably encounter in heavily polluted metropolitan
areas.**Keyword note:**Jiang__Xin Liu__Yang Yu__Bin Jiang__Ming**Report ID:**704**Relevance:**100