**Statistics Technical Reports**

**Term(s):** 2001 **Results:** 25

**Title:**Elementary divisors and determinants of random matrices over a local field**Author(s):**Evans, Steven N.; **Date issued:**Dec 2001

http://nma.berkeley.edu/ark:/28722/bk0000n208z (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n209h (PostScript) **Abstract:**We consider the elementary divisors and determinant of a uniformly distributed $n \times n$ random matrix $M_n$ with entries
in the ring of integers of an arbitrary local field. We show that the sequence of elementary divisors is in a simple bijective
correspondence with a Markov chain on the nonnegative integers. The transition dynamics of this chain do not depend on the
size of the matrix. As $n \rightarrow \infty$, all but finitely many of the elementary divisors are $1$, and the remainder
arise from a Markov chain with these same transition dynamics. We also obtain the distribution of the determinant of $M_n$
and find the limit of this distribution as $n \rightarrow \infty$. Our formulae have connections with classical identities
for $q$-series, and the $q$-binomial theorem in particular.**Keyword note:**Evans__Steven_N**Report ID:**614**Relevance:**100

**Title:**Markov processes on vermiculated spaces**Author(s):**Barlow, Martin T.; Evans, Steven N.; **Date issued:**Nov 2001

http://nma.berkeley.edu/ark:/28722/bk0000n2f3x (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2f4g (PostScript) **Abstract:**A general technique is given for constructing new Markov processes from existing ones. The new process and its state space
are both projective limits of sequences built by an iterative scheme. The space at each stage in the scheme is obtained by
taking disjoint copies of the space at the previous stage and quotienting to identify certain distinguished points. Away
from the distinguished points, the process at each stage evolves like the one constructed at the previous stage on some copy
of the previous state space, but when the process hits a distinguished point it enters at random another of the copies ``pinned''
at that point. Special cases of this construction produce diffusions on fractal-like objects that have been studied recently.**Keyword note:**Barlow__Martin_T Evans__Steven_N**Report ID:**613**Relevance:**100

**Title:**Improving the Accuracy of the Census Through Adjustment**Author(s):**Freedman, David A.; Wachter, Kenneth W.; **Date issued:**Nov 2001

http://nma.berkeley.edu/ark:/28722/bk0000n2d85 (PDF) **Abstract:**In this article, we sketch procedures for taking the census, making adjustments, and evaluating the results. Despite what
you read in the newspapers, the census is remarkably accurate. Statistical adjustment is unlikely to improve on the census,
because adjustment can easily put in more error than it takes out. Indeed, error rates in the adjustment turn out to be comparable
to errors in the census. The data suggest a strong geographical pattern to such errors, even after controlling for demography--
which contradicts a basic premise of adjustment. The complex demographic controls built into the adjustment mechanism turn
out to be counter-productive. Proponents of adjustment have cited "loss function analysis" to compare the accuracy of the
census and adjustment, generally to the advantage of the latter. However, the chosen analyses make assumptions that are highly
stylized, and quite favorable to adjustment. With more realistic assumptions, loss function analysis is neutral, or favors
the census. At the heart of the adjustment mechanism, there is a large sample survey-- the post enumeration survey. The size
of the survey cannot be justified. The adjustment process now consumes too large a share of the Census Bureau's scarce resources,
which should be reallocated to other Bureau programs.**Keyword note:**Freedman__David Wachter__Kenneth**Report ID:**612**Relevance:**100

**Title:**What is the Chance of an Earthquake**Author(s):**Freedman, D. A.; Stark, P. B.; **Date issued:**Sep 2001

http://nma.berkeley.edu/ark:/28722/bk0000n2d5h (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2d62 (PostScript) **Abstract:**What is the chance that at least one earthquake of magnitude 6.7 or greater will occur before the year 2030 in the San Francisco
Bay Area? The U.S. Geological Survey estimated the chance to be 0.7 +/- 0.1. In this paper, we try to interpret such probabilities.
Making sense of earthquake forecasts is surprisingly difficult. In part, this is because the forecasts are based on a complicated
mixture of geological maps, empirical rules of thumb, expert opinion, physical models, stochastic models, numerical simulations,
as well as geodetic, seismic, and paleoseismic data. Even the concept of probability is hard to define in this context.
We examine the problems in applying standard definitions of probability to earthquakes, taking the USGS forecast---the product
of a particularly careful and ambitious study---as our lead example. The issues are general, and concern the interpretation
more than the numerical values. Despite the work involved in the USGS forecast, the probability estimate is shaky, as is the
estimate of its uncertainty.**Keyword note:**Freedman__David Stark__Philip_B**Report ID:**611**Relevance:**100

**Title:**Modelling Movements of Free-Ranging Animals**Author(s):**Brillinger, David R.; Preisler, Haiganoush K.; Ager, Alan A.; Kie, John G.; Stewart, Brent S.; **Date issued:**Sep 2001

http://nma.berkeley.edu/ark:/28722/bk0000n3z35 (PDF) **Abstract:**This work derives and fits stochastic models to the trajectories of mammals moving about in a heterogeneous landscape. The
basic data are locations of 53 Rocky Mountain elk (*Cervus elaphus*) estimated approximately every two hours for nine
months. The elk roam about the Starkey Experimental Forest and Range in eastern Oregon. Elk movements may be affected by explanatory
variables such as the locations of fences, of roads, of cover, of water, of forage and other habitat characteristics. Wildlife
biologists are interested in questions like how an elk's movement relates to such explanatories. In the work a model was developed
in successive stages. First equations of motion were set down motivated by the idea of a potential function. Then the functional
parameters appearing in the equations were estimated nonparametrically. Statistical questions arising involved how to include
explanatory variables in the equations and how to decide which variables are significant. Residual plots proved useful. Time
of day was found to play a fundamental role and distance to nearest road enters as well. Future work will include other explanatories.**Keyword note:**Brillinger__David_R Preisler__Haiganoush_Krikorian Ager__Alan_A Kie__John_G Stewart__Brent_S**Report ID:**610**Relevance:**100

**Title:**Inverse Problems as Statistics**Author(s):**Evans, S. N.; Stark, P. B.; **Date issued:**Aug 2001

http://nma.berkeley.edu/ark:/28722/bk0000n3n42 (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n3n5m (PostScript) **Abstract:**What mathematicians, scientists, engineers, and statisticians mean by ``inverse problem'' differs. For a statistician, an
inverse problem is an inference or estimation problem. The data are finite in number and contain errors, as they do in classical
estimation or inference problems, and the unknown typically is infinite-dimensional, as it is in nonparametric regression.
The additional complication in an inverse problem is that the data are only indirectly related to the unknown. Canonical
abstract formulations of statistical estimation problems subsume this complication by allowing probability distributions to
be indexed in more-or-less arbitrary ways by parameters, which can be infinite-dimensional. Standard statistical concepts,
questions, and considerations such as bias, variance, mean-squared error, identifiability, consistency, efficiency, and various
forms of optimality, apply to inverse problems. This article discusses inverse problems as statistical estimation and inference
problems, and points to the literature for a variety of techniques and results.**Keyword note:**Evans__Steven_N Stark__Philip_B**Report ID:**609**Relevance:**100

**Title:**Hitting, occupation, and inverse local times of one-dimensional diffusions: martingale and excursion approaches**Author(s):**Pitman, Jim; Yor, Marc; **Date issued:**Nov 2001

http://nma.berkeley.edu/ark:/28722/bk0000n2f08 (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2f1t (PostScript) **Abstract:**Basic relations between the distributions of hitting, occupation, and inverse local times of a one-dimensional diffusion
process $X$, first discussed by Itô-McKean, are reviewed from the perspectives of martingale calculus and excursion theory.
These relations, and the technique of conditioning on $L_T^y$, the local time of $X$ at level $y$ before a suitable random
time $T$, yield formulae for the joint Laplace transform of $L_T^y$ and the times spent by $X$ above and below level $y$
up to time $T$.**Keyword note:**Pitman__Jim Yor__Marc**Report ID:**607**Relevance:**100

**Title:**Boosting with the $L_2$-Loss: Regression and Classification**Author(s):**Bühlmann, Peter; Yu, Bin; **Date issued:**Aug 2001

http://nma.berkeley.edu/ark:/28722/bk0000n2c96 (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2d0r (PostScript) **Abstract:**This paper investigates a variant of boosting, $L_2$Boost, which is constructed from a functional gradient descent algorithm
with the $L_2$-loss function. Based on an explicit stagewise refitting expression of $L_2$Boost, the case of (symmetric)
linear weak learners is studied in detail in both regression and two-class classification. In particular, with the boosting
iteration $m$ working as the smoothing or regularization parameter, a new exponential bias-variance trade-off is found with
the variance (complexity) term bounded as $m$ tends to infinity. When the weak learner is a smoothing spline, an optimal rate
of convergence result holds for both regression and two-class classification. And this boosted smoothing spline adapts to
higher order, unknown smoothness. Moreover, a simple expansion of the 0-1 loss function is derived to reveal the importance
of the decision boundary, bias reduction, and impossibility of an additive bias-variance decomposition in classification.
Finally, simulation and real data set results are obtained to demonstrate the attractiveness of $L_2$Boost, particularly with
a novel component-wise cubic smoothing spline as an effective and practical weak learner.**Keyword note:**Buhlmann__Peter Yu__Bin**Report ID:**605**Relevance:**100

**Title:**Large scale inference and tomography for network monitoring and diagnosis**Author(s):**Coates, Mark; Hero, Alfred; Nowak, Robert; Yu, Bin; **Date issued:**Aug 2001

http://nma.berkeley.edu/ark:/28722/bk0000n2c6j (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2c73 (PostScript) **Abstract:**Today's Internet is a massive, distributed network which continues to explode in size as e-commerce and related activities
grow. The heterogeneous and largely unregulated structure of the Internet renders tasks such as dynamic routing, optimized
service provision, service level verification, and detection of anomalous/malicious behavior increasingly challenging.
The problem is compounded by the fact that one cannot rely on the cooperation of individual servers and routers to aid in
the collection of network traffic measurements vital for these tasks. In many ways, network monitoring and inference problems
bear a strong resemblance to other ``inverse problems'' in which key aspects of a system are not directly observable. Familiar
signal processing problems such as tomographic image reconstruction, pattern recognition, system identification, and array
processing all have interesting interpretations in the networking context. This article introduces the new field of large-scale
network inference, a field which we believe will benefit greatly from the wealth of signal processing research and algorithms.**Keyword note:**Coates__Mark Hero__Alfred Nowak__Robert Yu__Bin**Report ID:**604**Relevance:**100

**Title:**Eigenvalues of random wreath products**Author(s):**Evans, Steven N.; **Date issued:**Jul 2001

http://nma.berkeley.edu/ark:/28722/bk0000n2c3w (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2c4f (PostScript) **Abstract:**Consider a uniformly chosen element $X_n$ of the $n$-fold wreath product $\Gamma_n = G \wr G \wr \dots \wr G$, where $G$
is a finite permutation group acting transitively on some set of size $s$. The eigenvalues of $X_n$ in the natural $s^n$-dimensional
permutation representation are investigated by considering the random measure $\Xi_n$ on the unit circle that assigns mass
$1$ to each eigenvalue. It is shown that if $f$ is a trigonometric polynomial, then
$\lim_{n \rightarrow \infty} P\{\int f \, d\Xi_n \ne s^n \int f \, d\lambda\}=0$, where $\lambda$ is normalised Lebesgue measure on the unit circle. In particular, $s^{-n} \Xi_n$
converges weakly in probability to $\lambda$ as $n \rightarrow \infty$. For a large class of test functions $f$ with
non-terminating Fourier expansions, it is shown that there exists a constant $c$ and a non-zero random variable $W$ (both
depending on $f$) such that $c^{-n} \int f \, d\Xi_n$ converges in distribution as $n \rightarrow \infty$ to $W$. These results
have applications to Sylow $p$-groups of symmetric groups and automorphism groups of regular rooted trees.**Keyword note:**Evans__Steven_N**Report ID:**603**Relevance:**100

**Title:**Non-Parametric Estimators Which Can Be ``Plugged-In''**Author(s):**Bickel, Peter J.; Ritov, Ya'acov; **Date issued:**Jul 2001

http://nma.berkeley.edu/ark:/28722/bk0000n295z (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n296h (PostScript) **Abstract:**We consider nonparametric estimation of an object such as a probability density or a regression function. Can such an estimator
achieve the minimax rate of convergence on suitable function spaces, while, at the same time, when ``plugged-in'', estimate
efficiently (at a rate of $n^{-1/2}$ with the best constant) many functionals of the object? For example, can we have a density
estimator whose definite integrals are efficient estimators of the cumulative distribution function? We show that this is
impossible for very large sets, e.g., expectations of all functions bounded by $M<\infty$. However we also show that it is possible
for sets as large as indicators of all quadrants, i.e., distribution functions. We give appropriate constructions of such
estimates.**Keyword note:**Bickel__Peter_John Ritov__Yaacov**Report ID:**602**Relevance:**100

**Title:**On Specifying Graphical Models for Causation**Author(s):**Freedman, David A.; **Date issued:**Jun 2001

http://nma.berkeley.edu/ark:/28722/bk0000n293v (PDF) **Abstract:**This paper (which is mainly expository) sets up graphical models for causation, having a bit less than the usual complement
of hypothetical counterfactuals. Assuming the invariance of error distributions may be essential for causal inference, but
the errors themselves need not be invariant. Graphs can be interpreted using conditional distributions, so that we can better
address connections between the mathematical framework and causality in the world. The identification problem is posed in
terms of conditionals. As will be seen, causal relationships cannot be inferred from a data set by running regressions unless
there is substantial prior knowledge about the mechanisms that generated the data. The idea can be made more precise in several
ways. There are few successful applications of graphical models, mainly because few causal pathways can be excluded on a priori
grounds. The invariance conditions themselves remain to be assessed.**Keyword note:**Freedman__David**Report ID:**601**Relevance:**100

**Title:**Applications of resampling methods to estimate the number of clusters and to improve the accuracy of a clustering method**Author(s):**Fridlyand, Jane; Dudoit, Sandrine; **Date issued:**Sep 2001

http://nma.berkeley.edu/ark:/28722/bk0000n3z0h (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n3z12 (PostScript) **Abstract:**The burgeoning field of genomics, and in particular microarray experiments, has revived interest in both discriminant and
cluster analysis, by raising new methodological and computational challenges. The present paper discusses applications of
resampling methods to problems in cluster analysis. A resampling method, known as bagging in discriminant analysis, is applied
to increase clustering accuracy and to assess the confidence of cluster assignments for individual observations. A novel prediction-based
resampling method is also proposed to estimate the number of clusters, if any, in a dataset. The performance of the proposed
and existing methods is compared using simulated data and gene expression data from four recently published cancer microarray
studies.**Keyword note:**Fridlyand__Jane Dudoit__Sandrine**Report ID:**600**Relevance:**100

**Title:**Random logistic maps and Lyapunov exponents**Author(s):**Steinsaltz, D.; **Date issued:**Jun 2001

http://nma.berkeley.edu/ark:/28722/bk0000n2c07 (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2c1s (PostScript) **Abstract:**We prove that under certain basic regularity conditions, a random iteration of logistic maps converges to a random point attractor
when the Lyapunov exponent is negative, and does not converge to a point when the Lyapunov exponent is positive.**Keyword note:**Steinsaltz__David**Report ID:**599**Relevance:**100

**Title:**Convergence of Moments in a Markov-chain central limit theorem**Author(s):**Steinsaltz, D.; **Date issued:**Jul 2001

http://nma.berkeley.edu/ark:/28722/bk0000n2b7k (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2b84 (PostScript) **Abstract:**We show that all moments of the partial sum process of a test function g along the paths of a V-uniformly ergodic Markov chain
converge to the corresponding moments of a normal variable. For the n-th moment to converge, g^n must be bounded by a constant
times V. We also derive starting-point dependent bounds on the rate of convergence.**Keyword note:**Steinsaltz__David**Report ID:**598**Relevance:**100

**Title:**Poisson-Dirichlet and GEM invariant distributions for split-and-merge transformations of an interval partition**Author(s):**Pitman, Jim; **Date issued:**Jun 2001

http://nma.berkeley.edu/ark:/28722/bk0000n1q44 (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n1q5p (PostScript) **Abstract:**This paper introduces a split-and-merge transformation of interval partitions which combines some features of one model studied
by Gnedin and Kerov and another studied by Tsilevich and by Mayer-Wolf, Zeitouni and Zerner. The invariance under this split-and-merge
transformation of the interval partition generated by a suitable Poisson process yields a simple proof of the recent result
of Mayer-Wolf, Zeitouni and Zerner that a Poisson-Dirichlet distribution is invariant for a closely related fragmentation-coagulation
process. Uniqueness and convergence to the invariant measure are established for the split-and-merge transformation of interval
partitions, but the corresponding problems for the fragmentation-coagulation process remain open.**Keyword note:**Pitman__Jim**Report ID:**597**Relevance:**100

**Title:**Census 2000**Author(s):**Freedman, D. A.; Wachter, K. W.; **Date issued:**Apr 2001

http://nma.berkeley.edu/ark:/28722/bk0000n3m4j (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n3m53 (PostScript) **Abstract:**The census has been taken every ten years since 1790. Counts are used to apportion Congress and to redistrict states. Furthermore,
census data are the basis for allocations of federal tax money to cities and other local governments. For such purposes,
the geographical distribution of the population matters, rather than counts for the nation as a whole. The census turns out
to be remarkably good, despite the generally bad press reviews. Statistical adjustment is unlikely to improve the accuracy:
adjustment may well put in more error than it takes out. In this article, we sketch procedures for taking the census, evaluating
it, and making adjustments. Pointers to the literature will be found at the end of the article, including citations to the
main arguments for and against adjustment.**Keyword note:**Freedman__David Wachter__Kenneth**Report ID:**596**Relevance:**100

**Title:**Invariance Principles for Non-uniform Random Mappings and Trees**Author(s):**Aldous, David; Pitman, Jim; **Date issued:**Dec 2001

http://nma.berkeley.edu/ark:/28722/bk0000n287j (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n2883 (PostScript) **Abstract:**In the context of uniform random mappings of an n-element set to itself, Aldous and Pitman (1994) established a functional
invariance principle, showing that many limit distributions as n tends to infinity can be described as distributions of suitable
functions of reflecting Brownian bridge. To study non-uniform cases, in this paper we formulate a "sampling invariance principle"
in terms of iterates of a fixed number of random elements. We show that the sampling invariance principle implies many,
but not all, of the distributional limits implied by the functional invariance principle. We give direct verifications of
the sampling invariance principle in two successive generalizations of the uniform case, to p-mappings (where elements are
mapped to i.i.d. non-uniform elements) and P-mappings (where elements are mapped according to a Markov matrix). We compare
with parallel results in the simpler setting of random trees.**Keyword note:**Aldous__David_J Pitman__Jim**Report ID:**594**Relevance:**100

**Title:**Random mappings, forests, and subsets associated with Abel-Cayley-Hurwitz multinomial expansions**Author(s):**Pitman, Jim; **Date issued:**Jun 2001

http://nma.berkeley.edu/ark:/28722/bk0000n4213 (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n422n (PostScript) **Abstract:**Various random combinatorial objects, such as mappings, trees, forests, and subsets of a finite set, are constructed with
probability distributions related to the binomial and multinomial expansions due to Abel, Cayley and Hurwitz. Relations between
these combinatorial objects, such as Joyal's bijection between mappings and marked rooted trees, have interesting probabilistic
interpretations, and applications to the asymptotic structure of large random trees and mappings. An extension of Hurwitz's
binomial formula is associated with the probability distribution of the random set of vertices of a fringe subtree in a random
forest whose distribution is defined by terms of a multinomial expansion over rooted labeled forests.**Pub info:**Séminaire Lotharingien de Combinatoire, Issue 46, (45 pp.) 2001**Keyword note:**Pitman__Jim**Report ID:**593**Relevance:**100

**Title:**A different construction of Gaussian fields from Markov chains: Dirichlet covariances**Author(s):**Diaconis, Persi; Evans, Steven N.; **Date issued:**Apr 2001

http://nma.berkeley.edu/ark:/28722/bk0000n1s0z (PDF)

http://nma.berkeley.edu/ark:/28722/bk0000n1s1h (PostScript) **Abstract:**We study a class of Gaussian random fields with negative correlations. These fields are easy to simulate. They are defined
in a natural way from a Markov chain that has the index space of the Gaussian field as its state space. In parallel with
Dynkin's investigation of Gaussian fields having covariance given by the Green's function of a Markov process, we develop
connections between the occupation times of the Markov chain and the prediction properties of the Gaussian field. Our interest
in such fields was initiated by their appearance in random matrix theory.**Keyword note:**Diaconis__Persi Evans__Steven_N**Report ID:**592**Relevance:**100