Statistics
Basic Properties
- $X$ is around $E(X)$, give or take $SD(X) = \sqrt{\mathrm{Var}(X)}$
- if $X$, $Y$ are independent: $E(XY) = E(X)E(Y)$, $\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$
- independence implies $\mathrm{Cov}(X, Y) = 0$; the converse is true if $X$ and $Y$ are bivariate
normal, and extends to the multivariate normal
Approximations
Law of Large Numbers
Let $X_1, X_2, \ldots$ be IID with expectation $\mu$ and variance
$\sigma^2 < \infty$. Let $x_1, \ldots, x_n$ be realisations of the random variable,
then $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i \to \mu$ in probability as $n \to \infty$.
Central Limit Theorem
Let $S_n = X_1 + \cdots + X_n$ where the $X_i$ are IID with mean $\mu$ and variance $\sigma^2$. Then $\frac{S_n - n\mu}{\sigma\sqrt{n}} \to N(0, 1)$ in distribution as $n \to \infty$.
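As a quick numerical illustration (a sketch with arbitrary choices: Exponential(1) summands, $n = 500$), standardised sums should look approximately standard normal:

```python
# CLT illustration: standardised sums of IID Exponential(1) draws
# (mu = 1, sigma = 1) should be approximately N(0, 1).
import random
import statistics

random.seed(0)
n = 500       # terms per sum
reps = 2000   # number of standardised sums
mu = sigma = 1.0

standardised = []
for _ in range(reps):
    s = sum(random.expovariate(1.0) for _ in range(n))
    standardised.append((s - n * mu) / (sigma * n ** 0.5))

# Sample mean should be near 0, sample SD near 1.
print(statistics.mean(standardised), statistics.stdev(standardised))
```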
Distributions
Poisson($\lambda$): $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$, $E(X) = \mathrm{Var}(X) = \lambda$
Normal: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)}$
https://kfrankc.com/posts/2018/10/19/normal-dist-derivation
- When $\mu = 0$, $f$ is an even function, and $x \mapsto x f(x)$
is odd
- $N(0, 1)$ is the standard normal
Gamma: $f(x) = \frac{\lambda^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\lambda x}$ for $x \ge 0$; $E(X) = \alpha/\lambda$, $\mathrm{Var}(X) = \alpha/\lambda^2$
$\chi^2$ Distribution
Let $Z \sim N(0, 1)$; $Z^2$ has a $\chi^2$ distribution with 1 d.f.
Let $Z_1, \ldots, Z_n$ be IID $N(0, 1)$, then $\sum_{i=1}^{n} Z_i^2$
is $\chi^2$ with n degrees of freedom, $\chi^2_n = \mathrm{Gamma}(n/2, 1/2)$
t-distribution
Let $Z \sim N(0, 1)$, $U \sim \chi^2_n$ be independent; $\frac{Z}{\sqrt{U/n}}$ has a t-distribution with n d.f.
- t is symmetric about 0
F-distribution
Let $U \sim \chi^2_m$, $V \sim \chi^2_n$ be independent; $\frac{U/m}{V/n}$ has an F distribution with (m, n) d.f.
If $T \sim t_n$, $T^2$ is an F
distribution with (1, n) d.f.
For $n > 2$, $E(F_{m,n}) = \frac{n}{n - 2}$
Sampling
Let $X_1, \ldots, X_n$ be IID $N(\mu, \sigma^2)$.
Properties of $\bar{X}$ and $S^2$
- $\bar{X} \sim N(\mu, \sigma^2/n)$ and $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
- $\bar{X}$ and $S^2$ are independent
Simple Random Sampling (SRS)
Assume random draws are made with replacement. (Not SRS; this will
be corrected for later.)
Summary of Lemmas
- $E(\bar{X}) = \mu$: Lemma A
- For $i \neq j$, drawing without replacement, $\mathrm{Cov}(X_i, X_j) = -\frac{\sigma^2}{N - 1}$: Lemma B
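The standard without-replacement covariance $\mathrm{Cov}(X_i, X_j) = -\sigma^2/(N-1)$ (Lemma B) can be verified exactly on a toy population by enumerating all equally likely ordered pairs of draws (the population values below are invented):

```python
# Exact check of the without-replacement covariance -sigma^2/(N-1)
# on a small toy population.
from itertools import permutations

pop = [1, 3, 7, 10, 14]   # toy population
N = len(pop)
mu = sum(pop) / N
sigma2 = sum((x - mu) ** 2 for x in pop) / N

# All ordered (first draw, second draw) pairs without replacement,
# each equally likely.
pairs = list(permutations(pop, 2))
exy = sum(x * y for x, y in pairs) / len(pairs)
cov = exy - mu * mu

print(cov, -sigma2 / (N - 1))   # both are -5.5 for this population
```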
Estimation Problem
Let $X_1, \ldots, X_n$ be random draws with replacement. Then
$\bar{X}$ is an estimator of $\mu$, and the observed value of
$\bar{X}$, $\bar{x}$, is an estimate of $\mu$.
Standard Error (SE)
SE of an estimator $\hat{\theta}$ is defined to be $\sqrt{\mathrm{Var}(\hat{\theta})}$; for $\bar{X}$, $SE = \sigma/\sqrt{n}$.
Without Replacement
SE is multiplied by $\sqrt{1 - \frac{n-1}{N-1}}$ (the finite population correction), because $s^2$ is biased for
$\sigma^2$: $E(s^2) = \sigma^2 \frac{N}{N-1}$, but N is normally large.
Confidence Interval
An approximate $100(1-\alpha)\%$ CI for $\mu$ is $\bar{X} \pm z_{\alpha/2}\,\frac{s}{\sqrt{n}}$
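A coverage check of the approximate CI (a sketch; the normal population, $n = 50$ and the 95% level are arbitrary choices):

```python
# Repeatedly draw samples, form xbar +/- 1.96 s/sqrt(n), and count how
# often the true mean is covered; should be close to 0.95.
import random
import statistics

random.seed(1)
mu, sigma, n, reps = 5.0, 2.0, 50, 2000
z = 1.96
covered = 0
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)
    half = z * s / n ** 0.5
    if xbar - half <= mu <= xbar + half:
        covered += 1

print(covered / reps)   # close to 0.95
```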
Biased Measurements
Let $X = a + \beta + \epsilon$, where $E(\epsilon) = 0$, $\mathrm{Var}(\epsilon) = \sigma^2$.
Suppose X is used to measure an unknown constant a. $E(X) = a + \beta$, where $\beta$ is the bias.
Mean square error (MSE) is $E[(X - a)^2] = \sigma^2 + \beta^2$;
with n IID measurements,
$MSE(\bar{X}) = \frac{\sigma^2}{n} + \beta^2$, hence
$\sqrt{MSE}$ is a good measure of the accuracy of the estimate
of a.
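The decomposition $MSE(\bar{X}) = \sigma^2/n + \beta^2$ can be checked by simulation (all parameter values below are invented):

```python
# Monte Carlo check that MSE(xbar) = sigma^2/n + beta^2.
import random

random.seed(2)
a, beta, sigma = 10.0, 0.5, 2.0   # true value, bias, noise SD (invented)
n, reps = 25, 4000

sq_errors = []
for _ in range(reps):
    xbar = sum(random.gauss(a + beta, sigma) for _ in range(n)) / n
    sq_errors.append((xbar - a) ** 2)

mse_hat = sum(sq_errors) / reps
mse_theory = sigma ** 2 / n + beta ** 2   # 0.16 + 0.25 = 0.41
print(mse_hat, mse_theory)
```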
Estimation of a Ratio
Consider a population of $N$ members, and two characteristics are
recorded: $x_i$, $y_i$, with means $\mu_x$, $\mu_y$; the ratio is $r = \mu_y/\mu_x$.
An obvious estimator of r is
$R = \bar{Y}/\bar{X}$, where
$\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)$ is
the population covariance.
Properties
$E(R) \approx r + \frac{1}{n}\left(1 - \frac{n-1}{N-1}\right)\frac{1}{\mu_x^2}\left(r\sigma_x^2 - \sigma_{xy}\right)$,
$\mathrm{Var}(R) \approx \frac{1}{n}\left(1 - \frac{n-1}{N-1}\right)\frac{1}{\mu_x^2}\left(r^2\sigma_x^2 + \sigma_y^2 - 2r\sigma_{xy}\right)$
Population correlation coefficient: $\rho = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
Ratio Estimates
The bias is of order $1/n$, small compared to its standard error, which is of order $1/\sqrt{n}$.
The ratio estimate $\bar{Y}_R = R\mu_x$ is better than $\bar{Y}$, having smaller
variance, when $\rho > \frac{1}{2}\frac{C_x}{C_y}$, where $C_x = \sigma_x/\mu_x$ and $C_y = \sigma_y/\mu_y$ are the coefficients of variation.
Variance of $R$ can be estimated by $s_R^2 = \frac{1}{n\bar{x}^2}\left(R^2 s_x^2 + s_y^2 - 2R s_{xy}\right)$
An approximate C.I. for $r$ is $R \pm z_{\alpha/2}\, s_R$
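A sketch of the ratio estimate with the usual variance estimate $s_R^2 = \frac{1}{n\bar{x}^2}(R^2 s_x^2 + s_y^2 - 2R s_{xy})$, on invented paired data where $y \approx 2x$:

```python
# Ratio estimate R = ybar/xbar with its approximate 95% CI on toy data.
import statistics

x = [2.1, 3.4, 1.8, 2.9, 3.1, 2.5, 2.2, 3.0]
y = [4.0, 6.9, 3.5, 6.0, 6.1, 5.2, 4.3, 6.2]
n = len(x)

xbar = statistics.mean(x)
ybar = statistics.mean(y)
R = ybar / xbar

sx2 = statistics.variance(x)   # sample variances (divide by n-1)
sy2 = statistics.variance(y)
sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)

sR2 = (R ** 2 * sx2 + sy2 - 2 * R * sxy) / (n * xbar ** 2)
sR = sR2 ** 0.5
print(R, (R - 1.96 * sR, R + 1.96 * sR))   # R is close to 2 here
```

Note that $R^2 s_x^2 + s_y^2 - 2R s_{xy}$ is the sample variance of the residuals $y_i - R x_i$, so it is always nonnegative.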
Method of Moments
To estimate $\theta$, express it as a function of the moments $\mu_k = E(X^k)$, then substitute the sample moments $\hat{\mu}_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$.
Monte Carlo
Monte Carlo is used to generate many realisations of a random
variable.
By the LLN, $\hat{\mu}_k \to \mu_k$, so MOM estimators are
consistent (asymptotically unbiased).
Poisson: $\hat{\lambda} = \bar{X}$
Normal: $\hat{\mu} = \bar{X}$, $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$
Gamma: $\hat{\alpha} = \bar{X}^2/\hat{\sigma}^2$, $\hat{\lambda} = \bar{X}/\hat{\sigma}^2$
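A Monte Carlo sanity check of the standard Gamma MOM estimators $\hat{\alpha} = \bar{X}^2/\hat{\sigma}^2$ and $\hat{\lambda} = \bar{X}/\hat{\sigma}^2$ (the true parameters below are arbitrary test values):

```python
# Fit a Gamma(alpha, lambda) sample by method of moments.
import random

random.seed(3)
alpha_true, lam_true = 2.0, 0.5   # shape, rate (test values)
n = 20000
# random.gammavariate takes (shape, scale), where scale = 1/rate.
xs = [random.gammavariate(alpha_true, 1 / lam_true) for _ in range(n)]

xbar = sum(xs) / n
sig2 = sum((x - xbar) ** 2 for x in xs) / n   # MOM uses the 1/n variance
alpha_hat = xbar ** 2 / sig2
lam_hat = xbar / sig2
print(alpha_hat, lam_hat)   # near 2.0 and 0.5
```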
Maximum Likelihood Estimator (MLE)
Poisson Case
ML estimate of $\lambda$ is $\hat{\lambda} = \bar{x}$. ML estimator is $\bar{X}$.
Normal case
$\hat{\mu} = \bar{X}$, $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$
Gamma case
$\hat{\lambda} = \hat{\alpha}/\bar{x}$; $\hat{\alpha}$ solves $n\log\hat{\alpha} - n\log\bar{x} + \sum_{i=1}^{n}\log x_i - n\frac{\Gamma'(\hat{\alpha})}{\Gamma(\hat{\alpha})} = 0$, which has no closed form and is solved numerically
Multinomial Case
$\mathrm{lik}(p_1, \ldots, p_m) = \binom{n}{x_1 \cdots x_m}\prod_{i=1}^{m} p_i^{x_i}$, where $x_i$ is the number of times the $i$th value occurs, and not the
number of trials. $x_1, \ldots, x_m$ are non-negative integers
summing to $n$. MLE: $\hat{p}_i = x_i/n$,
same as the MOM estimator.
CIs in MLE
Given the realisations $x_1, \ldots, x_n$ of IID $N(\mu, \sigma^2)$, $\bar{x} \pm t_{n-1}(\alpha/2)\frac{s}{\sqrt{n}}$ is the exact CI for $\mu$.
Since $\frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$,
$\left(\frac{(n-1)s^2}{\chi^2_{n-1}(\alpha/2)}, \frac{(n-1)s^2}{\chi^2_{n-1}(1 - \alpha/2)}\right)$ is the exact
CI for $\sigma^2$.
Distribution | MLE | Variance
Po($\lambda$) | $\hat{\lambda} = \bar{X}$ | $\lambda/n$
Be($p$) | $\hat{p} = \bar{X}$ | $p(1-p)/n$
Bin($n$, $p$) | $\hat{p} = X/n$ | $p(1-p)/n$
HWE tri | $\hat{\theta} = \frac{2X_3 + X_2}{2n}$ | $\frac{\theta(1-\theta)}{2n}$
General trinomial: $\hat{p}_i = X_i/n$, $\mathrm{Var}(\hat{p}_i) = \frac{p_i(1 - p_i)}{n}$
In all the above cases, $\mathrm{Var}(\hat{\theta}) = \frac{1}{nI(\theta)}$.
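A worked instance of the HWE-trinomial row of the table, with invented genotype counts: $\hat{\theta} = (2X_3 + X_2)/(2n)$ and $SE = \sqrt{\hat{\theta}(1-\hat{\theta})/(2n)}$:

```python
# HWE trinomial MLE and approximate 95% CI from cell counts.
x1, x2, x3 = 342, 500, 187   # hypothetical genotype counts
n = x1 + x2 + x3

theta_hat = (2 * x3 + x2) / (2 * n)
se = (theta_hat * (1 - theta_hat) / (2 * n)) ** 0.5
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)
print(theta_hat, se, ci)
```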
Asymptotic Normality of MLE
As $n \to \infty$, $\sqrt{nI(\theta)}(\hat{\theta} - \theta) \to N(0, 1)$ in distribution, and hence
$\hat{\theta}$ is approximately $N\left(\theta, \frac{1}{nI(\theta)}\right)$, where $I(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\log f(X \mid \theta)\right)^2\right] = -E\left[\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta)\right]$ is the Fisher information.
As $n \to \infty$, the MLE is consistent.
SE of an estimate of $\theta$ is the SD of the estimator:
$SE(\hat{\theta}) \approx \frac{1}{\sqrt{nI(\hat{\theta})}}$, hence an approximate CI for $\theta$ is $\hat{\theta} \pm z_{\alpha/2}\frac{1}{\sqrt{nI(\hat{\theta})}}$.
Efficiency
Cramer-Rao Inequality: if $\hat{\theta}$ is unbiased, then $\mathrm{Var}(\hat{\theta}) \ge \frac{1}{nI(\theta)}$; if $\mathrm{Var}(\hat{\theta}) = \frac{1}{nI(\theta)}$,
then $\hat{\theta}$ is efficient.
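For the Poisson case, $I(\lambda) = 1/\lambda$, so the MLE $\bar{X}$ attains the Cramer-Rao bound $\lambda/n$. A simulation check (arbitrary $\lambda$ and $n$; the Poisson sampler is Knuth's method, suitable only for modest $\lambda$):

```python
# Check that Var(lambda_hat) for the Poisson MLE xbar is close to lambda/n.
import math
import random
import statistics

random.seed(4)
lam, n, reps = 3.0, 100, 3000

def poisson(mu):
    # Knuth's method (assumption: mu is modest, so the loop stays short).
    L = math.exp(-mu)
    k, p = 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

estimates = [sum(poisson(lam) for _ in range(n)) / n for _ in range(reps)]
var_hat = statistics.variance(estimates)
print(var_hat, lam / n)   # close to the bound lambda/n = 0.03
```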
Sufficiency
Characterisation
Let $A_t = \{(x_1, \ldots, x_n) : T(x_1, \ldots, x_n) = t\}$. The sample space of $(X_1, \ldots, X_n)$ is the
disjoint union of the $A_t$ across all possible values $t$ of $T$.
$T$ is sufficient for $\theta$ if $P(X_1 = x_1, \ldots, X_n = x_n \mid T = t)$ does not depend on $\theta$.
Factorisation Theorem
$T$ is sufficient for $\theta$ iff $f(x_1, \ldots, x_n \mid \theta) = g(T(x_1, \ldots, x_n), \theta)\, h(x_1, \ldots, x_n)$
Rao-Blackwell Theorem
Let $\hat{\theta}$ be an estimator of $\theta$ with finite variance, and let $T$
be sufficient for $\theta$. Let $\tilde{\theta} = E(\hat{\theta} \mid T)$. Then for every $\theta$,
$E[(\tilde{\theta} - \theta)^2] \le E[(\hat{\theta} - \theta)^2]$. Equality holds iff
$\hat{\theta}$ is a function of $T$.
Random Conditional Expectation
- $E[E(Y \mid X)] = E(Y)$ and $\mathrm{Var}(Y) = \mathrm{Var}(E(Y \mid X)) + E(\mathrm{Var}(Y \mid X))$; $\mathrm{Var}(E(Y \mid X)) = 0$ iff $E(Y \mid X)$ is a constant
Hypothesis Testing
Let $X_1, \ldots, X_n$ be IID with density $f(x \mid \theta)$. Null $H_0: \theta = \theta_0$, alternative $H_1: \theta = \theta_1$. The critical region is
the set of samples for which $H_0$ is rejected. $\alpha = P(\text{reject } H_0 \mid H_0)$ is the size (type I error probability) and $\beta = P(\text{accept } H_0 \mid H_1)$ is the type II error probability.
Power $= 1 - \beta$.
Critical region $\left\{x : \frac{f(x \mid \theta_0)}{f(x \mid \theta_1)} \le c\right\}$: among all tests
with this size, it has the maximum power (Neyman-Pearson Lemma).
A hypothesis is simple if it completely specifies the distribution of
the data.
(A) $H_1: \mu > \mu_0$: Critical region $\{\bar{X} > c\}$, the power is a function of $\mu$,
and this is uniformly the most powerful test for size $\alpha$; (B) $H_1: \mu < \mu_0$ is the mirror case.
(C) $H_1: \mu \neq \mu_0$: Critical region $\{|\bar{X} - \mu_0| > c\}$, but not uniformly most
powerful.
The $100(1-\alpha)\%$ CI for $\mu$ consists of precisely the values $\mu_0$
for which $H_0: \mu = \mu_0$ is not rejected against $H_1: \mu \neq \mu_0$ at level $\alpha$. Exact for normal with known variance, approximate in other cases.
p-value
the probability under $H_0$ that the test statistic is more extreme
than the realisation. (A, B): $p = P(\bar{X} \ge \bar{x} \mid H_0)$ (or $\le$). (C): $p = P(|\bar{X} - \mu_0| \ge |\bar{x} - \mu_0| \mid H_0)$. The smaller the p-value,
the more suspicious one should be about $H_0$. If the size $\alpha$ is smaller than the
p-value, do not reject $H_0$.
Generalized Likelihood Ratio
$\Lambda = \frac{\max_{\theta \in \omega_0} \mathrm{lik}(\theta)}{\max_{\theta \in \Omega} \mathrm{lik}(\theta)}$, $0 \le \Lambda \le 1$. The closer $\Lambda$ is to 0, the stronger
the evidence against $H_0$.
Large-sample null distribution of $-2\log\Lambda$
Under $H_0$, when n is large, $-2\log\Lambda \approx \chi^2_d$, where $d = \dim\Omega - \dim\omega_0$.
Normal (C): $-2\log\Lambda = \frac{n(\bar{x} - \mu_0)^2}{\sigma^2}$
Multinomial: $-2\log\Lambda = 2\sum_{i} O_i \log\frac{O_i}{E_i}$, where $E_i$ is
the expected frequency of the ith event under $H_0$ and $O_i$ the observed frequency. $-2\log\Lambda \approx \sum_i \frac{(O_i - E_i)^2}{E_i}$, which is the Pearson
chi-square statistic, written as $X^2$.
Poisson Dispersion Test
For $H_0$: let $X_1, \ldots, X_n$ be independent Poisson($\lambda$).
$-2\log\Lambda = 2\sum_{i=1}^{n} x_i \log\frac{x_i}{\bar{x}} \approx \frac{1}{\bar{x}}\sum_{i=1}^{n}(x_i - \bar{x})^2$.
For large n, the null distribution of the statistic is approximately $\chi^2_{n-1}$.
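Computing the dispersion statistic $T = \frac{1}{\bar{x}}\sum (x_i - \bar{x})^2$ on invented counts:

```python
# Poisson dispersion statistic, compared against chi-square with n-1 d.f.
counts = [5, 3, 8, 4, 6, 7, 2, 5, 6, 4]   # invented counts
n = len(counts)
xbar = sum(counts) / n                     # 5.0
T = sum((x - xbar) ** 2 for x in counts) / xbar
print(T, "d.f. =", n - 1)                  # T = 6.0 with 9 d.f.
```

Here $T = 6.0$ against a $\chi^2_9$ null (mean 9), so these counts show no evidence of over-dispersion.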
Comparing 2 samples
Normal Theory: Same Variance
Let $X_1, \ldots, X_n$ be i.i.d. $N(\mu_X, \sigma^2)$ and $Y_1, \ldots, Y_m$ be
i.i.d. $N(\mu_Y, \sigma^2)$, independent.
Known Variance
$\bar{X} - \bar{Y} \sim N\left(\mu_X - \mu_Y,\; \sigma^2\left(\frac{1}{n} + \frac{1}{m}\right)\right)$, and reject $H_0: \mu_X = \mu_Y$
when $\frac{|\bar{X} - \bar{Y}|}{\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}} > z_{\alpha/2}$
Unknown Variance
$t = \frac{\bar{X} - \bar{Y}}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}}$, where $s_p^2 = \frac{(n-1)s_X^2 + (m-1)s_Y^2}{n + m - 2}$. $s_p^2$ is an unbiased
estimator of $\sigma^2$; pooling is reasonable when $s_X^2$ is within a factor of 2 from $s_Y^2$.
$t$ follows a t
distribution with $n + m - 2$ d.f.
If two-sided: reject when $|t| > t_{n+m-2}(\alpha/2)$. If
one-sided, e.g. $H_1: \mu_X > \mu_Y$, reject when $t > t_{n+m-2}(\alpha)$.
CI
$(\bar{X} - \bar{Y}) \pm z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}$ if $\sigma$ is known, or
$(\bar{X} - \bar{Y}) \pm t_{n+m-2}(\alpha/2)\, s_p\sqrt{\frac{1}{n} + \frac{1}{m}}$ if $\sigma$ is unknown.
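A worked pooled two-sample t computation on invented data (the critical value 2.262 is $t_9(0.025)$ from tables):

```python
# Pooled two-sample t statistic and 95% CI for the mean difference.
import statistics

x = [10.2, 9.8, 11.1, 10.5, 9.9, 10.8]   # invented samples
y = [9.1, 9.5, 8.8, 9.9, 9.2]
n, m = len(x), len(y)

xbar, ybar = statistics.mean(x), statistics.mean(y)
sp2 = ((n - 1) * statistics.variance(x)
       + (m - 1) * statistics.variance(y)) / (n + m - 2)
se = (sp2 * (1 / n + 1 / m)) ** 0.5
t = (xbar - ybar) / se

t_crit = 2.262   # t_{n+m-2}(0.025) with 9 d.f.
ci = (xbar - ybar - t_crit * se, xbar - ybar + t_crit * se)
print(t, ci)
```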
Unequal Variance
$t = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}}$, with approximate d.f. $\nu = \frac{\left(\frac{s_X^2}{n} + \frac{s_Y^2}{m}\right)^2}{\frac{(s_X^2/n)^2}{n-1} + \frac{(s_Y^2/m)^2}{m-1}}$ (Welch-Satterthwaite)
Mann-Whitney Test
We take the smaller sample, of size $n'$, and sum the ranks in that
sample to get $R$. With $R' = n'(n + m + 1) - R$, we reject if $\min(R, R')$ is too small.
Test works for all distributions, and is robust to outliers.
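The rank sums are easy to compute directly on toy data without ties (using $R' = n'(n + m + 1) - R$):

```python
# Mann-Whitney rank-sum computation on a toy example (no ties).
x = [1.3, 2.1, 0.8]        # smaller sample, n' = 3
y = [2.5, 3.0, 1.9, 2.8]   # larger sample
combined = sorted(x + y)
ranks = {v: i + 1 for i, v in enumerate(combined)}   # ranks 1..(n+m)

R = sum(ranks[v] for v in x)                 # rank sum of smaller sample
R_prime = len(x) * (len(x) + len(y) + 1) - R
print(R, R_prime)   # R = 7, R' = 17; reject if min(R, R') is too small
```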
Paired Samples
$(X_i, Y_i)$ are paired and related to the same individual; $(X_i, Y_i)$ is independent from $(X_j, Y_j)$ for $i \neq j$. Compute $D_i = X_i - Y_i$, $\bar{D}$ and $s_D$. To
test $H_0: \mu_D = 0$, $t = \frac{\bar{D}}{s_D/\sqrt{n}} \sim t_{n-1}$ under $H_0$.
CI: $\bar{D} \pm t_{n-1}(\alpha/2)\frac{s_D}{\sqrt{n}}$
Ranked Test
Rank $|D_1|, \ldots, |D_n|$. $W_+$ is the sum of ranks among all positive $D_i$ and $W_-$ is the
sum of ranks among all negative $D_i$; note $W_+ + W_- = \frac{n(n+1)}{2}$. We want to reject $H_0$ if
$|W_+ - W_-|$ is too large.
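The signed-rank sums on invented paired differences (no zeros or ties):

```python
# Wilcoxon signed-rank sums: rank |D_i|, then sum ranks by sign of D_i.
d = [1.2, -0.4, 2.3, 0.7, -1.5, 0.9]   # invented paired differences
order = sorted(range(len(d)), key=lambda i: abs(d[i]))
rank_of = {i: r + 1 for r, i in enumerate(order)}

W_plus = sum(rank_of[i] for i in range(len(d)) if d[i] > 0)
W_minus = sum(rank_of[i] for i in range(len(d)) if d[i] < 0)
print(W_plus, W_minus)   # 15 and 6; W+ + W- = n(n+1)/2 = 21
```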