
Statistics

Basic Properties

  1. $E(X) = \sum_x x\, p(x)$
  2. $\mathrm{Var}(X) = \sum_x (x - \mu)^2 f(x)$
  3. X is around $E(X)$, give or take $SD(X)$
  4. $E(aX + bY) = aE(X) + bE(Y)$
  5. $\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y)$ for uncorrelated $X$, $Y$; in general, add $2ab\,\mathrm{Cov}(X, Y)$ (checked in the simulation below)
  6. $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
  7. $\mathrm{Cov}(X_1, X_2) = E(X_1X_2) - E(X_1)E(X_2)$
  8. if $X$, $Y$ are independent:
    1. $M_{X+Y}(t) = M_X(t)M_Y(t)$
    2. $E(XY) = E(X)E(Y)$; the converse is true if $X$ and $Y$ are bivariate normal, and this extends to the multivariate normal
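
A quick simulation can sanity-check properties 5 and 8.2; this is a minimal sketch where the distributions, the constants $a, b$, and the seed are arbitrary illustrations.

```python
# Sanity-check Var(aX+bY) = a^2 Var(X) + b^2 Var(Y) and E(XY) = E(X)E(Y)
# for independent X, Y. Distributions and constants are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.normal(2.0, 3.0, n)       # independent of Y by construction
Y = rng.exponential(1.5, n)
a, b = 2.0, -1.0

print(np.var(a * X + b * Y), a**2 * np.var(X) + b**2 * np.var(Y))
print(np.mean(X * Y), np.mean(X) * np.mean(Y))
```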

Approximations

Law of Large Numbers

Let $X_1, X_2, \dots, X_n$ be IID, with expectation $\mu$ and variance $\sigma^2$. Then $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{n \to \infty} \mu$. Let $x_1, x_2, \dots, x_n$ be realisations of the random variables $X_1, X_2, \dots, X_n$; then $\bar{x}_n = \frac{1}{n}\sum_{i=1}^n x_i \to \mu$ as $n \to \infty$.

Central Limit Theorem

Let $S_n = \sum_{i=1}^n X_i$ where $X_1, X_2, \dots, X_n$ are IID. Then $\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \xrightarrow{n \to \infty} N(0, 1)$.
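
A minimal simulation sketch of both limit results, using Exponential(1) draws so that $\mu = \sigma = 1$; the sample sizes and seed are arbitrary.

```python
# LLN: the sample mean approaches mu as n grows.
# CLT: standardised sums over many replications look standard normal.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0              # Exponential(1) has mu = sigma = 1

for n in (10, 1_000, 100_000):    # LLN: means drift towards mu = 1
    print(n, rng.exponential(1.0, n).mean())

n, reps = 50, 100_000             # CLT: (S_n - n mu) / (sqrt(n) sigma)
S = rng.exponential(1.0, (reps, n)).sum(axis=1)
Z = (S - n * mu) / (np.sqrt(n) * sigma)
print(Z.mean(), Z.std())          # approximately 0 and 1
```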

Distributions

Poisson(λ)

$E(X) = \mathrm{Var}(X) = \lambda$

Normal $X \sim N(\mu, \sigma^2)$

https://kfrankc.com/posts/2018/10/19/normal-dist-derivation

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$

  1. When $\mu = 0$, $f(x)$ is an even function, and $E(X^k) = 0$ where $k$ is odd
  2. $Y = \frac{X - E(X)}{SD(X)}$ is the standard normal

Gamma Γ

$g(t) = \frac{\lambda^\alpha}{\Gamma(\alpha)}t^{\alpha - 1}e^{-\lambda t}, \quad t \ge 0$

The first two moments are $\mu_1 = \frac{\alpha}{\lambda}, \quad \mu_2 = \frac{\alpha(\alpha + 1)}{\lambda^2}$

$\chi^2$ Distribution

Let $Z \sim N(0, 1)$. Then $U = Z^2$ has a $\chi^2$ distribution with 1 d.f.

$f_U(u) = \frac{1}{\sqrt{2\pi}}u^{-1/2}e^{-u/2}, \quad u \ge 0$

$\chi^2_1 \sim \Gamma(\alpha = \frac{1}{2}, \lambda = \frac{1}{2})$

Let $U_1, U_2, \dots, U_n$ be IID $\chi^2_1$. Then $V = \sum_{i=1}^n U_i$ is $\chi^2_n$ with $n$ degrees of freedom, and $V \sim \Gamma(\alpha = \frac{n}{2}, \lambda = \frac{1}{2})$.

$E(\chi^2_n) = n, \quad \mathrm{Var}(\chi^2_n) = 2n$

$M(t) = (1 - 2t)^{-n/2}$
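
A simulation sketch of the construction above: sums of $n$ squared standard normals should have mean $n$, variance $2n$, and the $\Gamma(\alpha = n/2, \lambda = 1/2)$ distribution (scipy parameterises the gamma by shape and scale $1/\lambda$).

```python
# V = sum of n squared standard normals should be chi-square_n,
# i.e. Gamma(n/2, lambda = 1/2). n and the seed are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 5, 200_000
V = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)

print(V.mean(), V.var())  # approximately n and 2n
# KS test against Gamma with shape n/2, loc 0, scale 1/lambda = 2
print(stats.kstest(V, "gamma", args=(n / 2, 0, 2)))
```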

t-distribution

Let $Z \sim N(0, 1)$ and $U_n \sim \chi^2_n$ be independent. Then $t_n = \frac{Z}{\sqrt{U_n/n}}$ has a t-distribution with $n$ d.f.

$f(t) = \frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}$

  1. $t$ is symmetric about 0
  2. $t_n \xrightarrow{n \to \infty} Z$

F-distribution

Let $U \sim \chi^2_m$ and $V \sim \chi^2_n$ be independent. Then $W = \frac{U/m}{V/n}$ has an F distribution with $(m, n)$ d.f.

If $X \sim t_n$, then $X^2 = \frac{Z^2/1}{U_n/n}$ has an F distribution with $(1, n)$ d.f., supported on $w \ge 0$.

For $n > 2$, $E(W) = \frac{n}{n-2}$.

Sampling

Let $X_1, X_2, \dots, X_n$ be IID $N(\mu, \sigma^2)$.

sample mean, $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$

sample variance, $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$

Properties of $\bar{X}$ and $S^2$

  1. $\bar{X}$ and $S^2$ are independent
  2. $\bar{X} \sim N(\mu, \frac{\sigma^2}{n})$
  3. $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
  4. $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ (properties 2-4 are checked in the sketch below)
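
A sketch checking these properties by simulating many normal samples; $\mu$, $\sigma$, $n$, and the seed are arbitrary.

```python
# Simulate many N(mu, sigma^2) samples of size n and compare the
# sampling distributions of Xbar, S^2, and the t-ratio to theory.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 10, 100_000
X = rng.normal(mu, sigma, (reps, n))
xbar = X.mean(axis=1)
S2 = X.var(axis=1, ddof=1)

print(xbar.var(), sigma**2 / n)                                      # property 2
print(stats.kstest((n - 1) * S2 / sigma**2, "chi2", args=(n - 1,)))  # property 3
T = (xbar - mu) / np.sqrt(S2 / n)
print(stats.kstest(T, "t", args=(n - 1,)))                           # property 4
```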

Simple Random Sampling (SRS)

Assume $n$ random draws are made with replacement. (This is not SRS; it will be corrected for later.)

Summary of Lemmas

  • $P(X_i = \xi_j) = \frac{n_j}{N}$: Lemma A
  • For $i \ne j$, $\mathrm{Cov}(X_i, X_j) = -\frac{\sigma^2}{N-1}$ (for draws without replacement): Lemma B

Estimation Problem

Let $X_1, X_2, \dots, X_n$ be random draws with replacement. Then $\bar{X}$ is an estimator of $\mu$, and the observed value of $\bar{X}$, namely $\bar{x}$, is an estimate of $\mu$.

Standard Error (SE)

The SE of $\bar{X}$ is defined to be $SD(\bar{X})$.

| param | est | SE | Est. SE |
|-------|-----|----|---------|
| $\mu$ | $\bar{X}$ | $\frac{\sigma}{\sqrt{n}}$ | $\frac{s}{\sqrt{n}}$ |
| $p$ | $\hat{p}$ | $\sqrt{\frac{p(1-p)}{n}}$ | $\sqrt{\frac{\hat{p}(1-\hat{p})}{n-1}}$ |

Without Replacement

The SE is multiplied by $\sqrt{\frac{N-n}{N-1}}$ (the finite population correction). Also, $s^2$ is biased for $\sigma^2$: $E\left(\frac{N-1}{N}s^2\right) = \sigma^2$, but $N$ is normally large, so the bias is negligible.

Confidence Interval

An approximate 1α CI for μ is

$\left(\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right)$
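
A minimal sketch of computing this interval; the data here are synthetic placeholders.

```python
# Approximate 95% CI for mu: xbar +/- z_{alpha/2} * s / sqrt(n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.exponential(3.0, 200)        # placeholder data
alpha = 0.05
z = stats.norm.ppf(1 - alpha / 2)    # z_{alpha/2}
half = z * x.std(ddof=1) / np.sqrt(len(x))
print((x.mean() - half, x.mean() + half))
```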

Biased Measurements

Let $X = \mu + \epsilon$, where $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2$.

Suppose $X$ is used to measure an unknown constant $a$, with $a \ne \mu$. Then $X = a + (\mu - a) + \epsilon$, where $\mu - a$ is the bias.

The mean square error (MSE) is $E((X - a)^2) = \sigma^2 + (\mu - a)^2$.

With $n$ IID measurements, $\bar{X} = \mu + \bar{\epsilon}$, and

$E((\bar{X} - a)^2) = \frac{\sigma^2}{n} + (\mu - a)^2$

$MSE = SE^2 + \text{bias}^2$; hence MSE is a good measure of the accuracy of the estimate $\bar{x}$ of $a$.

Estimation of a Ratio

Consider a population of $N$ members with two characteristics recorded for each; draw pairs $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$. Let $r = \frac{\mu_y}{\mu_x}$.

An obvious estimator of $r$ is $R = \frac{\bar{Y}}{\bar{X}}$

$\mathrm{Cov}(\bar{X}, \bar{Y}) = \frac{\sigma_{xy}}{n}$, where

$\sigma_{xy} := \frac{1}{N}\sum_{i=1}^N (x_i - \mu_x)(y_i - \mu_y)$ is the population covariance.

Properties

$\mathrm{Var}(R) \approx \frac{1}{\mu_x^2}\left(r^2\sigma_{\bar{X}}^2 + \sigma_{\bar{Y}}^2 - 2r\sigma_{\bar{X}\bar{Y}}\right)$

Population correlation coefficient: $\rho = \frac{\sigma_{xy}}{\sigma_x\sigma_y}$

$E(R) \approx r + \frac{1}{n}\left(\frac{N-n}{N-1}\right)\frac{1}{\mu_x^2}\left(r\sigma_x^2 - \rho\sigma_x\sigma_y\right)$

$s_{xy} = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})$

Ratio Estimates

$\bar{Y}_R = \frac{\mu_x}{\bar{X}}\bar{Y} = \mu_x R$

$\mathrm{Var}(\bar{Y}_R) \approx \frac{1}{n}\frac{N-n}{N-1}\left(r^2\sigma_x^2 + \sigma_y^2 - 2r\rho\sigma_x\sigma_y\right)$

$E(\bar{Y}_R) - \mu_y \approx \frac{1}{n}\frac{N-n}{N-1}\frac{1}{\mu_x}\left(r\sigma_x^2 - \rho\sigma_x\sigma_y\right)$

The bias is of order $\frac{1}{n}$, small compared to the standard error, which is of order $\frac{1}{\sqrt{n}}$.

$\bar{Y}_R$ is better than $\bar{Y}$, having smaller variance, when $\rho > \frac{1}{2}\left(\frac{C_x}{C_y}\right)$, where $C_i = \sigma_i/\mu_i$ is the coefficient of variation.

The variance of $\bar{Y}_R$ can be estimated by

$s_{\bar{Y}_R}^2 = \frac{1}{n}\frac{N-n}{N-1}\left(R^2s_x^2 + s_y^2 - 2Rs_{xy}\right)$

An approximate $1-\alpha$ C.I. for $\mu_y$ is $\bar{Y}_R \pm z_{\alpha/2}s_{\bar{Y}_R}$
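
A sketch of the ratio estimate and its estimated SE on synthetic data, dropping the finite population correction (i.e. assuming $N \gg n$); $\mu_x$ is taken as known.

```python
# Ratio estimate Ybar_R = mu_x * R with estimated SE, fpc omitted.
import numpy as np

rng = np.random.default_rng(5)
n, mu_x = 100, 10.0                      # mu_x assumed known
x = rng.normal(mu_x, 2.0, n)
y = 1.5 * x + rng.normal(0, 1.0, n)      # correlated characteristic

R = y.mean() / x.mean()
sx2, sy2 = x.var(ddof=1), y.var(ddof=1)
sxy = np.cov(x, y, ddof=1)[0, 1]
s_YR = np.sqrt((R**2 * sx2 + sy2 - 2 * R * sxy) / n)
YR = mu_x * R
print(YR, (YR - 1.96 * s_YR, YR + 1.96 * s_YR))  # approx 95% CI for mu_y
```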

Method of Moments

To estimate $\theta$, express it as a function of the moments, $\theta = g(\mu_1, \mu_2, \dots)$, then plug in the sample moments: $\hat{\theta} = g(\hat{\mu}_1, \hat{\mu}_2, \dots)$.

Monte Carlo

Monte Carlo is used to generate many realisations of a random variable.

$\bar{X} \xrightarrow{n \to \infty} \alpha/\lambda$ and $\hat{\sigma}^2 \xrightarrow{n \to \infty} \alpha/\lambda^2$; MOM estimators are consistent (asymptotically unbiased).

Poisson($\lambda$): $\text{bias} = 0$, $SE = \sqrt{\frac{\bar{x}}{n}}$

$N(\mu, \sigma^2)$: $\mu = \mu_1$, $\sigma^2 = \mu_2 - \mu_1^2$

$\Gamma(\lambda, \alpha)$: $\hat{\lambda} = \frac{\hat{\mu}_1}{\hat{\mu}_2 - \hat{\mu}_1^2} = \frac{\bar{X}}{\hat{\sigma}^2}, \quad \hat{\alpha} = \frac{\hat{\mu}_1^2}{\hat{\mu}_2 - \hat{\mu}_1^2} = \frac{\bar{X}^2}{\hat{\sigma}^2}$
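
A sketch recovering the gamma MOM estimates from simulated data with known true parameters ($\alpha = 3$, $\lambda = 2$):

```python
# Gamma method-of-moments: lambda_hat = xbar / sigma_hat^2,
# alpha_hat = xbar^2 / sigma_hat^2. True values: alpha = 3, lambda = 2.
import numpy as np

rng = np.random.default_rng(6)
x = rng.gamma(shape=3.0, scale=1 / 2.0, size=50_000)

xbar, sig2 = x.mean(), x.var()        # hat{mu}_1 and hat{mu}_2 - hat{mu}_1^2
print(xbar / sig2, xbar**2 / sig2)    # near lambda = 2 and alpha = 3
```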

Maximum Likelihood Estimator (MLE)

Poisson Case

$L(\lambda) = \prod_{i=1}^n \frac{\lambda^{x_i}e^{-\lambda}}{x_i!} = \frac{\lambda^{\sum_{i=1}^n x_i}e^{-n\lambda}}{\prod_{i=1}^n x_i!}$

$l(\lambda) = \sum_{i=1}^n x_i\log\lambda - n\lambda - \sum_{i=1}^n \log x_i!$

The ML estimate of $\lambda_0$ is $\bar{x}$; the ML estimator is $\hat{\lambda}_0 = \bar{X}$.

Normal case

$l(\mu, \sigma) = -n\log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\sum_{i=1}^n (X_i - \mu)^2$

$\frac{\partial l}{\partial \mu} = \frac{\sum_{i=1}^n (X_i - \mu)}{\sigma^2} \Rightarrow \hat{\mu} = \bar{x}$

$\frac{\partial l}{\partial \sigma} = \frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^3} - \frac{n}{\sigma} \Rightarrow \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2$

Gamma case

$l(\theta) = n\alpha\log\lambda + (\alpha - 1)\sum_{i=1}^n \log X_i - \lambda\sum_{i=1}^n X_i - n\log\Gamma(\alpha)$

$\frac{\partial l}{\partial \alpha} = n\log\lambda + \sum_{i=1}^n \log X_i - n\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}$

$\frac{\partial l}{\partial \lambda} = \frac{n\alpha}{\lambda} - \sum_{i=1}^n X_i$

$\hat{\lambda} = \frac{\hat{\alpha}}{\bar{x}}$; there is no closed form for $\hat{\alpha}$, so it is found numerically (see the sketch below).
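
Substituting $\lambda = \alpha/\bar{x}$ into $\partial l/\partial \alpha = 0$ gives the profile equation $\log\alpha - \psi(\alpha) = \log\bar{x} - \overline{\log x}$, where $\psi$ is the digamma function; a sketch solving it numerically on synthetic data:

```python
# Numerical gamma MLE: solve the profile score equation for alpha,
# then lambda_hat = alpha_hat / xbar. True values: alpha = 3, lambda = 2.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(7)
x = rng.gamma(shape=3.0, scale=1 / 2.0, size=10_000)
xbar, mean_logx = x.mean(), np.log(x).mean()

f = lambda a: np.log(a) - digamma(a) - (np.log(xbar) - mean_logx)
alpha_hat = brentq(f, 1e-6, 1e6)     # profile score has a unique root
print(alpha_hat, alpha_hat / xbar)   # near alpha = 3 and lambda = 2
```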

Multinomial Case

$f(x_1, \dots, x_r) = \binom{n}{x_1, x_2, \dots, x_r}\prod_{i=1}^r p_i^{x_i}$

where $x_i$ is the number of times value $i$ occurs (not the number of trials), and $x_1, x_2, \dots, x_r$ are non-negative integers summing to $n$. For each $i$:

$E(X_i) = np_i, \quad \mathrm{Var}(X_i) = np_i(1 - p_i)$

$\mathrm{Cov}(X_i, X_j) = -np_ip_j, \quad i \ne j$

$l(p) = K + \sum_{i=1}^{r-1} x_i\log p_i + x_r\log(1 - p_1 - \dots - p_{r-1})$, where $K$ is a constant

$\frac{\partial l}{\partial p_i} = \frac{x_i}{p_i} - \frac{x_r}{p_r} = 0$, assuming the MLE exists

$\frac{x_i}{\hat{p}_i} = \frac{x_r}{\hat{p}_r} \Rightarrow \hat{p}_i = \frac{x_i}{c}, \quad c = \frac{x_r}{\hat{p}_r}$

$\sum_{i=1}^r \hat{p}_i = \sum_{i=1}^r \frac{x_i}{c} = 1 \Rightarrow c = \sum_{i=1}^r x_i = n \Rightarrow \hat{p}_i = \frac{x_i}{n}$

This is the same as the MOM estimator.

CIs in MLE

$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$

Given the realisations $\bar{x}$ and $s$, $\left(\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\right)$ is the exact $1-\alpha$ CI for $\mu$.

Similarly, $\left(\frac{n\hat{\sigma}^2}{\chi^2_{n-1,\alpha/2}},\ \frac{n\hat{\sigma}^2}{\chi^2_{n-1,1-\alpha/2}}\right)$ is the exact $1-\alpha$ CI for $\sigma^2$ (take square roots for a CI for $\sigma$).
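
A sketch computing both exact intervals on synthetic normal data:

```python
# Exact t-interval for mu and chi-square interval for sigma^2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.normal(5.0, 2.0, 30)
n, alpha = len(x), 0.05
xbar, s = x.mean(), x.std(ddof=1)

t = stats.t.ppf(1 - alpha / 2, n - 1)
print("mu:", (xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n)))

ss = ((x - xbar) ** 2).sum()               # n * sigma_hat^2
print("sigma^2:", (ss / stats.chi2.ppf(1 - alpha / 2, n - 1),
                   ss / stats.chi2.ppf(alpha / 2, n - 1)))
```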

Fisher Information

$I(\theta) = -E\left(\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta)\right)$

| Distribution | MLE | Variance |
|--------------|-----|----------|
| $\mathrm{Po}(\lambda)$ | $X$ | $\lambda$ |
| $\mathrm{Be}(p)$ | $X$ | $p(1-p)$ |
| $\mathrm{Bin}(n, p)$ | $\frac{X}{n}$ | $\frac{p(1-p)}{n}$ |
| HWE trinomial | $\frac{X_2 + 2X_3}{2n}$ | $\frac{\theta(1-\theta)}{2n}$ |

General trinomial: $\left(\frac{X_1}{n}, \frac{X_2}{n}\right)$, with covariance matrix

$$\begin{bmatrix} p_1(1-p_1) & -p_1p_2 \\ -p_1p_2 & p_2(1-p_2) \end{bmatrix}\frac{1}{n}$$

In all the above cases, $\mathrm{var}(\hat{\theta}) = I(\theta)^{-1}$.

Asymptotic Normality of MLE

As $n \to \infty$, $\sqrt{nI(\theta)}(\hat{\theta} - \theta) \to N(0, 1)$ in distribution, and hence approximately $\hat{\theta} \sim N\left(\theta, \frac{I(\theta)^{-1}}{n}\right)$.

Since $\hat{\theta} \xrightarrow{n \to \infty} \theta$, the MLE is consistent.

The SE of an estimate of $\theta$ is the SD of the estimator $\hat{\theta}$, hence $SE = SD(\hat{\theta}) = \sqrt{\frac{I(\theta)^{-1}}{n}} \approx \sqrt{\frac{I(\hat{\theta})^{-1}}{n}}$

$1-\alpha$ CI: $\hat{\theta} \pm z_{\alpha/2}\sqrt{\frac{I(\hat{\theta})^{-1}}{n}}$
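
A sketch for the Poisson case, where $I(\lambda) = 1/\lambda$ and hence $SE \approx \sqrt{\hat{\lambda}/n}$; the counts are synthetic.

```python
# Asymptotic MLE interval for a Poisson rate:
# lambda_hat +/- z_{alpha/2} * sqrt(lambda_hat / n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x = rng.poisson(4.0, 500)           # synthetic counts, true lambda = 4
lam_hat = x.mean()                  # MLE
se = np.sqrt(lam_hat / len(x))      # sqrt(I(lam_hat)^{-1} / n)
z = stats.norm.ppf(0.975)
print((lam_hat - z * se, lam_hat + z * se))
```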

Efficiency

Cramér-Rao Inequality: if $\hat{\theta}$ is unbiased, then for all $\theta \in \Theta$, $\mathrm{var}(\hat{\theta}) \ge \frac{I(\theta)^{-1}}{n}$; if equality holds, $\hat{\theta}$ is efficient.

$\mathrm{eff}(\hat{\theta}) = \frac{I(\theta)^{-1}/n}{\mathrm{var}(\hat{\theta})} \le 1$

Sufficiency

Characterisation

Let $S_t = \{x : T(x) = t\}$. The sample space of $X$, $S$, is the disjoint union of the $S_t$ across all possible values $t$ of $T$.

$T$ is sufficient for $\theta$ if there exists $q(\cdot)$ s.t. for all $x \in S_t$, $f_\theta(x \mid T = t) = q(x)$, i.e. the conditional distribution of the data given $T$ does not depend on $\theta$.

Factorisation Theorem

$T$ is sufficient for $\theta$ iff there exist $g(t, \theta)$ and $h(x)$ s.t. for all $\theta \in \Theta$, $f_\theta(x) = g(T(x), \theta)h(x)$ for all $x$.

Rao-Blackwell Theorem

Let $\hat{\theta}$ be an estimator of $\theta$ with finite variance, and let $T$ be sufficient for $\theta$. Let $\tilde{\theta} = E[\hat{\theta} \mid T]$. Then for every $\theta \in \Theta$, $E(\tilde{\theta} - \theta)^2 \le E(\hat{\theta} - \theta)^2$. Equality holds iff $\hat{\theta}$ is a function of $T$.

Random Conditional Expectation

  1. $E(X) = E(E(X \mid T))$
  2. $\mathrm{var}(X) = \mathrm{var}(E(X \mid T)) + E(\mathrm{var}(X \mid T))$
  3. $\mathrm{var}(Y \mid X) = E(Y^2 \mid X) - E(Y \mid X)^2$
  4. $E(Y) = Y$ and $\mathrm{var}(Y) = 0$ iff $Y$ is a constant

Hypothesis Testing

Let $X_1, \dots, X_n$ be IID with density $f(x \mid \theta)$. Null $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$. The critical region is $R \subset \mathbb{R}^n$; $\text{size} = P_0(X \in R)$ and $\text{power} = P_1(X \in R)$.

$\Lambda(x) = \frac{f_0(x_1)\cdots f_0(x_n)}{f_1(x_1)\cdots f_1(x_n)}$. The critical region $\{x : \Lambda(x) < c_\alpha\}$ has, among all tests of this size, the maximum power (Neyman-Pearson Lemma).

A hypothesis is simple if it completely specifies the distribution of the data.

$H_1: \mu > \mu_0$: critical region $\{\bar{x} > \mu_0 + z_\alpha\frac{\sigma}{\sqrt{n}}\}$; the power is a function of $\mu$, and this is the uniformly most powerful test of size $\alpha$.

$H_1: \mu \ne \mu_0$: critical region $\{|\bar{x} - \mu_0| > c\}$, $c = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, but this is not uniformly most powerful.

The $(1-\alpha)$ CI for $\mu$ consists of precisely the values $\mu_0$ for which $H_0: \mu = \mu_0$ is not rejected against $H_1: \mu \ne \mu_0$. This is exact for the normal with known variance, and approximate otherwise.

p-value

The probability under $H_0$ that the test statistic is more extreme than the realisation. (A, B): $p = P_0(\bar{X} > \bar{x}) = P\left(Z > \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)$. (C): $p = P_0(|\bar{X} - \mu_0| > |\bar{x} - \mu_0|)$. The smaller the p-value, the more suspicious one should be of $H_0$. If the size is smaller than the p-value, do not reject $H_0$.
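
A sketch of both p-value computations for the known-variance z-test; the numbers are illustrative.

```python
# One-sided and two-sided z-test p-values for H0: mu = mu_0.
import numpy as np
from scipy import stats

mu0, sigma, n, xbar = 10.0, 2.0, 50, 10.6
z = (xbar - mu0) / (sigma / np.sqrt(n))
print(stats.norm.sf(z))            # one-sided, H1: mu > mu0
print(2 * stats.norm.sf(abs(z)))   # two-sided, H1: mu != mu0
```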

Generalized Likelihood Ratio

$\Lambda = \frac{\max_{\theta \in \omega_0} L(\theta)}{\max_{\theta \in \Omega} L(\theta)}$, where $\Omega = \omega_0 \cup \omega_1$. The closer $\Lambda$ is to 0, the stronger the evidence for $H_1$.

Large-sample null distribution of Λ

Under $H_0$, when $n$ is large, $-2\log\Lambda \approx \chi^2_k$, where $k = \dim(\Omega) - \dim(\omega_0)$.

Normal (C): $p = P\left(\chi^2_1 > \frac{(\bar{x} - \mu_0)^2}{\sigma^2/n}\right)$

Multinomial: $\Lambda = \prod_{i=1}^r \left(\frac{E_i}{X_i}\right)^{X_i}$ where $E_i = np_i(\hat{\theta})$ is the expected frequency of the $i$th event under $H_0$. $-2\log\Lambda \approx \sum_{i=1}^r \frac{(X_i - E_i)^2}{E_i}$, which is the Pearson chi-square statistic, written $X^2$.
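
A sketch of the Pearson chi-square test against fixed $H_0$ probabilities (no fitted parameters, so $df = r - 1$); the observed counts are made up. `scipy.stats.chisquare` computes $X^2$ and its p-value directly.

```python
# Pearson chi-square goodness-of-fit: X^2 = sum (X_i - E_i)^2 / E_i.
import numpy as np
from scipy import stats

observed = np.array([18, 55, 27])
p0 = np.array([0.25, 0.5, 0.25])     # H0 probabilities
expected = observed.sum() * p0       # E_i = n p_i

print(stats.chisquare(observed, expected))   # df = r - 1 = 2
```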

Poisson Dispersion Test

For $i = 1, \dots, n$, let $X_i \sim \text{Poisson}(\lambda_i)$, independent.

$\omega_0 = \{\tilde{\lambda} \mid \lambda_1 = \lambda_2 = \dots = \lambda_n\}$

$\omega_1 = \{\tilde{\lambda} \mid \lambda_i \ne \lambda_j \text{ for some } i, j\}$

$-2\log\Lambda \approx \sum_{i=1}^n \frac{(X_i - \bar{X})^2}{\bar{X}}$. For large $n$, the null distribution of $-2\log\Lambda$ is approximately $\chi^2_{n-1}$.
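
A sketch of the dispersion test on synthetic counts generated under $H_0$:

```python
# Poisson dispersion test: compare sum (X_i - Xbar)^2 / Xbar to chi^2_{n-1}.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
x = rng.poisson(3.0, 40)             # H0 (common rate) is true here
stat = ((x - x.mean()) ** 2).sum() / x.mean()
print(stat, stats.chi2.sf(stat, len(x) - 1))   # statistic and p-value
```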

Comparing 2 samples

Normal Theory: Same Variance

Let $X_1, \dots, X_n$ be i.i.d. $N(\mu_X, \sigma^2)$ and $Y_1, \dots, Y_m$ be i.i.d. $N(\mu_Y, \sigma^2)$, independent of each other. $H_0: \mu_X - \mu_Y = d$.

Known Variance

$Z := \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}}$; reject $H_0$ when $|Z| > z_{\alpha/2}$.

Unknown Variance

$s_p^2 = \frac{(n-1)s_X^2 + (m-1)s_Y^2}{m+n-2}$, where $s_X^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$. $s_p^2$ is an unbiased estimator of $\sigma^2$. (Rule of thumb: pool only when $s_X$ is within a factor of 2 of $s_Y$.)

$t := \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}}$ follows a t distribution with $m+n-2$ d.f.

If two-sided: reject $H_0$ when $|t| > t_{n+m-2,\alpha/2}$. If one-sided, e.g. $H_1: \mu_X > \mu_Y$, reject $H_0$ when $t > t_{n+m-2,\alpha}$.

CI

$\bar{X} - \bar{Y} \pm z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}$ if $\sigma$ is known, or $\bar{X} - \bar{Y} \pm t_{m+n-2,\alpha/2}\,s_p\sqrt{\frac{1}{n} + \frac{1}{m}}$ if $\sigma$ is unknown.

Unequal Variance

$Z := \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}}$

$t := \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}}$, with $df = \frac{(a+b)^2}{\frac{a^2}{n-1} + \frac{b^2}{m-1}}$, where $a = \frac{s_X^2}{n}$ and $b = \frac{s_Y^2}{m}$.
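
Both versions are available in scipy; a sketch on synthetic samples, where `equal_var=False` selects the unequal-variance (Welch) test.

```python
# Pooled and Welch two-sample t-tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(5.0, 2.0, 40)
y = rng.normal(4.2, 2.0, 55)

print(stats.ttest_ind(x, y))                   # pooled, equal variances
print(stats.ttest_ind(x, y, equal_var=False))  # Welch, unequal variances
```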

Mann-Whitney Test

We take the smaller sample, of size $n_1$, and let $R$ be the sum of the ranks in that sample. Let $R' = n_1(m+n+1) - R$ and $R^* = \min(R, R')$; we reject $H_0: F = G$ if $R^*$ is too small.

The test works for all distributions and is robust to outliers.
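
A sketch using `scipy.stats.mannwhitneyu`, whose U statistic is equivalent to the rank-sum formulation above; the samples are synthetic.

```python
# Mann-Whitney (rank-sum) test of H0: F = G.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
x = rng.normal(0.0, 1.0, 30)
y = rng.normal(0.5, 1.0, 35)     # shifted distribution
print(stats.mannwhitneyu(x, y, alternative="two-sided"))
```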

Paired Samples

$(X_i, Y_i)$ are paired, relating to the same individual, and $(X_i, Y_i)$ is independent of $(X_j, Y_j)$ for $i \ne j$. Compute $D_i = Y_i - X_i$. To test $H_0: \mu_D = d$, use $t = \frac{\bar{D} - \mu_D}{s_D/\sqrt{n}}$.

$1-\alpha$ CI: $\bar{D} \pm t_{n-1,\alpha/2}\frac{s_D}{\sqrt{n}}$

Ranked Test

$W_+$ is the sum of the ranks of $|D_i|$ among all positive $D_i$, and $W_-$ is the sum of ranks among all negative $D_i$. We reject $H_0$ if $W = \min(W_+, W_-)$ is too small.
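
A sketch of both paired procedures on synthetic before/after measurements; `scipy.stats.wilcoxon` implements the signed-rank test based on $\min(W_+, W_-)$.

```python
# Paired t-test and Wilcoxon signed-rank test on differences D = Y - X.
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
x = rng.normal(10.0, 2.0, 25)            # before
y = x + rng.normal(0.5, 1.0, 25)         # after, mean shift 0.5
d = y - x

print(stats.ttest_rel(y, x))             # t = (Dbar - 0) / (s_D / sqrt(n))
print(stats.wilcoxon(d))                 # signed-rank statistic and p-value
```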