
Statistics

Basic Properties

  1. $E(X) = \sum_x x\, p(x)$
  2. $\mathrm{Var}(X) = \sum_x (x - \mu)^2 f(x)$
  3. X is around $E(X)$, give or take $SD(X)$
  4. $E(aX + bY) = aE(X) + bE(Y)$
  5. $\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y)$ for uncorrelated $X$, $Y$; in general, add $2ab\,\mathrm{Cov}(X, Y)$ (checked in the simulation below)
  6. $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
  7. $\mathrm{Cov}(X_1, X_2) = E(X_1X_2) - E(X_1)E(X_2)$
  8. if $X$, $Y$ are independent:
    1. $M_{X+Y}(t) = M_X(t)M_Y(t)$
    2. $E(XY) = E(X)E(Y)$; the converse is true if $X$ and $Y$ are bivariate normal, and this extends to the multivariate normal
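
A quick simulation can sanity-check properties 5 and 8.2; this is a minimal sketch where the distributions, the constants $a, b$, and the seed are arbitrary illustrations.

```python
# Sanity-check Var(aX+bY) = a^2 Var(X) + b^2 Var(Y) and E(XY) = E(X)E(Y)
# for independent X, Y. Distributions and constants are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
X = rng.normal(2.0, 3.0, n)       # independent of Y by construction
Y = rng.exponential(1.5, n)
a, b = 2.0, -1.0

print(np.var(a * X + b * Y), a**2 * np.var(X) + b**2 * np.var(Y))
print(np.mean(X * Y), np.mean(X) * np.mean(Y))
```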

Approximations

Law of Large Numbers

Let $X_1, X_2, \dots, X_n$ be IID, with expectation $\mu$ and variance $\sigma^2$. Then $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \xrightarrow{n \to \infty} \mu$. Let $x_1, x_2, \dots, x_n$ be realisations of the random variables $X_1, X_2, \dots, X_n$; then $\bar{x}_n = \frac{1}{n}\sum_{i=1}^n x_i \to \mu$ as $n \to \infty$.

Central Limit Theorem

Let $S_n = \sum_{i=1}^n X_i$ where $X_1, X_2, \dots, X_n$ are IID. Then $\frac{S_n - n\mu}{\sqrt{n}\,\sigma} \xrightarrow{n \to \infty} N(0, 1)$.
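
A minimal simulation sketch of both limit results, using Exponential(1) draws so that $\mu = \sigma = 1$; the sample sizes and seed are arbitrary.

```python
# LLN: the sample mean approaches mu as n grows.
# CLT: standardised sums over many replications look standard normal.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 1.0, 1.0              # Exponential(1) has mu = sigma = 1

for n in (10, 1_000, 100_000):    # LLN: means drift towards mu = 1
    print(n, rng.exponential(1.0, n).mean())

n, reps = 50, 100_000             # CLT: (S_n - n mu) / (sqrt(n) sigma)
S = rng.exponential(1.0, (reps, n)).sum(axis=1)
Z = (S - n * mu) / (np.sqrt(n) * sigma)
print(Z.mean(), Z.std())          # approximately 0 and 1
```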

Distributions

Poisson(λ)

$E(X) = \mathrm{Var}(X) = \lambda$

Normal $X \sim N(\mu, \sigma^2)$

https://kfrankc.com/posts/2018/10/19/normal-dist-derivation

$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty$

  1. When $\mu = 0$, $f(x)$ is an even function, and $E(X^k) = 0$ where $k$ is odd
  2. $Y = \frac{X - E(X)}{SD(X)}$ is the standard normal

Gamma Γ

$g(t) = \frac{\lambda^\alpha}{\Gamma(\alpha)}t^{\alpha - 1}e^{-\lambda t}, \quad t \ge 0$

The first two moments are $\mu_1 = \frac{\alpha}{\lambda}, \quad \mu_2 = \frac{\alpha(\alpha + 1)}{\lambda^2}$

$\chi^2$ Distribution

Let $Z \sim N(0, 1)$. Then $U = Z^2$ has a $\chi^2$ distribution with 1 d.f.

$f_U(u) = \frac{1}{\sqrt{2\pi}}u^{-1/2}e^{-u/2}, \quad u \ge 0$

$\chi^2_1 \sim \Gamma(\alpha = \frac{1}{2}, \lambda = \frac{1}{2})$

Let $U_1, U_2, \dots, U_n$ be IID $\chi^2_1$. Then $V = \sum_{i=1}^n U_i$ is $\chi^2_n$ with $n$ degrees of freedom, and $V \sim \Gamma(\alpha = \frac{n}{2}, \lambda = \frac{1}{2})$.

$E(\chi^2_n) = n, \quad \mathrm{Var}(\chi^2_n) = 2n$

$M(t) = (1 - 2t)^{-n/2}$
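
A simulation sketch of the construction above: sums of $n$ squared standard normals should have mean $n$, variance $2n$, and the $\Gamma(\alpha = n/2, \lambda = 1/2)$ distribution (scipy parameterises the gamma by shape and scale $1/\lambda$).

```python
# V = sum of n squared standard normals should be chi-square_n,
# i.e. Gamma(n/2, lambda = 1/2). n and the seed are arbitrary.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 5, 200_000
V = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)

print(V.mean(), V.var())  # approximately n and 2n
# KS test against Gamma with shape n/2, loc 0, scale 1/lambda = 2
print(stats.kstest(V, "gamma", args=(n / 2, 0, 2)))
```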

t-distribution

Let $Z \sim N(0, 1)$ and $U_n \sim \chi^2_n$ be independent. Then $t_n = \frac{Z}{\sqrt{U_n/n}}$ has a t-distribution with $n$ d.f.

$f(t) = \frac{\Gamma((n+1)/2)}{\sqrt{n\pi}\,\Gamma(n/2)}\left(1 + \frac{t^2}{n}\right)^{-\frac{n+1}{2}}$

  1. $t$ is symmetric about 0
  2. $t_n \xrightarrow{n \to \infty} Z$

F-distribution

Let $U \sim \chi^2_m$ and $V \sim \chi^2_n$ be independent. Then $W = \frac{U/m}{V/n}$ has an F distribution with $(m, n)$ d.f.

If $X \sim t_n$, then $X^2 = \frac{Z^2/1}{U_n/n}$ has an F distribution with $(1, n)$ d.f., supported on $w \ge 0$.

For $n > 2$, $E(W) = \frac{n}{n-2}$.

Sampling

Let $X_1, X_2, \dots, X_n$ be IID $N(\mu, \sigma^2)$.

sample mean, $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$

sample variance, $S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$

Properties of $\bar{X}$ and $S^2$

  1. $\bar{X}$ and $S^2$ are independent
  2. $\bar{X} \sim N(\mu, \frac{\sigma^2}{n})$
  3. $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$
  4. $\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$ (properties 2-4 are checked in the sketch below)
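
A sketch checking these properties by simulating many normal samples; $\mu$, $\sigma$, $n$, and the seed are arbitrary.

```python
# Simulate many N(mu, sigma^2) samples of size n and compare the
# sampling distributions of Xbar, S^2, and the t-ratio to theory.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 10, 100_000
X = rng.normal(mu, sigma, (reps, n))
xbar = X.mean(axis=1)
S2 = X.var(axis=1, ddof=1)

print(xbar.var(), sigma**2 / n)                                      # property 2
print(stats.kstest((n - 1) * S2 / sigma**2, "chi2", args=(n - 1,)))  # property 3
T = (xbar - mu) / np.sqrt(S2 / n)
print(stats.kstest(T, "t", args=(n - 1,)))                           # property 4
```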

Simple Random Sampling (SRS)

Assume $n$ random draws are made with replacement. (This is not SRS; it will be corrected for later.)

Summary of Lemmas

  • $P(X_i = \xi_j) = \frac{n_j}{N}$: Lemma A
  • For $i \ne j$, $\mathrm{Cov}(X_i, X_j) = -\frac{\sigma^2}{N-1}$ (for draws without replacement): Lemma B

Estimation Problem

Let $X_1, X_2, \dots, X_n$ be random draws with replacement. Then $\bar{X}$ is an estimator of $\mu$, and the observed value of $\bar{X}$, namely $\bar{x}$, is an estimate of $\mu$.

Standard Error (SE)

The SE of $\bar{X}$ is defined to be $SD(\bar{X})$.

| param | est | SE | Est. SE |
|-------|-----|----|---------|
| $\mu$ | $\bar{X}$ | $\frac{\sigma}{\sqrt{n}}$ | $\frac{s}{\sqrt{n}}$ |
| $p$ | $\hat{p}$ | $\sqrt{\frac{p(1-p)}{n}}$ | $\sqrt{\frac{\hat{p}(1-\hat{p})}{n-1}}$ |

Without Replacement

The SE is multiplied by $\sqrt{\frac{N-n}{N-1}}$ (the finite population correction). Also, $s^2$ is biased for $\sigma^2$: $E\left(\frac{N-1}{N}s^2\right) = \sigma^2$, but $N$ is normally large, so the bias is negligible.

Confidence Interval

An approximate 1α CI for μ is

$\left(\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}\right)$
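
A minimal sketch of computing this interval; the data here are synthetic placeholders.

```python
# Approximate 95% CI for mu: xbar +/- z_{alpha/2} * s / sqrt(n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.exponential(3.0, 200)        # placeholder data
alpha = 0.05
z = stats.norm.ppf(1 - alpha / 2)    # z_{alpha/2}
half = z * x.std(ddof=1) / np.sqrt(len(x))
print((x.mean() - half, x.mean() + half))
```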

Biased Measurements

Let $X = \mu + \epsilon$, where $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2$.

Suppose $X$ is used to measure an unknown constant $a$, with $a \ne \mu$. Then $X = a + (\mu - a) + \epsilon$, where $\mu - a$ is the bias.

The mean square error (MSE) is $E((X - a)^2) = \sigma^2 + (\mu - a)^2$.

With $n$ IID measurements, $\bar{X} = \mu + \bar{\epsilon}$, and

$E((\bar{X} - a)^2) = \frac{\sigma^2}{n} + (\mu - a)^2$

$MSE = SE^2 + \text{bias}^2$; hence MSE is a good measure of the accuracy of the estimate $\bar{x}$ of $a$.

Estimation of a Ratio

Consider a population of $N$ members with two characteristics recorded for each; draw pairs $(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)$. Let $r = \frac{\mu_y}{\mu_x}$.

An obvious estimator of $r$ is $R = \frac{\bar{Y}}{\bar{X}}$

$\mathrm{Cov}(\bar{X}, \bar{Y}) = \frac{\sigma_{xy}}{n}$, where

$\sigma_{xy} := \frac{1}{N}\sum_{i=1}^N (x_i - \mu_x)(y_i - \mu_y)$ is the population covariance.

Properties

$\mathrm{Var}(R) \approx \frac{1}{\mu_x^2}\left(r^2\sigma_{\bar{X}}^2 + \sigma_{\bar{Y}}^2 - 2r\sigma_{\bar{X}\bar{Y}}\right)$

Population correlation coefficient: $\rho = \frac{\sigma_{xy}}{\sigma_x\sigma_y}$

$E(R) \approx r + \frac{1}{n}\left(\frac{N-n}{N-1}\right)\frac{1}{\mu_x^2}\left(r\sigma_x^2 - \rho\sigma_x\sigma_y\right)$

$s_{xy} = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})$

Ratio Estimates

$\bar{Y}_R = \frac{\mu_x}{\bar{X}}\bar{Y} = \mu_x R$

$\mathrm{Var}(\bar{Y}_R) \approx \frac{1}{n}\frac{N-n}{N-1}\left(r^2\sigma_x^2 + \sigma_y^2 - 2r\rho\sigma_x\sigma_y\right)$

$E(\bar{Y}_R) - \mu_y \approx \frac{1}{n}\frac{N-n}{N-1}\frac{1}{\mu_x}\left(r\sigma_x^2 - \rho\sigma_x\sigma_y\right)$

The bias is of order $\frac{1}{n}$, small compared to the standard error, which is of order $\frac{1}{\sqrt{n}}$.

$\bar{Y}_R$ is better than $\bar{Y}$, having smaller variance, when $\rho > \frac{1}{2}\left(\frac{C_x}{C_y}\right)$, where $C_i = \sigma_i/\mu_i$ is the coefficient of variation.

The variance of $\bar{Y}_R$ can be estimated by

$s_{\bar{Y}_R}^2 = \frac{1}{n}\frac{N-n}{N-1}\left(R^2s_x^2 + s_y^2 - 2Rs_{xy}\right)$

An approximate $1-\alpha$ C.I. for $\mu_y$ is $\bar{Y}_R \pm z_{\alpha/2}s_{\bar{Y}_R}$
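
A sketch of the ratio estimate and its estimated SE on synthetic data, dropping the finite population correction (i.e. assuming $N \gg n$); $\mu_x$ is taken as known.

```python
# Ratio estimate Ybar_R = mu_x * R with estimated SE, fpc omitted.
import numpy as np

rng = np.random.default_rng(5)
n, mu_x = 100, 10.0                      # mu_x assumed known
x = rng.normal(mu_x, 2.0, n)
y = 1.5 * x + rng.normal(0, 1.0, n)      # correlated characteristic

R = y.mean() / x.mean()
sx2, sy2 = x.var(ddof=1), y.var(ddof=1)
sxy = np.cov(x, y, ddof=1)[0, 1]
s_YR = np.sqrt((R**2 * sx2 + sy2 - 2 * R * sxy) / n)
YR = mu_x * R
print(YR, (YR - 1.96 * s_YR, YR + 1.96 * s_YR))  # approx 95% CI for mu_y
```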

Method of Moments

To estimate $\theta$, express it as a function of the moments, $\theta = g(\mu_1, \mu_2, \dots)$, then plug in the sample moments: $\hat{\theta} = g(\hat{\mu}_1, \hat{\mu}_2, \dots)$.

Monte Carlo

Monte Carlo is used to generate many realisations of a random variable.

$\bar{X} \xrightarrow{n \to \infty} \alpha/\lambda$ and $\hat{\sigma}^2 \xrightarrow{n \to \infty} \alpha/\lambda^2$; MOM estimators are consistent (asymptotically unbiased).

Poisson($\lambda$): $\text{bias} = 0$, $SE = \sqrt{\frac{\bar{x}}{n}}$

$N(\mu, \sigma^2)$: $\mu = \mu_1$, $\sigma^2 = \mu_2 - \mu_1^2$

$\Gamma(\lambda, \alpha)$: $\hat{\lambda} = \frac{\hat{\mu}_1}{\hat{\mu}_2 - \hat{\mu}_1^2} = \frac{\bar{X}}{\hat{\sigma}^2}, \quad \hat{\alpha} = \frac{\hat{\mu}_1^2}{\hat{\mu}_2 - \hat{\mu}_1^2} = \frac{\bar{X}^2}{\hat{\sigma}^2}$
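
A sketch recovering the gamma MOM estimates from simulated data with known true parameters ($\alpha = 3$, $\lambda = 2$):

```python
# Gamma method-of-moments: lambda_hat = xbar / sigma_hat^2,
# alpha_hat = xbar^2 / sigma_hat^2. True values: alpha = 3, lambda = 2.
import numpy as np

rng = np.random.default_rng(6)
x = rng.gamma(shape=3.0, scale=1 / 2.0, size=50_000)

xbar, sig2 = x.mean(), x.var()        # hat{mu}_1 and hat{mu}_2 - hat{mu}_1^2
print(xbar / sig2, xbar**2 / sig2)    # near lambda = 2 and alpha = 3
```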

Maximum Likelihood Estimator (MLE)

Poisson Case

$L(\lambda) = \prod_{i=1}^n \frac{\lambda^{x_i}e^{-\lambda}}{x_i!} = \frac{\lambda^{\sum_{i=1}^n x_i}e^{-n\lambda}}{\prod_{i=1}^n x_i!}$

$l(\lambda) = \sum_{i=1}^n x_i\log\lambda - n\lambda - \sum_{i=1}^n \log x_i!$

The ML estimate of $\lambda_0$ is $\bar{x}$; the ML estimator is $\hat{\lambda}_0 = \bar{X}$.

Normal case

$l(\mu, \sigma) = -n\log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2}\sum_{i=1}^n (X_i - \mu)^2$

$\frac{\partial l}{\partial \mu} = \frac{\sum_{i=1}^n (X_i - \mu)}{\sigma^2} \Rightarrow \hat{\mu} = \bar{x}$

$\frac{\partial l}{\partial \sigma} = \frac{\sum_{i=1}^n (X_i - \mu)^2}{\sigma^3} - \frac{n}{\sigma} \Rightarrow \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2$

Gamma case

$l(\theta) = n\alpha\log\lambda + (\alpha - 1)\sum_{i=1}^n \log X_i - \lambda\sum_{i=1}^n X_i - n\log\Gamma(\alpha)$

$\frac{\partial l}{\partial \alpha} = n\log\lambda + \sum_{i=1}^n \log X_i - n\frac{\Gamma'(\alpha)}{\Gamma(\alpha)}$

$\frac{\partial l}{\partial \lambda} = \frac{n\alpha}{\lambda} - \sum_{i=1}^n X_i$

$\hat{\lambda} = \frac{\hat{\alpha}}{\bar{x}}$; there is no closed form for $\hat{\alpha}$, so it is found numerically (see the sketch below).
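
Substituting $\lambda = \alpha/\bar{x}$ into $\partial l/\partial \alpha = 0$ gives the profile equation $\log\alpha - \psi(\alpha) = \log\bar{x} - \overline{\log x}$, where $\psi$ is the digamma function; a sketch solving it numerically on synthetic data:

```python
# Numerical gamma MLE: solve the profile score equation for alpha,
# then lambda_hat = alpha_hat / xbar. True values: alpha = 3, lambda = 2.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(7)
x = rng.gamma(shape=3.0, scale=1 / 2.0, size=10_000)
xbar, mean_logx = x.mean(), np.log(x).mean()

f = lambda a: np.log(a) - digamma(a) - (np.log(xbar) - mean_logx)
alpha_hat = brentq(f, 1e-6, 1e6)     # profile score has a unique root
print(alpha_hat, alpha_hat / xbar)   # near alpha = 3 and lambda = 2
```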

Multinomial Case

$f(x_1, \dots, x_r) = \binom{n}{x_1, x_2, \dots, x_r}\prod_{i=1}^r p_i^{x_i}$

where $x_i$ is the number of times value $i$ occurs (not the number of trials), and $x_1, x_2, \dots, x_r$ are non-negative integers summing to $n$. For each $i$:

$E(X_i) = np_i, \quad \mathrm{Var}(X_i) = np_i(1 - p_i)$

$\mathrm{Cov}(X_i, X_j) = -np_ip_j, \quad i \ne j$

$l(p) = K + \sum_{i=1}^{r-1} x_i\log p_i + x_r\log(1 - p_1 - \dots - p_{r-1})$, where $K$ is a constant

$\frac{\partial l}{\partial p_i} = \frac{x_i}{p_i} - \frac{x_r}{p_r} = 0$, assuming the MLE exists

$\frac{x_i}{\hat{p}_i} = \frac{x_r}{\hat{p}_r} \Rightarrow \hat{p}_i = \frac{x_i}{c}, \quad c = \frac{x_r}{\hat{p}_r}$

$\sum_{i=1}^r \hat{p}_i = \sum_{i=1}^r \frac{x_i}{c} = 1 \Rightarrow c = \sum_{i=1}^r x_i = n \Rightarrow \hat{p}_i = \frac{x_i}{n}$

This is the same as the MOM estimator.

CIs in MLE

$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$

Given the realisations $\bar{x}$ and $s$, $\left(\bar{x} - t_{n-1,\alpha/2}\frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1,\alpha/2}\frac{s}{\sqrt{n}}\right)$ is the exact $1-\alpha$ CI for $\mu$.

Similarly, $\left(\frac{n\hat{\sigma}^2}{\chi^2_{n-1,\alpha/2}},\ \frac{n\hat{\sigma}^2}{\chi^2_{n-1,1-\alpha/2}}\right)$ is the exact $1-\alpha$ CI for $\sigma^2$ (take square roots for a CI for $\sigma$).
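
A sketch computing both exact intervals on synthetic normal data:

```python
# Exact t-interval for mu and chi-square interval for sigma^2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.normal(5.0, 2.0, 30)
n, alpha = len(x), 0.05
xbar, s = x.mean(), x.std(ddof=1)

t = stats.t.ppf(1 - alpha / 2, n - 1)
print("mu:", (xbar - t * s / np.sqrt(n), xbar + t * s / np.sqrt(n)))

ss = ((x - xbar) ** 2).sum()               # n * sigma_hat^2
print("sigma^2:", (ss / stats.chi2.ppf(1 - alpha / 2, n - 1),
                   ss / stats.chi2.ppf(alpha / 2, n - 1)))
```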

Fisher Information

$I(\theta) = -E\left(\frac{\partial^2}{\partial\theta^2}\log f(X \mid \theta)\right)$

| Distribution | MLE | Variance |
|--------------|-----|----------|
| $\mathrm{Po}(\lambda)$ | $X$ | $\lambda$ |
| $\mathrm{Be}(p)$ | $X$ | $p(1-p)$ |
| $\mathrm{Bin}(n, p)$ | $\frac{X}{n}$ | $\frac{p(1-p)}{n}$ |
| HWE trinomial | $\frac{X_2 + 2X_3}{2n}$ | $\frac{\theta(1-\theta)}{2n}$ |

General trinomial: $\left(\frac{X_1}{n}, \frac{X_2}{n}\right)$, with covariance matrix

$$\begin{bmatrix} p_1(1-p_1) & -p_1p_2 \\ -p_1p_2 & p_2(1-p_2) \end{bmatrix}\frac{1}{n}$$

In all the above cases, $\mathrm{var}(\hat{\theta}) = I(\theta)^{-1}$.

Asymptotic Normality of MLE

As $n \to \infty$, $\sqrt{nI(\theta)}(\hat{\theta} - \theta) \to N(0, 1)$ in distribution, and hence approximately $\hat{\theta} \sim N\left(\theta, \frac{I(\theta)^{-1}}{n}\right)$.

Since $\hat{\theta} \xrightarrow{n \to \infty} \theta$, the MLE is consistent.

The SE of an estimate of $\theta$ is the SD of the estimator $\hat{\theta}$, hence $SE = SD(\hat{\theta}) = \sqrt{\frac{I(\theta)^{-1}}{n}} \approx \sqrt{\frac{I(\hat{\theta})^{-1}}{n}}$

$1-\alpha$ CI: $\hat{\theta} \pm z_{\alpha/2}\sqrt{\frac{I(\hat{\theta})^{-1}}{n}}$
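
A sketch for the Poisson case, where $I(\lambda) = 1/\lambda$ and hence $SE \approx \sqrt{\hat{\lambda}/n}$; the counts are synthetic.

```python
# Asymptotic MLE interval for a Poisson rate:
# lambda_hat +/- z_{alpha/2} * sqrt(lambda_hat / n).
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x = rng.poisson(4.0, 500)           # synthetic counts, true lambda = 4
lam_hat = x.mean()                  # MLE
se = np.sqrt(lam_hat / len(x))      # sqrt(I(lam_hat)^{-1} / n)
z = stats.norm.ppf(0.975)
print((lam_hat - z * se, lam_hat + z * se))
```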

Efficiency

Cramér-Rao Inequality: if $\hat{\theta}$ is unbiased, then for all $\theta \in \Theta$, $\mathrm{var}(\hat{\theta}) \ge \frac{I(\theta)^{-1}}{n}$; if equality holds, $\hat{\theta}$ is efficient.

$\mathrm{eff}(\hat{\theta}) = \frac{I(\theta)^{-1}/n}{\mathrm{var}(\hat{\theta})} \le 1$

Sufficiency

Characterisation

Let $S_t = \{x : T(x) = t\}$. The sample space of $X$, $S$, is the disjoint union of the $S_t$ across all possible values $t$ of $T$.

$T$ is sufficient for $\theta$ if there exists $q(\cdot)$ s.t. for all $x \in S_t$, $f_\theta(x \mid T = t) = q(x)$, i.e. the conditional distribution of the data given $T$ does not depend on $\theta$.

Factorisation Theorem

$T$ is sufficient for $\theta$ iff there exist $g(t, \theta)$ and $h(x)$ s.t. for all $\theta \in \Theta$, $f_\theta(x) = g(T(x), \theta)h(x)$ for all $x$.

Rao-Blackwell Theorem

Let $\hat{\theta}$ be an estimator of $\theta$ with finite variance, and let $T$ be sufficient for $\theta$. Let $\tilde{\theta} = E[\hat{\theta} \mid T]$. Then for every $\theta \in \Theta$, $E(\tilde{\theta} - \theta)^2 \le E(\hat{\theta} - \theta)^2$. Equality holds iff $\hat{\theta}$ is a function of $T$.

Random Conditional Expectation

  1. $E(X) = E(E(X \mid T))$
  2. $\mathrm{var}(X) = \mathrm{var}(E(X \mid T)) + E(\mathrm{var}(X \mid T))$
  3. $\mathrm{var}(Y \mid X) = E(Y^2 \mid X) - E(Y \mid X)^2$
  4. $E(Y) = Y$ and $\mathrm{var}(Y) = 0$ iff $Y$ is a constant

Hypothesis Testing

Let $X_1, \dots, X_n$ be IID with density $f(x \mid \theta)$. Null $H_0: \theta = \theta_0$ against $H_1: \theta = \theta_1$. The critical region is $R \subset \mathbb{R}^n$; $\text{size} = P_0(X \in R)$ and $\text{power} = P_1(X \in R)$.

$\Lambda(x) = \frac{f_0(x_1)\cdots f_0(x_n)}{f_1(x_1)\cdots f_1(x_n)}$. The critical region $\{x : \Lambda(x) < c_\alpha\}$ has, among all tests of this size, the maximum power (Neyman-Pearson Lemma).

A hypothesis is simple if it completely specifies the distribution of the data.

$H_1: \mu > \mu_0$: critical region $\{\bar{x} > \mu_0 + z_\alpha\frac{\sigma}{\sqrt{n}}\}$; the power is a function of $\mu$, and this is the uniformly most powerful test of size $\alpha$.

$H_1: \mu \ne \mu_0$: critical region $\{|\bar{x} - \mu_0| > c\}$, $c = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, but this is not uniformly most powerful.

The $(1-\alpha)$ CI for $\mu$ consists of precisely the values $\mu_0$ for which $H_0: \mu = \mu_0$ is not rejected against $H_1: \mu \ne \mu_0$. This is exact for the normal with known variance, and approximate otherwise.

p-value

The probability under $H_0$ that the test statistic is more extreme than the realisation. (A, B): $p = P_0(\bar{X} > \bar{x}) = P\left(Z > \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}\right)$. (C): $p = P_0(|\bar{X} - \mu_0| > |\bar{x} - \mu_0|)$. The smaller the p-value, the more suspicious one should be of $H_0$. If the size is smaller than the p-value, do not reject $H_0$.
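
A sketch of both p-value computations for the known-variance z-test; the numbers are illustrative.

```python
# One-sided and two-sided z-test p-values for H0: mu = mu_0.
import numpy as np
from scipy import stats

mu0, sigma, n, xbar = 10.0, 2.0, 50, 10.6
z = (xbar - mu0) / (sigma / np.sqrt(n))
print(stats.norm.sf(z))            # one-sided, H1: mu > mu0
print(2 * stats.norm.sf(abs(z)))   # two-sided, H1: mu != mu0
```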

Generalized Likelihood Ratio

$\Lambda = \frac{\max_{\theta \in \omega_0} L(\theta)}{\max_{\theta \in \Omega} L(\theta)}$, where $\Omega = \omega_0 \cup \omega_1$. The closer $\Lambda$ is to 0, the stronger the evidence for $H_1$.

Large-sample null distribution of Λ

Under $H_0$, when $n$ is large, $-2\log\Lambda \approx \chi^2_k$, where $k = \dim(\Omega) - \dim(\omega_0)$.

Normal (C): $p = P\left(\chi^2_1 > \frac{(\bar{x} - \mu_0)^2}{\sigma^2/n}\right)$

Multinomial: $\Lambda = \prod_{i=1}^r \left(\frac{E_i}{X_i}\right)^{X_i}$ where $E_i = np_i(\hat{\theta})$ is the expected frequency of the $i$th event under $H_0$. $-2\log\Lambda \approx \sum_{i=1}^r \frac{(X_i - E_i)^2}{E_i}$, which is the Pearson chi-square statistic, written $X^2$.
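
A sketch of the Pearson chi-square test against fixed $H_0$ probabilities (no fitted parameters, so $df = r - 1$); the observed counts are made up. `scipy.stats.chisquare` computes $X^2$ and its p-value directly.

```python
# Pearson chi-square goodness-of-fit: X^2 = sum (X_i - E_i)^2 / E_i.
import numpy as np
from scipy import stats

observed = np.array([18, 55, 27])
p0 = np.array([0.25, 0.5, 0.25])     # H0 probabilities
expected = observed.sum() * p0       # E_i = n p_i

print(stats.chisquare(observed, expected))   # df = r - 1 = 2
```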

Poisson Dispersion Test

For $i = 1, \dots, n$, let $X_i \sim \text{Poisson}(\lambda_i)$, independent.

$\omega_0 = \{\tilde{\lambda} \mid \lambda_1 = \lambda_2 = \dots = \lambda_n\}$

$\omega_1 = \{\tilde{\lambda} \mid \lambda_i \ne \lambda_j \text{ for some } i, j\}$

$-2\log\Lambda \approx \sum_{i=1}^n \frac{(X_i - \bar{X})^2}{\bar{X}}$. For large $n$, the null distribution of $-2\log\Lambda$ is approximately $\chi^2_{n-1}$.
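
A sketch of the dispersion test on synthetic counts generated under $H_0$:

```python
# Poisson dispersion test: compare sum (X_i - Xbar)^2 / Xbar to chi^2_{n-1}.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
x = rng.poisson(3.0, 40)             # H0 (common rate) is true here
stat = ((x - x.mean()) ** 2).sum() / x.mean()
print(stat, stats.chi2.sf(stat, len(x) - 1))   # statistic and p-value
```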

Comparing 2 samples

Normal Theory: Same Variance

Let $X_1, \dots, X_n$ be i.i.d. $N(\mu_X, \sigma^2)$ and $Y_1, \dots, Y_m$ be i.i.d. $N(\mu_Y, \sigma^2)$, independent of each other. $H_0: \mu_X - \mu_Y = d$.

Known Variance

$Z := \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}}$; reject $H_0$ when $|Z| > z_{\alpha/2}$.

Unknown Variance

$s_p^2 = \frac{(n-1)s_X^2 + (m-1)s_Y^2}{m+n-2}$, where $s_X^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$. $s_p^2$ is an unbiased estimator of $\sigma^2$. (Rule of thumb: pool only when $s_X$ is within a factor of 2 of $s_Y$.)

$t := \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{s_p\sqrt{\frac{1}{n} + \frac{1}{m}}}$ follows a t distribution with $m+n-2$ d.f.

If two-sided: reject $H_0$ when $|t| > t_{n+m-2,\alpha/2}$. If one-sided, e.g. $H_1: \mu_X > \mu_Y$, reject $H_0$ when $t > t_{n+m-2,\alpha}$.

CI

$\bar{X} - \bar{Y} \pm z_{\alpha/2}\,\sigma\sqrt{\frac{1}{n} + \frac{1}{m}}$ if $\sigma$ is known, or $\bar{X} - \bar{Y} \pm t_{m+n-2,\alpha/2}\,s_p\sqrt{\frac{1}{n} + \frac{1}{m}}$ if $\sigma$ is unknown.

Unequal Variance

$Z := \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}}}$

$t := \frac{\bar{X} - \bar{Y} - (\mu_X - \mu_Y)}{\sqrt{\frac{s_X^2}{n} + \frac{s_Y^2}{m}}}$, with $df = \frac{(a+b)^2}{\frac{a^2}{n-1} + \frac{b^2}{m-1}}$, where $a = \frac{s_X^2}{n}$ and $b = \frac{s_Y^2}{m}$.
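
Both versions are available in scipy; a sketch on synthetic samples, where `equal_var=False` selects the unequal-variance (Welch) test.

```python
# Pooled and Welch two-sample t-tests.
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
x = rng.normal(5.0, 2.0, 40)
y = rng.normal(4.2, 2.0, 55)

print(stats.ttest_ind(x, y))                   # pooled, equal variances
print(stats.ttest_ind(x, y, equal_var=False))  # Welch, unequal variances
```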

Mann-Whitney Test

We take the smaller sample, of size $n_1$, and let $R$ be the sum of the ranks in that sample. Let $R' = n_1(m+n+1) - R$ and $R^* = \min(R, R')$; we reject $H_0: F = G$ if $R^*$ is too small.

The test works for all distributions and is robust to outliers.
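
A sketch using `scipy.stats.mannwhitneyu`, whose U statistic is equivalent to the rank-sum formulation above; the samples are synthetic.

```python
# Mann-Whitney (rank-sum) test of H0: F = G.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
x = rng.normal(0.0, 1.0, 30)
y = rng.normal(0.5, 1.0, 35)     # shifted distribution
print(stats.mannwhitneyu(x, y, alternative="two-sided"))
```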

Paired Samples

$(X_i, Y_i)$ are paired, relating to the same individual, and $(X_i, Y_i)$ is independent of $(X_j, Y_j)$ for $i \ne j$. Compute $D_i = Y_i - X_i$. To test $H_0: \mu_D = d$, use $t = \frac{\bar{D} - \mu_D}{s_D/\sqrt{n}}$.

$1-\alpha$ CI: $\bar{D} \pm t_{n-1,\alpha/2}\frac{s_D}{\sqrt{n}}$

Ranked Test

$W_+$ is the sum of the ranks of $|D_i|$ among all positive $D_i$, and $W_-$ is the sum of ranks among all negative $D_i$. We reject $H_0$ if $W = \min(W_+, W_-)$ is too small.
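
A sketch of both paired procedures on synthetic before/after measurements; `scipy.stats.wilcoxon` implements the signed-rank test based on $\min(W_+, W_-)$.

```python
# Paired t-test and Wilcoxon signed-rank test on differences D = Y - X.
import numpy as np
from scipy import stats

rng = np.random.default_rng(13)
x = rng.normal(10.0, 2.0, 25)            # before
y = x + rng.normal(0.5, 1.0, 25)         # after, mean shift 0.5
d = y - x

print(stats.ttest_rel(y, x))             # t = (Dbar - 0) / (s_D / sqrt(n))
print(stats.wilcoxon(d))                 # signed-rank statistic and p-value
```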