# Probability Theory

## Setup

Suppose that we are about to perform an experiment whose outcome is
not predictable in advance. The set of all possible outcomes of an
experiment is known as the *sample space* \(S\).

For example, if the experiment consists of flipping a coin, then:

\begin{equation} S = (H, T) \end{equation}

Any subset \(E\) of the sample space \(S\) is known as an *event*. Some
examples of event include: \(E = (H)\) where \(E\) is the event that a
head appears on the flip of the coin.

We can define unions and intersections between 2 or more events. The union of two events \(E \cup F\) is a new event to consist of all outcomes that are either in \(E\) or \(F\).

## Probabilities Defined on Events

Consider an experiment whose sample space is \(S\). For each event \(E\) of the sample space \(S\), we assume that a number \(P(E)\) is defined and satisfies 3 conditions:

- \(0 \le P(E) \le 1\)
- \(P(S) = 1\)
- For any sequence of events \(E_1, E_2, \dots\) that are mutually exclusive:

\begin{equation} P( \mathop{\cup}_{n=1}^{\infty} E_n) = \sum_{n=1}^{\infty} P(E_n) \end{equation}

## Conditional Probabilities

Conditional probabilities are a powerful and useful concept. First, we are often interested in calculating probabilities and expectations when some partial information is available. Second, in calculating a desired probability or expectation, it is often extremely useful to first “condition” on some appropriate random variable.

We denote \(P(E|F)\) the conditional probability that \(E\) occurs given that \(F\) has occurred. This is valid for all events \(E\) and \(F\) that satisfy the 3 conditions above.

Recall that for any 2 events \(E\) and \(F\), the conditional probability of \(E\) given \(F\) is defined, as long as \(P(F) > 0\), by:

\begin{equation} P(E|F) = \frac{P(EF)}{P(F)} \end{equation}

If \(X\) and \(Y\) are discrete random variables, it is natural to define the conditional probability mass function of \(X\) given that \(Y = y\), by:

\begin{align}
p_{X|Y}(x|y) &= P\left\{ X=x | Y=y \right\} \\\

&= \frac{P(X = x, Y = y)}{P( Y = y)} \\\

&= \frac{p(x,y)}{p_Y(y)}
\end{align}

for all values of \(y\) such that \(P(Y = y) > 0\). Similarly, we can define \(F_{X|Y}(x|y) = \sum_{a\le x}p_{X|Y}(x|y)\).

Finally, the conditional expectation of \(X\) given that \(Y = y\) is defined by:

\begin{align}
E[X|Y = y] &= \sum_{x} P\left\{ X = x | Y = y \right\} \\\

&= \sum_{x} x p_{X|Y} (x|y)
\end{align}

### Computing Expectations by Conditioning

Let us denote by \(E[X|Y]\) that function of the random variable \(Y\) whose value at \(Y=y\) is \(E[X|Y=y]\). An extremely important property of conditional expectation is that for all random variables \(X\) and \(Y\):

\begin{equation} E[X] = E\left[ E[X|Y] \right] \end{equation}

## Independent Events

Two events \(E\) and \(F\) are independent if:

\begin{equation} P(EF) = P(E) P(F) \end{equation}

This also implies that \(P(E|F) = P(E)\).

## Bayes’ Formula

Let \(E\) and \(F\) be events. We may express \(E\) as \(E = EF \cup EF^c\). Since \(EF\) and \(EF^c\) are mutually exclusive, we have:

\begin{align}
P(E) &= P(EF) + P(EF^c) \\\

&= P(E|F)P(F) + P(E|F^c)P(F^c) \\\

&= P(E|F)P(F) + P(E|F^c)\left( 1 - P(F) \right)
\end{align}