# Entropy

tags
Information Theory, Gibbs’ Inequality

## Definitions

The Shannon information content of an outcome $$x$$, measured in bits, is defined to be:

$$h(x) = \log_2 \frac{1}{P(x)}$$
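
As a quick numerical check, here is a minimal Python sketch of $$h(x)$$; the helper name is illustrative, not from the source:

```python
import math

def information_content(p: float) -> float:
    """Shannon information content, in bits, of an outcome with probability p."""
    return math.log2(1 / p)

print(information_content(0.5))    # 1.0 bit  (fair coin flip)
print(information_content(1 / 8))  # 3.0 bits (an outcome with probability 1/8)
```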

The entropy of an ensemble $$X$$ is defined to be the average Shannon information content of an outcome:

$$H(X)\equiv \sum_{x \in \mathcal{A}_X} P(x) \log \frac{1}{P(x)}$$

Entropy is $$0$$ when the outcome is deterministic, and attains its maximum value $$\log(|\mathcal{A}_X|)$$ when the outcomes are uniformly distributed.
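
A small Python sketch of $$H(X)$$ over a probability vector, illustrating both extremes (the helper name is an assumption for illustration):

```python
import math

def entropy(probs):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

print(entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0 -- deterministic outcome
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 -- uniform over 4 outcomes = log2(4)
```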

The joint entropy of two ensembles $$X, Y$$ is:

$$H(X,Y) \equiv \sum_{xy \in \mathcal{A}_X \mathcal{A}_Y} P(x,y) \log \frac{1}{P(x,y)}$$

Entropy is additive if the ensembles are independent, i.e. when $$P(x,y) = P(x)P(y)$$:

$$H(X,Y) = H(X) + H(Y)$$
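
A sketch verifying additivity for two independent ensembles, where the joint distribution is the outer product of the marginals (the distributions are arbitrary, chosen only for illustration; the `entropy` helper is the one from the sketch above):

```python
import math

def entropy(probs):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

p_x = [0.5, 0.25, 0.25]
p_y = [0.7, 0.3]

# For independent ensembles, P(x, y) = P(x) P(y).
joint = [px * py for px in p_x for py in p_y]

print(entropy(joint))               # ~2.3813
print(entropy(p_x) + entropy(p_y))  # same value: H(X, Y) = H(X) + H(Y)
```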

Entropy is decomposable: it can be computed in stages by grouping outcomes, taking the entropy of the grouping, and adding the entropy within each group weighted by the group's probability. For a distribution $$\mathbf{p} = (p_1, p_2, \ldots, p_I)$$,

$$H(\mathbf{p}) = H(p_1,\, 1-p_1) + (1-p_1)\, H\!\left(\frac{p_2}{1-p_1}, \ldots, \frac{p_I}{1-p_1}\right)$$
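
A quick numeric check of the grouping identity above (the distribution is arbitrary, chosen only for illustration; the `entropy` helper is the one from the earlier sketch):

```python
import math

def entropy(probs):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)

p = [0.5, 0.25, 0.125, 0.125]
p1, rest = p[0], p[1:]

direct = entropy(p)
staged = entropy([p1, 1 - p1]) + (1 - p1) * entropy([q / (1 - p1) for q in rest])

print(direct, staged)  # both 1.75 bits
```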