Information Theory
Introduction
How can we achieve perfect communication over an imperfect noisy communication channel?
The physical solution is to improve the characteristics of the communication channel to reduce its error probability. For example, we can use more reliable components in the communication device’s circuitry.
Information theory and coding theory offer an alternative: we accept the given noisy channel as it is, and add communication systems to it to detect and correct errors introduced by the channel.
An encoder encodes the source message $\mathbf{s}$ into a transmitted message $\mathbf{t}$, adding redundancy to the original message in some way. The channel adds noise to the transmitted message, yielding a received message $\mathbf{r}$. The decoder uses the known redundancy introduced by the encoding system to infer both the original signal $\mathbf{s}$ and the added noise.
Information theory is concerned with the theoretical limitations and potentials of these systems. Coding theory is concerned with the creation of practical encoding and decoding systems.
Error-correcting codes for binary symmetric channels
Repetition codes
Key idea: repeat every bit of the message a prearranged number of times, and decode each bit by taking the majority vote among its received copies.
We can describe the channel as adding a sparse noise vector $\mathbf{n}$ to the transmitted vector, with addition carried out in modulo-2 arithmetic: $\mathbf{r} = \mathbf{t} + \mathbf{n} \bmod 2$.
One can show that this algorithm is optimal by considering the maximum-likelihood estimate of $\mathbf{s}$ given the received vector.
(Figure: the repetition code.)
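As a minimal sketch (assuming the three-fold repetition code and a channel simulated by a fixed sparse flip pattern; the bit values and flipped position are illustrative), encoding, noise, and majority-vote decoding look like this:

```python
import numpy as np

def encode_repetition(s, reps=3):
    """Repeat every source bit `reps` times."""
    return np.repeat(s, reps)

def decode_repetition(r, reps=3):
    """Majority vote within each block of `reps` received bits."""
    blocks = r.reshape(-1, reps)
    return (blocks.sum(axis=1) > reps // 2).astype(int)

s = np.array([0, 1, 1, 0])        # source message
t = encode_repetition(s)          # transmitted vector
n = np.zeros_like(t)              # sparse noise vector ...
n[4] = 1                          # ... with a single flipped bit
r = (t + n) % 2                   # received vector, addition mod 2
print(decode_repetition(r))       # -> [0 1 1 0]; the single error is corrected
```

Any block containing at most one flipped bit is decoded correctly, which is why the error probability drops (at the cost of a rate of 1/3).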
Block codes - the (7,4) Hamming Code
Key idea: add redundancy to blocks of data instead of encoding one bit at a time.
A block code is a rule for converting a sequence of source bits $\mathbf{s}$, of length $K$, into a transmitted sequence $\mathbf{t}$ of length $N > K$ bits.
The Hamming code transmits $N = 7$ bits for every $K = 4$ source bits.
Because the Hamming code is a linear code, it can be written compactly in matrix form:
$$\mathbf{t} = G^{\mathsf{T}} \mathbf{s} \bmod 2,$$
where $G$ is the generator matrix of the code.
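A minimal encoding sketch, assuming the conventional systematic generator in which the parity bits are $t_5 = s_1{+}s_2{+}s_3$, $t_6 = s_2{+}s_3{+}s_4$, $t_7 = s_1{+}s_3{+}s_4$ (mod 2); other equivalent choices of $G$ exist:

```python
import numpy as np

# Generator matrix G^T for the (7,4) Hamming code in systematic form:
# the first four rows copy the source bits, the last three compute parity bits.
G_T = np.array([[1, 0, 0, 0],
                [0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [1, 0, 1, 1]])

def hamming_encode(s):
    """Map a length-4 source block s to the length-7 codeword t = G^T s mod 2."""
    return G_T @ np.asarray(s) % 2

print(hamming_encode([1, 0, 0, 0]))   # -> [1 0 0 0 1 0 1]
```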
Decoding for linear codes: syndrome decoding
The decoding problem for linear codes can also be described in terms of matrices. We evaluate the three parity-check bits for the first four received bits, $r_1 r_2 r_3 r_4$, and see whether they match the three parity bits that were actually received, $r_5 r_6 r_7$. The differences (mod 2) between these two triplets are called the syndrome of the received vector. If the syndrome is zero, then the received vector is a codeword, and the most probable decoding is given by reading out its first four bits.
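A sketch of syndrome decoding for the same assumed (7,4) generator as above; the parity-check matrix $H$ is chosen so that $H G^{\mathsf{T}} = 0 \pmod 2$, and a non-zero syndrome matches the column of $H$ corresponding to the single most probable flipped bit:

```python
import numpy as np

# Parity-check matrix H consistent with the generator sketched above.
H = np.array([[1, 1, 1, 0, 1, 0, 0],
              [0, 1, 1, 1, 0, 1, 0],
              [1, 0, 1, 1, 0, 0, 1]])

def syndrome_decode(r):
    """Compute the syndrome z = H r mod 2, flip the bit whose column of H
    matches z (if the syndrome is non-zero), then read out the first four bits."""
    r = np.asarray(r).copy()
    z = H @ r % 2
    if z.any():                                # non-zero syndrome: not a codeword
        for i in range(7):
            if np.array_equal(H[:, i], z):     # column matching the syndrome
                r[i] ^= 1                      # flip the most probable error bit
                break
    return r[:4]

t = np.array([1, 0, 0, 0, 1, 0, 1])   # codeword for s = 1000
r = t.copy()
r[2] ^= 1                             # channel flips one bit
print(syndrome_decode(r))             # -> [1 0 0 0]
```

Because all seven columns of $H$ are distinct and non-zero, every single-bit error produces a distinct syndrome, so any single flipped bit is corrected.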
What performance can the best codes achieve?
We consider the $(R, p_b)$ plane, where $R = K/N$ is the rate of the code and $p_b$ is the resulting probability of bit error. The maximum rate at which communication is possible with arbitrarily small $p_b$ is the capacity of the channel.
Measuring Information Content
We view information content as the "degree of surprise" on learning the value of a random variable $x$. We would also like some desirable properties from our function $h(x)$: it should decrease as the probability $p(x)$ increases, and independent outcomes should contribute additive information. These requirements lead to the Shannon information content
$$h(x) = \log_2 \frac{1}{p(x)}.$$
The average information a random variable transmits in the process is then obtained by taking the expectation of $h(x)$ with respect to the distribution $p(x)$, giving the entropy $H(X) = \sum_x p(x) \log_2 \frac{1}{p(x)}$.
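A small numerical sketch of these two formulas (the example distribution is illustrative):

```python
import numpy as np

def information_content(p):
    """Shannon information content h(x) = log2(1/p(x)) in bits."""
    return np.log2(1.0 / p)

def entropy(probs):
    """Entropy H(X) = sum_x p(x) log2(1/p(x)), skipping zero-probability outcomes."""
    probs = np.asarray(probs, dtype=float)
    nz = probs[probs > 0]
    return float(np.sum(nz * np.log2(1.0 / nz)))

print(information_content(1/8))               # 3.0 bits for an outcome with p = 1/8
print(entropy([0.5, 0.25, 0.125, 0.125]))     # 1.75 bits
```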
Source Coding Theorem
We can compress $N$ i.i.d. outcomes from a source $X$ into roughly $N H(X)$ bits, with vanishing probability of error as $N \to \infty$. This is provable by counting the elements of the typical set.
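A rough numerical illustration, assuming a Bernoulli(0.1) source chosen for concreteness: the typical set contains about $2^{NH}$ sequences, far fewer than the $2^N$ possible sequences, so roughly $NH$ bits suffice to index it:

```python
import numpy as np

def binary_entropy(p):
    """Entropy of a Bernoulli(p) source in bits."""
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

N, p = 1000, 0.1
H = binary_entropy(p)
print(f"H(X) ~= {H:.3f} bits per symbol")          # ~0.469
print(f"typical set   ~ 2^{N * H:.0f} sequences")  # ~2^469
print(f"all sequences = 2^{N} sequences")          # 2^1000
# Compressing N outcomes therefore needs roughly N*H ~= 469 bits, not 1000.
```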
Shannon’s Noisy-Channel Coding Theorem
Information can be communicated over a noisy channel at a non-zero rate with arbitrarily small error probability, provided the rate is below the capacity of the channel.
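As an illustration for the binary symmetric channel considered above (flip probability $f$, a standard result rather than something derived in these notes), the capacity is $C = 1 - H_2(f)$, and the theorem says any rate $R < C$ is achievable with arbitrarily small error probability:

```python
import numpy as np

def binary_entropy(f):
    """Binary entropy H_2(f) in bits."""
    return -(f * np.log2(f) + (1 - f) * np.log2(1 - f))

def bsc_capacity(f):
    """Capacity of a binary symmetric channel with flip probability f."""
    return 1.0 - binary_entropy(f)

# With f = 0.1, C ~= 0.531: reliable communication is possible at any rate
# below about 0.53 bits per channel use, given sufficiently long block codes.
print(bsc_capacity(0.1))
```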