
Lottery Ticket Hypothesis

The Lottery Ticket Hypothesis, proposed by Frankle & Carbin (2019), states that:

A randomly-initialized, dense neural network contains a subnetwork that is initialized such that – when trained in isolation – it can match the test accuracy of the original network after training for at most the same number of iterations.

Prior work has shown that a fully trained dense network can be pruned to a small fraction of its parameters without much performance degradation, but it has been impossible to train such a sparse sub-network from scratch. The paper provides one explanation for this phenomenon:

After pruning, the resulting sub-networks are typically re-initialized with fresh random weights, which changes their training dynamics and degrades performance. If one instead initializes the sub-network with the original weights under the mask, performance can be maintained.
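As a toy illustration of the two initializations being compared (a sketch, not the paper's code; the tensor shape and mask threshold here are arbitrary):

```python
import torch

torch.manual_seed(0)
w0 = torch.randn(4, 4)                     # original dense initialization W_0
mask = (w0.abs() > 0.5).float()            # some binary pruning mask m

winning_ticket = mask * w0                 # sub-network keeps its original init
reinit_control = mask * torch.randn(4, 4)  # same mask, fresh random weights

# The LTH finding: training the first can match the dense network,
# while the second typically cannot.
```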

How do we find such a sub-network?

Iterative Magnitude Pruning (IMP)

  1. Start with a dense initialization $W_0$ and train to convergence to obtain weights $W_T^{(1)}$
  2. Prune the $s\%$ of weights with the smallest magnitudes to create a binary mask $m^{(1)}$
  3. Re-initialize the network with the masked weights $m^{(1)} \odot W_0$ and repeat the train-prune process
  4. Halt when the desired sparsity is reached, or when performance drops significantly
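A minimal PyTorch sketch of this loop, assuming a user-supplied `train_fn` that trains the model to convergence and keeps pruned weights at zero (e.g. by zeroing their gradients); the function name, per-layer pruning granularity, and default fractions are illustrative choices, not the paper's exact code:

```python
import copy
import torch
import torch.nn as nn

def iterative_magnitude_prune(model: nn.Module, train_fn, prune_frac=0.2, rounds=5):
    """Sketch of IMP: repeatedly train, prune smallest weights, rewind to W_0."""
    init_state = copy.deepcopy(model.state_dict())  # save W_0
    # One binary mask per weight matrix (biases left unpruned here).
    masks = {name: torch.ones_like(p)
             for name, p in model.named_parameters() if p.dim() > 1}

    for _ in range(rounds):
        train_fn(model)  # train the masked network to convergence -> W_T

        for name, p in model.named_parameters():
            if name not in masks:
                continue
            surviving = p.data[masks[name].bool()].abs()
            k = int(prune_frac * surviving.numel())
            if k == 0:
                continue
            # Prune the k smallest-magnitude surviving weights in this layer.
            threshold = surviving.kthvalue(k).values
            masks[name] = masks[name] * (p.data.abs() > threshold).float()

        # Rewind surviving weights to their original initialization W_0.
        model.load_state_dict(init_state)
        for name, p in model.named_parameters():
            if name in masks:
                p.data.mul_(masks[name])

    return masks
```

Each round prunes a fixed fraction of the weights still alive, so after $n$ rounds the per-layer sparsity is roughly $1 - (1 - \text{prune\_frac})^n$.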

In this form, IMP succeeded only on small-scale tasks. Pruning larger models required a more robust procedure, e.g. learning-rate warmup and tuned schedules, or rewinding the weights to an early training iteration rather than to initialization (LTH with rewinding), sketched below.
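A minimal sketch of the rewinding modification, assuming the same setup as above (the `train_for_steps` helper and the choice of `k` are illustrative): instead of saving $W_0$, one saves the weights $W_k$ after a small number of early training steps and rewinds to those after each pruning round.

```python
import copy
import torch.nn as nn

def capture_rewind_state(model: nn.Module, train_for_steps, k: int):
    """Return the state to rewind to after each pruning round.

    k = 0 recovers the original LTH (rewind to W_0); a small k > 0
    ("LTH with rewinding") stabilizes ticket-finding in larger networks.
    `train_for_steps` is a hypothetical helper that trains for k steps.
    """
    if k > 0:
        train_for_steps(model, k)             # brief early training
    return copy.deepcopy(model.state_dict())  # W_k, the rewind point
```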

Resources

Robert Lange provides a great survey and visualization of the LTH in his blog post.