Jethro's Braindump

chen20_simpl_framew_contr_learn_visual_repres: A simple framework for contrastive learning of visual representations

SimCLR is a simple framework for contrastive learning of visual representations.


SimCLR does not train with a memory bank. Instead, it uses a large batch size and the LARS optimizer to stabilize training.

Key Contributions

  • Composition of data augmentations to form positive pairs
  • A learnable non-linear transformation between the representation and the contrastive loss substantially improves the quality of the learned representations
  • Contrastive learning benefits from larger batch sizes and more training steps than supervised learning

Data Augmentation

A stochastic data augmentation module produces two correlated views of the same example, denoted \(\tilde{x}_i\) and \(\tilde{x}_j\), which are considered a positive pair (see the sketch after this list). Some of these augmentations include:

  • random cropping
  • random color distortions
  • random Gaussian blur
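
Below is a minimal sketch of such a stochastic augmentation module, written with torchvision transforms as an illustration (the paper's reference implementation uses TensorFlow). The transform name two_views, the crop size, the jitter strengths, and the blur kernel size are assumptions, not the paper's exact settings.

```python
import torchvision.transforms as T

# Stochastic augmentation pipeline: each call produces a different view.
simclr_augment = T.Compose([
    T.RandomResizedCrop(224),                                     # random cropping + resize
    T.RandomApply([T.ColorJitter(0.8, 0.8, 0.8, 0.2)], p=0.8),   # random color distortion
    T.RandomGrayscale(p=0.2),                                     # part of the color distortion
    T.GaussianBlur(kernel_size=23),                               # random Gaussian blur
    T.ToTensor(),
])

def two_views(image):
    # Apply the stochastic module twice to the same image to obtain a positive pair.
    return simclr_augment(image), simclr_augment(image)
```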

A neural network encoder f() extracts representation vectors from augmented data examples.

A small network projection head g() maps representations to the space where contrastive loss is applied.
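
A minimal sketch of the encoder f() and projection head g() follows, assuming a ResNet-50 backbone and a 2-layer MLP head; the class name SimCLRModel, the projection dimension, and the use of a recent torchvision are assumptions made for illustration.

```python
import torch.nn as nn
import torchvision.models as models

class SimCLRModel(nn.Module):
    def __init__(self, proj_dim=128):
        super().__init__()
        backbone = models.resnet50(weights=None)
        feat_dim = backbone.fc.in_features       # 2048 for ResNet-50
        backbone.fc = nn.Identity()              # drop the classifier head
        self.encoder = backbone                  # f(): produces the representation h
        self.projection_head = nn.Sequential(    # g(): maps h to the contrastive space
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, x):
        h = self.encoder(x)            # representation kept for downstream tasks
        z = self.projection_head(h)    # projection where the contrastive loss is applied
        return h, z
```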

The loss function (normalized temperature-scaled cross entropy loss) is applied on the output of g().

A minibatch of N examples is sampled, and each example is augmented twice, resulting in 2N data points. Given a positive pair, the other 2(N-1) augmented examples within the minibatch are used as negative examples.

\[ \ell_{i,j} = -\log \frac{\exp(\mathrm{sim}(z_i, z_j)/\tau)}{\sum_{k=1}^{2N} \mathbb{1}_{[k \neq i]} \exp(\mathrm{sim}(z_i, z_k)/\tau)} \]
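
The sketch below implements this NT-Xent loss for a batch of 2N projections, assuming rows 2k and 2k+1 are a positive pair; the function name nt_xent_loss and the temperature value are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z, temperature=0.5):
    """NT-Xent over 2N projections; rows (2k, 2k+1) are assumed to be positive pairs."""
    z = F.normalize(z, dim=1)                        # cosine similarity via dot products
    sim = z @ z.t() / temperature                    # (2N, 2N) similarity matrix
    n = z.size(0)
    diag = torch.eye(n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(diag, float("-inf"))       # the 1[k != i] indicator: drop k = i
    positives = torch.arange(n, device=z.device) ^ 1 # partner index: 0<->1, 2<->3, ...
    return F.cross_entropy(sim, positives)           # mean of -log softmax at the positive
```

In a training step, both augmented views would be encoded, their projections concatenated into a single batch of 2N rows, and this loss applied to that batch.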

The Importance of the Projection Head

It is conjectured that the projection head g() is important due to the loss of information induced by the contrastive loss. z = g(h) is trained to be invariant to the data transformations, so g can remove information that may be useful for the downstream task, such as the color or orientation of objects.
