Jethro's Braindump
Large Batch Training
Links to this note
GPipe
LARS Optimizer