Speeding Up the Training of Deep Neural Networks

Distributed training architectures generally follow one of two designs: all-reduce or a parameter server.
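To make the contrast concrete, here is a minimal NumPy sketch of the two designs (all names and values are illustrative, not from any real library): in all-reduce, every worker ends up holding the same reduced gradient; in a parameter-server setup, workers push gradients to a central server, which aggregates them, applies the update, and serves the fresh parameters back.

```python
import numpy as np

# Hypothetical gradients produced by three workers (illustrative values).
grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]

def all_reduce(worker_grads):
    """All-reduce: every worker receives the elementwise sum of all gradients."""
    total = np.sum(worker_grads, axis=0)
    return [total.copy() for _ in worker_grads]  # each worker now holds the sum

def parameter_server_step(params, worker_grads, lr=0.1):
    """Parameter server: aggregate pushed gradients, apply SGD, return new params."""
    total = np.sum(worker_grads, axis=0)
    return params - lr * total  # server-side update; workers then pull params

reduced = all_reduce(grads)          # every entry equals [9.0, 12.0]
params = np.zeros(2)
new_params = parameter_server_step(params, grads)  # [-0.9, -1.2]
```

The key practical difference is where the aggregation happens: all-reduce is peer-to-peer and keeps parameters replicated on every worker, while the parameter server centralizes both aggregation and the parameter state.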

BytePS (Jiang et al., n.d.)

BytePS provides a unified framework that subsumes both the all-reduce and parameter-server architectures and shows its communication strategy is optimal. It also optimizes intra-machine communication and introduces a “Summation Service,” which accelerates DNN training by running gradient summation on CPUs while performing parameter updates on GPUs.
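The division of labor behind the Summation Service can be sketched as follows. This is a simplified NumPy illustration of the idea, not the BytePS implementation: the CPU side performs only stateless elementwise summation (cheap and bandwidth-bound), while the stateful optimizer update (here, hypothetical momentum SGD) stays on the GPU side.

```python
import numpy as np

def cpu_summation_service(worker_grads):
    """CPU side: pure elementwise summation; no optimizer state required."""
    return np.sum(worker_grads, axis=0)

def gpu_optimizer_update(params, momentum, summed_grad, lr=0.01, beta=0.9):
    """GPU side (simulated): stateful momentum-SGD update on the summed gradient."""
    momentum = beta * momentum + summed_grad
    params = params - lr * momentum
    return params, momentum

# One training step with three workers (illustrative gradients).
grads = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
summed = cpu_summation_service(grads)              # [9.0, 12.0], done on CPU
params, momentum = gpu_optimizer_update(
    params=np.zeros(2), momentum=np.zeros(2), summed_grad=summed
)                                                  # update applied on GPU
```

Keeping only summation on the CPU is attractive because summation is simple enough for CPUs to keep up with the network, whereas full optimizer updates (with momentum, Adam statistics, etc.) are compute-heavier and better suited to GPUs.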