Speeding Up the Training of Deep Neural Networks
Distributed training architectures are built on one of two communication patterns: all-reduce or a parameter server.
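As a rough illustration of the two patterns, the sketch below simulates both in-process with NumPy (the function names and setup are mine, not taken from any framework): a parameter server sums gradients centrally and broadcasts the result, while a ring all-reduce reaches the same global sum through peer-to-peer chunk exchanges, so per-worker traffic stays roughly constant as the number of workers grows.

```python
# Toy, in-process simulation of the two aggregation patterns.
# (Illustrative only; real systems exchange tensors over the network.)
import numpy as np

def parameter_server_aggregate(worker_grads):
    # Workers push gradients to a central server, which sums them and
    # broadcasts the global sum back to every worker.
    global_sum = np.sum(worker_grads, axis=0)
    return [global_sum.copy() for _ in worker_grads]

def ring_allreduce_aggregate(worker_grads):
    # Ring all-reduce = reduce-scatter followed by all-gather.
    # Each worker only sends/receives one chunk per step.
    n = len(worker_grads)
    chunks = [list(np.array_split(g.astype(float), n)) for g in worker_grads]

    # Reduce-scatter: after n-1 steps, worker i holds the global sum of
    # chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [(i, (i - step) % n, chunks[i][(i - step) % n].copy())
                 for i in range(n)]
        for i, c, data in sends:
            chunks[(i + 1) % n][c] += data

    # All-gather: circulate the reduced chunks so every worker has all of them.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n, chunks[i][(i + 1 - step) % n].copy())
                 for i in range(n)]
        for i, c, data in sends:
            chunks[(i + 1) % n][c] = data

    return [np.concatenate(chunks[i]) for i in range(n)]
```

Both functions return the same global sum on every worker; the difference is where the traffic goes (through a central server versus around the ring), which is exactly the design space BytePS unifies.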
BytePS Jiang et al., n.d.
BytePS provides a unifying framework that covers both all-reduce and parameter-server architectures and shows that its communication strategy is optimal. It also optimizes intra-machine communication. In addition, it proposes a "Summation Service," which accelerates DNN training by running gradient summation on CPUs while performing parameter updates on GPUs.
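To make that split concrete, here is a minimal PyTorch-style sketch under my own assumptions (the function names and the Adam-style update are illustrative, not the BytePS API): the CPU side only sums the gradients pushed by workers, which is cheap and bandwidth-bound, while the FLOP-heavier optimizer step runs on the worker's GPU.

```python
# Illustrative sketch of the Summation Service split (not BytePS code).
import torch

def summation_service(pushed_grads):
    # CPU side: receive one gradient tensor per worker and return their sum.
    # Summation is memory-bandwidth bound, so CPUs handle it well.
    return torch.stack([g.cpu() for g in pushed_grads]).sum(dim=0)

def worker_update(param, summed_grad, optimizer_state, lr=1e-3, eps=1e-8,
                  beta1=0.9, beta2=0.999):
    # GPU side: an Adam-style update, which needs several element-wise
    # operations per parameter and benefits from GPU throughput.
    g = summed_grad.to(param.device)
    m, v, t = optimizer_state
    t += 1
    m.mul_(beta1).add_(g, alpha=1 - beta1)
    v.mul_(beta2).addcmul_(g, g, value=1 - beta2)
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param.add_(-lr * m_hat / (v_hat.sqrt() + eps))
    return (m, v, t)
```

The point of the split is that a classic parameter server would run both steps on CPU servers and bottleneck on the optimizer math, whereas here the CPU does only the cheap summation and the GPUs keep the expensive per-parameter updates.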