
Speeding Up the Training of Deep Neural Networks

Distributed training architectures generally follow one of two approaches: all-reduce or a parameter server.
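A minimal sketch of the two aggregation patterns, using plain NumPy. The names `ring_allreduce` and `ParameterServer` are illustrative only; a real all-reduce exchanges gradient chunks peer-to-peer rather than summing centrally.

```python
import numpy as np

def ring_allreduce(gradients):
    """All-reduce: every worker ends up with the same summed gradient.
    This toy version just sums and broadcasts; a real ring all-reduce
    passes chunks between peers with no central node."""
    total = np.sum(gradients, axis=0)
    return [total.copy() for _ in gradients]

class ParameterServer:
    """Parameter server: workers push gradients to a central node,
    which applies the update and serves the new parameters back."""
    def __init__(self, params, lr=0.1):
        self.params = params
        self.lr = lr

    def push_and_pull(self, gradients):
        self.params -= self.lr * np.sum(gradients, axis=0)
        return self.params.copy()

if __name__ == "__main__":
    grads = [np.ones(4) * (i + 1) for i in range(3)]  # gradients from 3 workers
    print(ring_allreduce(grads)[0])                   # identical summed gradient on every worker
    ps = ParameterServer(np.zeros(4))
    print(ps.push_and_pull(grads))                    # centrally updated parameters
```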

BytePS (Jiang et al., 2020)

BytePS provides a unified framework that subsumes both all-reduce and parameter server architectures and is shown to be communication-optimal. It also optimizes intra-machine communication and proposes a “Summation Service”, which accelerates DNN training by running gradient summation on CPUs while performing parameter updates on GPUs.
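A rough sketch of the Summation Service split, using PyTorch tensors. The function names are illustrative and not the BytePS API; the point is that the cheap summation runs on CPU devices while the parameter update stays on the GPU.

```python
import torch

cpu = torch.device("cpu")
gpu = torch.device("cuda") if torch.cuda.is_available() else cpu  # fall back if no GPU

def summation_service(worker_grads):
    """Runs on CPU machines: only sums incoming gradient tensors,
    freeing GPU cycles from the aggregation work."""
    return torch.stack([g.to(cpu) for g in worker_grads]).sum(dim=0)

def gpu_optimizer_step(params, summed_grad, lr=0.1):
    """Runs on the GPU workers: applies the parameter update
    using the aggregated gradient."""
    return params - lr * summed_grad.to(gpu)

if __name__ == "__main__":
    params = torch.zeros(4, device=gpu)
    worker_grads = [torch.ones(4, device=gpu) * (i + 1) for i in range(3)]
    summed = summation_service(worker_grads)     # CPU-side aggregation
    params = gpu_optimizer_step(params, summed)  # GPU-side update
    print(params)
```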

Bibliography

Jiang, Yimin, Yibo Zhu, Chang Lan, Bairen Yi, Yong Cui, and Chuanxiong Guo. 2020. “A Unified Architecture for Accelerating Distributed DNN Training in Heterogeneous GPU/CPU Clusters.” In 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20), 463–79. USENIX Association. https://www.usenix.org/conference/osdi20/presentation/jiang.