[Machine Learning] Machine Learning Systems (SysML): Distributed Learning

Distributed learning

Large Scale Distributed Deep Networks
NIPS 2012
https://ai.google/research/pubs/pub40565.pdf

Managed Communication and Consistency for Fast Data-Parallel Iterative Analytics
SoCC 2015
http://dl.acm.org/authorize?N91363

Ako: Decentralised Deep Learning with Partial Gradient Exchange
SoCC 2016
https://lsds.doc.ic.ac.uk/sites/default/files/ako-socc16.pdf

Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
ATC 2017
https://www.usenix.org/system/files/conference/atc17/atc17-zhang.pdf

Parameter Hub: a Rack-Scale Parameter Server for Distributed Deep Neural Network Training
SoCC 2018
https://dl.acm.org/citation.cfm?id=3267840

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
ML Systems Workshop at NIPS 2016
https://arxiv.org/pdf/1512.01274.pdf

Scaling Distributed Machine Learning with the Parameter Server
OSDI 2014
https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-li_mu.pdf

Project Adam: Building an Efficient and Scalable Deep Learning Training System
OSDI 2014
https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf

Orpheus: Efficient Distributed Machine Learning via System and Algorithm Co-design
SoCC 2018
https://dl.acm.org/citation.cfm?id=3267810

Petuum: A New Platform for Distributed Machine Learning on Big Data
KDD 2015
https://arxiv.org/pdf/1312.7651.pdf

GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
arXiv preprint, 2018
https://arxiv.org/pdf/1811.06965.pdf
