TensorFlow: A system for large-scale machine learning

OSDI, 2016.

Abstract:

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs).
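
The device mapping described in the abstract is exposed directly in the programming model. The snippet below is a minimal sketch, using the TensorFlow 1.x-style API that accompanied the paper, of pinning graph nodes to specific devices; the shapes, names, and the soft-placement fallback are illustrative assumptions, not details taken from the paper.

    # Minimal sketch (TensorFlow 1.x-style API): explicitly placing dataflow-graph
    # nodes on particular devices. Shapes and names are illustrative only.
    import tensorflow as tf

    with tf.device("/cpu:0"):
        x = tf.placeholder(tf.float32, shape=[None, 784], name="x")   # input lives on the CPU

    with tf.device("/gpu:0"):
        w = tf.Variable(tf.zeros([784, 10]), name="w")                # parameters on the GPU
        b = tf.Variable(tf.zeros([10]), name="b")
        logits = tf.matmul(x, w) + b                                  # compute placed on the GPU

    # allow_soft_placement falls back to the CPU if no GPU is present.
    with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
        sess.run(tf.global_variables_initializer())
        print(sess.run(logits, feed_dict={x: [[0.0] * 784]}))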

Introduction
  • Machine learning has driven advances in many different fields [3, 5, 23, 24, 30, 27, 40, 45, 48, 50, 55, 68, 69, 73, 76].
  • Recent breakthroughs in image classification models have benefited from the public ImageNet dataset, which contains 136 gigabytes of digital images [65]; and language modeling has benefited from efforts like the One Billion Word Benchmark [10]
  • The scale of these datasets motivates a data-parallel approach to training: a distributed file system holds the data, and a set of workers processes different subsets of data in parallel.
  • A distributed system can shard the model across many processes, to increase the available network bandwidth when many workers are simultaneously reading and updating the model (a sketch of this pattern follows below).
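
As a concrete illustration of the sharded, data-parallel setup above, the sketch below uses the TensorFlow 1.x-style API to spread model variables across two parameter-server tasks while workers compute gradients on their own data subsets. The cluster addresses, job names, and model are hypothetical placeholders, not details from the paper.

    # Sketch: data-parallel training with parameters sharded across "ps" tasks
    # (TensorFlow 1.x-style API). Addresses and the model are hypothetical.
    import tensorflow as tf

    cluster = tf.train.ClusterSpec({
        "ps":     ["ps0:2222", "ps1:2222"],          # parameter shards live on these tasks
        "worker": ["worker0:2222", "worker1:2222"],  # each worker reads a different data subset
    })

    # replica_device_setter places variables round-robin across the "ps" tasks,
    # so parameter reads and updates use the network links of both servers.
    with tf.device(tf.train.replica_device_setter(cluster=cluster)):
        x = tf.placeholder(tf.float32, [None, 784])
        y = tf.placeholder(tf.int64, [None])
        w = tf.Variable(tf.random_normal([784, 10]), name="w")
        b = tf.Variable(tf.zeros([10]), name="b")
        loss = tf.reduce_mean(
            tf.nn.sparse_softmax_cross_entropy_with_logits(
                labels=y, logits=tf.matmul(x, w) + b))
        train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
        # Each worker would run train_op in a session connected to its own
        # tf.train.Server, e.g. tf.Session("grpc://worker0:2222").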
Highlights
  • In recent years, machine learning has driven advances in many different fields [3, 5, 23, 24, 30, 27, 40, 45, 48, 50, 55, 68, 69, 73, 76]
  • We introduce the TensorFlow system for experimenting with new models, training them on large datasets, and moving them into production
  • We focus on neural network training as a challenging systems problem, and select two representative applications from this space: image classification and language modeling
  • Recent breakthroughs in image classification models have benefited from the public ImageNet dataset, which contains 136 gigabytes of digital images [65]; and language modeling has benefited from efforts like the One Billion Word Benchmark [10]
  • Effective learned models for image recognition, language modeling, document clustering, and many other problems have a large number of parameters
  • By choosing a unified dataflow graph to represent all computation in TensorFlow, we have enabled users to experiment with features that were built into the runtime of our previous system [21] (see the sketch below)
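
For example, the parameter update rule, which was hard-wired into the runtime of the earlier DistBelief-style parameter server [21], becomes ordinary graph operations that a user can replace. The following is a minimal sketch under the TensorFlow 1.x-style API; the model, loss, and step size are illustrative rather than the paper's.

    # Sketch: the SGD update is expressed as graph operations instead of being
    # built into the runtime (TensorFlow 1.x-style API). Model and step size
    # are illustrative.
    import tensorflow as tf

    x = tf.placeholder(tf.float32, [None, 784])
    y = tf.placeholder(tf.float32, [None, 10])
    w = tf.Variable(tf.random_normal([784, 10]), name="w")
    loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))

    # Gradients and the update rule are just more nodes in the dataflow graph,
    # so users can experiment with custom optimizers without runtime changes.
    grad_w = tf.gradients(loss, [w])[0]
    train_op = tf.assign_sub(w, 0.1 * grad_w)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        # sess.run(train_op, feed_dict={x: batch_x, y: batch_y})  # one training step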
Conclusion
  • The authors have described the TensorFlow system and its extensible dataflow-based programming model.
  • The core idea of this paper is that TensorFlow’s dataflow representation subsumes existing work on parameter server systems, and offers a uniform programming model that allows users to harness large-scale heterogeneous systems, both for production tasks and for experimenting with new approaches.
  • Since the authors released TensorFlow as open-source software, over 8,000 people have forked the source code repository, the binary distribution has been downloaded 500,000 times, and users have published dozens of machine learning models that use TensorFlow.
Tables
  • Table1: Step times for training four convolutional models with different libraries, using one GPU. All results are for training with 32-bit floats. The fastest library for each model is shown in bold
Related work
  • Single-machine frameworks Many machine learning researchers carry out their work on a single (often GPU-equipped) computer [41, 42], and many flexible single-machine frameworks have emerged to support this scenario. Caffe [36] is a high-performance framework for training declaratively specified convolutional neural networks that runs on multicore CPUs and GPUs. Theano [2] allows programmers to express a model as a dataflow graph, and generates efficient compiled code for training that model. Torch [17] has an imperative programming model for scientific computation (including machine learning) that supports fine-grained control over the order of execution and memory utilization.

    While these frameworks do not satisfy our requirement for distributed execution, TensorFlow’s programming model is close to Theano’s dataflow representation (§3).
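
To make the comparison with Theano concrete, the fragment below is a minimal sketch of the deferred-execution dataflow model (TensorFlow 1.x-style API): constructing the graph computes nothing, and a separate Session call executes it. The expression and values are illustrative assumptions.

    # Sketch of the two-phase dataflow model TensorFlow shares with Theano:
    # graph construction is symbolic; execution happens only inside a Session.
    # (TensorFlow 1.x-style API; values are illustrative.)
    import tensorflow as tf

    a = tf.placeholder(tf.float32, name="a")
    b = tf.placeholder(tf.float32, name="b")
    c = a * b + 2.0                      # builds a graph node, performs no arithmetic

    with tf.Session() as sess:
        print(sess.run(c, feed_dict={a: 3.0, b: 4.0}))   # prints 14.0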

    Batch dataflow systems Starting with MapReduce [22], batch dataflow systems have been applied to a large number of machine learning algorithms [71], and more recent systems have focused on increasing expressivity and performance. DryadLINQ [74] adds a high-level query language that supports more sophisticated algorithms than MapReduce. Spark [75] extends DryadLINQ with the ability to cache previously computed datasets in memory, making it better suited to iterative machine learning algorithms.
References
  • M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. J. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. G. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. A. Tucker, V. Vanhoucke, V. Vasudevan, F. B. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. CoRR, abs/1603.04467, 2016. arxiv.org/abs/1603.04467. Software available from tensorflow.org.
  • [3] A. Angelova, A. Krizhevsky, and V. Vanhoucke. Pedestrian detection with a large-field-of-view deep network. In Robotics and Automation (ICRA), 2015 IEEE International Conference on, pages 704–711. IEEE, 2015. CalTech PDF.
  • [4] Arvind and D. E. Culler. Annual Review of Computer Science Vol. 1, 1986, chapter Dataflow Architectures, pages 225–253. www.dtic.mil/cgi-bin/GetTRDoc?Location=U2&doc=GetTRDoc.pdf&AD=ADA166235.
  • [5] J. Ba, V. Mnih, and K. Kavukcuoglu. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755, 2014. arxiv.org/abs/1412.7755.
  • [6] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137– 1155, 2003. www.iro.umontreal.ca/ ̃lisa/pointeurs/ BengioDucharmeVincentJauvin jmlr.pdf.
  • [7] T. Brants and A. Franz. Web 1T 5-gram version 1, 2006. catalog.ldc.upenn.edu/LDC2006T13.
  • [8] M. Burrows. The Chubby lock service for looselycoupled distributed systems. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, OSDI ’06, pages 335–350, Berkeley, CA, USA, 2006. USENIX Association. www.usenix.org/legacy/event/osdi06/tech/full papers/burrows/burrows.pdf.
  • [9] R. H. Byrd, G. M. Chin, J. Nocedal, and Y. Wu. Sample size selection in optimization methods for machine learning. Mathematical Programming, 134(1):127–155, 2012. dx.doi.org/10.1007/s10107-012-0572-5.
  • [10] C. Chelba, T. Mikolov, M. Schuster, Q. Ge, T. Brants, and P. Koehn. One billion word benchmark for measuring progress in statistical language modeling. CoRR, abs/1312.3005, 2013. arxiv.org/abs/1312.3005.
  • [11] J. Chen, R. Monga, S. Bengio, and R. Jozefowicz. Revisiting distributed synchronous SGD. In International Conference on Learning Representations Workshop Track, 2016. arxiv.org/abs/1604.00981.
  • [12] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. In Proceedings of the Workshop on Machine Learning Systems at Neural Information Processing Systems (LearningSys), Dec. 2015. www.cs.cmu.edu/ muli/file/mxnet-learning-sys.pdf.
  • [13] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer. cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759, 2014. arxiv.org/abs/1410.0759.
  • [14] T. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman. Project Adam: Building an efficient and scalable deep learning training system. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 571–582, 2014. www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf.
  • [15] convnet-benchmarks, 2016.
  • [16] E. S. Chung, J. D. Davis, and J. Lee. LINQits: Big data on little clients. In Proceedings of the 40th Annual International Symposium on Computer Architecture, ISCA ’13, pages 261–272, New York, NY, USA, 2013. ACM. doi.acm.org/10.1145/2485922.2485945.
  • [17] R. Collobert, S. Bengio, and J. Mariethoz. Torch: A modular machine learning software library. Technical report, IDIAP, 2002. infoscience.epfl.ch/record/82802/files/rr02-46.pdf.
  • [18] D. Crankshaw, P. Bailis, J. E. Gonzalez, H. Li, Z. Zhang, M. J. Franklin, A. Ghodsi, and M. I. Jordan. The missing piece in complex analytics: Low latency, scalable model management and serving with Velox. In CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4-7, 2015, Online Proceedings, 2015. arxiv.org/abs/1409.3809.
  • [19] H. Cui, H. Zhang, G. R. Ganger, P. B. Gibbons, and E. P. Xing. GeePS: Scalable deep learning on distributed GPUs with a GPUspecialized parameter server. In Proceedings of the Eleventh European Conference on Computer Systems, EuroSys ’16, 2016. www.pdl.cmu.edu/PDLFTP/CloudComputing/GeePS-cui-eurosys16.pdf.
  • [20] A. Dai, C. Olah, and Q. V. Le. Document embedding with paragraph vectors. arXiv preprint arXiv:1507.07998, 2015. arxiv.org/abs/1507.07998.
  • [21] J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng. Large scale distributed deep networks. In NIPS, 2012. Google Research PDF.
  • [22] J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation - Volume 6, OSDI’04, Berkeley, CA, USA, 2004. USENIX Association. research.google.com/archive/mapreduceosdi04.pdf.
  • [23] A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, T. Mikolov, et al. DeVISE: A deep visualsemantic embedding model. In Advances in Neural Information Processing Systems, pages 2121–2129, 2013. research.google.com/pubs/archive/41473.pdf.
  • [24] J. Gonzalez-Dominguez, I. Lopez-Moreno, P. J. Moreno, and J. Gonzalez-Rodriguez. Neural Networks, 64:49–58, 2015.
  • [25] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 2672– 2680, 2014. papers.nips.cc/paper/5423-generativeadversarial-nets.
  • [26] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015. arxiv.org/abs/1512.03385.
  • [27] G. Heigold, V. Vanhoucke, A. Senior, P. Nguyen, M. Ranzato, M. Devin, and J. Dean. Multilingual acoustic models using distributed deep neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8619–8623. IEEE, 2013. research.google.com/pubs/archive/40807.pdf.
  • [28] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, NSDI’11, pages 295–308, Berkeley, CA, USA, 2011. USENIX Association. www.cs.berkeley.edu/ ̃alig/papers/mesos.pdf.
  • [29] G. E. Hinton. Learning distributed representations of concepts. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society, pages 1–12. Hillsdale, NJ: Erlbaum, 1986. www.cogsci.ucsd.edu/ ̃ajyu/Teaching/Cogs202 sp13/Readings/hinton86.pdf.
  • [30] G. E. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag., 29(6):82– 97, 2012. www.cs.toronto.edu/ ̃gdahl/papers/ deepSpeechReviewSPM2012.pdf.
  • [31] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free coordination for internetscale systems. In Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, USENIXATC’10, pages 11–11, Berkeley, CA, USA, 2010. USENIX Association. www.usenix.org/legacy/event/atc10/tech/full papers/Hunt.pdf.
  • [32] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015. arxiv.org/abs/1502.03167.
  • [33] B. Jacob et al. gemmlowp: a small selfcontained low-precision GEMM library, 2015. github.com/google/gemmlowp.
  • [40] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 1725–1732. IEEE, 2014. research.google.com/pubs/archive/42455.pdf.
  • [41] A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997, 2014. arxiv.org/abs/1404.5997.
  • [42] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012. papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.
  • [34] B. Jacob, G. Guennebaud, et al. Eigen library for linear algebra. eigen.tuxfamily.org.
  • [35] S. Jean, K. Cho, R. Memisevic, and Y. Bengio. On using very large target vocabulary for neural machine translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1–10, Beijing, China, July 2015. Association for Computational Linguistics. www.aclweb.org/anthology/P15-1001.
  • [36] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia, pages 675–678. ACM, 2014. arxiv.org/pdf/1408.5093.
  • [37] M. I. Jordan. Serial order: A parallel distributed processing approach. ICS report 8608, Institute for Cognitive Science, UCSD, La Jolla, 1986. cseweb.ucsd.edu/~gary/PAPER-
  • N. Jouppi. Google supercharges machine learning tasks with TPU custom chip, 2016. cloudplatform.googleblog.com/2016/05/Googlesupercharges-machine-learning-tasks-with-customchip.html.
  • [43] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin. Exploring strategies for training deep neural networks. Journal of Machine Learning Research, 10:1–40, Jan. 2009. deeplearning.cs.cmu.edu/pdfs/1111/jmlr10 larochelle.pdf.
  • [44] A. Lavin and S. Gray. Fast algorithms for convolutional neural networks. CoRR, abs/1509.09308, 2015. arxiv.org/abs/1509.09308.
  • [45] Q. Le, M. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, and A. Ng. Building highlevel features using large scale unsupervised learning. In ICML’2012, 2012. Google Research PDF.
  • [46] M. Li, D. G. Andersen, J. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su. Scaling distributed machine learning with the Parameter Server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 583–598, 2014. www.usenix.org/system/files/conference/osdi14/osdi14paper-chilimbi.pdf.
  • [47] M. Li, T. Zhang, Y. Chen, and A. J. Smola. Efficient mini-batch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, pages 661–670, New York, NY, USA, 2014. ACM. www.cs.cmu.edu/ ̃muli/file/minibatch sgd.pdf.
  • [39] R. Jozefowicz, O. Vinyals, M. Schuster, N. Shazeer, and Y. Wu. Exploring the limits of language modeling. CoRR, abs/1602.02410, 2016. arxiv.org/abs/1602.02410.
  • [48] C. J. Maddison, A. Huang, I. Sutskever, and D. Silver. Move evaluation in Go using deep convolutional neural networks. arXiv preprint arXiv:1412.6564, 2014. arxiv.org/abs/1412.6564.
  • [49] F. McSherry, M. Isard, and D. G. Murray. Scalability! But at what COST? In Proceedings of the 15th USENIX Conference on Hot Topics in Operating Systems, HOTOS’15, Berkeley, CA, USA, 2015. USENIX Association. www.usenix.org/system/files/conference/hotos15/ hotos15-paper-mcsherry.pdf.
  • [50] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. In International Conference on Learning Representations: Workshops Track, 2013. arxiv.org/abs/1301.3781.
  • [51] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 02 2015. dx.doi.org/10.1038/nature14236.
  • [58] K. Ovtcharov, O. Ruwase, J.-Y. Kim, J. Fowers, K. Strauss, and E. Chung. Toward accelerating deep learning at scale using specialized logic. In Hot Chips: A Symposium on High Performance Chips. HOTCHIPS, August 2015. research.microsoft.com/apps/pubs/default.aspx?id=246506.
  • [59] R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. In ICML (3), volume 28 of JMLR Proceedings, pages 1310–1318. JMLR.org, 2013. www.jmlr.org/proceedings/papers/v28/pascanu13.pdf.
  • [61] J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand, and S. Amarasinghe. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. ACM SIGPLAN Notices, 48(6):519– 530, 2013. people.csail.mit.edu/fredo/tmp/Halide5min.pdf.
  • P. Moritz, R. Nishihara, I. Stoica, and M. I. Jordan. SparkNet: Training deep networks in Spark. In International Conference on Learning Representations, 2016. arxiv.org/abs/1511.06051.
  • Movidius Ltd. Movidius announces Deep Learning Accelerator and Fathom software framework, 2016. www.movidius.com/news/movidius-announces-deep-learning-accelerator-and-fathom-software-framework.
  • D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a timely dataflow system. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 439–455. ACM, 2013. Microsoft Research PDF.
  • A. Nair, P. Srinivasan, S. Blackwell, C. Alcicek, R. Fearon, A. De Maria, V. Panneershelvam, M. Suleyman, C. Beattie, S. Petersen, et al. Massively parallel methods for deep reinforcement learning. arXiv preprint arXiv:1507.04296, 2015. arxiv.org/abs/1507.04296.
  • Nervana Systems.
  • NVIDIA Corporation. NCCL: Optimized primitives for collective multi-gpu communication, 2016. github.com/NVIDIA/nccl.
  • [62] B. Recht, C. Re, S. Wright, and F. Niu. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 693–701, 2011. papers.nips.cc/paper/4390-hogwild-a-lockfree-approach-to-parallelizing-stochastic-gradientdescent.
  • [63] C. J. Rossbach, Y. Yu, J. Currey, J.-P. Martin, and D. Fetterly. Dandelion: a compiler and runtime for heterogeneous systems. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 49–68. ACM, 2013. research-srv.microsoft.com/pubs/201110/sosp13dandelion-final.pdf.
  • [64] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by backpropagating errors. Cognitive modeling, 5:3, 1988. www.cs.toronto.edu/ hinton/absps/naturebp.pdf.
  • [65] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. arxiv.org/abs/1409.0575.
  • [66] A. Smola and S. Narayanamurthy. An architecture for parallel topic models. Proc. VLDB Endow., 3(1-2):703–710, Sept. 2010. vldb.org/pvldb/vldb2010/papers/R63.pdf.
  • [67] I. Sutskever, J. Martens, G. E. Dahl, and G. E. Hinton. On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 1139–1147. JMLR Workshop and Conference Proceedings, 2013. jmlr.org/proceedings/papers/v28/sutskever13.pdf.
  • [68] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014. papers.nips.cc/paper/5346-sequenceto-sequence-learning-with-neural.
  • [69] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR’2015, 2015. arxiv.org/abs/1409.4842.
  • [75] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association, 2012. www.usenix.org/system/files/conference/nsdi12/nsdi12final138.pdf.
  • [76] M. D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, and G. E. Hinton. On rectified linear units for speech processing. In ICASSP, 2013. research.google.com/pubs/archive/40811.pdf.
  • [70] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015. arxiv.org/abs/1512.00567.
  • [71] C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. Bradski, K. Olukotun, and A. Y. Ng. Map-reduce for machine learning on multicore. In B. Schölkopf, J. C. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 281–288. MIT Press, 2007. papers.nips.cc/paper/3150-mapreduce-for-machine-learning-on-multicore.pdf.
  • [72] A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes. Large-scale cluster management at Google with Borg. In Proceedings of the Tenth European Conference on Computer Systems, page 18. ACM, 2015. research.google.com/pubs/archive/43438.pdf.
  • [73] O. Vinyals, L. Kaiser, T. Koo, S. Petrov, I. Sutskever, and G. Hinton. Grammar as a foreign language. Technical report, arXiv:1412.7449, 2014. arxiv.org/abs/1412.7449.
  • [74] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI '08, pages 1–14, Berkeley, CA, USA, 2008. USENIX Association. www.usenix.org/legacy/event/osdi08/tech/full_papers/yu_y/yu_y.pdf.