# TensorFlow: A system for large-scale machine learning

OSDI, 2016.

Abstract:

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs).
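
As a rough illustration of that model, here is a minimal sketch using the TensorFlow 1.x Python API (via `tf.compat.v1` in current releases); the variable and tensor names are illustrative, not taken from the paper:

```python
# A tiny dataflow graph: pure operations, shared mutable state (a Variable),
# and an operation that mutates that state, all run through a Session.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.placeholder(tf.float32, shape=[None, 4], name="x")   # input node
w = tf.Variable(tf.zeros([4, 2]), name="w")                  # shared, mutable state
logits = tf.matmul(x, w)                                     # a stateless dataflow op

step = tf.Variable(0, name="global_step")                    # more shared state
inc_step = step.assign_add(1)                                 # an op that mutates state

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out, s = sess.run([logits, inc_step], feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]})
    print(out.shape, s)   # (1, 2) 1
```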

Introduction

- Machine learning has driven advances in many different fields [3, 5, 23, 24, 30, 27, 40, 45, 48, 50, 55, 68, 69, 73, 76].
- Recent breakthroughs in image classification models have benefited from the public ImageNet dataset, which contains 136 gigabytes of digital images [65]; and language modeling has benefited from efforts like the One Billion Word Benchmark [10]
- The scale of these datasets motivates a data-parallel approach to training: a distributed file system holds the data, and a set of workers processes different subsets of data in parallel.
- A distributed system can shard the model across many processes, to increase the available network bandwidth when many workers are simultaneously reading and updating the model (a minimal sketch of this setup follows below).
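
A hedged sketch of that layout in the TensorFlow 1.x distributed API is shown below; the cluster addresses, task counts, and layer sizes are illustrative assumptions, not the paper's configuration:

```python
# Between-graph replication: parameters are sharded across "ps" tasks, while
# each worker task runs the same graph over its own subset of the data.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

cluster = tf.train.ClusterSpec({
    "ps":     ["ps0.example.com:2222", "ps1.example.com:2222"],        # parameter shards
    "worker": ["worker0.example.com:2222", "worker1.example.com:2222"],
})

# replica_device_setter places variables round-robin on the ps tasks, so reads
# and updates of the model are spread over several processes and network links.
with tf.device(tf.train.replica_device_setter(cluster=cluster,
                                              worker_device="/job:worker/task:0")):
    w = tf.get_variable("w", shape=[1000, 256])   # lives on one of the ps shards
    b = tf.get_variable("b", shape=[256])
    x = tf.placeholder(tf.float32, [None, 1000])
    hidden = tf.nn.relu(tf.matmul(x, w) + b)      # compute stays on the worker
```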

Highlights

- In recent years, machine learning has driven advances in many different fields [3, 5, 23, 24, 30, 27, 40, 45, 48, 50, 55, 68, 69, 73, 76]
- We introduce the TensorFlow system for experimenting with new models, training them on large datasets, and moving them into production
- We focus on neural network training as a challenging systems problem, and select two representative applications from this space: image classification and language modeling
- Recent breakthroughs in image classification models have benefited from the public ImageNet dataset, which contains 136 gigabytes of digital images [65]; and language modeling has benefited from efforts like the One Billion Word Benchmark [10]
- Effective learned models for image recognition, language modeling, document clustering, and many other problems have a large number of parameters
- By choosing a unified dataflow graph to represent all computation in TensorFlow, we have enabled users to experiment with features that were built into the runtime of our previous system [21] (see the sketch below).
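
For instance, an update rule that a parameter-server system would bake into its runtime can be written as ordinary graph operations. The sketch below is illustrative only and uses plain SGD; it is not a feature taken verbatim from the paper:

```python
# A hand-rolled SGD step expressed entirely in the dataflow graph: gradients
# and the parameter update are ordinary nodes that the user is free to modify.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.placeholder(tf.float32, [None, 4])
y = tf.placeholder(tf.float32, [None, 1])
w = tf.Variable(tf.zeros([4, 1]))

loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - y))
grad = tf.gradients(loss, [w])[0]          # differentiation is just more graph nodes
train_step = w.assign_sub(0.1 * grad)      # the update rule, open to experimentation

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_step, feed_dict={x: [[1.0, 0.0, 0.0, 0.0]], y: [[1.0]]})
```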

Conclusion

- The authors have described the TensorFlow system and its extensible dataflow-based programming model.
- The core idea of this paper is that TensorFlow’s dataflow representation subsumes existing work on parameter server systems, and offers a uniform programming model that allows users to harness large-scale heterogeneous systems, both for production tasks and for experimenting with new approaches.
- Since the authors released TensorFlow as open-source software, over 8,000 people have forked the source code repository, the binary distribution has been downloaded 500,000 times, and users have published dozens of machine learning models that use TensorFlow.

Summary

- Table 1: Step times for training four convolutional models with different libraries, using one GPU. All results are for training with 32-bit floats. The fastest library for each model is shown in bold. (A minimal step-timing sketch follows below.)
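
Per-step times in a comparison like Table 1 can be measured with a loop such as the hedged sketch below; the small convolutional model, batch size, and step count here are placeholders rather than the benchmark models from the table:

```python
# Rough single-GPU step-time measurement: build a small conv net in float32,
# run a warm-up step, then average the wall-clock time of several steps.
import time
import numpy as np
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

x = tf.placeholder(tf.float32, [32, 224, 224, 3])            # one training batch
with tf.device("/gpu:0"):
    h = tf.layers.conv2d(x, 64, 7, strides=2, activation=tf.nn.relu)
    loss = tf.reduce_mean(h)
    train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

batch = np.zeros((32, 224, 224, 3), dtype=np.float32)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(train_op, feed_dict={x: batch})                  # warm-up step
    start = time.time()
    for _ in range(10):
        sess.run(train_op, feed_dict={x: batch})
    print("mean step time: %.3f s" % ((time.time() - start) / 10))
```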

Related work

- **Single-machine frameworks.** Many machine learning researchers carry out their work on a single—often GPU-equipped—computer [41, 42], and many flexible single-machine frameworks have emerged to support this scenario. Caffe [36] is a high-performance framework for training declaratively specified convolutional neural networks that runs on multicore CPUs and GPUs. Theano [2] allows programmers to express a model as a dataflow graph, and generates efficient compiled code for training that model. Torch [17] has an imperative programming model for scientific computation (including machine learning) that supports fine-grained control over the order of execution and memory utilization.

While these frameworks do not satisfy our requirement for distributed execution, TensorFlow’s programming model is close to Theano’s dataflow representation (§3).
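
As a rough illustration of that similarity, the same small computation can be written in both systems; this sketch assumes Theano's standard Python API and is not drawn from the paper:

```python
# Theano: build a symbolic graph, compile it, then evaluate it on data.
import theano
import theano.tensor as T

x_t = T.dvector("x")
y_t = (x_t ** 2).sum()
f = theano.function([x_t], y_t)      # compilation happens here
print(f([1.0, 2.0, 3.0]))            # 14.0

# TensorFlow (1.x API): the same dataflow graph, evaluated through a Session.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()
x_tf = tf.placeholder(tf.float64, [None], name="x")
y_tf = tf.reduce_sum(tf.square(x_tf))
with tf.Session() as sess:
    print(sess.run(y_tf, feed_dict={x_tf: [1.0, 2.0, 3.0]}))  # 14.0
```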

**Batch dataflow systems.** Starting with MapReduce [22], batch dataflow systems have been applied to a large number of machine learning algorithms [71], and more recent systems have focused on increasing expressivity and performance. DryadLINQ [74] adds a high-level query language that supports more sophisticated algorithms than MapReduce. Spark [75] extends DryadLINQ with the ability to cache previously computed datasets in memory, and is therefore better suited to iterative machine learning algorithms when the input data fit in memory.
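
To make the batch-dataflow style concrete, here is a hedged sketch of one data-parallel gradient step written as map/reduce over a Spark RDD; the synthetic data, learning rate, and cluster setup are illustrative assumptions:

```python
# One step of least-squares SGD in the batch-dataflow style: map each example
# to its gradient contribution, reduce the contributions, update on the driver.
import numpy as np
from pyspark import SparkContext

sc = SparkContext(appName="batch-dataflow-sketch")

w = np.zeros(10)                                         # current model parameters

examples = sc.parallelize(
    [(np.random.rand(10), 1.0) for _ in range(1000)])    # (features, label) pairs

def gradient(example):
    x, y = example
    return (x.dot(w) - y) * x                            # squared-loss gradient term

grad = examples.map(gradient).reduce(lambda a, b: a + b)
w -= 0.01 * grad / examples.count()                      # driver applies the update
print(w[:3])
```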

Reference

- [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. J. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. G. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. A. Tucker, V. Vanhoucke, V. Vasudevan, F. B. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. CoRR, abs/1603.04467, 2016. arxiv.org/abs/1603.04467. Software available from tensorflow.org.
- [3] A. Angelova, A. Krizhevsky, and V. Vanhoucke. Pedestrian detection with a large-field-of-view deep network. In Robotics and Automation (ICRA), 2015 IEEE International Conference on, pages 704–711. IEEE, 2015. CalTech PDF.
- [4] Arvind and D. E. Culler. Dataflow architectures. In Annual Review of Computer Science Vol. 1, pages 225–253, 1986. www.dtic.mil/cgi-bin/GetTRDoc?Location=U2&doc=GetTRDoc.pdf&AD=ADA166235.
- [5] J. Ba, V. Mnih, and K. Kavukcuoglu. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755, 2014. arxiv.org/abs/1412.7755.
- [6] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155, 2003. www.iro.umontreal.ca/~lisa/pointeurs/BengioDucharmeVincentJauvin_jmlr.pdf.
- [7] T. Brants and A. Franz. Web 1T 5-gram Version 1, 2006. catalog.ldc.upenn.edu/LDC2006T13.
- [8] M. Burrows. The Chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, OSDI '06, pages 335–350, Berkeley, CA, USA, 2006. USENIX Association. www.usenix.org/legacy/event/osdi06/tech/full_papers/burrows/burrows.pdf.
- [9] R. H. Byrd, G. M. Chin, J. Nocedal, and Y. Wu. Sample size selection in optimization methods for machine learning. Mathematical Programming, 134(1):127–155, 2012. dx.doi.org/10.1007/s10107-012-0572-5.
- [10] C. Chelba, T. Mikolov, M. Schuster, Q. Ge, T. Brants, and P. Koehn. One billion word benchmark for measuring progress in statistical language modeling. CoRR, abs/1312.3005, 2013. arxiv.org/abs/1312.3005.
- [11] J. Chen, R. Monga, S. Bengio, and R. Jozefowicz. Revisiting distributed synchronous SGD. In International Conference on Learning Representations Workshop Track, 2016. arxiv.org/abs/1604.00981.
- [12] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems. In Proceedings of the Workshop on Machine Learning Systems at Neural Information Processing Systems (LearningSys), Dec. 2015. www.cs.cmu.edu/~muli/file/mxnet-learning-sys.pdf.
- [13] S. Chetlur, C. Woolley, P. Vandermersch, J. Cohen, J. Tran, B. Catanzaro, and E. Shelhamer. cuDNN: Efficient primitives for deep learning. arXiv preprint arXiv:1410.0759, 2014. arxiv.org/abs/1410.0759.
- [14] T. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman. Project Adam: Building an efficient and scalable deep learning training system. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 571–582, 2014. www.usenix.org/system/files/conference/osdi14/osdi14-paper-chilimbi.pdf.
- [15] S. Chintala. convnet-benchmarks, 2016. github.com/soumith/convnet-benchmarks.
- [16] E. S. Chung, J. D. Davis, and J. Lee. LINQits: Big data on little clients. In Proceedings of the 40th Annual International Symposium on Computer Architecture, ISCA ’13, pages 261–272, New York, NY, USA, 2013. ACM. doi.acm.org/10.1145/2485922.2485945.
- [17] R. Collobert, S. Bengio, and J. Mariethoz. Torch: A modular machine learning software library. Technical report, IDIAP, 2002. infoscience.epfl.ch/record/82802/files/rr02-46.pdf.
- [18] D. Crankshaw, P. Bailis, J. E. Gonzalez, H. Li, Z. Zhang, M. J. Franklin, A. Ghodsi, and M. I. Jordan. The missing piece in complex analytics: Low latency, scalable model management and serving with Velox. In CIDR 2015, Seventh Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 4-7, 2015, Online Proceedings, 2015. arxiv.org/abs/1409.3809.
- [19] H. Cui, H. Zhang, G. R. Ganger, P. B. Gibbons, and E. P. Xing. GeePS: Scalable deep learning on distributed GPUs with a GPU-specialized parameter server. In Proceedings of the Eleventh European Conference on Computer Systems, EuroSys '16, 2016. www.pdl.cmu.edu/PDLFTP/CloudComputing/GeePS-cui-eurosys16.pdf.
- [20] A. Dai, C. Olah, and Q. V. Le. Document embedding with paragraph vectors. arXiv preprint arXiv:1507.07998, 2015. arxiv.org/abs/1507.07998.
- [21] J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang, and A. Y. Ng. Large scale distributed deep networks. In NIPS, 2012. Google Research PDF.
- [22] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation - Volume 6, OSDI '04, Berkeley, CA, USA, 2004. USENIX Association. research.google.com/archive/mapreduce-osdi04.pdf.
- [23] A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, T. Mikolov, et al. DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems, pages 2121–2129, 2013. research.google.com/pubs/archive/41473.pdf.
- [24] J. Gonzalez-Dominguez, I. Lopez-Moreno, P. J. Moreno, and J. Gonzalez-Rodriguez. Neural Networks, 64:49–58, 2015.
- [25] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. C. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, December 8-13 2014, Montreal, Quebec, Canada, pages 2672–2680, 2014. papers.nips.cc/paper/5423-generative-adversarial-nets.
- [26] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015. arxiv.org/abs/1512.03385.
- [27] G. Heigold, V. Vanhoucke, A. Senior, P. Nguyen, M. Ranzato, M. Devin, and J. Dean. Multilingual acoustic models using distributed deep neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pages 8619–8623. IEEE, 2013. research.google.com/pubs/archive/40807.pdf.
- [28] B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, and I. Stoica. Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation, NSDI'11, pages 295–308, Berkeley, CA, USA, 2011. USENIX Association. www.cs.berkeley.edu/~alig/papers/mesos.pdf.
- [29] G. E. Hinton. Learning distributed representations of concepts. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society, pages 1–12. Hillsdale, NJ: Erlbaum, 1986. www.cogsci.ucsd.edu/~ajyu/Teaching/Cogs202_sp13/Readings/hinton86.pdf.
- [30] G. E. Hinton, L. Deng, D. Yu, G. E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process. Mag., 29(6):82–97, 2012. www.cs.toronto.edu/~gdahl/papers/deepSpeechReviewSPM2012.pdf.
- [31] P. Hunt, M. Konar, F. P. Junqueira, and B. Reed. ZooKeeper: Wait-free coordination for internet-scale systems. In Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, USENIXATC'10, pages 11–11, Berkeley, CA, USA, 2010. USENIX Association. www.usenix.org/legacy/event/atc10/tech/full_papers/Hunt.pdf.
- [32] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR, abs/1502.03167, 2015. arxiv.org/abs/1502.03167.
- [33] B. Jacob et al. gemmlowp: a small self-contained low-precision GEMM library, 2015. github.com/google/gemmlowp.
- [40] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 1725–1732. IEEE, 2014. research.google.com/pubs/archive/42455.pdf.
- [41] A. Krizhevsky. One weird trick for parallelizing convolutional neural networks. arXiv preprint arXiv:1404.5997, 2014. arxiv.org/abs/1404.5997.
- [42] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, 2012. papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf.
- [34] B. Jacob, G. Guennebaud, et al. Eigen library for linear algebra. eigen.tuxfamily.org.
- [35] S. Jean, K. Cho, R. Memisevic, and Y. Bengio. On using very large target vocabulary for neural machine translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1–10, Beijing, China, July 2015. Association for Computational Linguistics. www.aclweb.org/anthology/P15-1001.
- [36] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia, pages 675–678. ACM, 2014. arxiv.org/pdf/1408.5093.
- N. Jouppi. Google supercharges machine learning tasks with TPU custom chip, 2016. cloudplatform.googleblog.com/2016/05/Google-supercharges-machine-learning-tasks-with-custom-chip.html.
- [43] H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin. Exploring strategies for training deep neural networks. Journal of Machine Learning Research, 10:1–40, Jan. 2009. deeplearning.cs.cmu.edu/pdfs/1111/jmlr10 larochelle.pdf.
- [44] A. Lavin and S. Gray. Fast algorithms for convolutional neural networks. CoRR, abs/1509.09308, 2015. arxiv.org/abs/1509.09308.
- [45] Q. Le, M. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, and A. Ng. Building highlevel features using large scale unsupervised learning. In ICML’2012, 2012. Google Research PDF.
- [46] M. Li, D. G. Andersen, J. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su. Scaling distributed machine learning with the Parameter Server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 583–598, 2014. www.usenix.org/system/files/conference/osdi14/osdi14-paper-li_mu.pdf.
- [47] M. Li, T. Zhang, Y. Chen, and A. J. Smola. Efficient mini-batch training for stochastic optimization. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '14, pages 661–670, New York, NY, USA, 2014. ACM. www.cs.cmu.edu/~muli/file/minibatch_sgd.pdf.
- [39] R. Jozefowicz, O. Vinyals, M. Schuster, N. Shazeer, and Y. Wu. Exploring the limits of language modeling. CoRR, abs/1602.02410, 2016. arxiv.org/abs/1602.02410.
- [48] C. J. Maddison, A. Huang, I. Sutskever, and D. Silver. Move evaluation in Go using deep convolutional neural networks. arXiv preprint arXiv:1412.6564, 2014. arxiv.org/abs/1412.6564.
- [49] F. McSherry, M. Isard, and D. G. Murray. Scalability! But at what COST? In Proceedings of the 15th USENIX Conference on Hot Topics in Operating Systems, HOTOS’15, Berkeley, CA, USA, 2015. USENIX Association. www.usenix.org/system/files/conference/hotos15/ hotos15-paper-mcsherry.pdf.
- [50] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. In International Conference on Learning Representations: Workshops Track, 2013. arxiv.org/abs/1301.3781.
- [51] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 02 2015. dx.doi.org/10.1038/nature14236.
- [58] K. Ovtcharov, O. Ruwase, J.-Y. Kim, J. Fowers, K. Strauss, and E. Chung. Toward accelerating deep learning at scale using specialized logic. In Hot Chips: A Symposium on High Performance Chips. HOTCHIPS, August 2015. research.microsoft.com/apps/pubs/default.aspx?id=246506.
- [59] R. Pascanu, T. Mikolov, and Y. Bengio. On the difficulty of training recurrent neural networks. In ICML (3), volume 28 of JMLR Proceedings, pages 1310–1318. JMLR.org, 2013. www.jmlr.org/proceedings/papers/v28/pascanu13.pdf.
- [61] J. Ragan-Kelley, C. Barnes, A. Adams, S. Paris, F. Durand, and S. Amarasinghe. Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines. ACM SIGPLAN Notices, 48(6):519– 530, 2013. people.csail.mit.edu/fredo/tmp/Halide5min.pdf.
- P. Moritz, R. Nishihara, I. Stoica, and M. I. Jordan. SparkNet: Training deep networks in Spark. In International Conference on Learning Representations, 2016. arxiv.org/abs/1511.06051.
- Movidius Ltd. Movidius announces Deep Learning Accelerator and Fathom software framework, 2016. www.movidius.com/news/movidius-announces-deep-learning-accelerator-and-fathom-software-framework.
- D. G. Murray, F. McSherry, R. Isaacs, M. Isard, P. Barham, and M. Abadi. Naiad: a timely dataflow system. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 439–455. ACM, 2013. Microsoft Research PDF.
- A. Nair, P. Srinivasan, S. Blackwell, C. Alcicek, R. Fearon, A. De Maria, V. Panneershelvam, M. Suleyman, C. Beattie, S. Petersen, et al. Massively parallel methods for deep reinforcement learning. arXiv preprint arXiv:1507.04296, 2015. arxiv.org/abs/1507.04296.
- Nervana Systems. neon deep learning framework, 2016. github.com/NervanaSystems/neon.
- NVIDIA Corporation. NCCL: Optimized primitives for collective multi-gpu communication, 2016. github.com/NVIDIA/nccl.
- [62] B. Recht, C. Re, S. Wright, and F. Niu. Hogwild!: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 693–701, 2011. papers.nips.cc/paper/4390-hogwild-a-lock-free-approach-to-parallelizing-stochastic-gradient-descent.
- [63] C. J. Rossbach, Y. Yu, J. Currey, J.-P. Martin, and D. Fetterly. Dandelion: a compiler and runtime for heterogeneous systems. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 49–68. ACM, 2013. research-srv.microsoft.com/pubs/201110/sosp13dandelion-final.pdf.
- [64] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning representations by back-propagating errors. Cognitive modeling, 5:3, 1988. www.cs.toronto.edu/~hinton/absps/naturebp.pdf.
- [65] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), 115(3):211–252, 2015. arxiv.org/abs/1409.0575.
- [66] A. Smola and S. Narayanamurthy. An architecture for parallel topic models. Proc. VLDB Endow., 3(1-2):703–710, Sept. 2010. vldb.org/pvldb/vldb2010/papers/R63.pdf.
- [67] I. Sutskever, J. Martens, G. E. Dahl, and G. E. Hinton. On the importance of initialization and momentum in deep learning. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 1139–1147. JMLR Workshop and Conference Proceedings, 2013. jmlr.org/proceedings/papers/v28/sutskever13.pdf.
- [68] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014. papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural.
- [69] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR’2015, 2015. arxiv.org/abs/1409.4842.
- [75] M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, and I. Stoica. Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2012. www.usenix.org/system/files/conference/nsdi12/nsdi12-final138.pdf.
- [76] M. D. Zeiler, M. Ranzato, R. Monga, M. Mao, K. Yang, Q. Le, P. Nguyen, A. Senior, V. Vanhoucke, J. Dean, and G. E. Hinton. On rectified linear units for speech processing. In ICASSP, 2013. research.google.com/pubs/archive/40811.pdf.
- [70] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. CoRR, abs/1512.00567, 2015. arxiv.org/abs/1512.00567.
- [71] C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. Bradski, K. Olukotun, and A. Y. Ng. Map-reduce for machine learning on multicore. In B. Schölkopf, J. C. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19, pages 281–288. MIT Press, 2007. papers.nips.cc/paper/3150-mapreduce-for-machine-learning-on-multicore.pdf.
- [72] A. Verma, L. Pedrosa, M. Korupolu, D. Oppenheimer, E. Tune, and J. Wilkes. Large-scale cluster management at Google with Borg. In Proceedings of the Tenth European Conference on Computer Systems, page 18. ACM, 2015. research.google.com/pubs/archive/43438.pdf.
- [73] O. Vinyals, L. Kaiser, T. Koo, S. Petrov, I. Sutskever, and G. Hinton. Grammar as a foreign language. Technical report, arXiv:1412.7449, 2014. arxiv.org/abs/1412.7449.
- [74] Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, and J. Currey. DryadLINQ: A system for general-purpose distributed data-parallel computing using a high-level language. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI'08, pages 1–14, Berkeley, CA, USA, 2008. USENIX Association. www.usenix.org/legacy/event/osdi08/tech/full_papers/yu_y/yu_y.pdf.
