Jointly Optimizing Preprocessing and Inference for DNN-based Visual Analytics

Proc. VLDB Endow. 14, no. 2 (2020): 87-100


Abstract

While deep neural networks (DNNs) are an increasingly popular way to query large corpora of data, their significant runtime remains an active area of research. As a result, researchers have proposed…

Introduction
  • Deep neural networks (DNNs) power a range of visual analytics tasks and systems [8, 34, 37, 39] due to their high accuracy, but state-of-the-art DNNs can be computationally expensive.
  • To execute visual analytics queries efficiently, system builders have developed optimizations to trade off accuracy and throughput [8, 34, 37, 39, 43]: more accurate DNNs are more computationally expensive [31, 61, 62].
  • This prior work focuses solely on reducing DNN execution time.
  • These systems were built before recent DNN accelerators were introduced and were benchmarked on older accelerators.
  • For example, Tahoma benchmarks on the NVIDIA K80 GPU, which executes ResNet-50 at 159 images/second.
Highlights
  • Deep neural networks (DNNs) power a range of visual analytics tasks and systems [8, 34, 37, 39] due to their high accuracy, but state-of-the-art DNNs can be computationally expensive.
  • Our results show that new accelerators have dramatically improved throughput and reduced both the dollar and power costs of DNN execution.
  • We describe cost modeling for DNNs and how prior work estimated the throughput of DNN execution.
  • We have found that standard ResNet configurations [31] (depths 18 to 152) strongly outperform the specialized NNs used in prior work.
  • Until very recently, executing the DNN computational graph was the overwhelming bottleneck in DNN execution, but we show evidence that this trend has reversed (§2).
  • We introduce novel methods of achieving accuracy/throughput trade-offs by using natively present, low-resolution visual data.
  • We show that preprocessing can be the bottleneck in end-to-end DNN inference.
Methods
  • The authors benchmarked the popular ResNet-50 model for image classification [31], which has been widely used in benchmarking [1, 21] and has been considered expensive (a minimal measurement sketch follows this list).
  • The authors used the g4dn.xlarge AWS instance which has 4 vCPU cores: this configuration is cost balanced between vCPUs and the accelerator (§7).
  • This instance type is optimized for DNN inference; similar instances are available on other cloud providers.
  • The authors evaluate the optimizations on four image datasets and four video datasets.
  • The task for the image datasets is image classification.
  • The task for the video datasets is an aggregation query for the number of target objects per frame.
  • The authors measure query runtime and confirm that the error bounds were respected.
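
To make the benchmarking methodology concrete, here is a minimal sketch (not the paper's actual harness) of timing preprocessing and DNN execution separately with PyTorch and torchvision. The image paths are placeholders, and `models.resnet50(weights=None)` assumes torchvision ≥ 0.13.

```python
import time

import torch
from PIL import Image
from torchvision import models, transforms

# Standard ImageNet-style preprocessing: resize, center-crop, convert to tensor.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

def preprocessing_throughput(image_paths):
    """Decode and preprocess JPEGs on the CPU; return images/sec and the batch."""
    start = time.perf_counter()
    batch = torch.stack([preprocess(Image.open(p).convert("RGB"))
                         for p in image_paths])
    return len(image_paths) / (time.perf_counter() - start), batch

@torch.no_grad()
def dnn_throughput(model, batch, device="cuda"):
    """Run an already-preprocessed batch on the accelerator; return images/sec."""
    model = model.eval().to(device)
    batch = batch.to(device)
    model(batch)                       # warm-up pass
    torch.cuda.synchronize()
    start = time.perf_counter()
    model(batch)
    torch.cuda.synchronize()
    return len(batch) / (time.perf_counter() - start)

if __name__ == "__main__":
    paths = ["img0.jpg", "img1.jpg"]   # placeholder image paths
    pre_tput, batch = preprocessing_throughput(paths)
    dnn_tput = dnn_throughput(models.resnet50(weights=None), batch)
    print(f"preprocessing: {pre_tput:.0f} im/s, DNN: {dnn_tput:.0f} im/s")
```

If the first number is smaller than the second, preprocessing, not the DNN, bounds end-to-end throughput.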
Conclusion
  • The authors show that preprocessing can be the bottleneck in end-to-end DNN inference.
  • The authors show that the preprocessing costs are accounted for incorrectly in cost models for selecting models in visual analytics applications.
  • To address these issues, the authors build Smol, an optimizing runtime engine for end-to-end DNN inference.
  • Smol contains two novel optimizations for end-to-end DNN inference: 1) an improved cost model for estimating DNN throughput (a toy version is sketched below) and 2) joint optimizations for preprocessing and DNN execution that leverage low-resolution data.
  • The authors evaluate Smol and these optimizations, showing that Smol can achieve up to 5.9× improvements in throughput.
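
As a toy illustration of the cost-model point (not Smol's actual model, which profiles real stages), compare three ways of estimating end-to-end throughput from per-stage throughputs:

```python
# All throughputs are in images/second; the example numbers are made up.

def dnn_only_estimate(dnn_tput: float, pre_tput: float) -> float:
    return dnn_tput  # prior practice: ignore preprocessing entirely

def serial_estimate(dnn_tput: float, pre_tput: float) -> float:
    # Stages share one resource, so per-image costs (seconds) add.
    return 1.0 / (1.0 / dnn_tput + 1.0 / pre_tput)

def pipelined_estimate(dnn_tput: float, pre_tput: float) -> float:
    # Stages overlap on disjoint resources (CPU vs. accelerator),
    # so the slower stage bounds end-to-end throughput.
    return min(dnn_tput, pre_tput)

# With preprocessing at 500 im/s and DNN execution at 2,000 im/s:
#   dnn_only_estimate(2000, 500)  -> 2000 (4x too optimistic)
#   serial_estimate(2000, 500)    -> 400
#   pipelined_estimate(2000, 500) -> 500 (matches a well-pipelined runtime)
```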
Tables
  • Table 1: Throughput of ResNet-50 on the T4 with three different execution environments. Keras was used in [8]. The efficient use of hardware can result in over a 17× improvement in throughput. We used the optimal batch size for each framework (64, 256, and 64, respectively).
  • Table 2: Throughput and top-one accuracy for ResNets of different depths. As shown, there is a trade-off between accuracy and throughput (i.e., computation).
  • Table 3: Measurements of preprocessing, DNN execution, and pipelined end-to-end DNN inference for three configurations of DNNs and input formats: balanced, preprocessing-bound, and DNN-execution-bound. We measure the throughput in images per second of preprocessing, DNN execution, and end-to-end DNN inference on the left. We show the throughput estimate and estimation error for three cost models on the right, with the most accurate estimate in bold. As shown, Smol matches or ties the most accurate estimate for all conditions.
  • Table 4: A list of popular visual data formats and their low-fidelity features. Many popular formats contain methods of decoding parts of the visual data (see the decode sketch after this list), including the popular…
  • Table 5: Throughput of ResNet-50 on GPU accelerators. Throughput has improved by over 94× in three years and will continue to improve. The T4 is an inference-optimized accelerator that is significantly more power efficient than the V100 but contains similar hardware units.
  • Table 6: Summary of dataset statistics for the still-image datasets we used in our evaluation. The datasets range in difficulty and number of classes: bike-bird is the easiest dataset to classify and imagenet is the hardest.
  • Table 7: Effect of training procedure and input format on accuracy for ResNet-50 and ResNet-34 on imagenet, the most difficult dataset. Smol can achieve an accuracy/throughput trade-off by simply changing the input format, e.g., low-resolution-trained ResNet-50 on 161×161 JPEG (q = 95) achieves approximately the same accuracy as regularly trained ResNet-34 on full-resolution data (71.94% vs. 72.72% accuracy). Smol can also achieve no loss in accuracy for easier datasets (e.g., bike-bird).
  • Table 8: Throughput and cost of Smol with and without optimizations at a variable number of vCPU cores to achieve…
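
As one concrete instance of a low-fidelity decode feature (Table 4's theme), baseline JPEG can be decoded at 1/2, 1/4, or 1/8 scale directly from its DCT coefficients, skipping most of the full-resolution decode work. Below is a minimal sketch using Pillow's draft mode; the file name is a placeholder, and this stands in for, rather than reproduces, Smol's own decoder.

```python
from PIL import Image

img = Image.open("photo.jpg")    # placeholder path; reads the header only
img.draft("RGB", (161, 161))     # ask the decoder for the smallest scale >= 161x161
img.load()                       # the actual decode happens here, at reduced scale
print(img.size)                  # e.g. 1/4 or 1/8 of the original dimensions
```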
Related work
  • Visual analytics systems. Contemporary visual analytics systems leverage DNNs for high-accuracy predictions and largely focus on optimizing the cost of executing these DNNs [8, 12, 38, 39, 43]. These systems typically use smaller proxy models, such as specialized NNs, to accelerate analytics. However, as we have shown, modern hardware and compilers can create bottlenecks elsewhere in the end-to-end execution of DNNs.

    Other video analytics systems, such as Scanner [54] or VideoStorm [70], optimize queries as a black box. These systems aim to use all available hardware resources but do not jointly optimize preprocessing and DNN execution.
Funding
  • This research was supported in part by affiliate members and other supporters of the Stanford DAWN project—Ant Financial, Facebook, Google, Infosys, NEC, and VMware—as well as Toyota Research Institute, Northrop Grumman, Amazon Web Services, Cisco, and the NSF under CAREER grant CNS-1651570.
Study subjects and analysis
visual datasets: 8
This runtime engine a) efficiently pipelines preprocessing and DNN execution for inference, b) places preprocessing operations on the CPU or GPU in a hardware- and input-aware manner, and c) efficiently manages memory and threading for high-throughput execution. We implement these optimizations in a novel system, Smol, and evaluate Smol on eight visual datasets. We show that its optimizations can achieve up to 5.9× end-to-end throughput improvements at a fixed accuracy over recent work in visual analytics.
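
A minimal sketch of point (a), pipelining CPU-side preprocessing with accelerator-side execution through a bounded queue; `decode_batch` and `run_dnn` are hypothetical stand-ins, and Smol's actual runtime is a far more carefully optimized implementation:

```python
import queue
import threading

def pipeline(batches, decode_batch, run_dnn, depth=4):
    """Overlap preprocessing of batch i+1 with DNN execution of batch i."""
    q = queue.Queue(maxsize=depth)  # bounding the queue also caps memory use (point c)

    def producer():
        for b in batches:
            q.put(decode_batch(b))  # CPU-side decode/preprocess; native image
                                    # decoders release the GIL, so this overlaps
        q.put(None)                 # end-of-stream sentinel

    threading.Thread(target=producer, daemon=True).start()
    results = []
    while (item := q.get()) is not None:
        results.append(run_dnn(item))  # accelerator-side DNN execution
    return results
```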

image datasets: 4
Overview. We evaluate our optimizations on four image datasets and four video datasets. The task for the image datasets is image classification

visual datasets: 8
Furthermore, accelerators will become more efficient. We evaluated Smol on eight visual datasets and show that Smol can outperform baselines by up to 5.9× for image datasets and 10× for video datasets at a fixed accuracy level.

video datasets: 4
We evaluated Smol on the four video datasets described above. We used the exact experimental configuration from BlazeIt as the baseline, with the exception of executing BlazeIt's specialized NNs in Smol's optimized runtime engine.

References
  • [1] 2018. MLPerf. https://mlperf.org/
  • [2] 2019. folly. https://github.com/facebook/folly
  • [3] 2019. NVIDIA TensorRT. https://developer.nvidia.com/tensorrt
  • [4] 2019. ONNX. https://onnx.ai/
  • [5] Jorge Albericio, Alberto Delmás, Patrick Judd, Sayeh Sharify, Gerard O’Leary, Roman Genov, and Andreas Moshovos. 2017. Bit-pragmatic deep neural network computing. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 382–394.
  • [6] Jorge Albericio, Patrick Judd, Tayler Hetherington, Tor Aamodt, Natalie Enright Jerger, and Andreas Moshovos. 2016. Cnvlutin: Ineffectual-neuron-free deep neural network computing. In ACM SIGARCH Computer Architecture News, Vol. 44. IEEE Press, 1–13.
  • [7] Corrado Alessio. 2019. Animals-10. https://www.kaggle.com/alessiocorrado99/animals10
  • [8] Michael R Anderson, Michael Cafarella, Thomas F Wenisch, and German Ros. 2019. Predicate Optimization for a Visual Analytics Database. ICDE (2019).
  • [9] Elizabeth Arens. 2019. Always Up-to-Date Guide to Social Media Image Sizes. https://sproutsocial.com/insights/social-media-image-sizes-guide/
  • [10] Christopher M Bishop. 2006. Pattern recognition and machine learning. Springer.
  • [11] Tom B Brown, Nicholas Carlini, Chiyuan Zhang, Catherine Olsson, Paul Christiano, and Ian Goodfellow. 2018. Unrestricted adversarial examples. arXiv preprint arXiv:1809.08352 (2018).
  • [12] Christopher Canel, Thomas Kim, Giulio Zhou, Conglong Li, Hyeontaek Lim, David Andersen, Michael Kaminsky, and Subramanya Dulloor. 2019. Scaling Video Analytics on Constrained Edge Nodes. SysML (2019).
  • [13] Srimat Chakradhar, Murugan Sankaradas, Venkata Jakkula, and Srihari Cadambi. 2010. A dynamically configurable coprocessor for convolutional neural networks. ACM SIGARCH Computer Architecture News 38, 3 (2010), 247–257.
  • [14] Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Haichen Shen, Meghan Cowan, Leyuan Wang, Yuwei Hu, Luis Ceze, et al. 2018. TVM: An Automated End-to-End Optimizing Compiler for Deep Learning. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18). 578–594.
  • [15] Yunji Chen, Tianshi Chen, Zhiwei Xu, Ninghui Sun, and Olivier Temam. 2016. DianNao family: energy-efficient hardware accelerators for machine learning. Commun. ACM 59, 11 (2016), 105–112.
  • [16] Yu-Hsin Chen, Joel Emer, and Vivienne Sze. 2016. Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural networks. In ACM SIGARCH Computer Architecture News, Vol. 44. IEEE Press, 367–379.
  • [17] Ping Chi, Shuangchen Li, Cong Xu, Tao Zhang, Jishen Zhao, Yongpan Liu, Yu Wang, and Yuan Xie. 2016. Prime: A novel processing-in-memory architecture for neural network computation in ReRAM-based main memory. In ACM SIGARCH Computer Architecture News, Vol. 44. IEEE Press, 27–39.
  • [18] François Chollet et al. 2015. Keras.
  • [19] Charilaos Christopoulos, Athanassios Skodras, and Touradj Ebrahimi. 2000. The JPEG2000 still image coding system: an overview. IEEE Transactions on Consumer Electronics 46, 4 (2000), 1103–1127.
  • [20] Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, Kunle Olukotun, Chris Re, and Matei Zaharia. 2018. Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark. arXiv preprint arXiv:1806.01427 (2018).
  • [21] Cody Coleman, Deepak Narayanan, Daniel Kang, Tian Zhao, Jian Zhang, Luigi Nardi, Peter Bailis, Kunle Olukotun, Chris Ré, and Matei Zaharia. 2017. DAWNBench: An End-to-End Deep Learning Benchmark and Competition. Training 100, 101 (2017), 102.
  • [22] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.
  • [23] Steven Eliuk, Cameron Upright, Hars Vardhan, Stephen Walsh, and Trevor Gale. 2016. dMath: Distributed Linear Algebra for DL. arXiv preprint arXiv:1611.07819 (2016).
  • [24] Clément Farabet, Berin Martini, Benoit Corda, Polina Akselrod, Eugenio Culurciello, and Yann LeCun. 2011. NeuFlow: A runtime reconfigurable dataflow processor for vision. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on. IEEE, 109–116.
  • [25] Jeremy Fowers, Kalin Ovtcharov, Michael Papamichael, Todd Massengill, Ming Liu, Daniel Lo, Shlomi Alkalay, Michael Haselman, Logan Adams, Mahdi Ghandi, et al. 2018. A configurable cloud-scale DNN processor for real-time AI. In Proceedings of the 45th Annual International Symposium on Computer Architecture. IEEE Press, 1–14.
  • [26] T Gale, S Eliuk, and C Upright. 2017. High-Performance Data Loading and Augmentation for Deep Neural Network Training. In GPU Technology Conference 2017.
  • [27] Vinayak Gokhale, Jonghoon Jin, Aysegul Dundar, Berin Martini, and Eugenio Culurciello. 2014. A 240 G-ops/s mobile coprocessor for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 682–687.
  • [28] Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A Horowitz, and William J Dally. 2016. EIE: efficient inference engine on compressed deep neural network. In Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on. IEEE, 243–254.
  • [29] Song Han, Huizi Mao, and William J Dally. 2015. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149 (2015).
  • [30] Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. 2017. Mask R-CNN. In ICCV. IEEE, 2980–2988.
  • [31] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In CVPR. 770–778.
  • [32] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. 2015. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531 (2015).
  • [33] Andrew G Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861 (2017).
  • [34] Kevin Hsieh, Ganesh Ananthanarayanan, Peter Bodik, Paramvir Bahl, Matthai Philipose, Phillip B Gibbons, and Onur Mutlu. 2018. Focus: Querying Large Video Datasets with Low Latency and Low Cost. OSDI (2018).
  • [35] Norman P Jouppi, Cliff Young, Nishant Patil, David Patterson, Gaurav Agrawal, Raminder Bajwa, Sarah Bates, Suresh Bhatia, Nan Boden, Al Borchers, et al. 2017. In-datacenter performance analysis of a tensor processing unit. In Computer Architecture (ISCA), 2017 ACM/IEEE 44th Annual International Symposium on. IEEE, 1–12.
  • [36] Patrick Judd, Jorge Albericio, Tayler Hetherington, Tor M Aamodt, and Andreas Moshovos. 2016. Stripes: Bit-serial deep neural network computing. In Microarchitecture (MICRO), 2016 49th Annual IEEE/ACM International Symposium on. IEEE, 1–12.
  • [37] Daniel Kang, Peter Bailis, and Matei Zaharia. 2019. BlazeIt: optimizing declarative aggregation and limit queries for neural network-based video analytics. Proceedings of the VLDB Endowment 13, 4 (2019), 533–546.
  • [38] Daniel Kang, Peter Bailis, and Matei Zaharia. 2019. Challenges and Opportunities in DNN-Based Video Analytics: A Demonstration of the BlazeIt Video Query Engine. CIDR.
  • [39] Daniel Kang, John Emmons, Firas Abuzaid, Peter Bailis, and Matei Zaharia. 2017. NoScope: optimizing neural network queries over video at scale. PVLDB 10, 11 (2017), 1586–1597.
  • [40] Chris Leary and Todd Wang. 2017. XLA: TensorFlow, compiled. TensorFlow Dev Summit (2017).
  • [41] Shuangchen Li, Dimin Niu, Krishna T Malladi, Hongzhong Zheng, Bob Brennan, and Yuan Xie. 2017. DRISA: A DRAM-based reconfigurable in-situ accelerator. In Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture. ACM, 288–301.
  • [42] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. 2016. SSD: Single shot multibox detector. In European Conference on Computer Vision. Springer, 21–37.
  • [43] Yao Lu, Aakanksha Chowdhery, Srikanth Kandula, and Surajit Chaudhuri. 2018. Accelerating Machine Learning Inference with Probabilistic Predicates. In SIGMOD. ACM, 1493–1508.
  • [44] Peter Mattson, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, David Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, et al. 2019. MLPerf training benchmark. arXiv preprint arXiv:1910.01500 (2019).
  • [45] Bert Moons and Marian Verhelst. 2016. A 0.3–2.6 TOPS/W precision-scalable processor for real-time large-scale ConvNets. In VLSI Circuits (VLSI-Circuits), 2016 IEEE Symposium on. IEEE, 1–2.
  • [46] NVIDIA. 2019. NVIDIA DALI. https://docs.nvidia.com/deeplearning/sdk/dali-developer-guide/docs/index.html
  • [47] NVIDIA. 2020. NVIDIA T4 Tensor Core GPU for AI Inference. https://www.nvidia.com/en-us/data-center/tesla-t4/
  • [48] Shoumik Palkar, James J Thomas, Anil Shanbhag, Deepak Narayanan, Holger Pirk, Malte Schwarzkopf, Saman Amarasinghe, Matei Zaharia, and Stanford InfoLab. 2017. Weld: A common runtime for high performance data analytics. In Conference on Innovative Data Systems Research (CIDR).
  • [49] Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel Emer, Stephen W Keckler, and William J Dally. 2017. SCNN: An accelerator for compressed-sparse convolutional neural networks. In ACM SIGARCH Computer Architecture News, Vol. 45. ACM, 27–40.
  • [50] Seong-Wook Park, Junyoung Park, Kyeongryeol Bong, Dongjoo Shin, Jinmook Lee, Sungpill Choi, and Hoi-Jun Yoo. 2015. An energy-efficient and scalable deep learning/inference processor with tetra-parallel MIMD architecture for big data applications. IEEE Transactions on Biomedical Circuits and Systems 9, 6 (2015), 838–848.
  • [51] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. 2017. Automatic differentiation in PyTorch. (2017).
  • [52] Maurice Peemen, Arnaud AA Setio, Bart Mesman, Henk Corporaal, et al. 2013. Memory-centric accelerator design for Convolutional Neural Networks. In ICCD, Vol. 2013. 13–19.
  • [53] William B Pennebaker and Joan L Mitchell. 1992. JPEG: Still image data compression standard. Springer Science & Business Media.
  • [54] Alex Poms, William Crichton, Pat Hanrahan, and Kayvon Fatahalian. 2018. Scanner: Efficient Video Analysis at Scale. (2018).
  • [55] PyTorch Team. 2018. The road to 1.0: production ready PyTorch. https://pytorch.org/blog/the-road-to-1_0/
  • [56] Atul Rahman, Jongeun Lee, and Kiyoung Choi. 2016. Efficient FPGA acceleration of convolutional neural networks using logical-3D compute array. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2016. IEEE, 1393–1398.
  • [57] Brandon Reagen, Paul Whatmough, Robert Adolf, Saketh Rama, Hyunkwang Lee, Sae Kyu Lee, José Miguel Hernández-Lobato, Gu-Yeon Wei, and David Brooks. 2016. Minerva: Enabling low-power, highly-accurate deep neural network accelerators. In ACM SIGARCH Computer Architecture News, Vol. 44. IEEE Press, 267–278.
  • [58] Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, et al. 2019. MLPerf inference benchmark. arXiv preprint arXiv:1911.02549 (2019).
  • [59] Yongming Shen, Michael Ferdman, and Peter Milder. 2017. Maximizing CNN accelerator efficiency through resource partitioning. In Computer Architecture (ISCA), 2017 ACM/IEEE 44th Annual International Symposium on. IEEE, 535–547.
  • [60] Gary J Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand. 2012. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology 22, 12 (2012), 1649–1668.
  • [61] Mingxing Tan and Quoc V Le. 2019. EfficientNet: Rethinking model scaling for convolutional neural networks. arXiv preprint arXiv:1905.11946 (2019).
  • [62] Mingxing Tan, Ruoming Pang, and Quoc V Le. 2019. EfficientDet: Scalable and efficient object detection. arXiv preprint arXiv:1911.09070 (2019).
  • [63] David Taubman and Michael Marcellin. 2012. JPEG2000 image compression fundamentals, standards and practice. Vol. 642. Springer Science & Business Media.
  • [64] Swagath Venkataramani, Ashish Ranjan, Subarno Banerjee, Dipankar Das, Sasikanth Avancha, Ashok Jagannathan, Ajaya Durg, Dheemanth Nagaraj, Bharat Kaul, Pradeep Dubey, et al. 2017. ScaleDeep: A scalable compute architecture for learning and evaluating deep networks. In ACM SIGARCH Computer Architecture News, Vol. 45. ACM, 13–26.
  • [65] Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. 2011. The Caltech-UCSD Birds-200-2011 dataset. (2011).
  • [66] Gregory K Wallace. 1992. The JPEG still picture compression standard. IEEE Transactions on Consumer Electronics 38, 1 (1992), xviii–xxxiv.
  • [67] Thomas Wiegand, Gary J Sullivan, Gisle Bjontegaard, and Ajay Luthra. 2003. Overview of the H.264/AVC video coding standard. IEEE Transactions on Circuits and Systems for Video Technology 13, 7 (2003), 560–576.
  • [68] William Wong. 2018. Habana Enters Machine-Learning Derby with Goya Platform. https://www.electronicdesign.com/industrial-automation/habana-enters-machine-learning-derby-goya-platform
  • [69] Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, and Kaiming He. 2017. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 1492–1500.
  • [70] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J Freedman. 2017. Live Video Analytics at Scale with Approximation and Delay-Tolerance. In NSDI, Vol. 9. 1.
  • [71] Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao. 2016. Joint face detection and alignment using multitask cascaded convolutional networks. IEEE Signal Processing Letters 23, 10 (2016), 1499–1503.
  • [72] Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, and Yichen Wei. 2017. Flow-guided feature aggregation for video object detection. arXiv preprint arXiv:1703.10025 (2017).
Author
Ankit Mathur
Teja Veeramacheneni