FloatPIM: in-memory acceleration of deep neural network training with high precision

Mohsen Imani
Saransh Gupta
Yeseong Kim

Proceedings of the 46th International Symposium on Computer Architecture, pp. 802-815, 2019.

DOI: https://doi.org/10.1145/3307650.3322237

Abstract:

Processing In-Memory (PIM) has shown great potential to accelerate the inference tasks of Convolutional Neural Networks (CNNs). However, existing PIM architectures do not support high-precision computation, e.g., floating-point precision, which is essential for training accurate CNN models. In addition, most of the existing PIM approaches ...

Introduction
  • Artificial neural networks, and in particular deep learning [3, 4], have a wide range of applications in diverse areas including object detection [5], self-driving cars, and translation [6].
  • The on-chip caches do not have enough capacity to store all data for large CNNs with hundreds of layers and millions of weights.
  • This creates a large amount of data movement between the processing cores and memory units, which significantly slows down the computation.
  • The result of the accumulation passes through an activation function (g).
  • This function was traditionally a Sigmoid [29], but recently the Rectified Linear Unit (ReLU) has become the most commonly used [3].
  • The activation results are used as the input for the neurons in the next layer (a minimal sketch of this forward pass follows this list).
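
As referenced in the last bullet, the sketch below is a minimal NumPy illustration of this forward pass (weighted accumulation followed by an activation g). The layer sizes, function names, and the use of NumPy are illustrative assumptions only, not the authors' in-memory implementation.

    import numpy as np

    def relu(x):
        # Rectified Linear Unit: element-wise max(0, x)
        return np.maximum(0.0, x)

    def dense_layer(a_prev, W, b, g=relu):
        # Weighted accumulation z = W @ a_prev + b, then activation a = g(z)
        return g(W @ a_prev + b)

    # Toy layer with 4 inputs and 3 neurons (hypothetical sizes)
    rng = np.random.default_rng(0)
    a0 = rng.standard_normal(4)        # activations from the previous layer
    W1 = rng.standard_normal((3, 4))   # weights of this layer
    b1 = np.zeros(3)                   # bias vector
    a1 = dense_layer(a0, W1, b1)       # becomes the input to the next layer
    print(a1)
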
Highlights
  • Artificial neural networks, and in particular deep learning [3, 4], have a wide range of applications in diverse areas including object detection [5], self-driving cars, and translation [6]
  • FloatPIM works with any bipolar resistive technology, which is the most commonly used class of device in existing Non-Volatile Memories (NVMs)
  • Our evaluation shows that FloatPIM-HP can provide 8.2× higher energy-delay product (EDP) improvement while requiring 3.9× larger memory as compared to FloatPIM-LP
  • Our evaluation shows that FloatPIM in high-performance and low-power modes can achieve 818.4 GFLOPS/s/W and 695.1 GFLOPS/s/W power efficiency, which is higher than both the ISAAC (380.7 GOPS/s/W) and PipeLayer (142.9 GOPS/s/W) designs
  • FloatPIM is a flexible Processing In-Memory (PIM)-based accelerator that works with floating-point as well as fixed-point precision
  • Existing PIM architectures support Convolutional Neural Network (CNN) acceleration only with fixed-point values, which results in up to 5.1% lower classification accuracy than the floating-point precision supported by FloatPIM (a small quantization sketch follows this list)
  • Our evaluation shows that FloatPIM can achieve on average 4.3× and 15.8× (6.3× and 21.6×) higher speedup and energy efficiency as compared to PipeLayer (ISAAC), the state-of-the-art PIM accelerators, during training
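
To make the fixed-point versus floating-point contrast concrete, here is a hedged sketch that quantizes toy weights to a hypothetical signed 8-bit fixed-point format and measures the rounding error. It only illustrates the kind of precision loss behind the accuracy gap mentioned above; it is not the number format of FloatPIM or of any compared PIM design.

    import numpy as np

    def to_fixed_point(x, total_bits=8, frac_bits=6):
        # Quantize to signed fixed point with `frac_bits` fractional bits
        # (format chosen purely for illustration).
        scale = 2.0 ** frac_bits
        qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
        return np.clip(np.round(x * scale), qmin, qmax) / scale

    rng = np.random.default_rng(1)
    w = (0.1 * rng.standard_normal(10_000)).astype(np.float32)  # toy "weights"
    w_fx = to_fixed_point(w)

    # Per-weight rounding errors like this accumulate over many layers and
    # training iterations, which is where fixed-point training loses accuracy.
    print("mean absolute quantization error:", np.mean(np.abs(w - w_fx)))
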
Results
  • 7.1 Experimental Setup

    The authors have designed and used a cycle-accurate simulator based on TensorFlow [45, 46], which emulates the memory functionality during the DNN training and testing phases.
  • The authors use HSPICE for circuit-level simulations to measure the energy consumption and performance of all the FloatPIM floating-point/fixed-point operations in 28 nm technology.
  • FloatPIM works with any bipolar resistive technology, which is the most commonly used in existing NVMs. Here, the authors adopt a memristor device with the VTEAM model [36] (its state equation is sketched after this list).
  • The model parameters of the memristor, listed in Table 1, are chosen to produce a switching delay of 1 ns with voltage pulses of 1 V and 2 V for the RESET and SET operations, in order to fit practical devices [30].
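
For context, the VTEAM model [36] describes the memristor's internal state variable w through threshold voltages v_on and v_off and the fitting parameters listed in Table 1 (k_on, k_off, α_on, α_off). A sketch of its state equation, as commonly written for VTEAM, is given below; the window functions f_on and f_off depend on the particular device fit and are left unspecified here.

    \frac{dw}{dt} =
    \begin{cases}
      k_{\mathrm{off}} \left( \frac{v(t)}{v_{\mathrm{off}}} - 1 \right)^{\alpha_{\mathrm{off}}} f_{\mathrm{off}}(w), & 0 < v_{\mathrm{off}} < v(t) \\
      0, & v_{\mathrm{on}} < v(t) < v_{\mathrm{off}} \\
      k_{\mathrm{on}} \left( \frac{v(t)}{v_{\mathrm{on}}} - 1 \right)^{\alpha_{\mathrm{on}}} f_{\mathrm{on}}(w), & v(t) < v_{\mathrm{on}} < 0
    \end{cases}
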
Conclusion
  • The authors proposed FloatPIM, the first PIM-based DNN training architecture that exploits analog properties of the memory without explicitly converting data into the analog domain.
  • FloatPIM is a flexible PIM-based accelerator that works with floating-point as well as fixed-point precision.
  • The authors' evaluation shows that FloatPIM can achieve on average 4.3× and 15.8× (6.3× and 21.6×) higher speedup and energy efficiency as compared to PipeLayer (ISAAC), the state-of-the-art PIM accelerators, during training.
Tables
  • Table 1: VTEAM model parameters for the memristor (k_on, k_off, α_on, α_off)
  • Table 2: FloatPIM parameters
  • Table 3: Workloads
  • Table 4: Error rate comparison and PIM support
Related work
  • There are several recent studies adopting alternative low-precision arithmetic for DNN training [51]. The work in [52, 53] proposed DNN training on hardware with hybrid dynamic fixed-point and floating-point precision. However, for convolutional neural networks, the work in [14, 54] showed that fixed point is not the most suitable representation for CNN training; instead, training can be performed with reduced-bit-width floating-point values.

    Modern neural network algorithms are executed on different types of platforms such as GPUs, FPGAs, and ASIC chips [55,56,57,58,59,60,61,62,63].

    Prior work attempted to fully utilize existing cores to accelerate neural networks. However, in these designs the main computation still relies on CMOS-based cores and thus has limited parallelism. To address the data movement issue, the work in [64] proposed a neural cache architecture which re-purposes caches for parallel in-memory computing. The work in [65] modified the DRAM architecture to accelerate DNN inference by supporting matrix multiplication in memory. In contrast, FloatPIM performs row-parallel and non-destructive bitwise operations inside a non-volatile memory block without using any sense amplifiers (a logic-level sketch follows this section). FloatPIM also accelerates DNNs in both training and testing modes.
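
As mentioned above, digital PIM designs of this kind compose arithmetic out of row-parallel bitwise operations executed inside the memory array. The sketch below emulates that idea purely in software: it builds a 1-bit full adder from NOR alone (the in-memory primitive of MAGIC-style designs [30, 31]) and applies it to a vector of bits at once to mimic row parallelism. The gate decomposition is a textbook construction, not the FloatPIM circuit or its timing model.

    import numpy as np

    def nor(a, b):
        # The only "in-memory" primitive assumed here; everything else is built from it.
        return ~(a | b) & 1

    def not_(a):      return nor(a, a)
    def or_(a, b):    return not_(nor(a, b))
    def and_(a, b):   return nor(not_(a), not_(b))
    def xor_(a, b):   return and_(or_(a, b), not_(and_(a, b)))

    def full_adder(a, b, cin):
        # sum = a XOR b XOR cin, carry-out = a*b + cin*(a XOR b)
        p = xor_(a, b)
        return xor_(p, cin), or_(and_(a, b), and_(cin, p))

    # One bit position taken from 8 memory rows, processed in a single pass
    rng = np.random.default_rng(2)
    a = rng.integers(0, 2, size=8, dtype=np.uint8)
    b = rng.integers(0, 2, size=8, dtype=np.uint8)
    cin = np.zeros(8, dtype=np.uint8)
    s, cout = full_adder(a, b, cin)
    assert np.array_equal(s, a ^ b) and np.array_equal(cout, a & b)
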
Funding
  • This work was partially supported by CRISP, one of six centers in JUMP, an SRC program sponsored by DARPA, and also NSF grants #1730158 and #1527034
References
  • L. Song, X. Qian, H. Li, and Y. Chen, “Pipelayer: A pipelined reram-based accelerator for deep learning,” in High Performance Computer Architecture (HPCA), 2017 IEEE International Symposium on, IEEE, 2017.
  • A. Shafiee, A. Nag, N. Muralimanohar, R. Balasubramonian, J. P. Strachan, M. Hu, R. S. Williams, and V. Srikumar, “Isaac: A convolutional neural network accelerator with in-situ analog arithmetic in crossbars,” in Proceedings of the 43rd International Symposium on Computer Architecture, pp. 14–26, IEEE Press, 2016.
  • Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, p. 436, 2015.
  • J. Schmidhuber, “Deep learning in neural networks: An overview,” Neural networks, vol. 61, pp. 85–117, 2015.
  • C. Dong, C. C. Loy, K. He, and X. Tang, “Learning a deep convolutional network for image super-resolution,” in European conference on computer vision, pp. 184–199, Springer, 2014.
  • L. Deng, D. Yu, et al., “Deep learning: methods and applications,” Foundations and Trends® in Signal Processing, vol. 7, no. 3–4, pp. 197–387, 2014.
  • D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al., “Mastering the game of go with deep neural networks and tree search,” nature, vol. 529, no. 7587, p. 484, 2016.
  • K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • M. Rastegari, V. Ordonez, J. Redmon, and A. Farhadi, “Xnor-net: Imagenet classification using binary convolutional neural networks,” in European Conference on Computer Vision, pp. 525–542, Springer, 2016.
  • M. N. Bojnordi and E. Ipek, “Memristive boltzmann machine: A hardware accelerator for combinatorial optimization and deep learning,” in High Performance Computer Architecture (HPCA), 2016 IEEE International Symposium on, pp. 1–13, IEEE, 2016.
  • P. Chi, S. Li, C. Xu, T. Zhang, J. Zhao, Y. Liu, Y. Wang, and Y. Xie, “Prime: A novel processing-in-memory architecture for neural network computation in reram-based main memory,” in Proceedings of the 43rd International Symposium on Computer Architecture, pp. 27–39, IEEE Press, 2016.
  • S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding,” arXiv preprint arXiv:1510.00149, 2015.
  • M. Courbariaux, Y. Bengio, and J.-P. David, “Training deep neural networks with low precision multiplications,” arXiv preprint arXiv:1412.7024, 2014.
  • C. Louizos, M. Reisser, T. Blankevoort, E. Gavves, and M. Welling, “Relaxed quantization for discretized neural networks,” arXiv preprint arXiv:1810.01875, 2018.
  • “Bfloat16 floating point format.” https://en.wikipedia.org/wiki/Bfloat16_
  • “Intel xeon processors and intel fpgas.” https://venturebeat.com/2018/05/23/
  • “Intel xeon and fpga lines.” https://www.top500.org/news/ https://www.tomshardware.com/news/
  • [19] “Google cloud.” https://cloud.google.com/tpu/docs/tensorflow-ops.
  • [20] “TPU repository with TensorFlow 1.7.0.” https://blog.riseml.com/
  • [21] J. V. Dillon, I. Langmore, D. Tran, E. Brevdo, S. Vasudevan, D. Moore, B. Patton, A. Alemi, M. Hoffman, and R. A. Saurous, “Tensorflow distributions,” arXiv preprint arXiv:1711.10604, 2017.
  • [22] “Google. 2018-05-08. Retrieved 2018-05-23. In many models this is a drop-in replacement for float-32.” https://www.youtube.com/watch?v=vm67WcLzfvc&
  • [23] B. Feinberg, U. K. R. Vengalam, N. Whitehair, S. Wang, and E. Ipek, “Enabling scientific computing on memristive accelerators,” in 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), pp. 367–382, IEEE, 2018.
  • http://newsroom.intel.com/community/intel_newsroom/blog/2015/07/28/
  • [25] M. Cheng, L. Xia, Z. Zhu, Y. Cai, Y. Xie, Y. Wang, and H. Yang, “Time: A training-in-memory architecture for memristor-based deep neural networks,” in Proceedings of the 54th Annual Design Automation Conference 2017, p. 26, ACM, 2017.
  • [26] Y. Cai, T. Tang, L. Xia, M. Cheng, Z. Zhu, Y. Wang, and H. Yang, “Training low bitwidth convolutional neural network on rram,” in Proceedings of the 23rd Asia and South Pacific Design Automation Conference, pp. 117–122, IEEE Press, 2018.
  • [27] Y. Cai, Y. Lin, L. Xia, X. Chen, S. Han, Y. Wang, and H. Yang, “Long live time: improving lifetime for training-in-memory engines by structured gradient sparsification,” in Proceedings of the 55th Annual Design Automation Conference, p. 107, ACM, 2018.
  • [28] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, pp. 1097–1105, 2012.
  • [29] L. K. Hansen and P. Salamon, “Neural network ensembles,” IEEE transactions on pattern analysis and machine intelligence, vol. 12, no. 10, pp. 993–1001, 1990.
  • [30] S. Kvatinsky, D. Belousov, S. Liman, G. Satat, N. Wald, E. G. Friedman, A. Kolodny, and U. C. Weiser, “MAGIC: memristor-aided logic,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 61, no. 11, pp. 895–899, 2014.
  • [31] S. Gupta, M. Imani, and T. Rosing, “Felix: Fast and energy-efficient logic in memory,” in 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–7, IEEE, 2018.
  • [32] A. Siemon, S. Menzel, R. Waser, and E. Linn, “A complementary resistive switch-based crossbar array adder,” IEEE journal on emerging and selected topics in circuits and systems, vol. 5, no. 1, pp. 64–74, 2015.
  • [33] S. Kvatinsky, G. Satat, N. Wald, E. G. Friedman, A. Kolodny, and U. C. Weiser, “Memristor-based material implication (IMPLY) logic: design principles and methodologies,” TVLSI, vol. 22, no. 10, pp. 2054–2066, 2014.
  • [34] J. Borghetti, G. S. Snider, P. J. Kuekes, J. J. Yang, D. R. Stewart, and R. S. Williams, “Memristive switches enable stateful logic operations via material implication,” Nature, vol. 464, no. 7290, pp. 873–876, 2010.
  • [35] B. C. Jang, Y. Nam, B. J. Koo, J. Choi, S. G. Im, S.-H. K. Park, and S.-Y. Choi, “Memristive logic-in-memory integrated circuits for energy-efficient flexible electronics,” Advanced Functional Materials, vol. 28, no. 2, 2018.
  • [36] S. Kvatinsky, M. Ramadan, E. G. Friedman, and A. Kolodny, “Vteam: A general model for voltage-controlled memristors,” IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 62, no. 8, pp. 786–790, 2015.
  • [37] N. Talati, S. Gupta, P. Mane, and S. Kvatinsky, “Logic design within memristive memories using memristor-aided logic (magic),” IEEE Transactions on Nanotechnology, vol. 15, no. 4, pp. 635–650, 2016.
  • [38] M. Imani, S. Gupta, and T. Rosing, “Ultra-efficient processing in-memory for data intensive applications,” in Proceedings of the 54th Annual Design Automation Conference 2017, p. 6, ACM, 2017.
  • [39] A. Haj-Ali et al., “Efficient algorithms for in-memory fixed point multiplication using magic,” in IEEE ISCAS, IEEE, 2018.
  • [40] M. Imani, D. Peroni, Y. Kim, A. Rahimi, and T. Rosing, “Efficient neural network acceleration on gpgpu using content addressable memory,” in 2017 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1026–1031, IEEE, 2017.
  • [41] C. Xu, D. Niu, N. Muralimanohar, R. Balasubramonian, T. Zhang, S. Yu, and Y. Xie, “Overcoming the challenges of crossbar resistive memory architectures,” in 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA), pp. 476–488, IEEE, 2015.
  • [42] A. Nag, R. Balasubramonian, V. Srikumar, R. Walker, A. Shafiee, J. P. Strachan, and N. Muralimanohar, “Newton: Gravitating towards the physical limits of crossbar acceleration,” IEEE Micro, vol. 38, no. 5, pp. 41–49, 2018.
  • [43] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778, 2016.
  • [44] A. Ghofrani, A. Rahimi, M. A. Lastras-Montaño, L. Benini, R. K. Gupta, and K.-T. Cheng, “Associative memristive memory for approximate computing in gpus,” IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 6, no. 2, pp. 222–234, 2016.
  • [45] F. Chollet, “keras.” https://github.com/fchollet/keras, 2015.
  • [46] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al., “Tensorflow: Large-scale machine learning on heterogeneous distributed systems,” arXiv preprint arXiv:1603.04467, 2016.
  • [47] X. Dong, C. Xu, N. Jouppi, and Y. Xie, “Nvsim: A circuit-level performance, energy, and area model for emerging non-volatile memory,” in Emerging Memory Technologies, pp. 15–50, Springer, 2014.
  • [48] Synopsys, Inc., “Design Compiler Reference Manual and User Guide,” see http://www.synopsys.com, 2000.
  • [49] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1–9, 2015.
  • [50] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, “Squeezenet: Alexnet-level accuracy with 50x fewer parameters and <0.5 MB model size,” arXiv preprint arXiv:1602.07360, 2016.
  • [51] P. Micikevicius, S. Narang, J. Alben, G. Diamos, E. Elsen, D. Garcia, B. Ginsburg, M. Houston, O. Kuchaev, G. Venkatesh, et al., “Mixed precision training,” arXiv preprint arXiv:1710.03740, 2017.
  • [52] M. Drumond, T. Lin, M. Jaggi, and B. Falsafi, “End-to-end dnn training with block floating point arithmetic,” arXiv preprint arXiv:1804.01526, 2018.
  • [53] D. Das, N. Mellempudi, D. Mudigere, D. Kalamkar, S. Avancha, K. Banerjee, S. Sridharan, K. Vaidyanathan, B. Kaul, E. Georganas, et al., “Mixed precision training of convolutional neural networks using integer operations,” arXiv preprint arXiv:1802.00930, 2018.
  • [54] C. De Sa, M. Leszczynski, J. Zhang, A. Marzoev, C. R. Aberger, K. Olukotun, and C. Ré, “High-accuracy low-precision training,” arXiv preprint arXiv:1803.03383, 2018.
  • [55] H. Sharma, J. Park, D. Mahajan, E. Amaro, J. K. Kim, C. Shao, A. Mishra, and H. Esmaeilzadeh, “From high-level deep neural models to fpgas,” in The 49th Annual IEEE/ACM International Symposium on Microarchitecture, p. 17, IEEE Press, 2016.
  • [56] J. Albericio, A. Delmás, P. Judd, S. Sharify, G. O’Leary, R. Genov, and A. Moshovos, “Bit-pragmatic deep neural network computing,” in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 382–394, ACM, 2017.
  • [57] T. Chen, Z. Du, N. Sun, J. Wang, C. Wu, Y. Chen, and O. Temam, “Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” in ACM Sigplan Notices, vol. 49, pp. 269–284, ACM, 2014.
  • [58] V. Aklaghi, A. Yazdanbakhsh, K. Samadi, H. Esmaeilzadeh, and R. Gupta, “Snapea: Predictive early activation for reducing computation in deep convolutional neural networks,” ISCA, 2018.
  • [59] K. Hegde, J. Yu, R. Agrawal, M. Yan, M. Pellauer, and C. W. Fletcher, “Ucnn: Exploiting computational reuse in deep neural networks via weight repetition,” arXiv preprint arXiv:1804.06508, 2018.
  • [60] C. Ding, S. Liao, Y. Wang, Z. Li, N. Liu, Y. Zhuo, C. Wang, X. Qian, Y. Bai, G. Yuan, et al., “CirCNN: accelerating and compressing deep neural networks using block-circulant weight matrices,” in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 395–408, ACM, 2017.
  • [61] E. Nurvitadhi, G. Venkatesh, J. Sim, D. Marr, R. Huang, J. Ong Gee Hock, Y. T. Liew, K. Srivatsan, D. Moss, S. Subhaschandra, et al., “Can fpgas beat gpus in accelerating next-generation deep neural networks?,” in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 5–14, ACM, 2017.
  • [62] S. Han, X. Liu, H. Mao, J. Pu, A. Pedram, M. A. Horowitz, and W. J. Dally, “Eie: efficient inference engine on compressed deep neural network,” in Computer Architecture (ISCA), 2016 ACM/IEEE 43rd Annual International Symposium on, pp. 243–254, IEEE, 2016.
  • [63] Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, et al., “Dadiannao: A machine-learning supercomputer,” in Proceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 609–622, IEEE Computer Society, 2014.
  • [64] C. Eckert, X. Wang, J. Wang, A. Subramaniyan, R. Iyer, D. Sylvester, D. Blaauw, and R. Das, “Neural cache: Bit-serial in-cache acceleration of deep neural networks,” arXiv preprint arXiv:1805.03718, 2018.
  • [65] S. Li, D. Niu, K. T. Malladi, H. Zheng, B. Brennan, and Y. Xie, “Drisa: A dram-based reconfigurable in-situ accelerator,” in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 288–301, ACM, 2017.
  • [66] M. N. Bojnordi and E. Ipek, “The memristive boltzmann machines,” IEEE Micro, vol. 37, no. 3, pp. 22–29, 2017.
  • [67] M. Imani, M. Samragh, Y. Kim, S. Gupta, F. Koushanfar, and T. Rosing, “Rapidnn: In-memory deep neural network acceleration framework,” arXiv preprint arXiv:1806.05794, 2018.
  • [68] S. Gupta, M. Imani, H. Kaur, and T. S. Rosing, “Nnpim: A processing in-memory architecture for neural network acceleration,” IEEE Transactions on Computers, 2019.
  • [69] M. Imani, S. Gupta, and T. Rosing, “Genpim: Generalized processing in-memory to accelerate data intensive applications,” in 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1155–1158, IEEE, 2018.
  • [70] S. Salamat, M. Imani, S. Gupta, and T. Rosing, “Rnsnet: In-memory neural network acceleration using residue number system,” in 2018 IEEE International Conference on Rebooting Computing (ICRC), pp. 1–12, IEEE, 2018.
  • [71] M. Imani, A. Rahimi, D. Kong, T. Rosing, and J. M. Rabaey, “Exploring hyperdimensional associative memory,” in 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), pp. 445–456, IEEE, 2017.
  • [72] Y. Kim, M. Imani, and T. Rosing, “Orchard: Visual object recognition accelerator based on approximate in-memory processing,” in 2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 25–32, IEEE, 2017.
  • [73] M. Zhou, M. Imani, S. Gupta, and T. Rosing, “Gas: A heterogeneous memory architecture for graph processing,” in Proceedings of the International Symposium on Low Power Electronics and Design, p. 27, ACM, 2018.
  • [74] M. Zhou, M. Imani, S. Gupta, Y. Kim, and T. Rosing, “Gram: graph processing in a reram-based computational memory,” in Proceedings of the 24th Asia and South Pacific Design Automation Conference, pp. 591–596, ACM, 2019.
  • [75] M. Imani, S. Gupta, S. Sharma, and T. Rosing, “Nvquery: Efficient query processing in non-volatile memory,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2018.