Low latency RNN inference with cellular batching

EuroSys '18: Proceedings of the Thirteenth EuroSys Conference, pp. 31:1-31:15, 2018.

Cited by: 24 | DOI: https://doi.org/10.1145/3190508.3190541

Abstract:

Performing inference on pre-trained neural network models must meet low-latency requirements, which are often at odds with achieving high throughput. Existing deep learning systems use batching to improve throughput, but batching does not perform well when serving Recurrent Neural Networks with dynamic dataflow graphs. We propose the technique...
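The abstract contrasts conventional batching with batching for RNNs, whose dataflow graphs vary per request (different sequence lengths). A minimal sketch of the latency difference, under the simplifying assumptions (not from the paper) that each RNN cell step costs one time unit and all requests arrive together:

```python
# Toy comparison: conventional whole-graph batching vs. batching at
# cell (time-step) granularity, for requests with different sequence lengths.
# Assumption (illustrative only): one cell step = one time unit of latency.

def whole_graph_batching(lengths):
    """Conventional batching: the batch runs until the LONGEST request
    finishes, so every request's latency equals max(lengths)."""
    steps = max(lengths)
    return [steps for _ in lengths]

def cell_level_batching(lengths):
    """Batching at cell granularity: a request leaves the batch as soon
    as its own sequence ends, so its latency equals its own length."""
    return list(lengths)

requests = [3, 10, 4, 2]  # sequence lengths of four concurrent requests
print(whole_graph_batching(requests))  # -> [10, 10, 10, 10]
print(cell_level_batching(requests))   # -> [3, 10, 4, 2]
```

In the whole-graph case, short requests are held hostage by the longest sequence in their batch; batching per cell step lets them complete (and lets newly arriving requests join) at each step, which is the intuition behind the cellular batching the title names.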
