Low latency RNN inference with cellular batching
EUROSYS '18: PROCEEDINGS OF THE THIRTEENTH EUROSYS CONFERENCE, pp. 31:1-31:15, 2018.
Inference on pre-trained neural network models must meet low-latency requirements, which are often at odds with achieving high throughput. Existing deep learning systems use batching to improve throughput, but this does not perform well when serving Recurrent Neural Networks with dynamic dataflow graphs. We propose the technique…