CacheGen: KV Cache Compression and Streaming for Fast Large Language Model Serving
Proceedings of the ACM SIGCOMM 2024 Conference (ACM SIGCOMM '24), 2024
Keywords
Large Language Models, KV Cache, Compression