In this session, we welcome Yutao Sun from Tsinghua University, who co-authored the paper "You Only Cache Once: Decoder-Decoder Architectures for Language Models".
About the paper:
--------------------------
YOCO is a decoder-decoder architecture for LLMs that caches key-value pairs only once, reducing inference memory and prefill latency and increasing throughput across context lengths and model sizes.
🔬 You Only Cache Once: Decoder-Decoder Architectures for Language Models: arxiv.org/pdf/...
📝 Yutao Sun, Li Dong, Yi Zhu, Shaohan Huang, Wenhui Wang, Shuming Ma, Quanlu Zhang, Jianyong Wang, Furu Wei
Read also:
----------------
📰 The Deep Dive. Follow the latest AI research and industry trends: unifyai.substa...
📖 Blogs. Dive into the AI deployment stack: unify.ai/blog
Follow us:
----------------
Website: unify.ai
GitHub: github.com/uni...
Discord: / discord
Twitter: / letsunifyai
Reddit: / unifyai
#ai #machinelearning #deeplearning
Sep 7, 2024