Join us as we explore cutting-edge techniques to optimize Large Language Models (LLMs) for inference! This event will dive into the tradeoffs between performance and cost in both LLMs and Small Language Models (SLMs). Learn how quantization, specifically Activation-aware Weight Quantization (AWQ), compresses models while maintaining strong performance. We'll break down the findings from recent research and show you how to apply these techniques using Hugging Face Transformers. If you're interested in maximizing output while minimizing compute, this is an event you won't want to miss!
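As a taste of the core idea behind AWQ, here is a minimal NumPy sketch (not the paper's implementation or the Transformers API): a handful of "salient" input channels carry large activations, so scaling their weight columns up before round-to-nearest quantization, and folding the inverse scale into the activations, leaves the exact product unchanged while shrinking their quantization error. The matrix sizes, the 4-channel salient fraction, and the scale factor of 4 are illustrative assumptions.

```python
import numpy as np

def quantize_sym(w, bits=4):
    """Symmetric round-to-nearest quantization, one scale per output row."""
    qmax = 2 ** (bits - 1) - 1
    step = np.abs(w).max(axis=1, keepdims=True) / qmax
    step = np.where(step == 0, 1.0, step)
    return np.round(w / step) * step

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 64))   # toy linear-layer weights (out, in)
W[:, :4] *= 0.25               # salient channels: small weights...
X = rng.normal(size=(256, 64)) # calibration activations
X[:, :4] *= 20.0               # ...but large activation magnitudes

# Baseline: plain 4-bit round-to-nearest quantization of W.
err_rtn = np.abs(X @ W.T - X @ quantize_sym(W).T).mean()

# Activation-aware: scale the salient weight columns up and the
# activations down by the same factor; X @ W.T is unchanged in exact
# arithmetic, but the salient channels' quantization error shrinks.
s = np.ones(64)
s[:4] = 4.0
err_awq = np.abs(X @ W.T - (X / s) @ quantize_sym(W * s).T).mean()
```

In this toy setup `err_awq` comes out smaller than `err_rtn`, which is the effect AWQ exploits; the real method additionally searches for the per-channel scales using calibration data.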
Event page: bit.ly/GPUOpti...
Have a question for a speaker? Drop them here:
app.sli.do/eve...
Speakers:
Dr. Greg, Co-Founder & CEO, AI Makerspace
/ gregloughane
The Wiz, Co-Founder & CTO, AI Makerspace
/ csalexiuk
Apply for our new AI Engineering Bootcamp on Maven today!
bit.ly/aie1
For team leaders, check out:
aimakerspace.i...
Join our community to start building, shipping, and sharing with us today!
/ discord
How'd we do? Share your feedback and suggestions for future events.
forms.gle/ZTeb...
Oct 1, 2024