Explore how to make LLMs faster and more compact with my latest tutorial on Activation-aware Weight Quantization (AWQ)! In this video, I demonstrate how to apply AWQ to quantize Llama 3, producing a model that is both quicker and smaller than its non-quantized counterpart. Dive into the details of the process and see the benefits in real time. If you found this video helpful, don't forget to like, comment, and subscribe for more insightful content like this!
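For reference, here is a minimal sketch of this kind of quantization workflow, assuming the AutoAWQ library is used; the model path, output path, and quantization settings below are illustrative examples, not taken from the video:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Illustrative paths; substitute your own model ID and output directory
model_path = "meta-llama/Meta-Llama-3-8B-Instruct"
quant_path = "llama-3-8b-instruct-awq"

# Typical 4-bit AWQ settings: zero-point quantization with group size 128
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and its tokenizer
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run activation-aware quantization (uses a built-in calibration dataset by default)
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantized weights and tokenizer for later inference
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)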
Join this channel to get access to perks:
@aianytime
To further support the channel, you can contribute via the following methods:
Bitcoin Address: 32zhmo5T9jvu8gJDGW3LTuKBM1KPMHoCsW
UPI: sonu1000raw@ybl
GitHub: github.com/AIA...
Activation-aware Weight Quantization (AWQ) research paper: arxiv.org/pdf/...
Quantized Model on HF here: huggingface.co...
#llama3 #genai #ai
Sep 29, 2024