This video demonstrates an innovative workflow that combines Meta's open-weight Llama 3 8B model with efficient fine-tuning techniques (LoRA and PEFT) to deploy highly capable AI on resource-constrained devices.
We start by using a 4-bit quantized version of the Llama 3 8B model and fine-tune it on a custom dataset. The fine-tuned model is then exported in the GGUF format, optimized for efficient deployment and inference on edge devices using the GGML library.
Impressively, the fine-tuned Llama 3 8B model accurately recalls and generates responses based on our custom dataset when run locally on a MacBook. This demo highlights the effectiveness of combining quantization, efficient fine-tuning, and optimized inference formats to deploy advanced language AI on everyday devices.
Join us as we explore the potential of fine-tuning and efficiently deploying the Llama 3 8B model on edge devices, making AI more accessible and opening up new possibilities for natural language processing applications.
Be sure to subscribe to stay up-to-date on the latest advances in AI.
My Links
Subscribe: / @scott_ingram
X.com: / scott4ai
GitHub: github.com/sco...
Hugging Face: huggingface.co...
Links:
Colab Demo: colab.research...
Dataset: github.com/sco...
Unsloth Colab: colab.research...
Unsloth Wiki: github.com/uns...
Unsloth Web: unsloth.ai/
25 сен 2024