
Llama 3 Fine Tuning for Dummies (with 16k, 32k,... Context) 

32K views

Learn how to easily fine-tune Meta's powerful new Llama 3 language model using Unsloth in this step-by-step tutorial. We cover:
* Overview of Llama 3's 8B and 70B parameter models
* Benchmarks showing Llama 3's strong performance
* How to fine-tune Llama 3 efficiently with Unsloth
* Choosing sequence length based on your data
* Configuring the model, adapter layers, and hyperparameters
* Preparing a custom fine-tuning dataset
* Training the model and monitoring results
* Testing the fine-tuned model on new data
* Saving and publishing your custom model to Hugging Face
Code, example data, and resources provided. Fine-tune Llama 3 for summarization, question-answering, analysis and more. Integrate your model into applications, chatbots, and pipelines.
Free Trial - Our New Diagram Tool: softwaresim.com/pricing/ ("RU-vid24" for 25% Off)
Demonstration Diagram and Code: github.com/nodematiclabs/llama-3-finetune-unsloth
ML Engineering Themed Music via Udio Beta
If you are a cloud, DevOps, or software engineer, you'll probably find our wide range of YouTube tutorials, demonstrations, and walkthroughs useful - please consider subscribing to support the channel.
0:00 Introduction
1:38 Conceptual Overview
5:31 Initial Setup
7:05 Token Counting
9:57 Model and Quantization
10:23 QLoRA Adapter Layers
11:05 Dataset Preparation
16:43 Trainer Specification
18:43 Inference Testing
21:12 Model Saving (Hugging Face)

Science

Published: 24 Apr 2024

Comments: 47
@MYPL89 · 1 month ago
What is the difference between push_to_hub and push_to_hub_merged (in 4 bits)? Great video btw, many thanks!!
@triynizzles · 3 months ago
At 10:32 the Unsloth comment says "We only need to update 1 to 10% of all parameters" - what does that mean? I recently created my own training data with 1015 questions and answers; when I run the trainer for 1 epoch, it only does 127 steps. Shouldn't it do more?
@nodematic · 3 months ago
There are "base model" parameters, and then "adapter layer" parameters that are added at the end of the base model, when doing this LoRA fine-tuning. The comment is highlighting that we are only working with the adapter layers at the end when doing this fine tuning - which is around 1-10% of all the parameters. This is normal. You could do full-parameter fine-tuning (which updates the base model parameters), but that's not worth the high computational demands and complexity for most use cases. Each one of your steps is doing a batch when fine-tuning. The effective batch size is: per_device_train_batch_size * gradient_accumulation_steps * number_of_devices. For the demonstrated setup, the effective batch size is 8, meaning 127 steps covers up to 127*8=1016 of your Q&A examples. So, you're using all your Q&A examples, and doing a full pass over your training data, in the 127 steps. You could bump the epochs value if you want to do multiple passes.
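For example, here is a minimal sketch of that arithmetic (the batch-size values are assumptions matching the demonstrated notebook, not something pulled from your run):

```python
# Hypothetical values matching the demonstrated setup (assumptions).
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
number_of_devices = 1  # single Colab GPU

# Effective batch size = examples consumed per training step.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * number_of_devices
)

num_examples = 1015
steps_per_epoch = -(-num_examples // effective_batch_size)  # ceiling division

print(effective_batch_size)  # 8
print(steps_per_epoch)       # 127
```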
@drnicwilliams · 5 months ago
LOL “We don’t need this code, so let’s put it in a text cell”
@ralvin3614 · 4 months ago
Really love the playlist! Great video.
@triynizzles · 4 months ago
I have been having tremendous difficulty. Can this be run locally in VS Code?
@nodematic · 4 months ago
We haven't tested this, but it should work. The biggest concern would be insufficient GPU memory on your local machine, or not having a clean Python package and CUDA setup.
@triynizzles · 4 months ago
@nodematic I have read about it more, and it looks like Windows isn't acting too friendly and most people are running Linux. :(
@slimpbisquitte3942 · 5 months ago
Really comprehensive and well-explained! Great work! I wonder if it is also possible to fine-tune not a text generator but an image generator. Does anyone have any ideas? I am super new to this field and pretty much in the dark so far. I could not find anything for image generation yet :/ Thanks for any suggestions!
@nodematic · 5 months ago
We'll try to make a video on this. Thanks for the suggestion.
@AshrafiArad · 3 months ago
Great video. Loved the fun generated music. "We don't need this code, so let's put it in the text cell" =))
@ChituyiDalmasWakhusama · 4 months ago
Hi, I keep getting this error: "TypeError: argument of type 'NoneType' is not iterable". It originates from "usr/local/lib/python3.10/dist-packages/transformers/utils/hub.py". Could you please share the requirements.txt? Also, it only happens when I try to push "merge_16bit"; merge_4bit works just fine!
@shehanmkg · 5 months ago
Great explanation. This could be a stupid question: how do we fine-tune for triggering function calls?
@nodematic · 5 months ago
Thanks for your question - it's definitely not a stupid one! In your dataset, have fields like "instruction", "prompt", and "function", and then do the string interpolation to create your text field (you could do it similarly to the video, but replace "### Story" with "### Prompt" and "### Summary" with "### Function"). Make sure your training set has a consistent format for the function to trigger, and a consistent fallback value for non-triggering cases. Overall, the process should be quite similar to the video. Your model itself won't be able to actually trigger the function - only identify the right function to trigger (and possibly the arguments to supply to the function). You'll need to execute the function as a "next step" in some broader pipeline, application, service, or script. Hope I'm understanding the question correctly and that helps.
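As a rough sketch (the field names and example records here are illustrative, not a fixed schema):

```python
# Building the "text" field for function-calling fine-tuning, following the
# Alpaca-style header format described above. All record values are made up.
from datasets import Dataset

examples = [
    {
        "instruction": "Select the function to call for the user's request.",
        "prompt": "What's the weather in Berlin tomorrow?",
        "function": 'get_weather(location="Berlin", day="tomorrow")',
    },
    {
        "instruction": "Select the function to call for the user's request.",
        "prompt": "Tell me a joke.",
        "function": "none",  # consistent fallback for non-triggering cases
    },
]

texts = [
    f"{e['instruction']}\n\n### Prompt\n{e['prompt']}\n\n### Function\n{e['function']}"
    for e in examples
]
dataset = Dataset.from_dict({"text": texts})
```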
@danielhanchen · 5 months ago
Oh fantastic video as always - absolutely packed with detailed information so great work!
@SahlEbrahim · 2 months ago
Why aren't we tokenizing the fine-tuning dataset? Is it automatically done in the SFT trainer?
@nodematic · 2 months ago
Yes, it's done by the Trainer.
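Roughly, the wiring looks like this (a sketch using trl's SFTTrainer as in the video; model, tokenizer, and dataset come from the notebook's earlier cells, and the hyperparameter values are assumptions):

```python
from trl import SFTTrainer
from transformers import TrainingArguments

# The trainer tokenizes the raw "text" column internally, so the dataset
# only needs plain strings.
trainer = SFTTrainer(
    model=model,                 # (Q)LoRA-wrapped model from earlier cells
    tokenizer=tokenizer,
    train_dataset=dataset,       # a Dataset with a "text" column
    dataset_text_field="text",   # column the trainer tokenizes for you
    max_seq_length=16384,        # match your (RoPE-scaled) context window
    args=TrainingArguments(
        output_dir="outputs",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
    ),
)
```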
@nishitp28 · 5 months ago
Nice video. What should the format be for data extraction, if I want to extract data from a chunk? Can I include something like: """ {Instruction or System Prompt} ### {Context or Chunks} ### {Question} ### {Answer} """
@nodematic · 5 months ago
The "###" lines signify headers, so I wouldn't put your content on those lines - rather, they are used to categorize the line(s) of text below each header. If you're using a chunk of content (e.g., via some sort of RAG approach), yes, you could have that as a separate categorization. Something like: """ {instruction} ### Background {chunk} ### Question {question} ### Answer {answer} """ For the best results, use the header terms in your instruction. For the example above, this could be something like "Based on the provided background, which comes from documentation, FAQs, and/or support tickets, answer the supplied question as clearly and factually as possible. If the background is insufficient to answer the question, answer "I don't know".".
@alokrajsidhaarth7130 · 5 months ago
Great Video! I had a doubt about RoPE Scaling. How efficient is it and to what extent does it help solve the LLM context window size issue? Thanks!
@nodematic · 5 months ago
RoPE is the standard way to solve the context window size issue with these open models. It can come at a quality cost, but it's basically the best method we have if you need to go beyond the model's default context window. Use it only if you truly need the additional tokens. In the video's example, the RoPE scaling is needed, because you simply can't summarize a 16k token story by only looking at the second-half 8k of tokens.
@npip99 · 5 months ago
@nodematic Is there an easy API for RoPE? I don't even need fine-tuning, I just need a chat completion API for 32k-context Llama 3.
@nodematic · 5 months ago
Yes, you can use RoPE without fine-tuning (e.g., off-the-shelf Llama 3 with a 32k context). I would recommend using Hugging Face libraries, which can be configured for RoPE scaling (for example TGI RoPE scaling is detailed here huggingface.co/docs/text-generation-inference/en/basic_tutorials/preparing_model).
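A minimal sketch with the transformers library (rope_scaling is a real Llama config field, but the scaling type and factor chosen here are assumptions - tune them to your context needs):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Override the config's RoPE scaling at load time: linear scaling with
# factor 4.0 stretches the default 8k window to roughly 32k tokens.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    rope_scaling={"type": "linear", "factor": 4.0},
    device_map="auto",
)
```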
@ShdowGarden · 4 months ago
Hi, I am fine-tuning the Llama 3 model but I am facing some issues. Your video was great. I was hoping to connect with you. Can we connect?
@nodematic · 4 months ago
Thanks. You can reach out via email at community@nodematic.com. We often do not have the staff to handle technical troubleshooting or architectural consulting, but we'll answer if we can.
@triynizzles · 4 months ago
Hello, I don't understand: at 11:00, how can I change "yahma/alpaca-cleaned" to a local .json file on my PC?
@nodematic · 4 months ago
The Hugging Face datasets library is used in either case, to compile a dataset of training strings. The load_dataset("yahma/alpaca-cleaned") approach (or similar) is only if you have your dataset in Hugging Face. The Dataset.from_dict used in the video should work if you read in the data from your local json and use it for the dictionary's "text" value. Depending upon how the text is structured in your JSON, you may need to do string interpolation - the end result "text" values for the dataset need to be pure strings.
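As a sketch, assuming your local file is a JSON list of records with "instruction", "input", and "output" keys (the filename is hypothetical; adjust the interpolation to your actual structure):

```python
import json
from datasets import Dataset

# Read a local JSON file instead of pulling "yahma/alpaca-cleaned" from
# the Hugging Face Hub.
with open("my_dataset.json") as f:
    records = json.load(f)

# The dataset's "text" values must be pure strings, so interpolate here.
texts = [
    f"{r['instruction']}\n\n### Input\n{r['input']}\n\n### Output\n{r['output']}"
    for r in records
]
dataset = Dataset.from_dict({"text": texts})
```

Alternatively, load_dataset("json", data_files="my_dataset.json") reads local files directly, but you'd still need to map the records into a "text" column.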
@triynizzles · 4 months ago
@nodematic Thank you! I may have more questions in the future. :)
@Itsgosm · 5 months ago
Amazing video! I've been curious: if I had to train on a set of code samples with indentation (take Python code, for example), would the data still need to be in the standard format of 'instruction', 'output', and 'input'? With 150+ fairly complex code samples, would it be possible to train? Are there other ways to set up the dataset? And is Llama 3 capable of being trained on unstructured data?
@nodematic · 5 months ago
Yes, you could use a different, non-Alpaca-style format. For the "text" field creation via string interpolation, replace that with a text block of your code lines (including line breaks). Llama-3 does well on HumanEval, so I suspect it would work well for your described use case. Just be careful with how you create your samples - getting the model to stop after generating the right line/block of code may not be easy (although you could trim things down with post-processing).
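A rough sketch of one such sample (the "### End" stop marker is an assumption - just one way to make trimming easier):

```python
# One training sample's "text" value for code fine-tuning. Indentation is
# preserved because the code is embedded verbatim in the string.
code_sample = '''def fibonacci(n):
    if n < 2:
        return n
    return fibonacci(n - 1) + fibonacci(n - 2)'''

text = (
    "Write a recursive Fibonacci function in Python.\n\n"
    "### Code\n"
    f"{code_sample}\n\n"
    "### End"  # consistent stop marker you can trim in post-processing
)
```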
@simonstrandgaard5503 · 4 months ago
Great explanation. The background music was a little distracting.
@nodematic · 4 months ago
Thanks for the feedback - we'll keep this in mind on future videos.
@galavant_amli · 2 months ago
It would be better if someone gave a noob tutorial or guide on how to prepare a dataset. I get that the data is a set of inputs and outputs, but I don't know how to label the data.
@SameerUddin-q5k · 5 months ago
Do we need to create the repo first, before the push-to-hub command?
@nodematic · 5 months ago
No, just replace "hf/model" with your username (or organization name) and desired model name. Also, if you want a private repo, add a private=True argument to push_to_hub_merged.
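For example (a sketch - the repo name and token are placeholders, and the save_method values follow Unsloth's documented options):

```python
# Push the merged model to the Hub; the repo is created automatically if it
# doesn't exist yet.
model.push_to_hub_merged(
    "your-username/llama-3-custom",  # your username/org + desired model name
    tokenizer,
    save_method="merged_16bit",      # or "merged_4bit", "lora"
    token="hf_...",                  # a Hugging Face write token
    private=True,                    # omit for a public repo
)
```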
@adnenbenabdelaali6016 · 5 months ago
Great video and nice code. Can you do this context-length extension for the DeepSeek Coder model?
@nodematic · 5 months ago
I believe it's possible, but I haven't tried yet and there isn't an existing Unsloth model for this. We'll look into it though and try to create a video. Thanks for the suggestion.
@minidraco2601 · 5 months ago
What's the name of the song at 3:47? It sounds pretty cool.
@nodematic · 5 months ago
That's a Udio-generated custom song, and isn't published.
@krishparwani2039 · 3 months ago
This is not for dummies; I could not understand anything.
@LaelAl-Halawani-c4l · 4 months ago
I hate how everyone who does Unsloth tutorials isn't able to use a multi-GPU setup.
@artemvakhutinskiy900 · 3 months ago
Not gonna lie, the AI song was a banger.
@Gootsffrida · 8 days ago
You lost me within 60 secs. How is this for dummies?