Fine-tuning LLM with QLoRA on Single GPU: Training Falcon-7b on ChatBot Support FAQ Dataset

Venelin Valkov

Published: 21 Oct 2024

Comments: 122

@venelin_valkov 1 year ago

Full-text tutorial (requires MLExpert Pro): www.mlexpert.io/prompt-engineering/fine-tuning-llm-on-custom-dataset-with-qlora

@ko-Daegu 1 year ago

Is this way of fine-tuning for Falcon only, or for any OS model? Also, is it possible to fine-tune a model to pick up a new language? Say it was never trained on French; can it then answer French questions?

@mariocuezzo8027 1 year ago

@@ko-Daegu I want to know this too!

@shivamkapoor7634 1 year ago

I pushed my model to Hugging Face. Can you please tell me how I can deploy that model?

@sithlordi5170 1 year ago

Wow, finally a working guide on how to fine-tune LLMs. Thank you very much 🙏

@LifeTravelerAmmu 1 year ago

Hello Venelin, can you please provide the Colab notebook (falcon-qlora-fine-tuning.ipynb), if possible?

@dataflex4440 1 year ago

Please make a video on how to increase inference speed; that is the major problem everyone is facing.

@maidacundo3471 1 year ago

When adding new special tokens like and shouldn't you add those tokens to the tokenizer, resize the embedding layer of the model, and fine-tune it? I think this should help the model during training, but it would also increase the number of trainable parameters.
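To make the suggestion in this comment concrete, here is a toy, library-free sketch. It assumes a plain dict as the vocabulary and a list of float lists as the embedding table; the token names `<human>` and `<assistant>` are hypothetical placeholders, and initializing new rows to the mean of the existing rows is just one common heuristic, not something shown in the video.

```python
from statistics import fmean


def add_tokens(vocab, embeddings, new_tokens):
    """Register new tokens and grow the embedding table to match.

    vocab: dict mapping token string -> row index
    embeddings: list of equal-length float lists, one row per token
    New rows start at the mean of the rows that existed before the call.
    """
    dim = len(embeddings[0])
    mean_row = [fmean(row[i] for row in embeddings) for i in range(dim)]
    added = 0
    for token in new_tokens:
        if token in vocab:
            continue  # already known, so no new row is needed
        vocab[token] = len(embeddings)
        embeddings.append(list(mean_row))
        added += 1
    return added


vocab = {"hello": 0, "world": 1}
embeddings = [[0.0, 2.0], [4.0, 6.0]]
# "<human>" and "<assistant>" are placeholder names for illustration only.
n_added = add_tokens(vocab, embeddings, ["<human>", "<assistant>", "hello"])
```

With the Hugging Face libraries, the corresponding calls would typically be `tokenizer.add_special_tokens(...)` followed by `model.resize_token_embeddings(len(tokenizer))`. The new rows only become useful if the embedding layer is actually trained, which is where the extra trainable parameters mentioned in the comment come from.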

@thevitorialima 1 year ago

I just subscribed!! Your tutorials are straightforward and to the point. Love your content. Keep up with the amazing content! 🙌 ✨✨✨

@josephtsangko3558 10 months ago

Really nice! Thanks for the clear explanation! I wonder, what is the loss function's input here? What is being compared? Is this self-supervised? So opaque!

@ikjb8561 1 year ago

Great video. Would the response times be faster with a better GPU?

@amnasherafal 1 year ago

Nice video Venelin Valkov, I wanted to ask if I have an input size of 4k+ tokens can I train it on a single GPU?

@ChuanMeng-q4f 1 year ago

For the tokenizer, I think we should set padding_side="left", because it is a causal LLM. What do you think?
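To illustrate the point about the padding side: a decoder-only model continues from the last position of each row, so with right padding the last position of a shorter prompt is a pad token. The sketch below is a minimal, library-free illustration with made-up token ids; the pad id of 0 and the function name `pad_batch` are assumptions for the example.

```python
def pad_batch(sequences, pad_id=0, side="left"):
    """Pad token-id lists to a common length and build attention masks."""
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        gap = width - len(seq)
        pads, ones = [pad_id] * gap, [1] * len(seq)
        if side == "left":
            # Real tokens end at the last position, where generation continues.
            input_ids.append(pads + seq)
            attention_mask.append([0] * gap + ones)
        else:
            # The last position of a short row is a pad token.
            input_ids.append(seq + pads)
            attention_mask.append(ones + [0] * gap)
    return input_ids, attention_mask


prompts = [[11, 12, 13], [21]]
left_ids, left_mask = pad_batch(prompts, side="left")
right_ids, right_mask = pad_batch(prompts, side="right")
```

This matters mainly for batched generation. When every batch holds a single sequence, no padding is added and the setting has no effect.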

@quachhengtony7651 1 year ago

Is the model multilingual? Can I fine tune it in another language?

@henkhbit5748 1 year ago

Great video, and very interesting if you want to fine-tune with your own dataset 👍 A pity that the response took a long time… any idea how to make it faster?

@mariocuezzo8027 1 year ago

Excellent video! I need to configure and train a local GPT to chat with an SQL database. Which is the better option for fine-tuning on a single GPU for that?

@TailorJohnson-l5y 1 year ago

I watch all of your videos, they are wonderful. This one is BY FAR my fav. I know it must have taken a lot of time but THANK YOU so much for doing it! It is so thorough. Can we do the same thing with MTP-7B?

@venelin_valkov 1 year ago

I would guess the training process can be similar for MTP-7B, but can't be sure. Try it and let me know. Thank you for watching!

@TailorJohnson-l5y 1 year ago

@@venelin_valkov I will try and let you know!

@tadificilaxalogin 1 year ago

@@TailorJohnson-l5y Did it work? :D

@TailorJohnson-l5y 1 year ago

@@tadificilaxalogin Idk what I'm doing wrong here, but I have tried to reply to this 4 times and after a day or so it gets removed... It does not work with mtp-7b.

@tadificilaxalogin 1 year ago

@@TailorJohnson-l5y Thanks!! I have made progress with Falcon 40B and RedPajama. Unfortunately, it seems to be difficult to use this algorithm with more than one GPU. Have you set your prompt style for training? I am doing these tests now.

@ashioyajotham 1 year ago

Thank you so much ! Just curious, can it run on a free colab?

@d_b_ 1 year ago

Fantastic tutorial. Does the training data need to be in Question/Answer format? Would this work if instead this data was a single large block of text and not as structured? Do the models need to be on the Hugging Face servers for inference?

@enggm.alimirzashortclipswh6010 1 year ago

Never fine-tune your model on raw data; however, you can do pre-training on raw text.

@d_b_ 1 year ago

@@enggm.alimirzashortclipswh6010 So there's no concept of something like "unsupervised fine tuning"? If I wanted to adapt a LLM on emails I've sent to sound more like me, I would not want to train from scratch would I?

@yashjain6372 1 year ago

@d_b @enggm.alimirzashortclipswh6010 How to fine tune if data look like this? Review(col1) Nice cell phone, big screen, plenty of storage. Stylus pen works well. Analysis(col2) [{“segment”: “Nice cell phone”,“Aspect”: “Cell phone”,“Aspect Category”: “Overall satisfaction”,“sentiment”: “positive”},{“segment”: “big screen”,“Aspect”: “Screen”,“Aspect Category”: “Design”,“sentiment”: “positive”},{“segment”: “plenty of storage”,“Aspect”: “Storage”,“Aspect Category”: “Features”,“sentiment”: “positive”},{“segment”: “Stylus pen works well”,“Aspect”: “Stylus pen”,“Aspect Category”: “Features”,“sentiment”: “positive”}]

@IchSan-jx5eg 1 year ago

Hello, great video so far. Let me ask some questions here: 1. What should I do if my training loss does not decrease consistently (sometimes up, sometimes down)? 2. How do I use multiple GPUs? I always get OOM if I use Falcon-40B, so I rented 2 GPUs from a cloud provider. Unfortunately, it ran on just 1 GPU.

@yashjain6372 1 year ago

Read about the DeepSpeed package.

@tptodorov123 1 year ago

Bravo, Venelin!

@ElearningMode 1 year ago

Thanks for the great video. Can we merge the adapter.bin back into its original model? Can you make a video on it?

@SAVONASOTTERRANEASEGRETA 1 year ago

Hello, since you are very good can you explain two simple things to me? 1- why do Assistants find less than half of what they have in the file? Example: search for Julius Caesar (it is stored 1000 times, but they only find it 10/20 times) question 2 are there any ggml templates specialized in history? Thanks Claudiio

@Mohith7548 1 year ago

I get this error: Any idea on how to resolve this: RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations. Parameter at index 63 has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.

@TheKizoch 1 year ago

I get this same error. Could you resolve it?

@joaoalmeida4380 1 year ago

Hi, thank you for the video! If I want a small model like falcon 7b or other model like t5, to make bots for QA or FAQ, but I need to use and tune for my own language, ex. Portuguese or Spanish. What’s your suggestion? Because I don’t need a large multi language model for this, I think 😅

@bolarinwarahmonismail8248 1 year ago

Does it work without the high RAM? I'm using the free version.

@shivamkapoor7634 1 year ago

I pushed my model to hugging face can you please tell me how can i deploy that model please!

@yusufkemaldemir9393 1 year ago

Some of the models recently published/released are not working on M2 MacOS. Any idea if you could make it feasible for M2 Max MacOS? Thanks

@venelin_valkov 1 year ago

No idea at the moment, there is still no paper with details on the model. You might try the "quickstart" with the transformers library here: huggingface.co/tiiuae/falcon-7b-instruct

@prakaashsukhwal1984 1 year ago

great video Venelin..thanks for sharing! will you be sharing any such training video with dialogue datasets for contextual conversations?

@venelin_valkov 1 year ago

Do you have a dataset in mind? Thanks for watching!

@prakaashsukhwal1984 1 year ago

@@venelin_valkov Somehow I am unable to paste the URLs of the datasets (tried multiple times :( ).. I have shared a suggested list in this Google doc, and thanks again for the wonderful set of videos. docs.google.com/document/d/1wqCKudZnx0XMsJ8J2n1wfOpG68M9chP_8-zeaU7s53g/edit?usp=sharing

@prakaashsukhwal1984 1 year ago

@@venelin_valkov do you think any of the above datasets are useful ? :)

@ggximenez 1 year ago

Does anyone know how to fine-tune a QLoRA over another LoRA on a specific model? There is a LoRA that fine-tunes the original Llama model with a translated and cleaned version of the Alpaca dataset for Brazilian Portuguese. I would like to fine-tune another LoRA over that.

@PavPetukhov 1 year ago

Wow, thanks a lot for the video!

@LinPure 1 year ago

I'm facing this error: mat1 and mat2 shapes cannot be multiplied (26x4544 and 1x10614784) while running this code block: with torch.inference_mode(): outputs = model.generate( input_ids=encoding.input_ids, attention_mask=encoding.attention_mask, generation_config=generation_config, ) Does anyone have any ideas how I could solve this? I'm not sure whether the problem is caused by my using 'prepare_model_for_int8_training' instead of 'prepare_model_for_kbit_training', since I got the error "cannot import name 'prepare_model_for_kbit_training' from 'peft'" even on the latest version of the peft library.

@weystrom 1 year ago

How much VRAM did you end up using?

@venelin_valkov 1 year ago

The Google Colab showed 6.9GB VRAM and 4.6GB RAM, during the training (with parameters shown in the video). Not sure how accurate it is, though.
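A rough back-of-the-envelope check on that figure, assuming a round 7 billion parameters stored at 4 bits each. Activations, LoRA adapter weights, optimizer state, and any layers kept in higher precision are not counted, so this is a lower bound on the weights alone, not a measurement.

```python
def weight_memory_gb(n_params, bits_per_param):
    """Memory for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


n_params = 7e9  # assumed round figure for a "7B" model
fp16_gb = weight_memory_gb(n_params, 16)
int4_gb = weight_memory_gb(n_params, 4)

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
```

Under these assumptions the quantized weights take about half of the reported 6.9GB, with the rest going to everything the estimate leaves out.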

@minhducha8574 11 months ago

How do we compute metrics for this model? When I added compute_metric to the trainer, it raised an error. Can you please add the compute_metric?

@pvlr1788 1 year ago

I don't get why inference is so slow. It should be at least as fast as the training. It's true that each "generate" means the model does inference multiple times, does beam search etc... but the same thing happens when you train the model. What am I missing?

@Timotheeee1 1 year ago

When you train the model, it gets trained on every token in the text batch at once (it outputs logits at every step).
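The difference described in this reply can be sketched with a toy stand-in for the model that only counts forward passes. `CountingModel` and its "next token is the current token plus one" rule are invented for the illustration; nothing here comes from the notebook in the video.

```python
class CountingModel:
    """Stand-in for a causal LM that only counts forward passes."""

    def __init__(self):
        self.forward_calls = 0

    def forward(self, tokens):
        # One pass returns a next-token prediction for every position.
        self.forward_calls += 1
        return [(t + 1) % 100 for t in tokens]


def training_step(model, tokens):
    """Teacher forcing: one pass scores all positions against known targets."""
    predictions = model.forward(tokens[:-1])
    targets = tokens[1:]
    return sum(p == t for p, t in zip(predictions, targets))


def generate(model, prompt, max_new_tokens):
    """Greedy decoding: each new token needs its own pass over the sequence."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model.forward(tokens)[-1])
    return tokens


train_model, gen_model = CountingModel(), CountingModel()
correct = training_step(train_model, [1, 2, 3, 4, 5])
generated = generate(gen_model, [1], max_new_tokens=4)
```

Training on five tokens takes one forward pass, while generating four tokens takes four. Real implementations cache intermediate results so that each generation step is cheaper, but the steps still have to run one after another.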

@pvlr1788 1 year ago

@@Timotheeee1 OK, I see. You mean that during training the model DOES NOT do beam search. Am I right? It just tries to minimize the cross-entropy loss on the next token. I guess beam search is not even differentiable...

@georgetarida5653 1 year ago

Does the custom dataset need to be in English, or could it be in any language?

@venelin_valkov 1 year ago

The Common Crawl dataset (used for this model) contains 40+ languages, so you should be able to use different languages. I haven't tried it myself, though. More info here: commoncrawl.org/ That being said their dataset "RefinedWeb" contains primarily English: huggingface.co/datasets/tiiuae/falcon-refinedweb

@sumitmamoria 1 year ago

why is the inference consistently slower? Do we know how to speed it up ?

@alyssonmach 1 year ago

A question a little outside the context of the video: are deep learning models used as much as machine learning models on tabular data?

@sandesh-n3r 11 months ago

Can we train the model with context (Question: " ", Context: " ", Answer: " "), so the model will answer from the context, like RAG?

@meenalpatidar9405 1 year ago

Can someone please share the code that has been used in this tutorial

@subhamchoudhary4091 1 year ago

I loaded the trained model and it downloaded the whole model again. When I tried generating text according to my use-case with the trained weights, it didn't provide the correct result.

@thisurawz 9 months ago

Can you do a video on fine-tuning a multimodal LLM (Video-LlaMA, LLaVA, or CLIP) with a custom multimodal dataset containing images and texts for relation extraction or a specific task? Can you do it using an open-source multimodal LLM and multimodal datasets like video-llama or similar, so anyone can further their experiments with the help of your tutorial? Can you also talk about how we can boost the performance of the fine-tuned model using prompt tuning in the same video?

@AI_ML_DL_LLM 1 year ago

Thanks for the video. The masked language model (MLM) option is set to "False", so how is the model fine-tuned?

@venelin_valkov 1 year ago

Using "just" language modelling (predict next token). More info here: paperswithcode.com/task/language-modelling
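As a small illustration of "predict next token": the training target for each position is simply the token that follows it. The sketch below uses a made-up word-level vocabulary in place of a real tokenizer, and `next_token_pairs` is a hypothetical helper name.

```python
def next_token_pairs(token_ids):
    """Return (inputs, labels) where labels[i] is the token after inputs[i]."""
    return token_ids[:-1], token_ids[1:]


# Toy word-level "tokenizer"; a real one would produce subword ids.
text = "how do I reset my password".split()
vocab = {word: idx for idx, word in enumerate(text)}
token_ids = [vocab[word] for word in text]

inputs, labels = next_token_pairs(token_ids)
for inp, lab in zip(inputs, labels):
    print(f"given {text[inp]!r} predict {text[lab]!r}")
```

No separate labels are needed, which is why plain text is enough for this objective. In Hugging Face causal LM training the labels are usually a copy of the input ids and the one-position shift is applied inside the model, so the explicit slicing here is only for illustration.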

@amparoconsuelo9451 1 year ago

Can a subsequent SFT and RTHF with different, additional or lesser contents change the character, improve, or degrade a GPT model? Can you modify a GPT model?

@shivamkapoor7634 1 year ago

how to deploy this chat bot model after pushing it to hugging face? i'm talking about qlora fine tuned model

@venelin_valkov 1 year ago

I made a video on this topic: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-HI3cYN0c9ZU.html Thank you for watching!

@cryptojointer 1 year ago

what does bnb_4bit_use_double_quant=True do? tried searching for answers, coming up with nothing! lol

@biodata-i1e 1 year ago

Tried the example, stuck on the training part with the error IndexError: Invalid key: 78 is out of bounds for size 0. Has anyone faced something similar?

@gokhanersoz5239 1 year ago

Could you solve that?

@sathvikreddy4807 1 year ago

hey there, how do I create a generative AI chatbox with my own data? let us say I have data regarding a company and I want to create a "chatgpt" kinda thingy which can answer the questions which I have related to that data I have juggled through the internet today and found 1) Data collection 2) Data preprocessing 3) Selecting a pre trained model(cause it is easy than creating one) 4) Fine tuning the model 5) Iteration This is my understanding as of now so basically how do I have preprocess the data? do I have to learn NLP for that?

@chanderbalaji3539 1 year ago

I followed the code above and got following output return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin) RuntimeError: The size of tensor a (24) must match the size of tensor b (19) at non-singleton dimension 1 kindly help a newbie, only change I made was removing #device_map="auto" when loading the base model as I have dual gpu and it was throwing error with 8 bit

@priyabnsl 1 year ago

Please share the notebook

@aimaven 1 year ago

How do we add our own data? Just change the link in the jupyter notebook?

@kaihaoliu7869 1 year ago

can you share the link to your notebook?

@Jeong5499 1 year ago

My model generates multiple redundant answers e.g. : xxxx : xxxx : xxxx : xxxx. How to solve it?

@brijeshkaran5369 1 year ago

You're the Best 💯thanks a lot for the video! Can you please upload a video implementing this tutorial using langchain framework.🥺

@venelin_valkov 1 year ago

You mean use the trained model with LangChain? Thank you for watching!

@brijeshkaran5369 1 year ago

@@venelin_valkov yes so it'll be useful for the community "end to end" implementation 🙂

@AnimeOtakuArt 1 year ago

Can you make a QLoRA for text-summarization task on Falcon7B. That would be very much helpful. Cheers 🍻🍻

@MattJonesYT 1 year ago

With CUDA you can launch many threads at the same time for a single kernel to solve a problem. Is there a way to do something similar with GPT models? I asked chatgpt and it basically said the limiting factor would probably be the memory needed for each thread might take up about .5 gb. So for instance, if you have 4 gb free GPU RAM after loading the model you should in theory be able to run 8 queries through the gpu at a time. How would that be done with a local gpt?

@pvlr1788 1 year ago

As far as I know, if you have free GPU memory, you simply do batched inference, I guess some kind of cuda multi threading takes place there. You can see that training batch size is 1. I guess that bigger batch would cause GPU OOM error.
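A minimal sketch of the batching idea: split the prompts into fixed-size groups and hand each group to the model in one call. `fake_generate` is a hypothetical stand-in for a real batched model call, and the batch size of 2 is arbitrary; in practice it would be limited by free GPU memory.

```python
def batches(items, batch_size):
    """Yield consecutive chunks of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def run_batched(prompts, batch_size, generate_fn):
    """Call generate_fn once per batch and flatten the results in order."""
    outputs = []
    for batch in batches(prompts, batch_size):
        outputs.extend(generate_fn(batch))
    return outputs


def fake_generate(batch):
    # Hypothetical stand-in for a real model call on a whole batch.
    return [prompt.upper() for prompt in batch]


prompts = ["a", "b", "c", "d", "e"]
results = run_batched(prompts, batch_size=2, generate_fn=fake_generate)
```

Five prompts with a batch size of 2 result in three model calls instead of five. With a real model, the prompts in each batch also have to be padded to a common length before they can be stacked.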

@MattJonesYT 1 year ago

@@pvlr1788 Thank you!!! "Batched inference" is exactly the term I was looking for. I see there are scripts for getting that working on various GPT models so it is correct.

@pvlr1788 1 year ago

@@MattJonesYT it should work for every model, as long as you have enough cuda memory. In case of 7B model, you probably need some top-tier GPU to inference a batch bigger than 1.

@sharathgilla4412 1 year ago

can someone help me out! my issue is I am trying to fine tune dolly V2 using above method but im getting output which it was giving before fine tuning in the video, Im not getting single response as output If anyone faced this issue and fixed it please let me know, do i need to change any config or model ? suggestions are welcome! thanks

@safihaider6715 1 year ago

I am getting error while executing trainer.run() saying: "can't copy out of meta tensor, no data!"

@ShaileshPatel-s1p 1 year ago

I have two sample datasets like below: 1) [{ "en": "Hello, how are you today?", "fr": "Bonjour, comment ça va aujourd'hui ?" },...] 2) [ { "text": "Ravi is a young man from India who loves panipuri." },... ] So how can I fine-tune the Falcon LLM on the above datasets? Please help me.

@ghezalahmad 1 year ago

Thank you so much

@nourghaliaabassi931 1 year ago

is the notebook available ?

@AIwithParissan 11 months ago

many thanks , shall we have colab link or file?

@lifeofcode 1 year ago

I was getting an error from the trainer "paged_adamw_8bit is not a valid optimizer names" though I used the same git urls with commit short hashes as shown in the video for pip install command. I ended up having to clone and install transformers from source to get the proper transformers library with the "paged_adamw_8bit" option.

@venelin_valkov 1 year ago

Strange, just reran the notebook (without changes) and training started as usual.

@lifeofcode 1 year ago

I must have messed up my pip install commands somehow, though I'm not sure how, since I was able to find the commit hash in the GitHub logs. Still, pip gave a "did not find branch or tag 'e03a9cc' assuming revision or ref" error. Luckily I was able to get past it and everything worked beautifully, thank you!

@flaviovitoriano2429 1 year ago

Can anyone help me please? i get the following error on the Training Part: IndexError: Invalid key: 78 is out of bounds for size 0

@flaviovitoriano2429 1 year ago

The error occur in the following line: trainer.train()

@riyajatar6859 11 months ago

If it's an assistant model, shouldn't it respond only when the human asks it a question? Here it generates the questions and answers on its own.

@Ryan-yj4sd 1 year ago

Deploying this model as an API endpoint on hugging face currently fails. Do you know how to fix it? RuntimeError(f\"weight {tensor_name} does not exist\") RuntimeError: weight transformer.word_embeddings.weight does not exist "},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}

@odev6764 1 year ago

I followed your video but I'm struggling with repeated answers. The only modification I made was not sending the model to Hugging Face after training, and it is repeating end text after . I tried changing the dataset to a larger one I have in Portuguese and set max_steps=5000, but got the same issue. Could you give me a tip to avoid this repetition, like you showed in inference before training?

@zorbat5 1 year ago

You should fine tune it, so less data. It is pretrained with a huge amount of data.

@zorbat5 1 year ago

Other than that, it's playing around with different parameters. Try to learn how the parameters affect the behaviour. If it doesn't give you the desired result, go back to the plain downloaded model and train it again. You'll discover a lot of funny behaviour of the AI with different settings. Also, the parameters are sensitive, so keep that in mind. Don't change too much, take it slow.

@pawancreation2311 1 year ago

Hi, I'm struggling with the same issue from 2 days, I have used falcon sharded version and fine tunned it with 2000 custom QA dataset, developed by me. Answer coming is this : How JP Morgan help me? : JP Morgan helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. Can you please suggest what to do as you can see clearly that text is repeating. Please help me 🙏

@pawancreation2311 1 year ago

@@zorbat5 please help me, what can I do please 😭

@zorbat5 1 year ago

@@pawancreation2311 Play around, learn what everything does, and get a feel for how the AI reacts to certain parameters or fine-tunes. Oh, and read some books about machine learning to get a better understanding.

@RoyRajjyoti 1 year ago

great video Venelin. I tried to implement qlora using your code but I am getting this error "RuntimeError: unscale_() has already been called on this optimizer since the last update(). "

@LifeTravelerAmmu 1 year ago

Where can you get the code? Are you typing it manually?

@kaihaoliu7869 1 year ago

I have that too, how did you solve it

@IchSan-jx5eg 1 year ago

@@kaihaoliu7869 I had to install transformers==4.30.1 instead of the newest dev transformers to get rid of the error.

@gokhanersoz5239 1 year ago

Is a T4 enough for training?

@venelin_valkov 1 year ago

The QLoRA adapter is trained using T4, yes!

@oncelscu8089 1 year ago

Bro, I'm new to this LLM stuff. Could you help me out if I ask a few questions?

@gokhanersoz5239 1 year ago

@@oncelscu8089 Of course.

@Purulence-bw7nt 1 year ago

Hi bro. Amazing tutorial. I am getting this error: "ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`question` in this case) have excessive nesting (inputs type `list` where type `int` is expected)." I tried suggested fixed from huggingface and github but can't solve the issue. Any idea how to fix it?

@Purulence-bw7nt 1 year ago

@Pranjal Yadav Thanks for replying.I am following the code line by line. I have tried it on the same dataset he is using. Still getting the same error. Any idea?

@gokhanersoz5239 1 year ago

@@Purulence-bw7nt Did you solve the problem?

@Purulence-bw7nt 1 year ago

@@gokhanersoz5239 No, I couldn't solve it. Have you solved it?

@gokhanersoz5239 1 year ago

@@Purulence-bw7nt No, I couldn’t solve it, I did the 8-bit version for opt without including the same method 4 bits. However, with the newly received updates, there have been changes and different errors occur. opt does not work in the codes I write.

@Purulence-bw7nt 1 year ago

@@gokhanersoz5239 I tried with other optimizers, it fixes the optimizers issue but not sure about the performance since I am not able to start the training process and keep getting the ValueError no matter what I do..