Does this approach work only for Falcon, or for any open-source model? Also, is it possible to fine-tune a model to pick up a new language? For example, a model that was never trained on French but can then answer questions in French?
When adding new special tokens, shouldn't you add those tokens to the tokenizer, resize the embedding layer of the model, and then fine-tune it? I think this should help the model during training, but it also increases the number of trainable parameters.
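A rough sketch of what that bookkeeping amounts to, written in plain Python so it runs without any model. The vocabulary, embedding size, and token names (`<user>`, `<bot>`) are made up for illustration. In the transformers library the corresponding calls are `tokenizer.add_special_tokens(...)` and `model.resize_token_embeddings(len(tokenizer))`.

```python
import random

def add_special_tokens(vocab, embeddings, new_tokens, seed=0):
    """Append unseen tokens to the vocab and add one embedding row per token."""
    rng = random.Random(seed)
    dim = len(embeddings[0])
    added = 0
    for token in new_tokens:
        if token in vocab:
            continue  # already known, so no resize is needed for it
        vocab[token] = len(vocab)
        # New rows start as small random values, which is why they need training.
        embeddings.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
        added += 1
    return added

# Hypothetical toy vocabulary with 4-dimensional embeddings.
vocab = {"hello": 0, "world": 1, "<eos>": 2}
embeddings = [[0.0] * 4 for _ in vocab]

added = add_special_tokens(vocab, embeddings, ["<user>", "<bot>", "<eos>"])
extra_params = added * len(embeddings[0])  # new trainable values in this layer
```

Each new token adds one row of `dim` values, so the extra parameter count grows with the hidden size of the model.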
Really nice! Thanks for the clarity of the explanation! I wonder, what is the loss function's input here? What is being compared? Is this self-supervised? So opaque!
Great video, and very interesting if you want to fine-tune with your own dataset 👍 A pity that the response took a long time… any idea how to make it faster?
Excellent video! I need to configure and train a local GPT to chat with a SQL database. Which model is the best option for fine-tuning on a single GPU for that?
I watch all of your videos, they are wonderful. This one is BY FAR my favorite. I know it must have taken a lot of time, but THANK YOU so much for doing it! It is so thorough. Can we do the same thing with MPT-7B?
@@tadificilaxalogin I don't know what I'm doing wrong here, but I have tried to reply to this 4 times and after a day or so it gets removed... It does not work with MPT-7B.
@@TailorJohnson-l5y Thanks!! I have made progress with Falcon 40B and RedPajama. Unfortunately, it seems to be difficult to use this algorithm with more than one GPU. Have you set your prompt style for training? I am running these tests now.
Fantastic tutorial. Does the training data need to be in Question/Answer format? Would this work if instead this data was a single large block of text and not as structured? Do the models need to be on the Hugging Face servers for inference?
@@enggm.alimirzashortclipswh6010 So there's no concept of something like "unsupervised fine-tuning"? If I wanted to adapt an LLM on emails I've sent so that it sounds more like me, I would not want to train from scratch, would I?
Hello, great video so far. Let me ask some questions here: 1. What should I do if my training loss does not decrease consistently (sometimes up, sometimes down)? 2. How do I use multiple GPUs? I always get OOM if I use Falcon-40B, so I rented 2 GPUs from a cloud provider. Unfortunately, it ran on just 1 GPU.
Hello, since you are very good, can you explain two simple things to me? 1- Why do Assistants find less than half of what is in the file? Example: searching for Julius Caesar (it is stored 1000 times, but they only find it 10-20 times). 2- Are there any ggml templates specialized in history? Thanks, Claudiio
I get this error. Any idea how to resolve it? RuntimeError: Expected to mark a variable ready only once. This error is caused by one of the following reasons: 1) Use of a module parameter outside the `forward` function. Please make sure model parameters are not shared across multiple concurrent forward-backward passes. or try to use _set_static_graph() as a workaround if this module graph does not change during training loop.2) Reused parameters in multiple reentrant backward passes. For example, if you use multiple `checkpoint` functions to wrap the same part of your model, it would result in the same set of parameters been used by different reentrant backward passes multiple times, and hence marking a variable ready multiple times. DDP does not support such use cases in default. You can try to use _set_static_graph() as a workaround if your module graph does not change over iterations. Parameter at index 63 has been marked as ready twice. This means that multiple autograd engine hooks have fired for this particular parameter during this iteration. You can set the environment variable TORCH_DISTRIBUTED_DEBUG to either INFO or DETAIL to print parameter names for further debugging.
Hi, thank you for the video! I want to use a small model like Falcon 7B, or another model like T5, to build bots for QA or FAQ, but I need to use and tune it for my own language, e.g. Portuguese or Spanish. What's your suggestion? I don't think I need a large multi-language model for this 😅
No idea at the moment, there is still no paper with details on the model. You might try the "quickstart" with the transformers library here: huggingface.co/tiiuae/falcon-7b-instruct
@@venelin_valkov Somehow I am unable to paste the URLs of the datasets (tried multiple times :( ). I have shared a suggested list in this Google doc, and thanks again for the wonderful set of videos. docs.google.com/document/d/1wqCKudZnx0XMsJ8J2n1wfOpG68M9chP_8-zeaU7s53g/edit?usp=sharing
Does anyone know how to fine-tune a QLoRA on top of another LoRA for a specific model? There is a LoRA that fine-tunes the original Llama model with a translated and cleaned version of the Alpaca dataset for Brazilian Portuguese. I would like to fine-tune another LoRA on top of that.
I'm facing this error: mat1 and mat2 shapes cannot be multiplied (26x4544 and 1x10614784) while running this code block: with torch.inference_mode(): outputs = model.generate( input_ids=encoding.input_ids, attention_mask=encoding.attention_mask, generation_config=generation_config, ) Does anyone have any idea how I could solve this? I'm not sure whether the problem is caused by my using 'prepare_model_for_int8_training' instead of 'prepare_model_for_kbit_training', since I got the error "cannot import name 'prepare_model_for_kbit_training' from 'peft'" even on the latest version of the peft library.
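One way to read the shapes in that message, offered as a guess rather than a diagnosis: if the second operand is a 4-bit quantized weight stored two values per byte (an assumption about the storage format), then the flat size 10614784 unpacks to a layer whose input width is exactly 4544. That would suggest the packed weight reached an ordinary matrix multiply without being dequantized first. The helper name below is made up for this arithmetic check.

```python
def unpacked_out_features(packed_numel, in_features, values_per_byte=2):
    """Return the output width implied by a packed weight, or None if it does not divide evenly."""
    total_values = packed_numel * values_per_byte
    if total_values % in_features != 0:
        return None
    return total_values // in_features

# Numbers taken from the error message: (26 x 4544) times (1 x 10614784).
out_features = unpacked_out_features(10614784, 4544)
```

Here `out_features` comes out as a whole number (4672), which fits the packed 4-bit reading. If that reading is right, the library versions and the k-bit preparation step are the first things worth checking.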
I don't get why inference is so slow. It should be at least as fast as the training. It's true that each "generate" means the model does inference multiple times, does beam search etc... but the same thing happens when you train the model. What am I missing?
@@Timotheeee1 OK, I see. You mean that during training the model does NOT do beam search. Am I right? It just tries to minimize cross-entropy loss on the next token. I guess beam search is not even differentiable...
The Common Crawl dataset (used for this model) contains 40+ languages, so you should be able to use different languages. I haven't tried it myself, though. More info here: commoncrawl.org/ That being said, their dataset "RefinedWeb" contains primarily English: huggingface.co/datasets/tiiuae/falcon-refinedweb
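To make the "cross-entropy on the next token" point concrete, here is a small plain-Python sketch with made-up logits and a 3-token vocabulary. The loss compares the model's predicted distribution at position i with the token that actually appears at position i + 1 in the same text, so the text is its own label. During training all positions are scored from a single forward pass, whereas generation needs one forward pass per new token, which is one reason generating can feel slower.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_loss(logits_per_position, token_ids):
    """Mean cross-entropy where position i is scored against token i + 1."""
    losses = []
    for i in range(len(token_ids) - 1):
        probs = softmax(logits_per_position[i])
        target = token_ids[i + 1]  # the label is simply the next token
        losses.append(-math.log(probs[target]))
    return sum(losses) / len(losses)

# Hypothetical example: a 3-token sequence over a vocabulary of size 3.
token_ids = [0, 2, 1]
uniform = [[0.0, 0.0, 0.0]] * 3                    # model has no preference
confident = [[0.0, 0.0, 5.0], [0.0, 5.0, 0.0], [0.0, 0.0, 0.0]]

loss_uniform = next_token_loss(uniform, token_ids)
loss_confident = next_token_loss(confident, token_ids)
```

The uniform model gets a loss of ln(3), and the model that favors the correct next tokens gets a lower loss. No beam search or sampling is involved at any point.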
I loaded the trained model and it downloaded the whole model again. When I tried generating text according to my use-case with the trained weights, it didn't provide the correct result.
Can you do a video on fine-tuning a multimodal LLM (Video-LLaMA, LLaVA, or CLIP) with a custom multimodal dataset containing images and texts for relation extraction or another specific task? Can you do it using an open-source multimodal LLM and multimodal datasets, like Video-LLaMA or similar, so anyone can take their experiments further with the help of your tutorial? Can you also talk about how we can boost the performance of the fine-tuned model using prompt tuning in the same video?
Can a subsequent SFT and RLHF with different, additional, or lesser contents change the character of a GPT model, or improve or degrade it? Can you modify a GPT model?
Hey there, how do I create a generative AI chatbot with my own data? Let's say I have data about a company and I want to create a "ChatGPT" kind of thing that can answer questions I have related to that data. I have searched the internet today and found: 1) Data collection 2) Data preprocessing 3) Selecting a pre-trained model (because it is easier than creating one) 4) Fine-tuning the model 5) Iteration. This is my understanding as of now. So basically, how do I preprocess the data? Do I have to learn NLP for that?
I followed the code above and got the following output: return (q * cos) + (rotate_half(q) * sin), (k * cos) + (rotate_half(k) * sin) RuntimeError: The size of tensor a (24) must match the size of tensor b (19) at non-singleton dimension 1. Kindly help a newbie. The only change I made was removing #device_map="auto" when loading the base model, as I have dual GPUs and it was throwing an error with 8-bit.
With CUDA you can launch many threads at the same time for a single kernel to solve a problem. Is there a way to do something similar with GPT models? I asked ChatGPT and it basically said the limiting factor would probably be memory: each thread might take up about 0.5 GB. So, for instance, if you have 4 GB of free GPU RAM after loading the model, you should in theory be able to run 8 queries through the GPU at a time. How would that be done with a local GPT?
As far as I know, if you have free GPU memory, you simply do batched inference; I guess some kind of CUDA multithreading takes place there. You can see that the training batch size is 1. I guess that a bigger batch would cause a GPU OOM error.
@@pvlr1788 Thank you!!! "Batched inference" is exactly the term I was looking for. I see there are scripts for getting that working on various GPT models so it is correct.
@@MattJonesYT It should work for every model, as long as you have enough CUDA memory. In the case of a 7B model, you probably need a top-tier GPU to run inference on a batch bigger than 1.
Can someone help me out? I am trying to fine-tune Dolly V2 using the above method, but I'm getting the kind of output the model gave before fine-tuning in the video; I'm not getting a single response as output. If anyone has faced this issue and fixed it, please let me know. Do I need to change any config or the model? Suggestions are welcome! Thanks
I have two sample datasets like below: 1) [{ "en": "Hello, how are you today?", "fr": "Bonjour, comment ça va aujourd'hui ?" },...] 2) [ { "text": "Ravi is a young man from India who loves panipuri." },... ] How can I fine-tune on the above datasets using the Falcon LLM model? Please help me.
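For anyone wondering what "batched inference" means mechanically, here is a minimal plain-Python sketch. The token ids and the pad id are invented, and the 0.5 GB per query figure is just the estimate quoted earlier in this thread, not a measurement. Several prompts are padded to one length so they can go through the model together, and an attention mask marks which positions are real. Padding on the left is the usual choice for decoder-only generation, so the newest token of every prompt lines up at the end.

```python
def left_pad_batch(sequences, pad_id=0):
    """Pad token-id lists on the left to a common length and build the attention mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))  # 0 marks padding
    return input_ids, attention_mask

def max_concurrent_queries(free_gb, gb_per_query):
    """Back-of-envelope batch size from free memory and a per-query estimate."""
    return int(free_gb // gb_per_query)

# Two hypothetical tokenized prompts of different lengths.
ids, mask = left_pad_batch([[5, 6, 7], [8]])
batch_size = max_concurrent_queries(free_gb=4, gb_per_query=0.5)
```

In practice the per-query memory grows with prompt and output length, so the estimate is only a starting point for trying batch sizes.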
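One possible way to turn both kinds of records into training text, sketched in plain Python. The prompt wording ("Translate ... Input: ... Output: ...") is a made-up template, not the one from the video. The idea is only that each record becomes a single string, which can then be tokenized like any other causal language modeling example.

```python
def format_translation(record, src="en", tgt="fr"):
    """Turn a translation pair into one instruction-style training string."""
    return (
        f"Translate {src} to {tgt}.\n"
        f"Input: {record[src]}\n"
        f"Output: {record[tgt]}"
    )

def format_plain(record):
    """Plain text records are used as-is for next-token training."""
    return record["text"].strip()

# The two sample records from the comment above.
pairs = [{"en": "Hello, how are you today?", "fr": "Bonjour, comment ça va aujourd'hui ?"}]
plain = [{"text": "Ravi is a young man from India who loves panipuri."}]

train_texts = [format_translation(r) for r in pairs] + [format_plain(r) for r in plain]
```

Whatever template is used for training should be reused unchanged at inference time, so the model sees the same structure it learned.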
I was getting an error from the trainer "paged_adamw_8bit is not a valid optimizer names" though I used the same git urls with commit short hashes as shown in the video for pip install command. I ended up having to clone and install transformers from source to get the proper transformers library with the "paged_adamw_8bit" option.
I must have messed up my pip install commands somehow, though I'm not sure how, since I was able to find the commit hash in the GitHub logs. Still, pip gave a "did not find branch or tag 'e03a9cc' assuming revision or ref" error. Luckily I was able to get past it and everything worked beautifully. Thank you!
Deploying this model as an API endpoint on hugging face currently fails. Do you know how to fix it? RuntimeError(f\"weight {tensor_name} does not exist\") RuntimeError: weight transformer.word_embeddings.weight does not exist "},"target":"text_generation_launcher","span":{"rank":0,"name":"shard-manager"},"spans":[{"rank":0,"name":"shard-manager"}]}
I followed your video, but I'm struggling with repeated answers. The only modification I made was not pushing the model to Hugging Face after training, and it keeps repeating the end of the text after . I tried changing the dataset to a larger one I have in Portuguese and set max_steps=5000, but I get the same issue. Could you give me a tip to avoid this repetition, like you showed in inference before training?
Other than that, it's playing around with different parameters. Try to learn how the parameters affect the behaviour. If it doesn't give you the desired result, go back to the plain downloaded model and train it again. You'll discover a lot of funny behaviour of the AI with different settings. Also, the parameters are sensitive, so keep that in mind. Don't change too much; take it slow.
Hi, I have been struggling with the same issue for 2 days. I used the sharded Falcon version and fine-tuned it with a custom QA dataset of 2000 examples that I developed myself. The answer I get is this: How JP Morgan help me? : JP Morgan helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. They helped me to understand the market and the opportunities that were available. Can you please suggest what to do? As you can clearly see, the text is repeating. Please help me 🙏
@@pawancreation2311 Play around, learn what everything does, and get a feel for how the AI reacts to certain parameters or fine-tunes. Oh, and read some books about machine learning to get a better understanding.
Great video, Venelin. I tried to implement QLoRA using your code, but I am getting this error: "RuntimeError: unscale_() has already been called on this optimizer since the last update(). "
Hi bro. Amazing tutorial. I am getting this error: "ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`question` in this case) have excessive nesting (inputs type `list` where type `int` is expected)." I tried the suggested fixes from Hugging Face and GitHub but can't solve the issue. Any idea how to fix it?
@Pranjal Yadav Thanks for replying. I am following the code line by line. I have tried it on the same dataset he is using. Still getting the same error. Any idea?
@@Purulence-bw7nt No, I couldn't solve it. I did the 8-bit version for OPT without using the 4-bit method. However, with the latest updates there have been changes, and different errors occur. OPT does not work in the code I write.
@@gokhanersoz5239 I tried other optimizers. That fixes the optimizer issue, but I'm not sure about the performance, since I am not able to start the training process and keep getting the ValueError no matter what I do.
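One of the parameters worth trying for the looping output above is n-gram blocking. The sketch below is a plain-Python illustration of the idea with made-up token ids, not the library's actual implementation. Before each new token is chosen, any token that would recreate an n-gram already present in the output is banned. In the transformers library the related generation settings are `no_repeat_ngram_size` and `repetition_penalty`.

```python
def banned_next_tokens(generated, ngram_size):
    """Return the tokens that would complete an n-gram already present in `generated`."""
    if ngram_size < 1 or len(generated) < ngram_size:
        return set()
    # The last (n - 1) tokens are the prefix the next token would extend.
    if ngram_size > 1:
        prefix = tuple(generated[len(generated) - (ngram_size - 1):])
    else:
        prefix = ()
    banned = set()
    for i in range(len(generated) - ngram_size + 1):
        ngram = tuple(generated[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

# Hypothetical token ids: the output so far ends with "1 2", and "1 2 3" was already seen.
blocked = banned_next_tokens([1, 2, 3, 1, 2], ngram_size=3)
```

This only treats the symptom at generation time. If the training examples never end with an end-of-sequence token, the model may also not have learned when to stop, which would be worth checking separately.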
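On the ValueError in this thread: one possible reading of the message, and it is only a guess, is that a raw text column such as `question` is still present when the batch is built, so the collator tries to turn strings into a tensor. The plain-Python sketch below, with hypothetical token ids, shows the shape a batch needs: only numeric model inputs are kept, long examples are truncated, and short ones are padded to the same width. With the datasets library, dropping leftover columns is typically done through the `remove_columns` argument of `Dataset.map`.

```python
MODEL_KEYS = ("input_ids", "attention_mask")

def collate(examples, max_length, pad_id=0):
    """Keep only numeric model inputs, truncate, then right-pad to one common length."""
    kept = [{k: ex[k][:max_length] for k in MODEL_KEYS} for ex in examples]
    width = max(len(ex["input_ids"]) for ex in kept)
    batch = {k: [] for k in MODEL_KEYS}
    for ex in kept:
        n_pad = width - len(ex["input_ids"])
        batch["input_ids"].append(ex["input_ids"] + [pad_id] * n_pad)
        batch["attention_mask"].append(ex["attention_mask"] + [0] * n_pad)
    return batch

# Hypothetical tokenized rows that still carry the raw `question` text.
examples = [
    {"question": "How do I reset my password?",
     "input_ids": [4, 9, 2, 7, 1], "attention_mask": [1, 1, 1, 1, 1]},
    {"question": "Hi", "input_ids": [3], "attention_mask": [1]},
]
batch = collate(examples, max_length=4)
```

Every row in `batch` now has the same length and contains only integers, which is what tensor creation requires.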