Code here, including a short explanation of how to get the dataset: github.com/adidror005/youtube-videos/blob/main/LLAMA_3_Fine_Tuning_for_Sequence_Classification_Actual_Video.ipynb
Hello, thanks for the really informative walkthrough. I was going back through your notebook for further review, but the notebook is no longer available at the link.
Wow, thanks for the nice comments! Share the video lol 🤣 I want to make more videos like this but am looking for the right topic. I'm thinking RLHF or something; open to suggestions.
I have a question regarding the modules: is there a methodology for picking those specific ones? Can I read more about them somewhere, perhaps? Great video, btw. Very well put together and to the point.
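For context, the question is about the LoRA `target_modules`. A common heuristic is to adapt the attention projection matrices, and sometimes the MLP projections as well. A hedged sketch with `peft` (the module names below are the usual Llama projection names, but they are an assumption here; confirm against your own model with `model.named_modules()` before relying on them):

```python
from peft import LoraConfig

# Sketch only: standard Llama attention/MLP projection names, not
# verified against the notebook. Check model.named_modules() first.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    task_type="SEQ_CLS",  # sequence classification head
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```

Adapting more modules raises the trainable-parameter count (and memory use) in exchange for capacity, so the choice is a trade-off rather than a fixed rule.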
@@MLAlgoTrader Also, please consider that knowing what you are working on helps me plan the next steps of my own development. Currently I use and pay for the OpenAI API, but I do plan to run Llama in my home lab. Once I start to learn and practice with Llama, I will go through your videos again.
Honestly, it is completely random. My next videos are on sequential bootstrap, implementing a gap trading strategy with both stocks and options, the dangers of backtesting, and then I also plan to do ib_insync for beginners. ...I think Llama 3 8B works on the free version of Colab for a bit, until you get kicked off the GPU. There is also an API I used; I think you get quite a bit for free at first: docs.llama-api.com/quickstart
@MLAlgoTrader Hi, may I ask you one question, please? I am trying to use the Llama 3 8B model for text classification. I have about 170k records and 11 categories. The maximum accuracy I was able to achieve was 68%. The data is properly preprocessed; I also tried, for example, the BERT and RoBERTa models, and both had over 90% accuracy. I would expect better results from a model like Llama 3 8B. I used both 8-bit and 4-bit quantization (both give similar results) and LoRA. I also played with different hyperparameters, but the results were hardly different. Do you think that, in short, this model might be a bad choice for text classification? Could we discuss this somewhere in a private message with more details? Thank you.
Hey, thanks for the message. I just don't have time right now since I need to move apartments. RoBERTa is a more direct fit for classification, so it isn't that surprising. I will try to get back to you sometime; I just have no time at all now.
@R8man012 @MLAlgoTrader I am working on a classification problem using Llama 3 with QLoRA. On 10k rows of data its accuracy is around 98%. The issue I'm facing is that 10k rows take about 1 hour (trainable params: 13,639,680 || all params: 7,518,572,544 || trainable%: 0.1814). How do I make it faster so it works on the whole dataset (2.4 million rows) in a reasonable amount of time?
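To put the question in perspective, a naive linear extrapolation from the figures quoted in the comment above (an assumption: real throughput varies with batch size, sequence length, and hardware):

```python
# Back-of-envelope scaling of fine-tuning time: 10k rows ~ 1 hour,
# full dataset 2.4M rows, assuming time grows linearly with rows.
rows_small, hours_small = 10_000, 1.0
rows_full = 2_400_000
hours_full = hours_small * (rows_full / rows_small)
print(hours_full)  # 240.0 hours, i.e. ~10 days per epoch at that throughput
```

Which is why the practical answers are usually to subsample the data, shorten `max_length`, raise the per-device batch size, or train on a fraction of an epoch, rather than to hope the model itself gets faster.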
Thank you so much :) I have a question! When fine-tuning with a sequence classification head, why don't you use apply_chat_template? Is there any reason? 👀
Thank you so much for the tutorial, it's very clear. I'm wondering if I can add some context to each training text, such as an explanation of how to classify the different sentiments. I don't know if it works, but LLMs like Llama have the ability to understand context, so maybe it would help. What's your opinion?
@@MLAlgoTrader Yes, I would add the prompt at the beginning of each text, something like: "Classify the text messages as 1. positive, explanation: xxxxxxx, example: xxxxx; 2. negative, explanation: xxxx, example: xxxxxx. The message is: 'Tesla's market cap soared to over $1 trillion ...'"
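A minimal sketch of that idea in plain Python, with hypothetical label explanations standing in for the xxxx placeholders above:

```python
# Hypothetical instruction prefix; the explanations and examples are
# placeholders you would replace with your own label descriptions.
INSTRUCTION = (
    "Classify the text message as:\n"
    "1. positive, explanation: good news for the company or stock, "
    "example: 'Shares jumped 10% after earnings.'\n"
    "2. negative, explanation: bad news for the company or stock, "
    "example: 'The company missed revenue estimates.'\n"
    "The message is: "
)

def with_context(message: str) -> str:
    """Prepend the classification instruction to one training text."""
    return INSTRUCTION + message

prompt = with_context("Tesla's market cap soared to over $1 trillion ...")
```

The same `with_context` would be mapped over every row of the dataset before tokenization, so the model always sees the task description alongside the message.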
For some LLMs it does better even that way before fine-tuning, but fine-tuning makes it less necessary. Check out the deeplearning.ai course on LlamaIndex; he does it similarly to what you suggest.
Thank you for your video! I have the following question: when you make predictions before fine-tuning the model, are you evaluating the model's zero-shot capabilities?
@@MLAlgoTrader In that case, how could I add this linear layer and then a classification layer in your code? I'm interested in comparing zero-shot and few-shot learning with this model.
@@michelejoshuamaggini3822 So this automatically adds those layers. To be precise: "The LLaMa Model transformer with a sequence classification head on top (linear layer). LlamaForSequenceClassification uses the last token in order to do the classification, as other causal models (e.g. GPT-2) do. Since it does classification on the last token, it requires to know the position of the last token. If a pad_token_id is defined in the configuration, it finds the last token that is not a padding token in each row. If no pad_token_id is defined, it simply takes the last value in each row of the batch. Since it cannot guess the padding tokens when inputs_embeds are passed instead of input_ids, it does the same (take the last value in each row of the batch)." See huggingface.co/docs/transformers/main/en/model_doc/llama2 I don't have time now, but I can show you in code sometime how you would see it.
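The "last non-padding token" behaviour quoted from the docs can be illustrated in plain Python. A sketch of the indexing logic only, not the actual Hugging Face implementation:

```python
def last_token_index(input_ids, pad_token_id=None):
    """Per row, the index of the token whose hidden state feeds the
    classification head, mirroring the docs quoted above."""
    if pad_token_id is None:
        # No pad token configured: simply take the last position.
        return [len(row) - 1 for row in input_ids]
    indices = []
    for row in input_ids:
        non_pad = [i for i, tok in enumerate(row) if tok != pad_token_id]
        # Fall back to the last position if a row is all padding.
        indices.append(non_pad[-1] if non_pad else len(row) - 1)
    return indices

# Right-padded batch with pad_token_id = 0:
print(last_token_index([[5, 6, 0, 0], [1, 2, 3, 4]], pad_token_id=0))  # [1, 3]
```

In the real model those indices select one hidden-state vector per row, which the linear `score` head maps to class logits; that is why setting `tokenizer.pad_token` (and `model.config.pad_token_id`) matters for this architecture.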
@@michelejoshuamaggini3822 As for zero-shot classification, that is something like: "a car is something with 4 wheels and a motorcycle is something with 2; can you please classify car or motorcycle?" What I do here is like zero-shot classification for sentiment analysis: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-nMKYuSALmpc.html
So I literally was about to share the video, but I had a bug and needed to restart. I must wait 24 hours due to an API limit, so I'll send it 25 hours from now lol!
@@salmakhaled-hn6gw No problem. There are a few more things I left out; hopefully we can cover them in another video, like loading the model and merging the QLoRA weights. Does the part about getting the data make sense? You need that to run the notebook!
Being on 0 sleep, I'll quote ChatGPT and get back to answering you later lol...

Turning Llama 3 into an encoder-only transformer like BERT, by removing the attention mask, is theoretically possible but involves more than just altering the attention mechanism. Here are the steps and considerations for this transformation:

1. Modify the attention mechanism: In Llama 3, an autoregressive transformer like GPT-3, each token can only attend to previous tokens. To make it behave like BERT, you need to allow each token to attend to all other tokens in the sequence. This involves changing the attention mask settings in the transformer's layers.
2. Change the training objective: BERT uses a masked language model (MLM) objective, where some percentage of the input tokens are masked and the model predicts these masked tokens. You would need to implement this training objective for the modified Llama 3.
3. Adjust the tokenizer and inputs: BERT is trained with pairs of sentences as inputs (for tasks like next sentence prediction) and uses special tokens (like [CLS] and [SEP]) to distinguish between sentences. You would need to adapt the tokenizer and data preprocessing steps to accommodate these requirements.
4. Retrain the model: Even after these modifications, the model would need to be retrained from scratch or fine-tuned extensively on a suitable dataset, because the pre-existing weights were optimized for a different architecture and objective.
5. Software and implementation: You need to ensure that the transformer library you're using supports these customizations. Libraries like Hugging Face Transformers are quite flexible and might be useful for this purpose.

This transformation essentially creates a new model, leveraging the architecture of Llama 3 but fundamentally changing its operation and purpose. Such a project would be substantial and complex but interesting from a research and development perspective.
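The attention-mask change discussed above can be illustrated in plain Python, a toy sketch of the mask shapes rather than actual Llama code: a causal (decoder-style) mask lets position i attend only to positions j <= i, while an encoder-style mask allows every position to see every other.

```python
def causal_mask(n: int):
    """Decoder-style mask: True where attention is allowed (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def bidirectional_mask(n: int):
    """Encoder-style (BERT-like) mask: every token sees every token."""
    return [[True] * n for _ in range(n)]

# With n = 3, the causal mask forbids attending to future positions:
for row in causal_mask(3):
    print(row)
```

Swapping one mask for the other is the easy part; as the list above notes, the pretrained weights were optimized under the causal mask, which is why retraining under a new objective is the real cost.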
@@MLAlgoTrader Thank you so much, I appreciate the response! Since it's a classification task, it makes sense to remove the mask (make it encoder-only) and retrain the model with another objective function. I was just wondering, technically, how would you remove the mask from Llama 3? And maybe also add a feedforward layer? Is it possible to edit the architecture like that?
Hey, thanks for the nice words! This was so long ago I forget where I have it. Their documentation is poor on this; I'll try to find my example, but it might take me a few days.
@@MLAlgoTrader Hello! I hope everything is going well! I'm getting back to you to ask how soon you might be able to find this; my internship ends in a few days, and I still can't load my saved model. Your help would really mean a lot. Thanks again!
Hello! Thank you so much for your tutorial; it is very helpful and easy to follow. I started applying it to my own binary dataset but got stuck on the training step. I get an error on this line of code: labels = inputs.pop("labels").long() KeyError: 'labels'. My inputs look like this: ['input_ids', 'attention_mask'], and I don't understand which "labels" that line refers to. If it is not too difficult for you, could you explain what it means? I would be most grateful! UPD: I renamed the columns of my dataset to "text" and "labels", and that solved the issue! 😀
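A plain-Python sketch of why the rename fixed it: the training step pops a "labels" entry out of the batch dict, so the tokenized dataset must expose a column with exactly that name. The dict below is a stand-in for a real tokenized batch, which would hold tensors:

```python
# Stand-in for a tokenized batch built from the dataset's columns.
batch = {"input_ids": [[1, 2, 3]], "attention_mask": [[1, 1, 1]]}

# Without a "labels" column in the dataset, this is the failing line:
try:
    labels = batch.pop("labels")
except KeyError as err:
    print("KeyError:", err)  # the error quoted in the comment above

# After renaming the label column to "labels", the pop succeeds and
# leaves only the model inputs behind:
batch["labels"] = [0]
labels = batch.pop("labels")
print(labels)         # [0]
print(sorted(batch))  # ['attention_mask', 'input_ids']
```

The collator only carries forward columns that exist in the dataset, which is why a column named e.g. "sentiment" never shows up as "labels" in the batch.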
@@MLAlgoTrader Hi! I actually updated my comment: I found a workaround for the issue, although I still only vaguely understand why it helped. I need to read more documentation, I guess. Anyway, thank you for your tutorial; it helped me with my thesis 😊
Always down for a new idea, but I don't know if I can get to it soon. I had an idea to do text summarization, which can be done with a similar architecture to machine translation, but with different metrics of course.