I don't often leave comments on YouTube, but finally someone that explains everything from scratch... I am a JS developer, and it's really cool that you explain every piece of code. That really helped; I was able to understand everything.
Your videos are like gems to me. I learned a lot, and your use of modules and packages is the cherry on the cake. Currently I'm working as a Jr. Data Scientist at KPMG, but man oh man, you taught me many things. Thank you 😊 🙏
Bro, I just need to talk to you. I wanted to ask a few questions regarding the profile you are working on. I have secured a job with Deloitte but want to switch to KPMG (Gurgaon).
Hey brother, you just provided the best NLP sentiment project. Your channel deserves a million+ subscribers, and I am now one new subscriber helping you get there.
Really interesting video. I've been following a lot of your tutorials lately and I must say that I really like the way you explain things, it's so easy to understand and follow along. Thank you!
Thanks so much for the feedback Juan. It's always hard to tell when I'm recording these if they are any good, so it's great to hear that it is helpful to you.
This was a good tutorial. I'm trying to get my feet wet in data analytics and found myself overwhelmed while trying to read the NLTK documentation, so thanks for the structured guidance. I'm working on analyzing sentiment across a dataset I've gathered myself, so I wasn't following along in Kaggle and hit a hiccup, as AutoModelForSequenceClassification requires PyTorch and I had initialized a Python 3.10 environment. Oopsy poopsy. All the same, you made my headache significantly less daunting. Thank you. :)
Thanks so much. I'm glad it helped you get started with NLTK; it can be a lot easier once you see it in action. Setting up an environment that works with all the packages can also sometimes be frustrating, so I can relate!
Thanks for such a wonderful tutorial. I used your shared data on my own with Google Colab and it worked so well; I just had to download a few more libraries for tokenization. Wonderful content, and I truly enjoyed it.
I just did all of that as a thesis by myself without knowing you made a video about it, lol. Luckily I used a different BERT model from Hugging Face at least. Nice video, btw!
Thank you so much for this step-by-step process. It has opened up all sorts of new analysis opportunities for our customer insights. Really well explained and easy to follow.
Huge thank you to you!!! I recently participated in an ML hackathon and they had sentiment analysis as one of their problem statements. I had watched your video prior to the competition and used Hugging Face, whereas everyone else used the standard VADER. I ended up getting the highest accuracy and placed first, all in my second year of engineering. Genuinely, can't thank you enough for the information! Team random_state42
I've just recently found myself interested in Computer Vision and NLP and I've finally gotten to the right content creators — this video absolutely rocks! And I found it 2 years late. I wonder how far along you are now in this topic. If you ever come back to this comment section, could I ask how you got so experienced in this topic and how you learned to tackle all these problems? Thank you!
Thanks for the video. We have a school project to do anything coding-related, and while my classmates are using Scratch, I wanted to do something flashier, and some kind of language analysis seemed the way to go. I'll use this video as inspiration.
Who are you? My savior! I was asked to conduct a sentiment analysis on reviews at my internship. I was doing computer vision in graduate school and am new to NLP. Thank God.
Please make more videos like this, that was great. I am a data engineer and want to move to data science; please make guidance videos as well. Love from India.
Thank you very much for this video. I'm new to the field of data analysis and related disciplines, so this sentiment analysis project is pretty insightful for me.
00:01 In today's video, we'll explore sentiment analysis on Amazon reviews using traditional and more complex models.
02:26 Importing and reading data for sentiment analysis
07:28 Tokenization and part-of-speech tagging in NLTK
09:55 Introduction to VADER for sentiment analysis
15:12 Looping through Amazon review data to calculate polarity scores
17:33 Performing sentiment analysis with NLTK and 🤗 Transformers
22:04 Explaining the positive, neutral, and negative sentiments in Amazon reviews
24:24 Transformer-based deep learning models from Hugging Face are easy to use and powerful
28:45 Introduction to sentiment analysis with NLTK and Transformers
31:02 Running sentiment analysis on text using VADER and RoBERTa
35:28 Comparing VADER and RoBERTa sentiment analysis scores using Seaborn's pairplot
37:45 VADER model less confident compared to RoBERTa model
41:59 Hugging Face Transformers makes sentiment analysis simple and efficient
44:09 Explored models and ran sentiment analysis on Amazon reviews
Crafted by Merlin AI.
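The 15:12 chapter covers looping over the review data and collecting polarity scores per review. A minimal self-contained sketch of that pattern is below; note that `polarity_scores` here is a toy stand-in written for this sketch (in the actual notebook you'd use NLTK's `SentimentIntensityAnalyzer.polarity_scores`, which requires downloading the `vader_lexicon` resource), and the tiny word lists and example reviews are made up for illustration.

```python
# Sketch of the "loop over reviews, collect polarity scores" pattern.
# In the real notebook this would be:
#   from nltk.sentiment import SentimentIntensityAnalyzer
#   sia = SentimentIntensityAnalyzer()
# Here, polarity_scores is a toy stand-in so the sketch runs anywhere.

def polarity_scores(text):
    """Toy scorer: counts words from tiny hand-made lexicons (NOT real VADER)."""
    pos_words = {"great", "good", "love", "delicious"}
    neg_words = {"bad", "awful", "disappointed"}
    tokens = [t.strip(".,!?") for t in text.lower().split()]
    pos = sum(t in pos_words for t in tokens)
    neg = sum(t in neg_words for t in tokens)
    total = max(len(tokens), 1)
    return {"pos": pos / total, "neg": neg / total,
            "neu": (total - pos - neg) / total,
            "compound": (pos - neg) / total}

# Hypothetical reviews keyed by Id, mimicking the Amazon reviews DataFrame.
reviews = {
    1: "I love this product, it is great!",
    2: "Awful taste, very disappointed.",
}

# Collect one score dict per review Id; in the video the result dict
# is then turned into a DataFrame with pd.DataFrame(res).T.
res = {}
for review_id, text in reviews.items():
    res[review_id] = polarity_scores(text)

print(res)
```

The key idea is accumulating one score dictionary per review Id so the results can later be joined back onto the original DataFrame.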
Dude, at 26:03, while loading the pretrained model from Hugging Face, it throws an error: "Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on." My connection is very good; I have run this around 40 times with a good connection and it still throws that error. I also tried changing the model from Hugging Face. Please help me with this.
You might want to check and make sure the source hasn't changed on the Hugging Face site. They might have changed this specific model, and your reference might need to be updated.
Had the same problem. Just solved it. Unlike an average laptop, a Kaggle notebook is not connected to the internet. To get internet access in your Kaggle notebook, you need to go through phone verification. Look for the notebook options menu on the right side.
One of the best tutorials on VADER and the Hugging Face Transformers I have seen. One question I had: how is the confidence score calculated in the Pipeline model, and is there a way to evaluate the model's performance on these calculations?
Thanks so much for the feedback. Glad you found it helpful. Evaluating the model performance is a bit tricky without ground-truth labels. The output of the Pipeline model is essentially the probability the model predicts for each class, given the dataset it was trained on. Check out the actual model description on the Hugging Face site, along with the noted limitations: huggingface.co/distilbert-base-uncased-finetuned-sst-2-english Specifically, this part is interesting:
```
Based on a few experimentations, we observed that this model could produce biased predictions that target underrepresented populations. For instance, for sentences like This film was filmed in COUNTRY, this binary classification model will give radically different probabilities for the positive label depending on the country (0.89 if the country is France, but 0.08 if the country is Afghanistan) when nothing in the input indicates such a strong semantic shift. In this colab, Aurélien Géron made an interesting map plotting these probabilities for each country.
```
@@robmulla FWIW, I reached out to the creator of this, and what I was told is that the score is calculated using the activation function after the final layer of the neural net. It is used to determine polarity (and is not a confidence score). The model returns an array with a score for each polarity, and the largest one is the prediction. The values will always be positive, regardless of the actual sentiment class tagged to the text. This is unlike VADER's model, which provides a composite polarity score that can be a positive or negative float based on the inferred sentiment (positive, negative, neutral).
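The "activation function after the final layer" described above is a softmax: it maps the raw logits to probabilities that sum to 1, which is why the scores are always positive and the largest one is the predicted class. A stdlib-only sketch, with made-up logit values and a label ordering assumed to be [negative, neutral, positive]:

```python
import math

def softmax(logits):
    """Convert raw final-layer logits into probabilities that sum to 1."""
    shifted = [x - max(logits) for x in logits]  # shift for numerical stability
    exps = [math.exp(x) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits a sentiment head might output for one review;
# the values and label order here are illustrative, not from a real run.
labels = ["negative", "neutral", "positive"]
logits = [-1.8, 0.3, 2.4]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)

# The pipeline's reported "score" corresponds to this max probability.
print(labels[best], round(probs[best], 3))
```

Note how every probability is positive even when the underlying logit is negative, matching the behavior described in the reply above.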
Awesome video. Would be great to see you follow the sentiment analysis with a topic analysis. I’ve seen a few different options out there (LDA, Top2Vec and BERTopic), but would love to see your take on it.
Great content. Please make more content on models that solve attrition prediction for organizations. It's a very complex subject because it's hard to find ready-made models on such topics. It would be a great help if you could build an attrition prediction model with more than 45-50 variables.
Thanks for this video; it was descriptive, well structured, and well explained. I have two questions and would appreciate your opinion and guidance on them. 1. At the end of the day, the star ratings and the sentiment give the same results, so how can we justify going through all this process when we already have a very good indication of user sentiment from the star ratings? 2. How can we get the strengths and weaknesses of the product from the reviews using sentiment analysis?
Excellent explanation and material. Thank you for your efforts in making learning enjoyable. A brief query about mismatched reviews — negative text with 5 stars and positive text with 1 star — where the algorithm's score disagrees with the rating. How would you advise handling these kinds of situations?
Thank you so much for this video tutorial! I wanted to ask if you created the Amazon review dataset from scratch or was it already pre-made from somewhere else?
@robmulla, great presentation, but I have looked through the videos on your channel, and it appears you have not done one on fine-tuning a BERT model with a custom dataset. I am particularly wanting to learn how you would fine-tune a BERT model for multiclass text classification, maybe on Google Colab. I think many of us subscribers would love it. Thanks.
Both the VADER and RoBERTa models struggled with sentences with more context. For instance, both rated the sentence "I have had better in the past. It works well enough, but temper your expectations." as overwhelmingly positive. Are there ways to capture that context?
At 23:42, Step 3 (RoBERTa Pretrained Model, RoBERTa base sentiment), I am getting a ValueError: "we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on." But my internet connection is good. What can I do about it?
Yeah, I'm getting the same problem: ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.
Thanks for the feedback Niklas. You are correct that VADER handles the parsing of text and assignment of sentiment per word so we don't have to tokenize like with the transformer model. Check out the source code for VADER and it might make a little more sense - it handles specific cases like if words should "boost" the intensity of the sentiment and/or specific idioms: www.nltk.org/_modules/nltk/sentiment/vader.html
Amazing video! One question though. Initially we tokenized the data, found the parts of speech, and then grouped them into entities. However, the VADER and RoBERTa models were run on the raw example. Does that mean data cleaning/manipulation like dropping stop words isn't required for these models, or did I understand it incorrectly?
Hey Rob, I was trying to execute the code where you load the model trained on Twitter comments, but I keep getting the error "Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on." even though I am connected to the internet. Could you please help me out?
Hi, thank you for the amazing video. Your presentation was informative and insightful. Looking forward to your future content! Btw, I want to ask how I can save my results; it seems like I had a good run and don't want to keep rerunning it. What should I do in this situation? Thank you.
I'm having an issue running the tokenizer. Some googling suggests downloading 'punkt', which I have done, but it's still not working... Anyone experience this? I restarted my notebook from scratch after installing punkt and even ran code to check it's installed, which it says it is.
Sir, I know pandas and data cleaning, but there are a lot of models in data science. Which models should I learn? My niche in data science is sales and marketing. Give me some tips, thanks.
Thanks for the feedback. I'm still learning every day just like you. The great part about data science is that there is always something new to master.
Very well explained video and clear guidance! I have a question about the preprocessing part of the text before putting it into the tqdm sia loop, do we directly put the raw data into it, or do we do the tokenize, remove stop words and stuff first, and then go for the sentiment analysis? Looking forward to your reply!
Hey Huan! Glad you found the video helpful. I'm not sure about the loop you are referring to but typically the text needs to be tokenized, but depending on the model it may handle that within the predict function. Hope that helps.
Great content, really loved the explanation. I'm new to sentiment analysis but was wondering this: my objective is to score a set of online product reviews, so shouldn't I first do a round of text preprocessing like normalization, spell check, lemmatization, and tokenization before feeding each sentence into the pre-trained transformer model? How much of a difference would this make in the accuracy of the predictions?
Stoked you enjoyed it, Navaneeth! This video only scratches the surface. The tokenization and preprocessing of the text is usually built into the model pipeline and depends on the model you are using. I'm not sure about how it would impact the accuracy, but with VADER, for instance, I believe stop words are removed. Worth looking into for sure!
Great video! Just wanted to know: is there a way to measure the performance of our model in terms of a confusion matrix, accuracy, and precision score? Thank you.
Great question. You can read about the model evaluation on the Hugging Face model card. It's hard to properly evaluate independently because it requires a labeled dataset.
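If you do have labels (or use the star ratings as a proxy, with the caveat discussed earlier in the thread that text and stars sometimes disagree), accuracy and a confusion matrix are straightforward to compute by hand. A stdlib-only sketch with made-up example data:

```python
from collections import Counter

# Sketch: treat the star rating as a proxy label (1-2 stars -> negative,
# 4-5 stars -> positive, 3 -> neutral) and score predictions against it.
# The stars and predictions below are invented for illustration.

def star_to_label(stars):
    if stars <= 2:
        return "negative"
    if stars >= 4:
        return "positive"
    return "neutral"

stars       = [5, 1, 4, 2, 5]
predictions = ["positive", "negative", "positive", "positive", "positive"]

truth = [star_to_label(s) for s in stars]

# Accuracy: fraction of predictions that match the proxy label.
accuracy = sum(t == p for t, p in zip(truth, predictions)) / len(truth)

# Confusion matrix as (true_label, predicted_label) -> count.
confusion = Counter(zip(truth, predictions))

print(f"accuracy = {accuracy:.2f}")
print(confusion[("negative", "positive")], "negative review(s) predicted positive")
```

For precision/recall per class you would divide the diagonal counts by the relevant row or column sums of this matrix, or hand the same two lists to `sklearn.metrics.classification_report` if scikit-learn is available.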