Movie Recommendation System With Python And Pandas: Data Project

In this project walkthrough, we'll learn how to create a movie recommendation system using Jupyter, Python, and Pandas. By the end, we'll be able to type the name of a movie into an input box, and instantly get recommendations for other movies we might like. This is an exciting project that can go into a portfolio, or help you learn.
We'll start with the MovieLens 25M dataset, which contains movie reviews and ratings. Then, we'll build a search engine to find a specific movie title in our data. We'll then be able to create a recommendation engine to recommend specific movies.
You can download the data here - files.grouplens.org/datasets/... .
And you can view the code for this project here - github.com/dataquestio/projec... .
If you enjoyed this tutorial, check out this link bit.ly/3O8MDef for free courses that will help you master data skills.
Chapters
00:00 - Introduction
01:36 - Reading in our movie data with pandas
02:41 - Cleaning movie titles with regex
04:20 - Creating a tfidf matrix
08:21 - Creating a search function
13:10 - Building an interactive search box with Jupyter
18:05 - Reading in movie ratings data
19:29 - Finding users who liked the same movie
25:51 - Finding how much all users like movies
29:06 - Creating a recommendation score
32:02 - Building a recommendation function
33:38 - Creating an interactive recommendation widget
37:05 - Next steps
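The search steps named in the chapter list (cleaning titles with a regex, building a TF-IDF matrix, and ranking titles by similarity) can be sketched roughly as follows. This is a minimal illustration on a four-title toy list, not the notebook's code: the `search` helper and the sample titles are assumptions made up for this example, and `np.argsort` is used here so that the results come back fully ordered.

```python
import re

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def clean_title(title: str) -> str:
    """Drop everything except letters, digits, and spaces."""
    return re.sub(r"[^a-zA-Z0-9 ]", "", title)


# Toy stand-in for the MovieLens titles (assumption: the real data is far larger).
titles = ["Toy Story (1995)", "Jumanji (1995)", "Toy Story 2 (1999)", "Heat (1995)"]
clean_titles = [clean_title(t) for t in titles]

# Unigrams and bigrams, so "Toy Story" can match as a phrase as well as word by word.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(clean_titles)


def search(query: str, top_n: int = 2) -> list:
    """Return the top_n titles most similar to the query, best match first."""
    query_vec = vectorizer.transform([clean_title(query)])
    similarity = cosine_similarity(query_vec, tfidf).flatten()
    best = np.argsort(similarity)[::-1][:top_n]
    return [titles[i] for i in best]


results = search("Toy Story (1995)")
print(results)
```

An exact title match scores a cosine similarity of 1.0, so it comes back first.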
---------------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: bit.ly/3O8MDef

Published: Jul 10, 2024

Comments: 75

@vikasparuchuri 1 year ago

Here's all of the code for this video - github.com/dataquestio/project-walkthroughs/blob/master/movie_recs/movie_recommendations.ipynb . And you can download the dataset here - files.grouplens.org/datasets/movielens/ml-25m.zip . Enjoy :)

@meditationhealingmusic6550 9 months ago

Thank you so much for walking us through this project. I am very excited to learn every single day with Dataquest.

@charlesvictory169 7 months ago

You are too good!!! This was very helpful. I had to subscribe immediately. Thanks so much

@narsimharao8565 1 year ago

Fell in love with the tutorial ❤️.

@prityar042 9 months ago

This project was really amazing, and I have to say this video is very underrated. I actually shared it with my batchmates, and they liked it too.

@rajeevmenon1975 2 years ago

Really interesting video, Vikas. Really engrossing. Keep coming up with such quality stuff.

@Dataquestio 2 years ago

Thanks, Rajeev!

@kaiserkonok 1 year ago

Loved this video🔥

@ayeshaabbas8696 1 year ago

Thank You So Much Sir. lots of respect ..

@johhnykimsey5180 1 year ago

thank you very much it was a great video

@thehiddenguy655 1 year ago

Thank You sir this helps me a lot

@nanaphiona4462 2 years ago

Thanks for the inspirations

@nil-xo4ce 2 years ago

sick video 🔥

@doopao 1 year ago

Vik u r the very best!

@yapwlm913 1 year ago

Hi Vik, that is a great demonstration of building a recommendation system. Thank you! But it might be more interesting to deploy the finished recommendation system with Streamlit, since I think that framework would be more solid?

@alirezanorouzi8924 1 year ago

Thanks for sharing, I use it.

@soumyaranjith2951 1 year ago

Thank you so much Sir😍🙏🙏🙏

@dayaramd2709 2 years ago

very very good job

@gandiyasasri 1 month ago

It is very good and most useful in our daily life

@shreyam3259 1 year ago

Hello, I am learning Python programming by myself. I was wondering if you could briefly describe the overall workflow of this project so it would be easier to understand (maybe 5-6 points to summarize), and explain why we chose this particular method.

@staniherstaniher9300 1 year ago

Nice video. Please, can you make a video where you evaluate this model using metrics such as ndcg@, diversity, accuracy...?

@k-popworldwide3282 1 year ago

Can someone please explain the part in this built system where the data preprocessing, Train Test Split, Model Fit, and Model Tune have been done?

@swapnilchowdhury3957 1 year ago

I have written the code in a VS Code Jupyter notebook. I am facing a problem because the input and output widgets are not displaying. No text box appears. How can I solve this issue?

@johndiba1321 2 years ago

attempting to apply this lesson and data scraping to create a book recommendation system for my portfolio. should be able to get a nice dataset from goodreads

@Dataquestio 2 years ago

Hi John - I made a video about book recommendations earlier that might help - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-x-alwfgQ-cY.html .

@aishwaryakolte538 8 months ago

When building the search box, my recommendations are not changing. It shows the same recommendations as for Toy Story. I made the later change of removing the fixed movie title, yet there was no change in the recommendations. Could you please help with this?

@yadavvishu1869 11 months ago

I am running the same code in VS Code, but it only shows the two values we put in the code input and does not output any button or search bar 😢 How do I tackle it?

@tekinbayrakl7886 1 year ago

Hi. When we create the recommendation score, you said we want a big difference between similar and all. Why is that?

@domakondajyothi33 4 months ago

this really helped me ...but at the end i got an error as 'list' object has no attribute 'indexing'...so what can i do

@sauravchauhan9280 1 year ago

I am building a web app for this but can't figure out what model to save.

@vanesszatoke2977 1 year ago

Hi! Very good demonstration of building a recommendation system. The best I have found! I have a question. Is what you are doing user-based or item-based collaborative filtering? In other videos I checked, they created a kind of user-item matrix and checked the correlation between users or movies, depending on the type of rec system (user- or item-based). If I had to bet, I would say it is more like an item-based one, but I am not sure! 😄 Thank you if you answer!

@Dataquestio 1 year ago

Hi Vanessa - I would consider this to be a version of user-based collaborative filtering. We have to make some modifications, since we're only passing in a single movie (versus a matrix of our preferences). We're then finding movies that people similar to us liked more than the general population liked.
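The idea in this reply (find users who liked the same movie, then look for movies they liked more than the general population did) can be sketched with pandas as below. It is a simplified illustration under stated assumptions, not the notebook's exact code: the toy `ratings` table, the `recommend` helper, the `min_share` cutoff, and the choice to divide by the total number of users are all made up for this example, and each (userId, movieId) pair is assumed to appear only once.

```python
import pandas as pd


def recommend(ratings: pd.DataFrame, movie_id: int, min_share: float = 0.10) -> pd.DataFrame:
    """Score movies by how much more 'similar' users liked them than everyone did."""
    liked = ratings[ratings["rating"] > 4]

    # Users who liked the input movie are treated as "similar to us".
    similar_users = liked.loc[liked["movieId"] == movie_id, "userId"].unique()
    if len(similar_users) == 0:
        return pd.DataFrame(columns=["similar", "all", "score"])

    # Share of similar users who liked each movie.
    similar_likes = liked[liked["userId"].isin(similar_users)]
    similar = similar_likes["movieId"].value_counts() / len(similar_users)
    similar = similar[similar > min_share]

    # Share of all users who liked those same candidate movies.
    candidates = liked[liked["movieId"].isin(similar.index)]
    everyone = candidates["movieId"].value_counts() / ratings["userId"].nunique()

    recs = pd.concat([similar, everyone], axis=1)
    recs.columns = ["similar", "all"]
    # A high ratio means the movie is liked by similar users in particular,
    # rather than simply being popular with everyone.
    recs["score"] = recs["similar"] / recs["all"]
    return recs.drop(index=movie_id, errors="ignore").sort_values("score", ascending=False)


# Toy data: movie 2 is liked only by fans of movie 1; movie 3 is liked by everyone.
ratings = pd.DataFrame(
    [(1, 1, 5), (1, 2, 5), (1, 3, 5),
     (2, 1, 5), (2, 2, 5), (2, 3, 5),
     (3, 1, 3), (3, 3, 5),
     (4, 1, 3), (4, 3, 5),
     (5, 2, 2), (5, 3, 5)],
    columns=["userId", "movieId", "rating"],
)
recs = recommend(ratings, movie_id=1)
print(recs)
```

In this toy data, movie 2 ranks above movie 3 even though both are liked by every similar user, because movie 3 is liked equally by the whole population. That is also why a large gap between the "similar" and "all" shares makes for a stronger recommendation.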

@LazyLee295 1 year ago

Hi, this recommends items based on users' ratings, but can you build a recommender based on the items a user likes? For example, if a user has a list of anime they like, then we recommend based on that list. Thank you for reading my comment.

@ManojYadav-ut7ew 11 months ago

which model is used to create this?

@abdulkareemridwan8762 2 years ago

Lost interest in ML earlier this year..your tutorial was really a turnaround..Really appreciate 🙏

@Dataquestio 2 years ago

That's amazing to hear, Abdulkareem! -Vik

@SoundofSilence1 2 years ago

@@Dataquestio yes Vik is an amazing teacher.

@quizzesya 11 months ago

The type of this recommendation system is content based filtering right?

@mr.random4960 4 months ago

Which method is used here? Collaborative or content based?

@soumyaranjith2951 1 year ago

I can't import the data into a Jupyter notebook. When I try to import it, an error occurs.

@Rosh__138 6 months ago

Which algorithms are used in this video for building model? Anyone can tell!!??

@khanhtruongphamngoc2246 1 month ago

how to evaluate the accuracy of the model sir?

@AmIThereYet. 1 year ago

What algorithm is used here?

@anushkab8867 1 year ago

how can i make recommendation system based on genres??

@hiashraful 1 year ago

How can I build this project in vscode?

@tanishqshivram9419 1 year ago

Sir, I am not getting any output, nor am I getting any error. Can you please help me out?

@asishkottakota3920 1 year ago

@Dataquest i am unable to get the widget

@sanika8866 1 year ago

how to import dataset as csv??

@chinmoypadhi 1 year ago

How can we add a k-fold cross-validation technique to this collaborative filtering model? Any example would be great.

@Dataquestio 1 year ago

You would need to define an error metric, and then label data. Then you could evaluate against the metric. There's an example here with measuring the ranks of NBA MVPs - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-3cn1nHlbFVw.html .
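As one possible reading of "define an error metric, and then label data", the sketch below computes precision at k for a single user. The metric choice, the `precision_at_k` helper, and the movie IDs are assumptions for illustration; the reply does not say which metric to use.

```python
def precision_at_k(recommended: list, relevant: set, k: int) -> float:
    """Fraction of the top-k recommended items that the user actually liked."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for movie_id in top_k if movie_id in relevant)
    return hits / k


# Hypothetical example: ranked recommendations versus held-out liked movies.
recommended = [10, 20, 30, 40]
relevant = {20, 40, 50}
score = precision_at_k(recommended, relevant, k=3)
print(score)
```

The labeled data here would be ratings held back from the recommender, for example each user's most recent high ratings.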

@hanweiz84 2 years ago

Appreciate if you can also show how to host this on a web server. Thanks a lot! This is awesome

@Dataquestio 2 years ago

Thanks, Ang! I'll look into doing this for a future video. -Vik

@losmi-iv2qt 6 days ago

Is this made with Machine Learning since I don't think I heard it being mentioned? Anyways nice tutorial man!

@ayunymoba5974 7 months ago

Does this count as a hybrid recommendation? Because there is TF-IDF cosine similarity (content-based) and also item-based filtering (collaborative)?

@domakondajyothi33 4 months ago

Yeah, I want to know this too: is this hybrid or collaborative?

@rahouanitoufik7375 1 year ago

How to build this recommendations system drug in Java

@narsimharao8565 1 year ago

We can also make recommendations using knn clusters, so those who like action movies may get action movies recommended. But we would have to do more analysis of why users gave them the highest ratings. Am I correct, Vik? Please correct me. Just a doubt.

@Dataquestio 1 year ago

Hi - you could use k-means to segment users, then base predictions on the clusters. You could also use k-nearest neighbors. You probably won't gain much over collaborative filtering (the technique here), since they both use similar techniques to find similar users.
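For comparison, a nearest-neighbors lookup over users might look like the sketch below. It is only an illustration: the `nearest_users` helper and the tiny user-by-movie matrix are made up, and treating unrated movies as 0 is a simplification that a real system would need to handle more carefully.

```python
import numpy as np


def nearest_users(user_item: np.ndarray, user_index: int, k: int) -> np.ndarray:
    """Return the indices of the k users most similar to user_index (cosine similarity)."""
    matrix = np.asarray(user_item, dtype=float)
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for users with no ratings
    unit = matrix / norms
    similarity = unit @ unit[user_index]
    similarity[user_index] = -np.inf  # never return the user themselves
    return np.argsort(similarity)[::-1][:k]


# Rows are users, columns are movies, values are ratings (0 = not rated).
user_item = np.array([
    [5, 5, 0],
    [5, 4, 0],
    [0, 0, 5],
    [4, 5, 1],
])
neighbors = nearest_users(user_item, user_index=0, k=2)
print(neighbors)
```

Predictions for user 0 could then be based on what those neighbors rated highly, which is close in spirit to the "users who liked the same movie" step in the video.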

@ekeminiben6885 2 months ago

Thank you very much sir for this inspiring tutorial. Please I want to build a recommender system, "The aim of this study is to design and implement a Recommender System for clothing styles based on user body type derived from user body measurements." Please can you help with this kind of project or how can I go about it from getting the dataset to completion. Thank you

@Zuthilios 1 year ago

What environment are you building this in? I was following this tutorial but in gitpod for me, the jupyter widgets aren't behaving. The Text and TextArea widgets don't ever appear, a FloatText widget will sometimes appear, IntSlider and Select widgets will often appear and sometimes none of them will appear. This changes randomly even when making no changes. The output space is always there and working correctly, but I can't seem to find any cause or solution to this issue, it's thrown me off continuing this tutorial. Perhaps it's a version issue, I'll try following your code for the versions you used as a last option.

@Dataquestio 1 year ago

That's strange - I used JupyterLab on my own computer.

@abex8713 5 months ago

Does it have a UI?

@Han-ve8uh 2 years ago

1. Why at 12:43 is the "most similar result last in the list"? According to the np.argpartition docs, "The order of all elements in the partitions is undefined". You only provided -5, so we can only be certain that the -5th position is correct and can draw no conclusions about -4 to -1. If a sequence of ints was passed to the kth parameter instead, then we could be sure of the order of the last 5.
2. For this demo we always start with a single movie_id and then do the calculations. It feels like some work will be repeated if we change the input movie_id, since all of the work is done at inference time. Are there opportunities for caching or precomputing anywhere?
3. Why is a .unique() added at 20:40? That line was focused on movie_id = 1. I assume each user will only rate each movie once? This means that given movie_id = 1, all users will already be unique, so there is no need for unique(). If a user can rate a movie twice (with either the same or different scores), that feels like a bad DB design of appending instead of updating. If there really were multiple ratings from the same user for the same movie, we should deduplicate (e.g. take the latest rating) during data preprocessing, before any sort of recommendation analysis, to avoid hacky fixes like adding .unique() to work around bad data. What do you think?
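The ordering concern in point 1 can be illustrated with a small sketch. `np.argpartition` only guarantees which elements land in the top-k partition, so one way to get a guaranteed order is to sort just those k candidates afterwards. The similarity values below are made up for the example.

```python
import numpy as np

# Hypothetical similarity scores for eight titles.
similarity = np.array([0.10, 0.90, 0.30, 0.70, 0.20, 0.80, 0.50, 0.60])
k = 5

# Indices of the k largest scores; their order within this slice is not guaranteed.
top_unordered = np.argpartition(similarity, -k)[-k:]

# Sort only those k candidates by score, highest first, for a guaranteed order.
top_ordered = top_unordered[np.argsort(similarity[top_unordered])[::-1]]
print(top_ordered)
```

Sorting k items after the partition keeps most of the speed advantage over sorting the full array when k is small.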

@Dataquestio 2 years ago

Hi Han, great questions.
1. That's a good point about argpartition. In practice, the results appear to be ordered, so I didn't worry too much about it. For example, exact matches are always the top result. As you mentioned, you could pass in a sequence to get 100% guaranteed ordering.
2. There's a tradeoff between simplicity of the solution (making it easier to teach/demo) and speed of the solution. We're precomputing the tf/idf matrix and other items that are common across all movies. If I was deploying this to a web service, instead of precomputing, I would just cache outputs. So the first generation would be slow, but subsequent searches would hit the cache. This is because precomputing for all of the movies would take a while, and if we wanted to update our algorithm, etc, we'd need to redo all the precomputation.
3. In this case, I added .unique() as a defensive check. I don't think it was necessary, since the data should be unique on movieId/userId pairs. But yes, if we were doing this in production, we would want to deduplicate upfront instead of on the fly for better performance.
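The "cache outputs" approach from point 2 could be sketched with the standard library's `functools.lru_cache`. This is an assumption about one way to do it, not something shown in the video: `compute_recommendations` is a hypothetical stand-in for the expensive recommendation step, and the call counter exists only to show that the second lookup hits the cache.

```python
from functools import lru_cache
from typing import Tuple

calls = {"count": 0}


def compute_recommendations(movie_id: int) -> Tuple[int, ...]:
    """Hypothetical stand-in for the slow recommendation computation."""
    calls["count"] += 1
    return tuple(movie_id + offset for offset in (1, 2, 3))


@lru_cache(maxsize=1024)
def cached_recommendations(movie_id: int) -> Tuple[int, ...]:
    """The first call per movie_id is slow; repeat calls are served from memory."""
    return compute_recommendations(movie_id)


first = cached_recommendations(1)
second = cached_recommendations(1)
print(first, calls["count"])
```

Returning a tuple rather than a list keeps the cached value immutable, so one caller cannot accidentally change what later callers receive. If the algorithm changes, `cached_recommendations.cache_clear()` empties the cache.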

@talhajalil8674 11 months ago

I have written the same exact code and when I use "Men 1995" as title to look for similarity I get probability of zero for entire array. Why?

@talhajalil8674 11 months ago

@vikasparuchuri

@user-kb3id2kd2y 11 months ago

sir i want synopsis of this project asap

@Sparkss22 1 year ago

this is popularity based or content based???

@Shankara018 1 year ago

can i add this project in my portfolio?

@Dataquestio 1 year ago

You definitely can. I'd recommend following some of the next steps and building the project out a little more on your own, though.

@maglionejm 2 years ago

It would be very interesting to build a web application with Flask for the search engine. Could you show that in your next video? Also, it would be awesome to make an API with the generated model... What do you think? Using pickle perhaps?

@Dataquestio 2 years ago

Hi Juan - I'll take a look at doing this as a part 2 video. You could make an API for sure - I would look into this - www.django-rest-framework.org/ .

@SoundofSilence1 2 years ago

@@Dataquestio it would be really great if you could show us how to build a web app with the same.

@devzaks8912 1 year ago

My own concern is how to make an API with the model. Then we can make requests to it with a movie and get recommendations. Please anyone that has done this should let me know 😔