So informative and easy to follow. I love this. Thank you so much for taking the time to create this video. It's so important to know how the concepts we learn in class can be applied in real life. This has changed everything for me. Thank you again.
Sir, I've seen almost all of these concepts painfully "explained" in many different ways, but never have I seen them presented as elegantly and intuitively! Excellent video!
Most of us who have been watching your videos are changed forever. We are convinced now that there are better ways to teach machine learning and your way is one of the better ways. Thanks
No doubt. It is definitely an excellent tutorial, and it gives a reasonable answer as to why the weights in the hidden layer are the embedding of a movie or a person. Thanks a lot.
Actually I do still have a question after this video: How do we know how many features we are supposed to have? i.e. how were you able to decide the factorized matrices are 2 x N and M x 2? Does it mean you might end up getting a feature that is a combination of multiple "actual features" and you need to further break it down?
That is something people experiment with by seeing what gives them the best result. In general, people experiment with values proportional to the logarithm of the number of unique items.
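For example, here is a minimal sketch of that heuristic in Python (the function name and scale factor are just illustrative; you would still tune the result by validation):

```python
import numpy as np

# Rough heuristic: number of latent features proportional to log2 of the
# number of unique items, with a small floor. Treat the result as a starting
# point for hyperparameter search, not a rule.
def suggest_num_features(num_unique_items, scale=1.0, min_features=2):
    return max(min_features, int(round(scale * np.log2(num_unique_items))))

print(suggest_num_features(1000))  # ~10 latent features for 1000 movies
print(suggest_num_features(5))     # tiny catalogs fall back to the minimum
```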
I was driven here after reading a chapter in RGA's book where they mention "collaborative filtering". I was curious and decided to learn more about it. I would like to know, though, what computer language is generally used to achieve this? Thank you for the very simple and fun explanation.
This is a work of art. Never thought matrix factorization could be explained so effortlessly yet so clearly. You have helped me a lot with this sir! Thank You, God bless you!
Given a new movie M6, how do we assign feature values for the movie? A related question: given a new person E, how do we come up with their interest in the different features?
Luis! You are a fantastic teacher! Not everyone can explain complicated concepts in a way that everybody understands. Your teaching style shows the depth of your knowledge! Thank you!
I find your teaching method not only great but also very valuable for motivating young people to take up Machine Learning. You could make it even better by also relating it to the math (Linear Algebra, Calculus, Probability) in a more familiar form. Make sure that anyone teaching or learning ML in a college environment is aware of your videos. Great stuff.
Thank you for the video. I just wonder: do I need a matrix with the numbers all filled in for training, and then test the trained model on some sparse test matrix?
In your example, we have two latent factors, so how do we know which one should increase and which one should decrease to reduce the error? It seems like you have to increase/decrease both of them at the same time.
Hi Serrano, a suggestion please. Before walking through a detailed example, please first introduce the overall concept/algorithm/intuition and the content/agenda. First tell the learner what they can expect to see/learn, then start teaching them. Thanks for all the useful videos!
I think your students don't know what it feels like to fail an exam. One doubt at 25:15: I didn't understand how exactly we change those values. Is there any ratio of features that should be followed?
Great video Luis :) I have one question though: how do we decide the number of latent features, and what are the trade-offs of using a high/low number of latent features? Thanks
Hi Luis, in the Yannet GitHub code it is very hard for a beginner with no previous education on word embeddings and CNNs in PyTorch to understand how the factorization is done. Is there any other code or tutorial you can recommend that shows how the factorization is done? The best would be if you could make one for word2vec!
The explanation for gradient descent was great, but I'm a little confused about the 25:00 minute part. In the matrix, the (1,1) element is 1.44, but the actual value is 3. So, we need to increase something. It could be [f1][m1], [f2][m1], [A][f1], or [B][f2]. How do we decide which one to increase? And by increasing which value and by what factor can we get accurate results? Increasing a single value or multiple values can potentially bring us closer to the answer. If anyone has an answer for this doubt, please clarify. I'm curious to know.
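Edit: after thinking about it some more, here is roughly how I understand it (a sketch with illustrative starting values, not the video's exact factors): gradient descent doesn't pick a single value to change; it nudges all the involved values at once, each in proportion to the error and to its partner factor.

```python
import numpy as np

# Starting factors chosen so the prediction is 1.44, as in the example above;
# the individual values are made up.
user = np.array([1.2, 0.3])    # person A's two feature values
movie = np.array([1.0, 0.8])   # movie 1's two feature values
rating = 3.0                   # the actual rating we want to reach
lr = 0.1                       # learning rate

for step in range(50):
    pred = user @ movie              # current predicted rating (starts at 1.44)
    error = rating - pred            # positive error -> prediction is too low
    user_old = user.copy()
    user += lr * error * movie       # each user feature moves by error * matching movie feature
    movie += lr * error * user_old   # each movie feature moves by error * matching user feature

print(user @ movie)  # converges toward 3.0
```

So no single value is "chosen"; the squared-error gradient decides how much each of the four numbers moves at every step.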
Hey, sorry for nit-picking, but at 17:01 the red triangle would have a transposed shape, i.e. greater height (2000 users) than width (1000 movies)!! Great video though!! Please make one on Gaussian Mixture models.
Do you think the dot product calculation assumes that a low preference score (1 out of 5) will not decrease someone's overall rating for a movie? (I.e. the calculation only adds the preferences, so if Person A doesn't like action, their rating for a comedy + action movie will equal that of a comedy-only movie, and the overall rating will not be reduced because they dislike action.) Is this a limitation of the dot product calculation? How do you think Netflix takes this into account?
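Edit: to make my own question concrete, here is a tiny numeric illustration (all feature values are made up). With non-negative features a dislike can at best contribute zero, but if negative values are allowed it can actually pull the score down:

```python
import numpy as np

# Features: [comedy, action]
comedy_only   = np.array([4.0, 0.0])
comedy_action = np.array([4.0, 4.0])

person_nonneg = np.array([1.0, 0.0])     # "doesn't like action" expressed as a 0 weight
print(person_nonneg @ comedy_only)       # 4.0
print(person_nonneg @ comedy_action)     # 4.0 -> same score, the dislike never subtracts

person_signed = np.array([1.0, -0.5])    # negative weight to express the dislike
print(person_signed @ comedy_action)     # 2.0 -> now the action content lowers the rating
```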
You made whole bunches of other content about CF or Matrix Factorization look boring and meaningless. Thank you so much for this incredibly plain explanation.
If NMF is a type of dimensionality reduction/unsupervised algorithm... then I thought unsupervised algorithms don't have an error function and therefore don't need an optimization method like gradient descent to correct themselves and reduce the error?
Hi, I'm curious about why the table at around 2:22 isn't realistic. Why do you say that there's a similarity between them? How do I observe the similarity? I thought the middle and right tables are both fine... And thanks! It's a great video :)
Not too sure how realistic the data in the 5x4 matrix is, but not all users who have watched a movie would give a rating, and not every user would have watched every single movie. How do we deal with such disparity? I don't think we can just assign zeros, right?
Great explanation, you seem to understand the concept very clearly. Subscribed immediately! Any videos on expectation maximization, SVD, or dimensionality reduction? Or resources that you liked most?
Nicely explained. Small nit: you say we square to avoid ambiguity between positive and negative, which is a misleading simplification. The real reason is to keep the errors from canceling each other out when you add them up over all ratings. That is indeed the step you show next, so it would be easy to add an accurate explanation.
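A quick numeric illustration of the point (the errors are made up):

```python
errors = [2.0, -2.0, 1.0, -1.0]      # per-rating errors of opposite sign

print(sum(errors))                    # 0.0  -> raw errors cancel, the fit looks "perfect"
print(sum(e**2 for e in errors))      # 10.0 -> squared errors cannot cancel
```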
Hi prof. Thank you for an amazing lecture, but can you tell me how I can deal with cold-start problems, e.g. when a user is new and we don't have any info about them, or when the movie is new?
This video ends way too early. So when you're training your model on sparse data, what do you do with the unrated movies? Do you treat them as zeros? Do you exclude them from your error function?
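Edit: for anyone else wondering, a common approach (a sketch, not necessarily what this video does) is to exclude the unrated cells: keep a mask of which entries were actually rated and sum the squared error only over those, so missing ratings contribute nothing to the loss.

```python
import numpy as np

ratings = np.array([[3.0, 0.0],
                    [0.0, 5.0]])
observed = np.array([[1, 0],
                     [0, 1]])       # 1 = rated, 0 = missing (the zeros in `ratings` are just placeholders)

users  = np.random.rand(2, 2)       # user factors: 2 users x 2 features
movies = np.random.rand(2, 2)       # movie factors: 2 features x 2 movies

pred = users @ movies
loss = np.sum(observed * (ratings - pred) ** 2)   # masked sum of squared errors
print(loss)
```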
Are we giving the features of users and movies as input, or are they extracted by the MF algorithm itself? During gradient descent, is the algorithm learning weights for each of the features, or does it change the features themselves, as shown in the video?
The instructor does an excellent job of breaking down concepts and explaining them step by step in a way that is easy to understand. I appreciate the time and effort put into creating such an informative and well-presented video. Thank you for sharing your knowledge with us.
One issue I see right away is that this solution doesn't scale. Netflix, for instance, has way more movies and way more users than in this table, and the number of calculations grows rapidly (with the product of users and movies) as those two dimensions increase.
Elegantly explained. I like the description: a very friendly introduction. I was struggling to see how matrix factorization plays a role in recommendation systems. Now I got it. Thanks
WooooW! That was as simple as possible! If a person understands something, they should be able to explain it even to a child. The level of understanding here is amazing! Thank you!
How do you decide the number of features in general? Is there a technique to identify the optimum number of features? Can you suggest some algorithms?