Welcome to my channel on Machine Learning. My plan is to make one or two videos per month to clarify complex topics, dive into the code, and offer tips and tricks about TensorFlow, Keras, Scikit-Learn, PyTorch, deployment, performance, and more. I hope you'll enjoy it!
About me: I am the former lead of YouTube's video classification team, and author of the O'Reilly book Hands-On Machine Learning with Scikit-Learn and TensorFlow. I'm blown away by what Deep Learning can do, and I feel incredibly fortunate to call it my job. I hope I can help as many people as possible join the party!
Very interesting and informative talk, even though some time has passed and machine learning has probably changed somewhat. But it's nice to hear from an expert how they put things together, and to get some references where someone can start from scratch.
In the last slide of the red panda example, why does the sum of the predicted distribution exceed 100%? The sum of each distribution should be equal to 100%, right? If that is not the case, how do we explain the situation where all the classifications have a 100% predicted probability? The CrossEntropyLoss would be 0 in that case.
Oops, you're right, the sum of the predicted probabilities should indeed be 100%, good catch! Apparently I can't count. 😅 I'll add this error to the video description.
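For anyone who wants to sanity-check this numerically, here is a minimal Python sketch (the three classes and the probabilities are made up for illustration, not the exact numbers from the slide):

import numpy as np

def cross_entropy(p, q, eps=1e-12):
    """Cross-entropy H(p, q) = -sum_i p_i * log2(q_i), in bits."""
    p = np.asarray(p, dtype=float)
    q = np.clip(np.asarray(q, dtype=float), eps, 1.0)  # avoid log2(0)
    return -np.sum(p * np.log2(q))

# True distribution: the image really is a red panda (one-hot).
p_true = [1.0, 0.0, 0.0]                # [red panda, raccoon, fox]

# A valid predicted distribution must sum to 1 (i.e. 100%).
q_pred = [0.8, 0.15, 0.05]
print(sum(q_pred))                      # 1.0
print(cross_entropy(p_true, q_pred))    # ~0.32 bits

# A perfect prediction gives a loss of exactly 0.
print(cross_entropy(p_true, [1.0, 0.0, 0.0]))  # 0.0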
Question: at about 4 minutes, where you talk about equivariance, would it be fair to assume that all the capsules should have moved if you rotate the image? All the smallest capsules stayed still and only the larger ones rotated.
What if we send only the changes? E.g., 0 bits are sent if there is no change. That should improve the information content. Would it be optimal if we first send a bit indicating which direction the weather moved (0 = more sunny, 1 = more rainy), followed by the amount of change? Then, if we have 8 possible weather states in total, the max change is 7, so to encode 7 we need 3 bits plus the direction bit in the most extreme case. And quite often we send 0 bits, and it works for both weather patterns. Is that reasoning valid?
Interesting question. Indeed, only sending the changes would be quite efficient, especially if there are frequent repetitions (such as 10 sunny days in a row). In practice, it's a very commonly used optimisation in telecommunications. One drawback of this approach is that you can't tell the difference between "there's no change" and "the weather station is broken". 😃
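To make the delta idea concrete, here is a toy Python sketch (my own assumptions, not from the video: one flag bit per day is still spent on "no change", because sending literally zero bits is exactly what creates the broken-station ambiguity mentioned above):

def encode_day(prev_state, state):
    """Delta-encode one day: '0' = no change, else '1' + direction bit + 3-bit magnitude."""
    delta = state - prev_state
    if delta == 0:
        return "0"                               # same weather as yesterday: 1 bit
    direction = "1" if delta > 0 else "0"        # 1 = more rainy, 0 = more sunny
    return "1" + direction + format(abs(delta), "03b")

# 8 possible weather states (0 = sunniest ... 7 = rainiest).
days = [3, 3, 3, 4, 4, 1]
bits = "".join(encode_day(p, s) for p, s in zip(days, days[1:]))
print(bits)  # runs of identical weather cost only 1 bit per day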
Can you please give me a hint? You were a YouTube video classification PM. I can't find anything on how YouTube defines the topics of videos, or whether this info is available in the YouTube API. Can you please give me a clue where to find this info? I can't find literally anything.
I did a video on YouTube video classification: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-zzTbptEdKhY.html I left YouTube in 2016, so this may be outdated information, but hopefully it will help you.
I really enjoyed the way you explained it. It's so inspiring to watch and learn difficult concepts from the author of such an incredible book in the ML realm. I wish you would teach other concepts via video as well. Cheers, Roxi
OK, maybe I should pay more attention when reading my books, but when I heard here that cross-entropy is entropy + KL divergence, it made sense. Then when I read my notes, I found I had written something similar, without even realizing how big a deal it was.
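For anyone who wants to verify that identity numerically, here is a minimal sketch with made-up distributions (cross-entropy = entropy + KL divergence):

import numpy as np

p = np.array([0.5, 0.25, 0.25])   # true distribution
q = np.array([0.7, 0.2, 0.1])     # predicted distribution

entropy       = -np.sum(p * np.log2(p))        # H(p)
cross_entropy = -np.sum(p * np.log2(q))        # H(p, q)
kl_divergence =  np.sum(p * np.log2(p / q))    # D_KL(p || q)

# H(p, q) == H(p) + D_KL(p || q), up to floating-point error.
print(np.isclose(cross_entropy, entropy + kl_divergence))  # True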
This is by far one of the best explanations of cross-entropy loss on YouTube. Another video that complements this one asks why weighting the predicted distribution by the true distribution should even be considered a loss: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-LOh5-LTdosU.html Also, how does one make a model output a probability distribution? The role of softmax: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-p-6wUOXaVqs.html
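On the softmax question: the standard trick for turning a model's raw scores into a probability distribution is the softmax function. A minimal sketch (not from the video, just the usual numerically stable formulation):

import numpy as np

def softmax(logits):
    """Map raw scores (logits) to a probability distribution that sums to 1."""
    z = logits - np.max(logits)   # subtract the max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # approx. [0.659, 0.242, 0.099]
print(probs.sum())  # 1.0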
It is not really a clear explanation, because it assumes the viewer already knows what Shannon was trying to formalize. At 1:06 you jump into "dividing uncertainty by 2" without first defining what "uncertainty" means and how it could itself be measured. So you are kind of explaining the unknown by the unknown.
The world of takeover bids (OPA) is full of possibilities, presenting a tapestry of complex and dynamic elements. Entropy, cross-entropy, and Kullback-Leibler divergence bring precision and accuracy, unlocking the tools to form a unified entity. Through the exchange of knowledge and the pursuit of harmony, optimal integration allows companies to achieve a shared vision of success. The language of information theory has the power to unlock a vibrant future.
I am struggling with the implementation in Python, as I have different lengths for the distributions (i.e., different numbers of rows). When using scipy.special.rel_entr I get a shape mismatch error. Anyone? Any ideas?
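In case anyone hits this later: scipy.special.rel_entr works elementwise, so both arrays must have the same shape and describe probabilities over the same support. One common fix, assuming the inputs are raw samples of different lengths (my guess at the situation here), is to histogram both samples over the same bins first:

import numpy as np
from scipy.special import rel_entr

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, 1000)   # sample 1
y = rng.normal(0.5, 1.2, 800)    # sample 2 (a different length is fine here)

# Bin both samples over the SAME bins so the supports (and shapes) match.
bins = np.linspace(-5.0, 5.0, 41)
p, _ = np.histogram(x, bins=bins)
q, _ = np.histogram(y, bins=bins)

# Normalise to probability distributions; a tiny epsilon avoids division by zero.
eps = 1e-12
p = (p + eps) / (p + eps).sum()
q = (q + eps) / (q + eps).sum()

print(rel_entr(p, q).sum())   # D_KL(p || q), in nats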
I'm loving the slides and the explanation. I noticed the name in the corner and thought, oh nice, I know that name. Then suddenly... it's the author of that huge book I love!
I haven't seen a better, clearer explanation of entropy and KL divergence, ever, and I've studied information theory before, in 2 courses and 3 books. Phenomenal; this should be made the standard intro to these concepts in all university courses.