Hello! First of all, thank you for uploading the material. Very, very good course. However, in this part on EBMs, I'm a little bit confused: let's suppose that I've trained a denoising AE (or another variation) with a bunch of y's. After training, how do I use it in practice? Would I pick a random z and use it to generate a y_tilde? From which distribution would I sample such a z?
I have a few videos on that in my most recent playlist, second chapter. There, I explain how the Perceptron (a binary neuron with an arbitrary number of inputs) used an error correction strategy for learning. Let me know if you have any other questions. 😇😇😇 Chapter 2, videos 4-6: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-g4sSU6B99Ek.html
This principle of running differential equations backward is used in diffusion models, where you derive the loss function from the score, which drives the time-reversed Langevin dynamics equation. Cost and energy, or momentum and energy. Both are deterministic, reversible dynamical systems.
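For the curious, here is a minimal sketch of the (forward) unadjusted Langevin dynamics whose time reversal the comment refers to; the function `score_fn`, the step size, and the step count are all illustrative assumptions, not anything from the video:

```python
import numpy as np

def langevin_sample(score_fn, x0, n_steps=1000, step=1e-2, rng=None):
    """Unadjusted Langevin dynamics:
    x_{t+1} = x_t + step * score(x_t) + sqrt(2 * step) * noise,
    which samples from p(x) given only its score, grad_x log p(x)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step * score_fn(x) + np.sqrt(2 * step) * noise
    return x

# Toy example: a standard Gaussian has score(x) = -x.
sample = langevin_sample(lambda x: -x, x0=np.zeros(2))
```

Score-based diffusion models learn `score_fn` with a neural network and then run a time-reversed version of this process to turn noise into data.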
Thank you for sharing. I have one question about the NNs on scrambled data. If I had to make a prediction, I would have said we would get an accuracy of about 15%, not more, thanks to the number of pixels that can help determine which digit it corresponds to. So is that enough to get an accuracy of 83-85%, or is there something else? I supposed that the fully connected neural network would have duplicated the filters, but there is no change with the scrambled data.
@@alfcnz Yes, of course. I think my French explanation was not clear either. I would have assumed that with scrambled data, we would have had an accuracy of around 15%, not more (which is more than 10% thanks to the fact that, by counting the number of pixels, the model can have an idea of which digit is the most probable). I have trouble understanding how the model can achieve as "good" results as 85% on scrambled data. Does the model count the number of pixels and determine it that way, or is there something else? I had assumed that in reality the dense model would have worked like a ConvNet by learning the same kernels multiple times. Essentially, we would have had weight redundancy to get something similar to a ConvNet. Is it because of the lack of parameters in the dense network? If we had given it a lot more parameters, would it have come back to having a ConvNet with weight redundancy to "simulate" the filter's movement? Thank you
There's a lot going on in this question. First, let's address the fully-connected model. The model does not care whether you scramble the input or not. If smartly initialised, the model will learn *the same* weights but in a permuted order. That's why the model's performance is (basically) the same before and after permutation. Until here, are you following? Do you have any specific questions on this first part of my answer?
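As a minimal sketch of the permuted-weights argument (the shapes and variable names below are illustrative, not from the lecture): permuting the pixels *and* the corresponding columns of a dense layer's weight matrix leaves the layer's output unchanged, so for every solution on the original data there is an equally good permuted solution on the scrambled data.

```python
import torch

torch.manual_seed(0)

x = torch.randn(784)          # one flattened 28x28 image
W = torch.randn(100, 784)     # weights of the first dense layer
perm = torch.randperm(784)    # a fixed pixel scrambling

# Scrambling the pixels and permuting the weight columns the
# same way leaves the layer's output unchanged.
out_original = W @ x
out_scrambled = W[:, perm] @ x[perm]
print(torch.allclose(out_original, out_scrambled))  # True
```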
@14:04 Paraphrase: missing a positive (i.e., a false negative) is more critical (i.e., worse) than a false positive. (Note: "falsely identify a negative case" means "falsely identify as a positive what is actually a negative case.")
Thank you so much for sharing this series. I especially loved the vintage ConvNets and the brain part :) I have a question: I didn't understand how we define the number of feature maps. For example, at 1:27:00, how did we go from 6 feature maps in layer 2 to 12 feature maps in layer 3? (By the way, there are 16 feature maps in layer 3 (C3) in the architecture of LeNet-5 in this paper: yann.lecun.com/exdb/publis/pdf/lecun-98.pdf, Fig. 2, the architecture of LeNet-5.)
*Summary*

*Probability Recap:*
* *[**0:00**]* *Degree of Belief:* Probability represents a degree of belief in a statement, not just true or false.
* *[**0:00**]* *Propositions:* Lowercase letters (e.g., cavity) represent propositions (statements). Uppercase letters (e.g., Cavity) are random variables.
* *[**5:15**]* *Full Joint Probability Distribution:* Represented as a table, it shows probabilities for all possible combinations of random variables.
* *[**10:08**]* *Marginalization:* Calculating the probability of a subset of variables by summing over all possible values of the remaining variables.
* *[**17:04**]* *Conditional Probability:* The probability of an event happening given that another event has already occurred. Calculated as the ratio of joint probability to the probability of the conditioning event.
* *[**16:14**]* *Prior Probability:* The initial belief about an event before observing any evidence.
* *[**16:40**]* *Posterior Probability:* Updated belief about an event after considering new evidence.

*Naive Bayes Classification:*
* *[**32:48**]* *Assumption:* Assumes features (effects) are conditionally independent given the class label (cause). This simplifies probability calculations.
* *[**32:48**]* *Goal:* Predict the most likely class label given a set of observed features (evidence).
* *[**44:04**]* *Steps* (see the sketch after this summary):
  * Calculate the joint probability of each class label and the observed features using the naive Bayes assumption.
  * Calculate the probability of the evidence (observed features) by summing the joint probabilities over all classes.
  * Calculate the posterior probability of each class label by dividing its joint probability by the probability of the evidence.
  * Choose the class label with the highest posterior probability as the prediction.
* *[**36:24**]* *Applications:*
  * *Digit Recognition:* Classify handwritten digits based on pixel values as features.
  * *[**47:34**]* *Spam Filtering:* Classify emails as spam or ham based on the presence of specific words.
* *[**33:56**]* *Limitations:*
  * *Naive Assumption:* The assumption of feature independence is often unrealistic in real-world data.
  * *[**42:11**]* *Data Sparsity:* Can struggle with unseen feature combinations if the training data is limited.

*Next Steps:*
* *[**1:05:58**]* *Parameter Estimation:* Learn the probabilities (parameters) of the model from training data.
* *[**59:53**]* *Handling Underflow:* Use techniques like logarithms and softmax to prevent numerical underflow when multiplying small probabilities.

I used Gemini 1.5 Pro to summarize the transcript.
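As a minimal sketch of the steps listed above, computed in log space per the *Handling Underflow* note: the function `naive_bayes_predict`, the binary-feature assumption, and the toy numbers are all illustrative, not from the lecture.

```python
import numpy as np

def naive_bayes_predict(log_prior, log_likelihood, features):
    """log_prior: (n_classes,) log P(class)
    log_likelihood: (n_classes, n_features) log P(feature=1 | class)
    features: (n_features,) observed binary features."""
    # Step 1: log P(class, evidence) = log P(class) + sum_i log P(f_i | class)
    log_joint = log_prior + np.where(
        features == 1, log_likelihood, np.log1p(-np.exp(log_likelihood))
    ).sum(axis=1)
    # Steps 2-3: normalise with log-sum-exp instead of dividing raw
    # products, which would underflow with many features.
    log_posterior = log_joint - np.logaddexp.reduce(log_joint)
    # Step 4: pick the class with the highest posterior.
    return np.argmax(log_posterior), np.exp(log_posterior)

# Toy spam filter: classes (ham, spam), two word features.
log_prior = np.log([0.7, 0.3])
log_like = np.log([[0.1, 0.2],   # P(word present | ham)
                   [0.8, 0.6]])  # P(word present | spam)
pred, posterior = naive_bayes_predict(log_prior, log_like, np.array([1, 0]))
```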
They are a bit off. The first two titles should not be simultaneous, nor at the very beginning. Similarly, Gemini thinks that the first two titles of Naïve Bayes Classification are also simultaneous. I can see, though, how these could be helpful, if refined a bit.
Please, check out the first video of the playlist, where an overview of the course is provided. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-GyKlMcsl72w.html
Yeah… After the pandemic, classes went back to being held in person. This semester, it was a surprise that I was going to teach remotely again. And I have to say I prefer it, since then I can share my work with y'all! ❤️❤️❤️
When I first started college a year ago, I found that the Lua interpreter is faster than Python's, so I've been wondering if I could use Lua to build deep learning models... In the past two months, I was surprised to find that my idea had already been abandoned back when I was still in primary school... But I'm still curious, so now I'm trying to experience deep learning and data science computing tasks with Torch in Lua...
I would recommend moving to PyTorch, since that's the successor to Torch itself. Yes, you could use Lua out of curiosity, but as of 2024, there are no advantages to doing so. Python has become the lingua franca of deep learning and has a large wealth of libraries, making the whole ecosystem very convenient to use.
Wow, I've never been happier with a notification! I think I missed this course; may I ask where the first part is? By the way, are we getting any introduction to diffusion methods this time?
This is a new undergraduate course I just started teaching this year. I'll post an intro video about this series in a few days. Diffusion models are taught in my graduate-level class (and not by me yet).
@@alfcnz Thanks for the response. Is there any way to access your graduate courses? It's fine if they aren't free. As much as I'd like to attend your classes, unfortunately, for people like me who are two continents apart, it's not very feasible to apply for them. My only hope is to get them digitally.
Several editions of my graduate course are already available here on RU-vid. Feel free to also check out my homepage to see all the courses I'm offering.