Welcome to DataMListic (formerly WhyML)! On this channel I explain various machine learning concepts that I encounter in my learning journey. Enjoy the ride! ;)
The best way to support the channel is to share the content. However, if you'd like to also support the channel financially, donating the price of a coffee is always warmly welcomed! (completely optional and voluntary) ► Patreon: www.patreon.com/datamlistic ► Bitcoin (BTC): 3C6Pkzyb5CjAUYrJxmpCaaNPVRgRVxxyTq ► Ethereum (ETH): 0x9Ac4eB94386C3e02b96599C05B7a8C71773c9281 ► Cardano (ADA): addr1v95rfxlslfzkvd8sr3exkh7st4qmgj4ywf5zcaxgqgdyunsj5juw5 ► Tether (USDT): 0xeC261d9b2EE4B6997a6a424067af165BAA4afE1a
Great video, thanks. Still struggling with the origin of the loss of that degree of freedom at the end. You said "if we have the sample mean, we don't need to know all 3 data points but actually only 2 because the 3rd can be estimated using the sample mean and the other 2 points." This makes sense: I agree that knowing the sample mean and 2 of the 3 points gives you the third. Where I'm struggling is how you can know that sample mean unless all three data points are known and free to vary. Put differently, how could you know the mean of a sample of three points if you only know two of them? Don't you need to know all three points in order to get the sample mean that you then use to say the third isn't necessary?
I think I've worked through it. So yes, you obviously need to know all values in your sample in order to calculate the sample mean. Where the N-1 comes in is when we then want to calculate the variance of that sample, whose mean you now know. We know from the proof you linked in your description that the sample variance calculated with N is *not* a good estimate of the population variance, since it's biased towards too small a value. Why is that? Variance of a sample makes no conceptual sense unless the mean of the sample is known and set (variance measures how much the elements in the sample differ from the mean). So to figure out the sample variance, the mean must be set, and once the mean is set, how many elements are free to vary while the sample mean stays the same? The answer is N-1, as the Nth is no longer free to vary if the mean is to remain as it is. So only N-1 elements actually contribute to the variance of a sample with a given sample mean. Calculating the sample variance with N-1 rather than N will therefore give a better estimate of the population variance (which would be calculated with N). It's a bit like hiding a ball under 1 of 3 cups: once you know the ball must be under one of the cups, you only need to check at most 2 cups to know which one it's under (because if it's neither cup A nor cup B, it must be cup C, no need to check it).
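To convince myself, I also ran a tiny NumPy simulation (my own, not from the video): averaging the divide-by-N estimate over many small samples lands below the true variance, while the divide-by-(N-1) estimate lands right on it.

```python
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0            # population variance (std = 2)
n, trials = 3, 100_000    # tiny samples make the bias obvious

biased, unbiased = [], []
for _ in range(trials):
    x = rng.normal(0.0, np.sqrt(true_var), size=n)
    m = x.mean()
    biased.append(((x - m) ** 2).sum() / n)          # divide by N
    unbiased.append(((x - m) ** 2).sum() / (n - 1))  # divide by N-1 (Bessel's correction)

print(np.mean(biased))    # ~2.67 = 4 * (3-1)/3, systematically too small
print(np.mean(unbiased))  # ~4.0, matches the population variance on average
```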
Great video. My understanding is that you would almost always use bagging, evaluate the results and, if good enough, stop there. However, you COULD go on to try various boosting methods to see if the model improves even more, but at what cost? If the best boosted model (AdaBoost, XGBoost, etc.) performed 1% better but took 3x longer to compute, then boosting the already-bagged models might not be worth it, right? Still trying to cement in my mind the process flow from a developer standpoint 😉
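Here's roughly how I'd sanity-check that trade-off myself (a rough sketch; the dataset, models and hyperparameters are just placeholders): fit a bagged and a boosted model, then compare cross-validated accuracy against wall-clock time.

```python
import time
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

for name, model in [("bagging", BaggingClassifier(n_estimators=100, random_state=0)),
                    ("boosting", GradientBoostingClassifier(n_estimators=100, random_state=0))]:
    start = time.perf_counter()
    score = cross_val_score(model, X, y, cv=5).mean()   # accuracy averaged over 5 folds
    elapsed = time.perf_counter() - start
    print(f"{name:8s} accuracy={score:.3f} time={elapsed:.1f}s")
```

If the accuracy gap is tiny but the time gap is large, that pretty much answers the question for that dataset.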
Interesting video, I hadn't heard about Google's DataGemma before, really a fascinating concept, thanks! I would also like to add that the first part of the video could have been more dynamic: I felt like, up to and including DataGemma, the images were too "still" and didn't do much to simplify/visualize the concepts. When talking about o1 reasoning, for example, a showcase of its chain-of-thought output could have been helpful. From then on I liked the pace and the information provided, so good job! Waiting for next week's one!
So how does it do open-set object detection exactly? By making the cross-modality features lie close to the textual features in the embedding space using a contrastive loss, it automatically learns to decode the correct bounding boxes for visual objects it hasn't been trained on as well? Seems like magic.
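My rough mental model of the alignment part (just a toy sketch of the general idea, not the actual model) is a CLIP-style contrastive loss between region features and phrase embeddings; once the two spaces are aligned, a new category can be matched at inference simply by embedding its text prompt.

```python
import torch
import torch.nn.functional as F

def region_text_contrastive_loss(region_feats, text_feats, temperature=0.07):
    """region_feats: (N, d) features of N image regions/queries.
    text_feats:   (N, d) embeddings of each region's ground-truth phrase."""
    region_feats = F.normalize(region_feats, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)
    logits = region_feats @ text_feats.t() / temperature   # (N, N) cosine similarities
    targets = torch.arange(region_feats.size(0))           # i-th region matches i-th phrase
    return F.cross_entropy(logits, targets)

def classify_regions(region_feats, prompt_feats):
    """At inference, score each predicted box against embeddings of arbitrary text prompts,
    so categories never seen during training can still be matched by name."""
    sims = F.normalize(region_feats, dim=-1) @ F.normalize(prompt_feats, dim=-1).t()
    return sims.argmax(dim=-1)   # index of the best-matching prompt for each region
```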
Those are gold! Thank you so much for this wonderful effort! A question out of pure curiosity: How long did it take you to attain such a level of knowledge?? I'm learning on my own at my own pace, revising things that I may come across, and it's just an endless pool of knowledge. Yet... you seem to already know most of these and are even able to teach them in a very intuitive way!
Thanks! Glad you liked the explanation! Regarding your question, I am 1000% not the most knowledgeable person in the ML space, I know many, many people that know more than I do. However, what I can say is that if you study any field long enough, you encounter certain concepts (like the covariance matrix) again and again in different scenarios, and you tend to get a deeper understanding of how it works. Then, it gets easier to explain it.
Thanks for the explanation, there is one thing I hope you can elaborate on. With the weight matrix randomised, why is it easier for the NN to learn the zero matrix compared to the identity matrix?
Good question! IMO mainly because you also usually use weight regularization (e.g. L2) in your final loss, so the NN can easily shrink the weight matrix's values to 0, whereas the identity matrix has to be maintained against that penalty.
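A tiny sketch of what I mean (purely illustrative): with weight decay and no task gradient, every entry of the weight matrix keeps shrinking towards 0, so the zero matrix is the "default" the optimizer drifts to, while the identity matrix would have to be actively held in place.

```python
import torch

W = torch.randn(4, 4)            # randomly initialized weight matrix
lr, weight_decay = 0.1, 0.1
for _ in range(1000):
    # gradient of the L2 penalty 0.5 * weight_decay * ||W||^2 is weight_decay * W,
    # so the decay term alone keeps pulling every entry towards 0
    W = W - lr * weight_decay * W
print(W.abs().max())             # ~0: the zero matrix comes "for free",
                                 # the identity's diagonal would have to resist this pull
```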
Good question! Of course there's always a chance of getting stuck in a local optimum, because you are basically using a greedy algorithm here, and that's why you usually run the search multiple times, to reduce the chance of that happening.
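A toy illustration of the restart trick (not the exact algorithm from the video): run the greedy search from several random starting points and keep the best result.

```python
import math
import random

def hill_climb(start, score, neighbors):
    """Greedy local search: move to the best improving neighbor until none is left."""
    current = start
    while True:
        better = [n for n in neighbors(current) if score(n) > score(current)]
        if not better:
            return current            # local optimum reached
        current = max(better, key=score)

def hill_climb_with_restarts(random_start, score, neighbors, n_restarts=20):
    """Repeat the greedy search from random starts and keep the best local optimum found."""
    runs = [hill_climb(random_start(), score, neighbors) for _ in range(n_restarts)]
    return max(runs, key=score)

# Toy objective with many local maxima: a single run can easily get stuck,
# but the best of 20 restarts is very likely to end up near the global maximum.
score = lambda x: math.sin(3 * x) - 0.1 * (x - 2) ** 2
neighbors = lambda x: [x - 0.05, x + 0.05]
best = hill_climb_with_restarts(lambda: random.uniform(-10, 10), score, neighbors)
print(best, score(best))
```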
I've checked the scripts I used to generate the plots for this video. It seems that in the plot at 2:51 I forgot to normalize the amplitude (i.e. divide by half of the max frequency). Sorry for the confusion this may have caused!
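For reference, the usual single-sided amplitude normalization divides the FFT magnitude by half the number of samples; here's a generic NumPy sketch of it (not the exact script used for the video):

```python
import numpy as np

fs, N = 1000, 1000                     # sampling rate (Hz) and number of samples
t = np.arange(N) / fs
x = 3.0 * np.sin(2 * np.pi * 50 * t)   # 50 Hz sine with amplitude 3

X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(N, d=1 / fs)
amplitude = 2 * np.abs(X) / N          # i.e. divide by N/2
print(freqs[np.argmax(amplitude)], amplitude.max())   # ~50.0 Hz, amplitude ~3.0
```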
If you're interested in learning more about AI, you can check out the following reading list: ru-vid.com/group/PL8hTotro6aVGtPgLJ_TMKe8C8MDhHBZ4W&si=u9Gk38MaQ7VLH3lf