Three suggestions: 1) A dedicated playlist about distributions, which also talks about the relationships between them (e.g. how taking the limit of Bernoulli yields Poisson, or how conditioning Poisson yields Bernoulli, etc.) 2) Another playlist for stochastic processes 3) In each playlist, also explain the relationships between the elements you are talking about; this shows us the big picture.
As much as I would love for this to happen, these must already be taking a phenomenal amount of time to make. The videos by themselves are gold enough. We can crowdsource this effort and tag the playlists here
This is great. My stats teacher in high school didn't teach from first principles and I didn't care that he didn't but this is such a simple explanation. Thanks!
Hi Josh! Thank you for doing all these videos on your channel. I also hope to see a playlist of distributions and their relationships, that is, the big map of distributions.
Please make another book and explain all of these statistics fundamentals in one place. The ML book doesn't cover all of the concepts. It's gonna save me a lot of time on note taking.
Will you be doing a Markov Chain Monte Carlo video? It would mostly be for my own benefit, but it would be great for others to learn from too =)
Hi Sir, you are the only person on RU-vid I have ever seen who resolves the issues of subscribers and non-subscribers alike. Whenever I ask, you reply to me. Thank you for that. I have one question that is bothering me a lot. In the probability distribution examples, the parameters of the distributions are given. So my questions are: 1. In the real world we are not given the parameters; we only have sample data. So how do we get those parameters in a real-world scenario? 2. As I said in question 1, we only have sample data, not the entire population, but we pass parameters like mu and sigma to the probability distribution function in the case of the normal distribution. Are those population parameters, or estimates of the population parameters from sample data? Thank you in advance
The answers to your questions are in these videos: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-vikkiwjQqfU.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-SzZ6GpcfoQY.html
Hi Josh, I just realized you don't have any videos related to PDF (or PMF) and CDF, or at least I couldn't find them so far. I would really appreciate it if you can make a video about them as well!
Bam! that was a great explanation. I have one silly question: Is there any difference between a statistical distribution and probability distribution? Thanks in advance!
Dear Josh, Thank you for the clear explanation of probability distributions; it helped me recall sampling distributions. May I ask a question? When referring to the normal distribution as an example, does 'sampling distribution' specifically mean the SD and mean, or does it refer to the SE and the mean of the means? Or are both of those the sampling distribution?
Here's an example. Say we collected 8 measurements from a normal distribution and calculated the mean. If we then repeated the process many times (collected another 8 measurements and calculated the mean each time), the collection of means would be the "sampling distribution of the means". The standard deviation of that sampling distribution would be the standard error of the mean.
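The repeated-sampling idea in this reply can be simulated directly. What follows is only a sketch: the function name `sample_means`, the number of repeats, the seed, and the choice of a standard normal (mean 0, standard deviation 1) are all assumptions made for illustration, not anything from the video.

```python
import random
import statistics


def sample_means(n_per_sample=8, n_repeats=10_000, mu=0.0, sigma=1.0, seed=0):
    """Collect n_per_sample measurements, take the mean, and repeat n_repeats times."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_repeats):
        sample = [rng.gauss(mu, sigma) for _ in range(n_per_sample)]
        means.append(statistics.fmean(sample))
    return means


# The collection of means is the (simulated) sampling distribution of the mean.
means = sample_means()

# Its standard deviation is the standard error of the mean ...
empirical_se = statistics.stdev(means)

# ... which for independent measurements should be close to sigma / sqrt(n).
theoretical_se = 1.0 / 8 ** 0.5

print(f"empirical SE: {empirical_se:.3f}, sigma/sqrt(n): {theoretical_se:.3f}")
```

The two printed numbers should roughly agree, which is the link between the SD of the original measurements and the SE of the mean.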
If I'd had this when I was at school and university, I might have done some more statistics and probability, instead of running a mile because it seemed so strange compared with the rest of maths
By chance, do you have any material on Bayesian Inference? I'm trying to understand Expectation Maximization (EM) but it's wrecking my mind... all your videos are incredible btw :)
EM algorithm
· Initialize (randomly)
· Iterate until convergence
  - Expectation step: estimate the memberships z_ji using the θ of the last iteration
  - Maximization step: update the parameters of the distributions θ, π using z_ji
Soft version of the k-means algorithm
· Cluster probabilities z_ji instead of choosing the next centroid
· Use all data points (weighted by cluster probabilities) to re-estimate the centroids (means, and in addition the covariance matrix)
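To make the outline above concrete, here is a minimal sketch of EM for a one-dimensional mixture of Gaussians. It makes several assumptions that are not in the comment: the data are 1-D (so each component has a standard deviation instead of a covariance matrix), the loop runs a fixed number of passes instead of testing for convergence, and the names `em_gmm_1d`, `normal_pdf` and the optional `init_mus` argument are hypothetical.

```python
import math
import random
import statistics


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def em_gmm_1d(data, k=2, n_iter=200, seed=0, init_mus=None):
    """Fit a k-component 1-D Gaussian mixture with a fixed number of EM passes."""
    rng = random.Random(seed)
    # Initialize: random data points as means unless starting means are supplied.
    mus = list(init_mus) if init_mus is not None else rng.sample(data, k)
    sigmas = [statistics.pstdev(data)] * k
    pis = [1.0 / k] * k

    for _ in range(n_iter):
        # Expectation step: soft memberships z[i][j] under the current parameters.
        z = []
        for x in data:
            weighted = [pis[j] * normal_pdf(x, mus[j], sigmas[j]) for j in range(k)]
            total = sum(weighted)
            z.append([w / total for w in weighted])

        # Maximization step: re-estimate each component from all weighted points.
        for j in range(k):
            n_j = sum(row[j] for row in z)
            mus[j] = sum(row[j] * x for row, x in zip(z, data)) / n_j
            var = sum(row[j] * (x - mus[j]) ** 2 for row, x in zip(z, data)) / n_j
            sigmas[j] = max(math.sqrt(var), 1e-6)  # floor avoids a zero-width component
            pis[j] = n_j / len(data)

    return mus, sigmas, pis
```

Here `mus` and `sigmas` play the role of θ, `pis` the role of π, and `z` holds the soft memberships. Unlike k-means, every point contributes to every component, weighted by its membership.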
This is a good question, since choosing the right bin size has a large effect on what the histogram will look like. The strange thing, however, is you can have more bins than you have data. For example, imagine you wanted to draw a histogram where values on the x-axis could be anywhere between 1 and 100. Now imagine each bin was one unit wide, so we had 100 bins. Now imagine we got 50 samples, but all of the values were between 45 and 55. The histogram would have a mound of data in the middle, but nothing on the edges. Does that make sense?
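The 100-bin example can be checked with a few lines of code. This is a sketch under assumptions: the helper `histogram_counts` is hypothetical, the 50 samples are drawn uniformly between 45 and 55 just to have some numbers, and the bins are taken to be [1, 2), [2, 3), ..., [100, 101).

```python
import random


def histogram_counts(values, low, high, n_bins):
    """Count how many values fall into each of n_bins equal-width bins on [low, high)."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so a value exactly at the top edge lands in the last bin.
        idx = min(int((v - low) / width), n_bins - 1)
        counts[idx] += 1
    return counts


rng = random.Random(1)
samples = [rng.uniform(45, 55) for _ in range(50)]  # 50 made-up measurements

# 100 bins, each one unit wide, for only 50 samples.
counts = histogram_counts(samples, low=1, high=101, n_bins=100)
nonempty = sum(1 for c in counts if c > 0)

print(f"{len(counts)} bins, {sum(counts)} samples, {nonempty} non-empty bins")
```

Most bins stay empty: all of the counts pile up in the handful of bins around the middle, which is the "mound" described above.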
So as I looked at this, I tried to figure out why someone wouldn't just use a line graph or something similar. Then I realized that maybe an important thing to mention is that this histogram is only measuring one characteristic and not two. So we are not measuring age and height as having some sort of relationship where, as you get older, your height goes up. We are just finding how often a certain range of one particular characteristic occurs. If my assumption is correct, would this be a good thing to mention in the video? Also, would you ever attach some questions to each video to help provoke thought?
Great video as usual, please keep them coming. But how can we make sure that the data follows a particular type of distribution, or is well described by this curve, before making any inference about the population from a sample, when we don't have enough data in our sample?
This is the million dollar question!!!! How do you pick the right distribution? Sometimes it's just known. For example, so many people have flipped so many coins over the years that we know that flipping a coin a bunch of times follows a binomial distribution. Likewise, sometimes we can make a basic assumption, that a coin should land heads 50% of the time on average, and then just work out the math and essentially derive a binomial distribution from scratch. However, sometimes it's not so obvious or easy - as a result, people can use the "wrong" distribution. In my field (genetics), people used a Poisson distribution to model something (RNA-seq data) for years before discovering that the Poisson distribution didn't allow for enough variation in the measurements, so they switched to a negative-binomial distribution. So, here's my advice for selecting the "correct" distribution: 1) See if other people have looked at this type of data before, if so, see what distribution they used. 2) If it's new, you can collect tons of data and that will tell you, or you can think really hard about the data and what's generating it and that might give you a clue about what sort of distribution you should use. When all else fails, there are always "non-parametric" methods that are just statistics methods that do not assume you know what the distribution is.
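One quick way to see the "not enough variation" problem mentioned above: for a Poisson distribution the variance equals the mean, so counts whose variance is much larger than their mean are a hint that a Poisson model is too tight. The sketch below is only an illustration; the function name `dispersion_index` and the counts are made up, and this check is not a formal test or the method used in the RNA-seq literature.

```python
import statistics


def dispersion_index(counts):
    """Sample variance divided by sample mean; roughly 1 for Poisson-like counts."""
    return statistics.variance(counts) / statistics.fmean(counts)


# Hypothetical counts, invented for illustration only.
observed = [0, 1, 0, 12, 3, 0, 25, 2, 0, 7]

index = dispersion_index(observed)
print(f"variance / mean = {index:.1f}")
if index > 1:
    print("More spread than a Poisson allows; a negative binomial may fit better.")
```

A ratio well above 1 suggests overdispersion, which is the situation where people move from the Poisson to the negative binomial.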
StatQuest with Josh Starmer Hey Josh, thank you so much for the clarification. I just want to know a little more about the answer you gave: ● As you mentioned in the 2nd step, I should try to collect more data to learn about the distribution of the data, right? What if I don't have access to tons of data? ● One more question, related to the normal distribution: when we say the data is normally distributed, it means that our data follows the bell-shaped curve, but does the bell-shaped curve represent the intuitive curve we visualize when we draw the histogram of that data, or the curve we get when we draw the `PDF` of that data? ● One request from my side: if possible, please try to make videos on non-parametric tests. Thank you
@@statquest I didn't expect a reply either (glad you did). This is my first time (today) watching your videos and I love the homemade feel of the intro. I also have a question about standard deviation. All the videos I searched focus on how to apply it, but I don't yet have a clear picture of what it is. What I understood so far: standard deviation is the measure of spread (in a normal distribution) and there is a formula to calculate it from a given mean. My question is, how do we derive the formula? I like to think of the mean as the midpoint, so sum/n makes sense (as division is the opposite of repeated sum). Similarly, when I look at the formula for the standard deviation, I see we find the average of the squared distances between the mean and x, then we take the square root of the result. Is it a guess to square and then take the root? Or could we have cubed and taken the cube root? Thanks for your time.
@@thebestedits3845 See: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-vikkiwjQqfU.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-SzZ6GpcfoQY.html
That's the magic of the curve - it gives us a sense of what the histogram would look like if we had the time and money to measure everyone on the planet. If so, there wouldn't be a gap there.
It depends on the curve you want to draw. If you want to draw a normal distribution, then you can plug in the mean and standard deviation of your data into the equation for a normal curve and... bam! you'll get a curve.
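As a rough sketch of "plugging in" the mean and standard deviation, the code below evaluates the normal curve at a grid of x values. The heights are invented numbers (in cm) and the name `normal_curve` is just a label chosen here; plotting the resulting `xs` and `ys` with any plotting tool would draw the curve.

```python
import math
import statistics


def normal_curve(x, mean, sd):
    """Height of the normal curve at x for the given mean and standard deviation."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


# Hypothetical height measurements, invented for illustration only.
heights = [160.2, 171.5, 165.0, 180.3, 175.1, 168.4, 172.9, 158.7]

m = statistics.fmean(heights)  # estimated mean
s = statistics.stdev(heights)  # estimated standard deviation

# Evaluate the curve from 3 standard deviations below the mean to 3 above.
xs = [m + s * (i / 10) for i in range(-30, 31)]
ys = [normal_curve(x, m, s) for x in xs]

print(f"mean = {m:.1f}, sd = {s:.1f}, peak height = {max(ys):.4f}")
```

The standard library's `statistics.NormalDist(m, s).pdf(x)` gives the same heights if you would rather not type out the formula.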
Wish you had explained all the fundamentals in a sequence. As beginners, we are unable to tell which video should be watched first. Can you make a sequence or guide us on where to start?
What do you do with values that fall exactly between 2 bins?? like if the bins are 4.5 to 5.5 and 5.5 to 6.5, where do you put the value of 5.5? Which bin? please help
Sort of like rounding, you just have to decide in advance which way things like that will go - there is no "right" answer. You could decide to always round down, so 5.5 would go into the 5 bin, or you could decide to round up, so 5.5 would go into the 6 bin. It's up to you. Another thing that you have to fiddle around with is how wide the bins should be - different widths can result in different looking distributions - so it's a good idea to try a few and see which one makes the most sense.
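Here is one way to write down such a tie-breaking rule in code, as a sketch only. The function `bin_index` and its `tie` argument are hypothetical names; the bin edges 4.5, 5.5 and 6.5 come from the question above.

```python
from bisect import bisect_left, bisect_right


def bin_index(value, edges, tie="up"):
    """Return the index of the bin containing value, given sorted bin edges.

    tie="up":   a value exactly on an edge goes into the bin above it.
    tie="down": a value exactly on an edge goes into the bin below it.
    """
    if tie == "up":
        idx = bisect_right(edges, value) - 1
    else:
        idx = bisect_left(edges, value) - 1
    if not 0 <= idx < len(edges) - 1:
        raise ValueError(f"{value} falls outside the bins under the '{tie}' rule")
    return idx


edges = [4.5, 5.5, 6.5]  # two bins: 4.5 to 5.5 and 5.5 to 6.5

print(bin_index(5.5, edges, tie="up"))    # 1, the 5.5 to 6.5 bin
print(bin_index(5.5, edges, tie="down"))  # 0, the 4.5 to 5.5 bin
```

Whichever rule you pick, the point is to apply the same one to every value; note that under either rule one of the two outermost edges ends up excluded.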
For example, say I have some data and in Minitab I find its distribution, for example a normal, Poisson, F or t distribution. I want to know what I can do with that, or what I should understand, once I have found the distribution of my data.
Hi Josh, sorry for asking a perhaps obvious question, but I've been struggling to wrap my head around the area under the curve part. Shouldn't the probability that a particular value is in the given interval be equal to the area under that part divided by the area under the entire curve? I've seen people explain this with a heads/tails uniform distribution, where two events (heads, tails) are on the x axis, and the probability of that event happening on the y axis (0.5). However, how does this all translate to literal values, such as the number of people with a certain height...
@@statquest Yeah, thanks, I had a few things mixed up. When you had shown the histogram, I thought of the typical ones where the y axis represented the number of things that fall into each range, not the percentage of things. I was confused as to how the area could be equal to one. Thanks for the answer nonetheless!
@@leonandorfi5191 My two cents: when we are talking about the histogram, the area is not equal to 1, but when we are talking about the bell-shaped probability distribution curve, the area under the curve is 1. Moreover, in a histogram there is no curve, so technically we can't say "area under the CURVE" for a histogram. Am I correct? I don't know LOL
Hey Josh, do you have a good link that explains how to integrate the Gaussian curve to arrive at probabilities? I would like to learn how to do it by hand. I understand calculus. Thanks again for the great content!
I don't have one on hand. However, I remember that it has a trick - you have to do a substitution to make it work. To quote Roger Berger (the guy that taught me this): If you know the trick, integrating the normal curve is easy. If you don't know the trick, you'll never figure it out.
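For readers who want to see a trick of this kind: the classic one (which may or may not be exactly the one meant above) is to square the integral, switch to polar coordinates, and then substitute u = r^2/2. It shows that the total area under the normal curve is 1. Note that the area over a finite interval has no closed form and is normally looked up or computed numerically.

```latex
\begin{aligned}
I &= \int_{-\infty}^{\infty} e^{-x^{2}/2}\,dx \\
I^{2} &= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{-(x^{2}+y^{2})/2}\,dx\,dy
       = \int_{0}^{2\pi}\int_{0}^{\infty} e^{-r^{2}/2}\,r\,dr\,d\theta \\
      &= 2\pi \int_{0}^{\infty} e^{-u}\,du \qquad (u = r^{2}/2,\ du = r\,dr) \\
      &= 2\pi \\
I &= \sqrt{2\pi}
\quad\Longrightarrow\quad
\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\,
  e^{-(t-\mu)^{2}/(2\sigma^{2})}\,dt = 1
  \qquad \bigl(x = (t-\mu)/\sigma\bigr)
\end{aligned}
```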
Sure, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-vikkiwjQqfU.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-SzZ6GpcfoQY.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-sHRBg6BhKjI.html
The formula for the normal curve is kind of complicated. However, if you want to learn how to fit it to data, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-XepXtl9YKwc.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Dn6b9fCIUpM.html
The parameters we supply to a PDF or PMF are population parameters, but we never actually collect data for the whole population, so how is it that the questions give us those parameters?
We can estimate them. For details, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-vikkiwjQqfU.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-SzZ6GpcfoQY.html
I doubt that it would make any sense to calculate missing values using calculus. For 6" people calculus will give us ~1.5. What if in reality there are 20 people of that height?
It depends. Of course you can make mistakes, but, believe it or not, height is normally distributed, so we really can use a normal distribution ( ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-rzFX5NWojp0.html ) to impute missing values.
hi Josh, I'm enjoying your videos, but I think it's only the US who uses inches/feet units of measurement. The rest of the world (which is the majority of the world's population) is on centimeters/meters. 😅
This is one of the first videos I ever made, and back then, no one watched my videos other than a few friends based in the US. Since then I've changed to using more universal metrics.
@@statquest thank you Josh, I placed an order in Lulu for a copy of your book (I believe it's on its way). You're a good teacher. I like to learn visually, and your diagrams are the best I've found so far.
@@statquest Thanks Josh, got the book within a week of placing the order. It is excellent. It reads as easily as a comic book, and the images are the best way to explain this topic. I'm also running some examples with TensorFlow, image classification with Fashion MNIST, and learning to use the OpenAI API. I'm trying to understand when to use the different activation and loss functions, and architectures for Neural Networks. I didn't have a background in Statistics, so your book helps me with that. But I knew Linear Algebra and Gradients, and was happy to see them again.
@@user-mo9fz4tk2r Awesome! For activation functions, unless you want to do something very specific, people just use the ReLU. To see examples of doing something specific, see my video on LSTMs: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-YCzL96nL7j0.html