Expected Values for Continuous Variables!!!

StatQuest with Josh Starmer

1.3M subscribers

87K views

Published: Oct 2, 2024

Comments: 181

@statquest 2 years ago

Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/

@theburtmacklin9615 3 years ago

Amazing videos; if you only knew the way in which your channel had changed things for me… what started as a “hey this video looks cool” has largely been responsible for my wife enrolling in a data science postgrad program at the University of Waterloo.

@statquest 3 years ago

That's amazing! Wish your wife good luck for me and I hope it goes well. It's very exciting! BAM! :)

@Thedavidk 3 months ago

@@statquest I did but the integral stuff was a bridge too far for me 🙂

@raulsena3917 2 years ago

Amazing videos, just wanted to thank you for all you have taught me. Now I am doing a Master's Degree in Artificial Intelligence and plan to do a Ph.D., all thanks to your videos.

@statquest 2 years ago

Wow! That is awesome! Good luck!

@bogdankhamelyuk3857 2 years ago

Same, bro. Also doing an MS in AI and reminiscing about some ML basics XD

@josherickson5446 3 years ago

Triple bam bro! Now just need to refresh some calculus :)

@statquest 3 years ago

Yep!

@newsupdates3622 3 years ago

Double bam! Here to watch another great statquest video for a refresher

@statquest 3 years ago

Thanks!

@jerrycheung8158 2 years ago

Great video! May I ask which video to watch next to solve the n - 1 puzzle?

@statquest 2 years ago

Unfortunately I haven't made it yet. However, this is my favorite webpage on the topic: online.stat.psu.edu/stat415/lesson/1/1.3

@iroikoarakide8990 8 months ago

Why do I have to attend college classes when you explain these concepts so thoroughly and in a third of the time? Great video as always, man. You are a godsend.

@statquest 8 months ago

Wow, thanks!

@sharan9993 3 years ago

When can I learn why we divide by n-1 for the SD? I have seen your videos on the SD and know we need to subtract something, but why 1? You said we need to know about expected values first. So when can I learn? Love your videos. Thanks a lot.

@statquest 3 years ago

Believe it or not, we still have a long way to go. Sorry this process is so painful!

@statquest 3 years ago

That said, if you want to skip to the head of the class, see: online.stat.psu.edu/stat415/lesson/1/1.3

@sharan9993 3 years ago

@@statquest Ohhh, I will wait because I am curious yet lazy 😂

@sharan9993 3 years ago

@@statquest I checked it out and it went over my head. Could you tell me the path that leads there? I will search and read them. Thanks a lot.

@statquest 3 years ago

@@sharan9993 If I could, you wouldn't be waiting for my videos.

@homunculide8567 1 year ago

Thanks for this and your other videos with all those useful explanations... I'll keep binge watching. Regarding your explanation at 10:06: the second term and all following terms appear wrong to me, because the intervals are 10 s each and not 20 s, 30 s, 40 s... Thus the area of the rectangle should be likelihood x 10 (s) for all terms (discrete rectangles). You are the expert, but I can't wrap my head around that.

@statquest 1 year ago

I'm not sure I fully understand your equation. The area of each rectangle (the probability) is 10 x likelihood (obviously the values are approximated and rounded in the figure). The time, 's', should only be used to determine the likelihood at a specific point.
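
A minimal sketch of the rectangle areas described above. It assumes the exponential distribution has lambda = 0.05 (a value inferred from numbers quoted elsewhere in this thread, not confirmed here) and that each rectangle's height is the likelihood at the middle of its 10-second interval. The names `likelihood` and `rectangle_probabilities` are made up for illustration.

```python
import math

LAMBDA = 0.05  # assumed rate; inferred from the thread, not confirmed
WIDTH = 10     # seconds between tick marks on the x-axis


def likelihood(x, lam=LAMBDA):
    """Height of the exponential curve (the likelihood) at time x."""
    return lam * math.exp(-lam * x)


def rectangle_probabilities(n_bins=10, width=WIDTH, lam=LAMBDA):
    """Area of each rectangle: the same width times the likelihood at the bin's midpoint."""
    return [width * likelihood(i * width + width / 2, lam) for i in range(n_bins)]


probs = rectangle_probabilities()
for i, p in enumerate(probs):
    print(f"{i * WIDTH:>3} to {(i + 1) * WIDTH:<3} s: probability ~ {p:.4f}")
```

Every rectangle uses the same width of 10; only the height changes from bin to bin.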

@stef1790 27 days ago

Thank you! Really helped with refreshing my statistics!

@statquest 27 days ago

Glad it helped!

@ahmadatta66 2 years ago

I'm in the library and I did not realize my laptop speaker was on, and the video started, as always, with a song. :D I'm not even embarrassed. Everybody should hear your songs, they are amazing haha

@statquest 2 years ago

BAM! :)

@durgeshkshirsagar116 3 years ago

You sound like kids' rhymes, but when you teach, you teach like a pro.

@statquest 3 years ago

Thanks! :)

@durgeshkshirsagar116 3 years ago

@@statquest Welcome

@zalavadiaridham 2 years ago

Amazing video, Josh!!! I'm waiting for the video on why we divide by n-1 when we compute the sample variance. Thank you for the very informative content that you put out.

@statquest 2 years ago

Thank you!

@hrig 2 years ago

Yes! n-1 please!!!!!!!!

@jaspermuller9851 8 months ago

Is there any update on the n-1?

@dicksang2 6 months ago

I am following along and would like your explanation of why n-1 is the denominator when the sample variance is used to estimate the variance. Thanks!!

@taotaotan5671 3 years ago

Probability is fun : )

@statquest 3 years ago

BAM! :)

@taotaotan5671 3 years ago

@@statquest I guess the next video will talk about the variance of a random variable :)

@satyamraj2779 7 months ago

You are too cute with your bams!

@statquest 7 months ago

@norukamo 7 months ago

Just discovered this channel an hour ago and now I'm binging your videos. They're soo good and easy to understand. BAM!

@statquest 7 months ago

Thank you very much! :)

@otavioaugustodeassuncao9858 2 years ago

Currently I'm studying Statistics at the Federal University of São Carlos (UFScar) and just wanted to thank you for all the helpful and fun content that you've been posting... Not only has it helped me understand, but it has also made me like Statistics even more.

@statquest 2 years ago

Thank you very much! :)

@megasun 11 months ago

The number is not right around 8:25; the rectangle area should be approximately 0.3894. I was very surprised that the given rectangle area of 0.4 is bigger than the given integrated result of 0.39, because the rectangle area looks slightly smaller than the area under the curve ... still I love these videos. And of course, for the purpose of this lecture, having more digits here is distracting and not helpful.

@statquest 11 months ago

Sorry if my inconsistent rounding threw you off.

@selva279 3 years ago

Eagerly waiting for your book...

@statquest 3 years ago

Working on it! :)

@fedos 3 years ago

I think I've had too much to drink, because I read the notification as "Star Quest with Josh Starmer", and I was confused about your sudden shift to astronomy. I'm going to have to watch this tomorrow.

@Erosis 3 years ago

Star Quest would make a great April Fools joke. Have any astronomy friends, Josh?

@statquest 3 years ago

Once StatQuest makes me a billionaire, I'll make a spaceship called Star Quest!!! You guys can come along for the ride.

@samieweee7468 8 months ago

I am a bit confused: is the value on the y-axis the likelihood of meeting a person after x seconds, or the number of people we meet after x seconds? During the initial explanation, the dots at each x were the number of people we met after that time.

@statquest 8 months ago

What time point, minutes and seconds, are you asking about?

@samieweee7468 8 months ago

@@statquest 2:37

@statquest 8 months ago

@@samieweee7468 At 2:37, each dot represents an individual person. In other words, StatSquatch is creating a histogram. However, histograms have problems - gaps in the data and the data are not continuous (they have to be put in bins). Thus, we use an exponential distribution to approximate the histogram. The exponential distribution doesn't have gaps and is continuous. And, from there on out, we use the distribution, which gives us likelihoods on the y-axis, rather than the histogram, which gives us the number of people on the y-axis.

@SunSan1989 1 year ago

I watched all the videos carefully, but why can't I find it? Why choose n-1 instead of n-2 or n-0.5?

@statquest 1 year ago

Unfortunately I haven't made that one yet. In the meantime, check out: online.stat.psu.edu/stat415/lesson/1/1.3

@Letsbetog 3 years ago

You are an awesome singer as well as a great teacher. 🙃🙃🙃

@statquest 3 years ago

Thanks!

@adampotter760 1 year ago

What's the next video to watch to find out the elusive reason for dividing by n-1!? Please reply!

@statquest 1 year ago

Unfortunately I don't have the video yet. In the meantime, check out: online.stat.psu.edu/stat415/lesson/1/1.3

@kakashisensei8146 3 years ago

Josh, I'm really thankful for all your videos; they're enlightening, lol. Could you please make some videos on distribution functions (e.g., the normal distribution)?

@statquest 3 years ago

I've got a basic video on the normal distribution here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-rzFX5NWojp0.html and a video on maximum likelihood estimation with the normal distribution here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Dn6b9fCIUpM.html All of my videos can be found here: statquest.org/video-index/

@kakashisensei8146 3 years ago

@@statquest you're the best.. Thanks a lot!

@awaisjabbar8304 3 months ago

Liked the video before it even started. Awesome channel; why have I not heard of you before?

@statquest 3 months ago

Thank you!

@НиколайТолстов-м3и 2 years ago

1) Why can't we just take the average of all waiting times to get how long we would wait on average? 2) How often in real life can we use some sort of formula, like the exponential, to describe a distribution? I think in most cases we can use a neural network (NN) to fit the distribution.

@statquest 2 years ago

Expected values are helpful for doing Statistics and getting a sense of how likely future events will be. This is why we go through the trouble to do all this math instead of just taking an average or fitting an NN to the data.
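
One way to see how the two ideas relate is a small simulation: the average of observed waiting times is an estimate computed from data, while the expected value is a property of the distribution that the average settles toward as more data arrive. The sketch below assumes an exponential distribution with lambda = 0.05 (inferred from this thread, not confirmed) and uses a hypothetical helper, `simulate_mean_wait`.

```python
import random

LAMBDA = 0.05  # assumed rate; the exact expected value is then 1 / LAMBDA = 20 seconds


def simulate_mean_wait(n, lam=LAMBDA, seed=0):
    """Average of n simulated waiting times drawn from an exponential distribution."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    waits = [rng.expovariate(lam) for _ in range(n)]
    return sum(waits) / n


expected_value = 1 / LAMBDA
for n in (10, 1_000, 100_000):
    print(f"n={n:>7}: sample mean = {simulate_mean_wait(n):.2f}, expected value = {expected_value:.2f}")
```

With few observations the sample mean can land far from the expected value; with many it tends to sit close to it.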

@drelijahmikail3916 24 days ago

Shouldn't it be a Riemann sum where the interval is the same? Why is the multiplier 10, 20, 30, etc., instead of a constant interval of 5 or 10?

@statquest 23 days ago

Because the x-axis value (10, 20, 30) represents time and we want to know the probability of observing something during that block of time.

@AG-cx1ug 6 months ago

For each rectangle shouldn't we be multiplying the likelihood value by 10 and not 20, 30? Since the area is just between 10-20 or 20-30?

@statquest 6 months ago

At 9:00 we multiply the y-axis values (the likelihoods) for each rectangle by 10 because it is the distance between each tick mark on the x-axis. This gives us the probabilities of each event. However, the events, or outcomes, themselves are the x-axis values. So, starting at 9:10, we multiply the probabilities by the outcomes (the x-axis coordinates). When we add these products up, we get a weighted average of the outcomes (weighted by their probability of occurring). Is that what you are asking about?

@AG-cx1ug 6 months ago

Did we end up with the same equation as at 7:40? Are both just the area under the curve? Width wasn't being used before 11:00, we were just using the specific outcome x, so how did both the width and the specific outcome end up in the final equation?

@statquest 6 months ago

The equation at 7:40 is just the equation for the area under a curve defined by an exponential distribution, not the equation for the expected value of an exponential distribution. The equation for the expected value of an exponential distribution doesn't show up until 12:39. The big difference is that we multiply the formula for the exponential distribution by 'x', a specific outcome. Width isn't part of this equation because we take the limit earlier as the number of rectangles goes to infinity and their widths go to 0.
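
Written out, the distinction looks roughly like this. The notation is an assumption: $L(X = x_i)$ stands for the likelihood $\lambda e^{-\lambda x_i}$ and $\Delta x$ for the rectangle width. In the limit, the width reappears as $dx$.

```latex
% Area under the curve between a and b (a probability):
P(a \le X \le b) = \int_a^b \lambda e^{-\lambda x} \, dx

% Rectangle approximation of the expected value, with bin width \Delta x:
E[X] \approx \sum_i x_i \, L(X = x_i) \, \Delta x

% Limit as the number of rectangles goes to infinity and \Delta x \to 0:
E[X] = \int_0^\infty x \, \lambda e^{-\lambda x} \, dx = \frac{1}{\lambda}
```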

@عَدِيُّ-م3ح 3 years ago

I am a simple man. I see integrals, I click. And thank God.

@statquest 3 years ago

bam!

@MrDanituga 3 years ago

One of the best explanations I have ever had! Congratulations on the video and on your content in general. Watch every video :D

@statquest 3 years ago

Thanks! 😃

@VladLanz 3 months ago

It looks like the math is wrong for the approximation of the expected value of the distribution:
1) For each bin we compute f(x) = lambda * e ^ (-lambda * x), where f(x) is the value of the probability density function (PDF) at the bin's middle point.
2) We compute the probability of each bin: p(x) = f(x) * delta, where delta is the width of the bin, in our case 10.
3) This is the step with the mistake: we calculate the contribution of each bin and sum everything up: E(x) = sum ( p(x) * x ), where x is the middle/average point of each bin, but in your video you took the upper bound instead of the average value for each bin.
If you do the calculation this way, you get 19.23, which is closer to the true value.

@statquest 3 months ago

I wouldn't say the math is wrong because the purpose is only to illustrate a concept, rather than how the math is actually done. In practice, we don't do a summation, we take the integral.

@VladLanz 3 months ago

@@statquest Sure, in practice we take the integral. But for the approximation it makes more sense to take the average value of each bin rather than its top value: [5, 15, 25, ... 95] instead of [10, 20, 30, ... 100]. In case that's you and not some hired assistant who's answering comments here: thank you for your work, you're amazing 🤗 You're the main reason I managed to remember everything I learned in university more than 10 years ago, started to master ML and deep learning, and began working as a data analyst.

@statquest 3 months ago

@@VladLanz That's me! Thanks! :) (and I still wouldn't say taking the edge is 'wrong' - different, and maybe it doesn't make as much sense for the sake of getting the best approximation, but not wrong).
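
A rough numerical comparison of the two labeling conventions. It assumes lambda = 0.05, heights read at bin midpoints, and a sum truncated at 400 seconds (an arbitrary cutoff beyond which the tail is negligible); `riemann_expected_value` is a made-up name. As the bins get narrower, both conventions head toward the exact value 1 / lambda.

```python
import math

LAMBDA = 0.05      # assumed rate; exact expected value is 1 / LAMBDA = 20
UPPER_LIMIT = 400  # seconds; arbitrary truncation point for the sum


def riemann_expected_value(width, use_midpoint, lam=LAMBDA, upper=UPPER_LIMIT):
    """Sum of outcome * probability over equal-width bins from 0 to upper."""
    n_bins = round(upper / width)
    total = 0.0
    for i in range(n_bins):
        left = i * width
        midpoint = left + width / 2
        probability = width * lam * math.exp(-lam * midpoint)
        # Label the bin by its midpoint or by its upper edge.
        outcome = midpoint if use_midpoint else left + width
        total += outcome * probability
    return total


for width in (10, 5, 1, 0.1):
    mid = riemann_expected_value(width, use_midpoint=True)
    edge = riemann_expected_value(width, use_midpoint=False)
    print(f"width={width:>4}: midpoint={mid:.3f}  upper edge={edge:.3f}  exact={1 / LAMBDA:.1f}")
```

For wide bins the midpoint labels give the closer estimate, while the gap between the two conventions shrinks with the bin width.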

@6872elpado 3 years ago

I didn't know you could call the integral of g'(x) the anti-derivative. Why didn't you just integrate g'(x) to get g(x)?

@statquest 3 years ago

"Antiderivative" and "Indefinite Integral" are synonyms. So we can say it either way. en.wikipedia.org/wiki/Antiderivative

@timehealthfit1891 3 years ago

This is a great video, want to be youtube friends?

@statquest 3 years ago

bam!

@neptunesbounty1786 3 years ago

1 downvote by StatSquatch

@statquest 3 years ago

dang! :)

@alexandersmith6140 11 months ago

Hi Josh. What video(s) after this should I watch to get closer to understanding why we divide by (n - 1) when finding the sample variance and sample covariance? An intuitive explanation for this seems nowhere to be found across the entirety of the internet, and the StatQuest channel has thus far been a divine gift of comprehension.

@statquest 11 months ago

Have you seen this one: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-sHRBg6BhKjI.html Other than that, the best I can do is refer you here: online.stat.psu.edu/stat415/lesson/1/1.3 One day I'll turn that page into a StatQuest, but not for a while.

@atg6174 8 months ago

@@statquest Why not now? I think it's time to do so... 😢

@quanghoang3801 5 months ago

@@statquest I have waited for 4 years 😢

@AG-cx1ug 6 months ago

At 11:13 why is there still an x value at x * L(X=x) * width? Wasn't the x the width?

@statquest 6 months ago

'x' is the value on the x-axis. It refers to a specific outcome or event happening at a specific time.

@waisyousofi9139 19 days ago

AS USUAL, BEST!

@statquest 18 days ago

Thanks again!

@ANGGAKAHFI-kr4yy 4 months ago

Sorry sir, I don't get it when you use 0.05 for the lambda.

@statquest 4 months ago

What time point, minutes and seconds, are you asking about?

@b1ueberrycheesecake 1 year ago

BAM!!

@statquest 1 year ago

@MrAzrai99 3 years ago

Just realizing the significance of the calculus I learnt a long time ago ":"(

@statquest 3 years ago

bam!

@MrAzrai99 3 years ago

@@statquest Please make a video on multivariate normal distribution next🙇‍♂️

@statquest 3 years ago

@@MrAzrai99 I'll keep that in mind.

@mohammedk.h.f3016 3 years ago

Thank you very much

@statquest 3 years ago

bam! :)

@tauqeeralitapya6447 1 year ago

At 9:39, why are we multiplying 0.2 by 20 and not by 10, since the interval between 10 and 20 is still 10?

@statquest 1 year ago

20 represents the specific, discretized outcome. In other words, we have discrete time points (10 seconds, 20 seconds, 30 seconds, 40 seconds, etc.) that we have to wait before we meet someone in StatLand. Since we have a histogram, everyone we meet between 0 and 10 seconds is categorized as "someone we met in 10 seconds". Likewise, everyone we meet between 10 and 20 seconds is categorized as "someone we met in 20 seconds", etc.

@priyankjain9970 1 year ago

@@statquest Then it should be (10*0.4) + (20*(0.4+0.2)) + (30*(0.4+0.2+0.1)) and so on, because for x=10 the probability will be the area under the curve up to 10, which is 0.4, and for x=20 the probability will be the area under the curve up to 20, which is 0.63 (roughly 0.6)... Please correct me if I am wrong.

@Thedavidk 4 months ago

Wow, that got complicated really quickly

@statquest 4 months ago

Did you first watch the expected values for discrete variables: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-KLs_7b7SKi4.html

@গোলামমোস্তফা-শ৮থ 2 years ago

Probability = (height * width) of the rectangle. How did you find that? Can you please explain how it indicates the probability? Also, please explain to me what the value of "x" is in the "mean" of a continuous random variable.

@statquest 2 years ago

The area under the curve between two points represents the probability of something happening between those two points (see: 4:48). This is simply how probability distributions are defined. We can solve for that probability exactly using calculus (see: 4:49), or we can approximate it using rectangles (height * width).
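
As a worked comparison of the exact and rectangle versions, assuming an exponential distribution with $\lambda = 0.05$ (inferred from this thread) and a rectangle whose height is read at the midpoint of the interval:

```latex
% Exact probability of an outcome between a and b:
P(a \le X \le b) = \int_a^b \lambda e^{-\lambda x} \, dx = e^{-\lambda a} - e^{-\lambda b}

% One-rectangle approximation (width times height at the midpoint):
P(a \le X \le b) \approx (b - a) \, \lambda e^{-\lambda (a + b)/2}

% With \lambda = 0.05, a = 0, b = 10:
1 - e^{-0.5} \approx 0.393, \qquad 10 \times 0.05 \, e^{-0.25} \approx 0.389
```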

@AG-cx1ug 6 months ago

What if the data is erratic and doesn't fit with any of the lambda values to make the curve?

@statquest 6 months ago

Then you can either use a different distribution, or you can try to approximate things with a histogram.

@AG-cx1ug 6 months ago

@@statquest Thank you!

@tedchirvasiu 2 years ago

5 years later: if you watched this video hoping to learn exactly why we divide by n-1, you are one step closer to understanding this mystery, but not quite there yet.

@statquest 2 years ago

Yep. Sorry it is taking so long.

@alikhalili9961 2 years ago

Where is the n-1??????????? I am outraged. Lol. By the way, great video. Looking forward to more.

@statquest 2 years ago

Unfortunately this is just the first of many steps. :(

@eduquest3169 2 years ago

Thanks a lot, dear. May God bless you, Josh.

@statquest 2 years ago

Thanks!

@AG-cx1ug 8 months ago

At 14:30 isn't the antiderivative just the integral?

@statquest 8 months ago

Yep

@TJ-hs1qm 2 years ago

Ahh... how come likelihood and probability actually mean different things? 🤷‍♂️

@statquest 2 years ago

See: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pYxNSUDSFH4.html

@RoyalYoutube_PRO 4 months ago

Has the mystery of 'n-1' been resolved yet?

@statquest 4 months ago

Not yet. The best I can do is give you a link: online.stat.psu.edu/stat415/lesson/1/1.3

@sparshasherke250 1 year ago

This explanation is magical :cries:

@statquest 1 year ago

@isurumahakumara 3 years ago

I couldn't find any quests on Time Series. If there aren't any, I would love to see one in the future!!!

@statquest 3 years ago

Me too! BAM! :)

@maxyen9892 2 years ago

Could you comment on how we would find expected values for a normal probability distribution, given a mean and standard deviation?

@statquest 2 years ago

The expected value for a normal distribution is its mean.
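
A short derivation of that statement, using the standard normal density with mean $\mu$ and standard deviation $\sigma$ (the substitution step is one common route, not necessarily the one used in the video):

```latex
E[X] = \int_{-\infty}^{\infty} x \, \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)} \, dx

% Substitute z = x - \mu, so x = z + \mu:
E[X] = \underbrace{\int_{-\infty}^{\infty} z \, \frac{1}{\sigma\sqrt{2\pi}} e^{-z^2/(2\sigma^2)} \, dz}_{=\,0 \text{ (odd integrand)}}
     + \mu \underbrace{\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} e^{-z^2/(2\sigma^2)} \, dz}_{=\,1}
     = \mu
```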

@aysan7513 2 years ago

omg you saved my life

@statquest 2 years ago

bam!

@amirhosseinshafieian3951 3 years ago

THANK YOU for your great videos. I just have a question: I think (I am not sure) that the data in this example follow a Poisson distribution, not an exponential one!!!!! Am I right???

@amirhosseinshafieian3951 3 years ago

I think I got it myself.

@statquest 3 years ago

Poisson is discrete and models something very different from what we are modeling here with an exponential distribution. For details, see: en.wikipedia.org/wiki/Poisson_distribution
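
The standard relationship between the two distributions can be sketched with a simulation: in a Poisson process, the continuous waiting times between events are exponential, while the discrete number of events in a fixed window is Poisson. The rate lambda = 0.05, the 60-second window, and the helper `simulate_counts` are all assumptions made for illustration.

```python
import random

LAMBDA = 0.05  # assumed rate: people met per second
WINDOW = 60    # seconds per counting window (arbitrary choice)


def simulate_counts(n_windows=10_000, lam=LAMBDA, window=WINDOW, seed=0):
    """Count meetings per window when the gaps between meetings are exponential."""
    rng = random.Random(seed)
    counts = [0] * n_windows
    total_time = n_windows * window
    t = rng.expovariate(lam)           # continuous wait until the first meeting
    while t < total_time:
        counts[int(t // window)] += 1  # discrete count for the window containing t
        t += rng.expovariate(lam)      # continuous wait until the next meeting
    return counts


counts = simulate_counts()
mean_count = sum(counts) / len(counts)
print(f"mean meetings per {WINDOW}-second window: {mean_count:.2f}")
print(f"lambda * window: {LAMBDA * WINDOW:.2f}")
```

The waiting times can take any non-negative value, whereas the counts are whole numbers, which is the discrete versus continuous distinction in the reply.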

@Altamira_ 1 year ago

Thanks for all of your videos, they are great. I'm watching the full "Statistics Fundamentals" playlist and have to admit that, while I understood all of the previous videos thanks to your great explanations, I was lost on this one when you used integrals :/ I think it's the video that requires the most calculus knowledge.

@statquest 1 year ago

As long as you understand the discrete case, you should be good to go.

@Altamira_ 1 year ago

@@statquest Thanks :)

@nir199820 3 years ago

I am beginning to think that the whole "dividing by n-1" thing is a hoax :)

@statquest 3 years ago

@pigritor 3 years ago

Dear Sir, could you do a video on linear mixed models and GEE? Maybe you could create a separate donation target with a threshold that has to be reached for you to make one? It would be extremely useful. It is a long-awaited and hard topic with a lot of contradictory info.

@statquest 3 years ago

I'll keep that in mind.

@mamadi-uk7kv 2 years ago

Perfect, man. Great!

@statquest 2 years ago

Thanks a lot!

@davonraymond3274 1 year ago

appreciate you

@statquest 1 year ago

Thank you!

@auzaluis 3 years ago

BAM!

@statquest 3 years ago

@pibruks 2 years ago

$$ #BAM!!!

@statquest 2 years ago

@ahmedabuali6768 3 years ago

Dear Professor, I would like to ask whether you have good lecture notes for the book Introduction to Mathematical Statistics by Robert V. Hogg. I am ready to pay for them; I like your way of presenting.

@statquest 3 years ago

Unfortunately I've never read that book.

@ahmedabuali6768 3 years ago

@@statquest OK, I would like to take this chance to ask whether you could add topics on Bayesian statistics. Many thanks, and I am still following you :)

@statquest 3 years ago

@@ahmedabuali6768 I'm working on those videos right now. They should be out soon.

@ahmedabuali6768 3 years ago

@@statquest Wow, very good. Please also attach the PDF presentation; I am ready to buy it. Go ahead, dear Prof. :)

@oraciopozzo8694 2 years ago

When you approximate the expected value, it is confusing that you add areas whose base is not the constant interval of 10. In your example, the area of each rectangle should be the variable height given by the formula times 10, and not times 10, 20, … Otherwise the rectangles are meaningless.

@statquest 2 years ago

I'm sorry that was confusing. Each rectangle has the same width (as you can see in the illustration), but each rectangle represents a different specific outcome. So 10 and 20 are not different widths, but different outcomes. 10 is the outcome represented by the first rectangle (with width = 10) and 20 is the outcome represented by the second rectangle (also with width = 10).

@priyankjain9970 1 year ago

@statquest 1 year ago

@@priyankjain9970 No, it's (10*0.4) + (10 * 0.2) + (10 * 0.1) + (10 * 0.09) + ... etc. To approximate the area under the curve, we add up the area of each rectangle. The area of each rectangle is the width (10) times the height (0.4 for the first, 0.2 for the second, 0.1 for the third, etc.).

@priyankjain9970 1 year ago

@@statquest Thanks for the reply. Actually the height is 0.04 for the first, 0.02 for the second, 0.01 for the third and so on (as you explained at 8:29 in the video). Therefore the area of the rectangle will be 0.4 for the first, 0.2 for the second, 0.1 for the third and so on. My concern is the following. As you stated, E(X) = Σ x * p(X=x). This means E(X) = 10 * (probability till 10) + 20 * (probability till 20) + 30 * (probability till 30) and so on. Now, probability till 10 means the area under the curve till 10, which is 0.4; probability till 20 means the area under the curve till 20, which is 0.6 (approx.); probability till 30 means the area under the curve till 30, which is 0.7 (approx.). Therefore E(X) should be (as per my understanding) 10*0.4 + 20*0.6 + 30*0.7 + .... Please help me understand this.

@statquest 1 year ago

@@priyankjain9970 Sorry about the typos with the area vs height. That said, the probability of observing an event between 10 and 20 seconds is not the cumulative probability of observing an event between 0 and 10 or between 10 and 20. Your equations use the cumulative probabilities, which is not correct in this situation. To clarify, the expected value is "the probability of observing an event between 0 and 10 seconds times the outcome, 10 (this is just the label for any event that occurs between 0 and 10 seconds) + the probability of observing an event between 10 and 20 seconds times the outcome, 20 (again, this is just the label for any event that occurs between 10 and 20 seconds) + the probability of observing an event between 20 and 30 seconds times the outcome, 30 + ...".
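
The difference between per-interval and cumulative probabilities can be checked numerically. This sketch assumes lambda = 0.05 (inferred from this thread) and uses the exponential distribution's cumulative probability, 1 - e^(-lambda * x); the names `cdf`, `per_bin` and `cumulative` are made up for illustration.

```python
import math

LAMBDA = 0.05  # assumed rate; inferred from the thread, not confirmed
WIDTH = 10


def cdf(x, lam=LAMBDA):
    """Cumulative probability P(X <= x) for the exponential distribution."""
    return 1 - math.exp(-lam * x)


edges = [i * WIDTH for i in range(11)]  # 0, 10, ..., 100

# Probability of an event inside each interval (what the weighted average needs).
per_bin = [cdf(b) - cdf(a) for a, b in zip(edges, edges[1:])]

# Probability of an event anywhere up to each edge (overlapping, so not valid weights).
cumulative = [cdf(b) for b in edges[1:]]

print("sum of per-interval probabilities:", round(sum(per_bin), 3))
print("sum of cumulative probabilities:  ", round(sum(cumulative), 3))
```

The per-interval probabilities never add up to more than 1, so they can serve as weights in a weighted average; the cumulative ones count the early intervals repeatedly and add up to far more than 1.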