p-hacking: What it is and how to avoid it!

StatQuest with Josh Starmer · 1.3M subscribers · 140K views

Published: Oct 1, 2024

Comments: 273
@statquest · 4 years ago
NOTE: This StatQuest was brought to you, in part, by a generous donation from TRIPLE BAM!!! members: M. Scola, N. Thomson, X. Liu, J. Lombana, A. Doss, A. Takeh, J. Butt. Thank you!!!! Support StatQuest by buying my book The StatQuest Illustrated Guide to Machine Learning or a Study Guide or Merch!!! statquest.org/statquest-store/
@lcoandrade · 4 years ago
Why is the standard 0.05? What can you say about changing the significance level to prove a point? Why 5%?
@RealMathematician21stCentury · 2 years ago
@@lcoandrade He has no answer to your question because p-values are really one big hack that doesn't prove anything at all. Search for: "Cohen - Things I have learned so far."
@dantevale0 · 4 years ago
Him: "Imagine there's a virus" Me: Yeah, I'm there
@statquest · 4 years ago
Yep...
@gabrielcasagrande9463 · 4 years ago
Damn virus :(
@VaneyRio · 4 years ago
I studied industrial engineering; however, engineering tends to have a way too "practical" approach in Mexico, so my statistics teachers were more focused on "formula -> result -> verdict" rather than a true explanation of why we do things. This series has helped me a lot in understanding the reasoning behind some of the everyday uses I give to my statistics knowledge. That's truly worthy of a triple BAM! Thank you for sharing your knowledge.
@statquest · 4 years ago
Muchas gracias!!! :)
@evianpullman7647 · 3 years ago
@@statquest Just found this site and these videos today. How do you say "BAM!" in Spanish??! :)
@statquest · 3 years ago
@@evianpullman7647 BAM!, Doble BAM!!, y Triple BAM!!! :)
@evianpullman7647 · 3 years ago
@@statquest :) You da man!! Later I want to be a Triple BAM member (when I get my money$$$ stuff in order).
@statquest · 3 years ago
@@evianpullman7647 Thank you!
@VadikUglov · 4 years ago
That's the most enthusiasm I've seen anyone show when explaining hypothesis testing))) Thanks for making this thing clear!
@statquest · 4 years ago
Thank you! :)
@alecvan7143 · 4 years ago
"Instead of feeling great shame" made me laugh out loud hahah
@alecvan7143 · 4 years ago
Great lead-out!
@statquest · 4 years ago
Thank you very much! :)
@MadhushreeSinha · 4 years ago
Just can't leave your videos without giving a like!!!! Thank you for making our lives easy with your pedagogy!!
@statquest · 4 years ago
Thank you very much! :)
4 years ago
I am a great fan of your work. Here is an idea to enhance learning: it would be helpful if, after a video or a set of videos, you gave us problems (or homework) to learn better. In later videos, you could give us the answers. Thank you
@HussainAlyousif · 4 years ago
I second that
@statquest · 4 years ago
That's a great idea, and thank you for supporting StatQuest!!! :)
@chunyuji6162 · 4 years ago
Thank you for the video! I have a question though. Since we are testing different drugs, why do we need to consider the p-values of the other drugs for the False Discovery Rate? I thought these were independent events.
@Thyagohills · 4 years ago
I believe this is an intrinsic problem with the way "expedition experiments" work. Even if the drugs are independent, given the multiple testing there is the problem of inflation of type I errors. The same would occur when measuring 100 uncorrelated variables from a homogeneous two-group sample (null). You'd expect false discoveries even if the variables are themselves independent and equal between the groups. Also, there are procedures for FDR based on independent and dependent scenarios, e.g., the BH and BY methods.
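The BH procedure mentioned in this reply is simple enough to sketch in plain Python. This is an illustrative implementation, not library code, and the p-values are invented for the example:

```python
# A minimal sketch of the Benjamini-Hochberg (BH) FDR adjustment.
def bh_adjust(pvals):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity:
    # adjusted p at rank k is min over ranks >= k of (p * m / rank)
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values from five tests, invented for illustration
print(bh_adjust([0.01, 0.02, 0.03, 0.04, 0.30]))
# approximately [0.05, 0.05, 0.05, 0.05, 0.30]
```

The BY variant for dependent tests is the same procedure with each value additionally multiplied by the harmonic sum 1 + 1/2 + ... + 1/m.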
@yihuang4875 · 4 years ago
LOVE YOUR VIDEOS!!! After discovering one of your videos, I started to review my statistics from the very basics. Here is a little question. I can't figure out how to calculate the p-value for a statistical test that compares the means of two sets of samples, based on what I learned from the last one, 'How to Calculate p-values'. In 'How to Calculate p-values' I understand that calculating the p-value of one event shows whether the occurrence of that event is that special (
@statquest · 4 years ago
It sounds like you are asking how to calculate a p-value for a t-test. There are two ways to do it: the "normal way" (and I don't have a video for that), and the "more flexible linear model" way. I have a video for that. Here's the playlist: ru-vid.com/group/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU
@yihuang4875 · 4 years ago
@@statquest Wow, thank you for the reply!
@christianscodecorner3176 · 3 years ago
*shameless self promotion* 😂 Josh could purposely make a whole video on nonsense and I'd shamelessly go watch it
@statquest · 3 years ago
Bam! Thank you for your support! :)
@davidcmpeterson · 2 years ago
This video starts out tough: "imagine a virus". Damn, can't you come up with something less hard for me to imagine, such as Santa Claus visiting everyone's houses all on the same night?
@statquest · 2 years ago
:)
@LiquidBrain · 4 years ago
7:09 for "Ohhh Nooo"
@statquest · 4 years ago
:)
@poiuytrew09876 · 4 years ago
I think I'm in love with you
@statquest · 4 years ago
BAM! :)
@shaquibrahman77 · 4 years ago
We already do 😍😍
@Umar_P · 3 years ago
Double BAM 😎
@drpkmath12345 · 4 years ago
Good content indeed. Sometimes students find it easy to use the critical value method but have difficulty with the p-value. Your content explains it clearly!
@statquest · 4 years ago
Awesome! Thank you very much. :)
@joaobacelo3154 · 10 months ago
Congratulations and thank you for the videos! I appreciate the clarity, simplicity, and humor in your content. While you probably don't need more compliments, I wanted to express how much I enjoy it and how I find it to be a brilliant way to convey key concepts in statistics. BAM!
@statquest · 10 months ago
Thank you very much!
@chrisvaccaro229 · 4 years ago
In the hood they say 'snitches get stitches'. In the lab we say 'p-hackers are backward'.
@statquest · 4 years ago
:)
@CarlosFerreira-zg1rp · 4 years ago
Imagine there was a virus... well, I guess that is quite easy to imagine...
@statquest · 4 years ago
Yep....
@anapeleteirovigil3199 · 4 months ago
So tempted to send this video to my former PhD director... 🙄 Thank you for your immensely valuable content!!! Learning more on your channel than in class 😅
@statquest · 4 months ago
bam! :)
@LittleLightCZ · 2 years ago
"I don't do p-value hacking, I raise my alpha level. I'm José Mourinho."
@statquest · 2 years ago
:)
@Y--H · 4 years ago
I almost understand the first example about drugs A to Z, except for one difficulty. If we assume that the distributions for each drug are independent, then from the perspective of drug Z there's only one test. Then this will not have the multiple testing problem.
@statquest · 4 years ago
If something has a 5% chance of happening, then 5% of the time we do it, we'll get a false positive. So now imagine we did 100 tests for each of the 27 drugs. That means 5 of the tests for drug A are false positives, 5 of the tests for drug B are false positives, etc. So we have 100 rows of results, with mixtures of true negatives and false positives. When we test each drug one time, we get one of those rows, and it's very likely that it will include at least one false positive.
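The arithmetic behind this reply can be checked directly: if each null test has a 5% false-positive rate, the chance that one "row" of 27 independent tests contains at least one false positive is 1 - 0.95^27. A quick Python check, using the 27 tests and the 0.05 threshold from the reply above:

```python
# Probability that at least one of many independent null tests
# comes out "significant" purely by chance.
alpha = 0.05   # the p-value threshold
n_tests = 27   # one test per drug, as in the reply

p_at_least_one = 1 - (1 - alpha) ** n_tests
print(round(p_at_least_one, 2))  # 0.75
```

So even though each individual test is well behaved, a row of 27 of them is more likely than not to contain a false positive.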
@Thyagohills · 4 years ago
The p-hacking culture is so endemic that sometimes I seriously get so demotivated that I consider leaving the field entirely. It's depressing. Thanks, Josh. Sorry about the vent.
@statquest · 4 years ago
Noted!
@lomuscko · 3 years ago
Bolsonaro supporters need to watch this video. There are a lot of doctors in Brazil who think they're "researchers".
@statquest · 3 years ago
Obrigado! :)
@Ennocb · 3 years ago
I really appreciate the videos. They help me a great deal in understanding what's behind the formulas and what their numbers mean.
@statquest · 3 years ago
Thanks!
@manishakadri5052 · 3 years ago
Such a lucid explanation!!! You make stats so much more fun!! Thank you
@statquest · 3 years ago
Thank you! :)
@gaming_ayrus · 3 years ago
To the 3 guys who disliked... no bam
@statquest · 3 years ago
:)
@jiaxuanchen8652 · 3 years ago
You are my god! I love you
@statquest · 3 years ago
Thanks!
@danielbaena4691 · 4 years ago
Anxiously waiting for the Power Analysis video!!! As always, Josh, thank you
@statquest · 4 years ago
ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-VX_M3tIyiYk.html
@zipfslaw3771 · 3 years ago
You are amazing. I have taught statistics, but I learn something from your videos every time!
@taotaotan5671 · 4 years ago
I just finished my RNA-seq homework...
@statquest · 4 years ago
BAM! :)
@JerryWho49 · 4 years ago
Another great StatQuest, thanks. Here's an xkcd version of your drug testing: xkcd.com/882/
@statquest · 4 years ago
A classic xkcd! :)
@crywulf44 · 4 years ago
If each drug trial were a separate study, though, wouldn't the p-value false positives still stand, because the authors are acting independently? So you get different results depending on whether one person trialled all the drugs or a different person trialled each drug.
@statquest · 4 years ago
The way the p-value threshold is specified to be 0.05, we expect 5% false positives - and, overall, this is a good trade-off of cost/benefit/risk/reward. So, if different people are doing tests, sure, they will get false positives from time to time. But the goal is to limit them within our own study, so we adjust the p-values with FDR.
@joanahernandez9084 · 4 years ago
Hi StatQuest, great video! I watched this video and your power analysis video and I have one quick question. If you have already collected preliminary data, can you perform a power analysis on that data, or would that be considered p-hacking as well? Thanks
@statquest · 4 years ago
You should use your preliminary data for doing a power analysis.
@amiralikhatib3650 · 3 years ago
Thanks a lot for your insightful tutorial. That was really useful :)
@statquest · 3 years ago
Glad it was helpful!
@statisticaldemystic6817 · 4 years ago
This is a great non-technical explanation.
@statquest · 4 years ago
Thanks! :)
@Christopher-gv4eh · 16 days ago
In the first example going through drugs A...Z, you dealt with rejecting the null hypothesis that there's no difference between drug Z and not taking a drug. The next example deals with taking samples from a distribution of people who did not take a drug. At 4:30, what does it mean that the null hypothesis is that the two groups came from the same distribution? What would it mean for one to come from a different distribution? Would it correspond to a different distribution of people who took medication instead? I'm doing my best to follow along, but this one has me confused. P.S. Thank you for the wonderful videos!
@statquest · 16 days ago
It might be helpful to first watch my video on how to calculate p-values: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-JQc3yx0-Q9E.html and my video on the null hypothesis: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-0oc49DyA3hU.html However, to answer your questions: we're trying to decide if there is a statistically significant difference between the two samples (note: there will always be differences between the two samples due to little random things that happen, so just seeing small differences isn't enough to decide that there is a real difference between the two groups). If the difference is statistically significant, that suggests there is a true difference - and maybe one drug works better than another. However, the statistical test is not perfect. Sometimes the statistical test will say there is a difference even when we know there isn't one (because both groups of measurements came from the same "distribution" of people - for example, people who all took the same drug, or people who didn't take any drug at all). When the test fails, it suggests that there is a real difference between the groups, even though we know there isn't one.
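The "test fails about 5% of the time under the null" idea in this reply can be simulated. The sketch below is stdlib-only Python and uses a normal-approximation two-sample test as a stand-in for the t-test used in the video, so the rate is only approximately 5%; the group sizes and trial counts are arbitrary choices:

```python
import math
import random

random.seed(0)

def two_sample_p(a, b):
    """Two-sided p-value from a normal-approximation (z) test."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return math.erfc(abs(z) / math.sqrt(2))

# Both groups always come from the SAME distribution (no drug effect),
# yet a fraction of tests still comes out "significant".
n_trials = 2000
false_positives = sum(
    two_sample_p([random.gauss(0, 1) for _ in range(50)],
                 [random.gauss(0, 1) for _ in range(50)]) < 0.05
    for _ in range(n_trials)
)
rate = false_positives / n_trials
print(rate)  # close to 0.05
```

Every rejection here is, by construction, a false positive, which is exactly the failure mode the reply describes.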
@nicolasstegmann5855 · 3 months ago
I did not get one part: I understand why adding 1 more measurement to each group would be p-hacking. But does that mean that if I collect more measurements than the sample size previously stipulated by the power analysis, I would be p-hacking? So if my power analysis says I need 100 samples, but by the time I did the test I already had 500, what then? What happens if, instead of adding 1 more measurement to each group, I'm adding 100 more? Is this p-hacking as well?
@statquest · 3 months ago
It's better to use the first set of data for a power analysis and then start over.
@katielui131 · 6 months ago
Hi, I still don't understand the section at 1:47. Why is it p-hacking to reject the null hypothesis for drug Z? It isn't like the later example where they're taking the samples from the same population. I presumed we call it a different population for each drug the samples are tested on.
@statquest · 6 months ago
Say we have 100 drugs and none of them are effective - they are all variations on a sugar pill. Then we test all of them, like in this example. Well, due to random sampling of the people taking the pills, there's a good chance that in one of those tests a bunch of healthy people take one pill and a bunch of sick people take the other. This will make it look like there is a difference between the two pills, even though it's not the pills - it's just the people who take them. Thus, this is p-hacking because it makes the pills look different.
@phyzix_phyzix · 1 year ago
P-hacking is how you get a high volume of papers published and cited. It's incentivized in all academic fields. Just try asking researchers if you can take a look at their data and they will ghost you.
@statquest · 1 year ago
noted
@bzaruk · 2 years ago
How do you calculate a p-value based on two means (from two samples)? Isn't a p-value calculated between a number and a distribution? What is the exact process of using the two mean parameters to calculate a p-value on a distribution?
@statquest · 2 years ago
We can calculate p-values for two means using something called a t-test. For details, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-nk2CQITm_eo.html and then ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-NF5_btOaCig.html
@MrSreior · 7 months ago
I love you
@statquest · 7 months ago
:)
@SchergeSatans · 6 months ago
I don't see how the first example with the different drugs is problematic, given that the drugs actually are different and that we draw a new sample every time.
@statquest · 6 months ago
What if the only difference is the size, and they are all sugar pills?
@remia5 · 4 years ago
Great work, Josh! Could you also do a video on how to compute the effect size? Would effect size be a better replacement for the p-value?
@statquest · 4 years ago
Here's a video that shows one way to compute effect size: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-VX_M3tIyiYk.html
@Dupamine · 4 years ago
What if I get a p-value of 0.06 using a sample of 29, do a power analysis AFTER that, and the power analysis says that I need a sample size of 30, so I add one more observation to my data. Would this be okay to do?
@statquest · 4 years ago
You should start from scratch.
@opencardyugioh · 1 year ago
Can you clarify what is meant by "one more measurement" versus "adding more data"? I assume they are the same in the context of the "subtle form of p-hacking" segment.
@statquest · 1 year ago
What time point, in minutes and seconds, are you asking about?
@rajatchopra1411 · 1 year ago
8:45 "Don't cherry-pick your data and only do tests that look good" Politicians: I'm gonna ignore what you just said!
@statquest · 1 year ago
:)
@unicornsandrainbowsandchic2336 · 4 years ago
I feel like this video needs way more views...
@statquest · 4 years ago
Thank you! :)
@woowooNeedsFaith · 1 year ago
Question: Why are the drug-free experiments repeated? Why not reuse one set of drug-free results, or even better, aggregate all the drug-free results into one single set?
@statquest · 1 year ago
Even if you re-used the drug-free results, the outcome would be the same - there's a chance that, due to random noise, you'll get results that look very extreme.
@alexandersmith6140 · 2 years ago
"...and compare these two means and get a p-value = 0.86." Wait, how do you compare two means to get a p-value? The Statistics Fundamentals playlist I'm working through has so far only explained how to determine the p-value of a certain event happening (like a Brazilian woman being a certain height). It hasn't yet explained how to assess the statistical significance of two sample means being a given distance apart when they're hypothesized to belong to the same overall population.
@statquest · 2 years ago
Sorry about that. In this case, I was just assuming that knowing the concept of a p-value would be enough to understand what was going on. However, if you'd like to know more about how to compare means, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-nk2CQITm_eo.html and then ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-NF5_btOaCig.html
@alexandersmith6140 · 2 years ago
@@statquest Thanks Josh! Wow, what a response time.
@ilikegeorgiabutiveonlybeen6705 · 6 months ago
What if I repeat all of the experiments with other samples a number of times, e.g. I do 20 experiments for each drug in total, and then I look at whether anything changed? Is that really bad (since it isn't guaranteed that I don't just get 20 false positives out of nowhere), or might it be useful in some obscure scenario?
@statquest · 6 months ago
You'll get more power if you combine all of the data together into a single experiment, so I'm not sure why you would do it another way.
@ilikegeorgiabutiveonlybeen6705 · 6 months ago
@@statquest Oh okay, thanks, I didn't think of that.
@ricardoafonso7563 · 3 years ago
A nice lecture.
@statquest · 3 years ago
Thanks!
@chendong2197 · 4 years ago
Could you please explain how you calculated the p-values in these comparison examples?
@statquest · 4 years ago
Because these were just examples, I think I just made up values that seemed reasonable.
@chendong2197 · 4 years ago
@@statquest I am not trying to question the correctness of the numbers, just wondering how to calculate the p-value in cases like the example. I do not recall you mentioning it in previous videos. Thanks.
@statquest · 4 years ago
@@chendong2197 With this type of data we would use a t-test. I explain these in my playlist on linear models (which sound much scarier than they really are): ru-vid.com/group/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU
@mohammadhassanjafari481 · 3 years ago
It was great 🏅🏅
@statquest · 3 years ago
Thank you! :)
@luqingren4997 · 4 years ago
Congratulating myself on becoming a member :)
@statquest · 4 years ago
BAM! Thank you very much for your support! :)
@sumitbagga2504 · 4 years ago
Hahaha... Never seen a teacher like you... Hey, wait a minute... I didn't see you... Baam!!!!!
@statquest · 4 years ago
Nice one! :)
@sumitbagga2504 · 4 years ago
@@statquest I am learning from your videos to become a data scientist :)
@poojakunte6865 · 3 years ago
Do you take PhD students??
@statquest · 3 years ago
I wish! :)
@poojakunte6865 · 3 years ago
@@statquest :(
@benjamin_markus · 3 years ago
I don't get how the first example with the drugs is p-hacking. After all, you're testing different drugs, not doing the same test again.
@statquest · 3 years ago
If I took the same exact drug, gave it 27 labels, Drug A through Drug Z, and then tested each one, would it look like p-hacking?
@benjamin_markus · 3 years ago
@@statquest In that case of course it would, but I find applying this logic to the case of truly different drugs confusing, as that is surely not p-hacking if the tested drugs are not the same.
@christianbolt5761 · 3 years ago
Cherry-picking your study is like cherry-picking your data.
@statquest · 3 years ago
Yep.
@viduradias4646 · 2 years ago
Thank you!
@statquest · 2 years ago
You're welcome!
@alexlee3511 · 1 year ago
I am a little bit confused that, as you mentioned previously, p
@statquest · 1 year ago
When we compute a p-value, we select a "null distribution", which is usually "no difference". If the null hypothesis is correct, and there is no difference, then a p-value threshold of significance of 0.05 means that there is a 5% chance we'll get a false positive.
@alexlee3511 · 1 year ago
@@statquest So can I understand it this way: say the threshold is 0.05; when our observed data lies in that 5% of the null distribution, there is a 5% chance that we incorrectly reject the null hypothesis if we decide to reject it, but we are still 95% confident in rejecting the null hypothesis.
@statquest · 1 year ago
@@alexlee3511 The p-value = 0.05 means that there's a 5% chance that the null distribution (no difference) can generate the observed data or something more extreme. To be honest, I'm not super comfortable interpreting 1 - 0.05 (alpha) outside the context of a confidence interval.
@samiulsaeef2076 · 3 years ago
One thing that confuses me: if the threshold 0.05 means there will be 5% false positives (bogus tests in this example), then how do we link a particular p-value (say we got 0.02) to false positives? Is there something like a 2% false positive rate associated with that p-value? I think that's not the case. I watched all of your p-value videos and they made things clear, but the definition of "threshold" confuses me. I hope the p-value has nothing to do with false positives. Correct me if I am wrong.
@statquest · 3 years ago
The p-value tells us the probability that the null hypothesis (i.e., that there is no difference between the drugs ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-0oc49DyA3hU.html) will give us a difference as extreme or more extreme than what we observed. When we choose a p-value threshold to make a decision - such as all p-values < 0.05 will cause us to reject the null hypothesis that there is no difference between the drugs - then there is a 5% chance that the null hypothesis could generate a result as extreme or more extreme as what we observed, and thus a 5% chance that we will incorrectly reject the null hypothesis and conclude that there is a difference between the two drugs when there is none.
@yijingwang7308 · 1 year ago
Great video! I have a question about drug testing. Is it better to perform an experiment that tests all of the drugs at once instead of testing each candidate one by one? Also, if I want to test 6 candidates versus a control, each of them with three technical replicates, and I performed the same experiment three times, should I use the mean of the three technical replicates from the three experiments to calculate the p-value? Or are there better solutions for the experimental design? Looking forward to your reply!
@statquest · 1 year ago
If you do all the tests at the same time, that's called an ANOVA, and you can do that - it will tell you whether something is different or not. However, it won't tell you which test was different. In order to determine which test was different, you'll still have to do all the pairwise tests, so it might not be better. So it depends on what you want to do.
@yijingwang7308 · 1 year ago
@@statquest Many thanks for your reply! What about the technical replicates? If I have 6 technical replicates for each biological replicate, should I use the mean of the technical replicates to stand for each biological replicate?
@statquest · 1 year ago
@@yijingwang7308 If you have technical replicates, then you probably want to use a "general linear model." Don't worry, I've got you covered here: ru-vid.com/group/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU
@kenny102000 · 2 years ago
Thank you so much for explaining probability concepts with such good pedagogy (and positive energy). These lessons are usually soporific. Question: why do we need to re-test new samples of recovery WITHOUT any drug every time? Can't we just compare the results of each drug to the results of "no drug"? As in, in practice, we would only have one placebo group against which we compare the results of a given drug, wouldn't we?
@statquest · 2 years ago
Sure, if you did all the tests at the same time, you could use the same placebo group for all the tests; however, that doesn't change the possibility of false positives.
@SebastianHay · 2 years ago
If you get a p-value of less than 0.05 and you set the sample size beforehand, are there ways of further interrogating the data to ensure you haven't just got that unlikely-but-possible '1 in 20' result, without accidentally reverse p-hacking (not sure of the right term, but taking more samples, or sets of samples, so that the p-value is now greater than 0.05)?
@statquest · 2 years ago
You can use a lower p-value threshold and you can adjust for multiple testing.
@epsilonzeromusic · 3 years ago
Can you please explain why increasing the sample size after measuring the p-value would increase the likelihood of a false positive? I would think it would be the opposite. If you're adding 2 observations to each set, it's more likely that each of these observations is closer to the distribution mean than far away from it. This would imply that the sample means of both sets are more likely to move closer together (the p-value would increase) than to move apart (the p-value would decrease).
@statquest · 3 years ago
Unfortunately that is not correct. Even if the new points are closer to the mean, there is a 25% chance that the new point for the observations with higher values will be on the high side of the true mean AND the new point for the observations with the lower values will be on the low side of the true mean.
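The "add data and re-test" trap discussed here is easy to simulate. The sketch below is stdlib-only Python; a normal-approximation test stands in for the t-test, and the group sizes (start at 10, stop at 30) are arbitrary choices. It re-tests after every added pair of measurements, so even though both groups come from the same distribution, the final false-positive rate lands well above the nominal 5%:

```python
import math
import random

random.seed(42)

def two_sample_p(a, b):
    """Two-sided p-value from a normal-approximation (z) test."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return math.erfc(abs(z) / math.sqrt(2))

def peek_and_extend():
    """Start with n=10 per group; add one measurement to each group
    and re-test until p < 0.05 or n reaches 30."""
    a = [random.gauss(0, 1) for _ in range(10)]
    b = [random.gauss(0, 1) for _ in range(10)]
    while True:
        if two_sample_p(a, b) < 0.05:
            return True            # "significant" -- a false positive
        if len(a) >= 30:
            return False
        a.append(random.gauss(0, 1))
        b.append(random.gauss(0, 1))

n_trials = 1000
rate = sum(peek_and_extend() for _ in range(n_trials)) / n_trials
print(rate)  # well above the nominal 0.05
```

Because we only ever stop early in the "significant" direction, the repeated peeks can only inflate the rejection rate, never lower it.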
@amalantony8934 · 3 years ago
In the experiment at 0:29 - 2:18, what if drug Z was actually a better drug? P-hacking would happen if we tested the results of drug Z itself multiple times until we got a false positive. In the example above, we just tested one drug relative to the other drugs, which gave a p-value less than 0.05, so why would that be a false positive?
@statquest · 3 years ago
If Drug Z was actually a better drug, then the times when we failed to detect a difference would be false negatives and the time when we detected a difference would have been a true positive. So it is possible to make both types of mistakes - if Drug Z is not different, then we can have false positives, and if Drug Z is different, then we can have false negatives. The good news is that we can help prevent false negatives by doing a power analysis. For details, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Rsc5znwR5FA.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-VX_M3tIyiYk.html
@shuklasrajan · 1 year ago
BAM 😂
@statquest · 1 year ago
:)
@pesco7790 · 3 years ago
Hey Josh, what if the real difference is smaller than what I assumed when calculating the experiment's power? Can't I just keep the experiment running to reach a higher sample size, achieve good power for the smaller difference, and then test it? Would that be p-hacking?
@statquest · 3 years ago
That would be p-hacking. Instead, use your data as "preliminary data" to estimate the population parameters and use those for a power analysis: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-VX_M3tIyiYk.html
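For what it's worth, a standard normal-approximation formula for the per-group sample size in a two-sample comparison of means is n ≈ 2((z_{1-α/2} + z_{power}) / d)², where d is the effect size (difference in means over the common standard deviation) estimated from the preliminary data. A small stdlib-only Python sketch; the effect size 0.5 below is just an example value:

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sample test of means
    (normal-approximation power formula)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # ~0.84 for power = 0.80
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)

print(sample_size_per_group(0.5))  # 63 per group by this approximation
```

Exact t-based calculators give a slightly larger answer (around 64 for this case) because the t distribution has heavier tails than the normal.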
@edwardgrigoryan3982
@edwardgrigoryan3982 Год назад
Bam? No. No bam.
@statquest
@statquest Год назад
:)
@Exhora
@Exhora 4 года назад
Let's say I am developing a new method of analysis and it gives me a p-value. Is it ok to keep changing my method to get a better p-value? Would this be p-hacking as well? Thank you for your videos! It helps me a lot!
@statquest
@statquest 4 года назад
I don't think so. For example, if you ran the wrong test on your data, then realized your mistake and ran the correct one, I don't think you should be penalized for that. However, in your example, you have to make sure that when you modify your test, you're not modifying it just to get a small p-value. Instead you are modifying it in ways that are statistically justifiable in a broad sense.
@scottwais1288
@scottwais1288 2 года назад
If example 1 is p-hacking because the drug z result relies on only one test, how do you view all the social experiments that rely on one test (because it would be too costly to reproduce them)?
@statquest
@statquest 2 года назад
The drug z result relied on us testing every single drug - repeating the process until we got a significant result.
@tricky778
@tricky778 4 года назад
If I'm doing an experiment then a lot of other people must have already done experiments and didn't get a useful result - so eventually the spare budget falls to me making my test dependent just the same as if I'd done all of them. Does that mean we must increase our sample size based on the area under the curve of the economic capacity for investigating the problem?
@statquest
@statquest 4 года назад
No. It is inevitable that some of our results will be false positives. So we just need to focus on our own experiments and do what we can. That said, we can reduce the number of false positives further by using complementary experiments that essentially show the same thing. Like a criminal trial that lots uses pieces of evidence used to convince us that someone is guilty or innocent, our final conclusions should be based on multiple pieces of evidence.
@zaharsadatnajafi8169
@zaharsadatnajafi8169 2 года назад
Thanks for your clear explanation !
@statquest
@statquest 2 года назад
Thank you! :)
@hafidhrendyanto2690
@hafidhrendyanto2690 2 года назад
how do you get the p values of two different sample?
@statquest
@statquest 2 года назад
In this video I used t-tests.
@elvislee7808
@elvislee7808 Год назад
Thank you. You are a awesome teacher!!
@statquest
@statquest Год назад
Wow, thank you!
@malishakapugamage7052
@malishakapugamage7052 4 года назад
Quality Contain. you earn a subscriber
@statquest
@statquest 4 года назад
Thanks!
@afanasnosdaglaz
@afanasnosdaglaz 4 года назад
Thank you A LOT for this video!!! Please, answer my question: which correction method for p-values should I use, if I make several (say, 30) comparisons, but each 2 groups of observations come from different distributions? To be specific, I compare two variants of 5 different enhancer sequences, each in 6 cell lines, using luciferase reporter technique.
@statquest 4 years ago
I should have clarified that it doesn't matter if you mix and match distributions. When we do multiple tests, regardless of the distribution, we run the risk of getting false positives. So I would recommend adjusting your p-values with FDR (false discovery rate).
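To make the FDR suggestion concrete, here is a sketch of the Benjamini-Hochberg adjustment in plain Python; the input p-values below are invented. (In practice you would normally use an existing implementation, e.g. `p.adjust(p, method = "BH")` in R.)

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg (FDR) adjusted p-values."""
    m = len(pvalues)
    # Sort p-values, remembering their original positions.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, scaling by m / rank and
    # enforcing monotonicity so adjusted values never increase.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(bh_adjust([0.01, 0.02, 0.03, 0.04, 0.05]))
# every value becomes 0.05 (up to float rounding)
```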
@AbhishekChandraShukla 10 months ago
😇
@statquest 10 months ago
:)
@shashankgpt94 3 years ago
"Bam?" Bam with a question mark has a separate fan base
@statquest 3 years ago
Ha! You made me laugh. :)
@shashankgpt94 3 years ago
@@statquest And you make me learn statistics in a fun way 🥺
@TheLPfunnTV 2 years ago
No bam.. go to p-hacking jail
@statquest 2 years ago
:)
@likelotusflowers 2 years ago
Is it really slowed down 1.5x?))
@statquest 2 years ago
Some people like 2x. It really depends on how fluent you are in English.
@DhruvSharmal0W 3 years ago
This should be the stats Bible for Gen Z
@statquest 3 years ago
bam!
@camillep3925 2 years ago
Hi, thank you for your great work! However, there is something I have a hard time processing. I understand the last example of p-hacking, but we are expected to do the same experiment 3 independent times. How do we analyze and present these results without p-hacking?
@statquest 2 years ago
Is it really true that you are expected to do the same experiment 3 times? Or just that you are supposed to have at least 3 biological replicates within a single experiment? If you really are doing the exact same experiment 3 separate times, then you could avoid p-hacking by just requiring all 3 to result in p-values < whatever threshold for significance you are using.
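A quick back-of-the-envelope check of that suggestion: if the null hypothesis is true in all 3 independent experiments, requiring p < 0.05 in every one of them makes a joint false positive far less likely than in any single experiment:

```python
alpha = 0.05     # per-experiment significance threshold
experiments = 3  # independent replications required to agree

# Probability that ALL independent experiments give p < alpha
# when the null hypothesis is true in every one of them:
combined = alpha ** experiments
print(combined)  # roughly 1 in 8,000
```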
@camillep3925 2 years ago
@@statquest Thank you !
@jolojololo3221 1 year ago
Hi, I wonder: if I remove outliers several times, is that p-hacking too?
@statquest 1 year ago
It depends. It could be. It could also be just removing junk data.
@jolojololo3221 1 year ago
@@statquest do you have any source I can check to make sure I'm not being a p-hacker with outliers? Please
@statquest 1 year ago
@@jolojololo3221 Here's something that might help: www.reddit.com/r/AskAcademia/comments/bcop6p/removing_outliers_phacking_or_legitimate_practice/
@dakeni8256 4 years ago
Amazing! I have a question about the Benjamini-Hochberg method. Is this method only applicable to parametric tests such as the t-test and chi-square test, or is it also applicable to non-parametric tests such as the Wilcoxon signed-rank test? Thanks a lot.
@statquest 4 years ago
It works with all tests.
@dakeni8256 4 years ago
@@statquest Thanks a lot
@redcat7467 3 years ago
Well, basically the first 8 seconds say it all; the rest is complimentary.
@statquest 3 years ago
Bam! :)
@Vextrove 3 years ago
intro is a banger
@statquest 3 years ago
:)
@michaeljbuckley 2 years ago
Great video
@statquest 2 years ago
Thanks!
@michaeljbuckley 2 years ago
@@statquest just discovered your channel. Do you cover different distributions? What I'm finding is explanations of them in isolation, but nothing comparing them: when they're used, and how to test for them.
@statquest 2 years ago
@@michaeljbuckley Unfortunately I don't have those videos either.
@michaeljbuckley 2 years ago
@@statquest ah well. Still looking forward to go through your channel more.
@deepakmehta1813 3 years ago
Amazing video, thanks Josh. One question: how did you get the p-value for two means? Is there any video for that?
@statquest 3 years ago
You can do it with something called a 't-test'. There are two ways to do t-tests, the traditional way that is very limited, or you can use linear regression and it opens up all kinds of cool possibilities. If you want to learn how to do it with linear regression, see: ru-vid.com/group/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU
@deepakmehta1813 3 years ago
@@statquest Thank you so much.
@AlexanderYap 4 years ago
I had expected that adding more data would have made it less likely to get such false positives. Why does the p value decrease as we add more data in the 2nd example?
@statquest 4 years ago
I did some simulations with a normal distribution and when the p-value was between 0.05 and 0.1, adding more observations resulted in a 30% probability of a false positive.
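My own rough reconstruction of that kind of simulation follows. The details — sample sizes, how much data gets added after a "promising" result, and the use of an exact z-test with known sigma — are assumptions for illustration, not necessarily what was used:

```python
import random
from statistics import NormalDist

def z_test_p(a, b):
    # Two-sample z-test with known sigma = 1; exact for N(0, 1) data.
    n1, n2 = len(a), len(b)
    z = (sum(a) / n1 - sum(b) / n2) / (1 / n1 + 1 / n2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))

random.seed(1)
trials, n, extra = 10_000, 10, 10
fixed_fp = hacked_fp = 0
for _ in range(trials):
    # Both groups from the same distribution: the null is true.
    a = [random.gauss(0, 1) for _ in range(n)]
    b = [random.gauss(0, 1) for _ in range(n)]
    p = z_test_p(a, b)
    if p < 0.05:
        fixed_fp += 1
        hacked_fp += 1
    elif p < 0.1:
        # "Promising" result: collect more data and test again (p-hacking!).
        a += [random.gauss(0, 1) for _ in range(extra)]
        b += [random.gauss(0, 1) for _ in range(extra)]
        if z_test_p(a, b) < 0.05:
            hacked_fp += 1

print(f"fixed-n false positive rate:      {fixed_fp / trials:.3f}")
print(f"peek-and-add false positive rate: {hacked_fp / trials:.3f}")
```

The peek-and-add procedure can only ever add false positives on top of the fixed-n ones, so its false positive rate ends up inflated above the nominal 5%.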
@MrReadale 4 years ago
@@statquest Excellent video. Thank you! So basically you say that observations in the power analysis cannot be included in the final analysis? I have just worked through the book "Medical Statistics at a Glance". It says that it is OK, and calls it an "internal pilot study"?
@shivverma1459 3 years ago
I am really confused about what you mean by "exact same distribution". I mean, only the people we are testing are the same; the drugs are different, so if we get a different result, why do we assume it is a false positive?
@statquest 3 years ago
What time point, minutes and seconds, are you asking about?
@shivverma1459 3 years ago
@@statquest 6:22
@shivverma1459 3 years ago
I am really confused about what "different" and "same" distributions mean here. Yes, the people we are testing on are the same, but the drug is different in each scenario, right? So it can have different effects.
@statquest 3 years ago
​@@shivverma1459 This example starts at 2:38, when I say that we are measuring how long it took people to recover and these people did not take any drugs. So there are no drugs to compare - we just collected two groups of 3 people each and compared their recovery times. When we see a significant difference, this is because of a false positive.
@shivverma1459 3 years ago
@@statquest Ohh, now I get it. BTW, love your videos ❤️ Love from India.
@nakul___ 4 years ago
Awesome video as always! Would love to hear your thoughts on moving the standard alpha to 0.005 or some alternatives to p-value reporting (surprisals/s-values) in a future video too!
@statquest 4 years ago
When we talk about Bayesian stuff, we'll talk about alternatives to the p-value. As for changing the "standard" threshold for significance. That's always been a cost/benefit/risk balance and ends up being field specific. For example, in a lot of medical science, the threshold can be as high as 0.1 because it makes sense in terms of cost/benefit/risk.
@konstantinlevin8651 1 year ago
Hey Josh, thanks for the videos. I follow the 66daysofdata playlist and am curious if we'll learn the statistical tests you mentioned in the video
@statquest 1 year ago
What time point, minutes and seconds, are you asking about?
@konstantinlevin8651 1 year ago
@@statquest such as 05:02
@statquest 1 year ago
@@konstantinlevin8651 Yes, I teach how to compare means in this playlist: ru-vid.com/group/PLblh5JKOoLUIzaEkCLIUxQFjPIlapw8nU
@revolutionarydefeatism 4 years ago
We are testing some drugs to see if they change the recovery period distribution, right? Why do you repeat that they are from the same distribution? How can we be sure that the drug didn't change the recovery period beforehand?
@revolutionarydefeatism 4 years ago
StatQuest with Josh Starmer thank you for your reply. So, if I'm not wrong, we generated those stochastic numbers with the same distribution as a tool to teach this stuff. But in reality we can't be sure whether they are from the same distribution or not. So we will test them, and power analysis will help us find the right amount of data we need for a reliable statistical result. Little bam or not a bam at all? :-D
@statquest 4 years ago
I'm sorry - my previous response was intended for someone else - I have no idea how that got mixed up. Anyway, here's the deal... The drugs can be from any distribution. However, the worst case scenario is when they both come from the same distribution. This means that any small p-value is a false positive (whereas, if they come from other distributions, then any small p-value is a true positive). So we assume the worst to determine how bad things can really get. Does that make sense?
@revolutionarydefeatism 4 years ago
@@statquest it is crystal clear now. Thank you very much. And I also started to listen to your music at the beginning of the videos. :-D
@woodrowhowe5536 4 years ago
Could you link the video on the false discovery rate in the video index?
@statquest 4 years ago
Thanks for the suggestion. I've added it.
@IstEliCool 3 years ago
Oh how I love you
@statquest 3 years ago
:)
@alsonyang230 1 year ago
Big fan of StatQuest, really appreciate the work and humor you put into this. Just a question on the approach: why do we need a different control group for each drug for comparison? Can we have one control group that doesn't take any drugs and compare it with all the treatment groups that take the different drugs? Are there pros and cons of doing this compared to what's done in the video?
@statquest 1 year ago
You could do it that way, but you'll still run into the same problem.
@alsonyang230 1 year ago
@@statquest Thanks for the prompt response. Yeah, I agree that's by no means the solution to the p-hacking problem, or even an attempt to mitigate it. I was wondering what the pros and cons are of having a different control group for each drug vs. having one control group for all. Under what scenario should I pick one approach over another?
@statquest 1 year ago
@@alsonyang230 It all just depends. You want to control for all things other than the drug or "treatment" or whatever you are testing. So, if you can do everything all at the same time, you could just collect one control sample. But if there are changes (like time of year, or location), then you need to get extra controls.
@alsonyang230 1 year ago
@@statquest Ah yeah, that makes a lot of sense. Thanks for the explanation!
@chrisvaccaro229 4 years ago
Sweet! I was waiting for someone to post a good p-hacking tutorial! Now all my findings will be statistically significant! oh, I'm just kidding!
@statquest 4 years ago
;)
@bijayapoudel3014 3 years ago
Great explanation!!:)
@statquest 3 years ago
Thanks! 😃
@bijayapoudel3014 3 years ago
@@statquest DOUBLE BAM!!:)
@pakersmuch3705 4 years ago
love it
@statquest 4 years ago
:)
@Commando303X 4 years ago
Good videos; absolutely terrible intros.
@statquest 4 years ago
Just skip the first 30 seconds and you'll be good to go! :)
@Commando303X 4 years ago
@@statquest, statistically, I am unlikely to do that. Thank you.
@statquest 4 years ago
OK.
@sdfsdf5313 4 years ago
@@statquest The intros and the way they are presented grow on you as you progress through more content. Absolutely loving it. :) Statistically speaking, there is no significant evidence to reject the null hypothesis: "Intros are awesome :)"
@statquest 4 years ago
@@sdfsdf5313 BAM!