When you mentioned the Cauchy not having a mean, it threw me for a loop. I had never thought about how the integrals involved in computing an expectation value can just... not converge, so the quantity simply isn't defined for that distribution.
Yeah, I remember the Cauchy being a recurring frustration in my grad math classes. I had no idea what my professor was talking about. One day I just decided to try a quick simulation on it, and then it became clear.
Expectation (and likewise variance and other moments) is just an integral. So if the area under the curve is infinite, the expected value doesn't converge. The Cauchy is just one of the few distributions that happens to not converge even at the first moment.
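A quick sketch of that simulation idea (in Python with numpy; the seed and sample size are arbitrary choices): track the running mean of normal draws next to Cauchy draws and watch one settle while the other doesn't.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Running mean after each new draw: cumulative sum / sample size so far
normal_means = np.cumsum(rng.standard_normal(n)) / np.arange(1, n + 1)
cauchy_means = np.cumsum(rng.standard_cauchy(n)) / np.arange(1, n + 1)

# The normal running mean settles near 0; the Cauchy one keeps getting
# yanked around by huge outliers and never converges to anything
print(normal_means[-1])
print(cauchy_means[-1])
```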
Came across the video by accident but will definitely stay for longer. I'm honestly surprised that such a good video with a clear explanation of the topic has such a small number of views; it definitely deserves more. Keep up the great work!
Thank you for sharing your knowledge! I'm curious and enthusiastic about data and statistics. I'm currently binge watching your videos in my spare time. Keep it up!
Great video. One note on the part with the supercomputers: I have the feeling that statisticians' code can often be sped up by quite a bit if one just takes efficiency into account. In a vectorized language like R, you want to avoid looping over vectors and dataframes. For example, your central limit theorem simulation can work with many more samples, for example as follows: ``` N
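(The R snippet above got cut off by the comment formatting. As a rough stand-in, here's a minimal vectorized sketch of the same idea in Python/numpy; the exponential distribution, sample size, and simulation count are illustrative assumptions, not the original code.)

```python
import numpy as np

rng = np.random.default_rng(0)
n_sims, n = 10_000, 30  # 10k simulated samples of size 30 (arbitrary choices)

# One big matrix of draws instead of a loop: each row is one simulated sample
samples = rng.exponential(scale=1.0, size=(n_sims, n))
sample_means = samples.mean(axis=1)

# CLT: means of even a skewed distribution look roughly Normal(1, 1/sqrt(30))
print(sample_means.mean(), sample_means.std())
```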
Thanks! I’m always happy to get some helpful snippets of code, especially if it makes my work faster. My limiting factor here was making the gif from the plots, those took forever to finish 💀
Another tip that I was experimenting with recently is the use of GPUs. Since random numbers can be generated independently, that's an embarrassingly parallel task that's greatly accelerated by GPUs... I've gotten 5000-7000x speedups in some simple Monte Carlo simulations; the greater the number of generated numbers, the greater the speedup.
Great video. Would love one explaining Markov chain Monte Carlo methods. Another place where assumptions can be sneakily violated is the CLT, because it assumes finite variance, so the standard Cauchy distribution again gives a counterexample.
@@very-normal Hi, I just wanted to blurt out that in my field of Psychology, Markov Chain Monte Carlo methods are used a lot for estimation of Item Response Models from Item Response Theory. Just a fun tidbit 🙂. Great video!
Quite insightful content, yes; but let's not underestimate the importance of good visuals in conveying a message (or simply hooking a new audience). Your channel does both wonderfully. Great video!!
I remember when I was looking at Monte Carlo risk assessments, I could just rerun the model to get the result I wanted. It was such a flawed way of looking at risk.
Monte Carlo is well known as a method to simulate the behavior of particles in physics. The most popular particle transport software is Monte Carlo N-Particle (MCNP), developed at Los Alamos National Laboratory. Yes, it's where the Manhattan Project took place. In fact, MC simulation was invented to overcome problems they ran into while creating that weapon. It's really fascinating that a method originally invented for weapon development during the war now has an immensely broad range of applications, like forecasting, pharmaceutical development, finance, radiation science, etc. What a nice method (except for the comically long computation times)!
For my senior project for my Petroleum Engineering degree, I first used Monte Carlo simulations to get an approximation of oil and gas production rates using data from the surrounding offsets (surrounding wells, or wells with similar features). It was awesome 👌 I then created a machine learning model for more accurate forecasting :)
@@josephdaquila2479 Yes, not only quantitatively but also behaviorally: it did a good job of showing possible dips (days of no production). No overfitting.
Wow, sounds good man. At the risk of sounding like a noob (which I am): you basically used a Monte Carlo simulation to generate data, and that data was then used to train a machine learning model?
At my university, student managed portfolio analysts have to code and run Monte Carlo on stocks to see all likely moves in price a stock can make based on volatility. It’s crazy
In that context, is assessing all likely moves/paths a variety of stocks can take meant to arrive at some average state of every stock in the portfolio? Are those paths and averages simulated independently or is there some type of Bayesian interdependence to the path likelihoods?
Very helpful for option pricing. It depends on your assumption about volatility, or you can also feed in the market-observed price and back out a market-implied volatility. It's interesting stuff.
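To make that concrete, here's a minimal sketch of Monte Carlo pricing for a European call under the usual constant-volatility geometric-Brownian-motion assumption, checked against the Black-Scholes closed form. All the numbers (spot, strike, rate, vol) are hypothetical.

```python
import numpy as np
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal CDF via the error function
    return 0.5 * (1 + erf(x / sqrt(2)))

def bs_call(S0, K, r, sigma, T):
    # Closed-form Black-Scholes call price, used here as a sanity check
    d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

def mc_call(S0, K, r, sigma, T, n_paths=200_000, seed=1):
    # Simulate terminal prices under GBM, then average discounted payoffs
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(n_paths)
    ST = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * sqrt(T) * Z)
    return exp(-r * T) * np.maximum(ST - K, 0.0).mean()

print(bs_call(100, 100, 0.05, 0.2, 1.0))  # about 10.45
print(mc_call(100, 100, 0.05, 0.2, 1.0))  # should land close to that
```

Monte Carlo is overkill for a vanilla call where the closed form exists, but the same loop generalizes to path-dependent payoffs that have no formula.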
Nice video! I have a PhD in physics, and some of my colleagues use MC very extensively in their work. In some fields this is the main if not the only viable simulation method.
As a software engineering student I always wanted to do this in my stats classes. Just build up all the fancy distributions and tests from first principles (and the bog-standard PRNG you get in every language under the sun.) Thought it was a crutch, but nice to see that even you galaxy-brain types like to get computational.
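In that spirit, here's a sketch of one of those from-first-principles constructions: a permutation test built out of nothing but a PRNG (Python/numpy; the data and iteration counts are made up for illustration).

```python
import numpy as np

def perm_test_pvalue(x, y, n_perm=10_000, seed=0):
    """Shuffle the group labels many times and count how often the
    shuffled mean difference is at least as extreme as the observed one."""
    rng = np.random.default_rng(seed)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # re-randomize which values belong to which group
        diff = abs(pooled[:len(x)].mean() - pooled[len(x):].mean())
        hits += diff >= observed
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, 30)
y = rng.normal(2.0, 1.0, 30)  # clearly shifted group
print(perm_test_pvalue(x, y))  # tiny p-value, as expected
```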
One issue I have with this video is that it first describes statistics as "we observe reality, make inferences about parameters and our models from it, and refine them in a feedback loop," which is a completely Bayesian framework, while in practice you then explain frequentist methods (like estimating power and type I or type II errors; even the usage of the law of large numbers isn't entirely correct for that type of inference). It's not your fault that statistics is often taught like this, but the philosophical framework you think you've set up is different from the one you actually use, leading to confusion in the long run: what does it really mean to do a null hypothesis test under the standard framework? If I can rule something out, can I also rule **in** that some parameters are a certain way? At the end of the day, what's the optimal way of deciding things about reality? Standard frequentist tests won't answer that for you. If you think the methods you're using allow you to say "oh, given this data (even simulated), the parameter must be within this range," then you've been misled and should look into Bayesian methods instead.
I originally intended the diagram to get across the idea that people learn from data and do more experiments, frequentist or Bayesian. But I definitely see that it's more properly Bayesian in the way I've described it. I thought about talking about Bayesian stuff here, but I decided against it. The frequentist-vs-Bayesian topic deserves its own video. I really appreciate the feedback btw, thanks!
@@very-normal Sure! I hope you include something that's not seen in the usual videos, for example the likelihood principle. As a brief introduction: the likelihood principle essentially says that, assuming some model, if you observe some data you should arrive at the same conclusion no matter how you decided to gather it. For example, in two experiments on the same coin I can decide to:
- Throw the coin 10 times
- Throw the coin until 3 heads in a row appear
Say I get the exact same result from these two different experiments; notice how the decision of when to stop doesn't influence the coin's behaviour itself. One would then think that whatever decision you make about the coin (for example, determining whether it's a fair coin) should depend only on what you've seen, no matter the experiment, assuming the results were the same. It turns out, however, that frequentist methods don't respect this and will give you different p-values for the two cases, while Bayesian methods respect it! This simplifies a lot of issues when deciding whether to stop midway through an experiment: Bayesians can do it without issue, while frequentists will get different p-values from that decision.
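That p-value discrepancy can be checked with a few lines of arithmetic. (This uses the standard textbook variant, stopping at the third tail rather than three heads in a row, because its p-value has a simple closed form; the numbers are the classic textbook ones, not from the video.)

```python
from math import comb

# Observed data: 9 heads, 3 tails; H0: fair coin, one-sided alternative.
# Design A: flip exactly n = 12 times (binomial).
# p-value = P(9 or more heads in 12 fair flips)
p_binom = sum(comb(12, k) for k in range(9, 13)) / 2**12

# Design B: flip until the 3rd tail appears (negative binomial).
# The same data means "needed 12 or more flips to see 3 tails".
p_negbinom = 1 - sum(comb(n - 1, 2) * 0.5**n for n in range(3, 12))

# Same data, same model, different stopping rule, different p-values
print(round(p_binom, 4), round(p_negbinom, 4))  # 0.073 vs 0.0327
```

A Bayesian posterior for the coin's bias depends on the data only through the likelihood, so both designs give the same answer.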
That’s definitely a key idea to include in a video, I appreciate the introduction. YouTube needs more Bayesian content, and I’ll be a part of pushing that lol
Really a nice video, you got a new subscriber :) If possible, I have a question. I know you gave some examples, but I would like to understand the "intermediate" one better: should I think of the hypothesis test as being the same across all the simulations (where in each one we test 2 models which, for example, estimate an unknown parameter), or are we actually investigating the power of 2 statistical tests? My doubt arises from the fact that power is a concept that usually refers to hypothesis tests and not to a "model"... or perhaps when you talk about power you mean what machine learning models usually call recall (aka sensitivity)?
Thanks! I’m glad you liked it! I can try to explain a bit more. The problem I was interested in was estimating a response rate for some hypothetical drug. I was comparing how different models perform on different types of data. Each of these models has a different way of estimating the response rate. Each model contains a parameter which represents the response rate I want to estimate, so I can apply the same hypothesis test to all of them and see if it was rejected or not based on the data I generate. Some models perform much more poorly when their assumptions aren’t met, and I was trying to quantify how badly their performance (i.e. power, type-I error) was affected compared to ideal situations. And you’re right, power and sensitivity are very similar, but they come from slightly different contexts. My feeling is that power is for decision making in hypothesis tests, while sensitivity is for prediction tasks. They both condition on there being a true effect or actually having some condition. As an aside, my simulation studies were actually Bayesian in nature, so my work was kinda like a Bayesian-frequentist hybrid. But that’s a whole other story lol, hope this helps to clarify
Thanks for the reply! Now it's clearer to me. It would be interesting to understand in detail what type of data you generated and how, what type of clinical trial you simulated, what models you used, etc. You could make a little spin-off of this video; it would be super cool! :) @@very-normal
Thanks! That could be a good video to do! I'll keep a note to myself about this comment. Both my MS and PhD are in Biostatistics. Slightly more applied than Statistics, but a lot of my coursework was with Statistics MS and PhDs. Responding to your other comment, I did a Ph.D because I really wanted the independent research skillset that Ph.Ds have. After going through most of my MS work, I felt like I had practical technical skills, but I felt like I could only do things after being told what to do. I liked the idea of being able to face a problem by myself, figure out a plan to tackle it, and then act on the plan. This isn't specific to Ph.Ds, but getting one gives you dedicated time to develop as an independent researcher, especially after you finish coursework. I think it's perfectly valid to try industry first before going for a PhD. It can give you a better idea of areas you like/hate, and make your time in the Ph.D more focused once you get in. But, you'll definitely feel the drop in pay going from industry to academia lol. Hope this briefly answers your question, I'll think about a more thoughtful response in the meantime.
Hi, is there any good book you would recommend for a better understanding of statistical modelling? I had a statistics course in my bachelor's, and I've always used some of these tools in other courses. But this explanation seems to go one step further. Is it possible to find literature explaining these concepts in more detail? Thank you for the video, subscribed
Sure, I can try recommending something. What kind of problems do you usually work with? Generally I’d say “Statistical Inference” by Casella & Berger, since it has solutions to help you check your understanding. I’ve also read a bit of “All of Statistics” by Larry Wasserman, though I have less experience with it. Thanks for watching and subscribing!
Great video. I love it when people break down and operationalise the statistical process of collecting and testing empirical data. I'll be sharing this with future cognitive science students when supervising their thesis projects.
I don't get one thing: I studied this subject and I was wondering how to use it effectively. As the professor told us, we have some number of variables, each with a specific distribution. We then try to guess those distributions by looking at the frequency of combinations of the variables' values, which depend on the distribution. Suppose we have two boolean variables; we then have 4 possible states for the system: TT, TF, FT, FF. We generate random samples for the probabilities and use them to see what value gets assigned to the whole system... but to be able to assign the correct value, don't we have to know the probability of those variables beforehand? Are we simulating those values to check after how many samples we reach the actual probability, which we already knew? I might be missing something... If we have some kind of system where you input data and the output is a state, then it makes sense to me to look for the probability of each state using Monte Carlo. Maybe this has nothing to do with finding THIS distribution, but rather something else?
This may not be the right answer for you, but it’s an educated guess. In this simulation, you have control over the success probability of these two Boolean/Bernoulli variables. These are the true underlying mechanisms. If you generate lots of samples from these two variables, you should see that the sample mean will approach the probability you chose. I think the thing you are missing is some numerical result of interest to you. In what you described, you’re interested in the probability of heads (I think), and that happens to be something you can control. But there may be other values you could be interested in that can be generated by this system. One example is the number of coin flips you need until you get your first heads. This number is of interest to gacha players, who want to know how many times (on average) they might need to pull until they get what they want. lol, relevant example to my life, but hopefully it helps clear things up
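The "pulls until first success" example is easy to simulate; here's a tiny sketch with a made-up 2% pull rate:

```python
import numpy as np

rng = np.random.default_rng(7)
p = 0.02  # hypothetical 2% rate for the item you want

# Number of independent pulls until the first success is Geometric(p)
pulls = rng.geometric(p, size=100_000)

print(pulls.mean())              # near the theoretical average of 1/p = 50
print(np.percentile(pulls, 90))  # the unlucky tail is much worse than the mean
```

The gap between the mean and the 90th percentile is exactly the kind of thing the simulation surfaces that the average alone hides.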
@@very-normal Thank you; great video. I never even learned Monte Carlo sims, because they weren't introduced to me as a way to build statistical models. I was happy with this "skills in statistics" video. Keep it up please
One thing I don't understand is how we obtain the expected value of a model, since metrics like bias, empirical SE, and coverage depend on it. If we have the expected value, why build a model that will then try to obtain a value close to the expected value? Is the expected value something measured empirically (like in a wet lab)? In the paper presented, what is being used as the expected value?
heyo, I’ll do my best to answer your question, based on my understanding. I’m not quite sure what you mean by “expected value,” but I’m interpreting it as the average of a numerical result of interest (bias, SE, coverage). These are all functions of the estimated treatment effect, which means they also have probability distributions. These distributions are all influenced by the underlying data and are too complex to derive analytically. The population means of these distributions would be interpreted as the “true/population” bias, SE, coverage, etc. The authors then use Monte Carlo simulations to produce a distribution for each of these metrics. The empirical means of these metrics are listed in the table, and they are implied to be good estimates of their respective expected (“population”) values. In this case, the “expected value” of a metric is the theoretical population value we’ll never see; the sample mean is the estimate we can get from data produced by Monte Carlo sims. Hope this helps a little bit, it’s a great question
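A toy version of that setup, where we know the truth because we chose it (all numbers here are hypothetical): simulate many datasets, estimate the mean in each, and summarize the estimator's bias and coverage empirically.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, n_sims = 5.0, 2.0, 50, 10_000  # the "truth" we control

# Each row is one simulated dataset drawn from the known truth
data = rng.normal(mu, sigma, size=(n_sims, n))
est = data.mean(axis=1)
se = data.std(axis=1, ddof=1) / np.sqrt(n)

bias = est.mean() - mu                   # Monte Carlo estimate of bias
covered = np.abs(est - mu) <= 1.96 * se  # did the 95% CI contain the truth?
coverage = covered.mean()                # Monte Carlo estimate of coverage

print(bias, coverage)
```

Because we picked mu ourselves, "distance from the expected value" is computable; in a real analysis it never is, which is the whole point of checking methods by simulation.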
I got lost midway through. It might help if you kept up with the concrete examples for each concept, like you did with the normal/Cauchy plots. Liked your video's style.
I'm here because of a YouTube suggestion. I was looking for information about Monte Carlo simulations to adjust predictions of cycle time in software development. It'd be great if you could make a video about applying Monte Carlo simulations to solve problems like mine; someone told me it's better than taking the "average", but I don't really know why it's better
I’m not really familiar with software dev, but I can take a crack at an answer. My best guess is that software deliverables have several steps that need to be done before they’re completed. Each of these steps takes time, but you don’t know how much ahead of time, because that’s how coding is. A Monte Carlo approach here might be to give each “task” a distribution on the time it’ll take, like a Poisson. Then the sum over all these “time distributions” gives you the total time needed to finish a project. By replicating this many times, you can get a sense of best- and worst-case scenarios (i.e. the variance of the total time), in addition to average behavior. This gives you more information to plan from, compared to a simple average of past times you’ve taken to complete tasks
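A rough sketch of that idea in Python/numpy. The task list, the gamma shape, and every number are invented for illustration, and I've used a gamma distribution rather than a Poisson since task durations are continuous:

```python
import numpy as np

rng = np.random.default_rng(11)
n_sims = 50_000

# Hypothetical project: 5 tasks, each with its own guessed mean time (days)
task_means = np.array([2.0, 3.0, 1.0, 5.0, 2.0])

# One gamma draw per task per simulated project (shape=2 is an assumption;
# with scale = mean/shape, each task's expected duration matches task_means)
times = rng.gamma(shape=2.0, scale=task_means / 2.0, size=(n_sims, 5))
total = times.sum(axis=1)

# The average alone hides the spread; percentiles show the risk
print(total.mean())                    # near the naive sum, 13 days
print(np.percentile(total, [50, 90]))  # median and a "plan for this" number
```

The 90th percentile is why this beats the average: it tells you the commitment you can make and still hit most of the time.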
Man, it kills me too 💀 I love that the manim library can animate code typing for me, but I can’t get it to line up. It’s better than watching me type with typos tho, that’s for sure
You’re right, it is going to zero, but outliers in Cauchy variables are common enough that this probably won’t happen. The sample size is already at 10,000, so it’s pretty large already. More technically, the Law of Large Numbers requires that the deviation away from the population mean stay within some arbitrary range as the sample size goes to infinity. Occasional outliers will push this deviation out of that range
@@very-normal the devil lives in your assumptions of what is large. 7 billion people in the world, and there is data about each one. How about that for a sample?
I feel like I just don't get it. I've always felt like Monte Carlo is useless. I hoped this video would change my mind, but it didn't. I'm an actuary; I have my FCAS designation and have worked in stats for insurance for 9 years. If I can define the parameters to build my simulated dataset, then my thumb is on the scale: I get to decide the parameters of the data that I generate. If I am the one who builds the simulated data, how is that data an appropriate way to measure the power and bias of a model? Your example of Normal vs Cauchy resonates with me. If the underlying process is actually a different type of random than the type I choose to build for my simulation, then any conclusions drawn from the simulated data are unreliable. If I were pricing auto insurance, and I could use Monte Carlo to just decide how often different hypothetical drivers have hypothetical car accidents, then I would have a superpower. But if I decide what hypothetical drivers and accidents to create, then my simulation is a reflection of my worldview. Am I completely misunderstanding this?
Hey, thanks for watching! I’m sorry it didn’t fully resolve the questions you had. Based on what you said, it sounds like you have extensive statistical experience. I don’t think I can give you a fully satisfactory answer to your questions, but I’ll try! For the case of power, I can simulate datasets with a given treatment effect, null or not, and perform hypothesis tests with some model of choice. It’s only really “appropriate” when the real world matches the data I simulated, which is practically impossible. But, for a biostatistician, it at least helps plan how many people are needed for a new trial to be successful. It’s more of a planning tool than anything. You’re right that the simulation is really a reflection of how you think the world is. Any results you get from a simulation are only applicable to whatever parameters you used to generate it, so Monte Carlo is pretty limited in that respect. As you mentioned with the Normal vs Cauchy example, you can try to see how a model’s performance is influenced by deviations from ideal conditions. I wouldn’t say the results are unreliable, but rather a reflection of the fact that the model can be negatively influenced by misspecification. On a side note, do you have a textbook you like for actuarial statistics? It’s a totally new field to me, and you’re the first actuary I’ve encountered in my life lol
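For the power-as-planning-tool point, here's a minimal sketch of what such a simulation looks like (Python/numpy; the known variance and the simple z-test are simplifying assumptions, and every number is hypothetical):

```python
import numpy as np
from math import erf, sqrt

def z_test_reject(x, y, alpha=0.05):
    """Two-sample z-test assuming known sigma = 1 (a simplification)."""
    z = (x.mean() - y.mean()) / sqrt(1.0 / len(x) + 1.0 / len(y))
    p = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided p-value
    return p < alpha

rng = np.random.default_rng(21)
n_per_arm, effect, n_sims = 64, 0.5, 2000  # hypothetical trial design

# Simulate many trials under the assumed effect, count how often we reject
power = np.mean([
    z_test_reject(rng.normal(effect, 1.0, n_per_arm),
                  rng.normal(0.0, 1.0, n_per_arm))
    for _ in range(n_sims)
])
print(power)  # near 0.81 for these made-up settings
```

The point isn't that the assumed effect is true; it's that, *if* the world looked like this, a trial of this size would detect it about 80% of the time, which is what you need to justify the sample size.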
@@very-normal thanks for the thoughtful reply! I will concede any point in the bio stats domain. Maybe that’s the difference. If you and I need different results, then we can use different tools, and it’s about using the right tool for the job. I can appreciate the massive task there is to build a power test for bio stats, and that it would be cost prohibitive to do that all with organic tests. Thank you for sharing some knowledge about it :) If you want a nice intro to actuarial work (at least the branch that I know best!) I would point you towards the Casualty Actuarial Society (CAS) Intro to Ratemaking textbook by Werner and Modlin. It’s a free pdf available online. If the first chapter seems interesting, then you might enjoy more about the actuarial worldview! If the first chapter isn’t for you, then feel free to leave it alone. We are stats people who really dig into insurance problems.
AI will replace every statistician on earth. Can I get credit for being one of the people pointing this out as soon as I heard about AI a year ago? Yes, I know many did, but I just want credit, because so many didn't understand the impact AI would have, and since I work in statistics, I immediately knew...
Spotted many errors in this video, unfortunately. The worst one is not understanding the Law of Large Numbers; it doesn't refer to a specific statistic like the mean, but to the asymptotic distribution of a sum of random variables.
'Professors are at the leading edge of research.' - That's a generalization. Most of the research and writing work is done by multiple PhDs and post-docs for the professor. Becoming a professor is more of a political achievement nowadays than an academic one.
@@Yuvraj. when society breeds poor mentalities so that politicians can divide and conquer, it creates realities and illusions. When the richest men on earth don't have college degrees and have people beneath them who know rocket science but can't build rockets, there is something wrong with the world
One way you could try is to apply the Central Limit Theorem. You can decide how far away you can tolerate your sample mean being from the unknown population mean, which gives you a “width” you want. Since more data shrinks the variance of the sample mean, you can solve for a sample size that gives you the width (i.e. distance from the population mean) you want. That being said, this won’t ever tell you exactly what the true population mean is, but it gives you a rough guess
You’re right that the sample mean is random, but you can control how much it will vary around the population mean with greater sample size. It will still vary based on different data, but the variance you solve for will keep it close to the population mean with high probability
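Concretely, that sample-size calculation is one line (assuming you can guess the population SD ahead of time; the numbers here are arbitrary):

```python
from math import ceil, sqrt

def sample_size(sigma, margin, z=1.96):
    """Smallest n so the sample mean lands within `margin` of the
    population mean about 95% of the time (CLT approximation):
    solve z * sigma / sqrt(n) <= margin for n."""
    return ceil((z * sigma / margin) ** 2)

print(sample_size(sigma=10, margin=1))    # 385
print(sample_size(sigma=10, margin=0.5))  # 1537: halving the margin ~4x the n
```

Halving the tolerated margin roughly quadruples the required n, which is why extra precision gets expensive fast.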
Thanks for your interesting article. My intuition said there is something important about this mechanical effect. This model shows how a field, represented by a sheet of elastic material under the right initial conditions, naturally forms quantized energy levels. From there it was easy to form very stable three-dimensional structures using a very minimal amount of material (similar to the way engineers build large lightweight space structures). ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-wrBsqiE0vG4.htmlsi=waT8lY2iX-wJdjO3 You and your followers might find the quantum-like analog useful in visualizing the properties of fields. I have been trying to describe the "U"-shaped wave that is produced in my amateur-science mechanical model in the video link. I hear that if you overlap all the waves together using Fourier transforms, it may make a "U" shape or square wave. Can this be a correct representation of Feynman path integrals? In the model, "U"-shaped waves are produced as the loading increases, just before the wave-like function shifts to the next higher energy level. Your followers might be interested in seeing the load-versus-deflection graph in the white paper found elsewhere on my YouTube channel. Actually replicating it with a sheet of clear folder plastic and tape and seeing it first-hand is worth the effort.