Unbelievable, every second of this 19-minute, 13-second video was characterized by a very clear sequence of sentences that meshed together, without any mistake, to form a carefully crafted, well-understood explanation of all the ideas together. For instance, what separates most people who explain from those like you is found at 15:38: "Because when I'm in this loop, this is going to run for a long time when p is large, because it's barely ever gonna see two losses in a row." I was following along, but it didn't completely click in a concrete sense until you grounded the logic by adding "because it's barely ever gonna see two losses in a row." You grounded all your explanations with some set of "extra words" that made everything you said click with minimal rewinding and mental pounding, and I find that two types of teachers/explainers/communicators exist in this world: those who consistently don't employ those extra words and those who consistently do. I might nickname that feature "connecting back to concrete-land" or "those extra few click words," whose presence or absence in speech makes the difference between an explanation clicking with an audience and not. And the absence of such extra words, multiplied over the many chunks of speech across hundreds of someone's utterances in a video or lecture, can culminate in a superficial understanding where you are left in abstract land with an itchy, uneven feeling about why the circle around the explanation never closed into something concrete or a filling understanding. Those extra words, the attention to detail, and ensuring your learners are really following along all the way to the end and closure of the circle of the explanation you intend to communicate really make the difference in reaching a wider audience and making critical concepts well understood. Thank you for the video!!
Another cool thing about your approach is that your "toy example" follows closely from Bayes' original paper, which discussed tossing balls onto a flat surface and then locating an unknown point more and more closely. Thanks again for these great videos!
Superb! What is so remarkable in your presentation is the balance that you bring to the challenge. Each approach has strengths and weaknesses and it is essential to keep those in mind at the start. We are currently weathering a huge storm in which someone advocates for "their" method and says explicitly (or strongly hints) that every other approach is garbage or "not science". That makes your channel a big breath of fresh air! Thank you!
I use MC for some fabrication process variation simulations in my research, and I must say this is one of the better explanations of the simulation. Good job. A couple of things: 1) The rules emerge from understanding the problem. In my case, the behavior of the material and the possible probability distribution of faults (from case studies of real fabrication imperfections). You use MC when you know what the real-world solution would look like but you need a more approachable, repeatable, or approximate approach to the solution (e.g., I cannot just fab the chip every time I want to see the impact of fabrication imperfections). 2) Not everything has an analytical solution, or more correctly, the solution can be very complex, having to consider various factors to reach the "true" solution. It's in scenarios like this where the application of MC truly shines.
I've wondered about the Monte Carlo method for a long time, but never needed to figure out what it was. Now I know what it is, and see some uses in my robotics hobby. Thanks!
Your videos are so clear and simple, and I believe you will be the top YouTuber for data science concepts in the near future. One good suggestion would be to provide a snapshot of the things you wrote on the board.
Great video as always! Backgammon and poker best-play situations cannot be solved in closed form and are therefore mostly solved by MC (called a rollout and a solver, respectively). Now that you've taught us MC, you have to teach MCMC (Markov Chain Monte Carlo) next!
THANKS! Despite the many undesirable properties of the aforementioned methodology, the Monte Carlo method is still the most general and reliable stochastic method.
Find the probability p(A) of the event A that, when throwing two "fair" dice, the sum of the numbers obtained is 7. Create a program for the simulation of n series (n > 30) of 25 such throws, which at the output gives Xi, the number of realizations of event A in the i-th series (i = 1, ..., n). Consider the obtained sequence X1, ..., Xn as a random sample from Bin(25, p), 0 < p < 1.
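Not the original poster, but here is a minimal Python sketch of that exercise; the names (n_series, count_sevens) and the choice of 40 series are my own, and event A is taken as "the two dice sum to 7", so the true p(A) = 6/36 = 1/6:

import numpy as np

rng = np.random.default_rng()

n_series = 40           # n > 30 series
trials_per_series = 25  # 25 throws of two dice per series

def count_sevens(trials, rng):
    # Return how many of `trials` throws of two fair dice sum to 7
    d1 = rng.integers(1, 7, size=trials)
    d2 = rng.integers(1, 7, size=trials)
    return int(np.sum(d1 + d2 == 7))

X = np.array([count_sevens(trials_per_series, rng) for _ in range(n_series)])
print(X)                                   # X1, ..., Xn, each ~ Bin(25, 1/6)
print(X.mean() / trials_per_series, 1/6)   # sample estimate of p(A) vs the true value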
I'm weak at probability, but the example is realistic and clearly explains both the deterministic way and the numeric way. I've tried it for other values of p and it works. Thank you for making this video.
00:03 Monte Carlo method is a practical concept in data science, illustrated with examples and discussing advantages and disadvantages.
01:41 Monte Carlo simulation using 1 million points to calculate the probability of darts landing in a circle.
03:26 Using the Monte Carlo method to estimate the number of points inside a circle.
05:10 Expected number of rounds to play until losing two times in a row.
06:52 Analyzing the expected number of rounds in a game with three simple cases.
08:44 Expected number of rounds calculation using algebra.
10:34 Simulation of game rounds using the Monte Carlo method.
12:23 Monte Carlo simulation provides a relatively easy way to get close to the answer for difficult problems.
14:02 Monte Carlo methods allow non-experts to solve problems and can be easier than analytical methods.
15:48 Monte Carlo methods can take a really long time and become impractical to use.
17:21 Monte Carlo methods are not interpretable.
18:52 Choosing between simulation and analytic approach in Monte Carlo methods.
Good question; a grid would definitely work too, but it would have to be pretty fine-grained. Otherwise, you might get close to the true answer but never exactly reach it, even with millions of samples. And as a grid gets more and more fine-grained, you approach the random sampling shown in this video.
This is an old question, but anyway: as far as I understand it, making a grid has a steeper dependency on the number of dimensions of your problem. For a very regular 2D area like the circle example it's probably just about as good, but for weird n-dimensional shapes building a grid becomes too expensive, while MC may still hold up. Also, there are concepts like importance sampling that give more of an edge to MC methods. The circle-in-a-square example is good for getting an intuition of the basic idea of the method, but you shouldn't judge the method based on it, because it's not really a practical application.
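One rough way to see that dimension dependence: a grid with spacing h per axis needs about (1/h)^d points, while a Monte Carlo estimate can keep the same sample count in any dimension (the spacing below is just an illustrative choice):

h = 0.01  # grid spacing per axis (illustrative)
for d in (1, 2, 3, 6, 10):
    grid_points = (1 / h) ** d   # points needed to cover a full grid in d dimensions
    print(f"d={d}: grid needs about {grid_points:.1e} points")

# A Monte Carlo estimate could keep using, say, 10**6 random points regardless of d;
# only the variance of the estimate changes with the problem.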
# Example of how many dots will fall randomly inside a circle in Python (Monte Carlo Method)
# Required packages
import numpy as np
import matplotlib.pyplot as plt

# Lists of xs, ys and their squared values that fall inside the circle
xs = []
ys = []
xsq = []
ysq = []

# Lists of xs, ys and their squared values that fall anywhere inside the whole square
xs1 = []
ys1 = []
xsq1 = []
ysq1 = []

n = 10**3
circle = 0  # the number of dots that fall inside the circle

for i in range(n):
    # to get float random values between (-1, 1) use uniform(-1, 1)
    x = np.random.uniform(-1, 1)
    y = np.random.uniform(-1, 1)
    xs1.append(x)
    ys1.append(y)
    xsq1.append(float(x**2))
    ysq1.append(float(y**2))
    if float(x**2) + float(y**2) <= 1:  # the dot lands inside the unit circle
        circle += 1
        xs.append(x)
        ys.append(y)
        xsq.append(float(x**2))
        ysq.append(float(y**2))

# fraction inside the circle times the square's area (4) estimates pi
print("Estimate of pi:", 4 * circle / n)

plt.figure(figsize=(6, 6))
plt.scatter(xs1, ys1, s=1, color='lightgray')  # all dots in the square
plt.scatter(xs, ys, s=1, color='steelblue')    # dots inside the circle
plt.show()
If you also iterate the p values from 0 to 1, you may get an interpretation, or at least a graph, similar to the closed form's. Agree that it takes some more time 😊
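If anyone wants to try that sweep, here is a rough sketch; I'm taking the closed form to be E[rounds] = (2 - p)/(1 - p)^2 (which gives 6 at p = 0.5), and the p values and game count are my own choices:

import numpy as np

def simulate_game(p, rng):
    # Play rounds until two consecutive losses; return the number of rounds
    rounds, losses = 0, 0
    while losses < 2:
        rounds += 1
        if rng.random() < p:   # win with probability p
            losses = 0
        else:                  # loss
            losses += 1
    return rounds

rng = np.random.default_rng(0)
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    sims = [simulate_game(p, rng) for _ in range(20_000)]
    closed_form = (2 - p) / (1 - p) ** 2
    print(f"p={p}: simulated {np.mean(sims):.2f} vs closed form {closed_form:.2f}")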
Thanks a million: the way you make any complex concept so simple and intuitive 🎉 Would you please make a video on the proof of the Bellman equation (the action-value and state-value Bellman equations)? It's the basis of reinforcement learning.
I paused the video and coded it (on my phone), with p being 50/50. I got 13, which I didn't expect. Then you said 6 and I had a feeling: 6 is nearly the half. So I checked the sign and yes, I used
Very nice intuition of the concept, thank you! Would you mind sharing what your background is a bit? I'm studying artificial intelligence at uni, and we often get very hands-on courses which lack some deeper understanding of why we're doing things, and how things really work. Would you have any suggestion on what to do in order to build a base knowledge that would allow for a deeper understanding of things? I know it's not an easy question, but surely any insight will be useful!
Pure, very clear, simple and easy-to-understand type of explanation, a to-the-point explanation. Overall the best video for understanding the concept. Thanks sir for providing such a great video.
Just use Monte Carlo when you need to multiply two confidence intervals, or add up two confidence intervals with non-normal distributions. This is what it was developed for during the Manhattan Project in the '40s, and frankly I don't see the point in forcing it onto anything else.
If p is higher, then rand() < p is more often true, and that means nloss is more likely to reach 2 faster than in the case where p is lower. So if p is higher, the while loop will end faster, right?
I can't grasp how you came up with the recursive equation for the expected number of rounds. I understand that the expected value is the total sum of (outcome value * probability of each outcome), but I can't move forward from that. I wish I could understand your reasoning for the recursive equation. Could you explain, or perhaps do a video on that?
I'll explain it the way I understood it 😉 What really renews our expectation is a success on a draw (round). We get kind of a refill (since the count of failures is reset), then we capitalize on the rounds up to the success and we average (weighted average) that with the expected number of rounds for the rest of the row; let's call the latter e'. As he explained, we have 3 cases, two of which end in success:
1- Success in the first round.
2- Failure, then success.
With the third case (two consecutive failures), there's no remaining expectation since the row ends (e' = 0).
3- Failure, then failure.
So:
1- We capitalize on 1 round with probability p and also get the remaining expected number of rounds e' with the same probability p (it is the condition for getting this e'). So the weighted contribution for this case is p*1 + p*e' = (1 + e')*p.
2- We capitalize on 2 rounds (one round for the failure and one for the success) with probability (1-p)*p. So the weighted contribution is (1-p)*p*2 + (1-p)*p*e'' = (2 + e'')*(1-p)*p. Note that this time we call the remaining expected number e''.
3- For the third case, we know we have 2 rounds with probability (1-p)*(1-p) and nothing after that (end of the row). So the weighted contribution is 2*(1-p)*(1-p).
Now, the definition of expectation considers the limit as the number of draws approaches infinity. In that case e' and e'' both converge to the same number e, so we can replace e' and e'' with e. Hence the recursive formula.
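Spelling out the algebra from those three cases (with p the probability of winning a round and e' = e'' = e in the limit):

\begin{aligned}
e &= p\,(1+e) + p(1-p)\,(2+e) + 2(1-p)^2 \\
e\,\bigl[1 - p - p(1-p)\bigr] &= p + 2p(1-p) + 2(1-p)^2 \\
e\,(1-p)^2 &= 2 - p \\
e &= \frac{2-p}{(1-p)^2}
\end{aligned}

which gives e = 6 at p = 0.5, matching the simulation.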
Is there a Markov chain model for this area calculation that converges quicker? Something that actively samples the inside of the circle preferentially.
One question. I think you said that Monte Carlo is good for situations where you'd perhaps "not know geometry basics". However, at 4:10, are we not using geometry basics to reason about this algorithm?
A model is just your formula. Monte Carlo is just putting random numbers into your model. Adding random numbers is just to test your formula (model): using random numbers to see how unpredictable (random) events affect your model (formula). Just very simple algebra.
I think you could have just said the probability of losing 2x in a row is p^2, so (1 - p^2) is the probability of not losing 2x in a row. So 3 games out of every 4, but since the set was split, normalizing each set would give 6 games. Now I would want to run the simulation on both varying odds of losing and a varying number of times in a row, i.e. 3x in a row with 50% w/l; I'd guess the EV would be 21?
Hello. I believe that the expected value of rounds, E(r), that you calculated analytically is wrong. I think that the number of rounds until you get the second (inclusive) "loss" in a row is a Negative Binomial with r=2 [Y ~ NB(r=2, p)] and the number of rounds equals Y+2. As Y+2 is a linear combination of Y, E(Y+2) = E(Y) + 2. As Y is Negative Binomial, E(Y) = r*p/(1-p) and E(Y+2) = 2 + r*p/(1-p). So, for example, for p=0.3 the expected number of rounds would be E(Y+2) = 2 + 2*0.3/0.7 = 2.857, and for p=0.5, E(Y+2) = 4.000. Thanks.
What inspired you to use such an example as the one you have for example 2? Can you think of any real-world application? The one that comes to mind is the time a product lasts before it breaks down, so you can determine the cost of a warranty.
This is a great question: why is this useful? Indeed, your example is great. Another might be: if you're looking for a job and you know the probability of getting a job is p, then on average, how many interviews do you need to go through before getting a job?
@ritvikmath At 11:05, if rand(0,1) is less than the probability of a win (p), then it means we lost the round and "nloss" must be incremented by 1. Am I missing something?
I can see what is causing this confusion. Assume that the RAND() function returns a number in [0,1] (uniformly distributed) at random. Say the probability of winning is p = 0.9. In such a case, we will actually be playing a large number of rounds before losing. On the other hand, if p = 0.1, our game won't last that long, as we might encounter two failures much earlier. So, in the first case, to simulate the above-mentioned behavior, whenever RAND() returns a number less than 0.9, we consider it a success. Hope this clarifies!
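Here is a tiny Python sketch of that convention, using random.random() in place of RAND() (the sample count is an arbitrary choice):

import random

p_win = 0.9                # probability of winning a round
n = 100_000
wins = sum(random.random() < p_win for _ in range(n))
print(wins / n)            # ~0.9: "random number < p" happens with probability p,
                           # so it is the natural way to simulate a win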
I don't disagree with any of your comments, but I feel like you might be over-emphasizing the detractors. I would argue that there are plenty of problems that cannot be solved from first principles. For example, analysis of circuit performance with ideal components is easy enough, but if you want the circuit to contain realistic components (values that follow distributions), it may be too impractical to solve. This is because every component value can have interactions with every other value. Additionally, closed-form solutions tend not to handle discontinuities well. For such problems, MC methods are the bomb.
Also, if you highlight that interpretability of an MC solution is an issue, I think it is only fair to acknowledge that the interpretability of the methodology is a credit. One of MC's strengths is that it tends not to hide its assumptions. If you are modeling a physical process, it is generally easy to see how each step in the process was simulated. In a peer-review environment, this can save a lot of bickering. In this same vein, MC can also be quick to modify in response to criticism. Closed-form solutions... not so much.
In your 2nd example you returned the mean, which is what the problem asked for, but you automatically get distribution data too. Add a bootstrap and you have a confidence estimate. If your MC has multiple inputs, you can wrap it and perform a sensitivity analysis. This buys back some of what was lost by not having a tidy equation. Given the ease-vs-power ratio, I wish engineers would learn MC as one of their basic tools, but this has not been my experience. My comments aside, thanks for covering this.
Hey, I really appreciate all your feedback! I'm also a huge believer in applying MC to problems where possible, but I do feel like even in the realm of MC it is important to understand some of the first principles, since that could inform why your simulation is taking a long time to run, among other things. I think your comments on MC use in engineering are very valuable to others, since personally I didn't study engineering. I also think your comment on getting a full distribution with MC is brilliant and is something I wish I'd highlighted if I were able to go back and re-make this video. Thanks!
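For anyone curious what the "full distribution plus bootstrap" idea from this exchange might look like, here is a rough sketch for the two-losses-in-a-row game; the game count, bootstrap size, and 95% level are arbitrary choices of mine, not from the video:

import numpy as np

def simulate_game(p, rng):
    # Rounds played until two consecutive losses
    rounds, losses = 0, 0
    while losses < 2:
        rounds += 1
        losses = 0 if rng.random() < p else losses + 1
    return rounds

rng = np.random.default_rng(1)
p, n_games, n_boot = 0.5, 20_000, 1_000
samples = np.array([simulate_game(p, rng) for _ in range(n_games)])

# The simulation gives the whole distribution, not just the mean...
print("mean:", samples.mean())
print("quantiles (50/90/99%):", np.quantile(samples, [0.5, 0.9, 0.99]))

# ...and a bootstrap on the mean gives a rough confidence interval.
boot_means = [rng.choice(samples, size=n_games, replace=True).mean()
              for _ in range(n_boot)]
print("95% CI for the mean:", np.quantile(boot_means, [0.025, 0.975]))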
What a coincidence, I just saw the same concept in the form of a problem on the Joma Tech channel, where he wants us to find the value of pi; he says it's one of his favourite interview problems. Do check it out, guys.
The first example can be easily optimized in Python with a vectorized implementation:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x': np.random.uniform(-1, 1, 1000000),
                   'y': np.random.uniform(-1, 1, 1000000)})
df['inCircle'] = df['x']**2 + df['y']**2 <= 1
print(4 * df['inCircle'].mean())  # estimate of pi

(compared to the 6s for Khaled's implementation without print and visualization), while the vectorized version of the second example is only slightly faster than Lukas' version, which looks like this:

def experiment(p=0.5):
    r = 0
    losses = 0
    while losses != 2:
        r += 1
        if np.random.random() < p:  # win: reset the consecutive-loss counter
            losses = 0
        else:                       # loss
            losses += 1
    return r
Hi Ritvik, great video. I had one doubt regarding the code for the 2-consecutive-losses simulation: by just checking that the loss counter != 2, aren't we only considering cases of not more than 2 losses in the whole experiment, instead of checking for not more than 2 "consecutive" losses?
Hello Ritvik, thanks for the video. Can you clarify why you used "if random number is less than the probability of winning" (0.5 if we use that)? Shouldn't it be "if random number is greater than or equal to the probability of winning"? Why should the number be lower and not higher than the probability of winning (for the second example)?
Good question, I get confused by that sometimes too. Think of it like this: if the probability of winning is high, like 0.9, and then you generate a random number between 0 and 1, you want to check if it is less than 0.9 because that will happen with a 90% chance. On the other hand, if the prob of winning is low, like 0.1, then you want to check if the random number is less than 0.1, since that happens with a 10% chance.
Too bad you didn't push the circle-in-square example further… it actually converges quite slowly… it is very sensitive to poor random number generators and numerical resolution issues. Not as big a problem as it was in the past, but it still shows up; recall this problem requires random vectors. This is a great example for exploring Monte Carlo methods, exposing the good and the bad in the technique…. And I've seen too many experts either get it wrong or just brush over it…
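To the slow-convergence point, here is a small sketch (sample sizes are arbitrary choices of mine) showing that the error of the circle estimate only shrinks roughly like 1/sqrt(N) on average; any single run is noisy, but the trend is visible:

import numpy as np

rng = np.random.default_rng(42)
for n in (10**3, 10**4, 10**5, 10**6):
    x = rng.uniform(-1, 1, n)
    y = rng.uniform(-1, 1, n)
    pi_hat = 4 * np.mean(x**2 + y**2 <= 1)   # fraction in circle times square area
    print(f"N={n:>7}: pi_hat={pi_hat:.5f}, |error|={abs(pi_hat - np.pi):.5f}")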
Thanks for your great video! For the pi calculation one, may I ask what the difference is between MCMC and rejection sampling? I think MCMC also throws some points outside the circle. The only difference I can think of is that with rejection sampling we get some samples, and with MCMC we calculate some numbers based on those samples.
Hi Ritvik, in the second example you mentioned that simulation time increases as the value of p increases. I think the code anyway has to run all the lines at most 1M times, so how is the simulation time changing, given we are not adding any new operations?
Hey that's a good question. Indeed, we always do 1M rounds, but when the probability of winning a round is very very high (around 1), then each of the 1M games will last many many rounds since we need to see 2 losses in a row to move on to the next game. However, if the probability of winning a round is very very low (around 0), then almost surely the game will end in 2 rounds so our simulation runs much faster.
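If it helps, here is a rough way to see this in code; the game count and p values are my own choices, and exact timings will vary by machine:

import time
import random

random.seed(0)
for p in (0.1, 0.5, 0.9, 0.95):
    start = time.perf_counter()
    total_rounds = 0
    for _ in range(10_000):                 # same number of games for every p
        losses = 0
        while losses < 2:
            total_rounds += 1
            losses = 0 if random.random() < p else losses + 1
    elapsed = time.perf_counter() - start
    # the average game length grows like (2 - p) / (1 - p)^2 as p -> 1,
    # so the same 10,000 games take much longer to simulate
    print(f"p={p}: {total_rounds / 10_000:.1f} rounds/game, {elapsed:.2f}s")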