A no-formulas, graphical introduction to Copulas and why they are useful, all using simple Python libraries. Join the discussion: dirtyquant.com Github link to the notebook: github.com/tin...
Hi! Thank you for your work! Let me summarize the process to see if I got it. You start from two sets of data points which you know have some dependence (but not what sort of dependence), and you do not know the distribution of each data set by itself either. You look at each one and try to figure out the distribution that best approximates its behavior. Once you have settled on a particular distribution, you use its CDF to transform both data sets into uniformly distributed observations. Then you try to figure out the correlation between the two uniform data sets, but as a final twist, you use a function instead of a constant value for the correlation, and that function is the copula. Hope I got at least some of it right, thank you!
The production and editing quality is astonishing and very refreshing and in my opinion, rather uncommon for quant finance videos. I hope your channel will grow!
@@dirtyquant - def! So quick Q - can this be used to generate correlated Bernoulli variables? Left more details on your main website (in the forum). Would appreciate your thoughts. Thanks!
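One common way to get correlated Bernoulli variables out of a Gaussian copula is to threshold the uniforms. The sketch below is only an illustration, not the author's answer: the success probabilities `p_a` and `p_b` and the latent correlation `rho` are made-up values.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

rho = 0.8            # latent Gaussian correlation (assumed value)
p_a, p_b = 0.3, 0.6  # hypothetical success probabilities

# Correlated standard normals -> correlated uniforms (a Gaussian copula sample)
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=50_000)
u = stats.norm.cdf(z)

# Thresholding a uniform at p gives a Bernoulli(p) variable
a = (u[:, 0] < p_a).astype(int)
b = (u[:, 1] < p_b).astype(int)

bernoulli_corr = np.corrcoef(a, b)[0, 1]
print(a.mean(), b.mean(), bernoulli_corr)
```

Note that the correlation between the two binary variables is not `rho` itself. It is positive but smaller, and its maximum depends on `p_a` and `p_b`, so `rho` has to be tuned if a specific binary correlation is wanted.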
yes, but most ppl will not understand. it's just like there have been real trader channels on youtube that went nowhere. meanwhile fake gurus have millions of subs
Question - you talk about "fitting" the copula near the end of the video. What do you mean by that? In your example there is no fitting, you just plot one CDF vs another CDF.
Hi, so with the Gaussian copula there is very little to "fit": the correlation value/correlation matrix is all you get out, but it's not fitted in the traditional sense. With Gaussian all you have is the CDF in the uniform space, and from that you extract a correlation. So yes, you are indeed correct, but the method is the same for more parametric copulas; the initial steps are the same. Thanks for watching and commenting. Cheers!
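To make "extract a correlation" concrete, here is a minimal sketch under stated assumptions. The toy data is generated with a known correlation of 0.8, and the gamma and beta parameters are illustrative, not the ones from the notebook. The marginals are assumed known, so the only thing left to estimate is the copula correlation.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
rho_true = 0.8  # correlation used to build the toy data

# Toy data: correlated normals -> uniforms -> gamma and beta marginals
cov = np.array([[1.0, rho_true], [rho_true, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=20_000)
u = stats.norm.cdf(z)
gamma_marginal = stats.gamma(a=2.0)       # illustrative parameters
beta_marginal = stats.beta(a=2.0, b=5.0)  # illustrative parameters
x = gamma_marginal.ppf(u[:, 0])
y = beta_marginal.ppf(u[:, 1])

# Plain Pearson correlation on the raw data
pearson_raw = np.corrcoef(x, y)[0, 1]

# "Fitting" the Gaussian copula: each marginal CDF -> uniform -> normal scores,
# then the correlation of the normal scores is the copula parameter
z_x = stats.norm.ppf(gamma_marginal.cdf(x))
z_y = stats.norm.ppf(beta_marginal.cdf(y))
rho_hat = np.corrcoef(z_x, z_y)[0, 1]

print(f"Pearson on raw data: {pearson_raw:.3f}")
print(f"Gaussian copula correlation: {rho_hat:.3f}")
```

With real data the marginals are not known, so each one would have to be chosen and fitted first, which is where most of the skill lies.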
From what I am reading, I am getting an idea of what copulas do, but figuring out how to apply copulas to my problems is the tough part. Plus, I am thinking of publishing my work in a journal that is more measurement-science oriented, so I really want to get it right. I am good at math until it comes to proofs, and I notice statistics tends to use more of a mathematician-type presentation than, say, engineering and physics math.
Thank you for your video! It was very informative. I'm in the civil engineering field and doing hydrology studies for my master's thesis, and like you said in the beginning of your video, most textbooks I came across jump straight into the deep mathematical concepts without giving an overview and intuitive understanding of the subject as you have done here. I am currently using R to do this and I haven't learned Python very well, so do you have an R version of your example, and could you share it please? Thank you!
Hi mate. Glad you found the video useful. Plenty of places to get dizzy with maths, few places to explain it in plain English. Sorry but I don’t use R. It’s well used in the stats community so I think you should be able to take my example in Python and translate it to R. If you use Jupyter you can run Python and R so that would be handy! Best of luck in your studies mate
Hi, we transformed the Beta distribution to uniform because uniform becomes the common language that lets us use copulas. Whatever distribution your data is in, there should "hopefully" be a way to translate it to uniform, and that gives us more flexibility in using multivariate distributions. Most multivariate distributions assume that each of the marginals (the individual pieces of data) follows the same distribution. If Data A and Data B are both Beta distributed, then you can go ahead and use the Dirichlet distribution, which is the multivariate Beta. But what if Data A is Beta and Data B is normal? Copulas are the answer. In order to make it happen, we need to translate the data into a format which is common to both of them. That's where uniform comes in. To transform the data we just use the Cumulative Distribution Function (CDF) of its distribution. Simple as that. The skill is in knowing WHAT distribution our data is in. Hope that helps
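A minimal sketch of that CDF step, assuming one Beta and one normal data set with made-up parameters (the distributions and their parameters here are illustrative, not taken from the notebook):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical marginals: parameters chosen only for illustration
beta_marginal = stats.beta(a=2.0, b=5.0)
norm_marginal = stats.norm(loc=10.0, scale=3.0)

data_a = beta_marginal.rvs(size=5000, random_state=rng)
data_b = norm_marginal.rvs(size=5000, random_state=rng)

# Pushing each sample through its OWN CDF gives values uniform on [0, 1]
u_a = beta_marginal.cdf(data_a)
u_b = norm_marginal.cdf(data_b)

# Kolmogorov-Smirnov distance from Uniform(0, 1); small means "looks uniform"
ks_a = stats.kstest(u_a, "uniform")
ks_b = stats.kstest(u_b, "uniform")
print(ks_a.statistic, ks_b.statistic)
```

If the wrong distribution is picked for a data set, its transformed values will not look uniform, which is a quick way to sanity-check the choice of marginal.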
Thank you so much Sir, it's really really helpful... The way you explained it is superb. Can you please tell me about the data you generated? I mean, is this R? Or some other software... Please help me. I want to replicate it the same way you did. Thank you
Hi. This is all done in Python, using Jupyter notebooks. You can get my code on my GitHub page, link in the video description. Glad you enjoyed the video :-)
Thank you very much for your excellent video! One small question here: the correlation between the gamma and beta distributions changed after the transformation. We can only indirectly control the correlation by specifying the covariance matrix of the multivariate Gaussian distribution. I am wondering if there is a way to directly control this correlation. Thanks again for providing great resources to the internet!
Hi Taotao, I am happy you found the content useful. What I am trying to show is that by using correlation, we are assuming a linear relationship between the 2 datasets. This is where we get the corr of 0.72, while actually it's 0.8. If you are happy with this, and understand that the data has a unique structure, then you can stop there. What copulas allow us to do is to use a universal language, the transformation from whatever distribution to the uniform, so we can apply our special copula (which might have strong tail dependence etc). It's just a tool to make our life easier. If you are happy with just corr between beta and gamma, then happy days!
@@dirtyquant Thanks for replying! I see what you mean: basically Pearson correlation, which assumes a linear relationship, isn’t a good measurement for distributions like gamma and beta. Instead, other measurements, like Spearman correlation, are more appropriate to use here. Thanks again! Love this channel!
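A small sketch of that point: Spearman correlation works on ranks, so it does not change when each variable goes through a strictly increasing transformation, while Pearson does. The transformations (`exp` and cubing) and the 0.8 correlation are illustrative choices, not the gamma and beta marginals from the video.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Correlated normals with correlation 0.8
cov = np.array([[1.0, 0.8], [0.8, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=10_000)

# Strictly increasing, strongly non-linear transformations of each column
x = np.exp(z[:, 0])
y = z[:, 1] ** 3

pearson_before, _ = stats.pearsonr(z[:, 0], z[:, 1])
pearson_after, _ = stats.pearsonr(x, y)
spearman_before, _ = stats.spearmanr(z[:, 0], z[:, 1])
spearman_after, _ = stats.spearmanr(x, y)

print(f"Pearson:  {pearson_before:.3f} -> {pearson_after:.3f}")
print(f"Spearman: {spearman_before:.3f} -> {spearman_after:.3f}")
```

Pearson drops noticeably after the transformations, while Spearman stays the same because the ordering of the points has not changed.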
Your videos are super helpful. Thank you very much for your time! I have been looking for a good book on ML in finance. Do you recommend the one that is on your desk?
Hi, I would recommend sebastianraschka.com/books/: it's not finance-specific, but it's really well written and each of the models is easy to understand. Advances in Financial Machine Learning by López de Prado is very hard, and in my view not that practical for most people, but you can take a look if you like.
Hi mate, marginals is just a fancy word for the distribution of the 2 separate datasets. So time can have a certain distribution and money spent a totally different one. These 2 datasets are the marginals.
Interesting video. You assign the scatter plot in the video to be of the Gaussian type, but what about the clustering around (0,0) and (1,1)? Shouldn't the Gaussian copula have a larger grouping at (0.5,0.5)? I am a little confused about that at least.
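On the clustering question: in uniform space each marginal is flat, so there is no pile-up at 0.5 the way there is at the mean of a normal. For a positively correlated Gaussian copula the density is highest towards the (0,0) and (1,1) corners. A minimal sketch that evaluates the standard bivariate Gaussian copula density, assuming a correlation of 0.8 (the helper name `gaussian_copula_density` is just for illustration):

```python
import numpy as np
from scipy import stats

def gaussian_copula_density(u, v, rho):
    """Density of the bivariate Gaussian copula at the point (u, v)."""
    x = stats.norm.ppf(u)
    y = stats.norm.ppf(v)
    exponent = -(rho**2 * (x**2 + y**2) - 2.0 * rho * x * y) / (2.0 * (1.0 - rho**2))
    return np.exp(exponent) / np.sqrt(1.0 - rho**2)

rho = 0.8  # assumed correlation
center = gaussian_copula_density(0.5, 0.5, rho)
corner_low = gaussian_copula_density(0.05, 0.05, rho)
corner_high = gaussian_copula_density(0.95, 0.95, rho)
off_diagonal = gaussian_copula_density(0.05, 0.95, rho)

print(center, corner_low, corner_high, off_diagonal)
```

The density near the two corners on the diagonal comes out larger than at the center, and the density near the opposite corners, such as (0.05, 0.95), is close to zero.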
Not sure what we are trying to model here. My understanding is that we are looking to better capture the relationship between the variables time spent and money spent. What we are doing here, it seems to me, is trying to model the Normals that were used to generate the uniform seeds. Why do we care about the correlation returning back to 0.8 when the correlation of 0.72 captures the data (albeit without catching non-linearity) more accurately? I just don't see what the transformation to uniform gives us. It is simply a long-winded way of calculating the correlation between the CDFs of the two Gaussians we started with. I see that in the real world we observe the time spent and money spent and then we can use that to find the correlation between the CDFs of the Gaussians, but I don't see how that is useful. What am I missing?
Thanks so much for your attentive reply. The core reason to use copulas is to allow you to use different distributions for the marginals, i.e. the individual data sets, and once we have those, to allow us to have a non-Gaussian relationship between them if we want. In my example we have a non-linear correlation between the 2 variables, time and money, but by identifying the type of distribution of each, and transforming them to uniform, we now have a linear relationship. We are using a Gaussian copula here, because that is all we need here. But it could be the case that the dependence is really strong in the tails, so big spenders spend a lot of time on the site, far more than your average, and now a Gaussian copula isn't sufficient any more. As you say, "it's a long-winded way to get the correlation of CDF"; yes indeed, that is copulas. The reason for finding the correlation of the CDFs is that you then have the true, non-linear relationship between them, which allows you to simulate data and find probability distributions: when someone spends 30 mins on the site, how much are they likely to spend? The next step after this is to have many variables, each with their own distributions, and then be able to pick the most suitable copula, or type of relationship between the transformed variables. Hope that clears it up. This is a basic example, without using formulas, as copula maths can be brutal for newcomers.
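A minimal sketch of the "30 mins on the site" question, assuming a Gaussian copula with correlation 0.8 and made-up marginals (gamma for minutes on site, a scaled beta for money spent; all parameters are hypothetical). Under a Gaussian copula, once the observed time is mapped to a normal score, the other normal score is itself normally distributed with mean `rho * z_time` and variance `1 - rho**2`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

rho = 0.8                                               # assumed copula correlation
time_marginal = stats.gamma(a=2.0, scale=10.0)          # minutes on site (hypothetical)
money_marginal = stats.beta(a=2.0, b=5.0, scale=100.0)  # money spent (hypothetical)

def simulate_money_given_time(minutes, n):
    """Draw n simulated spends for a visitor who stayed `minutes` on the site."""
    # Observed time -> uniform -> normal score
    z_time = stats.norm.ppf(time_marginal.cdf(minutes))
    # Conditional normal score for money under the Gaussian copula
    z_money = rng.normal(loc=rho * z_time, scale=np.sqrt(1.0 - rho**2), size=n)
    # Normal score -> uniform -> money via the inverse CDF of its marginal
    return money_marginal.ppf(stats.norm.cdf(z_money))

spend_30 = simulate_money_given_time(30.0, 10_000)
spend_5 = simulate_money_given_time(5.0, 10_000)

print("30 min visit, 5/50/95th percentiles:", np.percentile(spend_30, [5, 50, 95]))
print(" 5 min visit, 5/50/95th percentiles:", np.percentile(spend_5, [5, 50, 95]))
```

Swapping in a copula with stronger tail dependence would change the conditional step, while the two marginal transformations would stay the same.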
Would love to understand copulas but couldn't watch this video with the music (and even talking DJ) overwhelming it -- how am I supposed to concentrate on what you're saying? I thought it must be an accident but it was apparently intentional. I guess all your videos are like that? Very strange choice -- unwatchable.
Please, cut out the background music. For people that watch at a speeded up rate it is a nightmare. I'm trying to watch on x2 and just sounds like someone rattling spoons.
Hi, I have looked at the stats and it’s such a small % of people who listen to the video at a sped up rate that I would rather keep my style, with music in the background. But thanks for watching.
@@dirtyquant It's no surprise that people who use sped up function aren't watching, as people that want to watch at x2 will find a video without background music. That's why I was letting you know, as otherwise your content is excellent. With all the online teaching over the past year the speed up function is very widely used now. Most people that I know that use videos as a learning resource only watch videos at sped up rates. It's a life hack that is catching on. My university has gone up to x3 now which is amazing for getting through recorded lectures.
Totally understand, but I feel like videos without music are so damn boring, that I would rather lose the speed up crew than bore the rest to death. Thanks for the feedback. Maybe I will upload 2 versions, one with music and one without, so you can choose.
@@dirtyquant Check the stats for yourself, but it's something like 26% of users now watching in sped up mode in 2019. Up 10% from 2018 and expected to be around 50% about now. I study 'futurism', and this is certainly not going to be something that goes away any time soon. People are even watching dramatic content sped up now after Netflix introduced the feature due to popular demand. It's only music that ruins my sped up life! Anyway, do whatever you want. All the best.
Cool. I will post both and see which one gets more traction. YouTube should introduce that feature, where you can choose the audio track for people that want 2X. Have a good one!