Econometrics, Causality, and Coding with Dr. HK
This channel covers all things data and econometrics, from a bunch of angles: how to do research design, how to do econometrics, and how to code it all up! You'll find videos that go along with my class material, with my causality book The Effect, and many other things.
How Software Defaults Threaten Science
16:31
10 months ago
Comments
@juanherr19 1 day ago
Awesome video - thank you!
@arunbhat1784 3 days ago
Really good explanation
@arunbhat7890 5 days ago
Informative course, really helped me
@arunbhat7890 5 days ago
Really good explanation
@arunbhat7890 5 days ago
Really good explanation
@eustruria 9 days ago
I feel that many people do this, including myself, because it is the easiest way to introduce your paper topic without jumping straight to "In this paper, I study the effects of X on Y". I remember middle school teachers asked us to structure our introduction paragraph in a very specific way (hook->context->"thesis statement").
@NickHuntingtonKlein 9 days ago
@eustruria a hook is a good idea! But it should be a hook about what's actually in the paper.
@ahmadseifi9092 9 days ago
Thanks for your video. I have a question: if we have 3 distinct shocks that apply at different times (e.g. shock 1 in 2010, 2011, ..., 2018, shock 2 in 2010, 2011, ..., 2018, and shock 3 over the same time span), what is the best method for our purpose?
@NickHuntingtonKlein 9 days ago
@ahmadseifi9092 as in you have multiple different treatments, each of which is staggered? I might recommend a Wooldridge approach, fully interacting all the treatments.
@ahmadseifi9092 9 days ago
@NickHuntingtonKlein yes, all of them are staggered, and moreover some of my observations are part of several treatment groups. Can you send me a link to the method you mentioned? Thank you very much.
@NickHuntingtonKlein 9 days ago
@ahmadseifi9092 it's the Wooldridge method I mention in the video and in the linked chapter. I'll also point out I'm not certain this will extend to multiple overlapping treatments, but it's a place to start looking.
@pawekopytek7596 18 days ago
It was 666 likes, sorry I ruined it 😉
@rohanbhatt1146 19 days ago
This honestly helped me so much, I can't thank you enough!
@charlesripley2505 20 days ago
Totally true on Python. It clearly is not made for stats! I prefer R/RStudio. They were made for stats. Sorry, but Stata is dated, way dated. It is also incredibly expensive and, thus, anti-poor people here and in developing countries!
@eustruria 20 days ago
I really appreciate how straightforward and simple your explanations are! You explain things so well that people like me, who only have an introductory-level econometrics background, can understand and learn these concepts so easily! Thank you so much!!!!
@CharlieDickens-j4m 28 days ago
Hi mate. Not sure if you are still using this platform; however, I have a question for you. For a job application assignment I need to use diff-in-diff to estimate the impact of a job training programme on log(employment). For context, in this hypothetical scenario, local governments either roll out the programme (treatment group) or not (control group). I have data on log(employment) and log(population) for 3 periods: two pre-treatment periods (the parallel trends assumption holds) and one post-treatment period. My current regression looks like Δlog(employment) = time fixed effects + group fixed effects + DiD dummy + log(emp) + ε. Does it make sense to add an interaction term between each local government and the DiD dummy to capture the heterogeneity in treatment effects for each treated unit?
@NickHuntingtonKlein 28 days ago
I would probably make log(employment) the dependent variable and leave it off the right side, but your version works too I think. I'm assuming treatment is applied to all cases at the same time - if treatment is staggered then the typical TWFE setup doesn't work. If you add an interaction term for each local government, then in effect what you're doing is running a separate DID for each local government vs. the same control group. There's nothing inherently wrong with this, although keep in mind that your effective sample size for each of the DIDs will be much smaller (i.e. do you have enough sample to actually be able to look at each effect separately) and you now have *many* parallel trends assumptions to investigate - one for each treated government - rather than just one.
@CharlieDickens-j4m 20 days ago
@NickHuntingtonKlein Many thanks for the response, Nick. L(emp) on the right side was a typo and was meant to be l(pop). And yes, it's a one-off treatment, so static TWFE DiD works. I ended up dropping the interaction term based on what you said. However, I controlled for treatment and post-treatment, and the DiD control was the interaction between the two. Anything inherently wrong with this?
@NickHuntingtonKlein 20 days ago
@CharlieDickens-j4m seems fine to me! Standard DID
@CharlieDickens-j4m 20 days ago
@NickHuntingtonKlein Legend. Thanks for taking the time out of your day to reply!
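The "standard DID" settled on in this thread (a treatment indicator, a post-treatment indicator, and their interaction) can be sketched with four group means. This is a minimal illustration, not the commenter's actual regression: it assumes only two periods and no log(population) control, and every number in `log_emp` is made up. With nothing else in the model, the coefficient on the interaction term equals the difference computed here.

```python
from statistics import mean

# Hypothetical log(employment) values, invented purely for illustration.
# Keys are (treated, post), with 1 = yes and 0 = no.
log_emp = {
    (0, 0): [4.0, 4.2, 4.1],  # control group, before the programme
    (0, 1): [4.3, 4.5, 4.4],  # control group, after the programme
    (1, 0): [3.8, 4.0, 3.9],  # treated group, before the programme
    (1, 1): [4.6, 4.8, 4.7],  # treated group, after the programme
}


def did_estimate(cells):
    """Difference-in-differences computed from the four cell means."""
    treated_change = mean(cells[(1, 1)]) - mean(cells[(1, 0)])
    control_change = mean(cells[(0, 1)]) - mean(cells[(0, 0)])
    return treated_change - control_change


estimate = did_estimate(log_emp)
print(round(estimate, 3))
```

Running one such comparison per treated local government against the same control group is what the per-government interaction terms would amount to, which is why each of those comparisons rests on a much smaller sample.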
@Luger0312 28 days ago
Dear Mr. Huntington-Klein, I have only today received my copy of your book "The Effect". I'm currently working on my final paper for my Bachelor's degree in sociology. I found your book when I was trying to get a better understanding of how to interpret squared terms and their respective base variables in a regression model. I read but a few paragraphs of the online book and decided to order a copy. Books about statistics aren't usually very "beginner friendly", but I found your approach of explaining things with little implied knowledge and, where knowledge needs to be implied, guiding to the respective chapter, very encouraging. I'll have to rewrite certain paragraphs a little. Not because my understanding and explanation were wrong, but because with the help of your book I'll be able to describe certain content more precisely while also making it easier to understand. After reading a few subchapters, I am already certain your book will not only be a great help for my paper, but will also be a go-to read when trying to solve a problem or gain a better understanding of statistical methods in the future. Or even for spare-time reading, something I wouldn't expect to say about a book discussing statistics. So, long story short, what I want to say is thank you.
@NickHuntingtonKlein 28 days ago
@Luger0312 you're welcome! Glad you've enjoyed the book so far, and hope the rest is as helpful.
@rizkydarmawan6540 28 days ago
Thank you for this. I needed a refresher on this particular subject and this video is one of the best there is. Simple and intuitive with good practical examples 👍
@atiyaabdulkarim716 29 days ago
Hi, do you add covariates in your model? If so, do you use covariates measured at baseline?
@NickHuntingtonKlein 29 days ago
Since in ITS the treatment applies to everyone at the same time, baseline-measured covariates can't be a source of confounding, so adding covariates won't solve any causal inference issue. But you can add them to improve predictive power and reduce noise. Covariates that change over time might in some cases be necessary to solve causal inference issues, but you need to be careful with these to avoid issues like post-treatment bias.
@aza6513 1 month ago
It's just econ that does that, hahah. They just rediscover something and name it like it's new.
@NickHuntingtonKlein 1 month ago
@aza6513 wait till you hear about machine learning
@MB-sh9ur 1 month ago
Hi. I want to discuss a project I'm doing with you. Can you please tell me how I can connect with you?
@bisiadeyemo3082 1 month ago
The video would be a lot better if you explained the coefficients instead of glancing through them. Even in your book, you barely explain the coefficients. It’s not just you; other books on advanced methods do not do a very good job of explaining the coefficients.
@NickHuntingtonKlein 1 month ago
What about the section titled "How do we interpret the results of this regression once we have estimated it?"
@bisiadeyemo3082 1 month ago
@NickHuntingtonKlein in journal articles, you are presented with only the coefficients, and most students typically have problems explaining them. This is far more important than the inner workings because most statistical software will do the calculations for you. I read through your instrumental variables section, and you barely explain the results of the first-stage regressions. Similar with DID and regression discontinuity. This is not just you; several of the books that I have read tend to pay little attention to explaining the coefficients.
@donoiskandar6820 1 month ago
Hi Nick. I have just started following your causality series, and it really is wonderful. I just wonder: in the case of fixed effects, could they unintentionally control for a collider and thus introduce bias? Let's say for the height vs. basketball ability in the NBA example (assuming there is height variation in each year, while there is no variation in NBA status across years).
@NickHuntingtonKlein 1 month ago
Thank you! It would be an unusual case where fixed effects introduce collider bias, since for that to be the case, one of those fixed-over-time characteristics would have to be caused by two separate variable-over-time characteristics. It's certainly possible that there is a collider bias problem in the analysis anyway that the fixed effects don't solve, though. In the NBA example, there's already a collider bias problem having to do with the ability to get into the NBA, and fixed effects would not resolve the issue.
@donoiskandar6820 29 days ago
@NickHuntingtonKlein Thank you for your enlightening answer, Nick!
@donoiskandar6820 1 month ago
60% go when sick, 10% go when not sick. Thus 60 - 10 = 50% of going to the doctor is explained by being sick. Are you assuming that the number of people who are not sick and the number who are sick are identical?
@NickHuntingtonKlein 1 month ago
Nope! It still works with uneven sample sizes.
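One way to see why uneven sample sizes are not a problem: each percentage is computed within its own group, so the size of the other group never enters the calculation. The sketch below uses invented counts (the helper `share_gap` and all the numbers are hypothetical) in which 60% of sick people and 10% of healthy people go to the doctor, once with equal groups and once with very unequal ones.

```python
def share_gap(sick_went, sick_total, healthy_went, healthy_total):
    """Difference between the two within-group shares of people who see a doctor."""
    return sick_went / sick_total - healthy_went / healthy_total


# Hypothetical counts. In both cases 60% of sick and 10% of healthy people go.
balanced = share_gap(60, 100, 10, 100)       # 100 sick people, 100 healthy people
unbalanced = share_gap(30, 50, 1000, 10000)  # 50 sick people, 10,000 healthy people

print(round(balanced, 3), round(unbalanced, 3))
```

Both calls return the same 50 percentage point gap. What uneven groups do change is precision: the share in the smaller group is estimated from fewer people.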
@guzwall 1 month ago
Great explanation!!
@Seitanistin 1 month ago
Thanks from a German socioeconomics student! :)
@user-hp6in6vz3m 1 month ago
Thank you for the amazing explanation! Your textbook is also super helpful. I would like to ask a question regarding other ways to estimate staggered treatment effects. I've come across some papers using matching x classic DID, which look like these:
1. matching x DID (transforming the treatment year to t=0): the OLS looks like Y_i,t = Treat_i + Post_t + Treat_i * Post_t
2. matching x DID (without transforming the treatment year to t=0): the OLS looks like Y_i,t = Treat_i + Post_i,t + Treat_i * Post_i,t
I was wondering what the pros and cons of these 2 models are, and how good they are compared to TWFE staggered DID?
@NickHuntingtonKlein 1 month ago
As long as you force it to drop the year just before treatment as the reference year, both of those should give the same result. However, both are wrong under staggered treatment so should only be used if treatment occurs all at the same time. Glad you like the book and videos!
@user-hp6in6vz3m 1 month ago
@NickHuntingtonKlein Hi, thank you for the prompt reply!! I realized these 2 models are essentially doing the same thing. Could you also explain why this model cannot be used with a staggered treatment? Or point me to any reference material that I can refer to? I thought this could be an alternative to TWFE DID. Is it because it is not comparing the early treated with the late treated?
@NickHuntingtonKlein 1 month ago
@user-hp6in6vz3m as for why this doesn't work, and other estimators, I'd recommend this video! Or the corresponding section of my book, 18.2.5 www.theeffectbook.net/ch-DifferenceinDifference.html#how-the-pros-do-it-2
@user-hp6in6vz3m 1 month ago
@NickHuntingtonKlein Ooh, so Y_i,t = Treat_i + Post_i,t + Treat_i * Post_i,t, this method is the same as two-way fixed effects DID? Could I understand it as: when we fix some units and some years, the problem you mentioned for two-way fixed effects (like the forbidden comparison) would happen?
@NickHuntingtonKlein 1 month ago
@user-hp6in6vz3m the model you posted with by-year effects is not the same as the regular TWFE model that is just before/after, but neither model works under staggered treatment.
@saadsarwar7162 2 months ago
Man! You are amazing. I have been searching for videos on YouTube to learn from for a long time, but none of them was enough for me to understand properly. Thank you!!
@qinghuafeng1705 2 months ago
Hi Dr. HK, am I right that if the controls influence both x and y, then including them or not in the model will influence the coefficient on x and also the adjusted R-squared? But if they only influence y and not x, then they won't influence the coefficient on x or the adjusted R-squared, but will influence the R-squared? Thank you for taking the time.
@NickHuntingtonKlein 2 months ago
Incorrect. Adjusted r square will be affected either way.
@qinghuafeng1705 2 months ago
@NickHuntingtonKlein Thank you very much!
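A short sketch of why adjusted R-squared moves whenever a control is added, using the textbook formula 1 - (1 - R²)(n - 1)/(n - k - 1), where k is the number of regressors. The sample size and R-squared values below are invented for illustration; the point is only that the formula depends on both R² and k, so a new control changes the result whether or not it is related to x.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 100                                      # hypothetical sample size
base = adjusted_r2(0.300, n, k=1)            # y regressed on x only
weak_control = adjusted_r2(0.301, n, k=2)    # added control barely raises R-squared
strong_control = adjusted_r2(0.450, n, k=2)  # added control raises R-squared a lot

print(round(base, 4), round(weak_control, 4), round(strong_control, 4))
```

In this made-up case the weak control lowers adjusted R-squared (the penalty for the extra regressor outweighs the small gain in fit) and the strong control raises it.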
@aleksandermolak5885 2 months ago
As a person with hierarchical modeling background, I suffer major confusion every time I hear "fixed effects" in the other meaning 🙈
@Matthew-eb3di 2 months ago
This is the best explanation and animation I’ve ever seen for multiple regression and control variables! 🎉🤩
@dany84ct 2 months ago
What do you think about C#?
@NickHuntingtonKlein 2 months ago
I know they use it in computational finance sometimes. Wouldn't you have to program everything yourself from scratch, though, for most econometric applications? If you're willing to do that then C# would be fine, but so would any general-purpose programming language.
@tareqalmahmud621 2 months ago
could not find function "linearHypothesis"
@NickHuntingtonKlein 2 months ago
It's in the car package, so install car and then library(car).
@zahradidarali5804 2 months ago
Great videos! Video would be even better if you spoke slower :)
@qinghuafeng1705 2 months ago
Because of your videos, I think I understand fixed effects now. I appreciate your excellent explanations and your replies to our questions!
@qinghuafeng1705 2 months ago
Great explanation for the R^2! Thanks a lot! If the R^2 is very low, does that mean there might be omitted variables?
@qinghuafeng1705 2 months ago
Hi Dr.HK, in this video, you said "The omitted variable bias part, the ordinary squares will assign the effect of Z to being the effect of X". Could you explain why "the ordinary squares will assign", what does that mean? Thank you!
@NickHuntingtonKlein 2 months ago
Meaning that if you regress Y on X alone, the coefficient on X will include both the effect of X and some part of the effect of Z. The statistical method can't separate out the Z effect since it doesn't know about Z, so that gets lumped into the X coefficient.
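The "lumping in" described in this reply can be checked numerically. The sketch below builds a tiny made-up dataset in which Y = 2·X + 3·Z exactly and X is correlated with Z, then regresses Y on X alone. The helper `slope` and all data values are assumptions for illustration. The short-regression coefficient comes out as the true effect of X plus the effect of Z scaled by how strongly Z moves with X, which is the usual omitted variable bias formula.

```python
def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


# Made-up data: Z is the omitted variable, and X tends to rise with Z.
z = [0, 1, 2, 3, 4, 5]
x = [1, 1, 3, 2, 5, 4]
y = [2 * xi + 3 * zi for xi, zi in zip(x, z)]  # true effects: 2 for X, 3 for Z

short_slope = slope(x, y)            # coefficient on X when Z is left out
implied = 2 + 3 * slope(x, z)        # effect of X + effect of Z * (slope of Z on X)

print(round(short_slope, 3), round(implied, 3))
```

The two printed numbers match and are larger than 2, so the coefficient on X has absorbed part of the effect of Z.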
@qinghuafeng1705 2 months ago
Hi Dr. HK, if I control for person fixed effects, am I right that if a person never changes cities, the coefficient on the city doesn't capture the influence of that person? The coefficient only captures the influence of people who move cities? Thank you!
@NickHuntingtonKlein 2 months ago
Correct
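A small sketch of why only movers contribute. Person fixed effects are equivalent to subtracting each person's own average from their data, so the snippet below does that to a hypothetical city indicator for one person who never moves and one who does. The data and the helper `demean` are invented for illustration.

```python
def demean(values):
    """Subtract the person's own average from each of their observations."""
    average = sum(values) / len(values)
    return [v - average for v in values]


# Hypothetical indicator: 1 = lives in city A that year, 0 = lives elsewhere.
stayer_city = [1, 1, 1, 1]  # never moves
mover_city = [0, 0, 1, 1]   # moves to city A halfway through

stayer_within = demean(stayer_city)
mover_within = demean(mover_city)

print(stayer_within)
print(mover_within)
```

The stayer's demeaned city indicator is zero in every year, so it carries no information for estimating the city coefficient. Only the mover's indicator still varies after demeaning.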
@qinghuafeng1705 2 months ago
"by changing a variable by making a wet spot in front of my store. That’s all there needs to be for there to be a causal relationship. Me changing this variable, making the floor wet, changes the distribution of another: it increases the probability that somebody will fall, even if nobody actually did." It's really good to know, thank you!
@rkmofficial202 2 months ago
Thank you very much, Dr. for sharing the link. Very interesting and knowledgeable.
@wangguan1548 2 months ago
Hey, Dr. HK. If it's a mediator, should we control for it in a regression?
@NickHuntingtonKlein 2 months ago
Generally no.
@wangguan1548 2 months ago
@NickHuntingtonKlein Thanks for the response!!
@qinghuafeng1705 2 months ago
This is really helpful for me. Am I right that demeaning actually removes the influence of something we cannot observe but that is unique to that individual? Thank you for sharing and for your reply.
@NickHuntingtonKlein 2 months ago
That's right. It removes anything about that individual that is constant over time. It will not remove things unique to that individual that also change over time.
@qinghuafeng1705 2 months ago
@NickHuntingtonKlein Thank you so much! I appreciate your quick reply!
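The point about demeaning can be illustrated with two hypothetical people whose outcome is y = alpha_i + 2·x, where alpha_i is an unobserved, time-constant trait that differs between them. The names, values, and helper `demean` below are all invented for illustration. Subtracting each person's own averages removes alpha_i, and the slope estimated on the demeaned data recovers the 2.

```python
def demean(values):
    """Subtract the individual's own average from each observation."""
    average = sum(values) / len(values)
    return [v - average for v in values]


# Hypothetical panel: alpha is an unobserved trait that never changes over time.
people = {
    "A": {"alpha": 10.0, "x": [1.0, 2.0, 3.0]},
    "B": {"alpha": -5.0, "x": [4.0, 5.0, 9.0]},
}

x_within, y_within = [], []
for person in people.values():
    y = [person["alpha"] + 2.0 * xi for xi in person["x"]]
    x_within += demean(person["x"])  # within-person variation in x
    y_within += demean(y)            # alpha drops out of y here

# Slope on the demeaned data (no intercept needed, both series average zero).
within_slope = sum(a * b for a, b in zip(x_within, y_within)) / sum(
    a * a for a in x_within
)
print(round(within_slope, 3))
```

If alpha changed from year to year, it would not cancel when the person's average is subtracted, which is the limitation noted in the reply.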
@qinghuafeng1705 2 months ago
Really helpful, thank you very much! Now I know why controlling for drugs is a bad control. But could you let me know why you call a confounding factor a "back door"? Why is it "back"? Thank you.
@NickHuntingtonKlein 2 months ago
Thanks! The back door terminology comes from the fact that it's an alternate way you can get from cause to effect. On a causal diagram, you can follow arrows pointing from cause to effect (for example cause -> outcome) - those are front doors. But there are two ways to get out of your house - the front door or a back door! Back doors are an alternate way to get from cause to effect (for example cause <- confounder -> outcome)
@ski34able 2 months ago
Very helpful thanks!
@lb.basnet 2 months ago
nice explanation
@wangguan1548 2 months ago
Fantastic explanation!!!
@wangguan1548 2 months ago
Hey, Dr. HK, I really love your videos and the way you explain things. Btw, I found what may be a mistake/typo in this video: at 4:46, when X=1, shouldn't the effect of a change in X be beta1+2beta2, and likewise when X=5, shouldn't it be beta1+10beta2?
@NickHuntingtonKlein 2 months ago
Thanks! And yes a 2 got dropped.
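The correction follows from differentiating the quadratic specification: for Y = beta0 + beta1·X + beta2·X², the effect of a small change in X is beta1 + 2·beta2·X. The sketch below evaluates that at X=1 and X=5 using hypothetical coefficient values (3.0 and 0.5 are invented for illustration).

```python
def marginal_effect(x, beta1, beta2):
    """Derivative of beta1*x + beta2*x**2 with respect to x."""
    return beta1 + 2 * beta2 * x


beta1, beta2 = 3.0, 0.5  # hypothetical coefficients

effect_at_1 = marginal_effect(1, beta1, beta2)  # beta1 + 2*beta2
effect_at_5 = marginal_effect(5, beta1, beta2)  # beta1 + 10*beta2

print(effect_at_1, effect_at_5)
```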
@blaisepascal3905 2 months ago
I learned most of these languages, in the following order: Stata - Python - R - Julia. And by far R and Julia are the best! (At least for me)
@ifeyinwaumeokeke2571 2 months ago
Hi, thanks very much for this video. I would love to know which package you installed before library(margins). Thank you. I am using version 4.3.1.
@NickHuntingtonKlein 2 months ago
The other two packages I loaded before margins were "wooldridge" (which I just used to get data) and "jtools" (which I used for regression tables, although these days I'd more likely use modelsummary)
@haraldurkarlsson1147 2 months ago
As an NRC fellow at NASA JSC I studied Martian meteorites for my postdoc. Some of the samples were onsite while others had to be obtained from natural history museums or individual researchers. Although I was fairly successful in obtaining the stones I desired, some refused to send or share samples (for reasons typically not given). What type of missingness is that?
@NickHuntingtonKlein 2 months ago
If you think the decision to withhold the stones from you is related to the characteristics of the stones, that's MNAR. If the choice to withhold is random, it's MCAR. It can't be MAR because you're missing the entire observation instead of just some of the values. Yours is more a case of sample selection than missing data (which usually implies you have some variables for your observations but not other variables)
@haraldurkarlsson1147 2 months ago
@@NickHuntingtonKlein Interesting. These samples are typically rare and thus curators or individuals do not want to part with a big sample used for destructive analysis (different from simple loans). The material after the analysis had less or no value for some future work. Another reason is that museum curators (I used to be one) are simply not willing to part with rare samples. Finally, some may simply want to do the work you are proposing themselves.
@haraldurkarlsson1147 2 months ago
Very interesting. Now there are some missing values in the card data. Fathers' ed is missing about 23% and IQ about 32%. Is that of concern in the modelling?
@NickHuntingtonKlein 2 months ago
Yes that can be a concern and may be enough to warrant an approach like multiple imputation
@haraldurkarlsson1147 2 months ago
@NickHuntingtonKlein What is considered an "acceptable" loss percentage-wise? This is tricky stuff. I know that major issues have arisen due to improper imputation (e.g. Rogoff at Harvard if I recall correctly).
@NickHuntingtonKlein 2 months ago
@haraldurkarlsson1147 was Rogoff a multiple imputation issue? I thought it was something else. There's not really a specific cutoff (cutoffs that guide your inference or analysis in statistics are almost always a bad idea or at least subpar). But if there's a small amount of missing data (say in the like 5% range), then it likely won't cause a huge issue. More and at the very least you need to start thinking about why it's missing
@haraldurkarlsson1147 2 months ago
@NickHuntingtonKlein I think you are right in regards to RR (Reinhart and Rogoff). I may have mistaken the omission of countries in the study by RR for the result of imputation. In the paper criticizing the results (Herndon, Ash and Pollin) it is stated that "The omitted countries are selected alphabetically. It is clear from the spreadsheet itself that these are random exclusions." (section 3.2, Spreadsheet coding error). That is what caught my eye. However, it does show the effect of selective use of data and its dangers. Thanks for your reply.
@sebastionheitzmann3233 3 months ago
Correlation is not causation, still a good decision😂 Cracked me up :)
@oumardiallo7292 3 months ago
Very neat!
@ginaelconstelmpoubou1986 3 months ago
Great