This channel covers all things data and econometrics, from a bunch of angles: how to do research design, how to do econometrics, and how to code it all up! You'll find videos that go along with my class material, with my causality book The Effect, and many other things.
I feel that many people do this, including myself, because it is the easiest way to introduce your paper topic without jumping straight to "In this paper, I study the effects of X on Y". I remember middle school teachers asked us to structure our introduction paragraph in a very specific way (hook->context->"thesis statement").
Thanks for your video. I have a question: if we have 3 distinct shocks that apply at different times, e.g., shock 1 in 2010, 2011, ..., 2018, shock 2 in 2010, 2011, ..., 2018, and shock 3 over the same time span, what is the best method to use for our purpose?
@@ahmadseifi9092 as in, you have multiple different treatments, each of which is staggered? I might recommend a Wooldridge approach, fully interacting all the treatments.
@@NickHuntingtonKlein yes, all of them are staggered, and moreover some of my observations are part of several treatment groups. Can you send me a link to the method you mentioned? Thank you very much.
@@ahmadseifi9092 it's the Wooldridge method I mention in the video and in the linked chapter. I'll also point out that I'm not certain this will extend to multiple overlapping treatments, but it's a place to start looking.
Totally true on Python. It clearly is not made for stats! I prefer R/RStudio. They were made for stats. Sorry, but Stata is dated, way dated. It is also incredibly expensive and thus shuts out poorer users here and in developing countries!
I really appreciate how straightforward and simple your explanations are! You explain things so well that people like me, who only have an introductory-level econometrics background, can understand and learn these concepts easily! Thank you so much!!!!
Hi mate. Not sure if you are still using the platform, but I have a question for you. For a job application assignment I need to use diff-in-diff to estimate the impact of a job training programme on log(employment). For context, in this hypothetical scenario, each local government either rolls out the programme (treatment group) or not (control group). I have data on log(employment) and log(population) for 3 periods: two pre-treatment periods (the parallel trends assumption holds) and one post-treatment period. My current regression looks like Δlog(employment) = time fixed effects + group fixed effects + DiD dummy + log(emp) + ε. Does it make sense to add an interaction term between each local government and the DiD dummy to capture heterogeneity in treatment effects across the treated units?
I would probably make log(employment) the dependent variable and leave it off the right side, but your version works too, I think. I'm assuming treatment is applied to all cases at the same time; if treatment is staggered, the typical TWFE setup doesn't work. If you add an interaction term for each local government, then in effect what you're doing is running a separate DID for each local government vs. the same control group. There's nothing inherently wrong with this, although keep in mind that your effective sample size for each of the DIDs will be much smaller (i.e., do you have a large enough sample to actually look at each effect separately?), and you now have *many* parallel trends assumptions to investigate, one for each treated government, rather than just one.
@@NickHuntingtonKlein Many thanks for the response, Nick. The log(emp) on the right side was a typo; it was meant to be log(pop). And yes, it's a one-off treatment, so a static TWFE DiD works. I ended up dropping the interaction term based on what you said. However, I controlled for treatment and post-treatment, and the DiD term was the interaction between the two. Is there anything inherently wrong with this?
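For reference, here is a minimal sketch of the Treat + Post + Treat*Post specification on simulated data (all numbers are made up, the two pre-periods are pooled into one, and the log(pop) control is left out). With a single treatment date, the OLS coefficient on the interaction is exactly the difference-in-differences of the four group-by-period means. Adding covariates such as log(pop) breaks that exact equality, although the interaction still carries the DiD estimate.

```python
import numpy as np

# Hypothetical simulated data: treated/control units observed pre/post
rng = np.random.default_rng(0)
n = 400
treat = rng.integers(0, 2, n)   # 1 = treated group (assumed)
post = rng.integers(0, 2, n)    # 1 = post-treatment period (assumed)
true_effect = 0.05              # assumed effect on the log outcome
y = 1.0 + 0.3 * treat + 0.1 * post + true_effect * treat * post + rng.normal(0, 0.02, n)

# OLS of Y on an intercept, Treat, Post, and Treat*Post
X = np.column_stack([np.ones(n), treat, post, treat * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)


def cell_mean(t, p):
    """Mean outcome in the (treated = t, post = p) cell."""
    return y[(treat == t) & (post == p)].mean()


# Difference-in-differences of the four cell means
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(beta[3], did)  # the interaction coefficient equals the 2x2 DiD
```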
Dear Mr. Huntington-Klein, I received my copy of your book "The Effect" only today. I'm currently working on my final paper for my Bachelor's degree in sociology. I found your book when I was trying to get a better understanding of how to interpret squared terms and their respective base variables in a regression model. I read just a few paragraphs of the online book and decided to order a copy. Books about statistics aren't usually very "beginner friendly", but I found your approach of explaining things with little assumed knowledge and, where knowledge does need to be assumed, pointing to the relevant chapter, very encouraging. I'll have to rewrite certain paragraphs a little. Not because my understanding and explanation were wrong, but because with the help of your book, I'll be able to describe certain content more precisely while also making it easier to understand. After reading a few subchapters, I am already certain your book will not only be a great help for my paper, but will also be a go-to reference when trying to solve a problem or gain a better understanding of statistical methods in the future. Or even for spare-time reading, something I wouldn't expect to say about a book discussing statistics. So, long story short, what I want to say is thank you.
Thank you for this. I needed a refresher on this particular subject and this video is one of the best there is. Simple and intuitive with good practical examples 👍
Since in ITS the treatment applies to everyone at the same time, baseline-measured covariates can't be a source of confounding, so adding covariates won't solve any causal inference issue. But you can add them to improve predictive power and reduce noise. Covariates that change over time might in some cases be necessary to solve causal inference issues, but you need to be careful with these to avoid issues like post-treatment bias.
The video would be a lot better if you explained the coefficients instead of glancing over them. Even in your book, you barely explain the coefficients. It's not just you; other books on advanced methods do not do a very good job of explaining the coefficients either.
@@NickHuntingtonKlein in journal articles, you are presented with only the coefficients, and most students typically have trouble explaining them. This is by far more important than the inner workings, because most statistical software will do the calculations for you. I read through your instrumental variables section, and you barely explain the results of the first-stage regressions. The same goes for DID and regression discontinuity. This is not just you; several of the books that I have read tend to pay little attention to explaining the coefficients.
Hi Nick. I have just started following your causality series, and it really is wonderful. I just wonder: in the case of fixed effects, could they unintentionally control for a collider and thus introduce bias? Let's say for the height vs. basketball ability example in the NBA (assuming there is height variation within each year, while there is no variation in NBA status across years).
Thank you! It would be an unusual case where fixed effects introduce collider bias, since for that to be the case, one of those fixed-over-time characteristics would have to be caused by two separate variable-over-time characteristics. It's certainly possible that there is a collider bias problem in the analysis anyway that the fixed effects don't solve, though. In the NBA example, there's already a collider bias problem having to do with the ability to get into the NBA, and fixed effects would not resolve the issue.
60% go when sick, 10% go when not sick. Thus 60 - 10 = 50 percentage points of going to the doctor is explained by being sick. Are you assuming that the group of people who are not sick and the group who are sick are the same size?
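The 50-point gap is a difference between two conditional rates, so it doesn't depend on how many people are sick; the share of sick people only changes the overall visit rate. A quick check, using the 60% and 10% rates from the comment and hypothetical group sizes (the counts are made up):

```python
def doctor_rates(n_sick, n_not_sick, p_sick=0.60, p_not_sick=0.10):
    """Return (gap in visit rates, overall visit rate) for hypothetical group sizes."""
    visits = n_sick * p_sick + n_not_sick * p_not_sick
    overall = visits / (n_sick + n_not_sick)
    return p_sick - p_not_sick, overall


for sizes in [(100, 100), (100, 900), (900, 100)]:
    gap, overall = doctor_rates(*sizes)
    # The gap stays at 0.5 while the overall rate moves with the share who are sick
    print(sizes, round(gap, 2), round(overall, 2))
```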
Thank you for the amazing explanation! Your textbook is also super helpful. I would like to ask about other ways to estimate staggered treatment effects. I've come across some papers combining matching with classic DID, which look like these:
1. matching x DID (transforming the treatment year to t=0): the OLS looks like Y_i,t = Treat_i + Post_t + Treat_i * Post_t
2. matching x DID (without transforming the treatment year to t=0): the OLS looks like Y_i,t = Treat_i + Post_i,t + Treat_i * Post_i,t
I was wondering what the pros and cons of these 2 models are, and how good they are compared to TWFE staggered DID?
As long as you force it to drop the year just before treatment as the reference year, both of those should give the same result. However, both are wrong under staggered treatment so should only be used if treatment occurs all at the same time. Glad you like the book and videos!
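To see why a single before/after treatment dummy can go wrong under staggered adoption, here is a small simulated sketch with a made-up design and no noise: an early cohort treated from period 3, a late cohort treated from period 7, no never-treated units, and an effect assumed to grow by 1 each period after adoption. The static two-way fixed effects estimate comes out slightly negative even though every true effect is positive, largely because the late cohort gets compared against the already-treated early cohort.

```python
import numpy as np

T = 10
adoption = {"early": 3, "late": 7}  # hypothetical adoption periods
units_per_cohort = 5

rows = []  # (unit id, period, treated dummy, true effect, outcome)
unit = 0
for g in adoption.values():
    for _ in range(units_per_cohort):
        unit_fe = 0.5 * unit  # arbitrary unit-specific level
        for t in range(1, T + 1):
            d = int(t >= g)
            effect = (t - g + 1) if d else 0   # assumed: effect grows after adoption
            outcome = unit_fe + 0.2 * t + effect  # common time trend, no noise
            rows.append((unit, t, d, effect, outcome))
        unit += 1

ids, period, D, tau, y = np.array(rows, dtype=float).T
n_units = unit

# Static TWFE: unit dummies, period dummies (period 1 as base), one treatment dummy
unit_dummies = (ids[:, None] == np.arange(n_units)).astype(float)
time_dummies = (period[:, None] == np.arange(2, T + 1)).astype(float)
X = np.column_stack([unit_dummies, time_dummies, D])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
twfe = coef[-1]

att = tau[D == 1].mean()  # average true effect among treated observations
print(f"static TWFE estimate: {twfe:.3f}   average true effect on the treated: {att:.3f}")
# static TWFE estimate: -0.167   average true effect on the treated: 3.833
```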
@@NickHuntingtonKlein Hi, thank you for the prompt reply!! I realized these 2 models are essentially doing the same thing. Could you also explain why this model cannot be used with staggered treatment? Or is there any reference material I could look at? I thought this could be an alternative to TWFE DID. Is it because it is not comparing the early treated with the late treated?
@@user-hp6in6vz3m as for why this doesn't work, and other estimators, I'd recommend this video! Or the corresponding section of my book, 18.2.5 www.theeffectbook.net/ch-DifferenceinDifference.html#how-the-pros-do-it-2
@@NickHuntingtonKlein Ooh, so Y_i,t = Treat_i + Post_i,t + Treat_i * Post_i,t is the same as the two-way fixed effects DID? Can I understand it as: when we fix some units and some years, the problem you mentioned for two-way fixed effects would happen (like the forbidden comparison)?
@@user-hp6in6vz3m the model you posted with by-year effects is not the same as the regular TWFE model that is just before/after, but neither model works under staggered treatment.
Man! You are amazing. I have been searching YouTube videos for a long time to learn this, but none of them was enough for me to understand it properly. Thank you!!
Hi Dr. HK, is it right that if the controls influence both x and y, then including them or not in the model will change the coefficient on x and also the adjusted R-squared? But if they only influence y and not x, then they won't change the coefficient on x or the adjusted R-squared, but will change the R-squared? Thank you for taking the time.
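A small simulation can make the distinction concrete. In this sketch (made-up coefficients), Z affects both X and Y, while W affects only Y. Leaving out Z shifts the coefficient on X; leaving out W barely moves it. Note that adding W raises both R-squared and adjusted R-squared here, because W explains real variation in Y.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5_000
z = rng.normal(size=n)  # confounder: affects X and Y
w = rng.normal(size=n)  # affects Y only
x = 0.8 * z + rng.normal(size=n)
y = 1.0 * x + 2.0 * w + 1.5 * z + rng.normal(size=n)


def ols(y, *regressors):
    """Return (coefficients, R^2, adjusted R^2) for OLS with an intercept."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    k = X.shape[1] - 1  # number of slope coefficients
    adj_r2 = 1 - (1 - r2) * (len(y) - 1) / (len(y) - k - 1)
    return beta, r2, adj_r2


for label, regs in [("X only", (x,)), ("X + W", (x, w)), ("X + Z", (x, z)), ("X + W + Z", (x, w, z))]:
    beta, r2, adj = ols(y, *regs)
    print(f"{label:10s} coef on X = {beta[1]:.3f}  R2 = {r2:.3f}  adj R2 = {adj:.3f}")
```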
I know they use it in computational finance sometimes. Wouldn't you have to program everything yourself from scratch for most econometric applications, though? If you're willing to do that, then C# would be fine, but so would any general-purpose programming language.
Hi Dr. HK, in this video, you said "The omitted variable bias part, the ordinary least squares will assign the effect of Z to being the effect of X". Could you explain what "the ordinary least squares will assign" means? Thank you!
Meaning that if you regress Y on X alone, the coefficient on X will include both the effect of X and some part of the effect of Z. The statistical method can't separate out the Z effect since it doesn't know about Z, so that gets lumped into the X coefficient.
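The same point in numbers, as a simulated sketch with made-up coefficients: regressing Y on X alone gives a coefficient equal to the X coefficient from the full regression plus the Z coefficient times how strongly Z moves with X. That is the omitted-variable-bias identity, and with an intercept in every regression it holds exactly in any sample, not just on average.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
z = rng.normal(size=n)
x = 0.5 * z + rng.normal(size=n)            # Z affects X
y = 2.0 * x + 3.0 * z + rng.normal(size=n)  # Z also affects Y


def slopes(y, *regs):
    """OLS slope coefficients (intercept included but not returned)."""
    X = np.column_stack([np.ones(len(y)), *regs])
    return np.linalg.lstsq(X, y, rcond=None)[0][1:]


short = slopes(y, x)[0]           # Y on X alone: Z omitted
long_x, long_z = slopes(y, x, z)  # Y on X and Z
delta = slopes(z, x)[0]           # how Z moves with X

# The short coefficient lumps part of Z's effect into X's coefficient
print(short, long_x + long_z * delta)
```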
Hi Dr. HK, if I control for person fixed effects, is it right that if a person never changes cities, the coefficient on the city doesn't actually capture the influence of that person? The coefficient only captures the influence of people who move between cities? Thank you!
"by changing a variable by making a wet spot in front of my store. That's all there needs to be for there to be a causal relationship. Me changing this variable, making the floor wet, changes the distribution of another: it increases the probability that somebody will fall, even if nobody actually did." It's really good to know, thank you!
This is really helpful for me. Is it right that demeaning actually removes the influence of something we cannot observe but that is unique to that individual? Thank you for sharing and for your reply.
That's right. It removes anything about that individual that is constant over time. It will not remove things unique to that individual that also change over time.
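A tiny sketch of what demeaning does, using a hypothetical panel of three people over four periods: the time-constant part (called ability here, and unobserved in practice) is wiped out entirely, while the time-varying part survives.

```python
import numpy as np

# Hypothetical panel: 3 individuals observed for 4 periods each
person = np.repeat([0, 1, 2], 4)
ability = np.array([5.0, -2.0, 1.0])[person]  # constant over time, unobserved
shock = np.array([0.1, -0.3, 0.2, 0.0,        # varies over time
                  0.4, 0.4, -0.1, 0.3,
                  -0.2, 0.0, 0.5, 0.1])
y = ability + shock


def demean_within(values, groups):
    """Subtract each group's mean from that group's observations."""
    out = values.astype(float)
    for g in np.unique(groups):
        mask = groups == g
        out[mask] -= values[mask].mean()
    return out


print(demean_within(ability, person))  # all zeros: the time-constant part is gone
print(demean_within(y, person) - demean_within(shock, person))  # zeros: only the time-varying part remains
```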
Really helpful, thank you very much! Now I know why drugs are a bad control. But could you tell me why you call a confounding factor a "back door"? Why is it "back"? Thank you.
Thanks! The back door terminology comes from the fact that it's an alternate way you can get from cause to effect. On a causal diagram, you can follow arrows pointing from cause to effect (for example cause -> outcome); those are front doors. But there are two ways to get out of your house: the front door or a back door! Back doors are alternate ways to get from cause to effect (for example cause <- confounder -> outcome).
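The front door / back door distinction can be checked mechanically. This sketch uses a hypothetical three-variable diagram and helper functions written for illustration (not taken from any causal-inference library). It lists every path between cause and outcome and labels a path a back door if its first arrow points into the cause.

```python
# Hypothetical causal diagram: each pair is an arrow (from, to)
edges = {("Cause", "Outcome"), ("Confounder", "Cause"), ("Confounder", "Outcome")}


def all_paths(start, end, edges):
    """Every path from start to end, ignoring arrow direction and never revisiting a node."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
            continue
        for nxt in neighbors.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths


def classify(path, edges):
    """Back door: first arrow points into the cause. Front door: every arrow points along the path."""
    if (path[1], path[0]) in edges:
        return "back door"
    if all((a, b) in edges for a, b in zip(path, path[1:])):
        return "front door"
    return "other (e.g. passes through a collider)"


for p in all_paths("Cause", "Outcome", edges):
    print(" - ".join(p), "=>", classify(p, edges))
```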
Hey Dr. HK, I really love your videos and the way you explain things. By the way, I think there may be a mistake or typo in this video: at 4:46, when X=1, shouldn't the marginal effect of X be beta1 + 2beta2, and likewise when X=5, shouldn't it be beta1 + 10beta2?
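Assuming the model in the video is a quadratic in X (as the question implies), the marginal effect of X works out as follows:

```latex
Y = \beta_0 + \beta_1 X + \beta_2 X^2 + \varepsilon
\quad\Rightarrow\quad
\frac{\partial Y}{\partial X} = \beta_1 + 2\beta_2 X

X = 1:\ \beta_1 + 2\beta_2, \qquad X = 5:\ \beta_1 + 10\beta_2
```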
The other two packages I loaded before margins were "wooldridge" (which I just used to get data) and "jtools" (which I used for regression tables, although these days I'd more likely use modelsummary)
As an NRC fellow at NASA JSC, I studied Martian meteorites for my postdoc. Some of the samples were on site, while others had to be obtained from natural history museums or individual researchers. Although I was fairly successful in obtaining the stones I wanted, some refused to send or share samples (typically without giving reasons). What type of missingness is that?
If you think the decision to withhold the stones from you is related to the characteristics of the stones, that's MNAR. If the choice to withhold is random, it's MCAR. It can't be MAR because you're missing the entire observation rather than just some of its values. Yours is more a case of sample selection than missing data (which usually implies you have some variables for your observations but not others).
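A rough simulation of the two cases (the quality score and the withholding probabilities are invented for illustration): when samples are withheld at random, the kept samples look like the full set on average; when higher-quality samples are more likely to be withheld, the kept samples are biased downward.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical "quality" score for every sample that exists
quality = rng.normal(loc=50, scale=10, size=100_000)

# MCAR: every sample has the same 30% chance of being withheld
kept_mcar = quality[rng.random(quality.size) > 0.30]

# MNAR: higher-quality samples are more likely to be withheld (assumed logistic rule)
p_withhold = 1 / (1 + np.exp(-(quality - 55) / 5))
kept_mnar = quality[rng.random(quality.size) > p_withhold]

print(f"all samples mean {quality.mean():.2f}")
print(f"MCAR kept mean   {kept_mcar.mean():.2f}")  # close to the full mean
print(f"MNAR kept mean   {kept_mnar.mean():.2f}")  # noticeably lower
```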
@@NickHuntingtonKlein Interesting. These samples are typically rare and thus curators or individuals do not want to part with a big sample used for destructive analysis (different from simple loans). The material after the analysis had less or no value for some future work. Another reason is that museum curators (I used to be one) are simply not willing to part with rare samples. Finally, some may simply want to do the work you are proposing themselves.
Very interesting. There are some missing values in the Card data, though. Father's education is missing for about 23% of observations and IQ for about 32%. Is that a concern for the modelling?
@@NickHuntingtonKlein What is considered an "acceptable" loss, percentage-wise? This is tricky stuff. I know that major issues have arisen due to improper imputation (e.g. Rogoff at Harvard, if I recall correctly).
@@haraldurkarlsson1147 was Rogoff a multiple imputation issue? I thought it was something else. There's not really a specific cutoff (cutoffs that guide your inference or analysis in statistics are almost always a bad idea, or at least subpar). But if there's a small amount of missing data (say in the 5% range), it likely won't cause a huge issue. With more than that, at the very least you need to start thinking about why it's missing.
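Before worrying about why values are missing, it helps to check how much of the sample complete-case analysis would actually drop. With two variables missing 23% and 32% of the time, dropping incomplete rows loses somewhere between 32% and 55% of the sample, depending on how the gaps overlap. A minimal check on hypothetical toy records (not the actual Card data; the variable names are illustrative):

```python
def missing_report(records, variables):
    """Share of records missing each variable, and share with no missing values at all."""
    n = len(records)
    shares = {v: sum(r.get(v) is None for r in records) / n for v in variables}
    complete = sum(all(r.get(v) is not None for v in variables) for r in records) / n
    return shares, complete


# Hypothetical toy records; None marks a missing value
records = [
    {"fatheduc": 12, "iq": 98},
    {"fatheduc": None, "iq": 105},
    {"fatheduc": 10, "iq": None},
    {"fatheduc": None, "iq": None},
    {"fatheduc": 16, "iq": 110},
]
shares, complete = missing_report(records, ["fatheduc", "iq"])
print(shares, complete)
```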
@@NickHuntingtonKlein I think you are right in regards to RR (Reinhart and Rogoff). I may have mistaken the omission of countries in the RR study for the result of imputation. In the paper criticizing the results (Herndon, Ash and Pollin), it is stated that "The omitted countries are selected alphabetically. It is clear from the spreadsheet itself that these are random exclusions." (section 3.2, Spreadsheet coding error). That is what caught my eye. However, it does show the effect of selective use of data and its dangers. Thanks for your reply.