Staggered Treatment in Difference-in-Differences (The Effects, Videos on Causality, Ep 56)

Econometrics, Causality, and Coding with Dr. HK

Views: 14K

Please visit www.theeffectb... to read The Effect online for free, or find links to purchase a physical copy or ebook.
The Effect is a book about research design and causal inference. How can we use data to learn about the world? How can we answer questions about whether X causes Y even if we can't run a randomized experiment? The book covers these things and plenty more. These videos are meant to accompany the book, although they can also be viewed on their own.
This video relates to material found in Chapter 18 of the book.
A version of this video without background music can be found here: • Staggered Treatment in...
All the DID stuff we've done so far has been about treatments that go into place at the same time, whether there's only one treated group or many groups all getting treated at once. But what if that's not the case? What if treatment goes into effect at different times? We used to think two-way fixed effects was fine for that. But oops! It's not. Why not, and what can we do instead?

Published: Aug 25, 2024

Comments: 72

@eustruria 1 month ago

I really appreciate how straightforward and simple your explanations are! You explain things so well that people like me, who only have an introductory-level econometrics background, can understand and learn these concepts so easily! Thank you so much!!!!

@rohanbhatt1146 1 month ago

This honestly helped me so much, I can't thank you enough!

@brad2349 1 year ago

Love your explanation of this! Very clear and concise!

@dr.kingschultz 1 year ago

As always another amazing video!

@tahabilgic9439 1 year ago

Great video, thanks!

@desperatewanderer742 6 months ago

Thank you! You're so helpful. I was wondering why you say that DiD doesn't care that there are different control groups in different time periods, but TWFE forces the groups that are already treated to act as if they're control groups... Why is that? I'm not getting the connection there.

@NickHuntingtonKlein 6 months ago

You're welcome! Basically, the fixed effects in TWFE estimate the DID effect by comparing "variation in treatment" against "no variation in treatment" in a given period. But there are two ways for treatment not to vary - starting untreated and staying that way, or being treated and staying that way next period. So the latter ends up getting included in the control group.
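
To make the "already-treated units end up in the control group" point concrete, here is a minimal Python sketch. It is not taken from the video or the book. It assumes a noise-free, balanced panel with two hypothetical cohorts ("early" and "late"), no never-treated group, and a treatment effect that grows by 1 for each period a unit has been treated. In a balanced panel, including unit and period fixed effects is equivalent to double-demeaning, so the TWFE coefficient is computed by hand that way.

```python
# Hypothetical noise-free panel: two cohorts, six periods, no never-treated group.
# All names and numbers are illustrative assumptions.
periods = range(6)
first_treated = {"early": 2, "late": 4}    # period in which each cohort turns on
unit_level = {"early": 10.0, "late": 5.0}  # unit fixed effects


def treated(unit, t):
    return 1.0 if t >= first_treated[unit] else 0.0


def effect(unit, t):
    # Effect grows by 1 for every period a unit has been treated.
    return (t - first_treated[unit] + 1) * treated(unit, t)


# Rows are (unit, period, treatment dummy, outcome); 0.5 * t is a common time trend.
rows = [
    (u, t, treated(u, t), unit_level[u] + 0.5 * t + effect(u, t))
    for u in first_treated
    for t in periods
]


def double_demean(rows, col):
    """Subtract unit and period means and add back the grand mean.

    In a balanced panel this is equivalent to including unit and period
    fixed effects.
    """
    grand = sum(r[col] for r in rows) / len(rows)
    by_unit, by_time = {}, {}
    for r in rows:
        by_unit.setdefault(r[0], []).append(r[col])
        by_time.setdefault(r[1], []).append(r[col])
    unit_mean = {k: sum(v) / len(v) for k, v in by_unit.items()}
    time_mean = {k: sum(v) / len(v) for k, v in by_time.items()}
    return [r[col] - unit_mean[r[0]] - time_mean[r[1]] + grand for r in rows]


d_tilde = double_demean(rows, 2)
y_tilde = double_demean(rows, 3)

# TWFE coefficient on the treatment dummy.
twfe = sum(d * y for d, y in zip(d_tilde, y_tilde)) / sum(d * d for d in d_tilde)

# True average effect over all treated unit-periods.
treated_effects = [effect(u, t) for u in first_treated for t in periods if treated(u, t)]
true_att = sum(treated_effects) / len(treated_effects)

print(f"TWFE estimate: {twfe:.3f}")
print(f"True average effect on the treated: {true_att:.3f}")
```

In this made-up example every treated unit-period has an effect of at least 1, yet the TWFE estimate comes out at 0.5 against a true average of about 2.17. The gap arises because, once the late cohort switches on, the early cohort is used as its comparison while the early cohort's own effect is still growing.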

@desperatewanderer742 6 months ago

@NickHuntingtonKlein Thank you! And gotcha. Let's just say we didn't include the TWFEs; then I suppose the DID effect will include some of the treated (but staying the same) group's already-treated effect...?

@NickHuntingtonKlein 6 months ago

@desperatewanderer742 Correct.

@TheArasmcz 6 months ago

yellow flashes are highly annoying

@jibbersilvan 17 days ago

Great video! Is there any video or other source where somebody explains this in RStudio? That would be great!

@NickHuntingtonKlein 17 days ago

The relevant packages are did (for Callaway and Sant'Anna) and etwfe (for Wooldridge).

@svl8389 1 year ago

Your book has been a great help to me in understanding difference-in-differences. Could you add some more info to the DID section about rollout DID designs in R? That would be really helpful.

@NickHuntingtonKlein 1 year ago

Yep! This is already slated for the second edition.

@matterne.i 8 months ago

Great video, and your explanation is crystal clear! Thanks. I'm curious: can a staggered difference-in-differences analysis be used when there are varying treatment intensities among the subjects receiving the treatment?

@NickHuntingtonKlein 8 months ago

Thank you! And to answer your question: It can but needs to specifically account for the continuous nature of the treatment variable. See e.g. arxiv.org/abs/2107.02637

@dunstanburghcastlegolfcour6440 1 year ago

Timely clarification on time-variant DID issues. It would be great to see how to solve the problem in a package like Stata using the new approaches - it would get a lot of interest, I am sure.

@NickHuntingtonKlein 1 year ago

I discuss this (and other coding stuff) in the chapter itself. I'd recommend checking out the csdid package.

@dunstanburghcastlegolfcour6440 1 year ago

Thank you very much - I have looked at your book (which is fantastic, by the way!) and downloaded the package. The instructions and example included in the Stata package provide very useful help with implementation, i.e. they have data set examples, which are invaluable in moving towards implementing the model. A point you make in your book - that the Stata DID package does not deal with time-variant DID - is really worth putting in bold!! I spent a lot of time reading the Stata manual to try and find this out. It seems to dodge the issue entirely, which is very frustrating and will no doubt create similar problems for others (i.e. why can't I do a parallel trends test etc.)! From what I can gather, csdid does allow for some kind of parallel trends test post-estimation, which is very handy. A YouTube video running through an example of CS/DR DID Stata implementation using a dataset would be a big hit, I am sure.

@yangyijane 1 year ago

Thank you so much for the explanation. Could you make another video applying this staggered DID in R?

@NickHuntingtonKlein 1 year ago

Thanks! There are R code examples and packages listed in the textbook chapter.

@ahmadseifi9092 28 days ago

Thanks for your video. I have a question: if we have 3 distinct shocks that apply at different times, e.g. shock 1 in 2010, 2011, ..., 2018, and shock 2 in 2010, 2011, ..., 2018, and also shock 3 in the same time span, what is the best method to apply for our purpose?

@NickHuntingtonKlein 28 days ago

@ahmadseifi9092 As in you have multiple different treatments, each of which is staggered? I might recommend a Wooldridge approach, fully interacting all the treatments.

@ahmadseifi9092 28 days ago

@NickHuntingtonKlein Yes, all of them are staggered, and moreover some of my observations are part of several treatment groups. Can you send me a link to the mentioned method? Thank you very much.

@NickHuntingtonKlein 28 days ago

@ahmadseifi9092 It's the Wooldridge method I mention in the video and in the linked chapter. I'll also point out I'm not certain this will extend to multiple overlapping treatments, but it's a place to start looking.

@jacobmorgan7495 1 year ago

Great balance of intuitive and technical. Are the Callaway & Sant'Anna / Wooldridge estimators required when cases are matched to controls prior to the analysis (using propensity score matching, for example)? Thanks!

@NickHuntingtonKlein 1 year ago

Thanks! And not necessarily, you can do matching without C&S, but it is a good way to do it, especially with staggered treatment.

@user-hp6in6vz3m 2 months ago

Thank you for the amazing explanation! Your textbook is also super helpful. I would like to ask a question regarding other ways to estimate staggered treatment effects. I've come across some papers using matching x classic DID, which look like these:
1. matching x DID (transforming the treatment year to t=0): the OLS looks like Y_i,t = Treat_i + Post_t + Treat_i * Post_t
2. matching x DID (without transforming the treatment year to t=0): the OLS looks like Y_i,t = Treat_i + Post_i,t + Treat_i * Post_i,t
I was wondering what the pros and cons of these 2 models are, and how good they are compared to TWFE staggered DID?

@NickHuntingtonKlein 2 months ago

As long as you force it to drop the year just before treatment as the reference year, both of those should give the same result. However, both are wrong under staggered treatment, so they should only be used if treatment occurs all at the same time. Glad you like the book and videos!

@user-hp6in6vz3m 2 months ago

@NickHuntingtonKlein Hi, thank you for the prompt reply!! I realized these 2 models are essentially doing the same thing. Could you also explain why this model cannot be used with staggered treatment? Or is there any reference material that I can refer to? I thought this could be an alternative to TWFE DID. Is it because it is not comparing the early treated with the late treated?

@NickHuntingtonKlein 2 months ago

@user-hp6in6vz3m As for why this doesn't work, and other estimators, I'd recommend this video! Or the corresponding section of my book, 18.2.5 www.theeffectbook.net/ch-DifferenceinDifference.html#how-the-pros-do-it-2

@user-hp6in6vz3m 2 months ago

@NickHuntingtonKlein Ooh, so this method, Y_i,t = Treat_i + Post_i,t + Treat_i * Post_i,t, is the same as two-way fixed effects DID? Could I understand it as: when we fix some units and some years, the problem you mentioned for two-way fixed effects would happen (like the forbidden comparison)?

@NickHuntingtonKlein 2 months ago

@user-hp6in6vz3m The model you posted with by-year effects is not the same as the regular TWFE model that is just before/after, but neither model works under staggered treatment.

@RobertWF42 10 months ago

I'm working on a healthcare 12 month pre/post case-control analysis with longitudinal data (monthly obs) where patients in the treatment group received an intervention at staggered times over several years. We'd like to compare trmt patients with control patients to estimate the average trmt effect. Rather than building a separate control group dataset, which is likely very different from the trmt population and would leave us worrying about unmeasured confounders, can we simply compare a trmt patient to another trmt patient who hasn't yet received the intervention (create an ad hoc ctrl group from the trmt group)? We'd match the trmt intervention date to the same date in our "control group" and compare pre to post 12 month outcomes.

@NickHuntingtonKlein 10 months ago

Yep, no particular reason you couldn't do that as long as you think parallel trends holds for those comparisons. I would want to make sure that (a) you do have enough post-observation periods for your "control" group before they get treated (or might potentially start altering their behavior in anticipation of treatment), and you don't include any post-treatment data for your control group for when they're supposed to be acting as a control, and (b) you have enough observations in your "control" group and aren't introducing too much noise by throwing out most of your controls.

@RobertWF42 10 months ago

@NickHuntingtonKlein Thanks Nick! If the parallel trends assumption doesn't hold for the pre-intervention data, can we avoid the rule by condensing the outcome observations into only two time points per member: Y0 = average pre-intervention outcome and Y1 = average post-intervention outcome? Or avoid DiD and instead run an ANCOVA by regressing Y1 on Y0 + trmt_flag + covariates.

@NickHuntingtonKlein 10 months ago

@RobertWF42 Keep in mind that parallel trends is a *theoretical* assumption that you can't actually observe (see my video on parallel trends). When you check for prior trends in pre-treatment data, you can at best get suggestive evidence of whether parallel trends holds; you can't actually check it. If your research design is "compare a newly-treated group as they change over time to an untreated group as they change over time, including both pre- and post-treatment periods", then regardless of the way you estimate the model, you need to assume that the effect of time independent of treatment affects both groups equally, but you can't actually observe it. So condensing to two time periods wouldn't help, since the change you need to assume parallel trends for is before vs. "after if the treatment hadn't occurred", and you've still got that change even if it's only a single pre vs. a single post.

@user-dk9cg2cr9f 1 year ago

What if I want to test moderating effects using three-way interaction using a staggered DID model? How should I measure the moderator when different groups have different treatment times?

@NickHuntingtonKlein 1 year ago

I suspect it would work to take the Wooldridge approach and add another level of interactions, then average together the interaction effects to get your moderation effect. I don't actually know if this works but I suspect it would.

@yufangsun7725 1 year ago

I have a stupid question. From what you say, do you mean that in the staggered difference-in-differences specification (without any of the improved estimators), the "post" variable is only 1 for 1 period after the treatment, and becomes 0 thereafter? (so first 0, then 1, then back to 0)

@NickHuntingtonKlein 1 year ago

Do you mean the two way fixed effects model? Post is 1 in all the post-treatment periods, not just the first one. But to be clear, this model does not work as intended if your treatment is staggered.

@aliothrosen9242 10 months ago

If I want to draw a graph to see the evolution of outcome variables for treated and untreated groups, but the x-axis is time relative to treatment, how can I add the line for the untreated group, since the untreated group doesn't have a time relative to treatment?

@NickHuntingtonKlein 10 months ago

Ideally, you draw a separate one of these graphs for each treatment cohort. That way, treatment time is fixed for each graph.
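
One way to set up the data for those per-cohort graphs can be sketched in Python as follows. This is only an illustration: the years, the cohort labels, and every mean outcome are made-up assumptions, and the comparison series is taken to be a never-treated group. For each cohort, calendar years are re-indexed relative to that cohort's first treated year, and the comparison group borrows the same relative-time axis.

```python
# Hypothetical cohort-by-year mean outcomes; every number here is made up.
years = [2008, 2009, 2010, 2011, 2012]
mean_outcome = {
    2010: [5.0, 5.5, 7.0, 7.5, 8.0],     # cohort first treated in 2010
    2011: [4.0, 4.5, 5.0, 6.5, 7.0],     # cohort first treated in 2011
    "never": [3.0, 3.5, 4.0, 4.5, 5.0],  # never-treated comparison group
}


def cohort_series(cohort):
    """Return (relative_time, treated_mean, comparison_mean) rows for one cohort."""
    return [
        (year - cohort, mean_outcome[cohort][i], mean_outcome["never"][i])
        for i, year in enumerate(years)
    ]


for cohort in (2010, 2011):
    print(f"Cohort first treated in {cohort}")
    for rel_time, treated_mean, comparison_mean in cohort_series(cohort):
        print(f"  t = {rel_time:+d}: treated {treated_mean:.1f}, comparison {comparison_mean:.1f}")
```

Each call to `cohort_series` gives the two lines for one graph, so the untreated group gets a well-defined position on the x-axis within each cohort's plot.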

@robbiemaris6238 10 months ago

Thanks for the great video! What about a situation where there is a staggered rollout over time on two dimensions (location and another covariate)? For example, a new teaching programme is rolled out for different school locations and subjects. So, in Year 1, some schools get the treatment but only teachers of some subjects get the treatment. In Year 2, more schools get the treatment and the list of subjects expands (so some of the teachers at schools initially treated in Year 1 are now treated because their subject is included). Hopefully that makes sense! It's almost like the number of subjects that the programme covers is a measure of treatment intensity that varies over time... However, it's not a linear measure of intensity! Maybe each subject-level teacher training is its own treatment?

@NickHuntingtonKlein 10 months ago

I'd probably treat each subject/school combination as its own treatment, assuming you have data at this level. If I weren't worried about spillovers at all, I had subject-school level outcomes, and my outcomes were comparable between subject, this is definitely what I'd do. If you're worried that one subject getting trained will affect others in the same school before training occurs, that makes things more complicated.

@robbiemaris6238 10 months ago

@NickHuntingtonKlein Thanks - that's very helpful! Assuming spillovers weren't an issue, how would that many treatments enter a DiD regression framework? Would there be separate treatment dummies for all combinations interacted with a post-treatment variable?

@NickHuntingtonKlein 10 months ago

@robbiemaris6238 Most staggered-DID methods, like those I mention in the video, do separate treatment dummies by cohort - i.e. by when the treatment started for that group.

@robbiemaris6238 10 months ago

@NickHuntingtonKlein Great! So in the example, there would be separate treatment dummies for each cohort (school) and subject combination?

@NickHuntingtonKlein 10 months ago

@robbiemaris6238 I don't know about subject, but cohort yes. See the Callaway and Sant'Anna or Wooldridge methods.

@gabrielaterra3734 2 years ago

Thank you for the great explanation! When I have a design case like this, should I do matching for each year of treatment considering the treated in the period n+1 as possibly in the control group?

@NickHuntingtonKlein 2 years ago

I'd advise against attempting it by hand (unless that's something you do a lot) and instead use a package. For an approach with matching, the Callaway and Sant'Anna estimator makes sense: the csdid package in Stata or did in R. The appropriate control group can either be the entire set of groups that haven't been treated yet as of time n, or only the groups that never get treated. The former is more precise, but you need to be willing to make some assumptions about the treatment not being anticipated.
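
The two control-group choices can be illustrated with a small hand-rolled Python sketch. This is not the csdid or did implementation, and it leaves out matching and covariates entirely. It assumes three hypothetical units (A, B, C), noise-free outcomes, exact parallel trends, no anticipation, and a constant effect of 2; the labels "notyettreated" and "nevertreated" are just names chosen for this sketch. The group-time effect is computed as the treated unit's change since its last pre-treatment period minus the average change among the chosen controls.

```python
# Hypothetical units: A is first treated in period 2, B in period 4, C never.
first_treated = {"A": 2, "B": 4, "C": None}
level = {"A": 10.0, "B": 7.0, "C": 3.0}   # made-up unit-specific levels


def outcome(unit, t):
    g = first_treated[unit]
    effect = 2.0 if g is not None and t >= g else 0.0   # assumed constant effect
    return level[unit] + 0.5 * t + effect               # common trend of 0.5 per period


def att_gt(unit, t, control="notyettreated"):
    """Change since the last pre-treatment period, net of the controls' change."""
    base = first_treated[unit] - 1
    if control == "nevertreated":
        controls = [u for u, g in first_treated.items() if g is None]
    else:
        # Not yet treated as of period t; never-treated units also qualify.
        controls = [
            u for u, g in first_treated.items()
            if u != unit and (g is None or g > t)
        ]
    control_change = sum(outcome(u, t) - outcome(u, base) for u in controls) / len(controls)
    return (outcome(unit, t) - outcome(unit, base)) - control_change


for control in ("notyettreated", "nevertreated"):
    print(control, att_gt("A", 3, control), att_gt("B", 5, control))
```

Because the assumptions hold exactly in this toy data, both choices recover the same effect. With real data the not-yet-treated set supplies more control observations, but it relies on those units not changing their behavior ahead of their own treatment.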

@chenzhang8005 10 months ago

Thank you so much for the great content! I have one question regarding testing the pretrend using the csdid Stata code. The pretrend is significantly different between treatment and control. I wonder what I should adjust to make sure the common trend assumption is met, and how I can conduct matching manually for staggered DID. Thank you for your time 😊

@NickHuntingtonKlein 10 months ago

There's no way to guarantee that parallel trends holds (since it is an assumption that you cannot observe in the data - common prior trends is suggestive of parallel trends but they aren't the same thing). But if you have some reason to believe, say, that trends differ because of some different starting value of a covariate, then controlling/matching for that variable will fix it. Adding it as a covariate in csdid will do the matching for you, no need to do it manually.

@chenzhang8005 10 months ago

@NickHuntingtonKlein Thank you for the reply! I know that most papers just use visual inspection to roughly check parallel pretrends, and I wonder whether it is possible to do the visual inspection for staggered DID as well?

@NickHuntingtonKlein 10 months ago

@chenzhang8005 One approach is to do the same inspection, but separately for each cohort (and its matched control).

@lukaparisi4351 1 year ago

Thank you for the video, it complements the book in a great way! Quick question for you Nick: I am currently doing my thesis on gun delay laws and how they affect gun-related suicides in the United States. Since a few states "switch off" gun laws in my sample, I am looking to use this staggered treatment difference-in-differences design. I am having trouble formulating my model equation. Would the following equation be correct if I'm only looking at the ATT? ln(Y)_{st} = \alpha + \beta_1 (Treatment_s \times Post_{s\tau}) + \lambda_t + \mu_s + \epsilon_{st}, with \tau being the subscript for event time, subscript s for state and subscript t for calendar time?

@NickHuntingtonKlein 1 year ago

That would be biased for the reasons mentioned in the video. You'd need to use one of the staggered treatment estimators like Callaway and Sant'Anna. If you want to incorporate the "switches back off" part, that's harder; matrix completion would be one option.

@MMichiganSalveRegina 1 year ago

But how do you aggregate the effects from separate models?

@NickHuntingtonKlein 1 year ago

Add or average them as desired. Everything should be coming from a single model so it should be straightforward to just do linear combinations of coefficients as you might normally do. In software, packages for Callaway and Sant'Anna or Wooldridge will include aggregation commands.
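
As a rough illustration of "add or average them as desired", the Python sketch below averages a set of group-time effects two ways: into one overall number and by time since treatment. The effect values and cohort sizes are invented for the example, and weighting by cohort size is only one possible choice; the aggregation commands in the packages mentioned may weight differently and also handle standard errors, which this sketch ignores.

```python
# Hypothetical group-time effects ATT(g, t), keyed by (first treated period g, period t).
att_gt = {(2, 2): 1.0, (2, 3): 2.0, (2, 4): 3.0, (4, 4): 1.0}
cohort_size = {2: 30, 4: 10}   # number of units in each cohort (made up)


def weighted_average(cells):
    """Average ATT(g, t) over the given cells, weighting by cohort size."""
    total_weight = sum(cohort_size[g] for g, _ in cells)
    return sum(cohort_size[g] * att_gt[(g, t)] for g, t in cells) / total_weight


# One overall number across every post-treatment cell.
overall = weighted_average(list(att_gt))

# Event-study style: average cells that share the same time since treatment.
event_times = sorted({t - g for g, t in att_gt})
by_event_time = {
    e: weighted_average([(g, t) for g, t in att_gt if t - g == e])
    for e in event_times
}

print(f"Overall: {overall:.2f}")
for e, value in by_event_time.items():
    print(f"Periods since treatment = {e}: {value:.2f}")
```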

@renanchicarellimarques4272 2 years ago

Great video, as always! Do you have a good suggestion for an applied paper that uses the Callaway and Sant'Anna (2021) staggered DID estimator?

@NickHuntingtonKlein 2 years ago

Thanks! Because it's relatively new most of these are working papers still. But I'd recommend checking the list of papers that cite it. scholar.google.com/scholar?cites=6052894912618674159&as_sdt=5,48&sciodt=0,48&hl=en

@mountainsmusicbeer5532 11 months ago

Your videos are fantastic (as is your enthusiasm). But I'm new to difference-in-differences. Can it be applied when there are different treatments for different cohorts? Specifically, I'm thinking about a 2-year language program, for which the control group had two years of face-to-face (F2F) classes. But as a result of COVID-19 prevention measures that began in 2020 (and ended in 2022), different cohorts had different course delivery methods. (Each cohort had the same standardized language test at the beginning of Year 1, end of Year 1, and end of Year 2.)
2018 cohort: Year 1 F2F classes, Year 2 F2F classes
2019 cohort: Year 1 F2F, Year 2 online classes
2020 cohort: Year 1 online, Year 2 online
2021 cohort: Year 1 online, Year 2 F2F

@NickHuntingtonKlein 11 months ago

Thanks, and yep! The only part of what you said that isn't covered in this video is the non-monotonic treatment (treatment turns "on" over time for some but "off" over time for others). There are some DID variants designed for that case, or you might try matrix completion.

@mountainsmusicbeer5532 11 months ago

@NickHuntingtonKlein Thanks for the quick reply. I'll need some time to think about this, but this is encouraging.

@vaibhavpuri9278 1 year ago

Amazing video! Please do expand this with a Stata-based example. Found it really informative.

@c.comploj3775 1 year ago

You are targeting graduate students. Why should people read your book, if your videos are very simplistic? Explanations are good, but discussing assumptions etc. might be more relevant for students.

@NickHuntingtonKlein 1 year ago

1. Who says I'm targeting grad students? 2. The book discusses more assumptions, which sort of addresses your other question