Difference in Differences Estimation in Stata

SebastianWaiEcon

An introduction to implementing difference in differences regressions in Stata.

Published: 31 Mar 2018

Comments: 214

@wm6698 2 years ago

Thank you so much for this! My concern is why didn't you run a complete regression model for house price? Why only a bivariate regression? (i.e., dependent and dummies).

@sebastianwaiecon 2 years ago

The purpose of this video is to demonstrate the basic technique of differences in differences estimation. You can certainly add controls to the basic model, but that is outside the scope of this video. You can search my channel for other videos on that.

@davecullins1606 4 years ago

You saved my exam in the previous semester, and you're saving me in this semester as well!

@dandellionsy6537 3 years ago

Thank you so much, I need it. My model might be more complicated but at least I can sense the idea of doing it. Awesome! Keep sharing more

@tdogz8932 4 years ago

I watched the video 2 years ago, it helped me understand the DID model and Stata so that I could finish my graduation dissertation on time. After the graduation, I published another paper using the same model, thank you sooooooooo much!!!!!!!!

@fgghdfg8638 3 years ago

Can you help me do my diff in diff? Otherwise I will miss my year. We can talk about price.

@tdogz8932 3 years ago

@@fgghdfg8638 I'm sorry that I only saw your message now. Hope you are doing fine with your dissertation:)

@amnashaukat7827 2 years ago

Can you help me in this technique?

@tamandanikuchanje260 1 year ago

Hello can you help me?

@dargon1084 2 years ago

I learnt more in this video than six 2-hour videos of my own uni's lectures

@sylvieyin5261 3 years ago

Thank you so much. This video makes my HW much easier.

@simonazambelli5320 2 years ago

Thank you very much. You explained everything very clearly! Thanks

@sajidnoor9482 3 years ago

Thank you very much for explaining this very clearly.

@simonazambelli5320 1 year ago

Love it! Thank you Sebastian!!

@sireenkhalili8631 2 years ago

Thank you so much for this video, it was really helpful!

@jonaFUN999 3 years ago

I’m from Andover, England and I approve this video 👍

@2thedata 2 years ago

Thank you so much! Your video helps me! :D

@timothyowuor9478 3 years ago

Nice tutorial on DID, thanks for saving me

@MrAdhoul 1 year ago

Great video, thank you.

@huangkiana6165 3 years ago

THIS VIDEO SAVED ME FROM MY DEADLINE. THANK YOU SO MUCH *cry

@shamsunnahar2294 4 years ago

Clear presentation. Do you have any video on two-way cluster regression in Stata? If yes, please send me the link here.

@user-vb3do7hh9v 2 years ago

Thanks, very well explained. Can I get this dummy data set, or can you please tell me where I can get such a dummy data set for educational/learning purposes only?

@sabrinanasir5844 4 years ago

Thanks for the video! If you don't have an ideal counterfactual control group (i.e. there are some slight differences between the treatment and control groups in the pre-treatment period), can you add other independent variables to the diff n diff when running the regression in Stata?

@sebastianwaiecon 4 years ago

Yes, you can.

@VINAYKUMAR-kf6kd 9 months ago

Thanks for the detailed info. What if my dependent variable is categorical, like anemia (yes/no)? Should I take the B coefficient or Exp(B)? And how do I cross-check in Excel?

@subhalakshmipaul4816 6 years ago

Hello sir, please provide a video on reshaping from wide to long, particularly when the data set is very large, i.e., how to organise the variables before reshape. Please, sir.

@keith-ole 4 years ago

Phenomenal explanation, thank you. If you wanted to include more prior years and a few years after, would you have to make a dummy variable for each year?

@sebastianwaiecon 4 years ago

You don't have to do that, but you might want to look into fixed effects models for that kind of thing.

@sarahfranz5748 3 years ago

Thanks for this video! One question: how would you proceed if you are comparing the difference between control and treated group across a 4 week period, testing whether the difference is bigger in the beginning and decreases?

@sebastianwaiecon 3 years ago

You can interact a time variable (linear trend, or quadratic, etc.) with a treatment dummy variable.
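
A minimal sketch of that interaction with a linear trend, run on simulated data. The variable names `week`, `treat` and `y` and all the numbers are hypothetical, chosen only to mirror the 4-week question above.

```stata
* Simulated data: 4 weeks, a control and a treated group (invented values)
clear
set seed 7
set obs 400
generate week  = 1 + mod(_n - 1, 4)      // weeks 1 to 4
generate treat = _n > 200                // 1 = treated group
generate y = 5 + 0.2*week + 4*treat - 0.8*treat*week + rnormal()

* c.week treats week as continuous; ## adds both main effects and the interaction
regress y i.treat##c.week

* The coefficient on 1.treat#c.week is the per-week change in the
* treated-control gap; a negative value means the gap shrinks over time
```

For a quadratic trend, one option is to extend the interaction to `i.treat##c.week##c.week`.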

@aymanissa6722 1 year ago

Thanks for such an informative video. Could you please explain the DiD method using the diff command?

@nazlcaneroglu4427 3 years ago

Thank you for the video! Btw is there any way that we can also see the trends of both groups by drawing a line graph in Stata? If the trends are same before the treatment period, we should be able to see that right?

@sebastianwaiecon 3 years ago

Yes, you can use a twoway graph to do that.
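
One way this could look, sketched on simulated data with ten years and a treatment starting in 2005. All names and numbers here are hypothetical; the video's dataset has only two years, so it cannot show pre-treatment trends.

```stata
* Simulated data: 10 years, two groups, treatment effect from 2005 on
clear
set seed 42
set obs 400
generate year  = 2000 + mod(_n - 1, 10)
generate treat = _n > 200
generate y = 10 + 0.5*(year - 2000) + 2*treat ///
             + 3*treat*(year >= 2005) + rnormal()

* Reduce to one mean per group and year, then plot both lines
collapse (mean) y, by(year treat)
twoway (line y year if treat == 0, sort) ///
       (line y year if treat == 1, sort), ///
       xline(2004.5) legend(order(1 "Control" 2 "Treatment")) ///
       ytitle("Mean outcome") xtitle("Year")
```

If the two lines move roughly in parallel to the left of the vertical line, that is informal support for the common trend assumption.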

@adriabc7614 4 years ago

Hi Sebastian, very useful video at a great pace ;). In this example you compare the differences in price, how would you interpret the results if the variable is categorical (eg. completed studies, married, etc). Many thanks!

@sebastianwaiecon 4 years ago

You can only do this if the categorical variable is binary (eg. married and not married). Assign a 1 to married and 0 to unmarried. We now have a linear probability model (see my video on binary choice models). The interpretation of the diff-in-diff is now the difference in probability of being married.

@YY-ty5fx 4 years ago

What a clear explanation! I'm working on my own DD regression, and it really helped. The dependent variable 'price' covers prices before and after the treatment here, right?

@sebastianwaiecon 4 years ago

At the beginning of the video, I show the data browser and scroll through the data. You can see some observations are before and some are after.

@samknight7290 5 years ago

Hi Sebastian, thank you very much for the video. Just wondering why you did not regress the other independent variables?

@sebastianwaiecon 5 years ago

I wanted to keep things simple and focus just on the diff in diff technique. However, you can certainly add more variables to the regression as controls.

@md.arrahman7125 4 years ago

Dr., thanks for your excellent explanation. Is this step the same for panel data, as I am planning to run DID for a panel (2000-2019)? Expecting your kind suggestion.

@sebastianwaiecon 4 years ago

I have some other videos on general panel data methods.

@mertbakirci6030 4 years ago

Hey, thanks for the great content here. QUESTION: How can I test for the "common trend" assumption of the DiD-estimator in Stata or in general? Thanks in advance!

@sebastianwaiecon 4 years ago

Usually, this is done informally by comparing the dependent variable movement across groups in an extended period of time before and after the treatment goes into effect. You need a lot more data than I have in this example.

@mertbakirci6030 4 years ago

@@sebastianwaiecon thank you!

@FannysVista 3 years ago

Hi Sebastian, your video helps me a lot to understand DID estimation. I have a follow-up question. Is it possible to estimate difference in differences for survey data analysis? I tried it on my survey data. However, the DID from the regression and the DID from manual collapse calculations show different results.

@sebastianwaiecon 3 years ago

The actual source of the data shouldn't matter here, whether it's from a survey or not.

@trobberkah3425 2 years ago

Hi, I'm doing a DiD for my thesis, but I'm dealing with panel data. Do you know what I should do differently compared to the regression you show in this video? I noticed that there is a Stata command for a fixed effects DiD regression, for example.

@nunosilva1563 1 year ago

I face exactly the same situation, can you please reply to the above question?

@nathanmasak 3 years ago

That's really helpful. Thank you. Did you ever run the "event study" model? I can't find resources on this model. Your input would be appreciated.

@sebastianwaiecon 3 years ago

I haven't, but a Google search turned up some resources. Best of luck with it.

@danielkrupah 2 years ago

Sir, do you provide a paid service for DD? I need a coach.

4 years ago

Hey, thanks. How do you do it with multiple time points?

@sebastianwaiecon 4 years ago

You can still make a variable indicating before and after treatment. You might also want to think about a fixed effects regression.

@abmakwara8010 4 years ago

Hi Sebastian, thank you for the great content, very informative. However, I have a question. My research looks at the impact of a bank regulation implemented in 2014, and this regulation only affects the bigger banks within my population: banks with population of 25b and over. I have gathered panel data from 2010-2019. I intend to use performance ratios as dependent variables and variables that determine profitability as control variables. I am using DID in an FE model in Gretl to run the regression. I have generated some dummy variables: a time dummy for before and after, a group dummy with the banks impacted by the regulation as the treatment group and the rest as control, and a regulatory dummy, which I am not sure is necessary. Three questions: 1. Is this research feasible in terms of parallel trends? 2. Will I need to interact all other variables in my model with time, or does the interaction only need to be between the time and group dummies? If yes, do I need to add the group dummy to every interaction I do? 3. Is there a need to add individual time effects, since I am running the regression in an FE model? Many thanks in advance.

@sebastianwaiecon 4 years ago

1. I have no idea, but it sounds like you have enough data to make that determination yourself. 2. You should think about this on a case-by-case basis. Think about what you're trying to accomplish and whether or not interactions would help with that. 3. Time dummy variables are an important component in FE. I have some videos on FE and panel data on my channel.

@katieleck9955 3 years ago

Hi, many thanks for the video. When I try to do DID for my panel data set, Stata says that my treatment group dummy and DID variable are omitted due to collinearity. Do you know why this would be, or how I could fix it?

@sebastianwaiecon 3 years ago

Most likely what happened is that you made a mistake creating your dummy variables. Click the magnifying glass button to look at your data to check what went wrong.
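
Besides browsing the data, a quick cross-tabulation can help with this check. The sketch below uses a tiny made-up dataset with hypothetical variable names `treat`, `post` and `did`; the point is only that all four group-by-period cells need observations.

```stata
* Toy data with hypothetical names
clear
input treat post
0 0
0 1
1 0
1 1
1 1
end
generate did = treat * post

* Every cell of this 2x2 table should be non-empty; if, say, there are
* no treated observations before the treatment, did equals treat and
* Stata will omit one of them
tabulate treat post
```

Another possible cause, if unit fixed effects are included, is that a time-invariant group dummy is absorbed by those fixed effects.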

@lVaNeSsA90 2 years ago

What did you use the rprice and lrprice variables for?

@TommasoSchembri 3 years ago

Hi, thanks for the clear explanation. Is it possible to do a DID in percentage terms, so that I come up with a % increase/decrease in the treatment group? Thanks!!

@sebastianwaiecon 2 years ago

Yes, you can take the natural log of the dependent variable to get an approximation of a percentage change.
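
A sketch of the log version on simulated data. The names `y81`, `nearinc` and `y81nrinc` follow the video; `lprice` and the simulated numbers are assumptions for illustration only.

```stata
* Simulated prices (invented values), strictly positive so the log exists
clear
set seed 81
set obs 400
generate y81      = _n > 200
generate nearinc  = mod(_n, 2)
generate y81nrinc = y81 * nearinc
generate price    = exp(11 + 0.2*y81 - 0.3*nearinc - 0.1*y81nrinc ///
                    + rnormal(0, 0.3))

generate lprice = ln(price)
regress lprice y81 nearinc y81nrinc

* Approximate percentage effect, and the exact conversion
display 100*_b[y81nrinc]
display 100*(exp(_b[y81nrinc]) - 1)
```

The two displayed numbers are close when the coefficient is small and drift apart as it grows.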

@pudurvivek 5 years ago

Do we need to check the p values of the variables before understanding the effect of the interaction variable on the dependent variable?

@sebastianwaiecon 5 years ago

If you want to know about p-values, I suggest taking a look at my video on hypothesis testing: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-lhoqZjQHHjk.html

@anumhajiani4233 4 years ago

A very useful video. Thank you so much. I have a question. I created 3 columns similar to y81, nearinc and y81nrinc. I am running a two-part logit and GLM model. Since the value of y81 and the other two is either 0 or 1, should we put i.y81, etc.? I mean, aren't we supposed to put i. before a binary variable?

@sebastianwaiecon 4 years ago

For a binary variable, you will get the same result just putting the variable in or using the i. structure. If you have a categorical variable with more than two possible values, then you need to use i.
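
To illustrate the equivalence for 0/1 variables, the sketch below runs the same regression both ways on simulated data (invented numbers; variable names follow the video). With factor-variable notation, `##` also builds the interaction, so no separate interaction variable is needed.

```stata
clear
set seed 3
set obs 400
generate y81      = _n > 200
generate nearinc  = mod(_n, 2)
generate y81nrinc = y81 * nearinc
generate price    = 80000 + 18000*y81 - 19000*nearinc ///
                    - 12000*y81nrinc + rnormal(0, 20000)

* Hand-made dummies and interaction
regress price y81 nearinc y81nrinc
scalar b_manual = _b[y81nrinc]

* Same model with factor-variable notation
regress price i.y81##i.nearinc
display b_manual
display _b[1.y81#1.nearinc]
```

The two displayed coefficients should agree up to rounding.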

@anumhajiani4233 4 years ago

@@sebastianwaiecon Thanks a lot!!

@indagame9 5 years ago

Have you ever done a coefplot to test the treatment effect? If so, I get a positive but not significant coefficient for my treat dummy variable. This would mean that the treatment group actually saw an increase in the fatalities (my y variable) or does it mean my treatment effect is positive? It is confusing because if I do a lowess plot on just the different states fatalities drops over time. However, in the coefplot the graph is trending upwards.

@sebastianwaiecon 5 years ago

I don't use coefplot, but I don't see why it would show results any different from your regression table.

@nirobkothopokothon 6 years ago

Hi, I would like to know whether difference in differences analysis is suitable for a small data set that contains only 2 years of data and has only 168 samples (84 control and 84 treatment)? Thank you so much.

@sebastianwaiecon 6 years ago

I don't see any reason why not. However, with only 2 years of data, you have no idea of how the outcomes have been trending over time, and you may have a hard time justifying your counterfactual.

@nirobkothopokothon 6 years ago

thank you so much.

@usmannasim618 4 years ago

Hi Sebastian, Can you also please describe the coding to be used when we have a dummy variable for 'treatment' and 'control' groups? Thanks,

@sebastianwaiecon 4 years ago

I did that in the video. The variable nearinc is the dummy variable for the treatment group.

@harunasanibk2662 5 years ago

Sir, how am I supposed to run the data for both "treatment and control" groups? Should I run the data separately? Please, what command should I use?

@sebastianwaiecon 5 years ago

I don't know what you mean by "run the data" here.

@yanvianna4737 2 years ago

Could you demonstrate how it would work with more than one year before and after treatment?

@IamPaste 4 years ago

How would you do it for 1978?

@amartilianom 6 years ago

Hello, if you want to add control variables or covariates, do you add them to the regression as usual? Thanks for the information!

@sebastianwaiecon 6 years ago

Yes, I forgot to mention that in the video. You can add controls to the diff in diff regression as in any other.
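
As a sketch, adding controls just means listing them after the three diff-in-diff terms. The data below are simulated, and the controls `age` and `rooms` are hypothetical examples rather than a recommendation for any particular dataset.

```stata
clear
set seed 1978
set obs 400
generate y81      = _n > 200
generate nearinc  = mod(_n, 2)
generate y81nrinc = y81 * nearinc
generate age      = floor(61*runiform())      // hypothetical control
generate rooms    = 4 + floor(6*runiform())   // hypothetical control
generate price    = 60000 + 18000*y81 - 19000*nearinc - 12000*y81nrinc ///
                    - 300*age + 5000*rooms + rnormal(0, 15000)

* The diff-in-diff estimate is still the coefficient on y81nrinc
regress price y81 nearinc y81nrinc age rooms
```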

@amartilianom 6 years ago

Thanks. Another question would be, it is not necessary to tell Stata we have Panel Data when we have already created the dummy variables that differentiate the control and treatment group, and the pre and post periods? No need to run a fixed effects regression too, I guess. I'm just learning about the subject :)

@sebastianwaiecon 6 years ago

For a simple DD like this, you don't need to use xtset, if that's what you're asking. You can actually think of a DD as a very simple sort of FE model that only has two groups and two periods. If you want to see more about FE, I also have a video on it.

@amartilianom 6 years ago

I really appreciate your responses. Keep helping us!

@cherrykhalil7481 6 years ago

Sebastian, thank you so much for this video. Does the data have to be in long shape? Is there a way to run the diff in diff regression on a wide dataset? Thank you.

@sebastianwaiecon 6 years ago

Yes, you can do it. Generate a new variable for the difference, then regress the difference on a dummy variable for the treatment group.
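
A small sketch of the wide-data approach, using four made-up units observed in two years. The variable names `y2008`, `y2013`, `treat` and `diff` are hypothetical.

```stata
* Wide data: one row per unit, one outcome column per year (toy values)
clear
input id treat y2008 y2013
1 0 10 12
2 0 11 14
3 1 10 15
4 1 12 18
end

* Change in the outcome for each unit
generate diff = y2013 - y2008

* The coefficient on treat is the diff-in-diff estimate:
* mean change for treated units minus mean change for control units
regress diff treat
```

Here the control changes are 2 and 3 and the treated changes are 5 and 6, so the coefficient on `treat` is 5.5 - 2.5 = 3.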

@cherrykhalil7481 6 years ago

Thank you very much! What about the interaction dummy between year and dummy? Given that my dataset is a balanced panel of 400 firms observed in both 2008 and 2013? Thanks again

@sebastianwaiecon 5 years ago

With the wide dataset, there's no interactions as you've already built it in by taking the difference ahead of time.

@jargodm 5 years ago

@@sebastianwaiecon Just to follow up on this, if you do have the same units before and after, the paired difference test gives a different result than the regression you discuss in the video: Y = b1 + b2*treat + b3*time + b4*treat*time, which assumes independent samples, does it not?

@sebastianwaiecon 5 years ago

I believe the estimate would be the same, but the standard error would be different.

@ssjvegeto4ever 3 years ago

Hi Sebastian, thanks a lot for the clean explanation! Could you tell me why you were including post-treatment levels of your covariates? Aren't they endogenous, and don't they thus result in bias? Thanks in advance!

@jackgandhi 3 years ago

I don't understand the question. What I showed here is the most basic version of diff in diff, with the bare minimum amount of variables needed. Even if I had added more variables, that would not have created any bias -- bias happens because you left variables out.

@ssjvegeto4ever 3 years ago

@@jackgandhi Thank you for the fast reply! Sorry, I meant the covariate data structure. I recently did a DiD setup making use of this video's data structure, and got the criticism that, since I included covariates with a time index for the post-treatment period in the regression, these were endogenous and would thus introduce bias.

@sebastianwaiecon 3 years ago

@@ssjvegeto4ever What you are describing is a common and valid criticism of time series analysis. The purpose of diff in diff is, if the data allows, solving this problem using a control and treatment group. The "post" dummy (y81 in the video) is not enough to establish a causal relationship. This is why we have the interaction term (y81nearinc in the video). In this video, y81 controls for effects over time that are constant across groups while nearinc controls for group effects that are constant over time. The interaction pulls out the estimated effect. This is not to say this method is perfect as there could still be endogeneity due to variables that are constant neither across groups nor across time, so you still may need to think about controls. The diff in diff method is just one tool in the analyst's toolbox.

@Muhammadilyas-ij6jh 3 years ago

Hello sir! I have a question...it looks like you first run a simple OLS regression and then you compute the differences using the collapse command. I do not understand whether to use just OLS regression and report the differences estimator (-18824) as the DID estimator. Please guide me..

@sebastianwaiecon 3 years ago

The number you gave estimates the difference between the treatment and control group before the treatment. We need to use the coefficient estimate for the interaction term to get the DID estimator.
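
For reference, a minimal sketch of the regression being discussed, run on simulated data rather than the video's dataset. The variable names `y81`, `nearinc` and `y81nrinc` follow the video; the numbers in the data-generating line are invented for illustration.

```stata
* Simulated pooled cross section (values are invented, not the video's data)
clear
set seed 2018
set obs 400
generate y81      = _n > 200              // 1 = after the treatment
generate nearinc  = mod(_n, 2)            // 1 = treatment group
generate y81nrinc = y81 * nearinc         // interaction term
generate price    = 80000 + 18000*y81 - 19000*nearinc ///
                    - 12000*y81nrinc + rnormal(0, 20000)

regress price y81 nearinc y81nrinc

* _b[nearinc]  : treatment-control gap before the treatment
* _b[y81]      : change over time in the control group
* _b[y81nrinc] : the diff-in-diff estimate
display _b[y81nrinc]
```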

@aung9211 5 years ago

Could you please show how to check the equal trend (parallel trend) assumption?

@sebastianwaiecon 5 years ago

Unfortunately, we can't do it with this dataset, since we don't have extra data on either side of the change.

@oluwaseunoginni9828 5 years ago

Please, how did you generate the interaction variable?

@sebastianwaiecon 5 years ago

Create an interaction term by multiplying the two variables you are interacting.
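
In Stata this is a single `generate` line. The sketch below applies it to a four-row toy dataset covering every combination of period and group; variable names follow the video.

```stata
* Toy data: all four combinations of period and group
clear
input y81 nearinc
0 0
0 1
1 0
1 1
end

* The interaction is 1 only for treated observations after the treatment
generate y81nrinc = y81 * nearinc
list
```

If the resulting variable is 0 in every row of a real dataset, that usually means no observation is both treated and in the post period, which is worth checking before running the regression.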

@nazda2007 3 years ago

Dear Sebastian, I am working on my dissertation using DiD. I included additional control variables in my model. However, the model suffers from heteroskedasticity and autocorrelation. How do I deal with them?

@sebastianwaiecon 3 years ago

You might want to look at my videos on heteroscedasticity.

@emilieriislarsen5134 6 years ago

Hi, Sebastian, thank you so much for your video. I was wondering if it's possible to do propensity score matching and difference in differences when my dependent variable is dichotomous?

@sebastianwaiecon 6 years ago

I can't comment on specifics as I've never combined all of these myself. However, both diff in diff and propensity score matching can be done with dichotomous dependent variables. You just need to be careful about the issues inherent in linear probability. See my video on binary choice models for details.

@myleswhitmore8803 3 years ago

Hi SebastianWaiEcon, I am a student at Morehouse College, and I really enjoyed watching your video. I need help running a diff in diff regression for my research paper. For context, I am using Stata to analyze NAFTA's impact on GDP and trade flow for its member nations. To facilitate this process, I will be running an individual diff in diff analysis for each country. My dummy variable will be years before 1994 (when NAFTA was signed) and after 1994. My DV will be GDP growth. And my extra variables will be looking at human capital, agriculture industry growth percentage, manufacturing growth percentage, and other variables. However, I struggle with the Stata platform and would like your advice to ensure this regression runs smoothly.

@sebastianwaiecon 3 years ago

The most important thing for diff in diff is to identify a control and treatment group. In your case, that might be countries that were part of NAFTA and countries that were not.

@amnashaukat7827 2 years ago

@@sebastianwaiecon Enjoying your video, but I need help. I have 25 countries and data from 1960-2020. How can I specify only one time, 2012, while comparing it with 2010-2016? Please help me.

@sebastianwaiecon 2 years ago

@@amnashaukat7827 A fixed effects model may be more appropriate: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-H95BHswbT3w.html&ab_channel=SebastianWaiEcon

@consultingfaqs 5 years ago

Could you please tell if we are using for example DHS data, which has data on demographics and health of a nation; but we want to see the effect of an external policy, like NREGA on labourforce participation of females ( the data for which is available in DHS). Then, should we merge NREGA data with DHS data, and then apply matching techniques to determine treatment and control groups? If not this, then how should we see the impact? Thanks

@sebastianwaiecon 5 years ago

This question is specific to data I don't have any experience with and is therefore outside the scope of this video.

@pneumascope 4 years ago

I note that you have large Standard Errors in your findings. Does this in any way have an impact on the reliability of the findings or the interpretation of the overall impact of the program (or incinerator in this case)?

@sebastianwaiecon 4 years ago

It's all relative when it comes to standard errors. You could say an SE of about 8000, as it is here, is large, but the estimate is -20,000. Standard errors are always going to be big numbers when dealing with things like the prices of homes, which are in the tens of thousands. All other things being equal, larger standard errors mean less precision in the estimates. Here, we can still be quite confident the incinerator did decrease property values.

@Maria-ny2mj 5 years ago

Hi! Nice video, thank you very much! I have a question: what do you do if there is time-varying treatment? In your example, imagine there is one neighbourhood (1) where the incinerator got built in '81 and another neighbourhood (2) where it got built in '82. Would it be reg price y81 y82 nearincneighbourdhood1 nearincneighorhood2 y81* nearincneighbourdhood1 y82*nearincneighorhood2? Something like that?

@sebastianwaiecon 5 years ago

You could also consider including interactions between y81 and neighborhood 2 and y82 and neighborhood 1. Once we get into more than 2 periods you should also be thinking of this as a fixed effects model. You may find my video on that helpful.

@Maria-ny2mj 5 years ago

@@sebastianwaiecon thank you very much! I will give a look to the video!

@achintyawidhi2299 4 years ago

Sir, what is the difference between xtreg and reg? If I use data from the years 2007 and 2014, should I use reg or xtreg? My dataset doesn't have the same units across 2007 and 2014.

@sebastianwaiecon 4 years ago

Reg is the basic regression command and xtreg is used for panel data methods such as within estimation and random effects. If you don't have the same units across years (pooled cross section), then you probably want to use reg.

@zdavirandimuhammad1515 2 years ago

Could you explain propensity score matching using Stata?

@frankzhao1678 3 years ago

Thank you so much, it is a great video. Could you please show me how to do a DiD with multi periods?

@sebastianwaiecon 3 years ago

Do you mean you have multiple periods before and after the change? It functions the same as this, but you need to define your "post" variable to include all periods after the change.

@frankzhao1678 3 years ago

@@sebastianwaiecon So if I have 2000-2010 data, and the policy happened in 2005. I need to set 2000-2004 equal to '0', and 2005-2010 equal to '1'?

@sebastianwaiecon 3 years ago

That would be the simplest way to do it. I'm not promising this is the perfect solution as you may need to think about more sophisticated ways to handle your specific data, but it is a good starting point.
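
A sketch of that coding for the 2000-2010 example with a policy in 2005. The variable names `year` and `post` are hypothetical.

```stata
* One row per year, 2000 through 2010
clear
set obs 11
generate year = 1999 + _n

* post = 1 from 2005 onward; the if-condition keeps missing years missing
* instead of coding them as 1 (Stata treats missing as larger than any number)
generate post = year >= 2005 if !missing(year)
list year post
```

The interaction for the regression is then `post` times the treatment-group dummy, exactly as with two periods.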

@dalemantey6028 6 years ago

Can you do a DD with logistic regression? Say I have a dichotomous outcome; for this example, it could be something like house sold (yes/no). Would it be similar Stata code, just changing "regress" to "logistic", or are there considerations within DD that might limit the statistical validity of that sort of analysis?

@sebastianwaiecon 6 years ago

The principles which drive DD -- controlling for time trends and cross sectional trends -- are still useful for logits (and probits also). However, you need to be careful about the coefficient interpretations, as it's not as clean as in the least squares DD. I would suggest looking at my video on binary choice models for details.

@sebastianwaiecon 6 years ago

For the code, yes, you can change "regress" to "logit" and it will run.

@dalemantey6028 6 years ago

Thank you!

@FanettiMazakura 6 years ago

Sebastian, what if I want to include id and time fixed effects in the regression? Do I only keep the interaction variable in the regression?

@sebastianwaiecon 6 years ago

Unlike FE models, diff in diff does not necessarily have the same cross-sectional units across time periods. In my example, it's not the same houses in '78 and '81. As such, ID-based FE won't work. Here, the nearinc variable plays the same role as the FE. Your time dummy is already in there in DD.

@FanettiMazakura 6 years ago

Yes, I get that. I have unbalanced panel data and I want to conduct a Difference-in-Differences with id and time fixed effects. Is // xtreg DepVar i.treated##i.during controls i.month , fe cluster(id) // the correct model to achieve that? Or do you think that it would be better to exclude the fixed effects?

@sebastianwaiecon 6 years ago

If I'm understanding what you're trying to do correctly, I think you can include the fixed effects.
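
A runnable sketch of a model along the lines of the one quoted above, on a simulated balanced panel. The names `treated`, `during`, `month` and `id` come from that comment; the outcome `y` and all the numbers are hypothetical, and the sketch omits extra controls.

```stata
* Simulated panel: 50 units observed for 12 months (invented values)
clear
set seed 99
set obs 50
generate id      = _n
generate treated = id > 25
expand 12                                  // 12 rows per unit
bysort id: generate month = _n
generate during  = month >= 7              // treatment period
generate y = id/10 + 0.1*month + 2*treated*during + rnormal()

xtset id month
xtreg y i.treated##i.during i.month, fe vce(cluster id)

* Stata reports some terms as omitted here, which is expected:
* treated does not vary within a unit, so the unit fixed effects absorb it,
* and during is a combination of the month dummies.
* The diff-in-diff estimate is the coefficient on 1.treated#1.during.
```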

@motnaichuoiktnb 6 years ago

Firstly, thank you for your video, which is very helpful. As you mentioned in your comment that it was not the same houses in '78 and '81, does that mean your treatment and control groups are not the same pre- and post-treatment?

@sebastianwaiecon 6 years ago

The criterion for being in the control or treatment group is the same in both years, but the specific houses aren't the same.

@PolicywithJazzy 4 years ago

Hey! How do I generate a variable that separates the years?

@sebastianwaiecon 4 years ago

In this dataset, that is y81 -- a dummy variable with a 1 for 1981 and 0 otherwise. I have another video with some examples of how to create dummy variables: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-DuAhUpM-56E.html

@pujiannauli 6 years ago

What if the p-value after the regression for the time dummy*group dummy is not significant? How do I fix this? Thank you so much.

@sebastianwaiecon 6 years ago

You don't "fix" it, it's just the result you got. It tells you that you can't reject the hypothesis that your treatment had no effect. Now, it could be that you have some endogeneity that you need to control for, but statistical significance, or lack thereof, is not (by itself) a problem to be fixed.

@consultingfaqs 5 years ago

@@sebastianwaiecon Hi, if the interaction term is insignificant, will adding more variables help us get a significant result? The results show that the constant term is highly significant, which means that there is an omitted variable bias. I guess adding more controls can help solve the problem of the insignificant interaction term.

@sebastianwaiecon 5 years ago

@@consultingfaqs It bears repeating that the treatment not being significant is not a "problem" to be solved unless you think this is because of an omitted variable. Tinkering around with different models with the explicit purpose of finding a significant effect is not an ethical use of data. The constant term being highly significant is also not evidence of omitted variables. I'm not sure where you got that idea. Adding more variables might or might not result in existing terms being more significant. It all depends on the direction of the bias, if there is one.

@vaishalisharma6519 5 years ago

Hello sir. How to create the dummy for near inc. The actual command?

@sebastianwaiecon 5 years ago

nearinc indicates whether the house is within 3 miles of the incinerator. There is a variable called "dist" which is the distance from the incinerator in feet. To create the dummy, we would use the command: gen nearinc = dist
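
The command in the reply above appears to be cut off after `dist`, most likely at a comparison operator. A sketch of what a complete command of this kind could look like, assuming the cutoff is the 3 miles mentioned in the reply converted to feet (3 × 5,280 = 15,840) and assuming "within" means less than or equal to; the exact expression used in the video is not confirmed here.

```stata
* Toy distances in feet (made-up values)
clear
input dist
5000
15840
20000
end

* 1 if the house is within 15,840 feet (3 miles), 0 otherwise;
* the if-condition leaves nearinc missing when dist is missing
generate nearinc = dist <= 15840 if !missing(dist)
list
```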

@MrLi1231 3 years ago

Hi Sebastian, thank you so much. Quick question. Is this dataset a panel, or two separate cross section datasets? I am assuming it is two separate cross section, right?

@sebastianwaiecon 3 years ago

You are correct. It's a pooled cross section. It would be very unlikely for the same houses to be on the market in both years.

@MrLi1231 3 years ago

@@sebastianwaiecon Good point and thank you so much for the quick reply! I am working on a thesis and realised that I was supposed to be doing DiD when I had been using a different methodology for the last few weeks. Your video is incredible. Big thanks from Australia!

@sebastianwaiecon 3 years ago

Happy to help!

@nip5554 5 years ago

Hi what if I want to control for additional variables? Then the command "collapse (mean) y, by(after treatment) " is not sufficient. Please tell me what to do to control for variables.

@sebastianwaiecon 5 years ago

You can add control variables, but you'll have to run a regression rather than using the collapse method.
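
For comparison, a sketch of the collapse method on a tiny made-up dataset, using the video's variable names. It reproduces the diff-in-diff from the four cell means, which is why it has no room for controls: there is nothing to hold constant once the data are reduced to four numbers.

```stata
* Toy data: two observations in each period-by-group cell
clear
input y81 nearinc price
0 0 80000
0 0 84000
0 1 62000
0 1 64000
1 0 100000
1 0 102000
1 1 70000
1 1 72000
end

* One mean per cell, sorted so the row order is known:
* row 1 = (0,0), row 2 = (0,1), row 3 = (1,0), row 4 = (1,1)
collapse (mean) price, by(y81 nearinc)
sort y81 nearinc
list

* (treated after - treated before) - (control after - control before)
display (price[4] - price[2]) - (price[3] - price[1])   // displays -11000
```

Running `regress price y81 nearinc y81nrinc` on the uncollapsed data gives the same number as the interaction coefficient, and that regression is where controls can be added.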

@nip5554 5 years ago

@@sebastianwaiecon Thanks :)

@vojtechkolar5897 1 year ago

Hey, I kind of understand diff in diff, but now I am dealing with a problem: what if the control is at way larger levels than the treatment? Let's say control before: 100, after: 200 = 100% increase; treatment before: 5, after: 9. If I calculate the DID effect using the standard table, i.e. the difference between the differences, I get in this case 100-4 = 96! So the counterfactual state of the world for the treatment group would be 105?! That does not make sense, no? Even R with OLS gives me these results. What am I doing wrong? Thank you.

@vojtechkolar5897 1 year ago

I get that I can solve this problem by working with a log-level model. But isn't this always a problem with level-level diff in diff? What am I missing?

@sebastianwaiecon 1 year ago

You can do diff in diff with the dependent variable in logs. That's no problem as long as you are careful with the percentage change interpretation.
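
Worked through with the numbers from the question (control 100 to 200, treatment 5 to 9), the two specifications give the following, taking the estimate as the treated change minus the control change:

```latex
\begin{aligned}
\text{Levels: } & (9 - 5) - (200 - 100) = 4 - 100 = -96 \\
\text{Logs: }   & (\ln 9 - \ln 5) - (\ln 200 - \ln 100)
                  = \ln 1.8 - \ln 2 \approx 0.588 - 0.693 = -0.105
\end{aligned}
```

In levels, the counterfactual for the treated group is 5 + 100 = 105, which is the figure the question finds implausible. In logs, the counterfactual is that the treated group would also have doubled (to 10), so the estimate says it grew by roughly 10.5 log points less than that.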

@thanhtoba1464 5 years ago

Thank you for your helpful sharing. When I run the command "corr(y81 nearinc y81nrinc)" to test the autocorrelation between variables, the result shows there is an autocorrelation between the "nearinc" and "y81nrinc" variables. The coefficient of correlation is 0.5776. So my question is: what should we do in this situation?

@sebastianwaiecon 5 years ago

First of all, "autocorrelation" is a very specific term, which you are using incorrectly. In time series data, this refers to a variable correlating with itself across time. In any case, you've pointed out that an interaction term is correlated with one of the variables you are interacting. This is true by definition. There isn't anything you do about that -- it would be strange if it were not the case. In a more general sense, there is nothing wrong with two variables in a regression being correlated with each other. That is completely normal and probably the case in most regressions.

@thanhtoba1464 5 years ago

Thank you for pointing out my problem. You are right, it was my fault in using the term "autocorrelation". What I really mean is the "multicollinearity" but there was a mistake in typing. Anyway, according to the data in the video, the truth is "multicollinearity" really happens in the regression result because the coefficient of correlation between " nearinc" and "y81nrinc" variables is 0.5776. Usually, in the case of encountering "multicollinearity", we usually omit one of the two variables out of the model. However, it is impossible to omit any variable of these two variables due to the requirement of "Difference in difference" method because they must be included together to show the effect of the construction of the incinerator. That is why I asked the question "what should we do in this situation". And this problem not only happens in this example, but it also occurs in every "DID" model because we usually create a "did" variable by multiplying the "time" and "treated" variables (did = time * treated). And the consequence is there always is "multicollinearity" in "DID" model. Can you help me to solve this issue?

@sebastianwaiecon 5 years ago

Multicollinearity is not a big deal. Getting into the practice of dropping variables because they are correlated with another variable in the model will lead you quickly into omitted variable bias. There is a simple test where you regress the one variable you are concerned about on all the other explanatory variables. If the R-squared is under 0.9, don't worry about it. As I explained previously, it is mathematically impossible for a variable and an interaction term involving it to be uncorrelated. The interaction term is absolutely key to a diff in diff regression.
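
As a rough sketch of the check described in this reply, assuming the KIELMC data from the video is saved as kielmc.dta in the working directory (the file location is an assumption), the interaction term can be regressed on the other explanatory variables and the R-squared inspected:

```stata
* Assumes kielmc.dta is in the working directory (path is an assumption)
use kielmc.dta, clear

* Auxiliary regression: the variable of concern on the other regressors
regress y81nrinc y81 nearinc

* R-squared of the auxiliary regression; the rule of thumb in the reply
* treats values under 0.9 as nothing to worry about
display "Auxiliary R-squared = " e(r2)

* Related built-in diagnostic after the main regression:
* VIF = 1 / (1 - auxiliary R-squared), so R-squared of 0.9 means VIF of 10
regress rprice y81 nearinc y81nrinc
estat vif
```

The auxiliary R-squared and the VIF carry the same information, so either one is enough in practice.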

@thanhtoba1464 5 years ago

@@sebastianwaiecon Thank you very much for the explanation.

@gregorychung9421 4 years ago

@@sebastianwaiecon Hello, I found this video very helpful. However, when running my model, my DID variable keeps getting dropped because of collinearity. Is there a fix to that?

@Bibirallie 2 years ago

What if there are multiple before and after variables, but not one conclusive before and after or year variable?

@sebastianwaiecon 2 years ago

You may want to consider a fixed effects model instead.

@ariagalit1875 5 years ago

Hi. My data ranges from 2009 to 2018, and I have both treatment and comparison groups. I just want to ask whether DID, as you did it in the video, is applicable. I am not very familiar with the method or with Stata, actually.

@ariagalit1875 5 years ago

And how come the interaction variable is all zero?

@sebastianwaiecon 5 years ago

You can do DID if you set up a dummy variable to indicate when the treatment went into effect. Once this is in place, you can create the interaction term.
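
A minimal sketch of that setup for a multi-year panel. The variable names (outcome, year, treated) and the treatment year 2014 are hypothetical, chosen only for illustration:

```stata
* Hypothetical variables: outcome, year, treated (1 = treatment group)
* Assumption for illustration: the treatment took effect in 2014

* Time dummy: 1 for every observation from 2014 on, in BOTH groups
generate post = (year >= 2014) if !missing(year)

* Interaction term: 1 only for treated units in the post period
generate did = post * treated

* The coefficient on did is the difference in differences estimate
regress outcome post treated did
```

If the interaction comes out as all zeros, it usually means one of the two dummies is never 1 for the same observations as the other, which the data browser will show quickly.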

@ariagalit1875 5 years ago

Thanks much for your reply sir

@manojsapkota4880 4 years ago

Hello sir, I am interested in DID and want to know the command to run a DID regression in Stata.

@sebastianwaiecon 4 years ago

It's all in the video.

@raulfotso4032 3 years ago

Good morning to all. Please, I want to know how to do a Fairlie decomposition. I am a student lecturer at the University of Douala.

@peterdastan1288 1 month ago

Does that mean house prices near the garbage incinerator declined by an average of 21.13%?

@zdavirandimuhammad1515 2 years ago

Hi, thank you for the explanation. Could we request the data so we can also practice?

@sebastianwaiecon 2 years ago

This is the dataset KIELMC.dta from the Wooldridge econometrics textbook. It is widely available online.

@zdavirandimuhammad1515 2 years ago

@@sebastianwaiecon Thank you, and thanks also for kindly replying to my message. God bless. Stay safe, stay healthy.

@johnkaimenyi9292 2 years ago

Hello, is DID regression possible in STATA 15.0?

@sebastianwaiecon 2 years ago

I'm not aware of any changes in recent versions of Stata that would change anything in this video.

@antoniomastrandrea967 4 years ago

Hi Sebastian, thank you for your video! I've two questions: 1) What should I do if the FE variables (time and individual) are not significant? (I mean p-value > 0.1) 2) Do I have to take care of R squared in this case? Thank you!

@sebastianwaiecon 4 years ago

1) If what you're after is measuring the treatment effect, this doesn't matter. 2) I don't know what you mean by "take care," but R squared is not particularly relevant in DID estimation.

@alexbrunofmn 5 years ago

When was the incinerator built?

@sebastianwaiecon 5 years ago

According to the original paper, construction took place from 1981-1984.

@DX-nh8qc 2 years ago

May I know how to enter control covariates in Stata?

@jamesleleji9470 2 years ago

How can you do DID using SPSS or R programming? Thanks.

@sebastianwaiecon 2 years ago

The idea will be the same -- create dummy variables for treatment and time and an interaction, then put those in a regression.

@narlikar78 5 years ago

Sir, another question in this regard, and I humbly request your attention at the earliest. Suppose I have a panel data set of 75 banks for 5 years (pre-merger), which have merged to become 30 banks (also for 5 years post-merger), and I have been able to establish, using all the standard panel data tests, viz. the F-test, the BP-LM test, and Hausman (1978), that it is a fixed effects model. Given that my dependent variable is an index of inclusion (whose values lie between 0 and 1), while all other independent variables are metric data from the balance sheets of banks, with a time dummy (0 for pre-and post merger), can I run a panel Tobit model, knowing that it is a fixed effects model? I use Stata 14 for my econometric model testing. I have been told that panel Tobit can be done only for a random effects model. My problem is that my dependent variable has a truncated range. Please guide me asap.

@sebastianwaiecon 5 years ago

Mechanically, you can do it with dummy variables (see my fixed effects video). While I am not aware of a specific reason you should not do so, I don't know enough to definitively tell you one way or another.

@narlikar78 5 years ago

Can we have the dataset used in the video, to try the results again ourselves?

@sebastianwaiecon 5 years ago

The dataset is KIELMC.dta that comes with the Wooldridge econometrics book. You should be able to find it online.

@Diana-mo6mg 4 years ago

If you used logprice instead of price, would the coefficient be different?

@sebastianwaiecon 4 years ago

Yes, it would. See my video on natural logarithms for how that would work.
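
A minimal sketch of the log version, assuming kielmc.dta is in the working directory. The variable name ln_rprice is introduced here for illustration and is not from the video:

```stata
* Assumes kielmc.dta is in the working directory (path is an assumption)
use kielmc.dta, clear

* Log of the real house price (ln_rprice is a name chosen for this sketch)
generate ln_rprice = ln(rprice)

* Same DID regression, with the log outcome
regress ln_rprice y81 nearinc y81nrinc

* With a log outcome the DID coefficient is approximately a proportional
* change; the exact percentage change is 100 * (exp(b) - 1)
display 100 * (exp(_b[y81nrinc]) - 1)
```

With price in levels the coefficient is in dollars; with the log it reads as a percentage effect, so the two numbers are not directly comparable.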

@BrickTemplar 4 years ago

Hi Sebastian, I wonder what we have to do if the effect is spread over the years, say, treatment was implemented in one year for the firms in one industry and the next year for another? Say, over three decades, the U.S. authorities have gradually cut import tariffs on a large variety of goods and services. CUT=1 if this happened, 0 otherwise. The equation will have the form Investment = b1*tariff CUT + b2*lagged controls + industry FE etc., clustered by industry-year. I do not understand what I have to add to a simple regression to make it diff-in-diffs in this case... Dummy CUT interacted with what?

@BrickTemplar 4 years ago

Or, as in your example, suppose the incinerator had been installed for one neighborhood in 1981, for another in 1985, and so on, for another in 2005... The y81 time dummy won't work anymore, so what do we have to interact?

@sebastianwaiecon 4 years ago

You'll need a dummy variable that "turns on" from a 0 to a 1 once the treatment is active. You won't be able to do this by building an interaction term, as it's more complex than that now. I'm not sure there's a better way than putting in the 1s on a case by case basis.
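
One possible way to build such a "turns on" dummy without typing the 1s by hand, sketched under the assumption that the data contain a variable holding each unit's first treated year. All variable names here (id, year, first_treat, outcome) are hypothetical:

```stata
* Hypothetical variables:
*   id          = unit identifier (industry, neighborhood, ...)
*   year        = calendar year
*   first_treat = first year the unit is treated (missing if never treated)
*   outcome     = dependent variable

* 1 from the unit's own treatment year onward, 0 before it and
* 0 in every year for never-treated units
generate treat_on = (year >= first_treat) & !missing(first_treat)

* Unit and year dummies take the place of the single treated and
* post dummies; clustering by unit is an assumption of this sketch
regress outcome treat_on i.id i.year, vce(cluster id)
```

This is the dummy-variable form of a fixed effects model, so the coefficient on treat_on plays the role that the interaction term plays in the two-period case.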

@GHSHAH 1 month ago

How do I interpret the interaction term, and how do I check whether it is significant or not? Please reply fast.

@KIMKIM-bt6hr 3 years ago

Good morning. I am a student working with the DID model. Thanks to your DID explanation, I was able to complete my assignment smoothly. But yesterday, the professor asked, 'Why was the control variable excluded?', and I couldn't actually answer. After class, the professor gave me a separate assignment: put the control variable in and analyze it again. I want to use Stata again. But how do I add a control variable to the regression in this video? Could you please advise which code to enter?

@sebastianwaiecon 3 years ago

You can simply add control variables to the DID regression, if you want.

@KIMKIM-bt6hr 3 years ago

@@sebastianwaiecon I'm a STATA beginner, so can you explain a little bit more about where to put this part?
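
As a sketch of where the controls go, assuming kielmc.dta is in the working directory: they are simply listed after the three DID terms in the same regress command. The particular controls below are variables in KIELMC picked for illustration, not a recommendation from the video:

```stata
* Assumes kielmc.dta is in the working directory (path is an assumption)
use kielmc.dta, clear

* Basic DID regression: outcome, time dummy, treatment dummy, interaction
regress rprice y81 nearinc y81nrinc

* Same regression with house characteristics appended as controls
* (age, agesq, rooms, area, land, baths: an illustrative choice)
regress rprice y81 nearinc y81nrinc age agesq rooms area land baths
```

The coefficient on y81nrinc is still the DID estimate in the second regression; the controls only change what is held fixed when it is estimated.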

@monikasrivastava5565 3 years ago

What are the steps to generate the result? Why have you not shown them? Please, I really need to know how to do it.

@jaredgreathouse3672 6 years ago

What if your data have multiple units that are treated and untreated at the same time? There, a clean post period makes no sense. If city 1, for example, is being treated at time t, but cities 2 and 4 aren't, and the next year city 3 is being treated, and so on, wouldn't you just use a treatment##time variable?

@sebastianwaiecon 6 years ago

For that, you might want to look into a full fixed effects model. I have a video on that, as well.

@ashishstat 3 years ago

Can I have the link to the dataset used in this video?

@sebastianwaiecon 3 years ago

It's KIELMC.dta, which comes with the Wooldridge econometrics textbook. You should be able to find it online.

@himaep_agungkrisyana1013 2 years ago

Can I get the Stata do-file for this?

@popi20101 3 years ago

What if we add more than one control variable, not only nearinc?

@sebastianwaiecon 3 years ago

You are always allowed to add controls if you think the DD method did not eliminate endogeneity.

@popi20101 2 years ago

And if we have a 5-year period, 2007 to 2011, and the policy is announced in 2009, how do we set the year variable?

@lateralus5117 6 years ago

Hello, I ran into a problem when running my regression. My regression looks like this: "regress DepVar post_tr_yr treat_group treat_groupXpost_tr_yr", where post_tr_yr is a dummy for year>2007. However, my interaction term (treat_groupXpost_tr_yr) gets omitted due to collinearity. Is this a problem?

@sebastianwaiecon 6 years ago

I always recommend you go to the data browser and take a look at the values. Presumably something went wrong in your variable generation.

@Ilaay23 6 years ago

I also have this problem. My interaction term is omitted due to collinearity, does anyone know how you can fix this?

@bencaplan4565 6 years ago

I have the same issue - what sort of issue in the variable generation can result in this?

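@xMooshy 5 years ago

@@bencaplan4565 For the time dummy, the control group should also get a 1 in the post period, even though it is never treated at all. If only the treated group gets a 1, the time dummy is identical to the interaction term.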
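
A quick way to check for this, sketched with the variable names from the question earlier in this thread (DepVar, treat_group, post_tr_yr, treat_groupXpost_tr_yr) and assuming the data also contain a year variable:

```stata
* Cross-tabulate the two dummies: all four cells should be non-empty.
* An empty "control group, post period" cell means the time dummy was
* only switched on for treated units.
tabulate treat_group post_tr_yr

* Rebuild the dummies so the time dummy depends on the year alone
capture drop post_tr_yr treat_groupXpost_tr_yr
generate post_tr_yr = (year > 2007) if !missing(year)
generate treat_groupXpost_tr_yr = treat_group * post_tr_yr

regress DepVar post_tr_yr treat_group treat_groupXpost_tr_yr
```

If the interaction is still omitted after this, the data may simply contain no control observations after 2007, in which case the DID effect cannot be estimated from them.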

@mathewchandy9588 4 years ago

Is heteroscedasticity ever an issue when you conduct a difference-in-difference analysis?

@sebastianwaiecon 4 years ago

Yes, it is. In this example, you could imagine there might be a difference in the variance of prices with and without the incinerator.

@mathewchandy9588 4 years ago

@@sebastianwaiecon Then to solve this, would you add the vce(robust) option at the end of your regression?

@sebastianwaiecon 4 years ago

I can't think of a theoretical reason why you couldn't do that. To be honest, I think most people just use robust all the time and don't really think about it.
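
For reference, a minimal sketch of the robust version of the regression from the video, assuming kielmc.dta is in the working directory. In Stata this is an option placed after the comma, not a separate command:

```stata
* Assumes kielmc.dta is in the working directory (path is an assumption)
use kielmc.dta, clear

* Same DID regression with heteroskedasticity-robust standard errors
regress rprice y81 nearinc y81nrinc, vce(robust)
```

The coefficients are identical to the non-robust regression; only the standard errors, t statistics, and confidence intervals change.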

@jodieteague8254 4 years ago

Could you then graph this in Stata?

@sebastianwaiecon 4 years ago

Yes. You would do this after running the collapse to get all the averages. The "classic" diff in diff graph has the outcome on the vertical axis and time on the horizontal axis. There are three lines: the treated group, the untreated group, and a counterfactual with the same starting point as the treated group but the same slope as the untreated group. See my video on graphing for how to use the twoway command.
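
A sketch of one way to draw that graph, assuming kielmc.dta is in the working directory and the code is run from a do-file (the /// line continuations need one). The counterfactual is built here as the treated group's starting average plus the untreated group's change, which is the usual parallel-trends construction; the variable name counterfactual is chosen for this sketch:

```stata
* Assumes kielmc.dta is in the working directory (path is an assumption)
use kielmc.dta, clear

* Average real price for each group in each year (four rows remain)
collapse (mean) rprice, by(year nearinc)

* One row per year: rprice0 = far from incinerator, rprice1 = near
reshape wide rprice, i(year) j(nearinc)
sort year

* Counterfactual: starts at the near group's first-year average and
* then moves by the far group's change
generate counterfactual = rprice1[1] + (rprice0 - rprice0[1])

twoway (connected rprice1 year) ///
       (connected rprice0 year) ///
       (connected counterfactual year, lpattern(dash)), ///
       xlabel(1978 1981) ytitle("Average real house price") ///
       legend(order(1 "Near incinerator" 2 "Far from incinerator" 3 "Counterfactual"))
```

The vertical gap between the near line and the dashed line in the second year is the DID estimate from the basic regression without controls.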

@jodieteague8254 4 years ago

@@sebastianwaiecon Thank you will do!

@fgghdfg8638 3 years ago

Hi professor, I hope you are doing well. I'm a follower on YouTube. Can you help me with an assignment on the difference in differences method? I haven't found a topic or data I can use, and I must complete it or I will have to repeat the year. I have slept only 3 hours a night for more than 3 weeks just because of this project. Can you help me? If you want, I can pay you for your help.

@sebastianwaiecon 3 years ago

I recommend you ask your professor for help - it's what they're there for!