Thank you so much! I am a grad student trying to work with survey data and this helped immensely! Your video saved me a solid three hours of time I would have spent making mistakes in Stata.
Thank you so very much. 🙏🏾 I'm helping a friend with her master's, and this has really helped me understand the material. After finishing this section, I'm able to move forward and help her move on. Cheers mucho
Thank you for this very precise and succinct lecture. A quick question: I saw positive coefficients when the base was "No Diploma" and negative coefficients when it was "Graduate". Could you interpret one of the coefficients from either regression?
The reason for this is that "no diploma" is the lowest income group. Any departure from that group would be associated with an increase in income. For example, the estimate for the bachelor's degree tells us how much more income bachelor's degree holders have above those with no diploma. Graduate, on the other hand, is the highest income group (people with PhDs, MBAs, and so forth). Here, people with (only) bachelor's degrees have less income on average than the base group.
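A minimal sketch of that interpretation, using Stata's built-in auto dataset as a stand-in because the video's income and education data aren't available here. The variable rep78 plays the role of the education category; that substitution is an assumption for illustration only. With only the categorical variable on the right-hand side, each coefficient is exactly the difference in mean outcome between that group and the base group, so switching the base (with the ib#. prefix) changes which difference each coefficient reports.

```stata
sysuse auto, clear

* Default base: the lowest level (rep78 == 1)
regress price i.rep78
scalar b3_base1 = _b[3.rep78]    // mean price of group 3 minus mean of group 1

* Same model with level 5 as the base, set with the ib#. prefix
regress price ib5.rep78
scalar b3_base5 = _b[3.rep78]    // mean price of group 3 minus mean of group 5

display "3 vs 1: " scalar(b3_base1) "   3 vs 5: " scalar(b3_base5)
```

If group 5 has the highest mean, every coefficient in the second regression is negative, which is the same pattern as the "Graduate" base in the video.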
Hi, how can I apply this if my categorical variable is already numeric (0, 1, 2) and I need to regress it with a continuous variable? Stata doesn't know what the values correspond to, but 0 = Conservative, 1 = Labour, and 2 = Other.
Hi, thank you for the video, very helpful! Now that you have the "educcodes" variable, can you drop "educcategory", or should you keep it?
There is no particular reason to drop it, but it isn't needed for the "educcodes" variable to work once it has been created. You might want to keep it in case you want to do something else with it later, though.
I only bought Stata yesterday and am using it for the first time for my MSc dissertation. This was so helpful. When using the label define command, I assume you have to type the value names exactly as they appear when you tab the variable. What if a name has a space in it, e.g., NO DIPLOMA instead of NODIPLOMA? My variable has a space in it, and I got a syntax error when I tried to use label define on it.
Found the answer to my own question by playing around with Stata. When entering a name such as NO DIPLOMA, make sure you put it in quotes: 1 "NO DIPLOMA"
@mussahemed1153 Yes, you got it. You had this problem because spaces are used as delimiters in Stata commands, similar to how commas are used in Excel functions. So any time you have text with a space in it, you need quotation marks.
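A small sketch of the fix, with a toy dataset typed in by hand (the four category names and the label name educlbl are hypothetical; educcategory and educcodes follow the names used in the thread). Labels containing spaces go in double quotes, and encode's label() option reuses that mapping, so each string must match its label exactly, spaces and capitalization included.

```stata
clear
input str20 educcategory
"NO DIPLOMA"
"HIGH SCHOOL"
"BACHELORS"
"GRADUATE"
end

* Quotes let a label contain spaces; the numbers fix the order of the codes
label define educlbl 1 "NO DIPLOMA" 2 "HIGH SCHOOL" 3 "BACHELORS" 4 "GRADUATE"

* encode matches each string to the existing label, so "NO DIPLOMA" becomes 1
encode educcategory, generate(educcodes) label(educlbl)
tabulate educcodes
```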
Thanks so much for this clear explanation. I am now doing a PhD, and your videos help me a lot with my analysis. Can I use "i." in multiple regression? And is there any difference between Stata 13 and 14 in how the "i." prefix creates dummy variables from the variables we use? Please help me.
You can use the "i." structure in any regression. It definitely works the same in Stata 14 and 15. I haven't used 13 in a very long time, so I can't remember for sure.
@sebastianwaiecon Thanks so much for the prompt reply. I tested it with a logistic regression in Stata 14 on a friend's computer (I only have Stata 13), and it works, but the same command from the do-file did not work in Stata 13. When I run my Stata 14 do-files in Stata 13, the "i." structure does not work! So, again: does the "i." structure work the same way in logistic regression, and are do-files the same across Stata 13, 14, and 15? I appreciate this free online teaching!
It shouldn't make any difference using a logit. The "i." factor-variable notation has been part of Stata since version 11, so it should also work in Stata 13; if the do-file fails there, the cause is more likely something else, for example a dataset saved by Stata 14, which Stata 13 can't open unless it was saved with saveold. From version to version, usually not a whole lot changes, but I'd recommend upgrading to Stata 15 at this point.
Hi, at the end of the video you say "we have the same exact numbers" when you compare the last two regressions. But the coefficients changed, and not only from positive to negative. I tried this with other data, and my coefficients also change when I change the reference category; more importantly, the significance changes too. I have a variable with 4 categories: when I choose category 1 as the reference, I get some significant results (p-values), but when I choose category 3 as the reference, none of the results are significant. Could you help me understand why? And should we then use the reference category that gives the most interesting results?
The coefficients change because the base group changed. The coefficients always give you the difference from the base group. Any predictions you make will be mathematically identical. The significance would change because you're doing a different test when you change the base group - it's a comparison between different groups now.
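A sketch of this point, again using Stata's built-in auto dataset as a stand-in (rep78 substitutes for the 4-category variable, and mpg is an arbitrary control; both are assumptions for illustration). Switching the base from 1 to 3 leaves the fitted values unchanged; the group 3 vs. group 1 comparison reappears with its sign flipped and the same p-value; and the joint test that all groups are equal gives the same F statistic. Only the set of pairwise comparisons reported against the base changes.

```stata
sysuse auto, clear

* Model A: default base (rep78 == 1)
regress price i.rep78 mpg
predict yhat_a, xb
scalar b3_a = _b[3.rep78]                  // group 3 vs group 1
test 2.rep78 3.rep78 4.rep78 5.rep78       // joint test: all groups equal?
scalar F_a = r(F)

* Model B: same regressors, base switched to rep78 == 3
regress price ib3.rep78 mpg
predict yhat_b, xb
scalar b1_b = _b[1.rep78]                  // group 1 vs group 3
test 1.rep78 2.rep78 4.rep78 5.rep78
scalar F_b = r(F)

* Fitted values agree; the 1-vs-3 comparison only flips sign
compare yhat_a yhat_b
display "3 vs 1 in A: " scalar(b3_a) "   1 vs 3 in B: " scalar(b1_b)
display "joint F in A: " scalar(F_a) "   joint F in B: " scalar(F_b)
```

Because the joint test doesn't depend on the base, it can be a useful companion to the individual p-values whichever category you choose as the reference.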
I am running a regression in Stata with ROA (return on assets) as the dependent variable and greenhouse gas (GHG) emissions as the independent variable, plus 4 control variables. I also want to control for industry. For example, firms in the industrial sector will typically have higher GHG emissions than firms in the health care sector. Would this be the way to control for industry? If not, is there a way to do so? Thanks
You can put the numeric variable directly in the regression. If you want to use a numeric variable as a categorical variable, you can still do that. You skip the step of encoding, and just use the "i." structure.
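A sketch using Stata's built-in nlsw88 dataset, whose industry variable is already stored as numbers (1 to 12) with value labels; the wage and tenure regression is only an illustration, not the ROA/GHG model. Adding i.industry puts in one dummy per industry except the base, so the other coefficients are estimated net of average industry differences. For a 0/1/2 variable like the party example above, the commented lines show the same idea; the names party and y there are hypothetical.

```stata
sysuse nlsw88, clear

* industry is numeric with value labels attached, so no encode is needed
tabulate industry, nolabel

* i. turns the numeric codes into industry dummies (lowest code = base)
regress wage i.industry tenure

* Same idea for a 0/1/2 variable (hypothetical names party and y):
*   label define partylbl 0 "Conservative" 1 "Labour" 2 "Other"
*   label values party partylbl
*   regress y i.party          // 0 = Conservative is the base group
```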
Hi, I have an issue. I am trying to convert a string variable (called stage) to a numeric variable. I used the encode command and created a new variable called stage_cat, but when I tabulate stage_cat I get no observations, and listing stage_cat also shows it as empty. It looks like encode didn't work. How can I fix this?
Can I use a dummy variable as my dependent variable with the "i." prefix? I tried it, but I get an error message saying "depvar may not be a factor variable". What can I do?
You can use a dummy variable (i.e., the values must all be zeros or ones) as the dependent variable; just leave the "i." off it. Factor-variable notation isn't allowed on the dependent variable, which is what that error message means. See my video on this: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-vRKesKWMCsg.html There are special regression tools, such as the multinomial logit, that allow more complex categorical variables as the dependent variable, but I haven't covered those in videos.
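A sketch with Stata's built-in nlsw88 dataset, where union is already coded 0/1 (the choice of race and age as regressors is only for illustration). The dependent variable goes in as-is, and "i." is used only on the right-hand side. The multinomial logit mentioned above is the mlogit command.

```stata
sysuse nlsw88, clear

* union is already 0/1, so it goes in without i.
* (writing i.union on the left is what raises the depvar error)
logit union i.race age

* A linear probability model uses the same 0/1 variable with regress
regress union i.race age
```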
You can force Stata to omit the constant using the noconstant option. This will remove the collinearity and allow all categories to be in the regression. However, be careful about the interpretation and what statistical significance would mean in this new context. I personally wouldn't advise doing what I'm suggesting but that is the answer to your question.
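A sketch of what that looks like, using Stata's built-in auto dataset as a stand-in (rep78 as the categorical variable is an assumption for illustration). The ibn. prefix keeps every level with no base, and noconstant drops the intercept. Each coefficient then equals that group's mean, and its t-test asks whether that mean is zero, not whether it differs from another group, which is the interpretation caveat above.

```stata
sysuse auto, clear

* ibn. = no base level; noconstant removes the intercept
regress price ibn.rep78, noconstant

* Compare: each coefficient above equals the group mean shown here
tabstat price, by(rep78) statistics(mean)
```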
Hi. If I have values such as "grade 1", "grade 2", "grade 3", "grade 10", "grade 11", etc. in a variable named "Education", and I want the lower grades (grade 1 to grade 7) to be called "Primary School" and the higher grades (grade 8 to grade 11) to be called "High School", how do I code that in Stata?
I came here trying to find an answer to the same question. I was able to group my grade levels, but not in the way I wanted. I have grade levels 7-11 and want to group grades 7-9 as "lower school" and grades 10-11 as "upper school". My command grouped grades 7-8 as "lower school" and grades 9-11 as "upper school", and I'm still trying to figure out how to move grade 9 into the lower school category. Anyway, try this and see if it works:
gen schoollevel = recode(education, 1, 2)
label define schoollevels 1 "Primary School" 2 "High School"
As I mentioned, this worked for me, but not in the order I wanted. Hope it helps in some small way.
You can do this with logical operators, in this case the "pipe" (|), which means "or." For your example, gen primaryschool = Education == "grade 1" | Education == "grade 2" and so on. Keep adding pipes and statements for each grade.
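A sketch of two ways to do this, on a four-row toy dataset typed in by hand (the rows are made up for illustration; Education and schoollevels follow the names in the thread, while primaryschool, gradenum, and schoollevel are hypothetical). inlist() is a compact equivalent of chaining == with pipes; for strings it accepts at most 10 arguments in total. Pulling out the grade number and grouping by range scales better, and changing the cutoffs to inrange(gradenum, 7, 9) and inrange(gradenum, 10, 11) would give the lower/upper school split asked about above.

```stata
clear
input str10 Education
"grade 1"
"grade 7"
"grade 8"
"grade 11"
end

* Option 1: 0/1 dummy, same as Education == "grade 1" | Education == "grade 2" | ...
gen primaryschool = inlist(Education, "grade 1", "grade 2", "grade 3", "grade 4", "grade 5", "grade 6", "grade 7")

* Option 2: strip the word "grade", convert to a number, then group by range
gen gradenum = real(subinstr(Education, "grade ", "", 1))
gen schoollevel = 1 if inrange(gradenum, 1, 7)
replace schoollevel = 2 if inrange(gradenum, 8, 11)
label define schoollevels 1 "Primary School" 2 "High School"
label values schoollevel schoollevels
tabulate Education schoollevel
```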
Hi, I need your help. I have only two education categories, 1 = PhD and 2 = MPhil. How can I estimate the return to these two education levels? I don't have a third category to use as the base, so how can I find the returns to each of them? My problem is a bit different: I want to estimate the private return to education for these two levels, i.e., how much earnings change with one more year of education.
Short answer is that you can't. You cannot estimate a treatment effect if you do not have any observations without the treatment. The best you can do is estimate the difference between the PhD and the MPhil.
Hi, I have a variable of numbers, but it shows up in red (Stata is treating it as a string). So I used encode on it, but all my numbers have been changed to other values. I don't know why, please help me 😭 🙏🙏🙏
The command you probably need is "destring." encode doesn't read your numbers at all: it assigns codes 1, 2, 3, ... to the distinct values in sorted (alphabetical) order, which is why your numbers seem to change. Most likely Stata read the variable as a string because at least one value is not a number. You first need to make sure all the values are valid numbers, then use destring to generate a numeric version of your variable.
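A sketch of both behaviours on a three-row toy dataset (the values, including the "n/a" entry, are made up; income_str and income are hypothetical names). encode turns "950" into code 2 simply because "950" sorts second as text. The destring route first lists the entries that aren't valid numbers; here they are blanked so they become missing, which is an assumed cleaning rule you'd replace with whatever fits your data.

```stata
clear
input str8 income_str
"1200"
"950"
"n/a"
end

* encode assigns codes by sorted text order: "1200"=1, "950"=2, "n/a"=3
encode income_str, generate(income_codes)

* Step 1: show entries that are not valid numbers
list income_str if missing(real(income_str))

* Step 2: fix them (assumed rule: treat "n/a" as missing)
replace income_str = "" if income_str == "n/a"

* Step 3: convert the text to real numbers
destring income_str, generate(income)
list income_str income_codes income, nolabel
```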
I just want to turn the variable into real numbers after encoding, like 1 for Someone, 2 for Anotherone, ..., without labeling them one by one, because there are just too many of them. No one, literally no one, can tell me how to do this. I wonder why we are still using Stata rather than R, which is much more direct.
When you encode, the new variable is just numbers (with labels on top). You can see this by clicking on the values in the data browser. Just make sure the order is how you want it: by default encode numbers the values in alphabetical order, and you can set a different order by defining the value label first and passing it to encode's label() option. From there, just put the encoded variable into the regression without the "i." structure. You can see me do this in the video at 5:40.
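A sketch on a three-row toy dataset (the names Someone and Anotherone come from the question; name_id, name_id2, namelbl, and name_num are hypothetical). The first encode numbers everything automatically, with no labeling by hand, and list with nolabel shows the underlying numbers. The label() route is only needed if you want a specific order.

```stata
clear
input str12 name
"Someone"
"Anotherone"
"Someone"
end

* Default: codes follow alphabetical order (Anotherone = 1, Someone = 2)
encode name, generate(name_id)
list name name_id, nolabel

* Optional: choose the order yourself by defining the label first
label define namelbl 1 "Someone" 2 "Anotherone"
encode name, generate(name_id2) label(namelbl)

* A plain numeric copy; generate does not carry over the value label
generate name_num = name_id2
```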