Categorical Variables in Stata

SebastianWaiEcon

15K subscribers

136K views

More information on categorical variables in Stata: www.stata.com/features/overvie...

Published: 28 June 2017

Comments: 88

@DavidMwangi 2 years ago

Sebastian sir, thank you so much for this! You went straight to the point with no time wasted.

@anushkanegi5220 3 years ago

This was so helpful! I spent hours looking for this on the internet. Thank you so much!!!

@wkatieivey 5 years ago

Thank you so much! I am a grad student trying to work with survey data and this helped immensely! Your video saved me a solid three hours of time I would have spent making mistakes in Stata.

@sebastianwaiecon 5 years ago

Glad to hear you found it useful.

@asmasultana2732 3 years ago

@@sebastianwaiecon Can you help me get training in Stata, please?

@bevbeautifulhealing 11 months ago

Thank you so very much. 🙏🏾 I'm helping a friend with her master's, and this has really helped me understand the material. Having completed this section, I can now help her move on. Cheers mucho

@danilorodriguez1638 5 years ago

Thank you very much. Greetings from Colombia.

@saurabhsahu175 5 years ago

Thank you so much, sir! It really helped me!

@GM__user 3 days ago

THANK YOU🙏🏼 This video has been extremely helpful

@vitriawaode6302 6 years ago

Thank you... this video is quite helpful.

@wharrison2010 5 years ago

Thank you for this very precise and succinct lecture. A quick question: I saw positive coefficients when the base was "No Diploma" and negative coefficients when it was "Graduate". Could you interpret one of the coefficients from either regression?

@sebastianwaiecon 5 years ago

The reason for this is that "no diploma" is the lowest income group. Any departure from that group would be associated with an increase in income. For example, the estimate for the bachelor's degree tells us how much more income bachelor's degree holders have above those with no diploma. Graduate, on the other hand, is the highest income group (people with PhDs, MBAs, and so forth). Here, people with (only) bachelor's degrees have less income on average than the base group.
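A toy check of this explanation, written as a plain-Python sketch rather than Stata code. It relies on a standard OLS fact: when the only regressors are a constant and dummies for every category except the base, the fitted values equal the group means, so the constant is the base group's mean and each coefficient is that group's mean minus the base mean. The incomes and the `dummy_regression` helper are hypothetical, made up for illustration.

```python
from collections import defaultdict

def group_means(rows):
    """Mean of y for each category in a list of (category, y) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in rows:
        sums[cat] += y
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}

def dummy_regression(rows, base):
    """OLS of y on a constant plus one dummy per non-base category.

    With a single categorical regressor the fit reproduces the group means,
    so the constant is the base mean and each coefficient is a difference.
    """
    means = group_means(rows)
    constant = means[base]
    coefs = {cat: m - constant for cat, m in means.items() if cat != base}
    return constant, coefs

# Hypothetical incomes (in thousands), made up for illustration only.
rows = [
    ("No Diploma", 20), ("No Diploma", 24),
    ("High School", 30), ("High School", 34),
    ("Bachelor's", 50), ("Bachelor's", 54),
    ("Graduate", 70), ("Graduate", 74),
]

const_nd, coefs_nd = dummy_regression(rows, base="No Diploma")
const_gr, coefs_gr = dummy_regression(rows, base="Graduate")
print(const_nd, coefs_nd)
print(const_gr, coefs_gr)
```

With "No Diploma" (the lowest-income group here) as the base, every coefficient is positive; with "Graduate" (the highest) as the base, every coefficient is negative, matching the pattern in the question.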

@ViralVidz21 1 year ago

You are the best man.

@anastaciamatlebjane1040 3 years ago

I think you just saved my life😢

@anlanhnguyen-ly9vi 3 months ago

Excellent! Thank you for saving me tons of time :)

@theReal_Mimi 3 years ago

What's the value of x if you want to calculate the estimated wage for each category?

@asmasultana2732 3 years ago

@SebastianWaiEcon I am an MRes student and I need to understand Stata and how it works. I find it difficult. Can you train me?

@shannonbarnes1888 2 years ago

Hi, how can I apply this if my categorical variable is already numeric (0, 1, 2) and I need to regress it with a continuous variable? Stata doesn't know what the values correspond to, but 0 = Conservative, 1 = Labour, and 2 = Other.

@201120sebastian 5 years ago

Hi, thank you for the video, very helpful! Now that you have the "educcodes" variable, can you drop "educcategory"? Or should you keep it?

@sebastianwaiecon 5 years ago

There is no particular reason to drop it, but it's not needed for the "educcodes" variable to function properly once it's been created. You might want to keep it in case you want to do something else with it later, though.

@201120sebastian 5 years ago

@@sebastianwaiecon thank you so much!

@mussahemed1153 3 years ago

I only bought Stata yesterday, and this is my first time using it, for my MSc dissertation. This was so helpful. When using the label define command, I assume you have to type in the names exactly as they were shown by tab. What if a name has a space in it, for instance NO DIPLOMA instead of NODIPLOMA? The value in my data had a space in it, and I got a syntax error when I tried to use label define on it.

@mussahemed1153 3 years ago

I found the answer to my own question by playing around with Stata. When entering a name such as NO DIPLOMA, make sure you enter it as 1 "NO DIPLOMA".

@sebastianwaiecon 3 years ago

@@mussahemed1153 Yes, you got it. The reason you had this problem is that spaces are used as delimiters in Stata commands, similar to how commas are generally used in Excel functions. So, any time you have text with a space you need quotation marks.
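To see why the quotation marks matter, here is an analogy in Python rather than Stata: the standard library's `shlex.split` also splits on spaces while keeping double-quoted text together. Stata's own parser is separate code, so treat this only as an illustration of the rule described above; `parse_label_pairs` is a hypothetical helper that reads a `label define`-style list of code/text pairs.

```python
import shlex

def parse_label_pairs(spec):
    """Turn '1 "NO DIPLOMA" 2 GRADUATE' into {1: 'NO DIPLOMA', 2: 'GRADUATE'}.

    Tokens are split on spaces; double-quoted text stays one token.
    """
    tokens = shlex.split(spec)
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected code/text pairs, got {tokens}")
    return {int(code): text for code, text in zip(tokens[0::2], tokens[1::2])}

# Without quotes, the label text is broken into two separate tokens.
print(shlex.split('1 NO DIPLOMA 2 GRADUATE'))  # ['1', 'NO', 'DIPLOMA', '2', 'GRADUATE']

# With quotes, each code is paired with its full label.
print(parse_label_pairs('1 "NO DIPLOMA" 2 GRADUATE'))  # {1: 'NO DIPLOMA', 2: 'GRADUATE'}
```

Passing the unquoted version to `parse_label_pairs` raises a `ValueError`, because the odd token count means the codes and labels no longer line up.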

@user-ty8tk5hg6r 2 years ago

What about dummy variables? I have coded a nominal variable for exp. status; how does that work?

@user-sl5ds8kn5x 3 years ago

Thank you!

@thidachawhlaing3494 5 years ago

Thanks so much for this clear explanation. I am now doing a PhD, and your videos help me a lot with my analysis. However, can I use "i." in multiple regression? Is there any difference between Stata 14 and 13 for this dummy-variable command, "i.", placed in front of the variables we are going to use? Please help me.

@sebastianwaiecon 5 years ago

You can use the "i." structure in any regression. It definitely works the same in Stata 14 and 15. I haven't used 13 in a very long time, so I can't remember for sure.

@thidachawhlaing3494 5 years ago

@@sebastianwaiecon Thanks so much for the prompt reply. I tested it in a logistic regression in Stata 14 and it works, but the same command from the do-file (Stata 14) did not work in Stata 13. I tested it on a friend's computer, as I only have Stata 13. When I ran all the do-files from Stata 14 in Stata 13, this "i." structure did not work! Again, is the "i." structure the same in logistic regression, and is there no difference in do-files between 13, 14, and 15? I appreciate this free online teaching!

@sebastianwaiecon 5 years ago

It shouldn't make any difference using a logit. I guess this was a new feature in Stata 14. The oldest reference in my own do files I can find to this was from late 2015, after Stata 14 came out. From version to version, usually not a whole lot changes, but I'd recommend upgrading to Stata 15 at this point.

@thidachawhlaing3494 5 years ago

@@sebastianwaiecon Thanks so much. I think the point is that old versions of Stata need to be upgraded.

@habtamudoe8868 4 years ago

Thank you so much, sir. If the dependent and independent variables are categorical, how do I run it in Stata?

@sebastianwaiecon 4 years ago

See my video on binary choice models for basic ways to handle categorical dependent variables.

@ceciliadelvi2724 2 years ago

Hi, at the end of the video you say "we have the same exact numbers" when you compare the last two regressions you run. But the coefficients changed, and not only from positive to negative. I tried with other data, and my coefficients also change when I change the reference category, but, most importantly, the significance changes too. I have a variable with 4 categories: when I choose category 1 as the reference, I get some significant (p-value) results, but when I choose category 3 as the reference, I get no significant results. Could you help me understand why? And should we then use the reference that gives us the most interesting results?

@sebastianwaiecon 2 years ago

The coefficients change because the base group changed. The coefficients always give you the difference from the base group. Any predictions you make will be mathematically identical. The significance would change because you're doing a different test when you change the base group - it's a comparison between different groups now.
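The claim that predictions are unchanged can be checked with a small plain-Python sketch (not Stata code). The outcome values are hypothetical, chosen for a 4-category variable like the one in the question, and `fit`/`predict` are illustrative helpers that use the group-mean form of OLS with a single categorical regressor. Switching the base from category 1 to category 3 changes every coefficient, yet each category's fitted value stays the same. Standard errors are not computed here; the p-values differ across bases because each coefficient tests a different pair of groups.

```python
from collections import defaultdict

def fit(rows, base):
    """Constant plus dummies for every non-base category; returns (constant, coefs).

    With one categorical regressor, OLS reproduces the group means, so the
    constant is the base mean and each coefficient is a difference from it.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in rows:
        sums[cat] += y
        counts[cat] += 1
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    constant = means[base]
    return constant, {cat: m - constant for cat, m in means.items() if cat != base}

def predict(constant, coefs, cat):
    """Fitted value for a category; the base group has no dummy switched on."""
    return constant + coefs.get(cat, 0.0)

# Hypothetical outcome for a 4-category variable (made-up numbers).
rows = [(1, 10), (1, 12), (2, 15), (2, 17), (3, 11), (3, 13), (4, 20), (4, 22)]

fit1 = fit(rows, base=1)
fit3 = fit(rows, base=3)
print(fit1)  # (11.0, {2: 5.0, 3: 1.0, 4: 10.0})
print(fit3)  # (12.0, {1: -1.0, 2: 4.0, 4: 9.0})

# Same fitted value for every category, whichever base is used.
print([predict(*fit1, c) for c in (1, 2, 3, 4)])  # [11.0, 16.0, 12.0, 21.0]
print([predict(*fit3, c) for c in (1, 2, 3, 4)])  # [11.0, 16.0, 12.0, 21.0]
```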

@n10f98 4 years ago

How would I adjust the education variable to have fewer categories before doing a moderation analysis?

@sebastianwaiecon 4 years ago

If you want just two categories, then you could just generate a dummy variable indicating the subcategories you wanted.
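Collapsing categories into a 0/1 indicator amounts to asking whether each observation's category belongs to a chosen subset. A minimal plain-Python sketch of that logic (not Stata code); the category names and the `make_dummy` helper are hypothetical.

```python
def make_dummy(values, in_group):
    """Return 1 where the value belongs to in_group, otherwise 0."""
    group = set(in_group)
    return [1 if v in group else 0 for v in values]

# Hypothetical education categories collapsed into "college degree or not".
educ = ["No Diploma", "High School", "Bachelor's", "Graduate", "High School"]
college = make_dummy(educ, {"Bachelor's", "Graduate"})
print(college)  # [0, 0, 1, 1, 0]
```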

@redface4444 4 years ago

I am running a regression in Stata with R.O.A. as the dependent variable and greenhouse gas (GHG) emissions as the independent variable. I also have 4 control variables, and I want to control for each industry. For example, firms that operate in the industry sector will typically have higher GHG emissions than firms in the health care sector. Would this be the way to control for each industry? If not, is there a way to do so? Thanks

@sebastianwaiecon 4 years ago

Sounds like dummy variables indicating each industry would be appropriate. You can do this using the methods outlined in this video.

@godwinnerarwill2885 4 years ago

Please, I need help with my data analysis, especially with creating a composite variable.

@thedatahall 4 years ago

What specifically are you looking for?

@markvanderlinde30 3 years ago

Hi, is it true that the i. command for categorical variables does not work with the Oaxaca regression command?

@sebastianwaiecon 3 years ago

I haven't used that, so I don't know.

@ericli6027 4 years ago

Thanks, this is really helpful! But what should I do if the original variable is numeric?

@sebastianwaiecon 4 years ago

You can put the numeric variable directly in the regression. If you want to use a numeric variable as a categorical variable, you can still do that. You skip the step of encoding, and just use the "i." structure.

@ericli6027 4 years ago

SebastianWaiEcon ok, thank you!
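What the `i.` prefix does with a numeric variable can be sketched in plain Python (this is not Stata's implementation): each distinct code becomes its own 0/1 indicator, and one level, the lowest by default in Stata, is left out as the base. The 0/1/2 party codes echo the Conservative/Labour/Other question earlier in the thread, but the data and the `indicator_columns` helper are hypothetical. In Stata itself, a different base can be requested with the `ib#.` form, e.g. `ib2.party`, where `party` is a hypothetical variable name.

```python
def indicator_columns(codes, base=None):
    """Expand integer category codes into 0/1 indicators, one per non-base level.

    The base defaults to the lowest code, mirroring Stata's default for i.
    """
    levels = sorted(set(codes))
    if base is None:
        base = levels[0]
    return {
        level: [1 if c == level else 0 for c in codes]
        for level in levels
        if level != base
    }

# Hypothetical codes: 0 = Conservative, 1 = Labour, 2 = Other
party = [0, 2, 1, 1, 0, 2]

print(indicator_columns(party))          # {1: [0, 0, 1, 1, 0, 0], 2: [0, 1, 0, 0, 0, 1]}
print(indicator_columns(party, base=2))  # {0: [1, 0, 0, 0, 1, 0], 1: [0, 0, 1, 1, 0, 0]}
```

Because no string encoding is involved, this is the sense in which the encoding step can be skipped for a variable that is already numeric.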

@tahmidfaysal8315 2 years ago

Thanks a lot

@mahbubhasan8102 4 years ago

It's concise.

@ralphnestorpadero950 4 years ago

What about postestimation? Is postestimation the same for continuous and categorical variables?

@sebastianwaiecon 4 years ago

Any postestimation commands can still be used, since this is all contained within the regress command.

@keeks4914 2 years ago

Hi, I have an issue. I am trying to convert my string variables (in a group called stage) to numeric variables. I used the encode command and created a new variable called stage_cat. But when I tabulate stage_cat, I get no observations. The list of stage_cat also seems to be empty. It looks like the encoding didn't work. How can I fix this?

@sebastianwaiecon 2 years ago

I'm not sure about recode, but did you try using encode like I showed in the video?

@emilbinny 5 years ago

Can I use a dummy variable as my dependent variable with the i. command? I tried it, but I get the error message "depvar may not be a factor variable". What can I do?

@sebastianwaiecon 5 years ago

You can use a dummy variable (i.e., the values must all be zeros or ones) as the dependent variable. See my video on this: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-vRKesKWMCsg.html. There are special regression tools, such as the multinomial logit, that allow for more complex categoricals as the dependent variable, but I haven't covered those in videos.

@emilbinny 5 years ago

Thanks, Sebastian!

@inestnewdocile1646 2 months ago

How do I check correlation for categorical variables?

@simonetaddeo1935 2 years ago

How could I see whether NODIPLOMA is also significant in the regression? Is there any command that shows all the variables without any base reference?

@sebastianwaiecon 2 years ago

You can force Stata to omit the constant using the noconstant option. This will remove the collinearity and allow all categories to be in the regression. However, be careful about the interpretation and what statistical significance would mean in this new context. I personally wouldn't advise doing what I'm suggesting but that is the answer to your question.
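A plain-Python sketch (not Stata code) of what changes when the constant is dropped: with one dummy per category and no constant, each OLS coefficient is simply that category's mean, so the usual test of "coefficient = 0" asks whether a group's mean is zero rather than whether the group differs from another group. That is one reason to read such output with the care suggested above. The data and the `fit_without_constant` helper are hypothetical.

```python
from collections import defaultdict

def fit_without_constant(rows):
    """OLS of y on one 0/1 dummy per category and no constant term.

    With every category present and no constant, the dummies are no longer
    collinear, and each coefficient equals that category's mean of y.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, y in rows:
        sums[cat] += y
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}

# Hypothetical data, made up for illustration only.
rows = [
    ("NODIPLOMA", 20), ("NODIPLOMA", 24),
    ("HIGHSCHOOL", 30), ("HIGHSCHOOL", 34),
    ("GRADUATE", 70), ("GRADUATE", 74),
]

coefs = fit_without_constant(rows)
print(coefs)  # {'NODIPLOMA': 22.0, 'HIGHSCHOOL': 32.0, 'GRADUATE': 72.0}

# A group-vs-group comparison is now a difference of two coefficients; it
# equals the HIGHSCHOOL coefficient from the usual model with NODIPLOMA as base.
print(coefs["HIGHSCHOOL"] - coefs["NODIPLOMA"])  # 10.0
```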

@saaakill 4 years ago

Could you please make a video about how to get the frequency of a category as a new variable?

@sebastianwaiecon 4 years ago

You can do that with egen: egen frequency = count(category), by(category)
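The `egen` line above stores, on every observation, the number of observations in the same category. The same bookkeeping in plain Python (not Stata), assuming there are no missing values; the category labels and the `frequency_by_category` helper are hypothetical.

```python
from collections import Counter

def frequency_by_category(categories):
    """For each observation, return how many observations share its category."""
    counts = Counter(categories)
    return [counts[c] for c in categories]

category = ["A", "B", "A", "C", "A", "B"]
frequency = frequency_by_category(category)
print(frequency)  # [3, 2, 3, 1, 3, 2]
```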

@koketsomokoditoa3255 3 years ago

Hi. If I have values such as “grade 1, grade 2, grade 3, grade 10, grade 11, etc.” under a variable named “Education”, but I want all lower grades (grade 1 to grade 7) to be called “Primary School” and higher grades (grade 8 to grade 11) to be called “High School”, how do I code that in Stata?

@keri-annfacey6794 3 years ago

I came on here trying to find answers to the same question. I was able to group my grade levels, but not in the order that I wanted. I have grade levels 7-11, and I want to group grades 7-9 as "lower school" and grades 10-11 as "upper school". The command I have was able to group grades 7-8 as "lower school" and grades 9-11 as "upper school". I am trying to figure out how to put grade 9 in the lower school category. Anyway, try this and see if it works:
gen schoollevel = recode(education, 1, 2)
label define schoollevels 1 "Primary School" 2 "High School"
As I mentioned, this command worked for me but not in the order I wanted. Hope it helps in some small way.

@keri-annfacey6794 3 years ago

Try this video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-XWVaXN2KwmA.html

@sebastianwaiecon 3 years ago

You can do this with logical operators, in this case the "pipe" (|), which means "or." For your example, gen primaryschool = Education == "grade 1" | Education == "grade 2" and so on. Keep adding pipes and statements for each grade.
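Both grade questions above come down to comparing grade numbers rather than grade strings. A plain-Python sketch (not Stata code), assuming every label has the form "grade N"; `grade_number` and `school_level` are hypothetical helpers. It also shows a possible pitfall: sorted as text, "grade 10" comes before "grade 2", and Stata's `encode` assigns its codes in alphabetical order by default, so codes built that way will not follow grade order.

```python
def grade_number(label):
    """Extract the integer from a label such as 'grade 10' (assumed format: 'grade N')."""
    return int(label.split()[-1])

def school_level(label, last_lower_grade, lower_name, upper_name):
    """Use lower_name for grades up to and including last_lower_grade, else upper_name."""
    return lower_name if grade_number(label) <= last_lower_grade else upper_name

# Sorting the raw strings puts "grade 10" before "grade 2".
print(sorted(["grade 2", "grade 10", "grade 9"]))  # ['grade 10', 'grade 2', 'grade 9']

# Grades 1-7 -> Primary School, grades 8-11 -> High School
print([school_level(g, 7, "Primary School", "High School")
       for g in ["grade 7", "grade 8", "grade 11"]])
# ['Primary School', 'High School', 'High School']

# Grades 7-9 -> lower school, grades 10-11 -> upper school
print([school_level(g, 9, "lower school", "upper school")
       for g in ["grade 7", "grade 9", "grade 10"]])
# ['lower school', 'lower school', 'upper school']
```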

@Alex-sy4gg 8 months ago

Legend!

@emotionalstories8152 2 years ago

Hi, I need your help. I have only two education categories, 1 = PhD and 2 = MPhil. How can I estimate the returns to these two education levels? I don't have a third category to use as the base, so how can I find the returns to these two? My problem is different: I want to estimate the private return to education for these two levels, i.e., how much earnings increase after one additional year of education.

@sebastianwaiecon 2 years ago

Short answer is that you can't. You cannot estimate a treatment effect if you do not have any observations without the treatment. The best you can do is estimate the difference between the PhD and the MPhil.

@emotionalstories8152 2 years ago

@@sebastianwaiecon I have cross-sectional data on 800 respondents.

@domillima 4 years ago

Are the coefficients here your R^2?

@sebastianwaiecon 4 years ago

In the upper-right area of the Stata regression output, you can see where it says "R-squared."

@reyaa8593 4 years ago

What if I want the education results for only females, or only males?

@sebastianwaiecon 4 years ago

You can add an "if" statement to most commands in Stata if you want to limit your analysis to a certain group.

@sunrose68 4 years ago

I entered the command 'label define .......' but got "invalid syntax". I just replicated your steps. Why does this happen?

@sebastianwaiecon 4 years ago

You probably have a typo somewhere. Did you forget to put all the numbers in?

@sunrose68 4 years ago

@@sebastianwaiecon I'm not sure where I went wrong 😭 Could you have a look at it: sm.ms/image/hjFQGbX1kpwaz9N

@sebastianwaiecon 4 years ago

I'm not sure, but my guess is that it didn't like the slash in Others/Uncertainty.

@zerohero109 5 years ago

What is the interpretation?

@sebastianwaiecon 5 years ago

You'll have to be more specific.

@manpreetuk4277 2 years ago

Is there anyone who can help me with a Stata exam?

@256hzart 2 months ago

Hi, I have a variable of numbers, but it shows in red (i.e., it is stored as a string). So I used this command to encode it, but all my numbers were changed to other values. I don't know why; please help me 😭 🙏🙏🙏

@sebastianwaiecon 2 months ago

The command you probably need is "destring." Most likely what happened is that you have at least one value that is not a number, which is why Stata read it as a string. You first need to make sure all the values are valid numbers, then use destring to generate a numerical version of your variable.

@256hzart 2 months ago

@@sebastianwaiecon thank u, I can fix it now
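Before converting, it helps to list the entries that are not valid numbers, which is the check described in the reply above. A rough plain-Python sketch (not Stata code): Python's `float()` is only an approximation of what Stata's `destring` accepts (for example, Python also accepts the text "nan"), and the sample values and the `non_numeric_values` helper are hypothetical.

```python
def non_numeric_values(values):
    """Return the distinct entries that float() cannot parse, in order of appearance."""
    bad = []
    for v in values:
        try:
            float(v)
        except ValueError:
            if v not in bad:
                bad.append(v)
    return bad

raw = ["12", "7.5", "n/a", "1,200", "3", "n/a"]
print(non_numeric_values(raw))  # ['n/a', '1,200']
```

Once entries like these are fixed or set to missing, the cleaned variable can be converted with `destring`.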

@hazelw 1 year ago

I just want to turn the variable into real numbers after encoding, like 1 for Someone, 2 for Anotherone, ..., without declaring them one by one, because there are far too many of them. No one, literally no one, can tell me how to do this. I wonder why we are still using Stata rather than R, which is much more direct.

@sebastianwaiecon 1 year ago

When you encode, the new variable is just numbers (with labels on top). You can see this by clicking on the values in the data browser. Just make sure to set the order to how you want it. From there, just put the encoded variable into the regression without the "i." structure. You can see me do this in the video at 5:40.

@hazelw 1 year ago

@@sebastianwaiecon Thanks mate:) Just found out that egen group() can also do this.
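The numbering produced by `encode` (and, in a similar way, by `egen group()`) can be mimicked in plain Python (not Stata code): distinct strings are sorted, numbered 1, 2, 3, and so on, and the mapping is kept as a value label. This assumes alphabetical ordering, which is `encode`'s default; Python sorts by character code, so its order may not match Stata's in every case. The `encode_like` helper and the name "Third" are hypothetical.

```python
def encode_like(values):
    """Number distinct strings 1..k in sorted order; return (codes, value_labels)."""
    levels = sorted(set(values))
    code_of = {s: i for i, s in enumerate(levels, start=1)}
    value_labels = {i: s for s, i in code_of.items()}
    return [code_of[v] for v in values], value_labels

names = ["Someone", "Anotherone", "Someone", "Third"]
codes, value_labels = encode_like(names)
print(codes)         # [2, 1, 2, 3]
print(value_labels)  # {1: 'Anotherone', 2: 'Someone', 3: 'Third'}
```

The codes are ordinary integers; the labels only control how they are displayed, which is why the encoded variable can go into a regression directly.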