Understanding the glm family argument (in R)

The goal of this video is to help you better understand the 'error distribution' and 'link function' in Generalized Linear Models.
For a deeper understanding of GLMs, I'd recommend the book "Generalized Linear Models" by McCullagh and Nelder. This is a book well worth buying, but I also (somehow) found an online version: www.utstat.toronto.edu/~brunne...

Published: 13 Oct 2020

Comments: 35

@Insipidityy 1 year ago

THIS video really made GLMs "click" for me. I spent hours trying to figure out what link functions and families mean, and I have found no better explanation. Thank you so much!

@joshuavernontanner1326 2 years ago

I'm floored by how clear this was. Incredible teaching ability, thank you!

@laure189 3 years ago

Absolutely fantastic video! Thank you SO much! And I truly appreciated the little recap on error distribution as well, extremely helpful!

@ibntuahirabdulhaqq5866 1 year ago

This is not the first time I am watching this video, and every time I do I wish you had made it into a series of videos. Thanks for giving the best explanation and taking the time to respond to queries in the comment section. Much love

@kasperwelbers 1 year ago

Thanks! That's really nice to hear. I do intend to get back to creating some videos soon. Still figuring out how to do this more consistently besides work, but I really love how these platforms bring together people that are intrinsically motivated to teach and learn, and I greatly appreciate hearing that you enjoy my current contributions.

@Roy-xr2wq 3 months ago

Best Explanation, the visuals bring the whole idea into life. Thanks

@davidgao9046 5 months ago

very clear layout and superb explanation for the intuition. Thanks!

@os2171 2 years ago

Awesome work! thanks! Cheers from a PhD Neurobiologist candidate from Colombia.

@yahiarafik9965 1 year ago

Very clear, thank you so much for this explanation, you just helped a lot of people in my major :))

@PP-im6lu 2 years ago

Wow, what a great explanation!

@dulquerpauly321 3 years ago

You absolute legend thank you mate.... Subscribed already :-)

@JinaneJouni 1 year ago

Really well explained! Thank you very much

@sotinupuerto 2 years ago

Great video!!! So helpful :)

@m9017t 1 year ago

Very well explained, thank you!

@md.masumbillah8222 2 years ago

many thanks! for uploading

@syedahmedali8118 3 years ago

great lecture

@gurns681 2 years ago

Great video. Thank you

@anandimerchant5180 1 year ago

Thank you! Well explained...

@mimzu89 3 years ago

thank you!

@yiyuanzhang6335 3 years ago

Thank you very much! However, it is still unclear to me when I should use which family and which link function. Should I check how the error terms are distributed and then decide which family and link function to use?

@kasperwelbers 3 years ago

You're welcome :). Regarding the choice of distribution family, a good place to start is to look at and think about your response/dependent variable (though also see my answer to Jessica Hough about the conditional distribution of the response). You're right that you'll eventually want to look at the errors to get an idea of whether your model makes sense. But note that you can only check the errors after choosing a model. So you don't check the errors of an ordinary regression to determine the 'right' family/link (which I thought you were implying). There's a chapter (12) on this model checking loop in the book I mention (link in description).

So where to start this loop? You should first think about what type of distribution makes sense for your response. What are all the possible values in your response, and what type of distribution could produce this type of data? For instance, if your response is binary, then a binomial distribution makes a lot of sense. If your response is a count (0, 1, 2, etc.) with a low average (e.g., number of comments on YouTube videos), then a Poisson distribution might be good. As such, it really pays off to learn a bit about common distributions. As a starting point, I would recommend learning about the Binomial and Poisson distributions. At least in my field, these are heavily used (logistic regression and Poisson regression).

Finally, for the link function I would recommend initially sticking to the canonical link functions. These are also the default functions that R uses for each family. Aside from the logistic versus probit link for binomial regression, I've rarely encountered studies using other link functions.
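
A minimal sketch of the point about default links, using only base R's stats package. The simulated data and the coefficient values (0.5 and 0.3) are made up for illustration and are not from the video.

```r
# Look up the default link that base R uses for a few common families.
families <- list(
  gaussian = gaussian(),
  binomial = binomial(),
  poisson  = poisson(),
  Gamma    = Gamma()
)
default_links <- sapply(families, function(f) f$link)
print(default_links)

# A non-default link has to be requested explicitly,
# e.g. probit instead of logit for a binary response.
probit_family <- binomial(link = "probit")
print(probit_family$link)

# Hypothetical count response with a low average: start from a Poisson family.
set.seed(1)
x <- rnorm(200)
y <- rpois(200, lambda = exp(0.5 + 0.3 * x))  # made-up coefficients
fit <- glm(y ~ x, family = poisson)           # default link for poisson is used
print(family(fit)$link)
```

Leaving the link unspecified, as in `family = poisson`, is usually enough when following the advice to start with the defaults.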

@jessicahough569 3 years ago

Hey! I am a master's student looking to make a GLM incl. random error. I find that both my dependent and independent variables (all continuous) do not follow any of the distributions. I also don't know if I should then try to transform them before putting them into this model. Also, what part of my data needs to follow these distributions? Independent or dependent variables? What if after I transform them, some follow Poisson (even though they are not discrete integers) and others follow a normal distribution after I add log? Shall I admit defeat?

@jessicahough569 3 years ago

Also- really helpful video! Thank you!

@kasperwelbers 3 years ago

Hi Jessica, that's a pretty big question, and I'm afraid there's no simple answer. The general guideline is that you choose the distribution family based on the distribution of the response (i.e. dependent) variable, but more accurately it should be the family for the 'conditional distribution of the response'. Notice (around 11:30) that the distribution in the random component takes the expected value from the systematic component.

For example, say your dependent variable is the weight of fruit, but your data contains both apples and melons. Let's assume that for both apples and melons the weight is normally distributed, but apples and melons do have different means. So if we throw them together, our distribution might suddenly be bimodal (we'll have two 'humps'). But to model this we could still use a regular normal distribution, as long as we include the type of fruit as an independent variable. 'Conditional' on whether the fruit is an apple or melon, the distribution is normal. So if your dependent variable has a very weird distribution, something like this might be at play. Try also looking at graphs of your dependent and independent variable(s).

Also, try to first think about what distribution would make sense for your dependent variable. Is it a count variable? A proportion/percentage? Is it time between events? The number of events within a given time? This often largely determines what sort of distribution would make sense. Try putting into words what type of measurement you have, and googling for modeling strategies. There are some tweaks to GLMs that could work. For instance, if you have non-integer 'rates' rather than counts, you might use Poisson with an 'offset'. So don't admit defeat too early! :)
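
The apples-and-melons example can be sketched with simulated data. The group means (200 and 1200 grams) and the shared standard deviation (50) are invented for illustration; only base R functions are used.

```r
set.seed(42)
n <- 500
fruit <- factor(rep(c("apple", "melon"), each = n))

# Hypothetical weights in grams: normal within each fruit, different means.
weight <- c(rnorm(n, mean = 200, sd = 50),
            rnorm(n, mean = 1200, sd = 50))

# Thrown together, the response is bimodal; hist(weight) would show two humps.

# Including fruit as an independent variable lets a plain gaussian family work:
# conditional on the fruit type, the response is normal around the group mean.
fit <- glm(weight ~ fruit, family = gaussian)
print(coef(fit))  # intercept = apple mean, fruitmelon = melon minus apple

# Residuals are centered on zero within each group.
res <- residuals(fit)
print(tapply(res, fruit, mean))
```

Plotting `hist(weight)` next to `hist(res)` shows the contrast between the marginal and the conditional distribution.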

@jessicahough569 3 years ago

@kasperwelbers Thank you so much! I actually ended up with lmer, but have hit a new hiccup as my dependent variables are not so dependent after all (they affect each other). So now I am looking at glmtbbr, but have no idea where to start! *Sigh*

@darmaw22 3 years ago

I'd really appreciate it if you could let me know how you managed to have a list of arguments shown instead of lines. Many thanks!

@kasperwelbers 3 years ago

Do you mean the drop-down list, as seen at 0:25? On Ubuntu you get that by pressing Tab when your cursor is between the parentheses of a function.

@darmaw22 3 years ago

@kasperwelbers Many thanks! That is exactly what I meant.

@shahfahadalishah1152 3 years ago

very interesting

@MrSazid1 1 year ago

Dude. Omg. Ty

@paphiopedilum1202 2 months ago

thank you french accent man

@brazilfootball 3 years ago

What the hell is a link function?

@kasperwelbers 3 years ago

I get that sentiment :'), so I kind of hoped this video would have been a good answer to that question. The key to understanding the link function is to first get a good grasp of the systematic and random component of the GLM. The link function is really just a simple transformation (like log) to link these components. If you're comfortable with the regression formula, the punchline is around 8:30 to 11:30
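
As a rough illustration of the link as "just a transformation", here is a small sketch with a Poisson model and a log link. The data and the coefficients (0.2 and 0.8) are simulated assumptions; `predict()` and `family()` are base R.

```r
set.seed(123)
x <- runif(100, min = 0, max = 2)
y <- rpois(100, lambda = exp(0.2 + 0.8 * x))  # made-up coefficients

fit <- glm(y ~ x, family = poisson(link = "log"))

# Systematic component: the linear predictor b0 + b1 * x.
eta <- predict(fit, type = "link")

# Expected value that the random component (Poisson) is centered on.
mu <- predict(fit, type = "response")

# The link maps mu to eta; the inverse link maps eta back to mu.
print(all.equal(unname(log(mu)), unname(eta)))
print(all.equal(unname(family(fit)$linkinv(eta)), unname(mu)))
```

With a log link the regression formula works on the log scale, and exponentiating its output gives the expected count.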

@djangoworldwide7925 1 year ago

Nice video, but stating "identity" at the end without explaining that it's the I matrix is a bit lacking

@kasperwelbers 1 year ago

Thanks! About the identity function, I think it's uncommon to use matrix notation for link functions because many are nonlinear. So I prefer to also just think of the identity link more generally as an identity function. But maybe I'm missing something?
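
A short sketch of the identity link seen as a plain function, assuming base R and the built-in `mtcars` data set (chosen here only for convenience): the link returns its input unchanged, so a gaussian GLM with this link gives the same coefficients as ordinary linear regression.

```r
id <- gaussian(link = "identity")

# The identity link and its inverse both return the input unchanged.
mu <- c(-2, 0, 3.5)
print(id$linkfun(mu))
print(id$linkinv(mu))

# With the identity link, a gaussian GLM matches ordinary least squares.
glm_fit <- glm(mpg ~ wt, data = mtcars, family = gaussian(link = "identity"))
lm_fit  <- lm(mpg ~ wt, data = mtcars)
print(all.equal(coef(glm_fit), coef(lm_fit)))
```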