
From Scratch: How to Code Linear Regression in Python for Machine Learning Interviews! 

Emma Ding
Subscribe 55K
15K views

Linear Regression in Python | Gradient Descent | Data Science Interview | Machine Learning Interview
Correction:
8:40 gradients should be divided by "m" instead of "n".
🟢Get all my free data science interview resources
www.emmading.com/resources
🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
🔵 Data Science Resume Checklist www.emmading.com/data-science...
✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
// Comment
Got any questions? Something to add?
Write a comment below to chat.
// Let's connect on LinkedIn:
/ emmading001
====================
Contents of this video:
====================
00:00 Linear Regression Overview
03:58 Gradient Descent
06:32 Linear Regression Implementation
10:48 Time and Space Complexity
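
For reference, here is a minimal from-scratch sketch of the gradient-descent implementation the video walks through, with the corrections above folded in (gradients averaged over the m training examples, and beta_other used consistently). Names such as compute_gradient, beta_0 and beta_other follow the ones mentioned in the video and the comments, but the code below is a reconstruction under those assumptions, not the exact code shown on screen.

def linear_regression(x, y, learning_rate=0.01, iterations=100):
    # x: list of m training examples, each a list of n features; y: list of m targets.
    m, n = len(x), len(x[0])
    beta_0, beta_other = 0.0, [0.0] * n
    for _ in range(iterations):
        gradient_beta_0, gradient_beta_other = compute_gradient(x, y, beta_0, beta_other, m, n)
        # Step against the gradient.
        beta_0 -= learning_rate * gradient_beta_0
        for j in range(n):
            beta_other[j] -= learning_rate * gradient_beta_other[j]
    return beta_0, beta_other

def compute_gradient(x, y, beta_0, beta_other, m, n):
    gradient_beta_0, gradient_beta_other = 0.0, [0.0] * n
    for i in range(m):
        y_i_hat = beta_0 + sum(beta_other[j] * x[i][j] for j in range(n))
        derror_dy = 2 * (y_i_hat - y[i])      # derivative of the squared error w.r.t. y_hat
        gradient_beta_0 += derror_dy / m      # averaged over m examples, per the correction
        for j in range(n):
            gradient_beta_other[j] += derror_dy * x[i][j] / m
    return gradient_beta_0, gradient_beta_other

# Tiny usage check: fit y = 3x + 1.
x_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [1.0, 4.0, 7.0, 10.0]
b0, b_other = linear_regression(x_train, y_train, learning_rate=0.05, iterations=2000)
print(b0, b_other)   # roughly 1.0 and [3.0]

With m examples and n features, each iteration of this sketch does O(m*n) work and uses O(n) extra space, which is consistent with the O(MN) per-iteration cost discussed in the comments.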

Published: Aug 1, 2024

Comments: 32
@diegozpulido · 3 years ago
Hi Emma. Thank you very much for your videos. Thanks to them I got a Senior Data Scientist position at Facebook. I will forever thank you for your exceedingly good work.
@emma_ding · 3 years ago
Glad to hear the videos are helpful. Congrats on your position at Facebook!
@jijyisme · 3 years ago
The explanation is very clear and concise! Thank you so much. Please keep it going.
@jeoffleonora4612 · 3 years ago
I like the step-by-step guide, and I learned a lot from your implementation.
@sitongchen6688 · 3 years ago
Thanks Emma!!
@XuJiBoY · 3 years ago
Very helpful video, clear and concise! One minor correction: at around 8:40 (updating the gradients in method compute_gradient()) I think you meant to divide by m instead of n.
@emma_ding · 3 years ago
Yes, the gradients should be divided by m. Thanks for the correction!
@emma_ding · 3 years ago
Correction: 1. Thanks to Ji Xu - at 9:39, it should be divided by m instead of n. 2. Thanks to Hui Yi - at 9:39, beta_1 should be beta_other.
@leon_0907 · 3 years ago
Should the beta_1 here be beta_other at 9:39? beta_1 was not previously defined in the function.
@YEIYEAH10 · 3 years ago
Excellent, please keep it up!
@emma_ding · 2 years ago
Thanks, will do!
@LuciaCasucci · 3 years ago
Thank you for the videos, Emma! I am on the very last round at Amazon and Microsoft for Senior Data Scientist, and I am finding your material excellent for prep and review! One thing I cannot understand: what does the subscript j mean? How is it any different from i (slides around 6:00 and 6:30)?
@jessieshao1397 · 3 years ago
Waiting for the new video for a week! It's coming!
@cccspwn · 2 years ago
One of the important interview questions that I've seen for linear regression is: "What are the linear regression assumptions?"
@emma_ding · 2 years ago
Thank you for mentioning this! What a great tip to share with us!
@brandonhuang508 · 2 years ago
great video
@jamiew3986 · 2 years ago
Hi Emma, thanks for your video. I wonder how we can explain concepts like gradient descent, maximum likelihood, and log-likelihood verbally in an interview?
@nanfengbb · 3 years ago
Thanks for posting a great video. Quick question: I noticed that you chose iterations = 100 and learning rate = 0.01 here. Is there a relationship between the number of iterations and the learning rate, e.g. iterations * learning_rate = 1?
@user-me2mm2xu7j · 3 years ago
Bear with my rusty math... When calculating the derivative of y over beta_i at 6:27, how come it becomes X_ji? It would make more sense if it were the derivative of y_hat over beta_i, consistent with the earlier slide at 2:50?
@techedu8776 · 5 months ago
The time complexity assumes this is the code that would actually be used, which does not leverage a GPU. With parallelization, a vector multiplication may be computed in constant time, reducing the time complexity from O(MN) to O(M).
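To make that point concrete, here is a hedged sketch of what a vectorized version could look like with NumPy; the names are mine, not from the video. On a CPU the arithmetic is still O(m*n) per iteration, but the per-feature Python loop is replaced by array operations that a GPU or SIMD hardware can execute in parallel.

import numpy as np

def linear_regression_vectorized(X, y, learning_rate=0.01, iterations=100):
    # X: (m, n) feature matrix, y: (m,) target vector.
    m, n = X.shape
    beta_0, beta_other = 0.0, np.zeros(n)
    for _ in range(iterations):
        y_hat = X @ beta_other + beta_0              # all m predictions at once
        derror_dy = 2 * (y_hat - y)                  # derivative of squared error w.r.t. y_hat
        beta_0 -= learning_rate * derror_dy.mean()   # gradient averaged over m examples
        beta_other -= learning_rate * (X.T @ derror_dy) / m   # averaged gradient for the n coefficients
    return beta_0, beta_other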
@florachen9654 · 3 years ago
I believe in the gradient descent step it should be beta[i] -= (gradient_beta_other[i] * learning_rate), since traversing down a slope requires taking the opposite sign of the computed gradient.
@shisk1 · 3 years ago
As she explained at 10:28, it's because the error term is calculated the other way around (derror_dy = 2 * (y_i_hat - y[i])), where y[i] is subtracted from y_i_hat. If you were to reverse it the traditional way, that is, subtract y_i_hat from y[i], then your suggestion would work.
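For readers following this sub-thread: the two sign conventions move the coefficients in the same direction, as long as the sign of derror_dy and the sign used in the update agree. A tiny self-contained illustration (the numbers and names here are made up for illustration, not taken from the video):

# One feature, one example, one update step under both conventions.
x_ij, y_i, y_i_hat, learning_rate, m = 2.0, 5.0, 6.0, 0.1, 1

# Convention A: derror_dy is the true derivative dE/dy_hat, so the update subtracts it.
derror_dy_a = 2 * (y_i_hat - y_i)            # +2.0 when y_i_hat > y_i
step_a = -learning_rate * derror_dy_a * x_ij / m

# Convention B: derror_dy is defined with the opposite sign, so the update adds it.
derror_dy_b = 2 * (y_i - y_i_hat)            # -2.0 when y_i_hat > y_i
step_b = +learning_rate * derror_dy_b * x_ij / m

assert step_a == step_b   # both conventions nudge the coefficient the same way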
@Han-ve8uh · 3 years ago
Thanks for sharing the implementation details. The explanation at 10:30 was hard to understand, specifically the reasoning that "if yhat > y, derror_dy is negative and that's why we add". Beta is being updated here, so the gradient is with respect to beta: beta feeds into yhat, and yhat contributes to E. That slide shows dE/dy, but where is dy/dbeta? It feels like part of the chain rule was missing in that explanation; it jumped directly to the "negative gradient", which involves dE/dy * dy/dbeta, but the latter term is not discussed on that slide.
@junqichen6241 · 3 years ago
Since y_hat = b1*x + b0, the derivative of y_hat with respect to b0 would be 1 and the derivative of y_hat with respect to b1 would be x.
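Writing out the chain-rule step this sub-thread is discussing, for a single example with squared error E = (y_hat - y)^2 and prediction y_hat = beta_0 + sum_j beta_j * x_j:

dE/dbeta_j = (dE/dy_hat) * (dy_hat/dbeta_j) = 2 * (y_hat - y) * x_j
dE/dbeta_0 = (dE/dy_hat) * (dy_hat/dbeta_0) = 2 * (y_hat - y) * 1

So the x_j factor (the X_ji on the slide, for example i) comes from differentiating y_hat, not y, which is why the gradient picks up the feature value.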
@arieljiang8198 · 2 years ago
Hi Emma, thanks for the video, but n shouldn't be in the compute_gradient function; it should be m instead.
@jiahuili2133 · 3 years ago
Is the gradient computed correctly at 6:20? The negative sign seems to be missing when taking the derivative of the error with respect to yhat.
@annad8214 · 2 years ago
Yeah, I agree; in the chain rule it should be the derivative of y_hat with respect to beta instead of y with respect to beta.
@nhandam1168 · 2 years ago
Yes, I agree with both of you. It should be the partial derivative of error w.r.t yhat, not y.
@Bookerer · 2 years ago
That's what I thought too!
@wongkitlongmarcus9310 · 4 months ago
Is this code too long for an interview?