
Logistic Regression in Python - Machine Learning From Scratch 03 - Python Tutorial 

Patrick Loeber
273K subscribers
58K views

Get my Free NumPy Handbook:
www.python-eng...
In this Machine Learning from Scratch Tutorial, we are going to implement the Logistic Regression algorithm, using only built-in Python modules and numpy. We will also learn about the concept and the math behind this popular ML algorithm.
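Roughly, the kind of class this tutorial builds looks like the sketch below (reconstructed from the Q&A in the comments; exact hyperparameter names such as lr and n_iters are assumptions and may differ slightly from the video):

import numpy as np

class LogisticRegression:

    def __init__(self, lr=0.001, n_iters=1000):
        self.lr = lr            # learning rate for gradient descent
        self.n_iters = n_iters  # number of gradient descent steps
        self.weights = None
        self.bias = None

    def _sigmoid(self, x):
        # squashes the linear model output into the range (0, 1)
        return 1 / (1 + np.exp(-x))

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0
        for _ in range(self.n_iters):
            linear_model = np.dot(X, self.weights) + self.bias
            y_predicted = self._sigmoid(linear_model)
            # gradients of the cross-entropy loss w.r.t. weights and bias
            dw = (1 / n_samples) * np.dot(X.T, (y_predicted - y))
            db = (1 / n_samples) * np.sum(y_predicted - y)
            self.weights -= self.lr * dw
            self.bias -= self.lr * db

    def predict(self, X):
        y_predicted = self._sigmoid(np.dot(X, self.weights) + self.bias)
        # threshold the probability at 0.5 to get hard class labels
        return np.array([1 if p > 0.5 else 0 for p in y_predicted])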
~~~~~~~~~~~~~~ GREAT PLUGINS FOR YOUR CODE EDITOR ~~~~~~~~~~~~~~
✅ Write cleaner code with Sourcery: sourcery.ai/?u... *
📓 Notebooks available on Patreon:
/ patrickloeber
⭐ Join Our Discord : / discord
If you enjoyed this video, please subscribe to the channel!
The code can be found here:
github.com/pat...
Further readings:
ml-cheatsheet....
towardsdatasci...
You can find me here:
Website: www.python-eng...
Twitter: / patloeber
GitHub: github.com/pat...
#Python #MachineLearning
----------------------------------------------------------------------------------------------------------
This is a sponsored link. By clicking on it you will not have any additional costs, instead you will support me and my project. Thank you so much for the support! 🙏

Published: 12 Oct 2024

Comments: 132
@ireneashamoses4209 · 4 years ago
This is the clearest explanation I have seen!! Thank you so much!! :)
@patloeber · 4 years ago
thank you!
@RaunakAgarwallawragAkanuaR · 6 months ago
Although a logarithmic loss function was shared in the material, the gradient descent implementation is done using the square/entropy loss function
@sushil_kokil · 3 years ago
Thank you buddy! This makes a lot more sense after my self-study of machine learning using the built-in sklearn models.
@kougamishinya6566 · 3 years ago
The way you relate linear regression to logistic regression makes it so clear, thank you so much!
@patloeber · 3 years ago
Glad it was helpful!
@kstyle8546 · 1 year ago
There is a small typo in the sigmoid function (1:00). As-is: h_hat = 1 / (1 + e^(-wx+b)). To-be: h_hat = 1 / (1 + e^(-(wx+b))). Always appreciate these great videos~
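For anyone following along, a tiny sketch (with made-up numbers) of why that parenthesization matters:

import numpy as np

w, b, x = 2.0, 1.0, 3.0

typo = 1 / (1 + np.exp(-w * x + b))      # exponent read as (-wx) + b
fixed = 1 / (1 + np.exp(-(w * x + b)))   # exponent -(wx + b): sigmoid of the linear model

print(typo, fixed)  # ~0.9933 vs ~0.9991 -- different probabilities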
@nadabu.274 · 4 years ago
Excellent video. I am really starting to get a good understanding of the ML algorithms after watching your videos.
@patloeber · 4 years ago
that's great!
@priyanj7010 · 2 years ago
THANK YOU TO THE MOON AND BACK... BEST EXPLANATION I HAVE EVER SEEN
@TanmayShrivastava17 · 3 years ago
Thanks for the video, now everything that is going on behind the scenes makes sense.
@patloeber · 3 years ago
glad to hear that :)
@jayantmalhotra1449 · 4 years ago
Great work bro, I am sure you will reach 100K soon. Best of luck
@patloeber · 4 years ago
Thank you :)
@_inetuser · 3 years ago
thx dude, I was searching all over the web to find out whether the 0.5 thresholding has to go into the function used by gradient descent / the cost function, but you successfully showed me that it is only applied in the final prediction step afterwards
@armaanzshaikh1958 · 4 months ago
Guys, for those wondering: the gradient descent used here is the same as for linear regression, because the derivative of the log loss works out to the same expression, (1/n) * X.T @ (y_predicted - y), as the squared-loss gradient.
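In code, that update looks something like this minimal sketch (variable names assumed to match the video's style; data is synthetic):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                 # 5 samples, 3 features
y = np.array([0., 1., 1., 0., 1.])
weights, bias, lr = np.zeros(3), 0.0, 0.1
n_samples = X.shape[0]

y_predicted = 1 / (1 + np.exp(-(X @ weights + bias)))   # sigmoid of the linear model
dw = (1 / n_samples) * (X.T @ (y_predicted - y))        # log-loss gradient w.r.t. weights
db = (1 / n_samples) * np.sum(y_predicted - y)          # log-loss gradient w.r.t. bias
weights -= lr * dw
bias -= lr * db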
@hoami8320 · 1 year ago
I was looking for a basic logistic regression model built algorithmically. Thank you very much, I like your video
@drrbalasubramanianmsu1593 · 1 year ago
Very nice explanation... Thanks
@PaulWalker-lk3gi · 4 years ago
Is learning calculus a prerequisite for this series? I am learning, but feel a bit lost when it comes to the implementations because it is difficult for me to understand the underlying mathematical concepts. I do appreciate the videos!
@patloeber · 4 years ago
A little bit of calculus would be good here. Have a look at some free math courses here: www.python-engineer.com/posts/ml-study-guide/
@satyakikc9152 · 3 years ago
I have 2 questions: 1. Why are we transposing X? (I checked the numpy documentation; it is used to swap the dimensions, but I cannot get the point here.) 2. How are we getting the summation without applying np.sum? Can you please answer?
@marilyncancino4875 · 3 years ago
thank you!!! your videos help me a lot :)
@gurekodok · 3 years ago
Thank you for the summary
@patloeber · 3 years ago
glad you like it
@robosergTV · 3 years ago
Isn't it a single-layer neural net with a sigmoid activation function?
@OK-bu2qf · 2 years ago
You have explained this very easily. Keep it going. :) You saved my ass!!!
@mrfrozen97-despicable · 3 years ago
Glad I discovered this video.))))))
@patloeber · 3 years ago
glad to have you here :)
@burcakotlu7858 · 3 years ago
Thank you very much for your video. I wonder why you are not checking your model at each iteration and returning the model with the lowest error, instead of returning the model with the last w and b parameters of the for loop.
@patloeber · 3 years ago
good point. I wanted to keep it simple here, but in practice of course you can/should check for the best model
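A sketch of what that best-model bookkeeping could look like (synthetic data; not the video's code):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def log_loss(y, p, eps=1e-12):
    p = np.clip(p, eps, 1 - eps)   # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

w, b, lr = np.zeros(2), 0.0, 0.1
best_loss, best_w, best_b = np.inf, w, b
for _ in range(1000):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w = w - lr * (X.T @ (p - y)) / len(y)
    b = b - lr * np.mean(p - y)
    loss = log_loss(y, 1 / (1 + np.exp(-(X @ w + b))))
    if loss < best_loss:
        best_loss, best_w, best_b = loss, w.copy(), b  # keep the best parameters seen so far
# return best_w / best_b instead of the final w / b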
@dhivakarsomasundaram21 · 2 years ago
So to evaluate test data we should not use fit_transform... only transform is required?
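That is the usual sklearn convention: fit the scaler on the training split only, then reuse its statistics on the test split. A sketch:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X = np.random.randn(100, 3)
y = (X[:, 0] > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # learn mean/std from the training data
X_test = scaler.transform(X_test)        # apply the same mean/std; no refitting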
@goodboytobi8202 · 3 years ago
Thank you sir. If we want to use elastic net regularization along with this logistic regression, how should we approach it?
@arieljiang8198 · 2 years ago
this is great thank you!!!
@Mar-kb8yq · 3 years ago
Hi, your videos are just awesome! One question: do the iterations of the fit method's for loop correspond to a neural network's hidden layers? Is that true?
@patloeber · 3 years ago
No, this training loop has nothing to do with neural networks; it is simply how long this optimization should try to improve. However, you can compare it with the number of epochs when training a neural net
@unfinishedsentenc9864 · 3 years ago
thank you so much.
@patloeber · 3 years ago
glad you like it!
@akhadtop2067 · 3 years ago
Congrats, because a lot of people do not do it from scratch
@arungovindarajan93 · 4 years ago
Hi bro, you have used the squared loss function, but logistic regression has log loss. If we differentiate the squared loss w.r.t. w and b, do we get the same as the log loss derivative?
@patloeber · 4 years ago
Hi. The loss I'm showing is the log loss (or better known as cross-entropy). However, the gradient is the same as for square loss in this case. You can check the further readings I provided in the description for a detailed gradient calculation :)
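For the skeptical, here is a small numerical sketch checking that the cross-entropy gradient really is (1/n) * X.T @ (y_predicted - y), using finite differences (bias omitted for brevity; all values synthetic):

import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 3))
y = rng.integers(0, 2, size=50).astype(float)
w = rng.normal(size=3)

def loss(w, eps=1e-12):
    p = np.clip(1 / (1 + np.exp(-X @ w)), eps, 1 - eps)   # sigmoid(Xw)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

p = 1 / (1 + np.exp(-X @ w))
analytic = X.T @ (p - y) / len(y)    # the gradient form used in the video

h = 1e-6
numeric = np.array([(loss(w + h * np.eye(3)[i]) - loss(w - h * np.eye(3)[i])) / (2 * h)
                    for i in range(3)])
print(np.allclose(analytic, numeric, atol=1e-5))  # True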
@armaanzshaikh1958 · 4 months ago
No, you have used the linear regression cost function: you have shown the log loss, but the derivative answer is the linear regression one
@HuyNguyen-fp7oz · 3 years ago
Love your video!!!
@patloeber · 3 years ago
Thank you!!
@nackyding · 4 years ago
When I code and run this model on the "advertising dataset" from Kaggle, the accuracy is only in the 40-50% range while the sklearn LogisticRegression model is over 90%. I've tried varying the number of iterations and the learning rate, but I can't get an accuracy score above 50%.
@patloeber · 4 years ago
Please note that this code is not optimized at all. You can try to apply feature scaling before training.
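One way that scaling suggestion might look in practice (a sketch; the dataset choice, the random_state value, and the from-scratch LogisticRegression import are assumptions):

import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1234)

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)  # zero mean, unit variance per feature
X_test = scaler.transform(X_test)        # same statistics applied to the test set

# then train the from-scratch model as in the video, e.g.:
# clf = LogisticRegression(lr=0.01, n_iters=1000); clf.fit(X_train, y_train)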
@justAdancer · 3 years ago
Just started learning this and tried running the code in a Jupyter notebook. It keeps saying there is no module named logistic_regression. It might be a stupid question, but please let me know why it's happening
@patloeber · 3 years ago
The import statement might be different when using Windows and/or Jupyter notebook. Try: from .logistic_regression import LogisticRegression (with the dot)
@anmolvarshney8938 · 4 years ago
Sir, my code (sigmoid function) is giving an exp overflow error during its iterations. How can I overcome it?
@patloeber · 4 years ago
This is because your x contains values with a very large magnitude, so np.exp(-x) gets too large for your datatype. You could clip your x in this case and set a maximum value, or try to use np.float64 instead of np.float32
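A sketch of that clipping idea:

import numpy as np

def sigmoid(x):
    # clip the input so np.exp never sees huge magnitudes; sigmoid is already
    # indistinguishable from 0 or 1 beyond +/-500, so nothing useful is lost
    x = np.clip(x, -500, 500)
    return 1 / (1 + np.exp(-x))

print(sigmoid(np.array([-1000.0, 0.0, 1000.0])))  # [~0, 0.5, ~1] with no overflow warning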
@paulbrown5839 · 3 years ago
Nice video, thanks a lot. Very good compared to similar ones that I have looked at.
@patloeber · 3 years ago
Thanks! Glad you like it
@sz8558 · 3 years ago
Great insight, thank you. It would be even better if you had shown us the raw data and explained the variables and what exactly we were trying to predict, etc. Thanks
@ahmadtarawneh2990 · 3 years ago
Very nice!
@patloeber · 3 years ago
thanks!
@adbeelomiunu · 4 years ago
Hello, please is there any way we could have access to the JUPYTER NOTEBOOK you made reference to in this video?
@patloeber · 4 years ago
Hi, not yet, but I'm planning to release them on my website soon.
@global_southerner · 4 years ago
-(wx+b) instead of -wx+b
@ubaidhunts · 3 years ago
You saved me. Thank you
@amirhosseintalebi6770 · 2 years ago
Hello, in which environment is the Python code written?
@yukiyoshimoto502 · 2 years ago
I wonder how I can plot the logistic regression line calculated from that (the decision boundary)
@ranitbandyopadhyay · 5 years ago
Things are explained much more elaborately
@dhananjaykansal8097 · 4 years ago
Lovelyyyyy. Cheers!
@KUNALSINGH-zk8su · 3 years ago
Can you explain why I am getting this error in line 12?
ValueError: not enough values to unpack (expected 2, got 1)
def fit(self, X, y):
---> 12 n_samples, n_features = X.shape
Edit: when I do this with the normal logistic regression function from sklearn it works, but why not with the one we created?
@patloeber · 3 years ago
Because sklearn can handle incorrect shapes and then transforms them to the correct one for you. Here you have to do this yourself. X does not have the correct shape here
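A sketch of the kind of shape fix being described (a hypothetical 1-D input):

import numpy as np

X = np.arange(10)        # shape (10,): only one value, so unpacking two fails
print(X.shape)           # (10,)

X = X.reshape(-1, 1)     # reshape to (n_samples, n_features)
print(X.shape)           # (10, 1)

n_samples, n_features = X.shape  # now unpacks without a ValueError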
@fahimfaisal4660 · 4 years ago
Excellent!
@reetikagour1203 · 4 years ago
Hi... I am using the code below to update the weight and bias, but it is giving me an error. Could you please help? Here is the code:
w = w - (alpha_lr_rate * dw)
b = b - (alpha_lr_rate * db)
where
w = np.random.normal(loc=0.0, scale=1, size=X.shape[1])
b = 0
error: operands could not be broadcast together with shapes (15,) (15,37500)
@nikolayandcards · 4 years ago
How can I plot the sigmoid curve the same way you plotted the fitted line in Linear Regression at the end?
@patloeber · 4 years ago
sigmoid = lambda x: 1 / (1 + np.exp(-x))
x = np.linspace(-10, 10, 100)
fig = plt.figure()
plt.plot(x, sigmoid(x), 'b', label='linspace(-10,10,100)')
@nikolayandcards · 4 years ago
​@@patloeber Thank you! I also added X and y to the plot to see how they go with the sigmoid curve. Works like a charm.
@nadabu.274 · 4 years ago
I have a question regarding random_state: some sources set it to 42 or 95, and when I changed this number the accuracy changed as well. For example, on the make_blobs dataset, if I set it to 95 the classifier gave good accuracy (~99%), but when I set it to 42 it gave around 88%. Also, when I change the learning rate value I get this error: RuntimeWarning: overflow encountered in exp, return 1 / (1 + np.exp(-x))
@patloeber · 4 years ago
random_state lets you reproduce your results. It does not matter which number you use (some people just like 42). It affects the training and test samples; some splits work better than others. If the magnitude of x is too large, you can get an overflow because np.exp(-x) is too large. Try to clip your x to a maximum, or use datatype float64 instead of float32
@nadabu.274 · 4 years ago
@@patloeber Thank you very much, I will try these solutions. I tried absolute (abs) but the plot of the loss function looks weird (I got values below zero)
@thecros1076 · 4 years ago
Why use gradient descent... can't we set the derivative with respect to the parameters equal to zero and find the parameters w and b by solving the equations? Please answer this question, I really need the answer
@patloeber · 4 years ago
Good question! Gradient descent is an iterative approach to this solution. In theory your analytical method is optimal. However, in practice, this requires solving complex equations which is too expensive for higher dimensions. Moreover, in the real world, many cost functions don't have valid derivatives everywhere.
@thecros1076 · 4 years ago
Are there any resources for this... any blog which can be read? I read the blog on linear regression but could not find the same method for logistic regression
@thecros1076 · 4 years ago
please bro do u have any blogs for this
@patloeber · 4 years ago
@@thecros1076 I have a blog at python-engineer.com. Unfortunately, at this moment I do not have articles for the machine learning tutorials, but they will be added in the future
@arungovindarajan93 · 4 years ago
It is very hard to set the derivative to zero and find the minimum, because in machine learning optimization the derivatives come in complex forms. Say the derivative of some function is 5x+5; you can set this to 0 and find the minimum by solving for x, but real ML losses are rarely that simple. So that's why GD/SGD algorithms are used in ML
@parismollo7016 · 4 years ago
This is a great tutorial! Thank you!
@patloeber · 4 years ago
Thank you! Glad you like it :)
@parismollo7016 · 4 years ago
@@patloeber I am reading the book "Data Science from Scratch" by Joel Grus, which is really good too, but sometimes I have trouble understanding the use of some formulas. Your channel is great because it has the same "from scratch" approach, and I think it is really useful for those who are interested in the statistics/math behind all the magic of ML
@patloeber · 4 years ago
@@parismollo7016 Yes some topics can be challenging, but keep going! I'm happy if this is helpful :)
@damianwysokinski3285 · 4 years ago
We don't need to define an accuracy function. We can use sklearn.metrics.accuracy_score instead
@patloeber · 4 years ago
Of course you can. But I want to implement it from scratch ;)
@damianwysokinski3285 · 4 years ago
@@patloeber my bad :P I forgot about the main goal of the playlist
@patloeber · 4 years ago
@@damianwysokinski3285 No problem. It's actually good that you know about these sklearn functions :)
@4wanys · 3 years ago
I think you use the linear regression cost function, not the logistic regression cost function, in your code
@umarmughal5922 · 2 years ago
Where is the entropy loss implemented in this code?
@bryanchambers1964 · 4 years ago
Great job. Your code is so concise and logical, but it does require a solid background. I just wish I got better accuracy. I used it for the "Titanic" dataset on Kaggle and could only get 66%. That's the lowest of all the models I have tried. Markov chain Monte Carlo gave me the best so far at 78%. Any idea how I can get a better score?
@patloeber · 4 years ago
Thanks! Yes the model is not optimized at all. Titanic dataset is all about cleaning and preprocessing your data, so maybe that could improve it :)
@matthewking8468 · 3 years ago
Where is the loss applied, please?
@alexanderperegrinaochoa7491 · 4 years ago
Hi, can this algorithm be extended to a multi-class problem?
@patloeber · 4 years ago
Yes it can. For multinomial logistic regression you have to use the softmax function instead of the sigmoid function to approximate the y. So y_predicted = self._softmax(linear_model). Furthermore you have to apply the cross-entropy as loss function and then calculate the gradients. Then again you can use the update rule with the correct gradient: self.weights -= self.lr * gradient
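A minimal sketch of those softmax pieces (one-hot labels and all variable names are assumptions, not the video's code):

import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row-wise max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))                    # 6 samples, 4 features
Y = np.eye(3)[np.array([0, 1, 2, 0, 1, 2])]    # one-hot labels, 3 classes
W, b, lr, n = np.zeros((4, 3)), np.zeros(3), 0.1, X.shape[0]

for _ in range(100):
    P = softmax(X @ W + b)             # class probabilities, shape (6, 3)
    W -= lr * (X.T @ (P - Y)) / n      # cross-entropy gradient, same pattern as the binary case
    b -= lr * (P - Y).mean(axis=0)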
@sat4GD · 3 years ago
What about the loss function?
@shaikrasool1316 · 3 years ago
How do I implement one-vs-rest from scratch and integrate it with logistic regression?
@stevewang5112 · 4 years ago
Could you please tell me how to do logistic regression with L2 regularization?
@patloeber · 4 years ago
Hello. Please check out my video about SVM to see how a regularization term is applied. Basically you add the regularization term to your cost function, and then calculate the gradients. You also have to use a regularization parameter to balance the effect of the regularization during optimization. You can also check out this code: github.com/pickus91/Logistic-Regression-Classifier-with-L2-Regularization
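A sketch of how the L2 term changes the update described above (lam is a hypothetical regularization strength; data is synthetic):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(float)

w, b, lr, lam = np.zeros(2), 0.0, 0.1, 0.01
for _ in range(500):
    p = 1 / (1 + np.exp(-(X @ w + b)))
    dw = (X.T @ (p - y)) / len(y) + lam * w   # extra lam * w term from the L2 penalty
    db = np.mean(p - y)                       # the bias is typically left unregularized
    w -= lr * dw
    b -= lr * db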
@bryanchambers1964 · 4 years ago
@@patloeber Doesn't the bias term do the same thing as a regularization term though?
@kritamdangol5349 · 4 years ago
I used this logistic regression model for disease prediction, and while pickling I got this error for the model. Can you please explain what kind of error this is and how to overcome it? Please help me out: UnpicklingError: invalid load key, '\xe2'.
@patloeber · 4 years ago
I guess you are trying to unpickle something that has not been pickled correctly...
@jaimehumbertorinconespinos3790 · 3 years ago
TypeError: unsupported operand type(s) for *: 'int' and 'NoneType'
@prashantsharmastunning · 4 years ago
Hi, I am getting a runtime warning:
RuntimeWarning: overflow encountered in exp
return 1/(1+np.exp(-x))
0.8947368421052632
What should I do to avoid this?
@patloeber · 4 years ago
you probably have values in x with very large negative numbers. Try applying a standard scaler
@prashantsharmastunning · 4 years ago
@@patloeber you are the best thnkx..
@prashantsharmastunning · 4 years ago
@@patloeber thankx that solved the issue..
@keerthyeran1742 · 4 years ago
Can I use this in writer identification?? Can U respond fast
@patloeber · 4 years ago
Yes LR can be used for classification tasks
@bassamal-kaaki3253 · 4 years ago
I followed exactly what you did, yet I get an error that says "object has no attribute 'sigmoid'", although I typed the exact same thing? In addition, your code in the video and on GitHub is different and needs updating, for example learning_rate vs lr :)
@patloeber · 4 years ago
Compare again with the code on GitHub. There must be a typo somewhere. Do you use 'self'?
@bassamal-kaaki3253 · 4 years ago
Python Engineer I copied it exactly!! Where is the code on GitHub? Can you kindly provide the link?
@patloeber · 4 years ago
github.com/python-engineer/MLfromscratch
@bassamal-kaaki3253 · 4 years ago
Python Engineer Thank u I got the link. I love your videos by the way!
@patloeber · 4 years ago
Thanks!
@abhisekhagarwala9501 · 4 years ago
Hi, I wrote your code and tried applying it to the custom dataset below, but it raised an error. I also tried reshaping the vector into 3 features, which also gave me an error; it only works on the code you gave. Why is that???? I also tried loading the Boston dataset and running your code, and I faced an error there too. Could you tell me why?
X = np.arange(10)
X_train = np.arange(7)
y_train = np.arange(7)
X_test = np.array([8,9,10])
y_test = np.array([8.5,9.5,10.5])
@patloeber · 4 years ago
Your X does not have the correct dimension! You have to add one more axis to X_train and X_test: X_test = np.array([[8,9,10]]). Or use np.newaxis to add a new one (have a look at my new numpy tutorial where I show this.)
@abhisekhagarwala9501 · 4 years ago
@@patloeber Thank you so much for your response. I will use your suggestion. I will share my inputs with you, then you can show me where and why I am making the mistake. That way I will get more insight
@keshavarzpour · 4 years ago
How can we update it to a multiclass version, with more than 2 labels?
@patloeber · 4 years ago
Hello. For multinomial logistic regression you have to use the softmax function instead of the sigmoid function to approximate the y. So y_predicted = self._softmax(linear_model). Furthermore you have to apply the cross-entropy as loss function and then calculate the gradients. Then again you can use the update rule with the correct gradient: self.weights -= self.lr * gradient
@dattijomakama9703 · 4 years ago
@@patloeber Hello, thank you for putting these great tutorials online. We're learning a lot from them. I have implemented the logistic regression using the softmax function as you described above, but it returns a score/accuracy of zero on the iris dataset.
for i in range(self.n_iterations):
    model = np.dot(X, self.weights) + self.bias
    y_predicted = self.softmax(model)
    dw = (1/n_samples)*np.dot(X.T,(y_predicted - y))
    db = (1/n_samples)*np.sum(y_predicted - y)
    self.weights -= self.lr*dw
    self.bias -= self.lr*db

def predict(self, X):
    model = np.dot(X, self.weights) + self.bias
    y_predicted = self.softmax(model)
    print(y_predicted)
    return y_predicted

def softmax(self, x):
    return np.exp(x) / float(sum(np.exp(x)))
@PenAndPickaxe · 4 years ago
@@dattijomakama9703 Same, it doesn't work for me either. My model is predicting only one class despite having a balanced dataset. EDIT: run it for about 1000 iterations; it worked for me.
@adithyarajagopal1288 · 4 years ago
SHOULDN'T DW HAVE AN NP.SUM OUTSIDE AS WELL? YOU HAVE SUMMED UP THE DB'S BUT NOT THE DW'S
@patloeber · 4 years ago
the dot product already includes a sum (np.dot applied for dw)
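A quick sketch showing that the transposed dot product already sums over the samples (and why the transpose is needed to get one gradient entry per feature):

import numpy as np

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])           # 3 samples, 2 features
err = np.array([0.1, -0.2, 0.3])     # y_predicted - y, one entry per sample

via_dot = np.dot(X.T, err)                   # what dw uses: shape (2,), one per feature
via_sum = np.sum(X * err[:, None], axis=0)   # the explicit per-sample summation

print(via_dot, via_sum)  # identical: np.dot folds the np.sum in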
@babaabba9348 · 3 years ago
I'm a bit confused; my teacher showed us a different way where the gradient descent is calculated from an ugly formula that involves a logarithm
@patloeber · 3 years ago
The cost function I showed also involves the log. I'm not sure which final formula your teacher uses, but it is very common to keep the logarithm in order to avoid overflow for large numbers...
@lucasmrtiins_ · 4 years ago
Why do we use the bias?
@patloeber · 4 years ago
The approximation is w*x + b. You can think of this in 2D, then this is equivalent to a line equation m*x + t. The bias is the intercept. It can shift the whole data up or down. So if your data is not centered around the origin, then we need to shift it to get the correct prediction. Hence we also try to learn the bias/intercept.
@passionatedevs8158 · 4 years ago
Logistic regression minimizes log loss. You are reducing square loss... why is that?
@patloeber · 4 years ago
Hi. The loss I'm showing is in fact the log loss (or better known as cross-entropy). However, the gradient is the same as for square loss in this case. You can check the further readings I provided in the description for a detailed gradient calculation :)
@redouanebelkadi5068 · 3 years ago
i thank you good sir !
@DanielDaniel-hk9el · 4 years ago
Very good!
@patloeber · 4 years ago
thanks!