My eyes are filled with tears after watching this video. I thought I was too dumb to learn ML, and I almost wasted three weeks trying to understand it. But this helped me so much! Thank you, dada.
One more problem for high-dimensional data: "Before coding, we have to choose the 1 and 0 classes in the data wisely." In the example you took two-dimensional data, so you were able to plot it and see which class has to be assigned 1 and which 0, because one class ends up on the positive side of the line and the other on the negative side. But with high-dimensional data, how will you decide which class to assign 1 and which to assign 0, since you can't plot and visualise it?
The 0 and 1 are labels for the output column only. So for any number of input columns, the output column values will still be either 0 or 1. Logistic Regression is a supervised model used mainly for classification problems (which means the target is categorical). The line equation w0x0 + w1x1 + ... + wnxn = 0 works for any number of input columns and gives the separating boundary. Hope this helps!!!
What will be the values of m and b when there are more than 2 features (higher-dimensional data)? Since we'll get more than 2 coefficients, how will we calculate the value of "m" for the hyperplane?
There is no slope-intercept form of a hyperplane; we only have the general form w0x0 + w1x1 + ... + wnxn = 0 (with x0 = 1). So there is no need to think about m and b for it. w0 can be interpreted as the intercept, and each wi (i = 1, ..., n) as a slope along a different feature axis.
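To make this concrete, here is a minimal sketch (illustrative numbers and a made-up helper name, not from the video) of evaluating that general form for a point with any number of features; the sign of the result tells you which side of the hyperplane the point lies on:

    import numpy as np

    # hypothetical weights for 3 features: [w0, w1, w2, w3],
    # where w[0] plays the role of the intercept
    w = np.array([-4.0, 3.0, 2.0, 1.0])

    def side_of_hyperplane(x):
        # computes w0 + w1*x1 + ... + wn*xn for a point x = [x1, ..., xn]
        return w[0] + np.dot(w[1:], x)

    print(side_of_hyperplane(np.array([1.0, 1.0, 1.0])))  # 2.0 -> positive region

So for n features you just carry n+1 weights; no single m and b exist once n > 1.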
Sir, you teach and explain the best, but the problem is that I don't understand where to apply what I've learned. You say pick up any dataset from Kaggle and try it, but I can't find a dataset for the same kind of problem.
Excellent explanation. But I have a doubt. I have the line equation 3x + 2y - 9 = 0 and a point (1,1) which is in the negative region of the line. I want to bring this negative point to the positive side, so I apply the transformation logic explained in the video: (3+1)x + (2+1)y + (-9+1) = 0, which makes the transformed line 4x + 3y - 8 = 0. However, when I checked, the point (1,1) is still in the negative region of the transformed line. Can anyone explain, please?
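Your arithmetic is right; the catch is that one update is not guaranteed to flip the point in a single step, because each addition only moves the line toward the point. A quick check with your own numbers (assuming the video's update with learning rate 1, and the point written as (x, y, 1) with a bias term):

    import numpy as np

    w = np.array([3.0, 2.0, -9.0])   # line 3x + 2y - 9 = 0 stored as [A, B, C]
    p = np.array([1.0, 1.0, 1.0])    # point (1, 1) plus a bias of 1

    while np.dot(w, p) < 0:          # repeat until the point is on the positive side
        w += p                       # the transformation from the video
        print(w, np.dot(w, p))
    # [ 4.  3. -8.] -1.0  -> still negative, but closer (it was -4 before)
    # [ 5.  4. -7.]  2.0  -> positive after the second update

So one update takes the point from -4 to -1 (progress, not a flip), and the second update carries it across. That is exactly why the algorithm loops instead of updating once.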
Hi, in the code, inside the function def perceptron(X,y): there are 1000 epochs, and in each epoch we should traverse every data point. But in the code, in every epoch we pick only 1 data point and update the equation using that one point. So, according to me, since there are 100 data points, each epoch should use all 100 of them to update the coefficients. Is my understanding right, or have I missed something? Please let me know. Thanks
In each of the 1000 iterations, you randomly pick just 1 data point. If and only if that data point is misclassified do you update the weights (coefficients). So what the code calls an epoch is really a single iteration; a true epoch would pass over all 100 points. Hope this clears your doubt.
@@sandeepm9313 thanks for the clarification. I still have one doubt regarding the sigmoid function. In the video, he mentioned that we use the sigmoid function because we also want correctly classified points to update the weights, but with the step function's output there is no update for correctly classified points, so instead of the step function we use the sigmoid. What I am thinking is: if we do not use the step function or any other function and just use wtxi directly to update the weights, then we can also avoid new weights = old weights. I think we are actually using the sigmoid function to avoid the impact of outliers. Please clarify my doubt. Thanks
@@muditmishra9908 Could you please clarify what you meant by wtxi? By the way, this is not my channel but Nitish Sir's, so the credit goes to him. If you meant weight times i, then think through this: 1. You initialise w to [1,1,1]. In the first iteration, i = 0, so w stays the same (w*i is [0,0,0]). 2. Next iteration, i = 1, so the updated w is [0,0,0]. You end up with no line. 3. Next iteration, i = 2, and the updated w is [-2,-2,-2], which represents the same line as [1,1,1], since scaling all coefficients of a line by a nonzero constant does not change the line. Recall that the equation of the line is C + Ax + By = 0, where C, A, B are the first, second and third elements of the w vector. So in subsequent iterations you are doing w_new = w_old - (a number * w_old), which, when put into the line equation, gives the same line. Hence you are not improving anything. Hope I caught your point correctly; if not, please mention.
@@sandeepm9313 According to me, the sigmoid function is not a substitute for the step function; we are using the sigmoid only to taper off the extreme values, i.e. the use of the sigmoid is to neutralize the effect of outliers.
@@muditmishra9908 Well, this is a classification algorithm (binary classification in this case). The aim is to draw a decision boundary (one line) that separates positive and negative data points. Even though the step function did give a boundary through an iterative process, it was not the optimal one. So the step function was replaced with the sigmoid, so that any random point picked (only one is picked per iteration) would not just update the w vector, but the extent to which it updates the w vector would depend on how far off the point is from the current boundary line (when I say current, we are still in the process of figuring out the line iteratively).

Now coming to your point about outliers. The sigmoid approach would have the effect you describe, but it is not implemented in order to neutralize outliers. Recall that in each iteration we pick a random data point, and suppose we have just one outlier. Since points are picked randomly, it is possible the outlier is never picked at all. Even if it is picked, it will have minimal effect on the update of the line's w vector, since it is far from the line (unless the line passes close to the outlier, the chances of which are low since we start with a w vector of [1,1,1]). That is to say, the outlier will probably have close to zero effect on the algorithm using the sigmoid as implemented above. The approach is counter-intuitive to your logic.
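If it helps, here is a minimal sketch (made-up numbers, not the video's code) of the difference: for a correctly classified point the step update is exactly zero, while the sigmoid update is small but nonzero, shrinking as the point moves farther from the boundary:

    import numpy as np

    def step(z):
        return 1 if z > 0 else 0

    def sigmoid(z):
        return 1 / (1 + np.exp(-z))

    w = np.array([1.0, 1.0, 1.0])    # [intercept, w1, w2], the starting weights
    x = np.array([1.0, 2.0, 1.0])    # a point with the bias 1 prepended
    y, lr = 1, 0.1
    z = np.dot(w, x)                 # 4.0, already on the correct (positive) side

    print(lr * (y - step(z)) * x)    # [0. 0. 0.]  -> step gives no update at all
    print(lr * (y - sigmoid(z)) * x) # ~[0.0018 0.0036 0.0018] -> tiny sigmoid nudge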
I am getting the error that name 'step' is not defined. Please help me, sir.

    def perceptron(X,y):
        X = np.insert(X, 0, 1, axis=1)
        weights = np.ones(X.shape[1])
        lr = 0.1
        for i in range(1000):
            j = np.random.randint(0, 100)
            y_hat = step(np.dot(X[j], weights))
            weights = weights + lr*(y[j]-y_hat)*x[j]
        return weights[0], weights[1:]
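The error is because step is called before it is defined anywhere; you have to define it yourself. There is also a second bug waiting once that is fixed: x[j] (lowercase) should be X[j]. A corrected sketch, assuming 0/1 labels as in the video (the threshold here, 1 when z > 0, is one common convention, and the hard-coded 100 is replaced by X.shape[0] so it works for any number of rows):

    import numpy as np

    def step(z):
        # step activation: 1 for positive activations, 0 otherwise
        return 1 if z > 0 else 0

    def perceptron(X, y):
        X = np.insert(X, 0, 1, axis=1)            # prepend a bias column of 1s
        weights = np.ones(X.shape[1])
        lr = 0.1
        for i in range(1000):
            j = np.random.randint(0, X.shape[0])  # pick one random row
            y_hat = step(np.dot(X[j], weights))
            weights = weights + lr * (y[j] - y_hat) * X[j]   # X[j], not x[j]
        return weights[0], weights[1:]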
I cannot understand the code there... what is happening is totally beyond my head. Can anyone tell me where I am going wrong... or am I not meant for machine learning? I easily understand the theoretical part, but the code part is too tough for me.