Excellent explanation, my friend. I loved it. I am a CSE professor, 61 years old. May God bless you. Please provide more videos like this so that many in the student community can learn. May Goddess Saraswathi bless you for your bright future.
Your way of explaining is so simple and organized that anyone can understand. I enjoyed learning about the perceptron; you are an amazing educator. Thank you for such content. :)
Correct me if I am wrong. The weight-update algorithm increases the weights if the target value is higher than the actual value, and vice versa. That should make the output in the next iteration closer to the target output, BUT it will not work that way if an input (xi) is, say, always negative. In that case the weight update has to be modified: if the input is negative, the sign of the delta weight should be reversed. For example:

x1 = -1, w1 = 1 -> x1*w1 = -1, sigmoid(-1) -> 0 (actual output), target output = 1
delta weight = n * (target - actual) = n * 1; assuming n = 0.1, delta weight = 0.1
So the new weight becomes w1 = w1 + delta weight = 1.1.

But now, running it again:
x1 = -1, w1 = 1.1 -> x1*w1 = -1.1, sigmoid(-1.1) -> 0

This makes the algorithm even worse. Since the input was negative, rather than adding the delta weight we should have subtracted it:
w1 = w1 - delta weight = 1 - 0.1 = 0.9
x1 = -1, w1 = 0.9 -> x1*w1 = -0.9, sigmoid(-0.9) -> 0, but now moving toward the target.
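For reference, a minimal Python sketch of the standard perceptron rule, delta_w = eta * (target - output) * x. This is an assumption on my part (the video's rule may omit the x factor), but multiplying by the input itself reverses the sign of the update for negative inputs, which is exactly the correction described above:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x1, w1 = -1.0, 1.0
target = 1.0
eta = 0.1

for step in range(3):
    output = sigmoid(x1 * w1)
    # The x1 factor makes this delta negative here, so w1 shrinks
    # and x1*w1 moves toward positive values (output toward 1).
    delta = eta * (target - output) * x1
    w1 += delta
    print(f"step {step}: w1={w1:.3f}, output={output:.3f}")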
I have gone through a few videos on the topic and did not get a clear understanding, but your video was very clear and the examples were very simple to understand. Great job, keep up the good work. A big thanks for explaining things clearly.
I have a question: does the perceptron use a sigmoid function? As far as I know, the perceptron uses the step function, while logistic regression uses the sigmoid function. If I am wrong, correct me.
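For comparison, the textbook definitions as a small Python sketch (this is just the standard convention, not necessarily exactly what the video uses): the classic perceptron applies a hard step to the weighted sum, while logistic regression applies the smooth sigmoid:

import math

def step(z):
    # Classic perceptron activation: hard 0/1 threshold at zero.
    return 1 if z >= 0 else 0

def sigmoid(z):
    # Logistic-regression activation: smooth output in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

z = 0.3
print(step(z))     # 1
print(sigmoid(z))  # ~0.574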
How can a set of data be classified using a simple perceptron? Using a simple perceptron with weights w0, w1, and w2 as -1, 2, and 1, respectively, classify the data points (3, 4), (5, 2), (1, -3), (-8, -3), and (-3, 0).
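A sketch of one way to work this out in Python, under the assumption that w0 is the bias term and the activation is a step function that outputs class 1 when w0 + w1*x1 + w2*x2 >= 0 (the threshold convention may differ in the video):

def classify(x1, x2, w0=-1.0, w1=2.0, w2=1.0):
    # Assumption: w0 is the bias; class 1 iff the weighted sum is >= 0.
    z = w0 + w1 * x1 + w2 * x2
    return 1 if z >= 0 else 0

points = [(3, 4), (5, 2), (1, -3), (-8, -3), (-3, 0)]
for x1, x2 in points:
    print((x1, x2), "->", classify(x1, x2))

# (3, 4) and (5, 2) land in class 1 (weighted sums 9 and 11);
# the other three points have negative sums and land in class 0.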
Wouldn't all the weights be equal after all the iterations, since the delta we add to each weight is always the same for all of them in any iteration (assuming the weights were the same at the start)?
I think the error term (yi - yi bar) takes care of that. As the iterations go on, the error term becomes smaller and smaller until it eventually converges.
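To add a small Python sketch to this thread, assuming the full update rule delta_w_i = eta * (y - y_hat) * x_i with the input as a factor (an assumption; the numbers below are made up): each weight's delta is scaled by its own input, so weights that start equal drift apart as soon as the inputs differ, and the error term shrinks across iterations as noted above:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical single training example with unequal inputs.
x = [2.0, -1.0]
w = [0.5, 0.5]   # weights start out equal
y = 1.0          # target
eta = 0.1

for step in range(3):
    y_hat = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
    err = y - y_hat
    # Each delta is scaled by its own input, so the two weights
    # receive different updates and stop being equal.
    w = [wi + eta * err * xi for wi, xi in zip(w, x)]
    print(f"step {step}: w1={w[0]:.3f}, w2={w[1]:.3f}, error={err:.3f}")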
Thank you for the very clear explanation; it was a pleasure to learn. I have a question on the activation function, x.w + b: since we are using a squashing function, should it not be x.w + b < 0.5 for 0, and x.w + b > 0.5 for it to be classified as 1? Thanks again.
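A quick Python sketch of why the two conventions agree: sigmoid(0) = 0.5 and sigmoid is strictly increasing, so sigmoid(x.w + b) > 0.5 exactly when x.w + b > 0. Thresholding the raw sum at 0 gives the same labels as thresholding the squashed output at 0.5:

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

for z in [-2.0, -0.1, 0.0, 0.1, 2.0]:
    raw_label = 1 if z > 0 else 0
    squashed_label = 1 if sigmoid(z) > 0.5 else 0
    # The two labels agree for every z, because sigmoid(0) = 0.5
    # and sigmoid is strictly increasing.
    print(f"z={z:+.1f}  sigmoid={sigmoid(z):.3f}  {raw_label} == {squashed_label}")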