Hey! This is the best explanation I have ever seen on the internet. I was trying to understand these concepts by watching videos and so on, but without any success. Now I understand how these networks function and how they are structured. I have forgotten my calculus, and here you explain the chain rule in words so simple anyone can understand. Thank you for these great videos, and God bless.
Good job! This is a tough subject, and you succeeded in simplifying the explanation as much as possible. I did the Andrew Ng course on Coursera, and his explanation was difficult to follow even for me, who already knew the maths involved. Now I think you should implement this algorithm in Python, for example.
Great! Finally I understand something. Without a hidden layer it is a bit difficult to understand how to apply backpropagation. But the one thing that no tutorial explains, and that you would be the right person to teach us, is this: "How to create your own classification or regression dataset". I use Keras, but plain Python would also be good. Thank you.
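For anyone else wanting this, here is one common way to do it, as a minimal sketch using scikit-learn's built-in dataset generators (this is my own suggestion, not something from the video; all parameter values are arbitrary illustration choices):

# Generate toy classification and regression datasets with scikit-learn.
from sklearn.datasets import make_classification, make_regression

# 200 samples, 4 features, 2 classes
X_cls, y_cls = make_classification(n_samples=200, n_features=4, n_classes=2, random_state=0)

# 200 samples, 1 feature, continuous target with a little noise
X_reg, y_reg = make_regression(n_samples=200, n_features=1, noise=0.1, random_state=0)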
Thank you for your comment! The generalized case is briefly explained at the end of the video. If you follow the math exactly as in the single-weight case, you will see that it works out. If I find the time, I may make a video about that, but it might be a bit redundant.
I understand the logic and the thinking behind this concept. Unfortunately, I just can't wrap my head around how to calculate it from these kinds of formulas. But if I saw a code example, I would understand it without an issue. I don't know why my brain works like that, but mathematical formulas are mostly useless to me =(
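For readers in the same boat, here is a minimal code sketch of the single-weight case in Python (the input 1.5, target 0.5, and starting weight 0.8 are assumptions pieced together from the numbers elsewhere in this thread; the learning rate and step count are arbitrary):

# One neuron, one weight, no activation function.
x, y = 1.5, 0.5   # input and desired output (assumed values)
w = 0.8           # starting weight (assumed value)
lr = 0.1          # learning rate: how big a step to take downhill

for step in range(20):
    a = x * w               # forward pass: the activation
    C = (a - y) ** 2        # cost: squared difference from the target
    dC_da = 2 * (a - y)     # slope of the cost with respect to a
    da_dw = x               # slope of the activation with respect to w
    dC_dw = dC_da * da_dw   # chain rule: slope of the cost w.r.t. the weight
    w -= lr * dC_dw         # step against the slope
    print(f"step {step}: w = {w:.4f}, cost = {C:.4f}")

The weight settles near 0.3333, where the activation equals the target and the cost is zero.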
Thank you, it really helped me understand the principle behind backpropagation. In the future I would like to see how to implement it with layers that have 2 or more neurons; how to calculate the error for each neuron in that case, to be precise.
Don't you hate it when the lecturer goes outside for a cigarette in the middle of a lecture... but continues teaching through the window. Yes, we get it... your PowerPoint remote works through glass! But WE CAN'T HEAR YOU! XD
At 7:53, what values of a and y make the parabola reach its minimum around 0.3334? For a desired y of 0.5, the value of a would have to be 0.5. That is, the minimum of the cost function occurs when a is 0.5, so why has its minimum in the graph been relocated to 0.3334?
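A possible resolution for later readers (the input value 1.5 here is my inference from the numbers in this thread, not something confirmed in the video): the horizontal axis of that plot is the weight w, not the activation a. With input 1.5 the cost is C(w) = (1.5w - 0.5)^2, which bottoms out at w = 0.5/1.5 ≈ 0.3334, and at that weight the activation is a = 1.5 * 0.3334 ≈ 0.5, exactly the desired value. The minimum hasn't moved; the graph is simply over w rather than over a.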
What changes in the equation if I have more than just 1 neuron per layer, though? Especially since they are cross-connected via more weights, I don't know exactly how to deal with this.
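For reference, the standard generalization (this is textbook backpropagation, not something spelled out in this thread): for a weight w_ij connecting neuron i to neuron j, the chain rule gives dC/dw_ij = a_i * delta_j, where delta_j bundles everything downstream of j. For an output neuron, delta_j = 2(a_j - y_j) * f'(z_j); for a hidden neuron, delta_j = (sum over k of w_jk * delta_k) * f'(z_j), summing over all neurons k that j feeds into. The cross-connections just mean that sum has more terms.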
I don't know anything about this subject, but I was following along until the rate-of-change function. Probably a stupid question, but why is there a 2 in the rate-of-change function, as in 2(a-y)? Is this 2 * (1.2 - 0.5)? Why the 2? I can't really see the connection to y = x^2, but that's probably just me not understanding the basics. Maybe somebody can explain for a dummy like me. Wait, maybe I see my mistake: the result should be 0.4, right? So it's actually 2(a-1), because otherwise multiplication goes first and you end up with 1.4?
The derivative of x^2 (x squared) is 2x. The cost function C is the square of the difference between the actual and desired output, i.e. (a-y)^2. Its derivative (slope) with respect to a is therefore 2(a-y). We don't use the actual cost to make the adjustment, but the slope of the cost: stepping against the slope always moves 'downhill' toward zero cost.
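Worked out with the numbers from the question above: 2(a - y) = 2 * (1.2 - 0.5) = 2 * 0.7 = 1.4. So 1.4 is indeed the slope. The 2 comes from the power rule for x^2, not from doubling either term, and the parentheses are evaluated first, then multiplied by 2.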
There is one thing I don't understand. Suppose you have two inputs: for the first input the perfect value is w1 = 0.33, but for the second input the perfect value would be w1 = 0.67. How would you compute the backpropagation to get the value that minimizes the cost function?
Run multiple experiments with different inputs and measure the outcome; if the outcome is already perfect, there is no learning. How would you answer the question?
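For anyone stuck on this one, here is a tiny sketch of the usual resolution (the target values 0.33 and 0.67 are taken from the question above; the inputs, starting weight, learning rate, and step count are my own arbitrary choices). You minimize the cost summed over both examples, so gradient descent settles on a compromise weight rather than either "perfect" value:

data = [(1.0, 0.33), (1.0, 0.67)]  # (input, target) pairs; inputs assumed
w, lr = 0.0, 0.1

for step in range(200):
    # the slope of the *summed* cost is the sum of the per-example slopes
    grad = sum(2 * (x * w - y) * x for x, y in data)
    w -= lr * grad

print(round(w, 4))  # ~0.5: the compromise that minimizes the total cost

Neither example ends up perfect; the leftover cost is the price of one weight serving two conflicting examples.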
Nice video. I'm a little confused about which letters stand for which values: a = value from the activation function, or just simply the output from any given neuron? C = loss/error gradient? And which of these values qualifies as the gradient?
a = activation (with or without an activation function). C = loss/error/cost (these are all the same thing; the naming varies between textbooks and frameworks). Regarding gradients: this is a 1-dimensional case for educational/amusement purposes. In actual networks you would have more weights, therefore more dimensions, and you would use the term 'gradient' or 'Jacobian', depending on how you implement it. I have an example with two dimensions here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Bdrm-bOC5Ek.html
No one ever says how weight adjustment works when there are multiple layers and multiple outputs: do you do a new forward pass after each individual weight is adjusted, or do you update ALL the weights and THEN do a single new forward pass?
Yeah, a single forward pass (during which the values needed for the gradients get stored; see my other videos), followed by a single backpropagation pass through the entire network, updating all the weights by a bit.
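To make the order of operations concrete, here is a hypothetical two-layer sketch in Python (the architecture, initial weights, and all numbers are my own toy choices, not from the video): one forward pass, one backward pass, then every weight updated at once.

import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.5])                   # input (assumed value)
y = np.array([0.5])                   # desired output (assumed value)
W1 = rng.normal(size=(1, 3)) * 0.5    # input -> hidden weights
W2 = rng.normal(size=(3, 1)) * 0.5    # hidden -> output weights
lr = 0.05

for step in range(500):
    # single forward pass (intermediate activations kept for the backward pass)
    h = x @ W1                        # hidden activations (linear, as in the video)
    a = h @ W2                        # output activation
    # single backward pass through the whole network (chain rule)
    dC_da = 2 * (a - y)               # slope of the cost (a - y)^2
    dW2 = np.outer(h, dC_da)          # gradient for the second layer
    dW1 = np.outer(x, dC_da @ W2.T)   # gradient for the first layer
    # update ALL the weights; THEN the next iteration does one new forward pass
    W2 -= lr * dW2
    W1 -= lr * dW1

print(((x @ W1 @ W2 - y) ** 2).item())  # cost after training, close to 0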
If I had different numbers of neurons per layer, would the formula at 11:30 then become (average of the activations of the last layer) * (average of the weights of the next layer) ... * (average cost of all outputs)?
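For what it's worth, the standard generalization sums rather than averages: the derivative of the cost with respect to an early weight adds up one chain-rule term per path from that weight to each output, so activations and weights enter as sums over neurons, not as averages (see the delta formula sketched earlier in this thread).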
Can you please tell me how you graphed that cost function? I plotted this cost function on my calculator and I am getting a different polynomial. I graphed ((x*0.8)-0.5)**2. Thanks.
Hi, and thank you for your question. I used Apple's Grapher for all the plots; it should look like it does in the video. Your expression ((x*0.8)-0.5)**2 is correct.
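For anyone without Grapher, here is a minimal sketch that plots the same expression in Python (matplotlib is my choice of tool here, not something from the video; the x-range is arbitrary):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-0.5, 1.5, 200)   # arbitrary range around the minimum
cost = (x * 0.8 - 0.5) ** 2       # the expression from the question
plt.plot(x, cost)
plt.xlabel("x")
plt.ylabel("cost")
plt.title("((x*0.8)-0.5)**2")
plt.show()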
Brilliant. Now generalize that to any-sized layer and any number of layers. I suppose you won't need bias units at all. You have just solved deep learning. Profit.
It was more or less comprehensible until the "mirrored 6" character appeared with no explanation of what it was, what it was called, or why it was there. So, let's move on to another video on backpropagation...
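(For anyone else wondering: the "mirrored 6" is ∂, the partial-derivative symbol, usually read as "partial" or "del". ∂C/∂w means the derivative of C with respect to w while every other input is held constant; it is used instead of d precisely because the cost depends on more than one variable.)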