I have spent hours reading scientific papers and trying to clear up some minor misunderstandings of mine. After all that, it took me just 20 minutes of watching this video to understand everything. Siraj, you are a great lecturer!
The timing is just right! Thanks Siraj, don't listen too much to the harsh comments - your presenting style is very good. You don't talk down to the audience, and it really helps to grasp the material.
Siraj, I love you. I can literally say that you are a man who found his way, and that made you incredible. I can tell from your passion and the way you speak that you love yourself too, which is so goddamn beautiful. I deeply appreciate your work and explanations, and more importantly your energy... keep it up, buddy.
For those who want to learn to derive the backpropagation for LSTM, I would avoid using this code; I already found a couple of bugs, in addition to the syntax errors and the confusing variable naming. I don't know LSTMs (yet), but I think my calculus is right. In the RNN backprop function, when the error variable is redefined as w*error, it's missing a dsigmoid(oa[i]), and I don't know why it's still called error; it's actually dL/dH, which gets propagated into the LSTM cell. In the LSTM backprop function, when computing ou (which should be dL/dwxo, the weight gradient for the output gate), the derivative should be dsigmoid, not dtangent. These are the bugs I found as I was trying to figure out the code.
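For anyone trying to verify derivatives like the commenter did, here's a minimal sketch. The helper names `dsigmoid`/`dtangent` mirror the ones discussed in the thread (I'm assuming, as that repo does, that they take the already-activated value); the numerical check is my own addition:

```python
import numpy as np

# Helpers in the style of the repo under discussion: the derivative
# functions take the *activated* value y, not the pre-activation x.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dsigmoid(y):   # y = sigmoid(x), so dy/dx = y * (1 - y)
    return y * (1.0 - y)

def dtangent(y):   # y = tanh(x), so dy/dx = 1 - y^2
    return 1.0 - y ** 2

# Central-difference check that the analytic derivatives are right
x = 0.3
eps = 1e-6
num_dsig = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
num_dtanh = (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps)
print(abs(num_dsig - dsigmoid(sigmoid(x))) < 1e-6)   # True
print(abs(num_dtanh - dtangent(np.tanh(x))) < 1e-6)  # True
```

A check like this makes it easy to spot the dsigmoid-vs-dtangent mix-up the commenter describes: using the wrong derivative fails the numerical comparison immediately.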
Most people are ungrateful, jealous bacteria and fungus who can't say anything nice. Siraj is doing such a good job putting out good content on otherwise hard-to-understand AI topics, in simple terms, and for free at that. Some mistakes happen, and that is OK. All the jealous fellows find problems with Siraj's content... then why the hell aren't they pumping out quality videos instead of complaining here anonymously?? Siraj will keep learning and teaching, so he will improve, and so will his followers, and they'll be more job-ready! God bless you, Siraj. Although it is grammatically fine to say 'It's Siraj', I think 'I'm Siraj' or 'Siraj here' makes more sense; 'it' is usually used to refer to inanimate objects :p
You are awesome, man. I have been watching your videos and learning a lot. I am completing a master's programme in probability theory, and by watching your videos I am really understanding the core of programming mathematical models in Python. I sincerely appreciate your effort; please keep making more videos.
I wish all my lecturers in college were like you!!! OMG!!! OMG!!! If they were, I wouldn't want to leave college or graduate anytime soon... Love yaaaaaaaaa so much!!!
This is a very trivial thing about the notation, but in the equation for f_t, shouldn't the bias b_t be b_f, since he uses W_f? (15:55) Thank you for your informative videos!!
Some of you asked some basic questions here, and I had to search a bit too. The original code comes from Kevin Bruhwiler on GitHub; search for his "Simple Vanilla LSTM". It's exactly the same code, but without Siraj's typos. Jupyter isn't needed; you can just execute the code like you usually do. Also, note that it doesn't provide the text file, which can be anything (Kevin used a text by Shakespeare). It is also DAMN SLOW! One iteration takes a few seconds; reaching 100 iterations takes about five minutes on my poor laptop, and that's not enough to get an interesting result. While I'm writing this, I've reached the 735th iteration with a small text (maybe too small), and none of the generated words is readable. So Siraj, you probably did a good job on your explanation, but that example needs some improvement or better guidelines.
Agreed, it is too slow. If you coded it in C++, the performance would probably be much better. I'm wondering how TensorFlow manages to perform this task in much less time.
4:56 "The problem is that, uhh, there are 99 of these problems" Dude you fucking had me! You are hilarious! Thank you for some great content for an aspiring machine learner!
There are actually two ways to handle the initial values for the memory cells. If I understood your code correctly, you initialized them all to zero, but in some implementations of backpropagation through time, once the initial state is reached, the remaining error signal is used to adjust the initialization vector.
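A toy sketch of the second approach the commenter mentions (my own illustration, not from the video's code): treat the initial state as a trainable parameter and update it with the gradient left over when backprop-through-time reaches t=0. The array values here are made up:

```python
import numpy as np

hidden_size = 4
h0 = np.zeros(hidden_size)   # common choice: start the initial state at zero
learning_rate = 0.1

# Pretend this is the error signal remaining after BPTT reaches t=0
dh_at_t0 = np.array([0.2, -0.1, 0.05, 0.0])

# Gradient step on the initialization vector itself
h0 -= learning_rate * dh_at_t0
print(h0)  # h0 has moved away from all-zeros after one update
```

With the first approach, h0 stays fixed at zero; with this one, the network also learns where to start.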
Don't listen to the haters, you're an epic vid creator, a real learning accelerator, master matrix manipulator, a real hidden state operator. BOOM! Keep it up. I do like that you're going back to basics with numpy here. Your only flaw is using Python 3. JK
There are 2 lines of code in the GitHub source for this project that are incorrect. 'for i range(yada, yada, yada)' should read 'for i in range(yada, yada, yada)'. The other line that chokes is the one that says 'alt text' in the source. I installed Jupyter Notebook and tried to run it, and couldn't find any sort of 'run' function. What are those In [ ]: thingies anyway? :-) How did you get the code to run and produce the output you showed at the end of the lesson? I may be doing something wrong; I hope to figure this out soon. I have an idea... :D I still enjoy the lesson and am still learning from it. Is this forum the only way to contact you? I'd rather not post such lengthy messages publicly; I tend to be verbose. Keep at it. I really appreciate your contributions. Cheers. :-)
Quite a late response, but it's when I saw this video ^^" Jupyter Notebook uses "cells", which you can execute independently. If one of the cells is active, press Shift+Enter or the "Run" icon at the top and it executes the active cell. The '[ ]' is empty when the cell hasn't been run, and fills with a number once it has. Hope that helps!
Hi, Siraj! In RNNs used for stock prices, people often normalize the prices within a given window. Why is this necessary? Is there another (better) method for it?
Both Siraj's and Kevin's routines appear to have a problem in 'ExportText()': when calling 'np.random.choice(data, p=prob)' we get 'ValueError: probabilities contain NaN'.
Thanks again for the vids & tuts, Siraj! Quick heads up tho: be sure to update your MOI playlist; it seems like you've been forgetting to do that since week 6!
Siraj, what kind of model would you suggest for the following? 1) Predict daily sales for, let's say, the Google Play store. 2) We have past data for every line item of purchase. 3) With this, can we predict each line item for the next day? 4) I'm interested in the line-item level because it could be used to generate other kinds of insights, like which category will sell more, customer demographics, and so on. 5) Is something like this possible? Looking forward to your thoughts on this.
Do LSTM cells consider both the previous input and the previous output, or only the previous output? In some use cases the output alone would be confusing, and we would need the previous input features too.
Ran the program to trace through it, and the error value never changed. Also had the following spewed out: RuntimeWarning: invalid value encountered in double_scalars prob[j] = output[i][j] / np.sum(output[i]). I never saw any real output statements in the video. It was like you were showing some fictitiously made-up output.txt file??? Please explain.
My doubt: in an RNN, let's take input data such as

[[[2,3,4],[2,4,6],[5,6,8],[3,6,8]],
 [[3,5,7],[1,2,4],[5,7,8],[7,5,4]],
 [[4,4,6],[0,6,0],[2,5,3],[2,2,2]],
 [[1,1,1],[4,8,0],[2,4,3],[6,4,0]],
 [[3,5,7],[1,2,4],[5,7,8],[7,5,4]]]

This is a 3D array of shape (5,4,3). Am I right? My question is: how is this input data processed in an RNN? How is the dimension of the weight matrix between the input and the hidden neurons determined? How is the dimension of the other weight matrix, between the hidden neurons and the previous hidden state, determined? How does the input data undergo computation? I know the equations:

s(t) = tanh[Ux + Ws(t-1)]
y(t) = softmax[Vs(t)]

I need to know how the matrices are multiplied and how the bias is added. I don't know how to determine the shapes of the bias and the weight matrices (U, V, W). I need to know the computation in detail. You can suggest some related articles to read. Thank you.
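A sketch of the shape bookkeeping for one vanilla RNN step, using the commenter's own equations s(t) = tanh(Ux + Ws(t-1)) and y(t) = softmax(Vs(t)). The matrix names follow the comment; the sizes and biases are my own illustrative assumptions (hidden_size is a free design choice):

```python
import numpy as np

input_size, hidden_size, output_size = 3, 5, 2

U = np.random.randn(hidden_size, input_size)    # input  -> hidden, shape (5, 3)
W = np.random.randn(hidden_size, hidden_size)   # hidden -> hidden (recurrent), (5, 5)
V = np.random.randn(output_size, hidden_size)   # hidden -> output, shape (2, 5)
b = np.zeros(hidden_size)                       # hidden bias, one per hidden neuron
c = np.zeros(output_size)                       # output bias, one per output neuron

x_t = np.array([2.0, 3.0, 4.0])  # a single time step, shape (input_size,)
s_prev = np.zeros(hidden_size)   # previous hidden state, shape (hidden_size,)

# One recurrence step: matrix-vector products, then elementwise tanh
s_t = np.tanh(U @ x_t + W @ s_prev + b)         # shape (hidden_size,)

# Output step: project to output_size, then softmax
logits = V @ s_t + c
y_t = np.exp(logits) / np.sum(np.exp(logits))   # shape (output_size,), sums to 1

print(s_t.shape, y_t.shape)  # (5,) (2,)
```

The rule of thumb: each weight matrix has shape (size of the layer it feeds, size of the layer it reads from), and each bias matches the layer it feeds. With batched input of shape (5, 4, 3), the same step is applied to each of the 4 time steps, for each of the 5 sequences.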
@@aidenstill7179 It will be sufficient to use the deep learning frameworks that have already been developed, e.g. Keras and TensorFlow. Once you are comfortable developing applications with these, you could look into building your own, but I doubt you will feel the need beyond being interested in doing so.
I may have set my learning rate too high: I got to a point where the error kept oscillating up and down between the same numbers across iterations. Also, I had to use the tweaks in someone's fork of Siraj's repo to get it to work, plus another "rounding" tweak of my own, because I was getting the error "np.random.choice: probabilities do not sum to 1", so I had to re-scale the probabilities to sum to one. Needless to say, the output was junk after 500 iterations. Learning a ton, though.
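One possible re-scaling tweak of the kind described here (my own workaround, not the fork's actual code): sanitize the probability vector before handing it to np.random.choice, so NaNs and rounding drift don't trip the sum-to-1 check.

```python
import numpy as np

def safe_probs(raw):
    """Clean a raw probability vector for use with np.random.choice."""
    p = np.nan_to_num(np.asarray(raw, dtype=float))  # replace NaN/inf with finite values
    p = np.clip(p, 0.0, None)                        # no negative probabilities
    total = p.sum()
    if total == 0.0:                                 # degenerate case: fall back to uniform
        return np.full(p.shape, 1.0 / p.size)
    return p / total                                 # renormalize to sum to exactly 1

p = safe_probs([0.2, np.nan, 0.5])
print(p, p.sum())  # NaN entry zeroed out, vector renormalized to sum to 1
```

This silences the error, but as the comment notes, it doesn't fix whatever upstream numerical issue produced the NaNs in the first place.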
Hey Siraj, I downloaded the code and tried to run it, but it failed. I checked and tried to fix the problems (missing variables, outputs and such), and ran it again for a day. I used a 42 KB text file (Turkish text) for training, but the output was gibberish, like " . . . vVVvAV:,,VVDAAVVVVvvvV:.k . . . " (which is, trust me, not Turkish). I mean, I'm not expecting well-written text, but I hoped to get at least some words; there was not a single one. Can you fix the code and upload it again when you get a chance? It seems like I messed it up while trying to fix it. Thank you!
In def LoadText(), replacing text = list(data) with text = list(data.split(' ')) should give better predictions. The former predicts at the character level, the latter at the word level.
Hello, many thanks for this video. I have a question: how do we know that the forget gate is really forgetting? I mean, how do we mathematically set it up to make sure its function is to learn what to forget, and not something else?
Hey there, Mr. Siraja sauce. You need to go over the forwardProp method in the RecurrentNeuralNetwork class. You missed an "in" in the for loop, and an equals sign in the second-to-last line of the for loop.
Also, forwardProp doesn't catch the correct number of arguments from LSTM.forwardProp. I'm guessing the line should be 'cs, hs, f, inp, c, o = self.LSTM.forwardProp()'. Is that right?
I may have misunderstood your code, but in this blog, colah.github.io/posts/2015-08-Understanding-LSTMs/, the creation of new candidate values to add to the cell state is done within the LSTM cell, whereas in yours it is done (I think) using an RNN. Is there anything wrong with creating candidate values within the cell, or am I going down the wrong route?
So, I've been experimenting with LSTMs, learning both letter by letter and word by word. Why did you choose word by word? My networks seem to predict better when learning letter by letter. Also, maybe I missed it, but how do you input your data? Is it in one-hot tensor form, like (num_batches, sequence_len, vocab_size)? Thanks, your vids are good.
Hey Raval, do you know whether neural nets learn better if (say a man stands in front of a green screen) you replace the green with random noise, or is it better to replace it with real backgrounds? Maybe with random noise the net doesn't learn spurious features from the background?
Siraj, what would you recommend to someone who wants to evolve recurrent LSTM networks? TensorFlow by itself does not provide evolutionary algorithms, as far as I know.
Hi... I'm a big fan. I have only one piece of feedback for now. I get halfway through typing the Python code, and then your body blocks the tail end of a critical piece of code. I'm still learning, so filling in the blanks is a little difficult. Sometimes I have to find just the right frame in order to see the actual code. Could you please work on keeping your body out of the way of the source? Other than that, I think you're doing a great job here! Thank you, Siraj. :-)
Disclaimer: new at all of this stuff. I have a question: I don't see any bias vector in the LSTM cell. Am I reading it wrong? Also, on Wikipedia I've read that to calculate the output of each gate, not only do you need to add a bias, but there's also another matrix of weights that needs to be multiplied by the previous hidden state, and the sum of all of those is the output of the gate. For example, for the forget gate: f = sigmoid(Wf*x(t) + Uf*h(t-1) + bf), where Wf and Uf are the weight matrices and bf is the bias vector. Am I reading your code wrong and this is taken into account? Or is this a simpler version of a perfectly valid LSTM, just different?
It is possible to incorporate the bias into the weight matrix. Not sure whether he is doing that here, but consider the following example of a plane (d being your bias, x your input):

0 = a*x1 + b*x2 + c*x3 + d

That is the same as:

0 = [a b c] * [x1 x2 x3]^T + d

Now incorporating d:

0 = [a b c d] * [x1 x2 x3 1]^T

Meaning you can always fold the bias into your weights by appending a 1 to your input matrix/vector. The upside is that you then have one single multiplication instead of an additional addition; it's just more compact, basically.
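A quick numeric check of that trick; the coefficient and input values below are made up for illustration:

```python
import numpy as np

w = np.array([2.0, -1.0, 0.5])  # weights a, b, c
d = 3.0                          # bias
x = np.array([1.0, 4.0, 2.0])   # input x1, x2, x3

explicit = w @ x + d                            # weights times input, plus bias
folded = np.append(w, d) @ np.append(x, 1.0)    # bias absorbed into the weight vector

print(explicit == folded)  # True: both formulations give the same value
```

Whether the code in the video uses this augmented-input form or keeps a separate bias term is worth checking in the repo; the two are mathematically equivalent.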
In your video on the different gradient descent optimizers, didn't you say Adam was usually the best choice? Just curious whether there is a reason you chose RMSprop for this.