Awesome as always! Worth noting that the added columns ("Sex_male", "Sex_female", etc.) are now bools rather than ints, so you need to explicitly coerce df[indep_cols] at around 25:42 -- t_indep = tensor(df[indep_cols].astype(float).values, dtype=torch.float)
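If anyone wants it spelled out, here's a minimal sketch of both workarounds (the tiny DataFrame and column names are placeholders standing in for the lesson's Titanic data):

```python
import pandas as pd
import torch

df = pd.DataFrame({"Sex": ["male", "female"], "Age": [22.0, 38.0]})

# Option 1: ask pandas for float dummies up front
# (recent pandas versions return bool dummy columns by default)
df = pd.get_dummies(df, columns=["Sex"], dtype=float)

# Option 2: coerce when building the tensor
indep_cols = ["Age", "Sex_male", "Sex_female"]
t_indep = torch.tensor(df[indep_cols].astype(float).values, dtype=torch.float)
```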
Great lesson! I found myself a bit confused by the predictions and loss.backward() at ~37:00. I did some digging to clear up my confusion, which might be helpful for others:
- At 37:00, when we're creating the predictions, Jeremy says we're going to add up (each independent variable * coef) over the columns. There's nothing wrong with how he said this, it just didn't click for my brain: we're creating a prediction for each row by adding up indep_var * coeff across that row's columns. So at the end we have a predictions vector with the same number of predictions as we have rows of data.
- This is what we then calculate the loss on. Then, using the loss, we do backprop to see how much changing each coef would have changed the loss. Then we apply those changes (that's the gradient descent step) to update the coefs, and that's one epoch.
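If it helps to see those steps as code, here's a minimal sketch of one epoch in the lesson's style (the toy shapes and the 0.1 learning rate are my own picks, not the notebook's exact values):

```python
import torch

torch.manual_seed(42)
n_rows, n_coeff = 10, 5
t_indep = torch.rand(n_rows, n_coeff)  # independent variables, one row per passenger
t_dep = torch.rand(n_rows)             # dependent variable (survived or not)

coeffs = (torch.rand(n_coeff) - 0.5).requires_grad_()

preds = (t_indep * coeffs).sum(axis=1)  # one prediction per row
loss = torch.abs(preds - t_dep).mean()  # mean absolute error over all rows
loss.backward()                         # backprop: d(loss)/d(coeff) for each coeff
with torch.no_grad():
    coeffs.sub_(coeffs.grad * 0.1)      # step each coeff against its gradient
    coeffs.grad.zero_()                 # reset gradients for the next epoch
```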
I might be cheating a lil because I've already done a deep learning subject at Uni, but this course so far is fantastic. It's really helping me flesh out what I didn't fully understand before.
Semi off topic: What I really dislike about Python is the lack of types (or that type hints are optional). It really makes it difficult to understand things when you're learning complicated new stuff like this. Is that argument a float or a tensor? What is the shape of the tensor? If that were in the type of the function argument, it would make reading the code much easier when learning this stuff.
It drives me absolutely batty to do matrix work in Python because it's so difficult to get the dimension stuff right. I always end up adding asserts and tests everywhere, which is sort of fine but I would rather not need them. I really want to have dependent types, meaning that the tensor dimensions would be part of the type checker and invalid operations would fail at compile time instead of run time. Then you could add smart completion, etc. to help get everything right quickly.
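In the meantime I settle for hints plus asserts. A minimal sketch of the pattern (the function is illustrative, not from the lesson):

```python
import torch

def calc_preds(coeffs: torch.Tensor, indeps: torch.Tensor) -> torch.Tensor:
    """coeffs: shape (n_coeff,); indeps: shape (n_rows, n_coeff); returns shape (n_rows,)."""
    assert coeffs.ndim == 1 and indeps.ndim == 2, "expected a coeff vector and a matrix of rows"
    assert indeps.shape[1] == coeffs.shape[0], "column count must match coeff count"
    return (indeps * coeffs).sum(axis=1)
```

Libraries like jaxtyping go further and let you put the shapes into the annotations themselves, which is about as close to dependent types as Python gets today.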
What helped me was reading the PyTorch source code with the `??` operator and thinking about the operations in terms of linear algebra. It's hard to keep all of the ranks in mind. At the end of the day I just have to keep hacking through the errors.
If the coefficients are too large or too small, they create gradients that are either too steep or too gentle. When the gradient is too gentle, a small horizontal step won't take you down very far, and gradient descent will take a long time. If the gradient is too steep, a small horizontal step corresponds to a big vertical drop and a big vertical swoop up the other side of the valley, so you might even end up further away from the minimum. What you want is something in between.
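You can see both failure modes on a toy function. A minimal sketch of gradient descent on f(x) = x², where the step size stands in for the horizontal step above (the specific values are arbitrary picks for illustration):

```python
# Gradient descent on f(x) = x^2; the gradient is 2x.
def descend(step, n=20, x=3.0):
    for _ in range(n):
        x -= step * 2 * x  # move against the gradient
    return x

print(descend(0.001))  # too gentle: after 20 steps x has barely moved toward 0
print(descend(1.1))    # too steep: each step overshoots the valley and diverges
print(descend(0.1))    # in between: converges toward the minimum at 0
```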
Adding a dimension at ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-_rXzeWq4C6w.html is very important, as otherwise the minus in the loss function would broadcast incorrectly, leading to a model that achieves at most 0.55 accuracy. The error is silent, as the mean in the loss function hides it.
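Here's a minimal sketch of the trap (the shapes are illustrative):

```python
import torch

preds = torch.rand(5, 1)  # column vector, shape (5, 1)
deps = torch.rand(5)      # flat vector, shape (5,)

# (5, 1) - (5,) broadcasts to (5, 5): every prediction minus every target.
bad = torch.abs(preds - deps).mean()            # silently averages 25 numbers

good = torch.abs(preds - deps[:, None]).mean()  # (5, 1) - (5, 1): 5 numbers, as intended
```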
coeffs = torch.rand(n_coeff) - 0.5 -- What is the use of subtracting 0.5 from the coefficients? Is there a problem if the values are just between 0 and 1? Thanks a lot.
torch.rand() generates random numbers in the range 0 to 1; subtracting 0.5 from the random coefficients is a simple technique to center the values around zero, which I believe helps gradient descent optimize.
Shifting the range to [-0.5, 0.5] lets the coefficients take both positive and negative values. There are different strategies; you can google "weight initialization strategy". Libraries do this automatically for ReLU, tanh, etc.
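A quick sketch of the difference (kaiming_uniform_ is just one example of the built-in schemes):

```python
import torch

uncentered = torch.rand(5)    # all in [0, 1): every coefficient starts positive
coeffs = torch.rand(5) - 0.5  # in [-0.5, 0.5): positive and negative starts

# Libraries ship smarter schemes, e.g. Kaiming init for ReLU layers:
layer = torch.empty(5, 20)
torch.nn.init.kaiming_uniform_(layer)
```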
This is going to sound very pedantic, but you use the word "rank" where I think "order" would be more correct. Rank usually means the number of linearly independent columns in a matrix. At about 1:02:00, you say that the coefficients are a rank-2 matrix, but I would say its rank is 1 and its order is 2.
@howardjeremyp I haven't examined the best-performing Gender Surname Model for the Titanic dataset in detail, but something seems rather strange to me. Doesn't using the survival status of other family members constitute a data leak? After all, at inference time, which is before the Titanic incident, I would not have this information.
Depends on how you look at it. If you're trying to predict whether a person survived or not, and you already have a list of confirmed survivors and casualties, then it's probably a good way to make the prediction: if Mrs X has died, then it's safe to assume that Mr X has died as well. Or if their children have died, then it's safe to assume that both parents are dead, considering that women and children board the lifeboats first.
27:18 Why don't we have a constant in our model? How can we know that there's not going to be a constant in the equation? Can someone explain this to me?
Simply brilliant workshop... I had to change/add dtype=float, e.g. pd.get_dummies(tst_df, columns=["Sex","Pclass","Embarked"], dtype=float), to get it to work -- maybe due to a later version of pandas?
torch.rand(n_coeff, n_hidden) -- How does one set of coeffs output 20 (n_hidden) values? I mean, mathematically, a single set of coefficients multiplied by a specific set of values will always equal the same thing, right?
I'm assuming you are in the section about neural networks (before deep learning). The term n_hidden is a bad variable name. It's only 1 hidden layer, but the hidden layer is the linear combination of n_hidden ReLUs. Each of the ReLUs has coefficients to learn, which we store in a matrix of size n_coeff by n_hidden.
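In other words, it's not one set of coefficients but n_hidden of them, one column each. A minimal sketch (n_coeff=12 is my own pick; 20 is the n_hidden from the question):

```python
import torch

n_coeff, n_hidden = 12, 20
layer1 = torch.rand(n_coeff, n_hidden) - 0.5  # 20 separate coefficient columns

row = torch.rand(n_coeff)  # one row of independent variables
hidden = row @ layer1      # matrix multiply: 20 different weighted sums
print(hidden.shape)        # torch.Size([20]) -- one value per ReLU
```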