Excellent video. Below are the points jotted down from this video:
1. We have the data.
2. Create the base learner.
3. Predict the salary from the base learner.
4. Compute the loss function and extract the residuals.
5. Add a sequential decision tree.
6. Predict the residuals, giving experience and salary as predictors and the residual as the target.
7. Predict the salary from the base learner's salary prediction and the decision tree's residual prediction:
   Salary Prediction = Base Learner Prediction + Learning Rate * Decision Tree Residual Prediction
   (the learning rate will be in the range 0 to 1)
8. Compute the loss function and extract the residuals.
9. Points 5 to 8 are repeated as iterations. In each iteration a decision tree is added sequentially and the salary is predicted (see the sketch after this list):
   Salary Prediction = Base Learner Prediction + Learning Rate * Residual Prediction 1 + Learning Rate * Residual Prediction 2 + ... + Learning Rate * Residual Prediction n
10. Testing: the test data is given to the model from the iteration with the minimum residual during prediction.
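To make these steps concrete, here is a minimal sketch of that loop in Python, assuming squared-error loss and sklearn's DecisionTreeRegressor; the tiny experience/salary data, tree depth, and learning rate are illustrative values, not from the video:

```python
# Minimal gradient boosting sketch for the steps above (squared-error loss).
# X, y are illustrative placeholders; any numeric feature matrix/target works.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # e.g. years of experience
y = np.array([40.0, 42.0, 52.0, 60.0, 80.0])        # e.g. salary (in $1000s)

learning_rate = 0.1
n_trees = 50

# Steps 2-3: the base learner is just the mean of the target.
base_prediction = y.mean()
F = np.full_like(y, base_prediction)

trees = []
for _ in range(n_trees):
    residuals = y - F                                            # steps 4/8
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)  # steps 5-6
    F = F + learning_rate * tree.predict(X)                      # step 7
    trees.append(tree)

def predict(X_new):
    # step 9: base prediction + learning_rate * sum of residual predictions
    return base_prediction + learning_rate * sum(t.predict(X_new) for t in trees)

print(predict(np.array([[3.5]])))
```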
For those who are learning about boosting, here's the crux. In boosting, we first build high-bias, low-variance (underfitting) models on our dataset, then we compute the error of this model with respect to the output. Now, the second model that we build should approximate the error that the first model makes: second_model = first_model + (optimisation: find a model which minimises the error that the first model makes). This methodology works because as we keep on building models the error gets minimised, hence the bias reduces. So, we get a robust model. Going a bit more in depth: instead of computing the error we compute the pseudo-residual, because the pseudo-residual is proportional to the error and lets us minimise any differentiable loss. So, the model becomes: model_m = model_(m-1) - learning_rate * [derivative of the loss function with respect to model_(m-1)].
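Written out in the standard gradient-boosting notation (this is just the textbook form of the update sketched above; F_m denotes the ensemble after m stages, eta the learning rate):

```latex
% Pseudo-residual: the negative gradient of the loss w.r.t. the current model.
\[
  r_{i,m} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F = F_{m-1}}
\]
% Each stage fits a weak learner h_m to the pseudo-residuals and adds it.
\[
  F_m(x) = F_{m-1}(x) + \eta\, h_m(x),
  \quad\text{where } h_m \text{ is fit to } \{(x_i, r_{i,m})\}.
\]
```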
@krishnaik06 could you please comment: where is the gradient, by the way? As I understand it, in real gradient boosting we train the weak learners (the r_i trees) not to predict a residual, but to predict the gradient of the loss function with respect to y_hat_i. This gradient is later multiplied by the learning rate, and the step size is thus obtained. Why predict the gradient instead of just residuals? 1) We can use a complex loss function with logical conditions, for example a loss of -10x when x < 0; thus we punish the model with a heavy penalty if y_hat_i is lower than 0. This is the major reason.
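For what it's worth, the two views coincide for squared-error loss, which is presumably why the video speaks of "predicting residuals" (a one-line derivation, not a claim about what the video says):

```latex
% With squared-error loss, the pseudo-residual is exactly the ordinary residual:
\[
  L(y, \hat{y}) = \tfrac{1}{2}(y - \hat{y})^2
  \;\Longrightarrow\;
  -\frac{\partial L}{\partial \hat{y}} = y - \hat{y}.
\]
% For other losses (e.g. asymmetric penalties like the -10x example above),
% the pseudo-residual differs from y - y_hat, which is the point being raised.
```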
Hello sir, I am a college student and ML enthusiast. I have followed your videos and have recently completed Andrew Ng's course on ML. Having done that, I think I have a broader perspective on ML and related topics. Now I am keen to crack GSoC in the field of ML, but I have no idea how to do so. Additionally, I don't even know how much knowledge I need. Going through answers on Quora didn't help, so I would be quite grateful if you addressed my problem. Waiting to hear from you. Many thanks!!
Hi Krish. Love the content on your channel. Could you do a project from scratch which includes PCA, data normalization, feature selection and feature scaling? I did see your other projects but would love to see one that implements all of these concepts.
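In the meantime, here is a minimal sketch of how those pieces chain together in scikit-learn; the estimator choice, k=5, and n_components=3 are illustrative assumptions, not anything from the channel:

```python
# Minimal preprocessing + model pipeline covering the requested pieces.
# The regressor and the k / n_components values are illustrative choices.
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.ensemble import GradientBoostingRegressor

pipeline = Pipeline([
    ("scale", StandardScaler()),                 # feature scaling / normalization
    ("select", SelectKBest(f_regression, k=5)),  # feature selection
    ("pca", PCA(n_components=3)),                # dimensionality reduction
    ("model", GradientBoostingRegressor()),      # any estimator fits here
])
# Usage: pipeline.fit(X_train, y_train); pipeline.predict(X_test)
```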
Should the sum of all learning rates be 1, or close to 1? Because I believe only that way can we prevent overfitting and still get closest to the true functional approximation.
Sir, I watch each and every video on your channel, often multiple times. Please make a video on an end-to-end project with imbalanced datasets. You did make a video on this, but you didn't deal with the imbalanced data itself, you used another technique. Please make one video for me. Awesome 👍 in a word.
Does Asynchronous Stochastic Gradient Descent work like parallel decision trees? Please make a video on this algorithm; no standard material is available on it. How can it be implemented on image data? I will be thankful to you.
Great job!!! I really like the example used to explain what is actually happening to the input values. Material on the overall technicals is easily available on YouTube channels, but this example really changes the way I look at GBM after years of using it.
1) I think the learning rate wouldn't change. So it is just 'alpha', not 'alpha1' and 'alpha2' for every decision tree. 2) The trees are predicting residuals. It is not necessary that the residuals reduce at every iteration; they may increase for some observations. For example, for a data point whose target is 100, far from the initial prediction, the residual has to increase at some iterations.
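A quick way to check point 2 empirically, using sklearn's staged_predict to track one observation's residual across boosting stages (the data here is made up for illustration):

```python
# Illustrative check of point 2: per-observation residuals need not shrink
# monotonically, even though the overall loss does.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3 * X.ravel() + rng.normal(0, 5, size=200)
y[0] = 100.0                    # an outlier-like target, as in the comment

model = GradientBoostingRegressor(learning_rate=0.1, n_estimators=50).fit(X, y)

# Residual of observation 0 after each boosting stage:
residuals_0 = [y[0] - pred[0] for pred in model.staged_predict(X)]
print(residuals_0[:5])          # not guaranteed to decrease at every step
```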
Hi Krish. Thanks for making such complex techniques easier to understand. I have a query though. Can we use techniques such as AdaBoost, Gradient Boosting and XGBoost with linear and logistic regression models, and not trees? If yes, is the output the final model's coefficients, or an additive model just like with trees? Thanks in advance.
It is crystal clear, thanks for the video. Actually, I want to know about the membership: does it include deep learning and NLP, and what kind of content will you be sharing? Thank you.
Sir, I am 17 years old. I have been earning certificates and doing some projects, so is it possible to get hired at this age if I continue like this?
Hi Krish, can you help me find a way to learn machine learning? I'm new to this domain. I have started a master's project in it, for a thesis, and I have tried a lot but couldn't make it work. Could you please help me with this? That would be really helpful.
Hi Krish. I wanted to know how the algorithm computes multiple learning rates (L1, L2, ..., Ln) when we specify only a single learning rate while initializing GBRegressor() or GBClassifier(). We are specifying only a single learning rate at initialization, right? Please feel free to correct me if I am wrong...
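For reference, scikit-learn's gradient boosting estimators do take just one constant learning_rate, and the same shrinkage factor scales every tree's contribution (a minimal sketch; the data is illustrative):

```python
# scikit-learn exposes a single, constant learning_rate hyperparameter;
# the same shrinkage factor is applied to every tree's contribution.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

X = np.arange(20, dtype=float).reshape(-1, 1)   # illustrative data
y = 2 * X.ravel() + 1

model = GradientBoostingRegressor(learning_rate=0.1, n_estimators=100)
model.fit(X, y)
print(model.learning_rate)   # one value, applied at every boosting stage
```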
Sir, your videos are a really valuable asset and really good to listen to. In coming videos, could you please cover, separately, the topics to learn for ML and for deep learning?
Can anybody explain why we need to learn the inner workings and loops of various algorithms such as linear regression and logistic regression, when we can directly call a function and apply it in Python? Please explain.
Sir, I understand your teaching, and it would be helpful if you addressed Cholesky and quasi-Newton solvers and what they are in optimization, along with gradient descent. Not being from a statistical domain, it's too hard for us to understand these terms.
Hi Krish, thank you for the videos. In the example you took for gradient boosting, I see the target has numeric values. How does the algorithm work when the target has categorical values (e.g. the Iris dataset)? How does the first step of calculating the average of the target values happen?
Krish, that was great content. I would like to know: where exactly does the algorithm stop? In the case of random forest, it is controlled by max_depth, min_samples_split, etc. What is the parameter that makes gradient boosting stop?
Hi Krish, how would we calculate the average value when we have to predict the salary for new data, since at that point in time we do not have this value?
Hello sir, @6:10 the decision tree predicts on the given features, taking R1 as the target. If R2 is -23, does that mean the decision tree's contribution shifts the residual by +2, so that R2 --> -25 + 2 = -23? Is that so? And the final model is h0(x) + h1(x) + ... ?
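If it helps, here is that arithmetic spelled out under assumed numbers: R1 = -25 comes from the question, while the target, base prediction, and the learning rate 0.08 are made-up values chosen so that the step size works out to exactly 2:

```python
# Spelling out the arithmetic in the question above. R1 = -25 is from the
# comment; y_true, F0 and the learning rate 0.08 are made-up illustrations.
y_true = 50.0
F0 = 75.0                      # base learner prediction, so R1 = y - F0 = -25
R1 = y_true - F0               # -25.0

tree_prediction = R1           # ideally the tree predicts the residual itself
learning_rate = 0.08

F1 = F0 + learning_rate * tree_prediction   # 75 + 0.08 * (-25) = 73
R2 = y_true - F1                            # -25 + 2 = -23
print(R2)                                   # -23.0

# So the final model is F(x) = h0(x) + lr*h1(x) + lr*h2(x) + ...,
# i.e. each tree's raw prediction is scaled by the learning rate.
```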
I have a question; could you please answer it? What are the differences and similarities between Generalised Linear Models (GLMs) and Gradient Boosted Machines (GBMs)?