Code generated in the video can be downloaded from here: github.com/bns...
XGBoost documentation: xgboost.readth...
LightGBM documentation: lightgbm.readt...
Install: pip install lightgbm
Dataset: archive.ics.uc...
As a semi-technical user, I understood about 80% of this. I will need to do more research before I'm confident I've got everything right. Thank you for sharing!! ❤️
Why do you want to normalize using StandardScaler if it's an ensemble technique? It should work without scaling too. Please reply if you think otherwise.
Nice video for getting a perspective on LightGBM! I'm introducing myself to machine learning algorithms, and I would personally like to learn how this method is used for time series analysis and forecasting. If someone knows of any material or videos covering this topic, I would really like to know about it.
As others have already mentioned, scaling is not necessary. I also want to point out that scaling before the train/test split introduces data leakage, so scaling should only be performed after the split. Be careful with that order.
Thanks for the note. Yes, scaling is not necessary, but it helps with fast convergence. Also, there are multiple opinions about scaling before versus after the train/test split: some people point to data leakage, while others argue it helps the data generalize. I did not see compelling evidence supporting or refuting either position. In general, it is very important for the engineer to investigate proper data pre-processing practices for their specific domain. Thanks again for bringing up this topic so the viewers are aware of it.
@@DigitalSreeni I would like to hear the generalization argument. If you scale the features using the test set and then use that same test set to evaluate whether your model generalizes well, you introduce self-confirming bias and have basically stripped yourself of the only truly out-of-sample subset you had. Again, I'm open to hearing a counterexample or compelling evidence for why one CAN scale before the split.
It is not necessary for linear regression either. The only reason to scale for linear regression is when your variables differ hugely in variation, which causes numerical problems; otherwise, you can recover the unscaled coefficients from the scaled ones with a simple equation. In this video, the problem with scaling lies elsewhere: he scaled the data before splitting it into training and testing sets. This means information about the test set's distribution leaked into the training process, which is not allowed.
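To make the correct order concrete, here is a minimal sketch using scikit-learn's train_test_split and StandardScaler (the data is synthetic and the variable names are illustrative): split first, fit the scaler on the training set only, then apply that same fitted scaler to the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features with non-zero mean and non-unit variance.
X = np.random.RandomState(0).normal(loc=5.0, scale=2.0, size=(200, 3))
y = np.random.RandomState(1).randint(0, 2, size=200)

# Split FIRST, so the test set never influences preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Fit the scaler on the training data only...
scaler = StandardScaler().fit(X_train)

# ...then apply the same training-derived transform to both sets.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Training features are exactly standardized; test features are close
# to, but not exactly, zero mean / unit variance -- and no test-set
# statistics ever entered the fitting step.
print(X_train_scaled.mean(axis=0).round(6))
```

The key point is that `fit` only ever sees `X_train`; calling `scaler.fit(X)` before the split is what leaks test-set statistics into training.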
For multiclass, the model produces 3 probabilities and you use numpy.argmax to find the maximum probability and assign the label. You will be integer-coding your classes as 0, 1, 2, etc. Your resulting probabilities would look something like [0.15, 0.8, 0.05]. When you do numpy.argmax, it gives you a value of 1, which is the class with the maximum probability.
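The argmax step described above can be sketched as follows (the probability values are made up for illustration):

```python
import numpy as np

# Each row is one sample's predicted probabilities for classes 0, 1, 2.
probabilities = np.array([
    [0.15, 0.80, 0.05],   # largest value at index 1 -> class 1
    [0.70, 0.10, 0.20],   # largest value at index 0 -> class 0
])

# argmax along axis=1 returns the column index of the largest
# probability in each row, which is exactly the integer class label.
predicted_labels = np.argmax(probabilities, axis=1)
print(predicted_labels)  # [1 0]
```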
@@surflaweb OK! I am trying to use an FCN. I have created an augmented dataset for autonomous vehicles in different weather conditions. Our aim is to create a model that is robust across weather conditions using only image data. I need some pointers on which sections I should be most attentive to when creating the model. Any ideas?
You are correct, decision trees are not sensitive to the magnitude of features. I scale my data out of habit, since I use the same data preprocessing pipeline for many classifiers, including neural networks.
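One way to keep such a shared preprocessing step tidy is a scikit-learn Pipeline: the scaler is fitted inside `.fit` on training data only, and the final estimator can be swapped for any classifier. This is a sketch, not the author's code; the dataset and classifier choice are just examples.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

# The pipeline fits the scaler on training data only (inside .fit),
# so swapping the final estimator never reintroduces leakage.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)
print(round(model.score(X_test, y_test), 3))
```

For a tree-based model the scaling step is harmless but unnecessary; the pipeline simply makes the same code work for scale-sensitive models too.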
It depends on whether you are planning on using the algorithm as a tool or are interested in becoming a machine learning engineer. This is similar to using a hammer. If your goal is to hammer a nail into the wall so you can hang a photo frame, you need to focus on getting the nail into the wall. It does not matter how the hammer is made, but it does matter that it has the right weight and surface area. So you need enough knowledge to understand the tool itself and how it benefits your task. If your goal is to design a new hammer that makes hanging photo frames easier for you and others, then focus on how you'd like to design the material and structure of the hammer.

I hope the analogy makes sense. If not: if you are interested in solving a scientific or engineering challenge using image processing tools, learn the algorithms' benefits so you can pick the right one. If you are designing a new approach by combining the benefits of various algorithms, then you need to know them in depth. If you plan on becoming a data scientist or machine learning engineer, you need to understand math and statistics. I interview candidates for ML jobs, and I always ask math and statistics questions. I also interview candidates for applications jobs, and I only ask them about specific applications and how they would solve problems in those applications.
Right now my target is to become a data scientist. I know the popular algorithms commonly used in data science. I focus on how each algorithm works, but not on the math working behind it; that said, I know linear algebra, statistics, calculus, and probability, and I can solve problems in those areas. So my question is: can I get a job in the data science field just knowing how an algorithm works, or must I know the math behind it? Please reply. I'm very confused.
If you are looking for jobs in fields where you apply machine learning to an application, you do not need to know the math behind every algorithm. You need to develop domain knowledge. For example, if you are looking at the financial analysis field, then you need to know about stocks and hedge funds: how they have changed historically and how various factors have affected them. This gives you an idea of the features and attributes you need to model in your machine learning approach. You need to understand feature engineering so you can extract the right features that define the problem. If you are interested in image analysis fields such as medical image processing or remote sensing, you also need to understand the application space, for example, how images with glaucoma look compared to regular eye images.

In summary, for applied machine learning you need to focus more on the application and use ML as a tool to solve the problem. On the other hand, if you'd like to work for a company that develops novel algorithms for novel problems, then you had better know the math and every detail behind the approach. These jobs usually go to people who did their education in ML, such as a Ph.D. in computer science. You'll find these types of jobs at Nvidia, Google, Facebook, and startups focused on developing new networks for targeted problems.