
196 - What is Light GBM and how does it compare against XGBoost? 

DigitalSreeni
Subscribers: 108K
Views: 59K

Code generated in the video can be downloaded from here:
github.com/bns...
XGBoost documentation:
xgboost.readth...
LightGBM documentation:
lightgbm.readt...
pip install lightgbm
Dataset:
archive.ics.uc...

Science

Published: Oct 2, 2024

Comments: 44
@jinweitu7330 2 years ago
Super clear video. Wow, amazed by how clearly you explained the model without going too deep into the theory.
@venkatesanr9455 3 years ago
Great tutorial by Sreeni sir as usual and waiting for your next video.
@puppyfindchloe 1 month ago
I think that train_test_split should be performed before applying scaling to avoid leakage. Please correct me if I'm wrong.
@Brickkzz 3 years ago
You're an educational blessing Sreeni! Thank you for sharing your knowledge!
@lashlarue7924 2 years ago
As a semi-technical user, I understood about 80% of this. I will need to do more research before I am confident that everything would be proper. Thank you for sharing!! ❤️
@princekhunt1 5 months ago
OMG Explanation. I think underrated channel.
@DigitalSreeni 5 months ago
Glad you think so!
@susovandey1875 3 years ago
Why do you want to normalize using StandardScaler if it's an ensemble technique? It should work without scaling as well. Please reply if you think otherwise.
@RAHUDAS 10 months ago
I think the same; it is not needed.
@francuscus 6 months ago
Nice video for getting a perspective on Light GBM! I'm introducing myself to machine learning algorithms, and personally I would like to learn how to use this method for time series analysis and forecasting. If someone knows of any material or videos that cover this topic, I would really like to know about it.
@Uzrdinma 3 months ago
Hi Sreeni. I thought LGBM handles categorical features well. Why did you still encode them?
@SohanG3 4 months ago
Hello, I know it's late, but why did you standardize the data when LGBM handles the scaling itself?
@TrainingDay2001 2 years ago
As others already mentioned that scaling is not necessary, I want to point out that scaling before train/test split introduces data leakage, thus scaling should only be performed after. Be careful with that order
@DigitalSreeni 2 years ago
Thanks for the note. Yes, scaling is not necessary but helps with fast convergence. Also, there are multiple opinions about scaling before and after the test/train split. Some people point to data leakage, while others argue that it helps with generalization. I did not see any compelling evidence for or against either of these theories. In general, it is very important for the engineer to investigate good data pre-processing practices for their specific domain. Thanks again for bringing up this topic so the viewers are aware of it.
@TrainingDay2001 2 years ago
@DigitalSreeni I would like to hear about the generalization argument. If you scale the features using the test set and then use the same test set to evaluate whether your model generalizes well, you introduce self-confirming bias and basically strip yourself of the only true out-of-sample subset you had. Again, I'm open to hearing a counterexample or compelling evidence for why one CAN do scaling before the split.
@ComicKumar 1 year ago
To avoid data leakage, please make sure to use pipelines.
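One way to follow this advice, sketched with scikit-learn: wrapping the scaler and the model in a `Pipeline` means the scaler is fitted on the training split only, and the test rows are transformed with those training statistics. The synthetic data and the `LogisticRegression` placeholder are assumptions for illustration; they are not the dataset or the LightGBM model used in the video.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption; the video uses its own dataset).
X, y = make_classification(n_samples=400, n_features=6, random_state=0)

# Split first, so the test rows never influence any fitted statistic.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# The pipeline fits the scaler on the training rows only, then reuses
# those training means and standard deviations when scoring test rows.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),  # placeholder estimator
])
pipe.fit(X_train, y_train)

fitted_scaler = pipe.named_steps["scaler"]
test_accuracy = pipe.score(X_test, y_test)
print(f"test accuracy: {test_accuracy:.3f}")
```

Any scikit-learn-compatible estimator could presumably take the place of the placeholder classifier; the point of the sketch is only the order of operations.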
@TheBlazeThrower 1 year ago
@TrainingDay2001 Yeah, this guy just doesn't really know what he's talking about
@ErVibez 3 years ago
Why are you scaling numerical features for a tree algorithm? It's necessary for a linear-regression-type algorithm, but shouldn't be necessary for trees.
@bartomiejgora4565 3 years ago
It is not necessary for linear regression. The only reason to scale for linear regression is when your variables differ hugely in variation, which causes computational problems. If not, you can convert coefficients from scaled to non-scaled with a simple equation. In this video, the problem with scaling lies elsewhere. He scaled the data before splitting it into training and testing sets. This means the scaling statistics were computed with the test rows included, so information about the test set's distribution leaked into training, which is forbidden.
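The "simple equation" mentioned in this comment can be illustrated as follows. If each feature is standardized as z_j = (x_j - mu_j) / sigma_j, a linear fit on z with coefficients b_j and intercept b_0 corresponds to raw-scale coefficients b_j / sigma_j and intercept b_0 - sum_j (b_j * mu_j / sigma_j). A minimal NumPy sketch on made-up data (all names and values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two features on very different scales.
X = np.column_stack([rng.normal(50.0, 10.0, 200), rng.normal(0.0, 0.01, 200)])
y = 3.0 + 0.5 * X[:, 0] - 200.0 * X[:, 1] + rng.normal(0.0, 0.1, 200)


def fit_ols(features, target):
    """Ordinary least squares; returns (intercept, coefficients)."""
    design = np.column_stack([np.ones(len(features)), features])
    solution, *_ = np.linalg.lstsq(design, target, rcond=None)
    return solution[0], solution[1:]


mu, sigma = X.mean(axis=0), X.std(axis=0)
X_scaled = (X - mu) / sigma

b0_scaled, b_scaled = fit_ols(X_scaled, y)  # fit on standardized features
b0_raw, b_raw = fit_ols(X, y)               # fit on raw features

# Back-transform the scaled fit to the raw feature scale.
b_recovered = b_scaled / sigma
b0_recovered = b0_scaled - np.sum(b_scaled * mu / sigma)

print(b_raw, b_recovered)
print(b0_raw, b0_recovered)
```

Both fits describe the same line, so the back-transformed coefficients should agree with the raw-scale fit up to floating-point error.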
@pedrogusmao5266 2 years ago
@bartomiejgora4565 Yes, big mistake there.
@AdityaRaut765 10 months ago
@bartomiejgora4565 Sorry, but I am new to this stuff. Can you please explain it to me in depth?
@freemanmalit3141 3 years ago
What happens when you have three instead of two classes? Say your y values are +1, 0 and -1. How do you classify your thresholds?
@DigitalSreeni 3 years ago
For multiclass, the model produces 3 probabilities and you use numpy.argmax to find the max probability and assign the label. You will be integer coding your classes as 0, 1, 2, etc. Your resulting probabilities would look something like [0.15, 0.8, 0.05]. When you do numpy.argmax, it gives you a value of 1, which is the index of the class with the max probability.
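The argmax step described in this reply can be sketched as below. The probability matrix is made up for illustration, and the mapping back to the labels -1, 0, +1 from the question assumes the classes were integer-coded in that sorted order.

```python
import numpy as np

# Hypothetical predicted probabilities: one row per sample, one column
# per integer-coded class (0, 1, 2).
proba = np.array([
    [0.15, 0.80, 0.05],
    [0.70, 0.20, 0.10],
    [0.10, 0.25, 0.65],
])

# Index of the largest probability in each row is the predicted class code.
pred_codes = np.argmax(proba, axis=1)

# Map the integer codes back to the original labels (assumed order: -1, 0, +1).
original_labels = np.array([-1, 0, 1])
pred_labels = original_labels[pred_codes]

print(pred_codes.tolist())   # [1, 0, 2]
print(pred_labels.tolist())  # [0, -1, 1]
```

No thresholds are involved in this scheme: each sample simply gets the class whose probability is largest.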
@tanmaydeshpande2409 3 years ago
Sir, can you give some suggestions about semantic segmentation on a custom dataset?
@surflaweb 3 years ago
I did it using Mask R-CNN.
@tanmaydeshpande2409 3 years ago
@surflaweb OK! I am trying to use FCN. I have created an augmented dataset for autonomous vehicles in different weather conditions. Our aim is to create a model that is robust in different weather conditions using only image data. I need some pointers on which parts I should pay the most attention to when creating the model. Any ideas?
@surflaweb 3 years ago
Thanks man, great tutorial.
@FindMultiBagger 2 years ago
18:49 📝 Use DART over gradient boosting for better accuracy
@hp5072 2 years ago
I don't think scaling the features is necessary here. Light GBM is based on decision trees. Decision trees don't get affected by feature scales.
@DigitalSreeni 2 years ago
You are correct, decision trees are not sensitive to the magnitude of features. I scale my data as a habit as I use the same data preprocessing pipeline for many classifiers, including neural networks.
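The point made in this exchange, that a tree split depends only on the ordering of feature values and not on their magnitude, can be checked with a small sketch. This is a simplified single-split "stump" written in NumPy as an assumption for illustration; it is not LightGBM's actual histogram-based split finder, and the data is synthetic.

```python
import numpy as np


def gini_weighted(part):
    """Gini impurity of a binary label array, weighted by its size."""
    proportions = np.bincount(part, minlength=2) / len(part)
    return len(part) * (1.0 - np.sum(proportions ** 2))


def best_stump_partition(feature, labels):
    """Boolean mask of the rows sent left by the Gini-optimal threshold."""
    order = np.argsort(feature)
    sorted_x, sorted_y = feature[order], labels[order]
    best_score, best_threshold = np.inf, None
    for i in range(1, len(sorted_x)):
        if sorted_x[i] == sorted_x[i - 1]:
            continue  # no threshold fits between two equal values
        score = gini_weighted(sorted_y[:i]) + gini_weighted(sorted_y[i:])
        if score < best_score:
            best_score = score
            best_threshold = (sorted_x[i - 1] + sorted_x[i]) / 2.0
    return feature <= best_threshold


rng = np.random.default_rng(0)
x = rng.normal(1000.0, 250.0, 200)                           # raw feature
y = (x + rng.normal(0.0, 100.0, 200) > 1000.0).astype(int)   # noisy labels
x_scaled = (x - x.mean()) / x.std()                          # standardized copy

mask_raw = best_stump_partition(x, y)
mask_scaled = best_stump_partition(x_scaled, y)
same_partition = bool(np.array_equal(mask_raw, mask_scaled))
print(same_partition)
```

Standardization is a monotonic transform, so the sorted order of the rows is unchanged; the threshold value differs, but the rows on each side of the split are the same.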
@rafsunahmad4855 3 years ago
Is knowing the math behind an algorithm a must, or is just knowing how the algorithm works enough? Please, please, please give a reply.
@DigitalSreeni 3 years ago
It depends on whether you are planning on using the algorithm as a tool or are interested in becoming a machine learning engineer. This is similar to using a hammer. If your goal is to hammer a nail into the wall so you can hang a photo frame, you need to focus on getting the nail into the wall. It does not matter how the hammer is prepared, but it does matter if it has the right weight and surface area. So you need to have enough knowledge to understand the tool itself and how it benefits your task. If your goal is to design a new hammer that makes the job of hanging photo frames easy for you and others, then focus on how you'd like to design the material and structure of the hammer. I hope the analogy makes sense. If not: in case you are interested in solving a scientific or engineering challenge using image processing tools, then know about the algorithms' benefits so you can pick the right one. If you are designing a new approach by combining the benefits of various algorithms, then you need to know them in depth. If you plan on becoming a data scientist or machine learning engineer, you need to understand math and statistics. I interview candidates for ML jobs and I always ask math and statistics questions. I also interview candidates for applications jobs and I only ask them about specific applications and how they would solve problems in those applications.
@rafsunahmad4855 3 years ago
Right now my target is to become a data scientist. I know the popular algorithms that are commonly used in data science. For each algorithm, I am focusing only on how it works, not on the math behind it, but I know linear algebra, statistics, calculus, and probability, and I can solve problems in those areas. So my question is: can I get a job in the data science field just knowing how an algorithm works, or must I know the math behind it? Please reply. I'm very confused. Please reply.
@DigitalSreeni 3 years ago
If you are looking for jobs in fields where you apply machine learning towards an application, you do not need to know the math behind every algorithm. You need to develop the domain knowledge. For example, if you are looking at the financial analysis field, then you need to know about stocks and hedge funds: how they have changed historically and how various factors affected them. This gives you an idea of the features and attributes that you need to model in your machine learning approach. You need to understand feature engineering so you can extract the right features that define the problem. In case you are interested in image analysis fields such as medical image processing or remote sensing, you also need to understand the application space. For example, what do images with glaucoma look like compared to regular eye images? In summary, for applied machine learning you need to focus more on the application and use ML as a tool to solve the problem. On the other hand, if you'd like to work for a company that develops novel algorithms for novel problems, then you had better know the math and every detail behind the approach. These jobs are usually given to people who did their education in the field of ML, such as a Ph.D. in computer science. You'll find these types of jobs at Nvidia, Google, Facebook and startups focused on developing new networks for some targeted problems.
@rafsunahmad4855 3 years ago
Thanks for your valuable reply. It will help a lot in reaching my goal. Thanks a lot, and please keep making awesome videos like this ❤️.
@mouraleog 1 year ago
This video is fire! thank u
@DigitalSreeni 1 year ago
Glad you liked it!
@doyourealise 1 year ago
this is perfect :
@DigitalSreeni 1 year ago
Thanks :)
@nana-xf7dx 2 years ago
Thank you so much for this great video.
@DigitalSreeni 2 years ago
You are so welcome!