
Automated Machine Learning - Successive Halving and Hyperband 

AIxplained
529 subscribers · 3.3K views

Published: 22 Oct 2024

Comments: 14
@Hoe-ssain · 1 year ago
At 16:44, why would we violate the maximum R (81)? Wouldn't we be taking n = 3 and r = 27? That doesn't violate the max R. In fact, as per your table, taking 6 * 27 = 162 > 81 violates this rule. I am lost. Can you please explain?
@aixplained4763 · 1 year ago
Good question! This observation is based on the original example given by the authors, which is unfortunately wrong. To make sure that you understand the method fully, you could try to follow their pseudocode (link to their paper in the description). You will end up with different numbers in the table.
@alperari9496 · 2 months ago
I can tell you exactly how many hours I spent on this, trying to figure out whether the problem was me or the example...
@deepsutariya929 · 6 months ago
Hyperband was a headache before watching your video; now it is clear. Thank you for such beautiful content and examples. You shouldn't stop making videos; it's very unfortunate that you have so few subscribers.
@gowtime · 1 year ago
Great video, I finally understood Hyperband thanks to you and was able to use it in Keras confidently. Thanks! Do you know of other hyperparameter tuning approaches that may be better or worth exploring?
@aixplained4763 · 1 year ago
Glad to hear that it was helpful! :) Hyperband relies on a model-free approach (successive halving) that does not try to learn a predictive model mapping a configuration to its predicted performance. The approaches that do (called Bayesian optimization), like the Tree Parzen Estimator, can be more efficient and require less trial and error. It is even possible to combine them with Hyperband or successive halving, making it more efficient still. If you are interested, there is also a video about the Tree Parzen Estimator.
@haneulkim4902 · 1 year ago
Amazing video! One question: for each bracket in Hyperband, a new set of configurations is sampled from the total hyperparameter space, correct? So there may be duplicate configurations, i.e., the same configuration could appear in both bracket 1 and bracket 2?
@aixplained4763 · 1 year ago
Thank you! Yes, that is absolutely correct :)
@haneulkim4902 · 1 year ago
@aixplained4763 I'm still unsure about Hyperband's benefit. Each consecutive bracket samples a smaller set of hyperparameter configurations than the previous bracket. Since it is sampling randomly, aren't the configurations in the final bracket not necessarily the best ones, even though they are trained for a long time? What exactly is the benefit over simple successive halving?
@aixplained4763 · 1 year ago
@haneulkim4902 Good question! In regular successive halving, we have the issue that the halving can be too aggressive (prematurely discarding better configurations that just needed some more time to yield good performance). Finding the right level of "aggression" is not easy. Hyperband basically runs multiple successive halving brackets with different levels of "aggression" to solve this. In the end, it is indeed sometimes the case that more training time leads to better performance, but not always. Moreover, after running all brackets in Hyperband, you could post-process the results: e.g., select the best candidate from every bracket and train them all with the same budget.
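To make the bracket mechanics from this thread concrete, here is a minimal sketch of Hyperband in Python, assuming η = 3 and R = 81 as in the video's example. `sample_config` and `evaluate` are hypothetical stand-ins for your own configuration sampler and training routine; this is an illustrative sketch, not the exact pseudocode from the paper.

```python
import math

def hyperband(sample_config, evaluate, R=81, eta=3):
    """Minimal Hyperband sketch: run successive halving once per
    bracket s, from most aggressive (s_max) to least aggressive (0)."""
    s_max = int(math.log(R, eta) + 1e-9)  # most aggressive bracket (4 for R=81, eta=3)
    B = (s_max + 1) * R                   # budget assigned to each bracket
    best_loss, best_config = float("inf"), None
    for s in range(s_max, -1, -1):
        n = math.ceil((B / R) * eta**s / (s + 1))  # initial number of configs
        r = R / eta**s                              # initial resource per config
        configs = [sample_config() for _ in range(n)]
        for i in range(s + 1):                      # successive halving rounds
            r_i = r * eta**i                        # resource for this round
            losses = [evaluate(c, r_i) for c in configs]
            ranked = sorted(range(len(configs)), key=lambda j: losses[j])
            if losses[ranked[0]] < best_loss:       # track the overall best
                best_loss, best_config = losses[ranked[0]], configs[ranked[0]]
            keep = max(len(configs) // eta, 1)      # discard the worst 1 - 1/eta
            configs = [configs[j] for j in ranked[:keep]]
    return best_loss, best_config
```

With a toy objective like `lambda c, r: (c - 0.5) ** 2` and uniform random configurations, the returned configuration lands near 0.5, since each bracket's best survivor is tracked across rounds.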
@Thamizhadi · 1 year ago
Silly Question: Which software do you use for making your slides? The math symbols look so nice.
@aixplained4763 · 1 year ago
Happy to hear that you like the symbols! The slides are created in Google Slides, and I copy/paste symbols from a LaTeX-to-image generator such as latex2image.joeraut.com/
@engcaiobarros · 2 years ago
Thank you so much for this inspiring lesson. Do we have 5 brackets because we should consider log_η(R) + 1 brackets?
@aixplained4763 · 2 years ago
Good to hear! :) Good question! Indeed, that's correct.