MIT 6.S191: Reinforcement Learning

Alexander Amini

Views: 46K

Published: 27 Sep 2024

Comments: 29

@artukikemty 4 months ago

Amazing intro to the subject. Since it is closely related to control theory, it is essential to have a good background in control theory, such as state-space models and optimal control.

@artukikemty 4 months ago

Transformers can be used as a direct replacement for DRL, since they can process sequences as well. There is an article on Medium about this alternative.

@izharulhaq2436 4 months ago

One of the best intros to RL. I recommend that every student interested in this field watch this amazing lecture. I just finished it at 1:40 AM... Now waiting for the Actor-Critic type RL agent to be released soon... Thanks and good night.

@gamalieliissacnyambacha3029 4 months ago

I'm curious to listen to this lecture. I need more concepts to apply in my Thesis. I'm looking forward to seeing this happen soon.

@anoopitiss 4 months ago

Following for 3 years

@hrishabhg 4 months ago

Lovely lecture.❤ A self-driving car operates in a dynamic environment compared with a gaming environment. That could be mentioned.

@Asif-fp8gy 3 months ago

Awesome job. Just curious whether someone can explain how the target part of the loss function was computed at 26:40?

@ravenclaw3693 2 months ago

immediate reward + discounted best possible future reward
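
The reply above matches the usual Q-learning target, r + γ · max over a' of Q(s', a'). A minimal sketch of that arithmetic, assuming the next-state Q-values are already available as a plain list; the function names `q_target` and `q_loss`, the `done` flag, and all numbers are hypothetical and not taken from the lecture:

```python
def q_target(reward, next_q_values, gamma=0.99, done=False):
    """Target = immediate reward + discounted best next-state Q-value.

    `next_q_values` holds one Q estimate per action available in the next
    state (an assumption here; in practice a network would produce them).
    """
    if done:
        # No future reward after a terminal step.
        return reward
    return reward + gamma * max(next_q_values)


def q_loss(q_pred, target):
    """Squared error between the target and the predicted Q(s, a)."""
    return (target - q_pred) ** 2


# Hypothetical numbers for one transition.
target = q_target(reward=1.0, next_q_values=[0.5, 2.0, -1.0], gamma=0.9)  # 1.0 + 0.9 * 2.0
loss = q_loss(q_pred=2.5, target=target)
```

The `max` picks the best action in the next state, which is the "best possible future reward" part of the reply; `gamma` is the discount factor.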

@christianrink4093 2 months ago

Can one conclude from the AlphaGo vs. AlphaZero showcase that the bottleneck to "achieving" AGI/ASI is us humans and the ethical/safety restrictions we have set?

@Radiant-84 1 month ago

Both AlphaGo and AlphaZero rely on world models (and self-play), which they can use to try out or plan different moves based on the simulated results. While it's super easy to do this simulation in board games, where the rules are deterministic, creating such a world model for something drastically more complex, like the real world, is far more challenging. Algorithms like MuZero, which use learned models, are getting there, but technically speaking, DeepMind's got a lot more work to do before they can make Alpha-terminator ;)

@melvinkuriakose2708 3 months ago

10:30 The equation for total reward should be a summation of rewards from t=0 to t=t, right? But in the equation it's from t to infinity... why?

@rorisangsitoboli4601 1 month ago

The total reward runs from time t to a later time, possibly far in the future (t → ∞). The first reward is r_t, followed by r_{t+1}, r_{t+2}, ..., until termination, which is assumed to happen at some time in the future but can be chosen by the user, e.g. time t+n. Remember you can be rewarded now (t) or at any time in the far future (∞), so you sum over the entire duration.
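
A minimal sketch of the sum described above, assuming a finite episode stored as a Python list so that "infinity" becomes "until the episode ends". The function name `total_return`, the sample rewards, and the optional `gamma` argument are hypothetical; `gamma=1.0` gives the plain undiscounted sum, and discounting relative to step t is an assumption rather than the lecture's exact formula:

```python
def total_return(rewards, t, gamma=1.0):
    """Sum rewards from step t to the end of the episode.

    Each reward k steps after t is weighted by gamma ** k; with the
    default gamma=1.0 this is simply r_t + r_{t+1} + ... .
    """
    return sum(gamma ** k * r for k, r in enumerate(rewards[t:]))


# Hypothetical episode with four steps.
rewards = [1.0, 0.0, 2.0, 3.0]

R0 = total_return(rewards, 0)                  # all four rewards
R2 = total_return(rewards, 2)                  # only r_2 and r_3
R2_disc = total_return(rewards, 2, gamma=0.5)  # r_2 + 0.5 * r_3
```

Rewards before step t are left out because the agent can no longer influence them; only rewards from t onward depend on the actions it still has to choose.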

@foregroundtreble05 4 months ago

Needed u

@TheNewton 3 months ago

Please repeat the questions; the question askers' audio is blown out or unintelligible. Some of the questions make it into the captions, but not all. The professor's mic is perfect, however, with a great mix. This is one of the few series where you don't have to be at max volume all the time.

@wangfenjin 3 months ago

So awesome

@Huayi-x3p 1 month ago

Hi, when I tried to run the model-building part of lab 1, the line "tf.keras.layers.Embedding(vocab_size, embedding_dim, batch_input_shape=[batch_size, None])," does not work, and the error says batch_input_shape is an unrecognized keyword argument to Embedding. Has anyone else encountered this problem? I looked up the tf.keras.layers.Embedding documentation and couldn't find anything to replace it... What did you do to solve it? Thanks!

@Yeanpc 1 month ago

Hi, from my understanding of the TF documentation, Embedding doesn't take batch_input_shape as a parameter. I just went ahead and created the embedding as: tf.keras.layers.Embedding(input_dim=vocab_size, output_dim=embedding_dim) and it worked for me.

@breezecreator8751 4 months ago

🎉

@visheshphutela 4 months ago

Babe wake up new 6.S191 lecture just dropped

@BheezHandle 4 months ago

Lol...

@VisatoVino 4 months ago

@@BheezHandle Feel the vibessssss

@crarewhiteheadpoin9471 2 months ago

U got it

@Crashrapescrypto 3 months ago

Can you advise my startup? We applied to YC. We want to set up an Indian team and use RLHF as well as SIMPO to agentify the hospital system and remove the inefficiencies in current hospital systems. I'm an Aussie coming to America. We have hardware as well; I've been in Guangzhou for the last 6 weeks finding the best containers and cameras, trying to train for gauging container volume to measure remaining stock.