
Temporal Difference Learning (including Q-Learning) | Reinforcement Learning Part 4 

Mutual Information
Subscribers: 68K
Views: 28K

The machine learning consultancy: truetheta.io
Join my email list to get educational and useful articles (and nothing else!): mailchi.mp/truetheta/true-the...
Want to work together? See here: truetheta.io/about/#want-to-w...
Part four of a six-part series on Reinforcement Learning. As the title says, it covers Temporal Difference Learning, Sarsa, and Q-Learning, along with some examples.
SOCIAL MEDIA
LinkedIn : / dj-rich-90b91753
Twitter : / duanejrich
Github: github.com/Duane321
Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
SOURCES
[1] R. Sutton and A. Barto. Reinforcement learning: An Introduction (2nd Ed). MIT Press, 2018.
[2] H. van Hasselt, et al. RL Lecture Series, DeepMind and UCL, 2021, • DeepMind x UCL | Deep ...
SOURCE NOTES
The video covers topics from chapters 6 and 7 from [1]. The whole series teaches from [1]. [2] has been a useful secondary resource.
TIMESTAMPS
0:00 What We'll Learn
0:52 No Review
1:18 TD as an Adjusted Version of MC
2:49 TD Visualized with a Markov Reward Process
6:34 N-Step Temporal Difference Learning
8:08 MC vs TD on an Evaluation Example
11:50 TD's Trade-Off between N and Alpha
12:47 Why does TD Perform Better than MC?
15:29 N-Step Sarsa
17:15 Why have N above 1?
19:02 Q-Learning
20:50 Expected Sarsa
21:48 Cliff Walking
25:04 Windy GridWorld
28:12 Watch the Next Video!
NOTES
Code to compare TD vs MC on the evaluation task: github.com/Duane321/mutual_in...

Published: 1 Jul 2024

Comments: 97
@lordjared2572 · a year ago
Ok, just please upload more vids. There's a huge vacuum of ML education out here for people who are not scared of math.
@Mutual_Information · a year ago
That's the audience I want!
@rewixx69420 · a year ago
I learn ML by myself and it's hard to find information. This guy saves me on RL, thanks.
@marcin.sobocinski · a year ago
Your animations are fantastic, it's like a new dimension of learning. It helps so much to be able to visualize RL processes. Thank you!
@Mutual_Information · a year ago
Thank you Marcin - it means a lot when I hit someone in the audience exactly as I hoped :)
@samlaki4051 · a year ago
Starting to get into RL, Yannic recommended you! How have I missed such a gem of a channel!
@Mutual_Information · a year ago
Love Yannic's stuff. Super pumped to get the shout out.
@marcegger7411 · a year ago
Fantastic!! Keep up the amazing work! It's always so great to see quality content presented so eloquently :)
@saranahluwalia5353 · a year ago
I wish I had this to review 5 years ago. This would have eliminated wasteful experiments. Thank you for making this more accessible.
@Mutual_Information · a year ago
The way I'm designing the channel is the channel I would have wanted when I was learning ML for the first time. Seems like that theory landed!
@bornamorasai5285 · a year ago
Can't wait for parts 5 and 6!!!! Let's go!!!!
@rostislavmarkov7488 · 5 months ago
Awesome series covering essential fundamentals with great didactics. Raised the bar for creating high-quality content!
@mryazbeck98 · 10 months ago
I love your videos to recap what I read in the book. Helps me understand and visualize everything better. I was however hoping to learn more about batch training, because I didn't understand how it works at all!
@hjop010 · a year ago
Great video as always! It is helping me so much as a complement to Barto & Sutton.
@Mutual_Information · a year ago
Sweet - that's exactly how I want this series used!
@pandie4555 · 10 months ago
Dude, the amount of work you put in these videos is fantastic.
@Mutual_Information · 10 months ago
lol yea these videos were crazy hard. This one took me 100+ hours
@buh357 · a year ago
I am starting to learn RL, and your video is helping me a lot. You have a clear and precise explanation; thank you. Looking forward to the videos to come :)
@Mutual_Information · a year ago
excellent! Exactly what I'm going for :)
@antonkot6250 · 8 months ago
Oh man, these visualisations are top-notch!
@victormanuel8767 · a year ago
"You've covered a lot, give yourself a 3 second break" *0.5 seconds later* "Great, let's keep going"
@skirazai7591 · a year ago
Man, you're doing some very high quality stuff, keep it up.
@Mutual_Information · a year ago
Thanks - I'm tryin!
@fedelozano2895 · a year ago
Hi, your videos are really specific and super helpful! This information is helping me with my paper, can't wait for the next one, thank you :)
@Mutual_Information · a year ago
Lol yea, specific as hell! Glad it helps, and I'm working on the next one as we speak!
@ryderbrooks1783 · a year ago
This channel is extremely under-subscribed. I very much appreciate the work you're putting in here. Thank you
@Mutual_Information · a year ago
Ha, well I can't expect a large number of subs when my stuff is so technical. So.. should I make it less technical? Nope!
@AlisonStuff · a year ago
love it!! so good!!!
@ezragarcia6910 · a year ago
Thanks!! I just found your channel and IT'S AWESOME!
@Mutual_Information · a year ago
Thanks Ezra - I think it's a work in progress lol 😁
@broccoli322 · a year ago
Great videos! Can't wait for more.
@Mutual_Information · a year ago
This is one of my favorites in fact - glad it hits!
@datsplit2571 · a year ago
High quality videos, my compliments! This helps so much in understanding RL for a Master's course. Thank you!
@Mutual_Information · a year ago
You're welcome! And you know what would be totally sweet? If you told your classmates about these vids :)
@datsplit2571 · a year ago
@Mutual_Information Posted it in the Teams chat of the Advanced Machine Learning course!
@Mutual_Information · a year ago
@datsplit2571 thank you! Over time, moves like that will make all the difference :)
@nathanzorndorf8214 · 5 months ago
Thanks for this. Amazing.
@qiguosun129 · a year ago
Excellent lecture! It resolved my doubts about the parameter uncertainty analysis that reviewers asked me to do for scientific research papers.
@Mutual_Information · a year ago
Excellent - it's thrilling to hear this has some real impact!
@selcukkalafat2857 · a year ago
Thank you. Looking forward to the next part.
@Mutual_Information · a year ago
In the works :) but I'll need some patience
@timothytyree5211 · a year ago
Excellent video! I am so stoked to use this in my work!
@timothytyree5211 · a year ago
I used the knowledge of ^this vid today to help a buddy out at work! You rock, Duane!
@timothytyree5211 · a year ago
I'm really looking forward to your next video on function approximation!
@user-sx3dy6cw8m · 3 months ago
This is a life saver
@marcin.sobocinski · a year ago
Thank you.
@IRONMAIDEN146 · a year ago
Your videos are helping me a lot in my AI engineering degree, thanks a lot!
@Mutual_Information · a year ago
Love it!
@sathyakumarn7619 · a year ago
So precise and fun! But highly underrated! Please advertise so that more people can benefit!
@Mutual_Information · a year ago
Thank you! And I agree, distributing this needs some more effort. Sometimes my tweets help
@bean217 · 5 months ago
"If you recall... which you better!" I swear, I recall!
@bmenashetheman · a year ago
What a fantastic series, thank you so much!!!
@Mutual_Information · a year ago
Thanks Ben, glad you see the same value I do. Btw, if you know other people studying the same subject, it would help a lot to share this with them :)
@b0nce · a year ago
Double this - great effort, excellent videos, thank you so much. Also, Duane, you forgot to add this video to the RL playlist.
@Mutual_Information · a year ago
@b0nce oh thank you! Fixed
@bmenashetheman · a year ago
@Mutual_Information Already shared it with everyone in my class! I'm certain this channel will get really popular really soon; your content is fantastic.
@Mutual_Information · a year ago
@bmenashetheman oh you rule! Thank you!
@surakshachoudhary2880 · a year ago
Eagerly awaiting the remaining episodes - remarkable work there! So far I've just watched the videos, and I think it can only become clearer with some practice - but I was curious why I keep hearing about 'deep' RL? Where do the 'deep' a.k.a. neural nets fit into these videos?
@Mutual_Information · a year ago
Parts 5 and 6 are in the works - I only just started them, so it'll take some time. Nothing coming this month, but probably in Dec. Good question! "Deep" in "Deep RL" refers to deep learning, where we utilize neural networks with many layers to learn complex functions from observations. Up to this point, those NNs have had no place to be inserted - but that changes in part 5. In part 5, we'll discuss handling state spaces that are so huge we can't list them out in a table. In that case, you can use a function to model giant swaths of those states, and deep NNs can be especially good at that. My video won't be a deep dive into NNs - that's too big of a subject. But it should be clear how they would get used.
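The table-to-function idea in this reply can be previewed with a toy sketch (my own illustration, not the video's code): represent v(s) as a linear function of features and update the weights with semi-gradient TD(0). With one-hot features this reduces exactly to tabular TD(0); the function names and signatures here are hypothetical.

```python
def features(s, n_states):
    """One-hot features; a real problem would use something coarser."""
    x = [0.0] * n_states
    x[s] = 1.0
    return x

def v(w, x):
    """Linear value estimate: a dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def semi_gradient_td0(w, s, r, s_next, n_states, alpha=0.1, gamma=1.0):
    """One semi-gradient TD(0) weight update for a single transition."""
    x, x_next = features(s, n_states), features(s_next, n_states)
    delta = r + gamma * v(w, x_next) - v(w, x)   # TD error
    for i in range(n_states):
        w[i] += alpha * delta * x[i]             # gradient of v wrt w is x
```

With one-hot features, each update touches only the visited state's weight, which is exactly the tabular update; richer features are what let one transition generalize across many states.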
@NazerkeSafina · 9 months ago
Superb job with visualization, keep it up! Only you could explain certain things to me; I've watched several other tutorials and wasn't feeling confident. One thing: I wish the explanation of how V(s) is obtained for each state were more detailed, perhaps with multiple samples and step-by-step calculations.
@rewixx69420 · a year ago
Episode 6 - finally I will understand PPO
@arrozenescau1539 · 6 months ago
I wish I could like your videos twice
@Mutual_Information · 6 months ago
Well unfortunately, there is no way to double-like. I see only one solution: I need to upload 2x more videos!
@kimchi_taco · 11 months ago
14:30 TD is better than MC in general. In my opinion: TD is more aligned with the Bellman optimality equation, as it focuses on n-step optimization; MC is more aligned with the Bellman equation (with sampling), as it averages the rewards over the trajectory.
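The two update rules being contrasted here can be written as a short sketch (my own illustration, not code from the video): TD(0) bootstraps from the next state's current estimate, while Monte Carlo waits for the full return.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    """TD(0): move V(s) toward r + gamma * V(s_next), bootstrapping."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def mc_update(V, episode, alpha=0.1, gamma=1.0):
    """Monte Carlo: move each visited state toward its full observed return.
    `episode` is a list of (state, reward) pairs, reward received on leaving."""
    g = 0.0
    for s, r in reversed(episode):
        g = r + gamma * g            # return from this state onward
        V[s] += alpha * (g - V[s])
```

The only difference is the target: a one-step bootstrapped estimate versus the sampled trajectory return, which is the trade-off the comment describes.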
@user-qm6up7kz4n · 9 months ago
04:00 "Return g_3 is the diff of levels at t=3 and the end of the episode." Could someone explain this? a) Why is g_3 that, and b) how do we know the return at t=3? In our blackjack example we only know the reward at the end of the episode (play), and we use that reward to update Q.
@Electrikalforenzis · a year ago
Where are the rest? You are doing a fine job with these episodes!!
@Mutual_Information · a year ago
haha thank you very much. I need a bit of time for parts 5 and 6. I just moved to a new house, got a full time job, many little things.. but it's coming :)
@the_random_noob9860 · 3 months ago
In an epsilon-greedy policy, the two probabilities are epsilon and 1 - epsilon. So is my understanding correct that if epsilon = 0, the policy always takes the max action value from the Q-table while generating the episode, and Q-learning, Sarsa, and Expected Sarsa become identical?
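The question above can be checked with a small sketch (my own illustration, not the video's code) of epsilon-greedy selection and the bootstrap part of each method's target. With epsilon = 0 the behavior policy is greedy, so, up to tie-breaking, the three targets coincide.

```python
import random

def epsilon_greedy(q_row, epsilon):
    """Pick an action index from one row of the Q-table."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))                   # explore
    return max(range(len(q_row)), key=q_row.__getitem__)      # exploit

def targets(q_next, a_next, epsilon):
    """The bootstrapped part of each method's update target."""
    n = len(q_next)
    sarsa = q_next[a_next]       # value of the action actually taken next
    q_learning = max(q_next)     # value of the greedy next action
    # Expected Sarsa: expectation of q_next under the epsilon-greedy policy
    greedy = max(range(n), key=q_next.__getitem__)
    probs = [epsilon / n + (1 - epsilon) * (a == greedy) for a in range(n)]
    expected_sarsa = sum(p * q for p, q in zip(probs, q_next))
    return sarsa, q_learning, expected_sarsa
```

Setting epsilon = 0 makes `sarsa`, `q_learning`, and `expected_sarsa` equal, matching the comment's reading (assuming ties are broken consistently).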
@123ming1231 · a year ago
Can you make a video later showing how you make those animations? It is fantastic!!!! It shows the concepts very clearly!!! The data visualization art behind it is so elegant
@Mutual_Information · a year ago
Maybe one day.. The code I use is a big personal library that's not ready for the public. But I could see doing that.. maybe in a year or two after things have gone well. We'll see
@raghavendrakaushik1691 · 2 months ago
At 4:23, shouldn't it be traversing backwards in time for MC?
@snowflake5204 · a year ago
At 20:30, shouldn't it be Sarsa rather than TD1? Since we use the state value function in TD rather than the state-action value function.
@Mutual_Information · a year ago
Sorry it's not clear. I'm using 1-step TD control and Sarsa interchangeably here.
@samuelepignone8255 · a year ago
Thanks a lot for your videos. There's just one thing that doesn't make sense to me: in the last example, when you add Q-learning to the graph, it has a lower maximum reward than Sarsa, and I don't understand how that's possible since the path it follows has many fewer steps. I hope I have explained my doubt well.
@Mutual_Information · a year ago
I don't know either, actually. My intuition, by this point, is that an inability to explain performance is the rule, not the exception. It's rare that you can tell a story about why one algo is superior on a particular problem. These very simple toy examples are designed precisely to call out the differences in their character. The last one, however, is weird enough that I can't explain all the performance gaps. If anyone else has an intuition, please chime in!
@omerlevy6939 · 14 hours ago
Why at 18:16 are the last n action values the only ones getting updated?
@sidnath7336 · a year ago
Could we get videos on Markov Chain Monte Carlo methods?
@Mutual_Information · a year ago
MCMC! Absolutely, it just may take me a bit to get to it
@hihellohowrumfine · a year ago
Can you please do a series on statistics?
@Mutual_Information · a year ago
That's a bit broad. Is there a particular topic you're interested in?
@hihellohowrumfine · a year ago
@Mutual_Information Specifically statistical learning theory, something like what the 3blue1brown channel has done for linear algebra. A lot of times when I read ML papers, it's hard to deeply appreciate why certain techniques work.
@abramgeorge3290 · 11 months ago
Why didn't we use importance sampling in Q-Learning? I have been searching for an answer for days with no clue
@coconut_camping · a year ago
I bet you are at Stanford as a professor teaching RL by now? This became an RL bible to me.
@Mutual_Information · a year ago
haha not quite a professor! But if you're using this as a resource, I consider my job fulfilled
@imanmossavat9383 · a year ago
Why does the mean TD performance get worse as you increase n? (11:24)
@Mutual_Information · a year ago
I am not sure.. but I know the behavior is expected. That's actually a question posed in Sutton/Barto's book, and I'm sure the answer is online somewhere.
@imanmossavat9383 · a year ago
@Mutual_Information Thank you for your response. I really benefit from your videos. If I figure out the answer, I will share it here.
@danielm3772 · a year ago
From what I have read online and my personal interpretation: this is due to 2 factors, mainly a big value for alpha and the initial state values. If we take the 5-state example (calling them A, B, C, D, E), we know that the true values are 1/6, 2/6, 3/6, 4/6, 5/6. If we then use an initialization of 1/2 for all of them, then first we will see a decrease in the error due to the updates of A, B, D, E (as they have the biggest difference compared to the true values). But at some point they are going to stabilize and V(C) is going to change as well, and because the value of alpha is big, we will move away from 1/2 (which corresponds to both the initial AND the true value) by a non-negligible amount. Hope that helps.
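The experiment this explanation refers to looks like the five-state random walk of Sutton & Barto (Example 6.2). Here is my own minimal reproduction of TD(0) on it, with the 1/2 initialization described above; the episode count and step size are illustrative choices, not the video's settings.

```python
import random

def random_walk_td0(episodes=2000, alpha=0.05, seed=0):
    """TD(0) evaluation on the five-state random walk: states A..E,
    start at C, reward 1 for exiting right and 0 otherwise.
    True values are 1/6 .. 5/6; estimates start at 0.5."""
    rng = random.Random(seed)
    V = [0.5] * 5                      # indices 0..4 are states A..E
    for _ in range(episodes):
        s = 2                          # every episode starts in C
        while True:
            s_next = s + rng.choice([-1, 1])
            if s_next == 5:            # exited right: terminal, reward 1
                V[s] += alpha * (1.0 - V[s])
                break
            if s_next == -1:           # exited left: terminal, reward 0
                V[s] += alpha * (0.0 - V[s])
                break
            V[s] += alpha * (V[s_next] - V[s])   # reward 0 in the interior
            s = s_next
    return V
```

Plotting the RMS error over episodes for a larger alpha should reproduce the dip-then-rise the comment explains: the outer states improve first, then the noisy updates push V(C) away from its 1/2 starting point, which is also its true value.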
@catcoder12 · 9 months ago
I really liked the videos, but the pace felt a bit too quick... The effort put into the examples is commendable.
@Mutual_Information · 9 months ago
I'll take it! I'm learning the slowness thing.. a bit
@hansthompson · a year ago
Where is part five? In production?
@Mutual_Information · a year ago
Yea, I took a little break before starting part 5. I'm currently writing it. It'll take some time. Should be ready in January.
@hansthompson · a year ago
@Mutual_Information Very easy to follow. I'll be patiently waiting. Thanks.
@swastiksharma2683 · 7 months ago
You have such good content, but you tried to make the video as short as you can, due to which there are no natural pauses, making it difficult to focus on and understand your content.
@Mutual_Information · 7 months ago
I think you're right. I'll have fewer cuts in future videos, and I have fewer cuts in my more recent ones.
@raminessalat9803 · 10 months ago
Your videos are amazing, and I know the time spent creating these is probably astronomical. But I do have feedback that would help your videos, and it's my own observation. I think your body language is too much, and I feel it is very unnatural and isn't meaningful for the content. I don't know if you are actually forcing the body language or not, but I think body language is something that happens naturally, and you don't need to try too hard for it. When I first started to watch your videos, that was something that was repelling for me personally, but when I saw your content, I became a fan of your channel. So I hope you take it as constructive feedback from a fan.
@Mutual_Information · 10 months ago
Thank you, I appreciate the genuine feedback, and I know what you mean. There's this awkward robotic-ness that's difficult to shake. But I think some of it is due to this setup. In my more recent videos, my new setup has hopefully brought the unnaturalness down. A work in progress. I may also de-burden myself from trying to match my language with what I anticipate will be on screen.