
Temporal Difference Learning (including Q-Learning) | Reinforcement Learning Part 4 

Mutual Information
Subscribers: 68K
Views: 28K

The machine learning consultancy: truetheta.io
Join my email list to get educational and useful articles (and nothing else!): mailchi.mp/truetheta/true-the...
Want to work together? See here: truetheta.io/about/#want-to-w...
Part four of a six-part series on Reinforcement Learning. As the title says, it covers Temporal Difference Learning, Sarsa, and Q-Learning, along with some examples.
SOCIAL MEDIA
LinkedIn : / dj-rich-90b91753
Twitter : / duanejrich
Github: github.com/Duane321
Enjoy learning this way? Want me to make more videos? Consider supporting me on Patreon: / mutualinformation
SOURCES
[1] R. Sutton and A. Barto. Reinforcement learning: An Introduction (2nd Ed). MIT Press, 2018.
[2] H. van Hasselt, et al. RL Lecture Series, DeepMind and UCL, 2021, • DeepMind x UCL | Deep ...
SOURCE NOTES
The video covers topics from chapters 6 and 7 from [1]. The whole series teaches from [1]. [2] has been a useful secondary resource.
TIMESTAMPS
0:00 What We'll Learn
0:52 No Review
1:18 TD as an Adjusted Version of MC
2:49 TD Visualized with a Markov Reward Process
6:34 N-Step Temporal Difference Learning
8:08 MC vs TD on an Evaluation Example
11:50 TD's Trade-Off between N and Alpha
12:47 Why does TD Perform Better than MC?
15:29 N-Step Sarsa
17:15 Why have N above 1?
19:02 Q-Learning
20:50 Expected Sarsa
21:48 Cliff Walking
25:04 Windy GridWorld
28:12 Watch the Next Video!
NOTES
Code to compare TD vs MC on the evaluation task: github.com/Duane321/mutual_in...

Published: 1 Jul 2024

Comments: 97
@lordjared2572 · a year ago
Ok, just please upload more vids. There's a huge vacuum of ML education out here for people who are not scared of math.
@Mutual_Information · a year ago
That's the audience I want!
@rewixx69420 · a year ago
I learn ML by myself and it's hard to find information. This guy saves me on RL, thanks.
@marcin.sobocinski · a year ago
Your animations are fantastic, it's like a new dimension of learning. It helps so much to be able to visualize RL processes. Thank you!
@Mutual_Information · a year ago
Thank you Marcin - it means a lot when I hit someone in the audience exactly as I hoped :)
@samlaki4051 · a year ago
Starting to get into RL, Yannic recommended you! How have I missed such a gem of a channel!
@Mutual_Information · a year ago
Love Yannic's stuff. Super pumped to get the shout out.
@marcegger7411 · a year ago
Fantastic!! Keep up the amazing work! It's always so great to see quality content presented so eloquently :)
@saranahluwalia5353 · a year ago
I wish I had this to review 5 years ago. This would have eliminated wasteful experiments. Thank you for making this more accessible.
@Mutual_Information · a year ago
The way I'm designing the channel is the channel I would have wanted when I was learning ML for the first time. Seems like that theory landed!
@bornamorasai5285 · a year ago
Can't wait for parts 5 and 6!!!! Let's go!!!!
@rostislavmarkov7488 · 5 months ago
Awesome series covering essential fundamentals with great didactics. Raised the bar for creating high-quality content!
@mryazbeck98 · 10 months ago
I love your videos to recap what I read in the book. Helps me understand and visualize everything better. I was however hoping to learn more about batch training, because I didn't understand how it works at all!
@hjop010 · a year ago
Great video as always! It is helping me so much as a complement to Barto & Sutton.
@Mutual_Information · a year ago
Sweet - that's exactly how I want this series used!
@pandie4555 · 10 months ago
Dude, the amount of work you put in these videos is fantastic.
@Mutual_Information · 10 months ago
lol yea these videos were crazy hard. This one took me 100+ hours
@buh357 · a year ago
I am starting to learn RL, and your video is helping me a lot. You have a clear and precise explanation; thank you. Looking forward to the videos to come :)
@Mutual_Information · a year ago
excellent! Exactly what I'm going for :)
@antonkot6250 · 8 months ago
Oh man, these visualisations are top-notch!
@victormanuel8767 · a year ago
"You've covered a lot, give yourself a 3 second break" *0.5 seconds later* "Great, let's keep going"
@skirazai7591 · a year ago
Man, you're doing some very high quality stuff, keep it up.
@Mutual_Information · a year ago
Thanks - I'm tryin!
@fedelozano2895 · a year ago
Hi, your videos are really specific and super helpful! This information is helping me with my paper, can't wait for the next one, thank you :)
@Mutual_Information · a year ago
Lol yea, specific as hell! Glad it helps, and I'm working on the next one as we speak!
@ryderbrooks1783 · a year ago
This channel is extremely under-subscribed. I very much appreciate the work you're putting in here. Thank you
@Mutual_Information · a year ago
Ha, well I can't expect a large number of subs when my stuff is so technical. So.. should I make it less technical? Nope!
@AlisonStuff · a year ago
love it!! so good!!!
@ezragarcia6910 · a year ago
Thanks!! I just found your channel and IT'S AWESOME!
@Mutual_Information · a year ago
Thanks Ezra - I think it's a work in progress lol 😁
@broccoli322 · a year ago
Great videos! Can't wait for more.
@Mutual_Information · a year ago
This is one of my favorites in fact - glad it hits!
@datsplit2571 · a year ago
High quality videos, my compliments! This helps so much in understanding RL for a Master's course. Thank you!
@Mutual_Information · a year ago
You're welcome! And you know what would be totally sweet? If you told your classmates about these vids :)
@datsplit2571 · a year ago
@Mutual_Information Posted it in the Teams chat of the Advanced Machine Learning course!
@Mutual_Information · a year ago
@datsplit2571 thank you! Over time, moves like that will make all the difference :)
@nathanzorndorf8214 · 5 months ago
Thanks for this. Amazing.
@qiguosun129 · a year ago
Excellent lecture! It resolved my doubts about the parameter uncertainty analysis that reviewers asked me to do for scientific research papers.
@Mutual_Information · a year ago
Excellent - it's thrilling to hear this has some real impact!
@selcukkalafat2857 · a year ago
Thank you. Looking forward to the next part.
@Mutual_Information · a year ago
In the works :) but I'll need some patience
@timothytyree5211 · a year ago
Excellent video! I am so stoked to use this in my work!
@timothytyree5211 · a year ago
I used the knowledge of ^this vid today to help a buddy out at work! You rock, Duane!
@timothytyree5211 · a year ago
I'm really looking forward to your next video on function approximation!
@user-sx3dy6cw8m · 3 months ago
This is a life saver
@marcin.sobocinski · a year ago
Thank you.
@IRONMAIDEN146 · a year ago
Your videos are helping me a lot in my AI engineering degree, thanks a lot!
@Mutual_Information · a year ago
Love it!
@sathyakumarn7619 · a year ago
So precise and fun! But highly underrated! Please advertise so that more people can benefit!
@Mutual_Information · a year ago
Thank you! And I agree, distributing this needs some more effort. Sometimes my tweets help
@bean217 · 5 months ago
"If you recall... which you better!" I swear, I recall!
@bmenashetheman · a year ago
What a fantastic series, thank you so much!!!
@Mutual_Information · a year ago
Thanks Ben, glad you see the same value I do. Btw, if you know other people studying the same subject, it would help a lot to share this with them :)
@b0nce · a year ago
Double this - great effort, excellent videos, thank you so much. Also, Duane, you forgot to add this video to the RL playlist.
@Mutual_Information · a year ago
@b0nce oh thank you! Fixed
@bmenashetheman · a year ago
@Mutual_Information Already shared it with everyone in my class! I'm certain this channel will get really popular really soon; your content is fantastic.
@Mutual_Information · a year ago
@bmenashetheman oh you rule! Thank you!
@surakshachoudhary2880 · a year ago
Eagerly awaiting the remaining episodes - remarkable work there! So far I've just watched the videos, and I think it can only become clearer with some practice - but I was curious why I keep hearing about 'deep' RL? Where do the 'deep' a.k.a. neural nets fit into these videos?
@Mutual_Information · a year ago
Parts 5 and 6 are in the works - I only just started them, so it'll take some time. Nothing coming this month, but probably in Dec. Good question! "Deep" in "Deep RL" refers to deep learning, where we utilize neural networks with many layers to learn complex functions from observations. Up to this point, those NNs have had no place to be inserted - but that changes in part 5. In part 5, we'll discuss handling state spaces that are so huge we can't list them out in a table. In that case, you can use a function to model giant swaths of those states, and deep NNs can be especially good at that. My video won't be a deep dive into NNs - that's too big of a subject. But it should be clear how they would get used.
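The table-to-function idea in this reply can be previewed with a toy sketch (my own illustration, not the video's code): represent v(s) as a linear function of features and update the weights with semi-gradient TD(0). With one-hot features this reduces exactly to tabular TD(0); the function names and signatures here are hypothetical.

```python
def features(s, n_states):
    """One-hot features; a real problem would use something coarser."""
    x = [0.0] * n_states
    x[s] = 1.0
    return x

def v(w, x):
    """Linear value estimate: a dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def semi_gradient_td0(w, s, r, s_next, n_states, alpha=0.1, gamma=1.0):
    """One semi-gradient TD(0) weight update for a single transition."""
    x, x_next = features(s, n_states), features(s_next, n_states)
    delta = r + gamma * v(w, x_next) - v(w, x)   # TD error
    for i in range(n_states):
        w[i] += alpha * delta * x[i]             # gradient of v wrt w is x
```

With one-hot features, each update touches only the visited state's weight, which is exactly the tabular update; richer features are what let one transition generalize across many states.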
@NazerkeSafina · 9 months ago
Superb job with visualization, keep it up! Only you could explain certain things to me; I've watched several other tutorials and wasn't feeling confident. One thing: I wish the explanation of how V(s) is obtained for each state were more detailed, perhaps with multiple samples and step-by-step calculations.
@rewixx69420 · a year ago
Episode 6 - finally I will understand PPO
@arrozenescau1539 · 6 months ago
I wish I could like your videos twice
@Mutual_Information · 6 months ago
Well unfortunately, there is no way to double-like. I see only one solution: I need to upload 2x more videos!
@kimchi_taco · 11 months ago
14:30 TD is better than MC in general. In my opinion: TD is more aligned with the Bellman optimality equation, as it focuses on n-step optimization; MC is more aligned with the Bellman equation (with sampling), as it averages the rewards over the trajectory.
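The two update rules being contrasted here can be written as a short sketch (my own illustration, not code from the video): TD(0) bootstraps from the next state's current estimate, while Monte Carlo waits for the full return.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0):
    """TD(0): move V(s) toward r + gamma * V(s_next), bootstrapping."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def mc_update(V, episode, alpha=0.1, gamma=1.0):
    """Monte Carlo: move each visited state toward its full observed return.
    `episode` is a list of (state, reward) pairs, reward received on leaving."""
    g = 0.0
    for s, r in reversed(episode):
        g = r + gamma * g            # return from this state onward
        V[s] += alpha * (g - V[s])
```

The only difference is the target: a one-step bootstrapped estimate versus the sampled trajectory return, which is the trade-off the comment describes.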
@user-qm6up7kz4n · 9 months ago
04:00 "Return g_3 is the diff of levels at t=3 and the end of the episode." Could someone explain this? a) Why is g_3 that, and b) how do we know the return at t=3? In our blackjack example we only know the reward at the end of the episode (play), and we use that reward to update Q.
@Electrikalforenzis · a year ago
Where are the rest? You are doing a fine job with these episodes!!
@Mutual_Information · a year ago
haha thank you very much. I need a bit of time for parts 5 and 6. I just moved to a new house, got a full time job, many little things.. but it's coming :)
@the_random_noob9860 · 3 months ago
In an epsilon-greedy policy, the two probabilities are epsilon and 1 - epsilon. So is my understanding correct that if epsilon = 0, the policy always takes the max action value from the Q-table while generating the episode, and Q-learning, Sarsa, and Expected Sarsa become identical?
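The question above can be checked with a small sketch (my own illustration, not the video's code) of epsilon-greedy selection and the bootstrap part of each method's target. With epsilon = 0 the behavior policy is greedy, so, up to tie-breaking, the three targets coincide.

```python
import random

def epsilon_greedy(q_row, epsilon):
    """Pick an action index from one row of the Q-table."""
    if random.random() < epsilon:
        return random.randrange(len(q_row))                   # explore
    return max(range(len(q_row)), key=q_row.__getitem__)      # exploit

def targets(q_next, a_next, epsilon):
    """The bootstrapped part of each method's update target."""
    n = len(q_next)
    sarsa = q_next[a_next]       # value of the action actually taken next
    q_learning = max(q_next)     # value of the greedy next action
    # Expected Sarsa: expectation of q_next under the epsilon-greedy policy
    greedy = max(range(n), key=q_next.__getitem__)
    probs = [epsilon / n + (1 - epsilon) * (a == greedy) for a in range(n)]
    expected_sarsa = sum(p * q for p, q in zip(probs, q_next))
    return sarsa, q_learning, expected_sarsa
```

Setting epsilon = 0 makes `sarsa`, `q_learning`, and `expected_sarsa` equal, matching the comment's reading (assuming ties are broken consistently).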
@123ming1231 · a year ago
Can you make a video later showing how you make those animations? It is fantastic!!!! It shows the concepts very clearly!!! The data visualization art behind it is so elegant
@Mutual_Information · a year ago
Maybe one day.. The code I use is a big personal library that's not ready for the public. But I could see doing that.. maybe in a year or two after things have gone well. We'll see
@raghavendrakaushik1691 · 2 months ago
At 4:23, shouldn't it be traversing backwards in time for MC?
@snowflake5204 · a year ago
At 20:30, shouldn't it be Sarsa rather than TD1? Since we use the state value function in TD rather than the state-action value function.
@Mutual_Information · a year ago
Sorry it's not clear. I'm using 1-step TD control and Sarsa interchangeably here.
@samuelepignone8255 · a year ago
Thanks a lot for your videos. There's just one thing that doesn't make sense to me: in the last example, when you add Q-learning to the graph, it has a lower maximum reward than Sarsa, and I don't understand how that's possible since the path it follows has many fewer steps. I hope I have explained my doubt well.
@Mutual_Information · a year ago
I don't know either, actually. My intuition, by this point, is that an inability to explain performance is the rule, not the exception. It's rare that you can tell a story about why one algo is superior on a particular problem. These very simple toy examples are designed precisely to call out the differences in their character. The last one, however, is weird enough that I can't explain all the performance gaps. If anyone else has an intuition, please chime in!
@omerlevy6939 · 14 hours ago
Why at 18:16 are the last n action values the only ones getting updated?
@sidnath7336 · a year ago
Could we get videos on Markov Chain Monte Carlo methods?
@Mutual_Information · a year ago
MCMC! Absolutely, it just may take me a bit to get to it
@hihellohowrumfine · a year ago
Can you please do a series on statistics?
@Mutual_Information · a year ago
That's a bit broad. Is there a particular topic you're interested in?
@hihellohowrumfine · a year ago
@Mutual_Information Specifically statistical learning theory, something like what the 3blue1brown channel has done for linear algebra. A lot of times when I read ML papers, it's hard to deeply appreciate why certain techniques work.
@abramgeorge3290 · 11 months ago
Why didn't we use importance sampling in Q-Learning? I have been searching for an answer for days with no clue
@coconut_camping · a year ago
I bet you are at Stanford as a professor teaching RL by now? This became an RL bible to me.
@Mutual_Information · a year ago
haha not quite a professor! But if you're using this as a resource, I consider my job fulfilled
@imanmossavat9383 · a year ago
Why does the mean TD performance get worse as you increase n? (11:24)
@Mutual_Information · a year ago
I am not sure.. but I know the behavior is expected. That's actually a question posed in Sutton/Barto's book, and I'm sure the answer is online somewhere.
@imanmossavat9383 · a year ago
@Mutual_Information Thank you for your response. I really benefit from your videos. If I figure out the answer, I will share it here.
@danielm3772 · a year ago
From what I have read online and my personal interpretation: this is due to 2 factors, mainly a big value for alpha and the initial state values. If we take the 5-state example (calling them A, B, C, D, E), we know that the true values are 1/6, 2/6, 3/6, 4/6, 5/6. If we then use an initialization of 1/2 for all of them, then first we will see a decrease in the error due to the updates of A, B, D, E (as they have the biggest difference compared to the true values). But at some point they are going to stabilize and V(C) is going to change as well, and because the value of alpha is big, we will move away from 1/2 (which corresponds to both the initial AND the true value) by a non-negligible amount. Hope that helps.
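The experiment this explanation refers to looks like the five-state random walk of Sutton & Barto (Example 6.2). Here is my own minimal reproduction of TD(0) on it, with the 1/2 initialization described above; the episode count and step size are illustrative choices, not the video's settings.

```python
import random

def random_walk_td0(episodes=2000, alpha=0.05, seed=0):
    """TD(0) evaluation on the five-state random walk: states A..E,
    start at C, reward 1 for exiting right and 0 otherwise.
    True values are 1/6 .. 5/6; estimates start at 0.5."""
    rng = random.Random(seed)
    V = [0.5] * 5                      # indices 0..4 are states A..E
    for _ in range(episodes):
        s = 2                          # every episode starts in C
        while True:
            s_next = s + rng.choice([-1, 1])
            if s_next == 5:            # exited right: terminal, reward 1
                V[s] += alpha * (1.0 - V[s])
                break
            if s_next == -1:           # exited left: terminal, reward 0
                V[s] += alpha * (0.0 - V[s])
                break
            V[s] += alpha * (V[s_next] - V[s])   # reward 0 in the interior
            s = s_next
    return V
```

Plotting the RMS error over episodes for a larger alpha should reproduce the dip-then-rise the comment explains: the outer states improve first, then the noisy updates push V(C) away from its 1/2 starting point, which is also its true value.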
@catcoder12 · 9 months ago
I really liked the videos, but the pace felt a bit too quick... The effort put into the examples is commendable.
@Mutual_Information · 9 months ago
I'll take it! I'm learning the slowness thing.. a bit
@hansthompson · a year ago
Where is part five? In production?
@Mutual_Information · a year ago
Yea, I took a little break before starting part 5. I'm currently writing it. It'll take some time. Should be ready in January.
@hansthompson · a year ago
@Mutual_Information Very easy to follow. I'll be patiently waiting. Thanks.
@swastiksharma2683 · 7 months ago
You have such good content, but you tried to make the video as short as you can, due to which there are no natural pauses, making it difficult to focus on and understand your content.
@Mutual_Information · 7 months ago
I think you're right. I'll have fewer cuts in future videos, and I have fewer cuts in my more recent ones.
@raminessalat9803 · 10 months ago
Your videos are amazing, and I know the time spent creating these is probably astronomical. But I do have feedback that would help your videos, and it's my own observation. I think your body language is too much, and I feel it is very unnatural and isn't meaningful for the content. I don't know if you are actually forcing the body language or not, but I think body language is something that happens naturally, and you don't need to try too hard for it. When I first started to watch your videos, that was something that was repelling for me personally, but when I saw your content, I became a fan of your channel. So I hope you take it as constructive feedback from a fan.
@Mutual_Information · 10 months ago
Thank you, I appreciate the genuine feedback, and I know what you mean. There's this awkward robotic-ness that's difficult to shake. But I think some of it is due to this setup. In my more recent videos, my new setup has hopefully brought the unnaturalness down. A work in progress. I may also de-burden myself from trying to match my language with what I anticipate will be on screen.