
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 3 - Model-Free Policy Evaluation 

Stanford Online

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/ai
Professor Emma Brunskill, Stanford University
stanford.io/3eJW8yT
Professor Emma Brunskill
Assistant Professor, Computer Science
Stanford AI for Human Impact Lab
Stanford Artificial Intelligence Lab
Statistical Machine Learning Group
To follow along with the course schedule and syllabus, visit: web.stanford.edu/class/cs234/i...
0:00 Introduction
3:32 Dynamic Programming for Policy Evaluation
5:53 Dynamic Programming Policy Evaluation
15:27 First-Visit Monte Carlo (MC) On Policy Evaluation
23:44 Every-Visit Monte Carlo (MC) On Policy Evaluation
26:02 Incremental Monte Carlo (MC) On Policy Evaluation, Running Mean
27:35 Check Your Understanding: MC On Policy Evaluation
32:14 MC Policy Evaluation
34:30 Monte Carlo (MC) Policy Evaluation Key Limitations
37:35 Monte Carlo (MC) Policy Evaluation Summary
39:40 Temporal Difference Learning for Estimating V
48:08 Check Your Understanding: TD Learning
56:30 Check Your Understanding For Dynamic Programming MC and TD Methods, Which Properties Hold?
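The chapter list above walks from first-visit Monte Carlo evaluation (15:27) through the incremental running-mean form (26:02). As a rough illustration of the first-visit idea, here is a minimal sketch on a made-up two-state chain (this toy MRP, its probabilities, and γ = 0.9 are assumptions for illustration, not the lecture's Mars rover example):

```python
import random

GAMMA = 0.9
TERMINAL = 2  # states are 0, 1, 2; state 2 is terminal

def sample_episode():
    """Roll out one episode: a list of (state, reward) pairs."""
    s, episode = 0, []
    while s != TERMINAL:
        s_next = s + 1 if random.random() < 0.5 else s  # advance or stay
        reward = 1.0 if s_next == TERMINAL else 0.0
        episode.append((s, reward))
        s = s_next
    return episode

def first_visit_mc(num_episodes=5000):
    """First-visit MC: average the return G following the first visit to each state."""
    returns_sum = {s: 0.0 for s in range(TERMINAL)}
    visits = {s: 0 for s in range(TERMINAL)}
    for _ in range(num_episodes):
        episode = sample_episode()
        G, returns = 0.0, []
        for s, r in reversed(episode):      # accumulate returns backwards in time
            G = r + GAMMA * G
            returns.append((s, G))
        seen = set()
        for s, G in reversed(returns):      # forward in time: first visits only
            if s not in seen:
                seen.add(s)
                returns_sum[s] += G
                visits[s] += 1
    return {s: returns_sum[s] / max(visits[s], 1) for s in returns_sum}
```

Every-visit MC differs only in dropping the `seen` check, and the incremental form replaces the sum-and-divide with a running mean.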

Published: 28 Jun 2024

Comments: 17
@user-cx5ni7me6l 2 years ago
Thanks to everyone who made it possible to upload this video.
@robensonlarokulu4963 1 year ago
This moves really fast. The first half of the book is covered in just three lectures, which works out to digesting 60 pages per week.
@DimanjanDahal 1 year ago
Which book? Sutton and Barto?
@flecart 1 year ago
@@DimanjanDahal Yeah, that book.
@prhc 3 days ago
A master's in computing from Stanford requires 45 units. A full-time student is expected to finish in 1 to 2 years, part-time in 3 to 5. Does anyone know how many units this course is worth? I'm wondering how many of these courses someone could take at the same time, given that half of the S&B textbook is covered in what looks like two weeks! X.X
@NguyenAn-kf9ho 10 days ago
When we talk about Monte Carlo and evaluate V^(pi)(s), in order to pick out the best policy, do we have to evaluate every possible policy and then pick the best one? I'm a bit confused about how to do control here. Thanks :D
@mohammadrezanargesi2439 1 year ago
Hi, can anyone please explain how the Monte Carlo method should be implemented in the real world, where we have no model of the environment? The professor explains that we repeat an experiment over and over again and average over all the values. But in some cases it's not possible to gain insight into the environment; suppose we are sending a rover to Europa, a moon of Jupiter. We would have no time to carry out experiments in such cases... Also, suppose we can carry out the experiment, and the experiment is living in this world and history repeats itself. The conditions are changing all the time, though. How can we calculate the values in such cases?
@zonghaoli4529 1 year ago
43:34 Should that V^{pi}(s_{t}) be approximated over s instead of s'?
@shaozefan8268 6 months ago
I also think it should be s; s' indicates s_{t+1}.
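For the update being discussed in this thread, the standard tabular TD(0) rule (as in Sutton & Barto) updates the estimate at the current state s_t, with the successor s_{t+1} appearing only inside the target. A minimal sketch, with illustrative α and γ values:

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
    """One TD(0) step: V[s] <- V[s] + alpha * (r + gamma * V[s_next] - V[s]).

    Note the left-hand side is V at the *current* state s, not the successor.
    """
    td_target = r + gamma * V[s_next]   # bootstrapped target uses s_next
    V[s] += alpha * (td_target - V[s])  # but the update lands on V[s]
    return V
```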
@MengLi-yw7ix 1 month ago
In the Mars rover example, why does it remain in s_2 when it takes action a_1 in state s_2??
@NguyenAn-kf9ho 17 days ago
Because the dynamics are stochastic: taking an action still leaves some probability that the robot remains in the current state :D I take the "action" of going to work... but my body decides to sleep :D
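The point in the reply above can be made concrete by sampling the next state from a transition distribution P(s' | s, a). The 0.5/0.5 split below is illustrative only, not the lecture's actual Mars rover numbers:

```python
import random

def step(state, transition_probs, rng):
    """Sample the next state from a table of P(s' | s) for a fixed action."""
    outcomes, probs = zip(*transition_probs[state].items())
    return rng.choices(outcomes, weights=probs, k=1)[0]

# Toy table: taking a_1 in s_2 stays in s_2 half the time (made-up numbers).
P = {"s2": {"s2": 0.5, "s3": 0.5}}
rng = random.Random(0)
samples = [step("s2", P, rng) for _ in range(1000)]
```

Even though the agent always "tries" to move, a sizeable fraction of the samples leave it in s_2.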
@namluong4647 5 months ago
Can someone explain to me the difference between a trajectory and an episode?
@yaboidaggerlirette2391 3 months ago
A trajectory is the specific path taken to a termination state, while an episode is just the term we give to a single "run" in this case. "Episode" is the broader term; it's just the name of a process. In the case of the Mars rover, the trajectory is the sequence of state, action, and next-state pairs, while the episode is the process from the start state to the termination state. Again, it's not specific; it's just what we call the process.
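One way to see the distinction described above in code: the episode is one run from the start state to termination, and the trajectory is the (state, action, reward) sequence that run produces. A toy sketch (the states, policy, and rewards here are made up for illustration):

```python
import random

def run_episode(rng):
    """Run one episode; return its trajectory as a list of (s, a, r) tuples."""
    trajectory = []                      # the data the episode produces
    s = "start"
    while s != "terminal":
        a = "move"                       # toy fixed policy
        s_next = "terminal" if rng.random() < 0.5 else "start"
        r = 1.0 if s_next == "terminal" else 0.0
        trajectory.append((s, a, r))
        s = s_next
    return trajectory
```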
@jeffreyalidochair 1 year ago
episode = trajectory?
@flecart 1 year ago
No, it's a simulation; you can see it as a sequence of states, actions, and rewards until you get to a terminal state. That is an episode.
@jeffreyalidochair 1 year ago
@@flecart Then what's a trajectory?
@ernestbonnah1489 1 year ago
@@jeffreyalidochair In motion planning, I think episode = trajectory.