
Stanford CS234: Reinforcement Learning | Winter 2019 | Lecture 1 - Introduction - Emma Brunskill 

Stanford Online

For more information about Stanford’s Artificial Intelligence professional and graduate programs, visit: stanford.io/ai
Professor Emma Brunskill, Stanford University
stanford.io/3eJW8yT
Professor Emma Brunskill
Assistant Professor, Computer Science
Stanford AI for Human Impact Lab
Stanford Artificial Intelligence Lab
Statistical Machine Learning Group
To follow along with the course schedule and syllabus, visit: web.stanford.edu/class/cs234/i...
#EmmaBrunskill #reinforcementlearning
Chapters:
0:00 Intro
02:20 Reward for Sequence of Decisions
13:23 Imitation Learning vs RL
23:02 Sequential Decision Making
24:42 Example: Robot unloading dishwasher
25:19 Example: Blood Pressure Control
52:04 Key challenges in learning to make sequences of good decisions
54:15 Reinforcement learning example

Published: 26 Jun 2024

Comments: 49
@cineblazer 2 years ago
The fact that this is available online for free is truly remarkable. Stanford rocks.
@second1799 4 months ago
oh yea? whatdya learn bro?
@petrogaglozou9976 2 years ago
Thank you for making this course available. Thank you Stanford
@coder-wolf 1 year ago
This really is such an incredible lecture series. So informative, yet really well explained.
@shaozefan8268 7 months ago
Future is independent of past given present🙃 I am taking this class at the southwestern corner of Northeastern University
@Tomharry910 7 months ago
Are coding assignments available for the public?
@lukechong590 2 months ago
Lecture proper 22:49
@sichengmao4038 1 year ago
11:41 I wonder why the game of Go doesn't need exploration. In fact, even a human player would compute several steps ahead to see some possibilities.
@firefistace8569 1 year ago
It seems counterintuitive
@anastasisaglogallos3147 8 months ago
I am commenting to come back in case of a good reply
@manuellayburr382 7 months ago
Board games require *search*. This is defined differently from exploration. Search involves following the known rules of the game, exploration involves trying to discover what the rules are.
@shaozefan8268 7 months ago
According to slide 1, page 15, of this course in winter 2023: "AI planning assumes having a model of how decisions impact environment". It seems the model is given and does not need to be learned by the exploration process explained at 9:09.
12 days ago
IMO it does need exploration. Even if the rules of the game are given, the agent needs to explore new strategies: "what if I play it differently this time?" DeepMind framed AlphaGo as an RL problem and also showed that, through exploration, their agent was able to discover new strategies previously unknown to humans, so solving Go *is not* only planning.
@tai15515 1 year ago
The link to the course page doesn't work. Can anyone tell me where else I can get the slides?
@kshitijshekhar1144 1 year ago
take down notes
@Jursus-mx1le 5 months ago
Don't they combine RL theories with code when teaching basic knowledge?
@MuhammadAhsan-hq2bc 1 month ago
As a graduate student you are kind of expected to figure out the coding aspect on your own and seek help at the TAs' office hours :)
@mohammadrezanargesi2439 1 year ago
Hi, can anybody please let me know what book can be a basis for this course?
@mohammadrezanargesi2439 1 year ago
@@taslas thanks
@puskarwagle2392 11 months ago
What is it?
@pedrohenriquefernandes3268 8 months ago
@@puskarwagle2392 Reinforcement Learning: An Introduction, by Richard Sutton and Andrew Barto
@mohamedmounirabbes9879 6 months ago
Any new !
@Rumit_Pathare 2 months ago
47:30:00
@forheuristiclifeksh7836 3 months ago
6:00
@yuanmengandy 10 months ago
As a non-native speaker, I'm not used to such a flexible tone; sometimes the pitch is too high for me to follow.
@Yeonjun_Choi_ 1 year ago
7:18
@dekroplay5373 7 months ago
This slide could have used a little more explanation.
@Bao_Lei 1 year ago
The lecturer is wrong about Markov's assumption @36:02. Blood pressure, exercise, etc. are different features of a state. Given the values of all features at the current state, blood pressure is independent of the values of any features in all past states. Hence the system is Markov.
@TacoMaster07 11 months ago
Blood pressure is dependent on states like exercise, what you eat and how much, genetics, etc.
@Bao_Lei 11 months ago
@@TacoMaster07 To put it simply, Markov's Assumption is "that the *future* actions are influenced only by the *present*, not the *past* states". In other words, Markov's Assumption emphasizes only the independencies across the time domain, *NOT* the independencies across different features. Hence, explaining the concept of Markov's Assumption by saying one feature, such as "blood pressure", is dependent on other features, such as "exercise", is missing the point on time domain dependency. The professor's other example, such as "hot-or-not-outside", is simply wrong. Blood pressure can be dependent on the temperature outside, but still satisfies Markov's assumption, as long as knowing *current* temperature provides sufficient information about blood pressure, regardless of the temperature in the past. In fact, Markov's Assumption is so widely applicable exactly because most natural processes are continuous in time, i.e. knowing the current state is often sufficient to ignore the past states. The common examples of processes that violate Markov's Assumption are human discretion and randomness.
@vfestuga 4 months ago
But your current blood pressure is dependent on actions you took in the past, and these actions will still influence it. So taking medication may have different outcomes (future states) for a given blood pressure value (state).
@Bao_Lei 4 months ago
@@vfestuga If by "action in the past" you mean, for example, "taking blood-pressure regulating medication", which affects current blood pressure, and current blood pressure affects the current action of taking more blood-pressure regulating medicine, then agreed, this process is NOT Markov. However, the lecturer cited "other features in the past, such as exercise or just-ate-a-meal, etc." as the reason, hence failing to point out whether there is a path from past blood pressure to current blood pressure, which is the key to the Markov Assumption. This could be misleading. The Markov Assumption is valuable because it allows us to determine the current state without knowing the very initial state of the system, which simplifies computation. Hence, if my blood pressure at t can be fully determined by knowing my action at t-1, it is Markov. It becomes a problem if determining my blood pressure at t somehow requires knowing my blood pressure at t-1, because my blood pressure at t-1 will then be dependent on my blood pressure at t-2, and so on and so forth, all the way back to time 0, which would not be Markov. So, the Markov Assumption is not so much about whether the current state is dependent on any states in the past. If there is such a dependency, such as just-ate-a-meal, etc., as provided by the lecturer as an example, it can be resolved by tricks such as realigning the timestamps if necessary. The key takeaway should be whether there is a path from the same feature in past states affecting the feature in discussion in the current state. When I see doctors, they prescribe medicine based mostly on my current symptoms, most of the time anyway. I don't know about you, but it certainly sounds reasonable to me. More thoughts?
@EpicFaceInc100 4 months ago
@@Bao_Lei Well I think the caveat is that the action is taking medicine. What she's arguing is that just because the state of your blood pressure is high, it doesn't mean that you should take medicine. It depends on whether your blood pressure is high at rest or your blood pressure is high from exercise. I know that the system is still Markov, since the next state only depends on the previous state, but I guess she's pushing that you shouldn't do a specific action based solely on the previous state, and have to take into account other things.
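For reference, the Markov assumption this thread is debating is usually stated as follows (a standard textbook formulation, not a quote from the lecture): the distribution of the next state depends only on the current state and action, not on the rest of the history,

P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0, a_0) = P(s_{t+1} \mid s_t, a_t).

Whether the blood-pressure example satisfies this depends on what is packed into the state s_t, which is exactly what the commenters above disagree about.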
@zohan1ify 2 months ago
This is what I feel is a better example for RL:

Agent: Represents an individual, such as a human being, who is making decisions and taking actions in the environment.
Environment: Represents the world or the context in which the agent operates. It includes all the factors that influence the outcomes of the agent's actions, such as societal norms, laws, and consequences.
State or Events: The situations or circumstances the agent encounters in the environment. In this analogy, these are the events or experiences that the individual faces in their life.
Actions (Good or Bad Deeds): Actions taken by the agent in response to the events or states encountered in the environment. These can be categorized as good deeds (positive actions) or bad deeds (negative actions) based on their consequences.
Rewards (Paradise or Hell): The consequences of the agent's actions. Good deeds may result in positive rewards (paradise), while bad deeds may result in negative rewards (hell).

Here's how the analogy relates to RL: the agent (individual) learns from its experiences in the environment, just as RL agents learn from interacting with their environments. Good deeds lead to positive rewards, reinforcing the behavior; bad deeds lead to negative rewards, discouraging it. In RL terms, the agent aims to maximize its cumulative reward over time by learning which actions lead to desirable outcomes and which lead to undesirable ones. Through trial and error, the agent learns to choose actions that maximize its long-term reward, similar to how humans learn from their experiences to make better decisions in life. Agree?
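A minimal code sketch of the agent-environment loop that the comment above describes (all names here, such as RandomAgent, run_episode, and the env object with reset()/step(), are hypothetical illustrations, not part of the course materials):

import random

class RandomAgent:
    """Placeholder agent: picks actions ("deeds") uniformly at random."""
    def __init__(self, actions):
        self.actions = actions

    def act(self, state):
        # A learning agent would consult its value estimates here; this one just samples.
        return random.choice(self.actions)

    def update(self, state, action, reward, next_state):
        # A real RL agent would update its value function or policy from this transition.
        pass

def run_episode(env, agent, max_steps=100):
    # env is assumed to expose reset() -> state and step(action) -> (next_state, reward, done).
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                      # choose a "good or bad deed"
        next_state, reward, done = env.step(action)    # environment returns the consequence
        agent.update(state, action, reward, next_state)
        total_reward += reward                         # accumulate reward over the episode
        state = next_state
        if done:
            break
    return total_reward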
@adityanjsg99 10 months ago
The more I hear, the more confused I get... 😮
@davids949 1 year ago
Wow, the Stanford online? Is SBF a member of your club? Maybe you know where he is? Oh wait he went to MIT. I guess it's the sultry wood nymph that went to Stanford that I was hoping to hear about today in your lecture. I can hardly wait to hear what you have to say given the newfound credibility of Stanford grads in the crypto space. I'm super excited for another fancy box scheme. Or better yet maybe you can tell me some more about how I can get my blood tested by some super slick new machines that another Stanford grad came up with. WAY EXCITED!!
@devilsolution9781 3 months ago
Envy is a hell of a drug.
@qingqiqiu 1 year ago
Does anyone else have the feeling that the lecturer doesn't clearly explain the notation she uses?
@prosecurity8789 1 year ago
Me too
@scharlesworth93 1 year ago
@@prosecurity8789 Joe MAma has poor clarifications
@muuubiee 1 year ago
Prerequisites.
@DavidAGarciaMontreal 1 year ago
Can you provide some examples? I think it was very clear.
@nature_through_my_lens 5 months ago
Yep.
@wiktorm9858 1 year ago
For now it's deadly boring, but I will watch the next ones
@dekroplay5373 7 months ago
How come? Were the other ones better than the first lecture?
@wiktorm9858 7 months ago
@@dekroplay5373 I think I just listened to about half of this lecture
@dekroplay5373 7 months ago
@@wiktorm9858 Didn't expect an answer. The second half was a little more interesting, imo.