
AI Seminar 2020: Marlos C. Machado, An operator view of policy gradient methods (Nov 27) 

Amii

Marlos C. Machado presents "An operator view of policy gradient methods" at the AI Seminar (November 27, 2020).
The Artificial Intelligence (AI) Seminar is a weekly meeting at the University of Alberta where researchers interested in AI can share their research. Presenters include both local speakers from the University of Alberta and visitors from other institutions. Topics related in any way to artificial intelligence, from foundational theoretical work to innovative applications of AI techniques to new fields and problems, are explored.
Bio:
Marlos C. Machado is a research scientist at Google Brain, Montreal. Marlos received his Ph.D. from the Department of Computing Science at the University of Alberta. His research interests lie broadly in artificial intelligence, with a particular focus on reinforcement learning, including topics such as representation learning, generalization, exploration, and temporal abstraction.
Abstract:
We cast policy gradient methods as the repeated application of two operators: a policy improvement operator 𝙸, which maps any policy π to a better one 𝙸π, and a projection operator 𝙿, which finds the best approximation of 𝙸π in the set of realizable policies. We use this framework to introduce operator-based versions of traditional policy gradient methods such as REINFORCE and PPO, which leads to a better understanding of their original counterparts. We also use the understanding we develop of the role of 𝙸 and 𝙿 to propose a new global lower bound of the expected return. This new perspective allows us to further bridge the gap between policy-based and value-based methods, showing how REINFORCE and the Bellman optimality operator, for example, can be seen as two sides of the same coin.
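As a rough illustration of the two-operator loop described in the abstract, the following minimal sketch alternates an improvement step and a projection step on a three-armed bandit. Everything specific here is an assumption made for illustration and is not stated in the abstract: the reward-weighted form of the improvement operator, the strictly positive rewards, the tabular policy class (which makes the projection an identity map), and the function names `improve`, `project`, and `expected_return`.

```python
import numpy as np


def improve(pi, rewards):
    """Assumed improvement operator I: reweight each action by its reward.

    Requires strictly positive rewards so the result is a valid distribution.
    """
    weighted = pi * rewards
    return weighted / weighted.sum()


def project(mu):
    """Assumed projection operator P.

    With a tabular policy class every distribution is realizable, so the
    best approximation of mu is mu itself. A restricted policy class would
    need a real projection here (for example, a divergence minimization).
    """
    return mu


def expected_return(pi, rewards):
    """Expected reward of policy pi on the bandit."""
    return float(pi @ rewards)


# Hypothetical three-armed bandit with positive rewards.
rewards = np.array([1.0, 2.0, 3.0])
pi = np.full(3, 1.0 / 3.0)  # start from the uniform policy

returns = [expected_return(pi, rewards)]
for _ in range(50):
    pi = project(improve(pi, rewards))  # one application of P after I
    returns.append(expected_return(pi, rewards))
```

In this toy setting the expected return never decreases from one iteration to the next, and the policy concentrates on the highest-reward arm. With function approximation the projection would no longer be exact, which is where the choice of 𝙿 starts to matter.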

Science

Published: 28 Jun 2024
