ReBeL - Combining Deep Reinforcement Learning and Search for Imperfect-Information Games (Explained) 

Yannic Kilcher
260K subscribers
36K views

Published: 21 Aug 2024

Comments: 58
@connor-shorten 3 years ago
Pumped for this!!
@brunomartel4639 3 years ago
I love you both equally, glad you don't fight
@norik1616 3 years ago
Consider a collab on a two-sided anonymous MEME review.
@oncedidactic 3 years ago
That high level summary section at the end is super fun once you've run through the preceding explanation, it's like a tony hawk perfect run. XD
@jahcane3711 3 years ago
I love the eye you drew Yannic. Great review, thanks
@hanabimock5193 3 years ago
Thanks Yannic, your explanations are the best
@robindebreuil 3 years ago
Lisa: Poor predictable Bart, always takes rock. Bart: ROCK! NOTHING BEATS ROCK!
@binjianxin7830 3 years ago
A POMDP is not an MDP. The infostate is not a sufficient statistic of the history. The whole trick is to transform the infostate formulation into a sufficient one. That's why the opponent's earlier strategy needs to be considered, hence the added complexity and computation. This could have been clarified at the start.
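A minimal sketch of the idea in the comment above: if the opponent's policy is assumed to be known, an observed action can be folded into a belief over their private information with Bayes' rule, which is roughly what a public belief state keeps track of. The hand labels, the `update_belief` helper, and all probabilities are hypothetical and chosen only for illustration; this is not the paper's code.

```python
from fractions import Fraction


def update_belief(belief, policy, action):
    """Bayes update of a belief over the opponent's private hand.

    belief: dict hand -> prior probability
    policy: dict hand -> dict action -> probability (assumed known)
    action: the publicly observed action
    """
    # Weight each hand by how likely it was to produce the observed action.
    unnormalized = {
        hand: prior * policy[hand].get(action, Fraction(0))
        for hand, prior in belief.items()
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observed action has zero probability under the policy")
    return {hand: weight / total for hand, weight in unnormalized.items()}


# Hypothetical example: two possible private hands, equally likely a priori.
prior = {"strong": Fraction(1, 2), "weak": Fraction(1, 2)}
# Assumed opponent policy: strong hands bet more often than weak ones.
policy = {
    "strong": {"bet": Fraction(3, 4), "check": Fraction(1, 4)},
    "weak": {"bet": Fraction(1, 4), "check": Fraction(3, 4)},
}
posterior = update_belief(prior, policy, "bet")
```

The posterior depends on the policy that was assumed, which is why the opponent's earlier strategy has to be part of the state.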
@stefanvasilev8948 3 years ago
I love your channel, keep up the good work!
@herstar9510 2 years ago
This was simultaneously interesting and I couldn't understand it at the same time.
@AP-dc1ks 3 years ago
Self play :| Imperfect information :) Provably converges :O
@mdmishfaqahmed8356 3 years ago
Cool paper. IMHO the r-p-s example is a bit underwhelming. There is a fundamental difference between games like chess and rock-paper-scissors. In chess the moves in a sequence are far more dependent on each other, whereas in r-p-s they are largely independent, just like 50 throws of a coin are independent of each other. Please correct me if I am wrong. Assuming the result of every game is independent of the others (a game consists of one move from each player), picking rock is more potent for p1 (outcomes: 0, -1, +2). Then, upon observing p1 picking rock often, p2 can raise the probability of picking paper, and so on. Eventually, though, it should come to the same (0.4, 0.4, 0.2) equilibrium, starting from something like (1.0, 0.0, 0.0) instead of the authors' suggestion of something like (0.0, 0.0, 1.0).
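The equilibrium quoted in this comment can be checked with a little arithmetic. The sketch below assumes the modified rock-paper-scissors payoffs implied by the comment's "(0, -1, +2)" for rock, i.e. the stakes are doubled whenever either player picks scissors. The full matrix is an assumption and is not quoted from the thread. Against (0.4, 0.4, 0.2) every pure action earns the same expected payoff, so no deviation helps.

```python
from fractions import Fraction

# Row player's payoff for (row action, column action), ordered rock, paper, scissors.
# Assumed matrix: ordinary rock-paper-scissors with doubled stakes when scissors is played.
PAYOFF = [
    [0, -1, 2],   # rock:     ties rock, loses to paper, beats scissors (doubled)
    [1, 0, -2],   # paper:    beats rock, ties paper, loses to scissors (doubled)
    [-2, 2, 0],   # scissors: loses to rock (doubled), beats paper (doubled), ties
]


def action_values(opponent_strategy):
    """Expected payoff of each pure action against a mixed opponent strategy."""
    return [
        sum(payoff * prob for payoff, prob in zip(row, opponent_strategy))
        for row in PAYOFF
    ]


equilibrium = [Fraction(2, 5), Fraction(2, 5), Fraction(1, 5)]
values_at_equilibrium = action_values(equilibrium)

# For contrast: against a uniform opponent the actions are not equally good.
uniform = [Fraction(1, 3)] * 3
values_at_uniform = action_values(uniform)
```

Under these assumed payoffs rock is the best reply to a uniform opponent, which matches the comment's intuition that rock looks strongest at first.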
@23kl104 3 years ago
I like the drawing of your eye, it's got some sinister watchfulness
@MrAquaktus 3 years ago
Amazing! Please keep doing these
@JTMoustache 3 years ago
To understand the first part of this video it is useful to look at the definitions of Nash equilibrium and Pareto-optimal strategy. The Coursera Game Theory course (1st and 2nd week) is great for understanding those concepts.
@wilsontang8492 3 years ago
Would you mind sharing which specific game theory course on Coursera it is?
@BernhardSchlegel 3 years ago
Dreaming of a PyTorch / TensorFlow implementation on GitHub for Christmas...
@ozgeozcelik8921 3 years ago
Kaiji: Ultimate Survivor comes to mind
@wilsontang8492 3 years ago
The AI probably won't worry about losing fingers :)
@ozgeozcelik8921 3 years ago
@wilsontang8492 yeah, nothing to lose...
@Zantorc 3 years ago
A three sided dice is an ordinary dice with opposite faces the same.
@DamianReloaded 3 years ago
Or a rounded-off triangular prism; see "Poly D3 Dice".
@michaelnurse9089 3 years ago
Pedantic man responding. That is a 3-outcome dice with six sides. Actual three-sided dice are seemingly very rare (I have never seen them after a lifetime of playing games), but a quick Google search reveals they do exist. It looks like they have to make use of unused rounded areas to work.
@Zantorc 3 years ago
@michaelnurse9089 Hmm... Topologically identify opposite faces.
@sofia.eris.bauhaus 3 years ago
If you sacrifice some precision you can just throw a coin.
@garret1930 3 years ago
@Zantorc faces with equivalent normal vectors
@brandont643 2 years ago
You're a legend
@herp_derpingson 3 years ago
I still don't fully understand the architecture. Are we differentiating through the CFR? What is the loss function?
@YannicKilcher 3 years ago
No, as I understand it, CFR is the inner loop used to determine the targets.
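A rough illustration of the kind of inner loop mentioned in this reply: regret matching, the per-decision update that CFR is built on, run in self-play on a one-shot game. This is only a sketch. It is not ReBeL's training loop, there is no neural network or loss here, and the modified rock-paper-scissors payoff matrix (stakes doubled when scissors is involved) is an assumption. The function names are hypothetical.

```python
# Row player's payoff, ordered rock, paper, scissors (assumed matrix).
PAYOFF = [
    [0, -1, 2],
    [1, 0, -2],
    [-2, 2, 0],
]


def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(regrets)] * len(regrets)  # no positive regret: uniform
    return [p / total for p in positive]


def self_play(iterations):
    """Run regret matching in self-play and return the average strategy.

    The game is symmetric and both players start identically, so they follow
    the same strategy on every iteration and one regret vector covers both.
    """
    n = len(PAYOFF)
    regrets = [0.0] * n
    strategy_sum = [0.0] * n
    for _ in range(iterations):
        strategy = regret_matching_strategy(regrets)
        # Expected payoff of each pure action against the current strategy.
        values = [sum(PAYOFF[a][b] * strategy[b] for b in range(n)) for a in range(n)]
        expected = sum(strategy[a] * values[a] for a in range(n))
        for a in range(n):
            regrets[a] += values[a] - expected  # how much better action a would have done
            strategy_sum[a] += strategy[a]
    return [s / iterations for s in strategy_sum]


def exploitability(strategy):
    """Best payoff an opponent can achieve against `strategy` (the game value is 0)."""
    n = len(PAYOFF)
    return max(sum(PAYOFF[a][b] * strategy[b] for b in range(n)) for a in range(n))


average_strategy = self_play(50_000)
```

The average strategy, not the last iterate, is what should approach (0.4, 0.4, 0.2) here. In a setup like the one described in the reply, values produced by such an inner loop would serve as training targets rather than being differentiated through.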
@skinnyboystudios9722 3 years ago
How long does it take to train on an RTX 3090? How many 3090s do you need, and for how long? Also, how does training time compare to AlphaZero?
@nichevo 3 years ago
Yes
@ologhai8559 3 years ago
you wanna do some online casinos? 😂
@skinnyboystudios9722 3 years ago
@ologhai8559 Yeah. Build an AI that beats pro players for fun.
@aleks1980 3 years ago
@skinnyboystudios9722 Pluribus was trained for 144 hours on 64 CPUs, which is about 200 bucks on AWS.
@skinnyboystudios9722 3 years ago
@aleks1980 Oh, that's cheap. I thought it would take GPU servers.
@XOPOIIIO 3 years ago
Can somebody tell me where you get the latest ML news about developments in the field, the latest research papers being discussed, etc.?
@dome8116 3 years ago
deeplearn.org/ is a super website which tracks new releases etc. Besides that I subscribed to a few Telegram channels such as t.me/ai_machinelearning_big_data
@NextFuckingLevel 3 years ago
You could open Hacker News or the r/machinelearning subreddit.
@jahcane3711 3 years ago
Did you mean it when you said this takes heaps of compute, don't try this at home? By that I mean, does it really take THAT much compute?
@RamRachum 3 years ago
Where can I find a formal definition of the term "optimal strategy" for a 2-player game?
@berkerdemirel2899 3 years ago
It is the strategy whose actions are always a best response. An action a* is a best response if u(a* | state) >= u(a | state) for all a in the action space. Note that a best response may not exist for every state of the game (there may not be one clear action that weakly dominates the others); however, you can find a mixture of actions (a probability distribution over actions) that weakly dominates all.
@RamRachum 3 years ago
@berkerdemirel2899 That makes sense in a one-player game, but the other player could have any other strategy.
@berkerdemirel2899 3 years ago
@RamRachum This is why it is called *best response*. For simplicity, assume that actions are discrete and that there is only one move. Players decide what they play simultaneously (without observing the other's action). Then there are A1xA2 possible outcomes for the game (A1 is the action space of player 1 and A2 that of player 2). Search for any a1 in A1 that is better than all other actions *whatever player 2 plays*. It is the best response for player 1. Do the same thing for player 2. The intersection of the best-response sets of players 1 and 2 is the Nash equilibrium. (If there is no pure strategy, meaning no single a1 that dominates all other actions regardless of what player 2 plays, then there may be a mixed strategy, i.e. a combination of actions with a probability distribution.) For further information you can look up the terms best response, mixed strategies and Nash equilibrium.
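The procedure described in this reply can be sketched for small matrix games. One note on terminology, stated as an assumption about what is meant: in the usual textbook definition a best response is taken against a fixed opponent action, and a pure Nash equilibrium is a cell where both players are best-responding to each other. That is what the hypothetical `pure_nash_equilibria` helper below checks. The example payoffs are illustrative only.

```python
def pure_nash_equilibria(u1, u2):
    """Return all (row, col) cells where each action is a best response to the other.

    u1[i][j] and u2[i][j] are the payoffs of player 1 and player 2 when
    player 1 plays row i and player 2 plays column j.
    """
    rows, cols = len(u1), len(u1[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            # Player 1 cannot gain by switching rows while the column stays at j.
            row_is_best = all(u1[i][j] >= u1[k][j] for k in range(rows))
            # Player 2 cannot gain by switching columns while the row stays at i.
            col_is_best = all(u2[i][j] >= u2[i][k] for k in range(cols))
            if row_is_best and col_is_best:
                equilibria.append((i, j))
    return equilibria


# Prisoner's dilemma (action 0 = cooperate, 1 = defect): one pure equilibrium.
pd_u1 = [[-1, -3], [0, -2]]
pd_u2 = [[-1, 0], [-3, -2]]
pd_equilibria = pure_nash_equilibria(pd_u1, pd_u2)

# Ordinary rock-paper-scissors (zero-sum): no pure equilibrium, only a mixed one.
rps_u1 = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
rps_u2 = [[-x for x in row] for row in rps_u1]
rps_equilibria = pure_nash_equilibria(rps_u1, rps_u2)
```

When the list comes back empty, as for rock-paper-scissors, the equilibrium has to be searched for among mixed strategies.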
@NoHandleToSpeakOf 2 years ago
Is that what people do when they dream while sleeping? Exploring the space of possibilities while imagining full knowledge for all participants.
@BigHorse4200 2 years ago
Still don't get how you run CFR on a PBS?
@Prince-sf5en 3 years ago
Yay
@DavenH 3 years ago
A three-sided dice could exist, but not with geometric faces. Think of an intersection between 3 spheres, kind of in a Venn diagram / Nuclear hazard symbol configuration.
@DamianReloaded 3 years ago
Had to google it: it's a rounded-off triangular prism; see "Poly D3 Dice".
@garret1930 3 years ago
There are other ways to make them too. You can take a cube and fillet three edges in a way that leaves you with a three-sided die (each side will no longer be a plane though, and I don't think that a 3-sided die can be made with only planar faces).
@wildwest1832 2 years ago
I wish these things would allow the public to play against them. Let me play poker against this unbeatable bot, so we can see for ourselves.
@mehermanoj45 3 years ago
1st?!
@techma82 3 years ago
Ye
@HappyDancerInPink 3 years ago
Bruh
@HappyDancerInPink 3 years ago
@dieseegurke7843 I don't speak German, bruh