
DQN in Pytorch from Scratch stream 1 of N | Deep Learning 

Jack of Some
14K views

Published: 11 Sep 2024

Comments: 33
@JackofSome 4 years ago
Timestamps for mobile users:
0:00 Intro, goals
3:15 OpenAI gym discussion and exploration
23:45 Model class
37:00 Replay buffer
55:00 Train step/loss calculation
1:17:30 Debugging the train step
1:29:00 Setting up main for the final training
1:34:00 Sanity check, adding logging
1:43:00 Updating replay buffer to be faster
1:47:00 Epsilon greedy implementation
1:57:00 Training iteration 1
1:59:00 Oops ... fixing loss
2:01:00 Loss being wonky, tuning hyperparams
2:05:00 Training iteration 2
2:13:00 Writing code to test the model
@wkppp4732 2 years ago
Finally, someone who can explain DQN 🤣 I can't believe that I have been searching for this for 2 years 🤣
@lucaswehmuth 3 years ago
Excellent video! Well motivated and much more informative than most videos on the topic. Watching you debug and go through code optimization was an invaluable lesson here. Thanks for posting.
@JackofSome 3 years ago
Thanks. Glad it was helpful. I wanna do more but keep not finding the time. I think demonstrations of debugging are sorely lacking in educational material across all of computer science, and that's why I focused on it so much here.
@mikesmith853 4 years ago
Excellent session! Very engaging and full of detail. I've learned a lot. Thank you!
@emilgmelfald 3 years ago
I have watched many videos on deep learning in Python, but this is by far the most educational one! Thank you for uploading this content, and I look forward to watching more from you.
@JackofSome 3 years ago
Thank you for your kind words, this means a lot.
@joaoavf 4 years ago
Thank you for the tutorial; there were many tips (wandb, ipdb, dataclass...) that I can incorporate into my own practice. It was my first DQN :)
@jamespogg 2 years ago
Could someone please explain why, in the loss function, we are multiplying by the one_hot actions and the done_mask? I do not understand:
loss_fn = nn.SmoothL1Loss()
loss = loss_fn(
    torch.sum(qvals * one_hot_actions, -1),
    rewards.squeeze() + mask[:, 0] * qvals_next * 0.99
)
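For anyone stuck on that same line, here is a minimal runnable sketch of what that loss computes, with dummy tensors standing in for the replay-buffer batch (the shapes and the helper names q_taken and target are assumptions for illustration, not the exact code from the stream):

import torch
from torch import nn

# Dummy batch just to make the sketch runnable; in the stream these come
# from the replay buffer and the two networks.
batch, n_actions = 4, 2
qvals = torch.randn(batch, n_actions, requires_grad=True)    # Q(s, .) from the online net
qvals_next = torch.randn(batch)                              # max_a' Q_target(s', a')
rewards = torch.randn(batch, 1)
actions = torch.randint(n_actions, (batch,))
one_hot_actions = nn.functional.one_hot(actions, n_actions).float()
mask = torch.ones(batch, 1)                                  # 0 where the episode ended, 1 otherwise

loss_fn = nn.SmoothL1Loss()

# Multiplying by one_hot_actions zeroes every column except the action that
# was actually taken, so the sum "selects" Q(s, a_taken) for each row.
q_taken = torch.sum(qvals * one_hot_actions, -1)             # shape [batch]

# The done mask zeroes the bootstrap term at terminal states, so the target
# collapses to just the reward when the episode is over.
target = rewards.squeeze() + mask[:, 0] * qvals_next * 0.99  # shape [batch]

loss = loss_fn(q_taken, target)
loss.backward()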
@hemanthkotagiri8865 3 years ago
A walkthrough of your entire desktop (the window manager, Emacs, desktop env, operating system) would be really good. I'm looking forward to it. Also, Discord?
@hanwantshekhawat4314 4 years ago
Yoo great series man.
@justinnicholson8372 2 years ago
Silly question: @34:26 how are you able to insert text from the minibuffer into the main buffer like that? Seems like such a useful behavior in Emacs. Cheers for the VERY useful tutorial!
@ZeeshanKhan-sq9mb 1 year ago
Sir, I got an error in train_Step that torch.Tensor could not be created because it got a float32, not a sequence. Any solution???
@mariomoran3067 2 years ago
I haven't finished the video, but are you doing a Markov decision process? I see there is a section with elision.
@haohuynhnhat3881 3 years ago
If you use VS Code, there's no need for ipdb; just select the code and press Shift+Enter.
@manassarpatwar 4 years ago
Really great stream, one I'll refer to the most when I start implementing DQN. PS: Idk why everyone wants to be YouTube partners lol, probably bots.
@Throwingness 3 years ago
Buffer? The Corleone Family had a lot of buffers.
@cyber7000 4 years ago
Hey, I want to make a bot for a 3D game. What data should I feed it?
@yanxiaosun4363 4 years ago
Terrific tut! One quick question, though: why use one_hot to encode the actions when we sum the qvals along rows in torch.sum(qvals*one_hot_actions, -1)? =)
@JackofSome 4 years ago
Because we're not really adding the qvals, we're selecting one, and this is one way to do it that is differentiable.
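In case it helps future readers, a tiny sketch (shapes assumed, not the stream's exact code) showing that the one_hot multiply-and-sum picks out the same values as a gather, and that both keep gradients flowing back into qvals:

import torch

qvals = torch.randn(4, 2, requires_grad=True)              # [batch, n_actions]
actions = torch.randint(2, (4,))                           # actions actually taken, [batch]
one_hot = torch.nn.functional.one_hot(actions, 2).float()

# Zero out every Q-value except the taken action's, then sum each row.
selected_via_one_hot = torch.sum(qvals * one_hot, -1)

# Equivalent differentiable selection using gather.
selected_via_gather = qvals.gather(1, actions.unsqueeze(1)).squeeze(1)

assert torch.allclose(selected_via_one_hot, selected_via_gather)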
@brendanbrowne2103 4 years ago
Do you have the sample code, so I can just copy and paste? Python newbie here.
@2dapoint424 3 years ago
Is this on GitHub?
@uonliaquat7957 3 years ago
My loss increases after updating target_model; I've even tuned my parameters too.
@amalpatel4676 4 years ago
What's this IDE you are using??
@JackofSome 4 years ago
Spacemacs. I have a number of videos on it
@uonliaquat7957 3 years ago
Would you mind providing the GitHub link?
@JackofSome 3 years ago
github.com/safijari/jack-of-some-rl-journey
@uonliaquat7957 3 years ago
@@JackofSome Thanks for this. I need to ask you one thing: my loss is decreasing properly, but the average reward isn't going above 28 even though I've used the same hyperparameters as yours.
@uonliaquat7957 3 years ago
@@JackofSome There isn't any file related to this stream at the provided link.
@JackofSome 3 years ago
Sorry. This is the correct link: github.com/safijari/rl-tutorials. If you look through the commits you can find the state of the code at the end of each stream.
@uonliaquat7957 3 years ago
@@JackofSome Thanks, I'll check it out. Would you mind telling me if there's any possible reason for the average reward not going above 60 even though I've used the same parameters as yours?