
Deep Q Learning With Tensorflow 2 

Machine Learning with Phil
43K subscribers
27K views

Published: 21 Oct 2024

Comments: 74
@MachineLearningwithPhil 4 years ago
If you like my YouTube content, you'll love my courses. I'm almost always running a sale, so check the links in the description for the best price! Thanks for your support!
@MachineLearningwithPhil 4 years ago
Hey Bud, I'll run another sale tonight or tomorrow when I release a new video. Stay tuned!
@MachineLearningwithPhil 4 years ago
Sorry dude, I didn't get the notification on the last two comments you sent. No idea about the Packt stuff. From what I've seen, they are hit or miss.
@first-thoughtgiver-of-will2456 4 years ago
Thank you for these videos. I'm a self-taught machine learning researcher and find this material invaluable. Please keep up the great work.
@SonPham-CompetitiveProgramming 4 years ago
Hi Phil, I just finished your Udemy course. Great content, and the code base is very usable. I look forward to more courses from you. I would love to see you tackle harder algorithms and projects, because your explanations of papers and how to implement them are very good!
@RedShipsofSpainAgain 4 years ago
Project video table of contents:
1:39 ReplayBuffer class
10:19 Agent class for memory, network, and hyperparameters
20:20 write main_tf2_dqn_lunar_lander.py
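[Editor's note: the first timestamp covers the ReplayBuffer class. As a rough, minimal sketch of what such a transition buffer typically looks like (illustrative names and layout, not Phil's exact code):]

    import numpy as np

    class ReplayBuffer:
        """Fixed-size store of (state, action, reward, next_state, done) transitions."""
        def __init__(self, max_size, input_dims):
            self.mem_size = max_size
            self.mem_cntr = 0  # total number of transitions stored so far
            self.state_memory = np.zeros((max_size, *input_dims), dtype=np.float32)
            self.new_state_memory = np.zeros((max_size, *input_dims), dtype=np.float32)
            self.action_memory = np.zeros(max_size, dtype=np.int32)
            self.reward_memory = np.zeros(max_size, dtype=np.float32)
            self.terminal_memory = np.zeros(max_size, dtype=np.bool_)

        def store_transition(self, state, action, reward, new_state, done):
            idx = self.mem_cntr % self.mem_size  # overwrite oldest entries once full
            self.state_memory[idx] = state
            self.new_state_memory[idx] = new_state
            self.action_memory[idx] = action
            self.reward_memory[idx] = reward
            self.terminal_memory[idx] = done
            self.mem_cntr += 1

        def sample_buffer(self, batch_size):
            # Assumes at least batch_size transitions have been stored.
            max_mem = min(self.mem_cntr, self.mem_size)
            batch = np.random.choice(max_mem, batch_size, replace=False)
            return (self.state_memory[batch], self.action_memory[batch],
                    self.reward_memory[batch], self.new_state_memory[batch],
                    self.terminal_memory[batch])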
@mgr1282 4 years ago
Hi Mr Phil, I have some issues with your code. I used it for CartPole-v0 and FrozenLake-v0 from gym. For CartPole it did very well, but for FrozenLake it was very, very weak. I don't know why. BTW, in your code, in the body of the build_dqn function, you didn't use input_dims; why?
@killereks 1 year ago
Hey, thanks a lot for showing this. Really helpful for my dissertation work. I'm curious if you figured out the issue at 14:30?
@mesuraion8862 2 years ago
Why is it that you don't update the weights of the deep network? I don't see it in any of the other algorithms you code. I really enjoy your tutorial. Hope you will answer the question.
@santhoshckumar7367 3 years ago
Great video, Phil.
@m.zubairislam3405 4 years ago
Hello @Machine Learning with Phil, is there any possibility of selecting multiple actions (based on ranking [rank 1: maximum Q value, rank 2: second max, and so on]) instead of one in the action selection method?
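[Editor's note: there is no reply in the thread, but as an illustration of the idea, one way to rank actions by their predicted Q values and return the top k, assuming a Keras Q-network like the one built in the video (function and variable names here are hypothetical):]

    import numpy as np

    def top_k_actions(q_network, observation, k=2):
        # Rank actions by predicted Q value and return the k best:
        # rank 1 = highest Q, rank 2 = second highest, and so on.
        state = np.array([observation], dtype=np.float32)   # add a batch dimension
        q_values = q_network.predict(state, verbose=0)[0]   # shape: (n_actions,)
        ranked = np.argsort(q_values)[::-1]                  # indices sorted high -> low
        return ranked[:k]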
@vurtnesaerdna 4 years ago
Such a good video, I learned a lot, thanks!
@jahcane3711 2 years ago
Hey Phil, have there been any updates or fixes to the slow choose_action function since you released this video?
@ShortVine 4 years ago
Thank you so much for this vid. What is your opinion of the stable-baselines framework or TF Agents? And can you please make a vid using one of those frameworks?
@MachineLearningwithPhil 4 years ago
I found TF Agents to be hard to use, and I haven't tried stable-baselines.
@geo2073 4 years ago
Thank you Dr. Phil!
@andrespena582 3 years ago
Hi Phil, thanks a lot for the video. I followed along with the video, but I have a problem with the plotLearning function. It seems it's not included in the utils module anymore. Without the related lines the code runs smoothly, but I can't see the learning data. What would you recommend using instead of the utils module?
@orsimhon133 2 years ago
Hey Phil, why, when using TensorFlow, do we pass the states and the updated q_target to train_on_batch, while in PyTorch we pass the q_target and the q_eval to Q_eval.loss? Thanks
@TM-bj6zh 1 year ago
Not sure if it's because of a TensorFlow 2 update, but I keep getting this error: "setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1." when trying to implement this with cart-polev2. I've copied it exactly as in the video.
@abhisheksaini5217 1 year ago
I encountered the same problem. Have you managed to find a solution for it?
@MachineLearningwithPhil 1 year ago
You need to either use an older version of gym, or add info to the return of the reset, and add a truncated boolean to the return from the step function.
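[Editor's note: Phil's reply refers to the Gym API change; a short sketch of the newer call pattern, assuming gym >= 0.26 (LunarLander-v2 additionally needs the Box2D extra installed):]

    import gym

    env = gym.make('LunarLander-v2')

    observation, info = env.reset()      # reset now also returns an info dict
    done = False
    while not done:
        action = env.action_space.sample()
        # step now returns five values instead of four
        observation, reward, terminated, truncated, info = env.step(action)
        done = terminated or truncated   # treat either flag as end of episode

With an older gym (pre-0.26), reset returns only the observation and step returns the familiar four values, which is what the code in the video expects.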
@Moprationsz 2 years ago
Hey, pretty clean implementation, but I would like to ask why you didn't use TF Agents' implementation of DQN. Thanks
@MachineLearningwithPhil 2 years ago
TF agents is good for plug and play, but you don't really learn how something works unless you code it yourself. I also had issues with installing tf agents, last time I tried. I got stuck in dependency hell. Granted, that was some time ago so it's probably better now, but it left a bad impression.
@Moprationsz 2 years ago
@@MachineLearningwithPhil I totally agree with you. And don't tell me about the dependency hell, haha; TensorFlow has so many problems with that. I learned a lot in this video, and I'll also read the DQN paper. Thanks a lot, +1 sub!
@ஸ்ரீவாஸ்.எஸ்.பி
Hi Phil, can you please give me some suggestions on how to practice coding like you, and what parameters I should keep in mind when writing deep reinforcement learning code in Python for case studies like this? Your suggestions would help me practice.
@MachineLearningwithPhil 2 years ago
Read the free book by Sutton and Barto. Implement the algorithms and do the questions. Then read and implement papers.
@CrackerJakkTV 4 years ago
Great tutorial! How would I go about reconfiguring the main agent file to work in more complex environments? I know that creating algorithms to play extremely complex games like Minecraft, CS: GO, etc. is quite a bit more complicated, but I think it would be cool to at least begin learning what it entails. Any sources you could point me to would be great, especially if you have some tutorials on Udemy on more complex development.
@JWJ-u8q 1 year ago
What is a target network, which you mentioned at the end of the video?
@MachineLearningwithPhil 1 year ago
It's a slowly moving copy of the online network that is used to stabilize training.
@TheSAfreak 4 years ago
You never use input_dims when calling build_dqn. Why?
@prashantsharmastunning 3 years ago
Why didn't you use a target network?
@JousefM 4 years ago
Comment for the algorithm, Phil. Nice one as always :)
@arron7857 4 years ago
On env.step(), what is the reward value? Also how does that reward value relate to the math in generating training data targets?
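[Editor's note: there is no reply here, but for context, the reward returned by env.step() enters the training target through the standard DQN Bellman update. A small numpy illustration of that target, not necessarily the exact form written in the video:]

    import numpy as np

    gamma = 0.99   # discount factor

    # rewards and dones come straight from env.step(); q_next is the network's
    # prediction for the next states (shapes: (batch,), (batch,), (batch, n_actions)).
    rewards = np.array([1.0, -0.5])
    dones = np.array([False, True])
    q_next = np.array([[0.2, 0.8],
                       [0.4, 0.1]])

    # Target = step reward + discounted best next-state value,
    # with the bootstrap term zeroed out on terminal steps.
    targets = rewards + gamma * np.max(q_next, axis=1) * (1 - dones.astype(np.float32))
    print(targets)   # [ 1.792 -0.5  ]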
@colquin 4 years ago
Thanks so much for the coding video. The machine learning part was very useful as a hands-on tutorial for me! As a Vim enthusiast: your Vim skills could be improved, you don't make much use of its power ;)
@billallen9251 1 year ago
This may be answered somewhere else, but in build_dqn, your code doesn't use input_dims. Shouldn't that be the dimension of the input layer?
@MachineLearningwithPhil 1 year ago
TensorFlow 2 infers the input dimensions. The argument isn't needed.
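[Editor's note: a small sketch of what that inference looks like in practice. Layer sizes and names are illustrative, not necessarily the video's exact values:]

    import numpy as np
    from tensorflow import keras

    def build_dqn(lr, n_actions, fc1_dims, fc2_dims):
        # No input shape is given; Keras builds the weights lazily on the first
        # call/fit/predict, inferring the input dimension from the data.
        model = keras.Sequential([
            keras.layers.Dense(fc1_dims, activation='relu'),
            keras.layers.Dense(fc2_dims, activation='relu'),
            keras.layers.Dense(n_actions, activation=None),
        ])
        model.compile(optimizer=keras.optimizers.Adam(learning_rate=lr),
                      loss='mean_squared_error')
        return model

    model = build_dqn(lr=1e-3, n_actions=4, fc1_dims=256, fc2_dims=256)
    dummy_states = np.random.rand(5, 8).astype(np.float32)   # LunarLander observations are 8-dimensional
    print(model.predict(dummy_states, verbose=0).shape)       # (5, 4) -- input dims were inferred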
@billallen9251 1 year ago
@@MachineLearningwithPhil Excellent - thanks! Really helpful content BTW!
@shaheerzaman620 4 years ago
Tensorflow 2 is beautiful. Much simpler and more elegant than Pytorch.
@JousefM 4 years ago
Would not agree on that one but I am just a newbie here :D
@albertog2196 4 years ago
because of Keras I would say
@kae4881 4 years ago
Awesome tutorials, Phil!! I just had one question: if we are calculating the target values for the q_eval network by doing a second pass through the same network for the next states, isn't the DQN just chasing its own tail? Thanks anyway, I love your tutorials so much.
@MachineLearningwithPhil 4 years ago
Yes, and you can fix it by using a second network that you use for the next states, and copying the weights to it from the q_eval network periodically. It doesn't make much difference in the lunar lander environment (it's within the run-to-run variation), but it is quite helpful in more complex environments.
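[Editor's note: a hedged sketch of the fix Phil describes, with illustrative names and sizes rather than the code from the video: keep a second network for the next-state pass and hard-copy the online weights into it every N learning steps.]

    import numpy as np
    from tensorflow import keras

    def build_dqn(n_actions=4, fc_dims=64):
        model = keras.Sequential([
            keras.layers.Dense(fc_dims, activation='relu'),
            keras.layers.Dense(n_actions, activation=None),
        ])
        model.compile(optimizer='adam', loss='mse')
        return model

    q_eval = build_dqn()    # online network, trained on every learning step
    q_next = build_dqn()    # target network, used only for the next-state pass
    dummy = np.zeros((1, 8), dtype=np.float32)   # 8 = LunarLander observation size
    q_eval(dummy); q_next(dummy)                 # build both so the weights exist
    q_next.set_weights(q_eval.get_weights())     # start out identical

    REPLACE_EVERY = 1000    # learning steps between hard copies

    def sync_target(step):
        # The two networks are only identical right after a copy; in between,
        # q_next stays frozen, which keeps the bootstrap targets stable.
        if step % REPLACE_EVERY == 0:
            q_next.set_weights(q_eval.get_weights())

This also speaks to the follow-up question below: the networks only coincide at the moment of the copy, and the frozen target in between is what breaks the tail-chasing.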
@kae4881 4 years ago
@@MachineLearningwithPhil So with a target network, if the weights are copied from the q_eval network, don't both networks become the same? And then the same problem occurs, chasing your own tail; even though we are doing a pass with the target network, the q_eval network and the q_target network are the same? Anyway, thank you so much!
@kae4881 4 years ago
Hi Phil! Love your tutorials. Just had one question: in the learning function, you've written q_target[batch_index, action], while the Bellman equation says q_target[state, action]. Now that would only act on one state, while all the others are disregarded, as there is no loop. I have tried using dummy matrices and also tried your code, but I still don't get it. Please help. Thanks anyway!
@MachineLearningwithPhil 4 years ago
Look up array indexing and slicing. Using the array of indices takes care of each row.
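[Editor's note: a tiny self-contained numpy example of what Phil means. Pairing the batch_index array with the action array updates one entry per row in a single vectorized statement, with no Python loop (the numbers here are made up):]

    import numpy as np

    batch_size, n_actions = 4, 3
    q_target = np.zeros((batch_size, n_actions))   # one row per sampled transition
    batch_index = np.arange(batch_size)            # [0, 1, 2, 3]
    actions = np.array([2, 0, 1, 2])               # action taken in each transition
    new_values = np.array([1.5, -0.3, 0.7, 2.0])   # Bellman targets for those actions

    # Fancy indexing: row i, column actions[i], for every i at once.
    q_target[batch_index, actions] = new_values
    print(q_target)
    # [[ 0.   0.   1.5]
    #  [-0.3  0.   0. ]
    #  [ 0.   0.7  0. ]
    #  [ 0.   0.   2. ]]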
@suryanshankur3078 4 years ago
I tried saving my model after training, but it simply doesn't work. Can someone help, please?
@RSUtsha 4 years ago
Does the PyTorch code train faster than the TensorFlow 2.0 code for this same problem?
@sergeypigida4834 4 years ago
Hi Phil. Could you please check why the store_transition method in simple_dqn_tf2.py has been defined but never used? Is it responsible for filling the memory buffer?
@MachineLearningwithPhil 4 years ago
github.com/philtabor/Youtube-Code-Repository/blob/master/ReinforcementLearning/DeepQLearning/main_tf2_dqn_lunar_lander.py It's called in line 27.
@sankethsalimath5436 3 years ago
Why isn't model.fit used here before using model.predict ?
@cromi4194 3 years ago
I tried implementing your code with simple environments. No matter what I do, the network learns to always choose action 0. I can change the reward structure as much as I want.
@MachineLearningwithPhil 3 years ago
Is epsilon set to 1?
@alvynabranches1214 4 years ago
Sir can you make the same video using pytorch??
@MachineLearningwithPhil 4 years ago
Yup, stay tuned.
@zaheerbeg4810 4 years ago
Can anybody explain how we can make use of this code to solve our own problems?
@marcl467 4 years ago
Thank you for this tutorial. It took almost an hour for me to train, with an average score of *14.63!* Looks like something went wrong at the end:
episode: 468 score 248.95 average score 188.47 epsilon 0.01
episode: 469 score 269.47 average score 192.11 epsilon 0.01
episode: 470 score 225.31 average score 195.72 epsilon 0.01
episode: 471 score 46.36 average score 194.66 epsilon 0.01
episode: 472 score 250.61 average score 198.44 epsilon 0.01
episode: 473 score 237.25 average score 198.50 epsilon 0.01
episode: 474 score 237.02 average score 202.16 epsilon 0.01
episode: 475 score 249.85 average score 204.11 epsilon 0.01
*episode: 476 score 259.90 average score 204.15 epsilon 0.01*
episode: 477 score -520.33 average score 198.10 epsilon 0.01
episode: 478 score -567.86 average score 190.58 epsilon 0.01
episode: 479 score -670.45 average score 185.15 epsilon 0.01
episode: 480 score -413.31 average score 178.85 epsilon 0.01
episode: 481 score -995.13 average score 166.78 epsilon 0.01
episode: 482 score -450.73 average score 159.52 epsilon 0.01
episode: 483 score -583.07 average score 151.16 epsilon 0.01
episode: 484 score -586.59 average score 143.61 epsilon 0.01
episode: 485 score -436.80 average score 136.45 epsilon 0.01
episode: 486 score -665.69 average score 127.73 epsilon 0.01
episode: 487 score -602.49 average score 119.65 epsilon 0.01
episode: 488 score -1685.73 average score 99.95 epsilon 0.01
episode: 489 score -689.85 average score 91.23 epsilon 0.01
episode: 490 score -501.80 average score 83.63 epsilon 0.01
episode: 491 score -1016.14 average score 74.79 epsilon 0.01
episode: 492 score -475.65 average score 67.81 epsilon 0.01
episode: 493 score -525.98 average score 59.80 epsilon 0.01
episode: 494 score -393.08 average score 53.57 epsilon 0.01
episode: 495 score -557.32 average score 46.85 epsilon 0.01
episode: 496 score -431.39 average score 39.88 epsilon 0.01
episode: 497 score -944.19 average score 27.72 epsilon 0.01
episode: 498 score -571.92 average score 19.96 epsilon 0.01
episode: 499 score -498.09 average score 14.63 epsilon 0.01
The problem was solved at episode 476, but after that all the remaining episodes have a negative score. Do you know what could have happened there?
@marc-andreladouceur3367 4 years ago
I asked the question on Stack Exchange but nobody has replied: datascience.stackexchange.com/questions/70038/sudden-drop-of-score-in-the-last-few-episodes
@anirbanchowdhury5372 4 years ago
How are you managing the replay buffer for 1 million frames? My RAM is running out in Google Colab.
@marcl467 4 years ago
@@anirbanchowdhury5372 What's your TensorFlow version? I think there was a memory leak in the first 2.0 version.
@hocmid7761 4 years ago
I think you are overfitting during your training; check your memory during the last episodes.
@howto7166 3 years ago
I'm unable to read such small text in the code...
@blackdarkside8946 4 years ago
I've got a problem: when I start it, it starts the calculation, but I don't get any prints while it is computing, so I can't see the progress until the program is done. Then it prints everything at once. What did I do wrong? (Sorry for my bad English.)
@GlobalWarningIsAMyth 4 years ago
Are you sure the print statement is inside the loop? Are you missing an indentation?
@blackdarkside8946 4 years ago
It's inside the loop. I get the correct output for each game it played, but it prints it only after it has finished calculating.
@zachm9705 4 years ago
Phil I'm still new to this youtube content creation. Would love to collaborate with you one day once I refine my pitch. I study data science and machine learning at Lambda School. I can talk about my experience there.
@outofbody4788 4 years ago
Doesn't your epsilon decay too quickly? You reach minimum epsilon after 11 episodes.
@billallen9251 1 year ago
I changed mine to 1e-4 and it did work better
@shilashm5691 1 year ago
@@billallen9251 Nope, change epsilon to decay every episode instead of every step. Also try restructuring his code; there is too much coupling and there are unneeded components, and cleaning that up improves readability too.
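[Editor's note: a minimal sketch of the per-episode decay suggested above, with hypothetical constants; the code in the video decays epsilon linearly on every learning step instead.]

    EPS_START, EPS_MIN, EPS_DECAY = 1.0, 0.01, 0.995   # illustrative values
    n_episodes = 500

    epsilon = EPS_START
    for episode in range(n_episodes):
        # ... play one full episode with epsilon-greedy action selection ...
        # Decay once per episode (multiplicatively here) rather than on every
        # environment step, so exploration is spread over many more episodes.
        epsilon = max(EPS_MIN, epsilon * EPS_DECAY)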
@theshortcut101 4 years ago
beauty!
@IvanGarcia-cx5jm 7 months ago
30:20 - Python indentations are so problematic! It happens to anyone. I really don't like them. I prefer C++'s {} and MATLAB's "end".
@philnewman8831 3 years ago
My agent's average score gets worse over time...