If you like my YouTube content, you'll love my courses. I'm almost always running a sale, so check the links in the description for the best price! Thanks for your support!
Hi Phil, I just finished your Udemy course. Great content, and the code base is very usable. I look forward to more courses from you, and I would love to see you tackle harder algorithms and projects, because your explanations of papers and how to implement them are very clear!
Video Table of Contents:
1:39 ReplayBuffer class
10:19 Agent class for memory, network, and hyperparameters
20:20 Writing main_tf2_dqn_lunar_lander.py
Hi Mr Phil, I have some issues with your code. I used it for CartPole-v0 and FrozenLake-v0 in gym. For CartPole it did very well, but for FrozenLake it was very weak, and I don't know why. By the way, in your code, in the body of the build_dqn function, you didn't use input_dims; why?
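A likely explanation for the input_dims part, with a sketch in the spirit of the video's build_dqn (the layer names and structure here are assumptions): Keras Dense layers defer building their weights until they first see data, so the input dimension is inferred from the first batch and never has to be passed to the layers explicitly.

```python
import tensorflow as tf

# Sketch only, not the exact code from the video. No input shape is
# given: each Dense layer builds its weights when the first batch
# arrives, so input_dims can be accepted but never used.
def build_dqn(lr, n_actions, fc1_dims, fc2_dims):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(fc1_dims, activation='relu'),
        tf.keras.layers.Dense(fc2_dims, activation='relu'),
        tf.keras.layers.Dense(n_actions, activation=None)])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss='mean_squared_error')
    return model
```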
Why is it that you don't update the weights of the deep network? I don't see this in any of the other algorithms you've coded. I really enjoy your tutorials; I hope you'll answer the question.
Hello @Machine Learning with Phil, is there any possibility of selecting multiple actions (based on ranking: rank 1 is the maximum Q value, rank 2 the second max, and so on) instead of just one in the action selection method?
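For what it's worth, here is a minimal sketch of that ranking idea with NumPy. It assumes agent.q_eval is the Keras model from the video; choose_top_k_actions is a hypothetical helper, not something in Phil's code.

```python
import numpy as np

# Hypothetical helper: rank all actions by predicted Q value and
# return the top k, highest first.
def choose_top_k_actions(agent, observation, k=3):
    state = np.array([observation], dtype=np.float32)  # batch of one
    q_values = agent.q_eval.predict(state)[0]
    ranked = np.argsort(q_values)[::-1]  # argsort is ascending, so reverse
    return ranked[:k]                    # rank 1 first, then rank 2, ...
```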
Hi Phil, thanks a lot for the video. I followed along with it, but I have a problem with the plotLearning function: it doesn't seem to be included in the utils module anymore. Without the related lines the code runs smoothly, but I can't see the learning data. What would you recommend using instead of the utils module?
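Until there's an official answer, here is a stand-in you could try. It assumes your main loop collects episode indices, scores, and epsilon values in lists; the name plot_learning and its signature are assumptions based on how the utility is called in the video.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in for the missing utility: plot the 100-episode running
# average of the scores against the epsilon decay, then save the figure.
def plot_learning(x, scores, epsilons, filename):
    running_avg = [np.mean(scores[max(0, i - 100):i + 1])
                   for i in range(len(scores))]
    fig, ax1 = plt.subplots()
    ax1.plot(x, epsilons, color='C0')
    ax1.set_xlabel('Episode')
    ax1.set_ylabel('Epsilon', color='C0')
    ax2 = ax1.twinx()                    # second y-axis for the scores
    ax2.plot(x, running_avg, color='C1')
    ax2.set_ylabel('Score (100-episode average)', color='C1')
    fig.savefig(filename)
```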
Hey Phil, why is it that when using TensorFlow we pass the states and the updated q_target to train_on_batch, while in PyTorch we pass the q_target and the q_eval output to Q_eval.loss? Thanks!
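A tiny runnable illustration of the Keras side (the model below is a dummy, not the video's network): train_on_batch computes the loss between model(states) and the targets internally, which is why Keras takes (inputs, targets), while in PyTorch you run the forward pass yourself and hand both tensors to the loss function.

```python
import numpy as np
import tensorflow as tf

# Dummy two-action network over four-dimensional states.
model = tf.keras.Sequential([tf.keras.layers.Dense(2, input_shape=(4,))])
model.compile(optimizer='adam', loss='mse')

states = np.random.rand(8, 4).astype(np.float32)    # fake batch of states
q_target = np.random.rand(8, 2).astype(np.float32)  # fake corrected targets
# Keras runs the forward pass and the loss for us in one call:
loss = model.train_on_batch(states, q_target)       # one gradient step
print(loss)
```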
Not sure if it's because of a TensorFlow 2 update, but I keep getting this error: "setting an array element with a sequence. The requested array would exceed the maximum number of dimension of 1." when trying to implement this with CartPole-v2. I've copied it exactly as in the video.
You need to either use an older version of gym, or add an info variable to what you unpack from reset, and a truncated boolean to what you unpack from the step function.
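Concretely, a sketch of the adjusted episode loop under the newer gym API, assuming the env and agent from the video's main script:

```python
# Newer gym: reset() returns (observation, info) and step() returns
# five values, including a separate truncated flag.
observation, info = env.reset()
done = False
score = 0
while not done:
    action = agent.choose_action(observation)
    observation_, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated  # treat either as the end of the episode
    agent.store_transition(observation, action, reward, observation_, done)
    agent.learn()
    score += reward
    observation = observation_
```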
TF Agents is good for plug and play, but you don't really learn how something works unless you code it yourself. I also had issues installing TF Agents the last time I tried; I got stuck in dependency hell. Granted, that was some time ago, so it's probably better now, but it left a bad impression.
@@MachineLearningwithPhil I totally agree with you. And don't tell me about dependency hell, haha; TensorFlow has so many problems with that. I learned a lot in this video, and I'll also read the DQN paper. Thanks a lot, +1 sub!
Hi Phil, can you please give me some suggestions on how to practice coding like you, and what parameters I should keep in mind when coding deep reinforcement learning in Python like this for any case study? Your suggestions will really help me practice.
Great tutorial! How would I go about reconfiguring the main agent file to work in more complex environments? I know that creating algorithms to play extremely complex games like Minecraft, CS:GO, etc. is quite a bit more complicated, but I think it would be cool to at least begin learning what it entails. Any sources you could point me to would be great, especially if you have some Udemy tutorials on more complex development.
Thanks so much for the coding video. The machine learning part was very useful as a hands-on tutorial for me! As a Vim enthusiast, though: your Vim skills could be improved; you don't make much use of its power ;)
Awesome tutorials, Phil!! I just had one question: if we are calculating the target values for the q_eval network by doing a second pass through the same network for the next states, isn't the DQN just chasing its own tail? Thanks anyway, I love your tutorials so much!
Yes, and you can fix it by using a second network for the next states and periodically copying the weights into it from the q_eval network. It doesn't make much difference in the Lunar Lander environment (it's within the run-to-run variation), but it is quite helpful in more complex environments.
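A minimal sketch of what that could look like with Keras; the function names and the replace interval of 100 learning steps are illustrative choices, not values from the video.

```python
import tensorflow as tf

# Build a target network as a copy of the online network.
def make_target_network(q_eval):
    q_next = tf.keras.models.clone_model(q_eval)  # same architecture
    q_next.set_weights(q_eval.get_weights())      # same initial weights
    return q_next

# Periodically sync the target network with the online network.
def maybe_replace_target(q_eval, q_next, learn_step, replace_every=100):
    if learn_step % replace_every == 0:
        q_next.set_weights(q_eval.get_weights())

# In learn(), the next-state values would then come from
# q_next.predict(states_) instead of q_eval.predict(states_).
```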
@@MachineLearningwithPhil So with a target network, if the weights are copied from the q_eval network, don't both networks become the same? And then doesn't the same tail-chasing problem occur, since even though we are doing a pass with the target network, the q_eval network and the q_target network are identical? Anyway, thank you so much!
Hi Phil! Love your tutorials. Just one question: in the learn function, you've written q_target[batch_index, action], while the Bellman equation says q_target[state, action]. Wouldn't that only act on one state while all the others are disregarded, since there is no loop? I have tried using dummy matrices and also tried your code, but I still don't get it. Please help. Thanks anyway!
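A toy illustration of why no loop is needed (the numbers below are made up): batch_index pairs element-wise with the sampled actions, so NumPy fancy indexing updates one (row, action) cell per transition, covering the whole batch in a single assignment.

```python
import numpy as np

q_target = np.zeros((4, 2))               # 4 sampled states, 2 actions
batch_index = np.arange(4)                # [0, 1, 2, 3]
actions = np.array([1, 0, 1, 1])          # action taken in each sampled state
rewards = np.array([1.0, 0.5, 2.0, 1.5])  # made-up update values
# One cell per row is written: (0,1), (1,0), (2,1), (3,1).
q_target[batch_index, actions] = rewards
print(q_target)
```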
Hi Phil. Could you please check why the store_transition method in simple_dqn_tf2.py is defined but never used? Is it responsible for filling the memory buffer?
I tried implementing your code with simple environments. No matter what I do, the network learns to always choose action 0. I can change the reward structure as much as I want.
I asked the question on Stack Exchange, but nobody has replied: datascience.stackexchange.com/questions/70038/sudden-drop-of-score-in-the-last-few-episodes
I have the problem that when I start it, the computation begins but I don't get any prints while it is running, so I can't see the progress until the program is done. Then it prints everything at once. What did I do wrong? (Sorry for my bad English.)
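That sounds like output buffering rather than a bug in the training code; this is a guess, but if stdout is redirected to a file or shown in an IDE console, a common fix is to flush each print:

```python
# Force each progress line to appear immediately instead of being
# buffered until the process exits.
print('episode', i, 'score %.2f' % score, 'epsilon %.2f' % agent.epsilon,
      flush=True)
# Alternatively, run the script unbuffered: python -u main_tf2_dqn_lunar_lander.py
```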
Phil, I'm still new to YouTube content creation. I'd love to collaborate with you one day once I refine my pitch. I study data science and machine learning at Lambda School and can talk about my experience there.
@@billallen9251 No. Change epsilon to decay every episode instead of every step, and try restructuring his code; there is too much coupling and there are unwanted components, and cleaning that up improves readability as well.
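For the epsilon suggestion, a sketch of per-episode decay, reusing the attribute names from the video's Agent (epsilon, eps_dec, eps_min) and the older gym API the video uses; the loop body is abbreviated.

```python
for episode in range(n_games):
    observation = env.reset()
    done = False
    while not done:
        action = agent.choose_action(observation)
        observation_, reward, done, info = env.step(action)
        agent.store_transition(observation, action, reward, observation_, done)
        agent.learn()                    # with the decay removed from learn()
        observation = observation_
    # Decay once per episode, after it finishes.
    agent.epsilon = max(agent.epsilon - agent.eps_dec, agent.eps_min)
```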