
Train AI to Beat Super Mario Bros! || Reinforcement Learning Completely from Scratch 

Sourish Kundu
1.6K subscribers
13K views

Today we'll be implementing a reinforcement learning algorithm called the Double Deep Q-Network (DDQN). A lot of other videos use a library like Stable Baselines; today, however, we'll be building this completely from scratch and using it to train the computer to play Super Mario Bros on the NES! This tutorial is aimed at people who have a base-level understanding of ML, but not necessarily reinforcement learning. It's also perfect if you're looking for a personal project to add to your resume that can be completed in a weekend.
Additionally, if you don't have the resources to train this locally, I highly recommend checking out Google Colab Notebooks!
This is my first ever YouTube video and I've been really excited to share this with you guys! If there are any questions or if anyone has any tips/advice, please don't hesitate to comment down below!
00:00 Demo & Intro
03:02 Key Reinforcement Learning Vocabulary
07:47 Epsilon-Greedy Approach
09:32 Replay Buffer
10:20 Action-Value Function Intuition
15:19 The DDQN Algorithm
18:39 DDQN Pseudocode
19:39 Implementation in Code
30:21 The AI Beats the Level!
30:56 Conclusion
SOURCE CODE
github.com/Sourish07/Super-Ma...
PAPERS USED AS REFERENCE
Human-level control through deep reinforcement learning
www.nature.com/articles/natur...
Deep Reinforcement Learning with Double Q-learning
arxiv.org/pdf/1509.06461.pdf
DOCUMENTATION
PyTorch Documentation
pytorch.org/tutorials/interme...
pytorch.org/rl/reference/data...
Gymnasium Documentation
gymnasium.farama.org/index.html
gymnasium.farama.org/api/wrap...
TEXTBOOKS
Learning Deep Learning by Magnus Ekman
www.nvidia.com/en-us/training...
Reinforcement Learning: An Introduction, Second Edition by Sutton & Barto
mitpress.mit.edu/978026203924...
OTHER
CNN Explainer
poloclub.github.io/cnn-explai...
Introducing ChatGPT
openai.com/blog/chatgpt
All content on this channel is produced by and is the intellectual property of Sourish Kundu LLC.

Published: May 12, 2024

Comments: 112
@RaunakPandey 7 months ago
Great video! I like how Sourish made complex topics seem so simple. Can’t wait for more videos to come
@sourishk07 7 months ago
Thanks Raunak! More to come, I promise!
@bordplate. 5 months ago
I've watched a bunch of videos on DQN implementations over the last few days, and only after going through a bunch of trouble implementing mine do I see your video. This is hands down the best video on this topic I've seen. Great work!
@sourishk07 5 months ago
Thank you for those kind words! I'm glad it was helpful. Feel free to share this with any friends that you think may be interested as well!
@akshaynaik4197 7 months ago
This is awesome, Sourish! I love the animations! Explanations were incredibly thorough and well put. I can't wait to see what you do next!
@sourishk07 7 months ago
Thanks Akshay! I really appreciate it
@Chak29 7 months ago
This is awesome; I could never have guessed it's your first video! Can't wait for the next video!!
@sourishk07 7 months ago
Thank you so much! Next video is in the works right now :)
@archansen8084 7 months ago
Super cool video! Can’t wait to see what comes next from the channel!
@sourishk07 7 months ago
Thank you Archan! I promise you won't be disappointed :)
@bhaskarmondal7461 1 month ago
Such an awesome tutorial. I have always loved video games, and I've loved ML ever since I got to know about it. This project helped me a lot with reinforcement learning concepts, and at last I got to combine my two favorite things!
@sourishk07 1 month ago
I'm really glad to hear that! Don't worry, I have more RL content in the pipeline!
@bhaskarmondal7461 1 month ago
@sourishk07 That's great, we are all waiting!
@saanvim1788 7 months ago
This is so cool and helpful, Sourish! Super interesting watch.
@sourishk07 7 months ago
I appreciate it Saanvi! I'm glad you enjoyed it
@SybilGrace 3 months ago
Great job. Looking forward to watching more videos from you. Machine learning is so cool.
@sourishk07 3 months ago
Thank you so much! And yes I agree!
@mayankgarg9728 5 months ago
Very smooth, well-structured way of explaining. You should create more videos like this.
@sourishk07 5 months ago
Thank you! Don't worry, we got a couple more in the pipeline!
@biancaturman8917 7 months ago
This is amazing!!!
@sourishk07 7 months ago
Thank you Bianca! Stay tuned for more videos coming soon
@Spideyy2099 5 months ago
Please make more videos like this! Everyone else, like and share this!! It's simple and clear with lots of education. Plus you can see all the code at once and don't have to jump around the video! Gem of a coding video!
@sourishk07 5 months ago
Thank you so much for the kind words! Don’t worry we have some more in the pipeline :)
@_kumar06 3 months ago
Best YouTube channel for learning deep learning. Please make more videos like this.
@sourishk07 3 months ago
Thank you so much! Don't worry, I have many more machine learning topics in the works which I'm really excited to share with you guys!
@suryaraghavendran3627 7 months ago
Fantastic video!
@sourishk07 7 months ago
Thanks Surya! Excited to produce more videos for you
@ArtOfTheProblem 3 months ago
Really nice work!
@user-wm6fn5bb3k 2 months ago
The best channel. It helped me a lot!
@sourishk07 2 months ago
Thank you for these kind words, and I'm glad I was able to help! Looking forward to sharing my next RL video with you!
@hakan6449 2 months ago
The most simple yet excellent video on RL. Great work! Maybe in the next video you can implement it without a ready-made environment like gym.
@sourishk07 2 months ago
Hi! I'm glad you liked the video! Don't worry, I do plan on making a video about custom environments soon! Stay tuned
2 months ago
@sourishk07 That would be nice to see.
@mridinithippisetty8867 6 months ago
Awesome vid!
@sourishk07 6 months ago
Thanks for the visit Mridini!
@mariorodriguez8854 1 month ago
Keep teaching this way, thanks!
@sourishk07 26 days ago
You bet! Thanks for watching!
@evangill6484 20 days ago
Have you by any chance done or found any follow-up types of projects that incorporate multiprocessing? I was curious if there's a way to speed up the training by doing a few simulations at the same time and communicating results to the agent. Any thoughts or ideas?
@sourishk07 20 days ago
Hi! That’s a great question. I’m actually working on running parallel environments right now so it’s a crazy coincidence that you asked. Stay tuned :)
@joyantamitra8186 4 months ago
Please publish more. I have learnt a lot!
@sourishk07 4 months ago
Thank you! I'm glad you enjoyed the video. And don't worry there are more videos like this one in the pipeline! Feel free to check out my other videos on my channel in the meantime!
@DarkSciencez 6 days ago
Nice work! Is there a reason to use epsilon greedy over upper confidence bound / Thompson sampling? I'm trying to do something similar with a game called "I wanna be the boshy" and UCB just feels better to me
@sourishk07 6 days ago
That sounds like a really fun project! To be honest, I hadn't considered those other methods. I used epsilon-greedy because that's what the paper used. If you end up trying those other methods, I'd love to learn about your results. Good luck!
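For readers following along, here is a minimal sketch of the epsilon-greedy selection rule being discussed; the names (choose_action, online_network, epsilon) are illustrative, not taken from the repo:

    import random
    import torch

    def choose_action(state, online_network, epsilon, num_actions):
        # Explore: with probability epsilon, take a uniformly random action
        if random.random() < epsilon:
            return random.randrange(num_actions)
        # Exploit: otherwise take the action with the highest predicted Q-value
        with torch.no_grad():
            q_values = online_network(state)  # state assumed to be a batched tensor
        return int(q_values.argmax(dim=-1).item())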
@sohamkundu9685 5 months ago
Great video!
@sourishk07 5 months ago
Thanks for the visit
@Trails_in_the_Sky 7 months ago
What version of Python did you use? I'm trying to run the repo code, but I can't install the requirements at all.
@sourishk07 7 months ago
Thank you for asking! I apologize for not providing clearer instructions. I use Python 3.10.12 and I've updated the repo with some more details about my installation process.
@oleander1956 4 months ago
Hey Sourish, I've watched a couple of other videos where some of the versions were outdated, so a lot of things were broken and I couldn't fix them or understand the errors because I truly don't know what I'm doing. I just started your video and you mentioned how you understood every line of code as a sophomore. I want that. What resources should I follow in order to fully understand what I'm doing in this game? Thank you. Great content!
@sourishk07 4 months ago
Hi! Thanks for the comment! My goal for this video was to serve as the introduction to the DDQN algorithm such that you can understand the code after watching it. If there are any confusing parts in the video or code that I can clarify, please let me know! But if you're talking more generally about ML, then I highly recommend Andrew Ng's "Deep Learning Specialization" course on Coursera. It is literally the one-stop shop for Machine Learning and has been the foundation for all of my exploration in the ML space. I also recommend Learning Deep Learning by Magnus Ekman! It's a great book to get more familiar with ML.
@gamermanv 6 months ago
Dope video! If I had to make a graph showcasing the rewards vs. episodes, how would I go about doing it?
@sourishk07 6 months ago
Great question! I would import matplotlib at the top. Then, before training starts, I would create an empty list. For each episode, set a reward counter to 0 and accumulate the rewards gained while playing the episode. After the episode ends, append the total reward to the list. Let me know if something doesn't make sense! Pseudocode:

    import matplotlib.pyplot as plt

    rewards = []
    for i in range(NUM_EPISODES):
        reward_counter = 0
        while not done:
            reward = env.step(...)
            reward_counter += reward
        rewards.append(reward_counter)
    plt.plot(rewards)
@gamermanv 6 months ago
@sourishk07 That makes perfect sense! Thank you for the explanation!
@tanushjadhav6814 9 days ago
Hey, great video!!! I had a question: I cloned your GitHub repo; how do I run the code to see the output of Mario being played autonomously?
@sourishk07 8 days ago
Thank you so much! The README has some instructions on how to set up your virtual environment, and then you'll need to start training with main.py. Sorry I don't have a checkpoint for you to use!
@ArtOfTheProblem 3 months ago
Would be cool to make a super simple interactive environment of this, something that kids or non-experts could play with to see playing results. Have you seen that?
@sourishk07 3 months ago
I'm glad you enjoyed the video! When you say interactive environment that someone could play with to see playing results, what exactly do you mean? The gym_super_mario_bros library already supports direct keyboard inputs. Also, if you meant a way someone could see the latest model weights without training, then I would recommend looking at how to save & load checkpoints. In the repo, main.py has some code to save checkpoints during training. Let me know if you meant something else though!
@ArtOfTheProblem 3 months ago
@sourishk07 Good question. I was thinking of a demo where a non-expert could "see the magic in action," so I'm thinking it would be something like: A. showing initially unlearned (random) behaviour as well as time unfolding; B. allowing users to adjust key learning parameters (what would you say would be most important: gamma/discount factor, learning rate, what else?); and maybe C. showing some visual of how the network weights are updating (perhaps just a pattern/visual), mainly to show 'change' taking place. Curious what you think would be most useful; I'm thinking of a demo for non-experts.
@U_Lambda 2 months ago
Question: in your setup, where are you defining/tweaking your reward function? By the way, great video!
@sourishk07 2 months ago
Thanks so much for watching! And that's a good question. Maybe I should've specified this more clearly in the video, but the reward function is already handled by the gym_super_mario_bros package. Check out the 'Reward Function' section in the gym_super_mario_bros documentation! pypi.org/project/gym-super-mario-bros/
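If you did want to tweak the reward beyond what the package provides, one common pattern is a gym RewardWrapper. A minimal sketch, assuming the old gym API used in the video (the class name and scaling factor are purely illustrative):

    import gym

    class ScaledReward(gym.RewardWrapper):
        """Illustrative wrapper that rescales the environment's reward."""
        def reward(self, reward):
            # e.g., scale the reward down; the factor here is an arbitrary example
            return reward / 10.0

    # env = ScaledReward(env)  # applied after creating the Mario environment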
@theashbot4097 5 months ago
This is so cool! You put a lot of effort into this video! I have used reinforcement learning in the Unity game engine before, and I have made a tutorial on how to use it in Unity, but I have never seen it used outside of Unity, and it looks very cool! I want to take my reinforcement learning knowledge out of Unity and start training a robot in real life. The robot is not coded in Python; it is coded in Java. I have not started researching it yet, so I do not know how hard it will be, but I was wondering if you have any knowledge on how to connect a neural network to Java. If you do not, that is just fine. I do not think it should be too hard to research.
@sourishk07 5 months ago
Thanks for watching and I’m glad you enjoyed it! With regards to your robot, that sounds like a really cool project! I’m not exactly sure how best to connect the two. Maybe you can create a simple API in Python that your Java code can call?
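As a sketch of that suggestion, a tiny HTTP service in Python (Flask) could expose the trained network to a Java client; everything here (the stand-in model, route, and payload format) is hypothetical:

    from flask import Flask, jsonify, request
    import torch
    import torch.nn as nn

    app = Flask(__name__)
    model = nn.Linear(4, 2)  # stand-in for the trained network loaded from a checkpoint

    @app.route("/predict", methods=["POST"])
    def predict():
        # The Java client POSTs an observation as a JSON array
        obs = torch.tensor(request.json["observation"], dtype=torch.float32)
        with torch.no_grad():
            action = int(model(obs.unsqueeze(0)).argmax().item())
        return jsonify({"action": action})

    if __name__ == "__main__":
        app.run(port=5000)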
@theashbot4097 5 months ago
@sourishk07 Thank you for making this! Yeah, that is what I was thinking. I am first going to try it in a C# project because I am more familiar with it, then later move on to Java.
@kag46 3 months ago
Hey Sourish, thanks for a great video! I can see that my 7-year-old machine is not utilized at 100%; is it possible to increase the game engine speed? It looks like it plays faster than real time, but still slower than the machine could, and I believe it could do better in the learning phase! xD Initially I thought setting DISPLAY = False would speed it up, but it seems not.
@sourishk07 3 months ago
Don't worry, that is also a concern of mine! That's why I'm working on a follow-up video where I try some hyperparameter tuning and parallelize multiple instances of the environment. Running multiple environments at the same time should especially help in maximizing CPU usage, while the GPU can continue handling the neural networks.
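For anyone curious what that parallelization might look like, gym ships a vectorized API; a rough sketch with a stand-in environment (the Mario setup would need its usual wrappers, and reset/step return signatures vary across gym versions):

    import gym

    def make_env():
        # Stand-in factory; a real setup would build the wrapped Mario environment here
        return gym.make("CartPole-v1")

    # Four copies of the environment stepping in parallel worker processes
    envs = gym.vector.AsyncVectorEnv([make_env for _ in range(4)])
    obs = envs.reset()
    results = envs.step(envs.action_space.sample())  # batched actions, batched results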
@brianferrell9454 4 months ago
Not sure why people dislike this video, it was awesome!
@sourishk07 4 months ago
Thank you for those kind words! I'm glad you enjoyed it
@junyehu2315 5 months ago
Does torchrl have to be installed with torch 2.1.0+? When I use pip install torchrl, it uninstalls my torch 1.13 and installs 2.1.1.
@sourishk07 5 months ago
I actually updated the requirements.txt today! It uses PyTorch v2.1.1 and torchrl v0.2.1. If you have to stick with PyTorch v1.13, then I recommend installing a specific torchrl version that is compatible with v1.13 (pip install torchrl==x.xx). You may need to consult the PyTorch or torchrl documentation for the right version numbers.
@fruitpnchsmuraiG 6 months ago
Would I be able to run the repo on my local system even if it lacks the hardware requirements, like a very solid GPU?
@sourishk07 6 months ago
To be honest, it really depends. I trained this code on an RTX 3080 and it took me ~48 hours, but I didn't dive too deep into hyperparameter optimization. If you feel like your GPU isn't powerful enough, I highly recommend checking out Google Colab notebooks, where you can run a Jupyter notebook in the cloud with access to GPUs!
@pgiralt 5 months ago
The video says it uses Gymnasium, but as far as I can tell, you're using the older Gym and not the newer (maintained) Gymnasium, correct?
@sourishk07 5 months ago
Yes, you're correct. I apologize for the oversight in the video. gym_super_mario_bros is a pretty old library (last updated in June of 2022) and it hasn't been updated to use the new Gymnasium library. However, the latest version of the older gym library is v0.26.2, so it does include the new breaking changes introduced in the v0.26 Gymnasium-style API. This is why in the code I have to set the apply_api_compatibility flag to True when making the environment.
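For reference, a sketch of the environment creation being described; the exact kwargs may differ slightly depending on your gym and gym_super_mario_bros versions:

    import gym_super_mario_bros
    from gym_super_mario_bros.actions import RIGHT_ONLY
    from nes_py.wrappers import JoypadSpace

    # The compatibility flag wraps the old-style env to follow the v0.26 step/reset API
    env = gym_super_mario_bros.make('SuperMarioBros-1-1-v0',
                                    render_mode='human',
                                    apply_api_compatibility=True)
    env = JoypadSpace(env, RIGHT_ONLY)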
@66a2.vijayendarreddy3 1 month ago
I have a doubt about what we should use as controllers. Should those be images named 0-4.png?
@sourishk07 1 month ago
Regardless of what you name the PNG files for each controller, I recommend using a dictionary that maps each action to its corresponding image. The index value of each action that is available to your agent depends on your chosen action space. In this video, we chose RIGHT_ONLY, but you can select others from gym_super_mario_bros.actions.
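A minimal sketch of that suggestion, assuming the PNG files are named 0.png through 4.png (the filenames and dictionary are illustrative, not from the repo):

    from gym_super_mario_bros.actions import RIGHT_ONLY

    # One controller image per action index in the chosen action space
    action_to_image = {i: f"{i}.png" for i in range(len(RIGHT_ONLY))}
    # RIGHT_ONLY has 5 actions, so this yields {0: '0.png', ..., 4: '4.png'}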
@ApexArtistX 5 months ago
What I'm looking for is an external game bot.
@NeuralGamerAI 4 months ago
I have a problem with the box2d library. I've tried everything but couldn't get it to work.
@sourishk07 4 months ago
The box2d library shouldn't be a dependency for this project! Is your pip somehow trying to install it or were you asking just in general?
@ApexArtistX 5 months ago
Can you do a custom environment that plays an external web game?
@sourishk07 5 months ago
I've actually received another request for a video on custom environments as well! It's a pretty hard topic to tackle, but it's definitely on my list. I'll keep you posted on when I'm able to create it! Thank you for the request!
@user-kl1yh4ub1o 4 months ago
That sounds great!
@sourishk07 4 months ago
Thank you for watching!
@PriyankaJain-dg8rm 3 months ago
Approximately how many episodes did it take for your Mario to learn and reach the flag?
@sourishk07 3 months ago
So I trained for 50,000 episodes, but I would see the level being completed at around 40,000, albeit pretty inconsistently.
@vivekpadman5248 5 months ago
These algorithms are too old now; we have to have some foundational model in RL soon. But yeah, great work bro ❤
@sourishk07 5 months ago
Yeah you're probably right haha. Gotta start from the basics first I suppose. And thank you for the view!
@vivekpadman5248 5 months ago
@sourishk07 Yup, that's right man, all the best 👍 RL is a world of its own.
@PriyankaJain-dg8rm 3 months ago
How do I end the training process?
@sourishk07 3 months ago
So I typically train for a set number of episodes while checkpointing every so often. In this case, I would checkpoint every 5000 episodes. Then once the training finishes, I can load the weights from any of the checkpoints to see how the model performs with that many episodes. For evaluation, I would also lower epsilon to a small number, maybe like 0.1.
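A rough sketch of that checkpointing pattern in PyTorch; the network, loop variable, and filenames are stand-ins, not the repo's actual code:

    import torch
    import torch.nn as nn

    online_network = nn.Linear(4, 5)  # stand-in for the agent's CNN

    for episode in range(1, 50001):
        # ... one episode of training would run here ...
        if episode % 5000 == 0:
            torch.save(online_network.state_dict(), f"checkpoint_{episode}.pt")

    # Evaluation: load a checkpoint and act mostly greedily
    online_network.load_state_dict(torch.load("checkpoint_50000.pt"))
    epsilon = 0.1  # small exploration rate while watching the agent play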
@kushagrasingh6361 28 days ago
Can someone provide the trained model?
@sourishk07 26 days ago
Hey! Thanks for the question. I actually have a follow-up video planned where I'll be optimizing the training process for this agent. I'll make sure to upload the model then!
@CouchPotator 4 months ago
Huh, Kush Gupta did this and achieved a much better result with far fewer episodes. I wonder why.
@sourishk07 4 months ago
Hey thanks for pointing that out! I checked out his video and he was using a different algorithm, PPO. That might be one reason. A second is that my hyperparameters might not be the most optimized, which is something I leave to the viewer to experiment with on their own!
@KushGupta1 3 months ago
The funny thing is I tried using DDQN first, but I wasn't able to beat a lot of levels using this approach and it was taking way too long, so I eventually switched to PPO.
@sourishk07 3 months ago
@KushGupta1 Haha, well it's reassuring that long training times with DDQN aren't only a problem for me! By the way, I loved your video, Kush.
@agenticmark 3 months ago
302 subs is crazy low for this content.
@sourishk07 3 months ago
Haha thank you for those kind words! I'm glad you enjoyed the video!
@curtisnewton895 2 months ago
Try explaining that again to AI beginners and non-mathematicians. What are you trying to prove here?
@sourishk07 2 months ago
Hi! Thanks for sharing your concern! I completely understand that this video might be a tad overwhelming for complete beginners and people who aren't too comfortable with math. That's why, at the beginning of the video, I specified that my target audience was people who have some basic understanding of the ML fundamentals, which includes the prerequisite calculus. It's my fault if that wasn't clear enough. However, if you have any specific questions about the video or about machine learning in general, please do let me know! I'll either answer them myself or point you to resources that answer the question better. Don't hesitate to ask!
@JarppaGuru 2 months ago
Oh no, not intelligence. Python and screen recognition do better, the first time, with no training needed. Why train when we can just say "do this"? If there were any AI, it should do a lot more on the first try. The objective is to go right; if there's an enemy, jump over or onto it; if there's an obstacle, jump. This is ground, this is brick, everything else is lava, don't touch the lava.
@sourishk07 2 months ago
Yes, you make a good point, but imagine having to code up those specific rules for each separate game! The goal with RL is to have a generalized algorithm that can work with any environment! Thank you for watching though!
@mahdi_c200 4 months ago
This tutorial does not help beginners; it has a lot of useless theory!! I know you spent a lot of time recording, editing, and uploading this video, but please consider that a beginner like me cannot figure out much of the theory outside of a development environment. I need to know how to handle my code, and I would prefer to watch a tutorial with more hands-on coding in it instead of watching a useless 31-minute video.
@prabhmeet6842 4 months ago
This is one of the best tutorials. It has math + code, so one doesn't need to watch another video for the basics.
@sourishk07 4 months ago
Thanks, I'm glad you think so!
@citra5431 3 months ago
He doesn't even go that deep into the theory; he gives the perfect combination of both. If you didn't want to watch the theory, you could have just gone to the implementation section. But if you don't even know the absolute basics, why would you try to code it? You won't understand anything, and it will just be a copying exercise.
@dennisestenson7820 4 months ago
Perhaps this video isn't for you right now.
@revimfadli4666 4 months ago
To be fair, coding something this complex from scratch isn't exactly beginner stuff either. But XOR or Fashion-MNIST, now _that_ is a beginner-level from-scratch project.
@sourishk07 4 months ago
Hi, I'm sorry to hear that! So, if I understand correctly, the difficult part of this video was getting PyTorch and the Python environment set up on your local machine? If so, let me know and I'll see if I can make a video that covers that topic!
@GamingAmbienceLive 2 months ago
This video is for nobody; it's for entertainment. There's no value in it beyond that.
@sourishk07 2 months ago
@GamingAmbienceLive I appreciate the comment, and I'm sorry you believe that. If you have any specific problems or concerns about the video, please feel free to let me know!