
AI Learns PvP in Old School RuneScape (Reinforcement Learning) 

Naton

This showcases how a neural network was trained to PvP in Old School RuneScape (OSRS) using deep reinforcement learning (a branch of machine learning).
Check out the code on GitHub to train and test your own models on a simulated version of the game: github.com/Naton1/osrs-pvp-re....
0:00 - Intro
0:52 - Overview
1:18 - Real Game Impact
2:05 - Observations & Actions
2:38 - Rewards
3:50 - Training Simulation
5:08 - Training Session Statistics
6:29 - Network Architecture (High-Level)
6:44 - Extra Technical Details
7:26 - How To Use
10:29 - Outro
10:45 - Gameplay Footage

Published: 24 Jun 2024

Comments: 70
@dexthefish96 9 days ago
Thanks for sharing. Interesting project despite the potential for abuse. Imagine if Jagex introduced NPCs with this behaviour!
@EdanMeyer 4 months ago
This is great, I imagine the custom action space you define is important to get this working so well. At the beginning of the project did you start with more complex action spaces (e.g. click inventory slot x, left-click opponent)? I'm curious to know how the tradeoff of complexity vs. performance played out
@Naton1 4 months ago
Thanks! I think simplifying the action space helped a lot. The more you can simplify your problem, the easier it tends to be to learn. It'd be interesting to explore pixel-based actions, but I think it'd be incredibly challenging to learn since the action space would massively increase in size. I didn't do much experimentation with lower-level actions like that, since I don't think being able to say 'left-click opponent' or 'click inventory slot x' would add value over the higher-level actions defined here.
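For a rough sense of the tradeoff described above, here is a back-of-the-envelope comparison of action-space sizes (a sketch with assumed numbers: a fixed-mode client canvas of 765×503 pixels, and a hypothetical set of ten high-level action heads; neither figure is taken from the project itself):

```python
# Illustrative action-space size comparison (all numbers are assumptions).

# Pixel-level: one discrete action per clickable pixel on the game canvas.
CANVAS_W, CANVAS_H = 765, 503            # classic fixed-mode client size
pixel_actions = CANVAS_W * CANVAS_H      # 384,795 possible click targets

# High-level: a multi-discrete space, e.g. ten heads with a few options each.
head_sizes = [3, 2, 2, 2, 2, 4, 2, 2, 2, 3]   # hypothetical per-head choices
high_level_combinations = 1
for n in head_sizes:
    high_level_combinations *= n         # 4,608 combinations across all heads

print(pixel_actions, high_level_combinations)
```

Even before considering that pixel actions also have to be interpreted by the network, the raw count differs by roughly two orders of magnitude.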
@SchlakRS 10 days ago
Do you have the model checkpoints saved between the 50-95%? It would be a great tool for people to level up their PvP skills by practicing against the different model checkpoints.
@Naton1 8 days ago
Yeah - running a training job saves all model checkpoints by default. There's also a built-in tool to generate Elo ratings for each model checkpoint, so you can compare how good they are relative to each other. Would be interesting to explore ways to practice PvP in this way!
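For anyone curious how checkpoint ratings could work, here is a minimal sketch of the standard Elo update applied to checkpoint-vs-checkpoint match results (this is the textbook formula, not necessarily the exact logic of the repo's built-in tool):

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update_elo(rating_a: float, rating_b: float, a_won: bool,
               k: float = 32.0) -> tuple[float, float]:
    """Return updated (rating_a, rating_b) after one match."""
    score_a = 1.0 if a_won else 0.0
    delta = k * (score_a - expected_score(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Example: two checkpoints start at 1500 and play a match.
elo = {"ckpt_100": 1500.0, "ckpt_500": 1500.0}
elo["ckpt_100"], elo["ckpt_500"] = update_elo(
    elo["ckpt_100"], elo["ckpt_500"], a_won=False
)
print(elo)  # the later checkpoint gains rating after winning
```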
@jacksonwaschura3549 4 months ago
Very cool project. This is something I've always wanted to try. Thanks for sharing!
@Poibos 2 months ago
Excellent video and really cool project
@badnam3189 4 months ago
Very impressive! Did you make any attempts at using the screen image as observations?
@Naton1 4 months ago
I didn't experiment with this for a few reasons. Primarily, my goal was to train the best model possible - without any restrictions - and feeding it the observations directly would perform better than having to learn them from pixels. Secondly, it would require significantly more compute to render the game clients for every agent in the training simulation. At times during training there were over 1000 agents playing across multiple simulations, and the compute required for that would have been far more than was accessible to me. I do think it could be interesting to explore, though. Perhaps couple screen-image observations with mouse/keyboard actions to have it learn a more 'human' experience.
@tivia4929 4 months ago
That's my Naton!
@howuhh8960 4 months ago
This is so cool!
@iaoys 3 months ago
Hey, I'd love to hear if you had any issues with defining your rewards. Did it take a while for you to come across a set of rewards that "work", or did it sort of just work on the first try? In addition, I'd love a further video explaining the rationale behind your definition of the observation space and action space.
@Naton1 3 months ago
For the rewards, I started with a simple win/loss reward at first, and nothing else. It kind of worked, but it wasn't great - I think it had trouble figuring out specifically which actions were good and which were bad. The next obvious reward idea was a damage reward/penalty, which I found significantly sped up learning. I also added in those prayer rewards, which I think helped a bit, but I honestly didn't do a lot of experimentation with that. It also took a while to come up with good scales for each reward: at first the damage rewards were too low, then I had them too high, and finally I found a sweet spot that seemed to work well. In short, lots of experimentation + intuition. I could go on and on about the observation/action space and why I chose those haha. A video or something more in-depth on that could be interesting!
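To make that structure concrete, a shaped per-tick reward along the lines described might look like the following (a hypothetical sketch; the scale constants are made-up placeholders, and as noted above, finding good values took real experimentation):

```python
# Hypothetical per-tick reward shaping for one agent. The scale
# constants below are placeholders, not the values used in the project.
WIN_REWARD = 1.0        # sparse terminal win/loss reward
DAMAGE_SCALE = 0.01     # dense shaping per hitpoint dealt/taken
PRAYER_SCALE = 0.05     # small bonus for a correct overhead prayer

def tick_reward(damage_dealt: int, damage_taken: int,
                correct_overhead: bool, fight_over: bool, won: bool) -> float:
    reward = DAMAGE_SCALE * (damage_dealt - damage_taken)
    if correct_overhead:
        reward += PRAYER_SCALE
    if fight_over:
        reward += WIN_REWARD if won else -WIN_REWARD
    return reward
```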
@iaoys 3 months ago
@Naton1 I would personally love to see that video! Do you also have a Discord by any chance that you wouldn't mind me adding? I find RL to be so challenging because there are just so many things that can go wrong, so I always love to connect with people who have done something in this space!
@Naton1 3 months ago
I do - discord is naton2.
@mike-ny1zg 3 months ago
This is fascinating! If the model only has a time-horizon of one tick, does it have much predictive power, or is it mostly reactive? Also, how do action combos work? For example I noticed “food” and “karambwan” are defined as separate actions, so the model must be aware that multiple actions can be performed in the same tick. The model must assign scores to *sets* of actions? And the require_all/require_none configs define mutually exclusive actions?
@Naton1 3 months ago
Yes, it effectively only has a time-horizon of one tick - almost all game state is available in the latest tick. The main useful information would be opponent attack styles/overhead prayer usage to help with prediction, as you mention. To help give the model the ability to predict, a set of observations is given for the % usage of each attack style/overhead prayer for the player/target throughout the whole fight, and for the last 5 game ticks. This, combined with equipment bonuses, is relatively effective in helping the model predict attacks/overheads. However, it could be interesting to test longer memory in various ways to improve prediction capability. "Action combos" work by each action type being a separate action head in a multi-discrete action space. So every game tick it can pick between roughly 10 different actions and can use multiple of these at once (such as eating and karambwan). The require_all/require_none configs can define mutually exclusive actions as you mention, but are generally used for action parameterization - for example, if using a magic attack, selecting the spell type to use.
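For readers unfamiliar with multi-discrete action heads, here is a minimal sketch of the structure described above, using the Gymnasium API with made-up head names and sizes: several heads are chosen simultaneously each tick, and a validity mask plays the role the require_all/require_none configs play for parameterized actions:

```python
import numpy as np
from gymnasium.spaces import MultiDiscrete

# Hypothetical action heads; one value is chosen per head every game tick.
# Index 0 in each head means "do nothing for this head".
action_space = MultiDiscrete([
    4,  # attack style: none / melee / ranged / magic
    2,  # eat food: no / yes
    2,  # eat karambwan: no / yes (combos with food in the same tick)
    4,  # overhead prayer: none / melee / ranged / magic
    5,  # spell choice: only meaningful when attack style == magic
])

action = action_space.sample()  # e.g. array([3, 1, 1, 2, 4])

# A require_all-style constraint expressed as a mask: spell choices are
# only valid when the attack-style head selected magic (index 3 here).
spell_mask = np.ones(5, dtype=bool)
if action[0] != 3:
    spell_mask[1:] = False  # force "no spell" unless casting magic
```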
@ArcadeZMC 4 months ago
this is, in my opinion, the best explanation of how reinforcement learning works that doesn't rely on mathematical terminology! also great video and project! really interesting to see (:
@peterwhidden 4 months ago
Really nice work! Awesome use of RSPS
@Naton1 4 months ago
Thank you man! Your project with Pokémon was awesome too! I watched your video a while back and it inspired me to use a novelty reward here too!
@JohnSmith-yr4vi 19 days ago
Amazing project - just seeing this now. Curious what learning resources you used along the way and what background you had coming into this project? It seems like you have some professional machine learning experience as well as a great understanding of RuneScape's systems. I just started machine learning programming at my software engineering job and would love to one day attain the level of knowledge required to make a project like this from scratch. Amazing!
@Naton1 15 days ago
Thanks! To be honest, I don't have professional machine learning experience. This is all self-taught (plus a few courses from back in college). I had always been interested in machine learning, so I spent a ton of time learning how to use it in a project like this. I do work professionally as a software engineer though. The best way to learn, in my opinion (and everyone learns differently - this is just how I learn), is to just apply it to a project that you're interested in. You can read a ton of books and take a ton of courses but won't truly learn it until you apply it to a real project. I do have a lot of experience with RuneScape though - I've been playing on and off since 2005 - and have written my own "third-party" client before.
@JrViiYt 3 months ago
Genuinely an amazing project. As someone who only knows some Python, what would you recommend as resources for improving? I'd love to reach a point where I'm able to create things like this. One thought I had while watching, from the game perspective: switching overheads to smite on the opponent's off-tick would amp up the war of attrition to a point humans could hardly keep up with. Great video :)
@Naton1 3 months ago
Thank you! To be honest, just working on a bunch of personal projects on topics that you find interesting is the best way to learn, in my opinion. If you're not interested, you won't want to do it. It actually does have the ability to do this! The agent can choose to use smite instead of overhead prayers when it's attacking the next tick. Smite unfortunately isn't available in LMS, so I disabled it for the model trained throughout this video, but support for it has already been added. Can be nice in places like PvP Arena.
@Rainingson 3 months ago
What type of GPUs were you running this on, or were you using CPU only? This is an awesome project, very impressive.
@Naton1 3 months ago
Thanks! This was mostly CPU-bound for rollout collection, but I did use a GPU as well. CPU: Ryzen 3950X. GPU: RTX 4060 Ti (16GB VRAM).
@chairwood 4 months ago
Extremely cool :)
@yomusiko 3 months ago
Nice work! Is there a way to use this on my own rsps? Would be awesome to have people try to beat it for a tournament with rewards!
@Naton1 3 months ago
That would be really cool! It could definitely be adapted to other RSPSes, but it would require a non-trivial amount of work to do so. You'd essentially have to re-create the environment and provide all the inputs (definitely doable, but it would be an effort).
@PurpleGod 2 months ago
super interesting!
@CHRISTICAUTION 1 month ago
So cool!! Can you make a video where you go through the technical details and all the tricks you used to make RL work? Like which policy algorithm you used, whether you used optimistic exploration, or whether you crafted some nifty things yourself? I'm absolutely thrilled to learn more without having to read the code 😊
@Eyedwiz 3 months ago
Thanks for sharing and masterfully done - I'm currently studying robotics as part of my postgraduate in AI. I was wondering if someone had managed to simulate rs for RL!
@mucahiddemiry5258 3 months ago
So cool! Is there a way to not play against the script, but to be the script user in Elvarg?
@Naton1 3 months ago
Good question - there is! I've added a command you can type in-game to make the agent 'play' on the logged-in account via ::enableagent. There isn't much documentation on this, but I'll link the code. An example would be ::enableagent model=FineTunedNh envParams=accountBuild=PURE,randomizeGear=true. The main caveat is that the command has to select the account build/gear, so you can't use your own gear setup (as it's implemented right now). github.com/Naton1/osrs-pvp-reinforcement-learning/blob/master/simulation-rsps/ElvargServer/src/main/java/com/github/naton1/rl/command/EnableAgentCommand.java
@mucahiddemiry5258 3 months ago
@Naton1 This is amazing. Thanks!
@mucahiddemiry5258 3 months ago
@Naton1 So I just tried what you said. I followed the steps in the video, loaded up the pure gear setup, walked up to the bot, and typed: ::enableagent model=FineTunedNh envParams=accountBuild=PURE, randomizeGear=true. I got a message that the agent is enabled, but nothing happened. What am I doing wrong?
@lazaraslong 4 months ago
nice!
@Pinkgloves322 3 months ago
Don't lie and say you're not giving the script out, because 99% you are - to friends or high-paying people. You're the guy beezyR or whatever it was. You've been seen abusing this in the wildy before, so don't lie.
@Naton1 3 months ago
I can promise I've never given the plugin out to anyone! And I'm not sure who that is.
@skrillmurray4317 3 months ago
This is sick. You're gonna get big
@sol12498 2 months ago
I'm not sure it was the best idea to let people run their own models without having to write a single line of Python. Now LMS has 4-5 of them a game. I'm a big fan of open source, but I think you have an inherent responsibility when you release stuff like this not to let others harm the integrity of the game with your work. And this has harmed it, and will continue to.
@BigDaddyWes 1 month ago
It doesn't actually matter where you think it came from. This is just inevitable, unfortunately.
@currentcommerce4774 11 days ago
@BigDaddyWes The gold farmers would never have programmed something like this themselves; he did the heavy lifting for them.
@l0lan00b3 8 days ago
@BigDaddyWes Right. This guy was working on this, and there are definitely at least 10 others doing the same.
@roofyosrs3513 3 months ago
Hi, I worked on an OSRS computer vision personal project, though I don't have experience in reinforcement learning. Do you use any sort of computer vision, or is there code injection? I ask because I see your AI switching prayers without switching to the prayer tab - how does it know the position of potions, etc.? I would love to know, thanks.
@Naton1 3 months ago
Yeah, it hooks directly into the game client to perform higher-level actions. The agent chooses things like using protect from melee, drinking a combat potion, or eating food.
@roofyosrs3513 3 months ago
@Naton1 Thank you for the reply. Do you have a Discord, bud? I have a few questions and would appreciate it if you could get in touch. I'm not going to ask about your plugin or anything; I just want to develop something specific without having access to the client. I'm familiar with PyTorch computer vision, but I want to learn some reinforcement learning and have some questions.
@Naton1 3 months ago
@roofyosrs3513 I do, my discord is naton2.
@roofyosrs3513 3 months ago
@Naton1 I added you bro
@OfficialMastercape 2 months ago
If all the accounts used were banned, why did you blur the username?
@Naton1 2 months ago
The opponents' names are blurred.
@damendoyeee 3 months ago
Interesting
@kell7689 11 days ago
Such a cool project
@chrisc.6601 4 months ago
What's your day job? Do you do any ML? I've been keeping up with the project since you first posted, and it blows me away. This is fantastic work.
@Naton1 4 months ago
Appreciate it! I’ve been working professionally as a software engineer since I graduated from school two years ago. No ML or anything as a day job.
@sallyjones5231 4 months ago
I love this! Need more than a like button, we need a love button.
@iggynub 3 months ago
What an incredible project.
@kingcoconut3697 1 month ago
Can this work for basic woodcutting skills?
@Naton1 1 month ago
Woodcutting is so simple it's not too helpful to apply this kind of thing there. The best actions can be computed through simple logic and no machine learning is needed.
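To illustrate the point, a rule-based woodcutter (hypothetical names throughout) covers the entire decision space in a few lines, with no training loop needed:

```python
# Hypothetical rule-based woodcutting logic: the optimal action is
# fully determined by observable state, so no learning is required.
def choose_action(inventory_full: bool, near_tree: bool,
                  currently_chopping: bool) -> str:
    if inventory_full:
        return "bank_logs"
    if currently_chopping:
        return "wait"          # chopping continues automatically
    if near_tree:
        return "chop_tree"
    return "walk_to_tree"
```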
@Snoootz 2 months ago
This is actually a very good project, well done! Do you have a Discord channel?
@currentcommerce4774 11 days ago
Congrats on destroying LMS. He loved the game so much he made sure nobody else could.
@xGod305 3 months ago
Holy shit
@somebodynothing8028 1 month ago
Hey, can I reach out to you about a way to use this on other RuneScape private servers to kill gamblers and such?
@350dpi 8 days ago
Glad this didn't blow up and you get to use this little clout to start up your dream YouTuber coding career! Stay off our 20-year-old game.
@tivia4929 8 days ago
Aktually osrs is only 11 years old
@Pinkgloves322 3 months ago
You've been on RSPSes/OSRS for years abusing this. NH staking, taking people's GP while they don't even realize you aren't clicking. Rat.
@Naton1 3 months ago
I have never used this outside of LMS/PvP Arena, and it was just created in the last year!
@Pinkgloves322 2 months ago
@Naton1 stop lying
@Teo-uw7mh 4 months ago
Was this a solo project?
@Naton1 4 months ago
Yes it was!