We train an Artificial Intelligence with Reinforcement Learning to play the game Trackmania Nations Forever, and post videos showcasing the progressive improvement of our A.I. This channel is a collaboration between pb4 (github.com/pb4git) and Agade (github.com/Agade09).
You can contact us at the address pb4videos (at) gmail.com, via our github, or on Discord (server: discord.gg/tD4rarRYpj and channel: discord.com/channels/847108820479770686/1150816026028675133).
I heard there is a method where GPT trains another AI. I don't know if it's possible to do in this case, but it would be fun to see an AI train an AI from scratch.
It actually is already doing a small one in the E03 run, at this part of the run: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-GFOTtl4LbBY.html But the AI will have to be improved before it can find this technique more broadly, on more maps.
How about implementing the ability for the AI to reset the run, so it can learn to recognize the difficult parts of maps, like a human trying again and learning? I mean the AI learning to store information about the last run and interpret it, not learning by modifying the model. Give it a maximum number of resets, like 10 times or so, otherwise it will kill the overall computer performance. After the 10 runs, it gets scored and punished or rewarded based on the best of the ten runs.
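The best-of-N reset scheme suggested above fits in a few lines. A toy sketch: `run_episode` is a hypothetical callable standing in for one attempt on the map, not part of the actual project.

```python
def best_of_n(run_episode, n_resets=10):
    """Let the agent retry the same map up to n_resets times and score
    it on its best attempt. run_episode(attempt) returns one attempt's
    score (e.g. the negated race time, so higher is better)."""
    return max(run_episode(attempt) for attempt in range(n_resets))

# Toy usage: three pretend attempts cycling as the ten resets play out.
times = [-52.3, -48.1, -50.0]  # negated race times in seconds
score = best_of_n(lambda i: times[i % len(times)])  # -> -48.1
```

The max over attempts is exactly the "rewarded by the best out of ten runs" rule; the cap on resets bounds the extra compute per scoring round.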
What if you split the AI into two parts: a Pathfinder that tries finding different routes and gives the general "strategy" of the map, while the Driver then uses those paths to figure out how much they can be optimized, as well as performing the run.
Well, mostly it is A and less I. The potential is massive. Still, I hope this is just a showcase and never ever finds its way into TMNF, which is where human skill is required. We already have much too much AI on YouTube with stupid voiceovers. I will soon start a human league whose content is AI-free.
Is there any way to input past runs from the best human players? This would theoretically allow the AI to mimic the best run (reducing the initial learning curve) and then optimize it further, just like how humans learn from their own mistakes; we also learn from others' mistakes and successes. IDK that much about this stuff, but meh, just an idea 🤷🏽♂️
Would it not be possible to insert a player-driven run into the learning pool of the AI to introduce shortcuts, or something like a wallbang? Or would this corrupt the learning pool, with the AI trying to go off-track and wallbang in completely irrelevant places everywhere?
It is possible in principle because the algorithm we are using is "off-policy": it can learn by watching someone else play. But we haven't tried. There are practical roadblocks: e.g. currently the algorithm plays at 20 Hz, whereas the human we would try to learn from does not follow this format of inputs. And some open questions, like: how many replays would we need to cause meaningful learning? Would 1 human replay added to the pool be enough?
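The reason off-policy learning allows this is that the learner samples from a pool of stored transitions without caring who produced them. A minimal sketch of that idea, with hypothetical names (this is not the project's actual code):

```python
import random
from collections import deque

class ReplayBuffer:
    """Minimal off-policy replay buffer. Because learning is off-policy,
    transitions recorded from a human replay can be mixed in with the
    agent's own experience and sampled identically."""
    def __init__(self, capacity=100_000):
        self.data = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.data.append((state, action, reward, next_state, done))

    def add_human_replay(self, transitions):
        # A real pipeline would first resample the human's inputs onto
        # the agent's 20 Hz action grid; here we assume that's done.
        for t in transitions:
            self.add(*t)

    def sample(self, batch_size):
        # Uniform sampling; agent and human transitions are mixed.
        return random.sample(list(self.data), batch_size)
```

The open question in the reply above maps directly onto this sketch: one human replay contributes only a few thousand transitions to a buffer of a hundred thousand, so its sampling weight may be too small to cause meaningful learning without some form of prioritization.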
Did you try training a model on the graphical output from the game? I feel like giving the AI access to the game engine grants it an unfair advantage over human players, which makes it more reasonable for it to compete with TAS. Still great work, though.
AI vs (non-AI) TAS would be fun to try. How *smart* is the AI if we remove slow human reflexes? Alternatively, you could force the AI to have lag and time-jitter to simulate human limitations. Can it still be smarter than Wirtual?
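The lag-and-jitter idea above can be simulated by feeding the agent's chosen actions through a delay line measured in control ticks (at 20 Hz, one tick = 50 ms). A toy sketch with assumed parameters, not part of the actual project:

```python
import random

class DelayedActions:
    """Delay each chosen action by a base latency plus random jitter,
    emitting a no-op until the first delayed action becomes due."""
    def __init__(self, base_delay=4, jitter=0, noop=0):
        self.base, self.jitter, self.noop = base_delay, jitter, noop
        self.pending = {}  # release tick -> action
        self.tick = 0

    def step(self, action):
        # Schedule the new action for a future tick, with random jitter.
        release = self.tick + self.base + random.randint(0, self.jitter)
        # If jitter lands two actions on one tick, keep the earlier one.
        self.pending.setdefault(release, action)
        # Emit whatever action (if any) is due this tick.
        out = self.pending.pop(self.tick, self.noop)
        self.tick += 1
        return out

# With a fixed 2-tick delay and no jitter, actions come out shifted:
da = DelayedActions(base_delay=2)
outs = [da.step(a) for a in (1, 2, 3, 4)]  # [0, 0, 1, 2]
```

A base delay of 4 ticks (200 ms) would be in the ballpark of human reaction time, but the right numbers for a fair handicap are an open question.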
Hot take: the results the AI produces are not viable, as the AI is basically just doing a TAS run; therefore the AI won't ever be better than a human player, since the AI can't play the game without making it a TAS run. While a player's TAS run might be about brute-forcing each minute situation in a constant back and forth until they reach peak results, the AI is driving every single iteration of that minute advancement all the way over the finishing line. Humans kind of cut out the wasted time while TASing due to human constraints; the AI doesn't need to, as it doesn't have those constraints. But in the end, it is still a TAS run the AI is doing, just with WAY more time wasted ^^
As someone who creates maps but isn't a stellar driver, one of the issues I run into is building around incorrect racing lines. I may build turns that will be taken a different way by a better player, resulting in far more speed than expected on the exit. This causes the flow to be completely wrong and the maps to break. At these times I often wish I had access to more strong players to test my map so I could build around these issues. With this AI, I could theoretically have it run on the map instead and learn from how it drives my maps. I could then make improvements and increase my map quality. Amazing work, and excited for the future.
First, I know I complained a bit, but I did like and subscribe, so I hope we hear more about how your AI does. I can't drive those races, though I did try a few; for me, coming in like 10000th place or worse is an achievement :) Later.
OK, very interesting. I have to ask: why are you trailing so much? Even in the last race, basically right from the start you are already losing before anything has happened; it seems you must be missing something to always come out behind at the start and then have to speed up. Also, when you know a shortcut exists and that using it on lap two is bad because you will lose time, why can't you teach the AI to use it on lap three, when it would be faster? It must be faster, since you ended up behind. I get how wheels on the track accelerate the car and wheels in the air do nothing, but still: in the race you won, you had to come from behind, when you could have been a touch slower but farther down the track and had a bigger lead on the human. I know this is teaching it how to measure best performance, but I bet you could teach it more. Maybe it needs to try jumps that it will fail; then it could rank the jump as undoable at speed x so it doesn't attempt it, and when it has more speed it considers it again, learns it's still slower, and marks the jump at that speed as also a bad choice; then on lap three, when it's going even faster, it tries again and sees the better result. I get that this might be hard in your algorithm or web of choices, but you seem smarter than me, and I would just like to see the AI really kick human asses even harder, though I do have to dislike the use of the AI you're using :)
Wait, you're saying an AI can't learn as fast as a human? That seems like maybe your code is just fucked in the head. Can't the AI review the track design, then guesstimate the best speed it could drive through the area, then figure out how good a bounce would be at, let's say, 20 points around the curve, try those 20 spots, and see which one is best, then try 20 spots around the known current best and repeat? If none are better, it has the best one it can know. I mean, how does the human player hit every wall, ram/crash, and drive out at speed for every corner? Obviously an AI should be able to try and test these 24 hours a day and find the route?
Wonder how hard it would be to train it to get vertical setups by itself and start noseboosting all over the place. PS I think the setup is far harder to train than the noseboosts themselves
In the video you give rewards to the agent for following the line of the course. How do you initialize the line? Is it just defined by the track itself, or do you hardcode a line for every track yourself? If you do hardcode it, how? Then how do you assign the reward? Is it just based on distance to the line? Because wouldn't that mean that if it moves backward, it still gets a reward for being close to the line? How do you make sure it moves forward with your reward function?
It uses existing replay files: it extracts the positions the player had from the replay file and creates the virtual checkpoints from them. It's not rewarded based on how close it is to the line; it only checks whether it is still within a certain reach of the line.
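The checkpoint scheme described in the reply above answers the backward-driving concern: reward comes from passing new checkpoints, not from proximity to the line. A rough sketch of the idea, with illustrative numbers (10 m spacing, 15 m reach) that are assumptions, not the project's actual parameters:

```python
import math

def make_checkpoints(replay_positions, spacing=10.0):
    """Thin recorded (x, y, z) positions from a reference replay into
    virtual checkpoints roughly `spacing` meters apart."""
    cps = [replay_positions[0]]
    for p in replay_positions[1:]:
        if math.dist(p, cps[-1]) >= spacing:
            cps.append(p)
    return cps

def progress_reward(car_pos, cps, last_idx, reach=15.0):
    """Reward only forward progress: +1 for each new checkpoint whose
    reach the car has entered. Hovering near the line, or driving
    backward past old checkpoints, earns nothing."""
    idx = last_idx
    while idx + 1 < len(cps) and math.dist(car_pos, cps[idx + 1]) <= reach:
        idx += 1
    return idx - last_idx, idx
```

Because `last_idx` only ever increases, a car reversing along the line collects no further reward, which is exactly the forward-progress guarantee the question asks about.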
What if the progress line were also part of the AI network? Since the distance to the line also has an impact on the reward, maybe giving the AI the ability to modify the line would make even better optimized racing lines possible.