AI learns GeoGuessr and plays against pro! 🌎

Traversed

Views: 20K

Published: 26 Sep 2024

Comments: 99

@MysteryPancake 1 year ago

You can use the Street View API to get screenshots much faster, and with the FOV set to 90 you can ensure they connect seamlessly.

@sebastianreyes8025 1 year ago

I had a similar thought. Would be cool to see an update vid if he makes these changes.
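A minimal sketch of the Street View API suggestion above, assuming the Street View Static API endpoint and its `size`, `location`, `heading`, `fov`, `pitch` and `key` parameters. The function name `street_view_urls`, the coordinates and the placeholder key are illustrative only, and this is not how the video's data was collected. It only builds the request URLs: with a square image and a 90 degree field of view, four headings should tile a full turn.

```python
from urllib.parse import urlencode

# Street View Static API endpoint. The API key used below is a placeholder.
BASE_URL = "https://maps.googleapis.com/maps/api/streetview"


def street_view_urls(lat, lng, api_key, fov=90, size="640x640"):
    """Return one request URL per heading so the views tile 360 degrees."""
    if 360 % fov != 0:
        raise ValueError("fov must divide 360 evenly for seamless tiling")
    urls = []
    for heading in range(0, 360, fov):
        params = {
            "size": size,                # square image, so horizontal FOV == fov
            "location": f"{lat},{lng}",
            "heading": heading,          # compass direction of the camera
            "fov": fov,
            "pitch": 0,                  # look at the horizon
            "key": api_key,
        }
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


# Hypothetical location, chosen only for illustration.
urls = street_view_urls(48.8584, 2.2945, api_key="YOUR_API_KEY")
for url in urls:
    print(url)
```

Fetching the images would additionally need a valid key, and requests are billed, which a later comment in this thread also points out.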

@mysteriathe8474 2 years ago

Here from Rainbolt's vid. You’re awesome, can’t wait for the AI to improve.

@antimatter2417 2 years ago

Wow, I'm so happy someone actually took the time to develop that! I second Stjepan's suggestion of creating a classification model that just outputs a country label. The model would probably end up relying on known metas like car color, antenna, cam gen, but also poles, road signs, etc. Then if that works well, you could potentially train different country-specific models for pinpointing the coordinates within each country. That would be a lot of work but I'm guessing it would have a much higher chance of beating pros. Good luck with your project, I'm definitely subscribing in the hope of seeing more in the future :D

@elfrit 2 years ago

As suggested, you could first train the model on countries, and once that works well, take one of the last layers as an embedding and continue training from there on lat/lon. By not having one model per country you simplify the problem, and you will probably also learn non-country-specific information such as mountains, sun and terrain, which will all help your overall predictions.

@MysteryPancake 1 year ago

In the video it uses categorical_crossentropy. This is a loss function for labels, meaning the model already uses labels. There is a comment later explaining how the labels used are geohashes, but often this is not accurate enough to beat pro players
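To make the point about categorical_crossentropy concrete, here is a small sketch of that loss for a single one-hot label, in plain Python rather than the Keras implementation. The four-class example is hypothetical (the model in the video uses far more classes).

```python
import math


def categorical_crossentropy(one_hot, probs, eps=1e-12):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    # Only the probability assigned to the true class contributes to the sum.
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


# Hypothetical 4-class example: the true class is index 2.
label = [0, 0, 1, 0]
confident = [0.05, 0.05, 0.85, 0.05]
unsure = [0.25, 0.25, 0.25, 0.25]

print(categorical_crossentropy(label, confident))  # small loss
print(categorical_crossentropy(label, unsure))     # larger loss, equal to log(4)
```

Note that this loss treats every wrong class the same: predicting a neighbouring geohash cell costs as much as predicting one on the other side of the world, which is one reason label-based training can be too coarse here.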

@bwL3 2 years ago

Incredible idea! Will be super interesting to see how this progresses. Good luck with the AI!

@TraversedTV 2 years ago

Thank you!

@Jetpans 2 years ago

Great video! One suggestion, though I don't know if it is considered "cheating": rewarding the AI for guessing the country right would be very beneficial (probably even more than distance) and could be implemented in the model (not easily though, because of how country borders work).

@TraversedTV 2 years ago

That's an excellent idea - I'll do that!
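One way the country reward could be combined with distance is sketched below. Everything here is an assumption made for illustration: the function names, the fixed `wrong_country_km` penalty, and the premise that a country label is already known for both the guess and the true location (which is the hard part, as the comment notes). Distance uses the standard haversine formula on a spherical Earth.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def guess_penalty(pred, truth, pred_country, true_country, wrong_country_km=2000.0):
    """Distance error plus a hypothetical flat penalty for the wrong country."""
    distance = haversine_km(pred[0], pred[1], truth[0], truth[1])
    return distance + (0.0 if pred_country == true_country else wrong_country_km)


# Two guesses at a similar distance from a true location in Norway (example values).
truth = (60.0, 10.0)
print(guess_penalty((61.0, 10.0), truth, "NO", "NO"))  # distance only
print(guess_penalty((60.0, 12.5), truth, "SE", "NO"))  # distance plus the penalty
```

A penalty like this is not differentiable with respect to the predicted coordinates, so it would suit evaluation or a classification-style target better than a direct regression loss.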

@mahranislam3554 2 years ago

everyone gangsta until the ai 5k's the dirt

@peorakef 1 year ago

Pro tip: take another image looking downward. It may need some more coding, but I predict it will greatly and quickly increase accuracy. It's the most important clue in the game.

@EpicVideos2 1 year ago

This would GREATLY benefit from attention layers. If you think about how RAINBOLT thinks, he only really cares about certain very specific features; the AI must do the same. I also recommend using a U-Net architecture rather than the basic CNN architecture.

@waxigoxor 2 years ago

This is super fascinating! Can't wait to see how this progresses :D

@TraversedTV 2 years ago

Thank you! I'll probably share an update in 2 weeks or so! 🤖

@havardmj 2 years ago

Very interesting. Thank you for doing this. AI is usually very bad at recognizing where an object ends and another begins, so if it ends up being better than humans, it will probably be through color palette or something like that. That being said, I think it has a major disadvantage in not being able to recognize bollards, signs, road lines and telephone poles. It probably cannot learn what a country is either? But it does have the advantage of long learning time. I'm excited to see what happens!

@TraversedTV 2 years ago

Yeah it doesn't have an understanding of countries. With higher resolutions it could theoretically pick up clues from signs and billboards, but yeah - AI learns much slower than humans, they just get more data to learn from.

@eugenekrabs141 2 years ago

AI usually knows where objects begin and end through the colors, but without human interaction to teach it a specific object, it will probably need thousands of images of a specific country to start recognising bollards and such, and even more to tell one country's apart from another's when they look similar.

@loafzero 2 years ago

this is incredibly impressive; awesome work!

@TraversedTV 2 years ago

Thank you!

@wesjeshurun 2 years ago

Very cool project, dude! Appreciate the collab with Rainbolt

@TraversedTV 2 years ago

Thank you!

@nocny5546 2 years ago

It will be interesting to watch the development of this technology.

@jakubsuchanek350 2 years ago

Would try multi-label classification :-) (with separate class sets). People were suggesting classification into countries since that is the most common meta, but past that you can say how rich the region is, how dry/wet it is, how mountainous/flat it is, what biome it is, what latitude it is, what biogeographical realm it is... You can definitely extract a lot of these from some rendered maps with a bit of effort, and it might be possible to get most of them in tabular format. There was a suggestion of doing a per-country regression model, but I expect it would just learn that the southern half of this country is mountainous etc., and you'll be able to do that more accurately with a multi-label classifier. You can also do the final step of joining the different labels into one still in the model and hence have a weighted ranking that is trained with everything else. E.g. you can have a list of geohashes with a class assigned from all the other categories, and based on that each geohash has an edge linking it to the relevant label in each set, with one connection and hence one weight per set. Regression will eventually work best, but that would probably take orders of magnitude more data than you have? You were mentioning only 14 layers because 40k images isn't enough for more? But since these are very standard images you can just use a pretrained model and only train the last bit? I've trained a ResNet50 on a minute of video and it worked better than smaller models...

@TraversedTV 2 years ago

I'll give multi-label classification a try, thank you for all the suggestions. Yeah, I started with regression before I switched to geohashes, but that didn't work well, likely because of the limited data.

@mrphotography4596 2 years ago

Great work, here from Rainbolt

@TraversedTV 2 years ago

Thank you! ☺️

@philosophiabme 2 years ago

Nice! I wonder how an adaptation of a more developed image classification model (ResNet, Inception, etc.) would do here. Especially if you can keep the ImageNet pretrained weights.

@TraversedTV 2 years ago

Yes, I'll eventually try a pre-trained model and fine-tune it for geolocation! 🤖👍 I'm also curious to see the results of that.

@Benw8888 1 year ago

You should apply a GAN (generative adversarial network) to your CNN to make it more consistent and robust. This helps it generalize better, and may improve consistency.

@Benw8888 1 year ago

Also I recommend increasing the width of your layers, even if you need to remove a layer or two at the end. A small layer can bottleneck the whole model.

@richko 2 years ago

Make a 2nd part of the AI vs PRO video possible! This is cool asf.

@TraversedTV 1 year ago

Very soon ;)

@machtmonk 1 year ago

Very cool project! I hope I didn't miss it in the video: what kind of hardware are you using to do your training on (CPU/GPU)?

@TraversedTV 1 year ago

Thanks Fabi! I used a GPU through the AWS infrastructure. To reduce costs I've now switched to my own GPU (RTX 3060).

@hungrypumkin8103 2 years ago

Amazing video! I'm really curious why you decided to take on this project. You obviously have a lot of experience with coding, but you definitely seem to have a decent grasp of GeoGuessr as well (which is mainly why I ended up here). I'm not a very good coder, but I enjoyed your explanation of your processes and I hope you create some more content in the future!

@TraversedTV 2 years ago

Thank you! I just graduated from my master's and had some free time! I like the game and thought it would be a nice challenge to train a network on it ☺️. I didn't have much experience with GeoGuessr but I learned a bit about it on the way. I'll definitely do some more videos on GeoGuessr or other AI projects!

@sirynka 2 years ago

If GeoGuessr is taking images from Google Street View, why don't you skip the middleman and go directly to Google for scraping the data? Instead of training the network from scratch, isn't it beneficial to take an existing, pretrained model (like ResNet) and go from there? Firstly, you would get a solid architecture. Secondly, its weights won't be completely random, so retraining could take less time. And finally, the idea of guessing the country and then the coordinates within it could be quite beneficial (but harder to implement).

@jackvelez5532 1 year ago

The Google Street View API is kinda expensive, I think

@michaelwisniewski6047 1 year ago

This is amazing. The first time I played GeoGuessr about 2 years ago I thought "I wonder if someone has or will create AI for this and I wonder how". Then of course your AI plays in the tournaments, etc. I actually first thought you used some brute force method of taking your program through every Google covered street in the world, reducing it for speed and then comparing the game image to your massive database. But that's because I'm no good at AI. I'm glad it works more like a human player works. Anyway, I really enjoyed seeing how you did this. I'm wondering what uses this might have outside the game. I first thought about geolocating for news networks (you know how CNN and others check if images sent to them were actually taken in the claimed location etc.), then about military uses, then about police in crime investigations. Of course I understand pictures taken by phone will often not match those taken by Google's Street View camera, but perhaps some version could move between image sources. Anyway, I'm sure some other use can be found. Thank you for sharing your methods!

@TheBen1513 2 years ago

Great video! A few questions about the model: it seems you're doing classification between 1024 classes and I'm assuming each one encodes one specific location. Have you considered training for regression, i.e. to regress the longitude and latitude directly? I feel like the model could perform better if you encoded this spatial dependency in the target variable better. Also, using a more advanced architecture like ResNet would probably boost performance (although slow down training -- but here you could probably lower the image resolution noticeably without much loss :) ). Anyways, good luck!

@TraversedTV 2 years ago

Yeah, the 1024 classes come from geohashes (a standard for encoding locations in text) with a precision of 2. I tried regression initially, but that was much less accurate. I think because there is no linear relationship but a rather complicated one between coordinates and images/countries. I might try that again someday though. Or at least more classes/locations. ResNet / transfer learning is also on the ToDo list, but I'm currently limited by hardware. Thank you for your comment!!
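For readers wondering where 1024 comes from: a geohash uses a 32-character alphabet, so two characters give 32 × 32 = 1024 cells. Below is a minimal sketch of the standard geohash encoding; the helper `geohash_class_index`, which turns a geohash into an integer class label, is an assumption for illustration and not necessarily how the labels were built in the video.

```python
# Standard geohash base-32 alphabet (no a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=2):
    """Encode a lat/lon pair as a geohash string of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            bits <<= 1
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def geohash_class_index(lat, lon, precision=2):
    """Hypothetical helper: map a geohash to an integer class in [0, 32**precision)."""
    index = 0
    for ch in geohash_encode(lat, lon, precision):
        index = index * 32 + BASE32.index(ch)
    return index


print(len(BASE32) ** 2)                        # number of precision-2 cells
print(geohash_encode(57.64911, 10.40744))      # a point in northern Denmark
print(geohash_class_index(57.64911, 10.40744))
```

At precision 2 each cell spans 11.25° of longitude by 5.625° of latitude, which helps explain why this labelling alone struggles against pro players.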

@CaridorcTergilti 2 years ago

@TraversedTV I suggest Colab Pro and Kaggle for the hardware

@dubot4076 2 years ago

Cool idea, gl!

@TraversedTV 2 years ago

Thanks! ☺️🤖

@maciej5434 2 years ago

Idk if it's possible, but if you could hard-code some tips that are super typical of different countries (road lines, camera generation etc.) it should be easier and more efficient for the AI to learn. Now, as it uses the general-look method, it's easy to make a big mistake because, for example, British Columbia looks very similar to northern Norway. Great project anyway, excited to see how it will play after a few months

@TraversedTV 2 years ago

It's usually rather difficult and not always beneficial to hardcode any hints, as the system should eventually learn those by itself. However, I can provide some more information than just the position to the system. Then it has more information to learn with. Good thought though!

@livedandletdie 2 years ago

All it needs to know is this forest looks Polish, telephone poles, roads, this dirt looks Brazilian...

@CaridorcTergilti 2 years ago

What about training from a pretrained model such as EfficientNet? Fine-tune it on this data, then run it 5 times with square images and average the predictions. Also, will you kindly share the dataset please?
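The "run it 5 times and average" idea could look roughly like the sketch below. It assumes the five square images are evenly spaced crops of one wide panorama and that the model returns one probability per class for each crop; both are interpretations of the comment, and the function names and numbers are made up for illustration.

```python
def square_crop_offsets(width, height, n_crops=5):
    """Left x-offsets of n_crops evenly spaced height-by-height crops."""
    if width < height:
        raise ValueError("image must be at least as wide as it is tall")
    if n_crops == 1:
        return [(width - height) // 2]
    step = (width - height) / (n_crops - 1)
    return [round(i * step) for i in range(n_crops)]


def average_predictions(prob_vectors):
    """Element-wise mean of several per-class probability vectors."""
    n = len(prob_vectors)
    num_classes = len(prob_vectors[0])
    return [sum(v[c] for v in prob_vectors) / n for c in range(num_classes)]


# Hypothetical per-crop outputs of a 3-class model.
per_crop = [
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.1],
]
mean = average_predictions(per_crop)
best_class = max(range(len(mean)), key=mean.__getitem__)

print(square_crop_offsets(1000, 200))  # where each crop starts
print(mean, best_class)
```

Averaging lets a crop that contains a strong clue (a sign, a bollard) outvote crops that only show generic scenery.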

@chen2613 2 years ago

This is actually insanely cool and I've recently been super into ML and CNNs. One question I have is how you got your current dataset. Did you have a bot play through thousands of games to collect your own dataset? Also, how do you plan on increasing your data for this model? Repeat this same process? Is there a better way? And lastly (this is a bit of a noob question), how do you recognize whether the inaccuracies of this model result from a lack of data vs. the model just not working well? In that case, how many thousands of images would you reckon would produce a reasonably working model? 100,000? 1 million? More?

@chen2613 2 years ago

Also a suggestion: have you considered using cloud computing (i.e. AWS or Azure) to bypass hardware limitations? If you're worried about costs, I'm sure that you could set up a Patreon or something and many AI or GeoGuessr enthusiasts would gladly fund your cloud computing.

@TraversedTV 2 years ago

The current data set is roughly 40,000 images. I'm definitely planning to increase the data. Data augmentation (rotation etc.) also helps to reduce the data requirements. No idea how many it takes for a very good model, but so far accuracy has increased significantly with more data.
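As one possible reading of "rotation" for panoramic input, the sketch below shifts an image sideways with wrap-around, which corresponds to the camera starting at a different heading. It assumes the training images are full 360 degree panoramas whose left and right edges join; the video does not state which augmentations were actually used, and the nested-list image and function names are illustrative.

```python
import random


def roll_panorama(image, shift):
    """Circularly shift every row by `shift` pixels (a change of heading)."""
    width = len(image[0])
    shift %= width
    if shift == 0:
        return [row[:] for row in image]
    # Pixels pushed off the right edge re-enter on the left.
    return [row[-shift:] + row[:-shift] for row in image]


def random_heading_augment(image, rng=random):
    """Return a copy of the panorama rolled by a random horizontal offset."""
    return roll_panorama(image, rng.randrange(len(image[0])))


# Tiny 2x4 "image" with one value per pixel, for illustration.
pano = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
print(roll_panorama(pano, 1))
print(random_heading_augment(pano))
```

Because the label (the location) does not change with heading, every rolled copy is a valid extra training example.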

@TraversedTV 2 years ago

I'm actually using AWS already, so it boils down to the costs. I might consider Patreon! Good idea 👍!

@dev_mind 2 years ago

@TraversedTV and the patrons are then allowed to play against the AI.

@shadowobito 2 years ago

@TraversedTV Hey, great video! I actually had a very similar idea where I used Selenium and geopy to make a dataset of 50k GeoGuessr images and feed it to a convnet, but I think the panorama input format is much more informative. Would it be possible for you to make the panorama data public at some point? Would love to play around with it. Either way, I think the vid inspired me to revisit the project, so thank you :)

@misteick 2 years ago

hi, I wonder if you can give out the code with dataset

@TraversedTV 2 years ago

I don't want people to mess with it in competitive mode. And for the dataset, there are probably some licence issues if I upload it.

@tschupf 2 years ago

Well explained and neatly solved, such a smart idea to just merge all the screenshots together. How many conv layers did you use in the end, and do you think the model could improve with more layers, or is the limitation in the training data?

@TraversedTV 2 years ago

Hi Malte, currently 14 conv layers. It's limited by the data atm. I think the layers are sufficient, it could use more filters/kernels per layer though!

@SamExplode 1 year ago

I love this implementation - have you considered using the browser extension's JavaScript code to monitor/scrape the network traffic and download the panoramic image files when they are fetched by the browser? This could allow you to acquire the training data faster rather than by interacting with the screen and screenshotting? May also provide you with a cleaner, non-overlapping image. I also like the idea of classifying countries first as another commenter suggested. Great project, really enjoy seeing the improvement

@HungrysitesRu 1 year ago

This is the way

@supremezzzzzzz 1 year ago

How would you do this exactly?

@SamExplode 1 year ago

@supremezzzzzzz Presumably by overloading the XHR request and, whenever it's downloading an image that you need, saving it locally either with the FileSystem API or by posting it to a local web server

@pm682 6 months ago

That's a good idea, but in the context of the game the idea is that both parties get to see "limited information". Wouldn't it be considered an unfair advantage when playing against a human if the AI gets to see the whole panoramic view but a human can only see it by scrolling around? Would be cool to have both options available maybe

@olekosikowicz6382 2 years ago

Wow man, great work! I want to help you. Would you be interested in teaming up to beat Rainbolt? 😂 I'd like to develop your ideas even further

@fxcilities2237 2 years ago

If you use Selenium with Python, you can take screenshots faster

@TraversedTV 2 years ago

I'll check it out! It still has to load the street view image though.

@BenjaminAster 2 years ago

@TraversedTV With Selenium you can run multiple instances in parallel. I can also recommend Puppeteer, which is a kind of Selenium but without a graphical Chromium, and it comes directly from Google.

@lmerry213 1 year ago

Any chance you'd be able to open source some of your code for this? I'm most interested in the frameworky bits for talking to Geoguessr programmatically so I can play with the model part myself. I spent a little bit of time playing with it tonight, took a different approach on the data collection and am trying to use Selenium rather than a browser plugin. I am struggling with getting the camera point of view to pan around so I can try to capture several points of view as you did. I have something working by having it press the arrow key a bunch of times, but it's slow. I suspect you might have been clicking on the compass needle, but in the newer geoguessr UI the compass at the top can't be interacted with. If that's the case, have you found a way to do this with the new UI? Either way, awesome project, thanks for sharing!

@HK-cq6yf 2 years ago

Potential to be the next AlphaGo

@chromatic4573 1 year ago

Is this project open source? As a geoguessr fan and ML researcher, I would like to try improving upon this as a hobbyist project. Pretty sure that using semi-supervised learning would instantly yield significantly improved results given the currently very limited dataset.

@kedi7235 2 years ago

Great Video! Which Browser Plugin is used for taking the images?

@TraversedTV 2 years ago

Thanks Kedi, I wrote the plugin myself.

@kedi7235 2 years ago

@TraversedTV Oh okay, thanks for replying!

@jurekandrzejewski6671 1 year ago

Have you thought of making it open-source?

@sebbog 2 years ago

Try this with country streak (it's a game mode on GeoGuessr)

@TraversedTV 2 years ago

I'll give it a try 😁

@Gryffins90 2 years ago

Good work! Do you plan to release the source code?

@TraversedTV 2 years ago

Currently not, I don't want people to use it in competitive. ;)

@Gryffins90 2 years ago

@TraversedTV Ah yes, I was not even thinking about that haha. I would like to do something similar but test a different architecture.

@jakaahlin2676 2 years ago

Does the plugin follow GeoGuessr's terms of service? I'm asking because each time you create a new game, an API call is made which costs the game money.

@RubyPiec 1 year ago

will it ever be possible to play against the bot? i'd really love to

@c_s3378 1 year ago

I swear in 20 years all the aimbots and cheats are gonna be run by AI

@graz2675 2 years ago

I hope people don't use it to cheat...

@TraversedTV 2 years ago

I hope so too! But that's why I don't make it available to the public. People would need to redo it by themselves.

@langlitz 1 year ago

i am subscriber #420

@joscha1248 2 years ago

How did you do that?

@lastnamefirstname8655 2 years ago

nice ai.

@TraversedTV 2 years ago

Thanks!

@lastnamefirstname8655 2 years ago

@TraversedTV 👍

@InsertYourself1 1 year ago

i fell asleep listening ngl

@einemailadressenbesitzerei8816 1 year ago

If you included the vehicle that took the photos, you could guess the country pretty easily. Second step: country-specific features.

@einemailadressenbesitzerei8816 1 year ago

Do you share the source code on GitHub or something similar? Did you also scrape the lng/lat data from the network request when downloading the corresponding panorama to the server, or did you do it inside the Chrome extension, and how?