10:28 that "does anyone know where i can find some" makes me really want to see these ais try to do something together. like just set them loose in survival mode and see what happens
Imagine they do a Prometheus and escape his PC, upload themselves to the internet, manage to build themselves a body, then travel to Emergent Garden's house to wake him up
It is still interesting to see these LLMs do their best at understanding how to build in Minecraft. I wonder if more of them will ever get image scanning abilities; you could let them take pictures of builds or the environment so they can see what they built and auto-correct.
Gemini vision is really good for images and is able to process Minecraft screenshots. No way it will get the coordinates of a missing/wrong block correctly, though. Maybe it is possible to create some resource pack/mod to write block coordinates on its faces?
@dapperwolf465 no, its the same issue thats always been the case, where porn is always leading the bleeding edge of technology, no matter what field of technology.
I don't have friends like I used to in childhood, and I'm hoping AI can play Minecraft with me, because I would love having another player, even if it's an AI, to build buildings, explore, and do lots of fun Minecraft stuff. As a 19-year-old, I have no peers who would like to play Minecraft, because they say it's a childish game
@Emergent Garden Recommendation from me: your prompts have no leverage. What I mean is that the LLM does not handle complex building tasks well because it's limited by the single-shot answer it needs to generate. Your template for "NewAction" is a great idea; my idea to improve its leverage is to add another template, "NewActionPlan", which it then fills with a list of generated prompts that are fed back into itself one after another (kind of like writing a todo list before getting started). My vision for it was kind of like this: - You whisper "Build a bridge for me" - "Okay, let's plan this out" *used newActionPlan* - "Okay, let's see what's first on the todo list..." *used actionPlan[0]* - "Sure, I will build the supporting pillars" *used newAction* ...etc. Getting a shared reference point for superimposed building actions is of course something to consider. Using plans recursively might also be interesting, like making a plan for planning multiple plans for even more abstracted tasks. Some way of sensing the world is also possible: maybe you can let it take screenshots of the game and feed the image into one of the multimodal, image-recognition-capable models
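A rough sketch of the plan-then-act loop described above, for anyone who wants to play with the idea. This is not how the video's project actually works; `run_with_plan`, `parse_plan`, and the two stand-in functions are made-up names, and the "model" here is just a hard-coded stub so the code runs without any API.

```python
from typing import Callable, List


def parse_plan(plan_text: str) -> List[str]:
    """Pull the step text out of lines shaped like '1. do something'."""
    steps = []
    for line in plan_text.splitlines():
        number, dot, rest = line.strip().partition(".")
        if dot and number.isdigit() and rest.strip():
            steps.append(rest.strip())
    return steps


def run_with_plan(goal: str,
                  new_action_plan: Callable[[str], str],
                  new_action: Callable[[str], str]) -> List[str]:
    """Plan first, then feed each todo item back in as its own prompt."""
    steps = parse_plan(new_action_plan(f"Write a numbered todo list for: {goal}"))
    results = []
    for index, step in enumerate(steps, start=1):
        prompt = f"Goal: {goal}. Step {index} of {len(steps)}: {step}"
        results.append(new_action(prompt))
    return results


# Stand-ins for real model calls (assumptions, not a real API).
def fake_planner(prompt: str) -> str:
    return "1. Build the supporting pillars\n2. Lay the bridge deck\n3. Add railings"


def fake_actor(prompt: str) -> str:
    return f"done: {prompt}"


results = run_with_plan("Build a bridge for me", fake_planner, fake_actor)
```

Swapping the two stubs for real model calls would give each step its own single-shot answer, which is the "leverage" the comment is asking for. The recursive version would just have `new_action` call `run_with_plan` again for steps that are still too big.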
4:08 In Llama's defence, I can see how those could be described as one-block columns spaced 'one block apart' as requested at 2:48, it's just included the column itself in the measurement of 'spaced'.
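To make the two readings of "one block apart" concrete, here is a tiny sketch. The function names are invented for illustration, and which reading Llama actually used is a guess based on the comment above, not something confirmed in the video.

```python
from typing import List


def columns_with_gap(count: int, gap: int, start: int = 0) -> List[int]:
    """Positions when `gap` counts the empty blocks between neighbouring columns."""
    return [start + i * (gap + 1) for i in range(count)]


def columns_with_pitch(count: int, pitch: int, start: int = 0) -> List[int]:
    """Positions when `pitch` is measured column to column, the column itself included."""
    return [start + i * pitch for i in range(count)]


gap_reading = columns_with_gap(count=4, gap=1)        # [0, 2, 4, 6]
pitch_reading = columns_with_pitch(count=4, pitch=1)  # [0, 1, 2, 3]
```

Under the second reading the columns end up touching, so stating the spacing as "N empty blocks between columns" in the prompt would probably remove the ambiguity.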
Doing this without computer vision is interesting and really makes me appreciate how incredibly complex the human brain is to be able to do so much in real time. Imagine the resources needed to give a multimodal model with vision/language/action the ability to _play_ in real time, the power requirements, where we can just eat for energy
i think it would be cool if you let every iteration build a skyscraper and add them all to a single city which will then grow with skyscrapers that are slowly getting better so you can see the improvement in one place
Gemini 1.5 has been generally available via Vertex AI for a couple of days. You can also create an API key via AI Studio; it's not only their chatbot interface, and it's a little easier to create an account.
I don’t know exactly how your system works but have you tried letting them use something like mathematical curves for building? Like vectors at positions pointing to positions with some formulas on top if required? Another thing you could do to help them out is allow them to write classes per object in a build. I think this would be great for things like columns because they then realise there would be spatial rules like spacing.
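As a small illustration of the "mathematical curves" idea, here is one way a formula could be turned into block positions: a parabolic arch sampled once per column and rounded to whole blocks. `arch_blocks` and its parameters are hypothetical, just a sketch of the approach rather than anything from the project.

```python
from typing import List, Tuple


def arch_blocks(span: int, height: int) -> List[Tuple[int, int]]:
    """Return one (x, y) block per column along a parabola with feet at x=0 and x=span."""
    if span <= 0:
        raise ValueError("span must be positive")
    blocks = []
    for x in range(span + 1):
        t = 2 * x / span - 1  # runs from -1 at the left foot to +1 at the right foot
        y = round(height * (1 - t * t))
        blocks.append((x, y))
    return blocks


arch = arch_blocks(span=8, height=4)
```

Sampling one block per column can leave vertical gaps where the curve is steep, so a real build helper would probably also fill in the blocks between neighbouring heights.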
Remember when chess engines came into existence? Now people hold tournaments for chess engine AIs to beat each other. Soon, Hunger Games and Bed Wars matches will be played by AIs as well, for our entertainment. With the creative world of Minecraft, god knows what else AI can do in a server... They can achieve greatness, even better than anything we have ever done as a species...
Can't wait for multimodal inputs with this project :) Having the ability to see what they're doing might even mean they don't have to write code to make the actions but rather can respond to the outcomes of their actions
You could probably create a really powerful diffusion model for generating minecraft buildings if you managed to obtain a dataset for it. The blocks would be the equivalent of pixels, and the noise would be in the form of random blocks. A dataset could probably be gathered from those servers where players build on plots of land, or perhaps a mod could be made to allow users to independently mark and tag creations in their own worlds/servers.
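A minimal sketch of what "noise in the form of random blocks" could look like: each block in a 3D grid is swapped for a random palette block with some probability. This only shows the corruption step under that assumption; it is not a full diffusion model, and the palette, grid layout, and function name are made up for the example.

```python
import random
from typing import List

Grid = List[List[List[str]]]  # indexed as grid[y][z][x]


def add_block_noise(grid: Grid, palette: List[str], noise_level: float,
                    rng: random.Random) -> Grid:
    """Swap each block for a random palette block with probability `noise_level`."""
    return [
        [
            [rng.choice(palette) if rng.random() < noise_level else block
             for block in row]
            for row in layer
        ]
        for layer in grid
    ]


palette = ["air", "stone", "oak_planks", "glass"]
clean = [[["stone", "stone"], ["glass", "air"]]]  # a 1x2x2 toy build
rng = random.Random(0)
half_noised = add_block_noise(clean, palette, 0.5, rng)
fully_noised = add_block_noise(clean, palette, 1.0, rng)
```

A model would then be trained to undo this corruption step by step, with `noise_level` playing the role of the diffusion timestep.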
@deltamico Yeah idk how you would solve that part. Maybe the average RGB values of the block would work for standard blocks, and then an additional value could categorize its shape (i.e. panes, stair types, fences, candles...) I really have no idea what I'm talking about
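The "average RGB plus a shape category" encoding suggested above could look something like this. The shape list is a short hypothetical one, and the colour values are placeholders for illustration, not sampled from real textures.

```python
from typing import List, Sequence, Tuple

# Hypothetical shape categories; a real list would be much longer.
SHAPES = ["full", "stairs", "pane", "fence"]


def block_features(avg_rgb: Tuple[int, int, int], shape: str) -> List[float]:
    """Encode a block as scaled RGB followed by a one-hot shape category."""
    if shape not in SHAPES:
        raise ValueError(f"unknown shape: {shape}")
    colour = [channel / 255 for channel in avg_rgb]
    one_hot = [1.0 if name == shape else 0.0 for name in SHAPES]
    return colour + one_hot


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Placeholder colours for illustration only, not sampled from real textures.
grey_full = block_features((128, 128, 128), "full")
grey_stairs = block_features((128, 128, 128), "stairs")
```

Two blocks with the same colour but different shapes still end up a fixed distance apart, so a model output could be snapped back to the nearest real block with `squared_distance`.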
Map all Minecraft blocks to vectors in an n-dimensional feature space using machine learning to find suitable eigenvectors (kinda like how LLMs turn words into vectors), then treat each pixel as a vector and proceed as normal. If your machine learning picks the right eigenvectors and enough of them, it might work.
Yeah, mineflayer-pathfinder definitely needs some improvements, especially in the scaffolding department. Maybe I can get myself to work on it some more. This is actually not the first time people tried to use general AI with mineflayer. There was also a French Microsoft team that did the same before with GPT. I think having the agents write the code has huge potential, especially if the models were trained on the existing mineflayer code.
So cool! A while ago I had a thought of a Minecraft mod, which would add Beavers as a mob that were capable of navigating the world, collecting wood, and building wooden structures all on their own. I find it very exciting that such a thing might soon be possible.
that actually sounds so cool! and mojang might add beavers like how pistons were added. But less advanced as their buildings are more like structures but they are built.
Wouldn't it be interesting to have naturally generated structures in the world built by AI? Instead of finding the same structures over and over, you could find useless, not so good-looking and simple but surely enigmatic structures, to give Minecraft its old feeling of mystery and the feeling of seeing things for the first time, like in the old days
Imagine having a team of AI agents to keep you company in the long lonely hours of Minecraft. To not only help you with manual labor but also to keep you company when your friends are disinterested in playing with building blocks.
The models have been trained on solving probability problems with characters and words. Had they been trained on (WASD) clicking and so on, how far would they be now? Having to transcend a second dimensional barrier as a language model is quite challenging, and I'm very impressed.
I love the way that your videos are presented, but this one in particular was really cool. Right now, the AI are silly and fun and a little bit mysterious; we bestow on them a personality given the first opportunity, too. Exploring and experimenting with this newer technology on a game that couldn't have conceived of it when it was released is fascinating. Imagine playing Minecraft with an AI companion that is not only adapted to beat Minecraft, but to play it. To explore, to build, to fight mobs and chat; it doesn't have to be a real player at all, but it's really good at acting like one. It could soon reach a point where, just watching them play, you really can't tell. I can see GPU only models trained entirely on Minecraft builds, or even highly playing parameters. Personally, I would love to see an AI capable of building really great stuff.
I haven't tested it extensively with GPT-4o yet, but in my experience with Gemini and GPT-4 on other tasks, vision as feedback doesn't work super well. I often tried creating diagrams with Python, SVG images, images generated by DALL-E, graphical user interfaces and more. While the models were able to produce pretty decent outputs for most of these tasks, they all required many prompts and detailed instructions put in manually. Inputting images of their current results almost never led to significant improvements. I'm sure that eventually LLMs or similar models will be able to do that, but from my experience, we're not there yet (with published models at least).
I'd love to see something like this reach a point where it's fun to just casually play survival alongside them, and they build things in an unpredictable way
I think, the main issue here is the design of the workflow. LLMs are, afterall, LLMs, they're not building stuff incrementally, step by step, they are programming actions without seeing the result of their previous step. This can be solved with better prompts, ofc, but I still think it's not enough. We probably need some combination of computer vision with llm with custom training data, or even another architecture of neural network.
12 year old me would cry seeing this. I used to love adding bots to my local minecraft server and setup markets and stuff you can use to trade with them.
it’s really weird how alive they seem. Not bad, it’s more interesting than anything. I’m really curious to see them going forward, I’m glad I just discovered your channel
I find it interesting that the scaffolding is different each time, when it so easily could repeat the same pattern of scaffolding for the same pattern of blocks being placed
About the scaffolding thing... try telling them to use the actual scaffolding block for that, which can be easily removed later just from below the construction
This series is awesome! Not sure if it'd work with how you've set things up, but it'd be pretty cool to make building the same structure into a competition, with the models considering what others have built and trying to beat them, i.e. Claude trying to build a better castle than Llama and so on.
If this video came out when I was 8, my adolescent mind would have burst with disbelief. (My mind was blown after the 2nd FNaF game theory video so there wasn't much of a threshold to break.)
You should ask ais about tools for how someone could effectively build in minecraft while blind, since that's effectively what they're doing. I think they need some kind of way of analyzing how things are going, rather than just knowing that what they tried to do failed in some way. Also, particularly for the nether portal example, maybe they should keep track of bounding boxes of built structures, and start building something new in a different place. (or optionally using the same location as a past structure, might be useful) Other possible programming tools for building: cellular automata (in particular, look up 'markov junior' and 'L systems') starting with a heightmap of a build, then hollow out details/interior (give the underlying agent the ability to place a whole column at once) see what's around it as a 2d top down heightmap
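Two of the suggestions above, bounding boxes of past builds and a 2D top-down heightmap, are simple enough to sketch. This assumes the agent keeps a list of the (x, y, z) blocks it has placed; all names here are hypothetical and not part of the project or of mineflayer.

```python
from typing import Dict, Iterable, Tuple

Block = Tuple[int, int, int]  # (x, y, z), with y as height
Box = Tuple[Block, Block]     # (min corner, max corner), inclusive


def bounding_box(blocks: Iterable[Block]) -> Box:
    """Smallest axis-aligned box containing every placed block."""
    xs, ys, zs = zip(*blocks)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


def boxes_overlap(a: Box, b: Box) -> bool:
    """True when the two inclusive boxes share at least one block position."""
    return all(a[0][i] <= b[1][i] and b[0][i] <= a[1][i] for i in range(3))


def heightmap(blocks: Iterable[Block]) -> Dict[Tuple[int, int], int]:
    """Top-down view: highest y seen at each (x, z) column."""
    heights: Dict[Tuple[int, int], int] = {}
    for x, y, z in blocks:
        heights[(x, z)] = max(y, heights.get((x, z), y))
    return heights


# A 4x5 portal-shaped frame standing in the z = 0 plane.
portal = [(x, y, 0) for x in range(4) for y in range(5)
          if x in (0, 3) or y in (0, 4)]
portal_box = bounding_box(portal)
```

Before starting a new structure, the agent could check `boxes_overlap` against every stored box and shift the build site if it collides, and the heightmap could be pasted into the prompt as a compact description of what is already there.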
That already exists, also there is a bot that mines for you diamonds, or any other material you need xd But of course, unlike AI, those can't create code by themselves.
I suppose I should be glad people take seriously the part where I imply there not existing such bots already, instead of taking seriously the part where I tell them I'm looking forward to automated griefing 😂
Hey EG! I have a question (I know very little about AI) As a minecraft player we sometimes use "structure blocks" to select and save a region of blocks. The saved NBT file can be exported to anyone who wants your creation in their own world. There are literal thousands of NBT files online... would it be possible to train an AI to generate NBT files?
It's possible, but very expensive. That data is quite complex, requiring huge amounts of compute resources to train any useful AI. Most current AI techniques also require labels for the data. So the data for a house isn't enough, you also need a detailed text description. There are similar things that have been done though. I think NVIDIA has an AI that can generate 3D models based on text prompts, not in Minecraft, but some are far more detailed than a small Minecraft build could be. It's a very similar concept.
God, I'm so excited for more of these videos. I'm interested in how far this will go, and it also spiked my interest in playing with LLMs. I still lack the skills to do it, but I'll get there. Keep it up, love it!
this is honestly beautiful, i have always wished to have ai friends within random videogames.. clicks more with me due to me having autism.. average people feel more like ai to me than ai itself..
I dont know why but its so cute seeing the AI use dirt as scaffolding even though they're in Creative. I really wonder what they would build if you gave them really abstract concepts.
dystopian (adjective), dys·to·pi·an, (ˌ)dis-ˈtō-pē-ən; variants or less commonly dystopic, (ˌ)dis-ˈtō-pik, -ˈtä-: of, relating to, or being an imagined world or society in which people lead dehumanized, fearful lives
This is really fascinating. If you weren't aware that they were AI, it would almost look like a small child playing Minecraft. Just wait though, these children are going to grow up exponentially fast and be able to out-build even the best players.
One thing I was thinking about earlier in the video was how its building technique was non-existent. Think about pro Pong players now: it's not so much about the intelligence of game tactics, but more that they learnt how to play the controls. In Minecraft, people learn special/unique ways to build. I seriously think you need the models to learn from their own building performance, from doing just a line of blocks to an entire castle. I know pro players will use gravel/sand for scaffolding so that clean-up is easier and faster when reaching the bottom of the build. Definitely make them take the time to review the code they generate. Perhaps for bigger builds make them first outline the general steps to build and take time thinking about how to start. Then they start building, and the following steps can be thought through while the first building step is occurring. It sounds like you make the model send a code block and wait for it to end; instead, I would allow it to start and stop a code block. Give it two code block slots. Think about how, as humans, we can be doing a task and thinking about something else. Allow the model to stop itself midway through the task.
I don't know what you're doing to get such success, but I couldn't get any model to do anything right. Almost as if it's the anti-pattern, 100% failure. No matter how simple the task, none seemed to be able to do anything.
Would be interesting to decompose litematica files into a structure that's usable for doing a finetune with GPT-4o.... or having them available in some RAG solution that can be used with function calls to direct the agent(s). Interesting stuff :)
This could be turned into a mod where you find villagers who can build for you and you can level them up to make bigger and better houses for you or even give them a custom prompt and since it’s AI it’ll determine its seed world and biome to create that structure
I’ve said this in your Mindcraft video, and maybe this should be a question for the mineflayer team, but since the Minecraft community has a huge amount of schematics, it would be amazing if the bots could use those schematics to make buildings.
Dead Minecraft theory. Servers flooded with bots controlled by AI that respond in near human like fashion, build their own cities, occasionally engage in pvp with each other. I want to see it. And we surely will in the coming months/years. I would love to join or host a server with hundreds of AI players living their artificial lives.