I feel like what's not shown in the video is more interesting than what is. How well does it work? How DO you put it into the game? Do you require players to have 8+ GB of VRAM just to generate flavour text on the fly? Does the LLM produce enough variety that it can't simply be cached into a dictionary of a few thousand thoughts and triggers?
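That last question suggests a middle ground: cache whatever the model produces, keyed by NPC and trigger, and only run inference on a miss. A minimal sketch, assuming some `generate_thought`-style call wraps the actual on-device model (the names here are hypothetical, not from the video):

```python
# Hypothetical sketch: memoize LLM-generated flavour text so the model
# only runs once per (npc, trigger) pair. The `generate` callable stands
# in for whatever on-device inference call the game would actually use.
from typing import Callable, Dict, Tuple

class ThoughtCache:
    def __init__(self, generate: Callable[[str, str], str]):
        self._generate = generate                      # fallback: run the LLM
        self._cache: Dict[Tuple[str, str], str] = {}

    def get(self, npc: str, trigger: str) -> str:
        key = (npc, trigger)
        if key not in self._cache:                     # miss: generate once, then reuse
            self._cache[key] = self._generate(npc, trigger)
        return self._cache[key]

# Stub generator so the sketch runs without a model attached.
cache = ThoughtCache(lambda npc, trig: f"{npc} ponders the {trig}.")
print(cache.get("Guard", "storm"))   # first call invokes the generator
print(cache.get("Guard", "storm"))   # second call is served from the cache
```

If the model's variety really is bounded, a cache like this quietly degrades the "live LLM" into a pre-baked dictionary, which is exactly the question being asked.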
I wonder how big the minimum viable model would be if you only need a 20-token context window. I also wonder whether you could fit it onto the iGPU, and how much performance it would sap at 2 tokens per second. Final Fantasy's text crawl was 2 or 3 characters per second, if I recall correctly.
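Back-of-envelope on that throughput point, assuming the common rough figure of ~4 characters per English token (a ballpark, not a measurement): even 2 tokens per second would outpace a Final Fantasy-style crawl, so the crawl itself could hide most of the generation latency.

```python
# Rough arithmetic only; 4 chars/token is a common ballpark, not a measurement.
CHARS_PER_TOKEN = 4
tokens_per_sec = 2
crawl_chars_per_sec = 3          # Final Fantasy-style text speed (upper estimate)

gen_chars_per_sec = tokens_per_sec * CHARS_PER_TOKEN   # 8 chars/sec produced
print(gen_chars_per_sec > crawl_chars_per_sec)         # → True: generation stays ahead
```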
Interesting concept, but this video is WAAAAAAAY too fast. I set it to 0.75x speed, and even then I had to pause every couple of seconds to see what was on the screen. Could you please slow it down next time so we can actually understand what you are talking about?