Optimizing my prey vs predators project for future bigger simulations.
00:00 Introduction
02:00 Data optimization
04:00 Neural Network optimization
05:40 Space partitioning
06:30 Multithreading
As a partially colorblind person, I find the new colors harder to differentiate than the old ones. I would recommend that you use light orange and dark blue, as they are the most easily distinguished colors across all forms of color blindness. I'm still glad that you bothered to think about colorblind people in the first place, though :)
There are free online utilities which transform a regular photo into how it would look to someone with color blindness. A quick search should find them.
Hey jack, my brain always registers green and orange as the same, what do you think that means? Haha I'm not color blind but I often say green when I look at orange & vice versa. Maybe you have a wild perspective that I don't see. But I also agree, for some reason the colors were off-putting for me, compared to the first.
@Al-tg7ok I also see green and orange as similar sometimes, so you might have protanopia/protanomaly (difficulty perceiving red light), or deuteranopia/deuteranomaly (difficulty perceiving green light). These two forms of color deficiency, especially protanopia, are surprisingly common among men specifically, because the biology of the eye is slightly different between genders. I myself have mild protanomaly, which causes me to mix up lots of colors. People seem to think that when someone is red-green color deficient they simply see all shades of red and green as identical, but in reality having a color deficiency is much more complicated and affects many more colors than people seem to realize, and it can vary even between people who have the same type of deficiency. Another big factor of color deficiency that people often don't realize is that anyone, even someone who is entirely colorblind, can differentiate between dark and light colors. Even though I often mix up red and green, I can easily differentiate between a light green and a dark red.
I'm also partially red/green colorblind and I agree, the previous video's colors also had a lot of contrast in brightness in comparison with these colors which are a similar brightness.
i want to suggest something: adding objects and obstacles into the "arena" so that the "agents" can evolve to use those to their advantage, similar to how animals have evolved to use certain land features for cover or nesting
That's indeed interesting, but it also sounds like another type of object that needs recognizing. No idea if difficulty has to scale the performance issues linearly, in principle you could probably get away with less impact after changing the architecture the first time, but I'm not sure ^^
@EliasMheart As far as I understand, it is using a neural network for vision. This means that it would not be much of a problem to implement physical objects in code, but the network would have to learn to recognize them.
As ml engeener, I don't think that for such small networks topology is crucial for interesting behaviour. Even with fixed topology (but with mutating weights) you can get impressive results in supervised or reinforcement learning tasks (see the hide-and-seek multi-agent project from OpenAI, not evolution tho). But! With fixed topology you can store the weights simply as matrices and do the forward pass as a matrix multiply. With a synchronized step for all agents, you can even step all environments at once (concatenate the weights from all agents, matrix multiply on the gpu). Usually with such a setup, MASSIVE simulations are possible.
@WsprWndrr Yeah, but usually the evolutionary community also advocates for network plasticity, like in the NEAT algorithm, where you can add neurons, remove them, add connections, that stuff. While in conventional deep learning, topology is usually fixed, as this makes a lot of optimizations possible (gpu, autodiff frameworks like pytorch, jax, etc)
an ML engineer that does not know how to spell "engineer" 🤨 That said, heck yeah, using a matrix for the connections and using a compute shader would be a huge win.
@paulpach well, I am not a native speaker, and I don't even work in English on a daily basis (I know enough to read papers and docs, not to write without typos), lol, pathetic
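A minimal sketch of the fixed-topology idea from this thread, assuming every agent shares the same layer shapes. The function names (`matvec`, `forward`, `step_all`) and the tanh activation are illustrative assumptions, not taken from the video's code. Plain Python lists stand in for the matrices; on a GPU the per-agent loop in `step_all` is what would be stacked into one batched multiply.

```python
import math


def matvec(weights, inputs):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def forward(layers, inputs):
    """Forward pass through a fixed-topology network.

    `layers` is a list of weight matrices, one per layer.
    tanh is an assumed activation, chosen only for illustration.
    """
    activations = inputs
    for weights in layers:
        activations = [math.tanh(v) for v in matvec(weights, activations)]
    return activations


def step_all(agent_layers, agent_inputs):
    """One synchronized step for all agents.

    Because every agent has identical matrix shapes, the work per agent
    is identical; this loop is the part a batched GPU multiply would replace.
    """
    return [forward(layers, x) for layers, x in zip(agent_layers, agent_inputs)]
```

With a mutable topology (as in NEAT) the shapes differ per agent, so this stacking is no longer straightforward, which is the trade-off the replies point out.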
Really cool to see such a concrete example of optimization done at the right time -- when it's needed. You could've easily just hand waved this in the next video, but it's awesome that you took the time to make this interstitial video 🙌.
You can optimize your multithreading even further by taking into account a "complexity rating" while queueing up tasks: Long tasks being executed at the end would currently block the frame until the last long task finishes. If you can rate how long tasks will take, assigning the longer tasks to workers first will improve consistency and speed of frames. You can do this either by hand "guessing", or dynamically using some sort of profiler and then assigning the tasks that took long on one frame a higher priority on the next.
I think the creatures are evenly distributed enough so each thread will execute in the same time. (You'd need a large population in one grid to get a long pole which is unlikely).
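The "longest tasks first" suggestion above can be sketched as a greedy assignment. This is only an illustration under assumptions: `assign_longest_first` is a hypothetical name, the costs would come from a guess or from the previous frame's timings, and a static assignment stands in for workers pulling from a sorted queue.

```python
import heapq


def assign_longest_first(task_costs, worker_count):
    """Greedy longest-processing-time-first assignment.

    task_costs: estimated cost per task (e.g. last frame's measured time).
    Returns (assignment, loads): task indices per worker, total cost per worker.
    """
    # Min-heap of (current load, worker id): the least loaded worker pops first.
    heap = [(0.0, worker) for worker in range(worker_count)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(worker_count)]

    # Hand out the most expensive tasks first.
    order = sorted(range(len(task_costs)), key=lambda i: task_costs[i], reverse=True)
    for task in order:
        load, worker = heapq.heappop(heap)
        assignment[worker].append(task)
        heapq.heappush(heap, (load + task_costs[task], worker))

    loads = [sum(task_costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads
```

If the per-cell workloads really are evenly distributed, as the reply suggests, the ordering makes little difference; it matters when one task is much longer than the rest.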
It seems like the optimization with the neural networks was needed due to the sparse nature of your neural network. But I wonder, since GPUs nowadays are very optimized to perform matrix multiplications, if it would be faster to have the neural network instead be fully connected but with the unwanted connections' weights set to 0 and frozen during training, so that the weights for each layer could become a 2d array and the multiplication could be done on the gpu. But then again I don't think the neural network here is the bottleneck anyway.
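A rough sketch of the "dense matrix with frozen zeros" idea, assuming the networks change by random mutation rather than gradient training. `mutate_masked`, the mask layout, and the Gaussian noise are all illustrative assumptions.

```python
import random


def mutate_masked(weights, mask, sigma, rng):
    """Return a mutated copy of a dense weight matrix.

    mask has the same shape as weights: 1 marks a real connection,
    0 marks an unwanted one. Masked entries are multiplied by 0, so they
    stay at zero no matter how often the matrix is mutated.
    """
    return [
        [(w + rng.gauss(0.0, sigma)) * m for w, m in zip(w_row, m_row)]
        for w_row, m_row in zip(weights, mask)
    ]
```

Example: with `mask = [[1, 0], [0, 1]]` only the diagonal weights ever change, while the layer can still be fed to a dense matrix multiply.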
Hi Pezzza, Your first video was really great, it got me motivated to play a bit with evolving agents too. I did notice the exact same problem you have here: It gets slow with a lot of agents, and the majority of the time is spent on calculating the networks. The solution that worked for me was to just do all network calculations on the gpu, this allowed 60k+ agents in realtime (depending on net complexity of course). It adds more complication with the memory management, but I would assume it is the only realistic solution to get a high agent count in realtime, otherwise just the number of floating point operations required for the network will probably hit the limit of the cpu.
I know nothing about coding or programming, but your explanations are very clear and easy to understand! Also big props for the production quality. Those graphics are really nice and help a lot in conveying what you are doing. Keep it up!
Dunno if this will help but there's been a breakthrough in neurology by the University of Tokyo where they appeared to have identified how the brain achieves self-awareness. This may be worth investigating for development of better neural networks. In short, most neural networks are monodirectional which was believed to be how synapses work. But what has been found is that along the network are clusters of bidirectional synaptic nodes that compare the inputs from multiple monodirectional inputs and create a self-contained loop with one output. This appears to be a weighting system whereby the final output that is fed into the rest of the network is the one which didn't get cancelled out by the cross-connections within these bidirectional nodes. When you look at this from experience, this is how it is possible for you not to notice a headache when you stub your toe as it generates a stronger reaction. Or how a room can be so noisy that it's not possible to focus on a particular task or thought. The current neuroscience equation for brain activity is r = f(s) but this discovery has them investigating an additional theory of sentience being C = g(r) where r is brain activity and C is a measurement of consciousness.
I loved the explanations of the optimizations. So informative and concise! Your voice is very soothing. I wish you had videos simply explaining different algorithms, computer science students around the world would eat that up with the quality of these animations and the production quality.
I love the animation! I did one of these years ago using the Qt Mouse Sprite demo as a base. One thing my kids loved in elementary school was choosing a mouse tribe to follow so I gave the mice different colours from a small palette such that there were at least 10 mice of each colour. They had different colour ears for boy, girl, diseased (green), old (white), then different sizes for child and adult and different rules for interactions between all characteristics. They would watch an initial world-building and different colours dying out or thriving and then the game would stop and they could type in their name to choose which colour from the remaining mice they thought would win by surviving longest. Some runs lasted hours!
Yes I am sorry for this, optimizing took me quite some time and I don't know how much more I will need to run and tweak the simulation so I preferred to do this that way
Your tasks & threads representation at 7:11 is beautiful, what language did you use to write this little lib? I could see myself implementing something similar in CSS/JS
Storing the NN as a matrix will be much more efficient, since computing the next layer will be as simple as activation(input x weights) which will be way more efficient than manual looping if performed on a GPU. That alone might give you a significant performance boost. Also, if you wish to optimize the k-nearest neighbor queries, you can look into Quad-Trees
That's awesome, it reminded me of my own search for performance improvements in my projects at work)))) Event processing is sometimes an interesting task) The conveyor with parallelized stages rules!))))
Really nice video. I have a little question, when you already implemented multithreading into the simulation, why didn't you just use the gpu instead of the cpu, since the gpu is made for parallel processing?
i used to watch your ant sim vids and i loved them, but this is on another level! the video is really well made and feels really professional, honestly you're one of my favorite coding channels, keep it up
As others have said, the visualizations are great, and the project too : ) I was wondering, does the rendering have any reasonable impact on performance?
Honestly, respect to you & your work. Your first video awoke a passion for neural network theory and I even tried to reproduce it in Unity. By the way, what do you use for your simulation? Is it from scratch or do you use a game engine?
This is very well presented, I love your style and I love this video concept, can’t wait to see what else you do with it! What are you using for your animations while you are explaining?
I don't know if it was for oversimplification or actual implementation, but you showed updating agents one by one. Isn't there any way to use vectorization and batch process to update a bunch of them simultaneously?
Cool project! A couple of suggestions, reading a bit between the lines: Sounds like you did a lot of guesswork on the optimizations - using a profiler would discover the actual hot paths easily. Your reasoning for a graph representation sounds weird - a matrix representation is faster and if anything easier to update. Not aware of any reasons to use pointers besides saving memory on sparse graphs with a lot of nodes. Reorganizing agents from AoS to SoA to ease threading seems weird - your problem domain seems trivially data parallel: Just split the agents into as many chunks as there are threads and proceed normally. Use a separate output buffer if data races are a concern - flip the input and output buffers for the next frame. Next step: Cuda/OpenCL ^^
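The "split into chunks, write to a separate output buffer, then flip" suggestion could look roughly like this. It is a sketch only: `chunk_ranges`, `step`, and the `update(i, read_buf)` callback are hypothetical names, and in CPython the threads mainly illustrate the data layout rather than giving a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(count, chunks):
    """Split range(count) into `chunks` contiguous, near-equal (start, end) ranges."""
    base, extra = divmod(count, chunks)
    ranges, start = [], 0
    for i in range(chunks):
        end = start + base + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def step(read_buf, write_buf, update, thread_count):
    """Advance one frame.

    Every worker may read any slot of read_buf (last frame's state) but
    writes only its own slice of write_buf, so no two threads ever write
    the same slot.
    """
    def work(bounds):
        start, end = bounds
        for i in range(start, end):
            write_buf[i] = update(i, read_buf)

    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        list(pool.map(work, chunk_ranges(len(read_buf), thread_count)))

    # Flip the buffers: what was written becomes next frame's input.
    return write_buf, read_buf
```

Usage: `state, scratch = step(state, scratch, update, 4)` once per frame, with `scratch` preallocated to the same length as `state`.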
I've just been talking (I'm a noob) about caching and lowering detection range and the video appears. Great noob-friendly video. You should create a learning program and sell it!
These were most of the optimizations I also ran through when I was doing my version of this in Rust. My NN are just forward pass matrix mults with a relu activation though
A few ideas: 1) instanced rendering, hopefully parallel push data to a command buffer if you can, depending on what lang you’re using. 2) for spatial partitioning, you don't need a full blown physics solution. Use a quadtree, or even simpler: fixed cell, where you hash entities to a cell id via position. Say a cell is 10 x 10 units and an entity is at x:9.5,y:45 -> cell 1,5 (counting cells from 1; 0,4 with zero-based indices). With a fixed grid size this can be mapped to a 1 dimensional array. Honestly a multi-value hashmap is all you need for your simulation. 3) you don't need to raycast. Detect entities nearby, then use the dot product. A ratio using dot products and distance will resolve line of sight. 4) obstacles can be navigated once detected by influencing the steering by shifting the direction towards its perimeter. 2D line - polygon intersection is pretty simple.
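Points 2 and 3 above can be sketched as follows. The function names, the non-negative coordinate assumption, and the view-cone parameters are all illustrative. Cell indices here are zero-based, so the example entity at (9.5, 45) lands in cell (0, 4), and `in_view` is only a distance-plus-angle test that does not handle occlusion by obstacles.

```python
import math

CELL_SIZE = 10.0  # the 10 x 10 cell from the example above


def cell_of(x, y, cell_size=CELL_SIZE):
    """Zero-based (column, row) of the cell containing (x, y)."""
    return int(x // cell_size), int(y // cell_size)


def cell_index(x, y, grid_width, cell_size=CELL_SIZE):
    """Flatten the cell coordinate into a 1D array index.

    grid_width is the number of cells per row; coordinates are assumed
    to be non-negative and inside the grid.
    """
    cx, cy = cell_of(x, y, cell_size)
    return cy * grid_width + cx


def in_view(pos, facing, target, max_dist, half_angle):
    """True if target is within max_dist and inside the view cone.

    facing must be a unit vector; half_angle is in radians.
    The dot product divided by the distance is the cosine of the angle
    between the facing direction and the direction to the target.
    """
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > max_dist:
        return False
    cos_angle = (dx * facing[0] + dy * facing[1]) / dist
    return cos_angle >= math.cos(half_angle)
```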
Two questions: 1) Are you using SIMD for this? 2) I had an earlier project where I had my data vectorized and then wanted to do some broad-phase spatial partitioning, but I ran into the issue that the partitioning would make the vectorization sort of worthless, since I'd either need to refer to elements with random access (which is not SIMD compatible) or else sort them all into "buckets" for each partition (which is expensive and has issues at the partition boundaries). I eventually lost interest in the project and scrapped it, since it was just messing around with SIMD for physics and barebones AI, but I am interested in ideas or common approaches.
Since this project seems to be written in C++, it should be possible to guide the compiler to perform vectorisation automatically. Tweaking the loops might be necessary to get the compiler to vectorise them, though.
@iippari7 Vectorization is the process he described of arranging similar data contiguously, such as the process he described in the "data optimization" section of the video; changing his vector of structs to a series of vectors with the structs now indexing into the vectors. The compiler won't do it for you, but vectorizing the data isn't hard to pull off if your interfaces are appropriately abstracted. The problem comes afterwards when you have to start rearranging members of the vectorized data. For example, when one entity perishes then its data members in the vectors need to be marked as released somehow, so that they can be re-used by new entities that are spawning in. There are several strategies for this, but the difficulty is trying to find a set of solutions such that the overhead of managing the data doesn't encroach too much on the gains you get from vectorization.
@khatharrmalkavian3306 With vectorisation, I was referring specifically to the compile-time process of optimising the code to use vector-register-based (SIMD) instructions, such as the ones provided by AVX for x86. Given the appropriate compiler flags and source code, most C++ compilers can optimise code to utilise these instructions. The difficulty I spoke of comes from the fact that an understanding of the compiler's capabilities is needed to properly guide it into optimising code to use these SIMD instructions. That being said, you are indeed correct regarding the balance between data arrangement and the overhead from managing the arrangement :)
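One of the "several strategies" for reusing released slots mentioned in this thread is a free list. The sketch below assumes a tiny struct-of-arrays store; `AgentStore` and its fields (`x`, `y`, `energy`, `alive`) are hypothetical and not taken from the video.

```python
class AgentStore:
    """Struct-of-arrays storage with slot reuse via a free list."""

    def __init__(self):
        # One list per field instead of one object per agent.
        self.x = []
        self.y = []
        self.energy = []
        self.alive = []
        self.free_slots = []  # indices of released slots, ready for reuse

    def spawn(self, x, y, energy):
        """Store a new agent, reusing a released slot when one exists."""
        if self.free_slots:
            slot = self.free_slots.pop()
            self.x[slot] = x
            self.y[slot] = y
            self.energy[slot] = energy
            self.alive[slot] = True
        else:
            slot = len(self.x)
            self.x.append(x)
            self.y.append(y)
            self.energy.append(energy)
            self.alive.append(True)
        return slot

    def release(self, slot):
        """Mark a slot as released so a later spawn can take it over."""
        if self.alive[slot]:
            self.alive[slot] = False
            self.free_slots.append(slot)
```

The cost of this choice is that dead slots stay in the arrays, so every loop has to skip entries where `alive` is false. Swapping the last live element into the hole keeps the data dense but changes indices, which is the kind of management overhead the comment refers to.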
I didn't understand how object-oriented things like that worked on such a high level, I assumed that it would only load what you needed, and I figured that object storage was optimal, as I figured that they would be optimized in languages like Javascript which are designed for them.
I had the same performance issue in my simulation project. One big problem was that creatures seeing each other leads to a quadratic number of iterations, since each creature has to check its distance to every other creature. I thought about the solution of internal square blocks in which creatures can enter temporarily, so they only have to iterate through creatures in neighbour blocks, which could keep the total number low. When they move, they enter a new block. But I didn't test that out yet.
@miquellluch1928 can you explain how distance would make O(n)? I'm picturing a sort of distances but then I'd need to create distance lists for every object.
@TheRainHarvester You don't check for every creature, you have some preprocessing step eliminating most pairs. Like explained in the video by separating the world in small worlds which only perform collision checks between them. Or create buckets that only check themselves and neighboring ones. There must be thorough explanations of this existing online as this is quite a common problem and can be used outside collision checking as well. As long as you have some other information that tells you you can discard checks.
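The "buckets that only check themselves and neighboring ones" approach from this thread can be sketched like this. `build_buckets` and `neighbors_within` are hypothetical names, and the sketch assumes the search radius is no larger than the cell size, so the 3 x 3 block of cells around an agent is guaranteed to contain every candidate.

```python
from collections import defaultdict


def build_buckets(positions, cell_size):
    """Preprocessing step: group agent indices by the grid cell they occupy."""
    buckets = defaultdict(list)
    for i, (x, y) in enumerate(positions):
        buckets[(int(x // cell_size), int(y // cell_size))].append(i)
    return buckets


def neighbors_within(i, positions, buckets, cell_size, radius):
    """Indices of agents within `radius` of agent i.

    Only the agent's own cell and the 8 surrounding cells are scanned,
    which is valid as long as radius <= cell_size.
    """
    x, y = positions[i]
    cx, cy = int(x // cell_size), int(y // cell_size)
    found = []
    for nx in (cx - 1, cx, cx + 1):
        for ny in (cy - 1, cy, cy + 1):
            for j in buckets.get((nx, ny), ()):
                if j == i:
                    continue
                px, py = positions[j]
                if (px - x) ** 2 + (py - y) ** 2 <= radius * radius:
                    found.append(j)
    return found
```

Building the buckets is a single pass over all agents, and each query then touches only the agents in nine cells. With a roughly even spread of creatures, the total work grows about linearly with the population instead of quadratically.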
I imagine a simulation of the evolution of predator eaters. As a predator scans and fails to see its preferred food, it has x/20 chance of evolving to eat others of its own kind, where x is the number of units of its own kind that it can see. Maybe even throw in a special case for the "plants" in this, like if a plant sticks around long enough it will increase the chance of another plant growing around there eventually. This would give a simulation of how plants reproduce and grow over the years.
Make agents have hearing so they evolve to be more quiet. Also add a day-night cycle, and when it's night make there be an attribute of how stealthy an agent is, so even if a ray hit it the agent would not see it