Interesting idea… my first concerns would be mean reversion kicking in when combining different models, or overfitting when combining the same one. I'd have to try it out though
That is spot on! You explained it better than I did (when I mentioned "amplifying some weights", I guess that's better expressed as overfitting). Your concerns are exactly why I do not understand how it could work. I am also considering:
- Maybe the process of merging has some "unintended" side-effect that is the cause of the better results. (I don't yet know how to test this.)
- Maybe merging reduced previous overfitting in the network, by applying changes that were not present during training, leading to slightly better generalisation. (This could be tested by disrupting the network with noise, or "merging with a noisy model", and seeing if the effect persists; rough sketch below.)
What do you think?
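For concreteness, this is roughly the noise test I have in mind. It's a minimal sketch assuming two PyTorch models with identical architectures; the helper names, `alpha`, and `noise_scale` are just illustrative, not from any particular merging library:

```python
import torch


def merge_state_dicts(sd_a, sd_b, alpha=0.5):
    """Element-wise linear interpolation of two compatible state dicts."""
    return {
        k: (alpha * sd_a[k] + (1.0 - alpha) * sd_b[k])
        if sd_a[k].is_floating_point() else sd_a[k]
        for k in sd_a
    }


def noisy_copy(sd, noise_scale=0.01):
    """Copy of the weights with small Gaussian noise added,
    standing in for the 'noisy model' in the experiment."""
    return {
        k: (v + noise_scale * torch.randn_like(v))
        if v.is_floating_point() else v
        for k, v in sd.items()
    }


# Usage sketch (model_a, model_b and evaluate are placeholders):
# sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
# merged  = merge_state_dicts(sd_a, sd_b)               # the original merge
# control = merge_state_dicts(sd_a, noisy_copy(sd_a))   # merge with noise only
# model_a.load_state_dict(merged);  evaluate(model_a)
# model_a.load_state_dict(control); evaluate(model_a)
```

If the noise-only control scores about as well as the real merge, the gain probably comes from the perturbation itself (i.e. reduced overfitting) rather than from anything the second model "knows".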
It needs a fair bit of memory; I'm running on an M1 Air with 16GB. Which models have you tried? Start with a smaller one. How much memory do you have? Soon I'm going to show a lighter approach that, although slow, runs on a Raspberry Pi
@@yorkie4k totally! I'm now also looking into Llamafiles and SQLite-vec. Both really cool too, but LM Studio is still the simplest and most accessible I know. Do you use LM Studio just for the chatbot, or are you building something with it? Would love to learn more 😉
@@danielhabibio thanks! The coming videos should have an outdoor twist too 😉 hopefully it gets more people into the flow, and the woods are just awesome. P.S. I promise I'm not building a bunker in the woods to hide from our AI overlords 😉