Take your personal data back with Incogni! Use code bycloud at the link below and get 60% off an annual plan: incogni.com/bycloud

maybe we are all bots and the dead internet theory is true
Please create your own style of thumbnails and stop trying to mimic Fireship lol... I'm being honest rn, but the content in your videos doesn't feel as interesting/funny/easy-to-understand as his. Hope you take that as constructive criticism, because you do cover lots of cool topics that Fireship doesn't.
I disagree with the comments complaining that the video is too technical. I really like that you provide enough detail to roughly understand the technique, awesome video!
I found the pacing a bit off. In general very well edited and summarized information, but it's hard to keep track of all the vocabulary; personally I'd either need to linger longer on these details or get an even shorter overview of those aspects. I really like the style of Yannic Kilcher's paper reviews, but his videos are also 3 times as long, so in any case it's a tradeoff of what one prefers.
@@seriousbusiness2293 Honestly, I feel like it's because his target audience was different, and now it's more technical, so he'd need more time to explain those concepts instead of expecting a baseline understanding. But going into more detail would scale the video length, which would also hurt his YT channel, considering we all expect at best 5~15 minute videos from this channel. So, yeah, it's a trade-off.
@@w花b But the problem is that bycloud tries to mimic Fireship's thumbnail style to lure in Fireship viewers, then throws them off with 10+ minute videos of overly technical stuff, when those viewers prefer ~5 mins of mixed interesting, meme-y, simplified content instead.
But imagine if those ML models were CNNs and you could see how the kernels adapt to the context of the input in real time; wouldn't that actually be easier to interpret?
I love that you tell us how the method in the paper roughly works. A lot of YouTube channels just say this new technique is better without any explanation and just show results, so I have to skim the paper to get the gist of it.
I would never have suspected that this video would help me write my PhD, but "compression heuristic" is exactly the term I needed to express my idea but didn't know existed. Thank you!
Good, short, dense overview of an even denser subject. Still waiting for the paper that modularizes all these component processes and flows, then runs training against all the permutations to bootstrap itself.
Perfect amount of complexity, please do not make your longer videos like this any simpler. I'm not involved in any form of computer science, but I've kept up with AI since TensorFlow was brand new, and I understood almost everything first try.
Whenever a new architecture takes over, the tech companies heavily invested into developing hardware specifically optimized for the transformer architecture are gonna be sad.
Dude, no kidding, I came up with something similar a month ago, in concept. I'm afraid I have a limited number of insights in my lifetime, and without time to pursue them I will never make any difference in the world. 😢 But hey, that also proves, to me at least, that my math intuition is on point. 😅
If you thought of it, other people have thought of it or will, so don't worry about not being the one who gets credit; what matters is that the idea is in the memosphere.
Well, a year or so ago I had thoughts about going into ML, but you have lost me on this one. 👍 I guess it's only gonna get more complicated from now on.
Nowadays it's easier to learn ML than ever. You should start with something simple enough that you already understand around 80% of it, with only 20% actually being new. There are lots of freely shared classes from MIT, Stanford, etc., plus tutorials, examples, and code documentation. First get a general yet simple view of NNs, then choose what you'd like to specialize in: image recognition, text, or something else.
Imagine giving money to a service for a sense of security because it is now the status quo to let every substantial company out there infringe on your privacy rights. Just a thought. What parallel universe is this?
A single brain neuron needs something like 5 layers or so to encode its behavior. So this kind of maps each node to something like a neuron. I know biological features map poorly to neural nets, but neurons in the brain change how and when they fire as the brain learns.
@@bycloudAI btw this was posted on r/singularity where there are more normies - obviously you need normies if you want growth, but any technical video is automatically going to have a very niche audience, understandably, so you probably don't mind that as well. I mean, I watch your stuff and most of it goes over my head, but it's interesting regardless. Just letting you know the feedback here is kind of skewed.
@@kingki1953 Actually, the dead internet is really lame right now. You can spot these comments from a mile away: "Your videos are always so informative and interesting! Thank you for that! Thank you for your work! Your videos are always top notch! Always a pleasure to watch your videos! I will be looking forward to new episodes!"
Guys, I'm in high school and I'm trying to choose a career path. My no. 1 choice, considering the things I like and that I'm good at, is becoming an AI researcher. Can anyone in the academic world tell me if it would be a fun job or not?
It definitely is. But the field is getting increasingly complex, fast-paced, and hyper-competitive. I'd recommend studying computer science and mathematics, since you will not be able to compete in this field without a very strong mathematical background. Other than that, go for it. I'm a researcher in parallel processing and numerical high-performance computing. It is definitely fun and rewarding, but be prepared for a painful journey.
AFAIK, output context windows are not a property of the models themselves: the model is just called once for every token it has to generate, and you can repeat that process a million times if you want. It's just not useful once the LLM has produced so much text that its prompt falls out of the context window, so in the early days the "output window" was simply set to the model's context window. Nowadays it's probably capped for economic reasons: LLMs get more expensive the longer the input is, so by limiting the output window, providers force you to pay for tokens several times, once as the model's output, and again as input to the subsequent calls.
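The loop described above can be sketched in a few lines; this is a toy illustration, not a real LLM API, and `toy_model` is a made-up stand-in for a forward pass:

```python
# Minimal sketch of autoregressive decoding. The "output window" is just the
# loop bound max_new_tokens, imposed outside the model; the model's only real
# hard limit is its context window. toy_model is a hypothetical stand-in.

def toy_model(tokens):
    # Stand-in "LLM": predicts the next token as last token + 1.
    # A real model reruns over the whole sequence each call,
    # so cost grows with len(tokens) on every step.
    return tokens[-1] + 1

def generate(prompt, max_new_tokens, context_window):
    tokens = list(prompt)
    for _ in range(max_new_tokens):        # the externally imposed output cap
        if len(tokens) >= context_window:  # the model's actual hard limit
            break
        tokens.append(toy_model(tokens))   # one model call per generated token
    return tokens

print(generate([1, 2, 3], max_new_tokens=4, context_window=100))
# → [1, 2, 3, 4, 5, 6, 7]
```

Note that nothing in `toy_model` knows about `max_new_tokens`; the cap lives entirely in the calling loop, which is the commenter's point.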
The quadratic complexity is not the main problem of current LLMs. It's that they are dog sh*t at reasoning (and tasks that depend on it) and a better scaling with context length won't solve that.