
LLM Attention That Expands At Inference? Test Time Training Explained 

bycloud
160K subscribers
44K views

Published: 21 Oct 2024

Comments: 155
@bycloudAI · 2 months ago
Take your personal data back with Incogni! Use code bycloud at the link below and get 60% off an annual plan: incogni.com/bycloud. Maybe we are all bots and the dead internet theory is true
@mine.moment · 2 months ago
Please create your own style of thumbnails and stop trying to mimic Fireship lol... I'm being honest rn, but the content in your videos doesn't feel as interesting/funny/easy-to-understand as his. Hope you take that as constructive criticism, because you do cover lots of cool topics that Fireship doesn't.
@Miaumiau3333 · 2 months ago
I disagree with the comments complaining that the video is too technical. I really like that you provide enough detail to roughly understand the technique, awesome video!
@FireOfGott · 2 months ago
Agreed, this is very approachable to someone who knows some architecture fundamentals!
@seriousbusiness2293 · 2 months ago
I found the pacing a bit off. In general very well edited and summarized information, but it's hard to keep track of all the vocabulary; personally I'd either need to linger longer on those details or get an even shorter overview of those aspects. I really like the style of Yannic Kilcher's paper reviews, but his videos are also 3 times as long, so in any case it's a tradeoff what one prefers.
@xClairy · 2 months ago
@seriousbusiness2293 Honestly, I feel like it's because his target audience was different, and now it's more technical, so he'd need more time to explain those concepts instead of expecting a baseline understanding. But going into more detail would blow up the video length, which would also hurt his YT channel, considering we all expect at best 5~15 minute videos from this channel. So, yeah, it's a trade-off.
@w花b · 2 months ago
Yeah, they might as well just watch Fireship, because that's what they're asking for.
@mine.moment · 2 months ago
@w花b But the problem is that bycloud tries to mimic Fireship's thumbnail style to lure in Fireship viewers, then throws them off with 10+ minute videos of overly technical stuff, when they prefer ~5 minutes of mixed interesting, meme-y, simplified content instead.
@MrJaggy123 · 2 months ago
Turn all the hidden states into ML models? That scream of pain you all just heard was from the interpretability researchers ;)
@QuantumConundrum · 2 months ago
OK, but then their employment is secured forever LOL
@anthonybustamante5736 · 2 months ago
We need black boxes for the black boxes!
@koktszfung · 2 months ago
But imagine if those ML models are CNNs and you can see how the kernels adapt to the context of the input in real time, wouldn't that actually be easier to interpret?
@naumbtothepaine0 · 2 months ago
@koktszfung CNNs are more like DL, ML models are simpler
@revimfadli4666 · 2 months ago
@naumbtothepaine0 Which simpler ML models? XGBoost? SVM? Because CNNs are ML models too
@Eianex · 2 months ago
In conclusion, Trouble in Terrorist Town is cooler than some transformers and some snakes.
@flamakespark · 2 months ago
Another day, another attempt to re-invent LSTMs
@babyjvadakkan5300 · 2 months ago
What's that now?
@zyansheep · 2 months ago
@babyjvadakkan5300 A type of RNN that Google used to use (or still does?) for language translation before we got transformers
@Bencurlis · 2 months ago
It is more of a generalization of both LSTMs and Attention, it is theoretically much more powerful IMO
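For the curious, here is roughly how that reads in the TTT framing (a simplified sketch in my own notation, not the paper's exact equations): the hidden state is the parameter matrix $W$ of a small inner model $f$, and every incoming token triggers one gradient step on a self-supervised reconstruction loss before the output is read out:

$$
\ell(W; x_t) = \lVert f(\theta_K x_t;\, W) - \theta_V x_t \rVert^2, \qquad
W_t = W_{t-1} - \eta \,\nabla_W \ell(W_{t-1}; x_t), \qquad
z_t = f(\theta_Q x_t;\, W_t).
$$

With a linear inner model $f(k; W) = Wk$, the gradient step becomes the outer-product update $W_t = W_{t-1} - 2\eta\,(W_{t-1}k_t - v_t)\,k_t^{\top}$, a matrix-valued recurrence in the same family as linear attention and fast-weight RNNs; with an MLP as the inner model the state update is strictly more expressive, which is roughly the sense in which it can be read as generalizing both.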
@keypey8256 · 2 months ago
It's definitely an interesting idea
@heavenrvne888 · 2 months ago
holy shit this method is so interesting. and the way they encapsulated the entire idea into the title LOL!
@papakamirneron2514 · 2 months ago
Please make a video explaining all of these terms, apart from that, keep the technical videos coming!
@karlkastor · 2 months ago
I love that you tell us how the method in the paper roughly works. A lot of YouTube channels just say this new technique is better without any explanation and just show results, so I have to skim the paper to get the gist of it.
@krollo8953 · 2 months ago
Yup, makes you feel like you're learning something rather than being handed information without enough context
@FunBotan · 2 months ago
I would never have suspected that this video would help me write my PhD, but "compression heuristic" is exactly the term I needed but didn't know to express my idea. Thank you!
@manuelburghartz5263 · 2 months ago
This channel explaining AI and using anime references in the visuals is exactly what I needed. Great video!
@OumarDicko-c5i · 2 months ago
As an IA, thank you for teaching me this, I will use it to train myself
@ginqus · 2 months ago
intelligently artificial
@IN-pr3lw · 2 months ago
@ginqus inteligencia artificial
@truongao5425 · 2 months ago
intelligent anti-africa
@TheRealUsername · 2 months ago
@truongao5425 😂 troll
@mikairu2944 · 2 months ago
"too technical for this video" man you lost me at the thumbnail
@cdkw2 · 2 months ago
me too bro, yet I still watched the entire video 💀
@Dannydrinkbottom · 2 months ago
My brother is speaking Greek
@OperationDarkside · 2 months ago
Let's put transformers into transformers. Maybe we end up with baby transformers.
@revimfadli4666 · 2 months ago
Ah yes, hot transformers-in-transformers action
@divandrey-u3q · 2 months ago
As always, thank you for the video! I really appreciate the amount of technical detail here. Don't know why other people complain but I love it!
@FaultyTwo · 2 months ago
"Mom! They are adding more weights to the models again!"
@TheNewton · 2 months ago
Good, short, dense overview of an even denser subject matter. Still waiting for the paper that modularizes all these component processes and flows, then runs training against all the permutations to bootstrap itself.
@cdkw2 · 2 months ago
2:32 Waiting for bycloud to be on that page like the others!
@DarrenReidAu · 2 months ago
It’s trainable models all the way down! Great video, thanks!
@athul_c1375 · 2 months ago
It's some mamba jamba
@XenoCrimson-uv8uz · 2 months ago
How do we know the ones complaining about the bots in the YouTube comments aren't bots themselves?
@Alice_Fumo · 2 months ago
I have definitely seen bots complain about bots before. In fact, you could also be a bot. Who knows at this point?
@picmotion442 · 2 months ago
I might be a bot
@leftybot7846 · 2 months ago
I'm definitely not a bot, what a stupid idea.
@turgor127 · 2 months ago
Ban both then. Spamming is annoying either way.
@Cloudruler_ · 2 months ago
The interesting thing is it's probably cheaper for a bot to spam "bot" than to create LLM comments.
@marshallodom1388 · 2 months ago
I got up to 6 minutes and loved the ride! Gonna have a snack and p and dive right back in!
@SimGunther · 2 months ago
Audience: Less reading, more technical content! Also audience: AAAAAAHH, MY EYES! TOO TECHNICAL FOR MY EYES AND EARS! 😢
@QuantumConundrum · 2 months ago
More videos like this, please.
@JorgetePanete · 2 months ago
6:08 it resembles Trouble in Terrorist Town
@HarperChisari · 2 months ago
TTT is literally short-term memory. Wild.
@heavenrvne888 · 2 months ago
that intro was amazing
@StefanReich · 2 months ago
Super well explained. And full of memes
@registered_dodo1743 · 2 months ago
I love words.
@frazuppi4897 · 2 months ago
great video, but transformers in practice do not have quadratic complexity, only if you implement it the vanilla way
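For readers wondering what that means in practice, here is a toy numpy sketch (my own illustration, not from the video) of the online-softmax trick that memory-efficient attention kernels such as FlashAttention build on: the arithmetic is still O(n²) in sequence length, but the full n×n score matrix is never materialized, so peak memory grows only linearly.

```python
import numpy as np

def blockwise_attention(Q, K, V, block=128):
    """Attention without materializing the full n x n score matrix.

    Scores for one block of keys at a time are folded into a running
    (online) softmax, so peak memory is O(n * block) instead of O(n^2).
    Time is still quadratic in n.
    """
    n, d = Q.shape
    out = np.zeros_like(V, dtype=np.float64)   # running weighted sum of values
    m = np.full(n, -np.inf)                    # running max logit per query
    l = np.zeros(n)                            # running softmax denominator per query
    for start in range(0, n, block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        S = Q @ Kb.T / np.sqrt(d)              # (n, block) scores for this key block only
        m_new = np.maximum(m, S.max(axis=1))
        P = np.exp(S - m_new[:, None])
        scale = np.exp(m - m_new)              # rescale old accumulators to the new max
        l = l * scale + P.sum(axis=1)
        out = out * scale[:, None] + P @ Vb
        m = m_new
    return out / l[:, None]

# Sanity check against the naive quadratic-memory version.
rng = np.random.default_rng(0)
n, d = 512, 64
Q, K, V = rng.normal(size=(3, n, d))
S = Q @ K.T / np.sqrt(d)
naive = np.exp(S - S.max(axis=1, keepdims=True))
naive = naive / naive.sum(axis=1, keepdims=True) @ V
assert np.allclose(blockwise_attention(Q, K, V), naive)
```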
@spoonikle · 2 months ago
Earth shattering.
@guilhermecastro3671 · 2 months ago
Cool video, for a beginner all these terms together seem very technical, can someone suggest a playlist to learn more in depth about these topics?
@jasonhemphill8525 · 2 months ago
What part are you struggling with?
@sh4ny1 · 2 months ago
4:11 Why not use a wavelet transform for this? I think it would be useful here since
@fnytnqsladcgqlefzcqxlzlcgj9220 · 2 months ago
Perfect amount of complexity, please do not make your longer videos like this more simple. I'm not involved in any form of computer science, but I've kept up with AI since TensorFlow was brand new and I understood almost everything first try
@ismailnejjar · 2 months ago
Love the video!!
@flinkstiff · 2 months ago
Bumblebee is my favorite
@CraftMine1000 · 2 months ago
Training on test data... unless I severely misunderstand this, I'm just going to say: "yikes, nope, get out, and don't come back"
@Ryu-ix8qs · 2 months ago
Good video, thanks
@jondo7680 · 2 months ago
I want a TTT-Linear (T) with TTT-MLP (M) as its inner loop.
@dhillaz · 2 months ago
I know some of these words!
@noctarin1516 · 2 months ago
Nahh they actually cooking with this architecture though
@Dom-zy1qy · 2 months ago
Whenever a new architecture takes over, the tech companies heavily invested in developing hardware specifically optimized for the transformer architecture are gonna be sad.
@samarthpatel8377 · 2 months ago
Sooooo many bot comments!
@bolon667 · 2 months ago
Posting innocent comments to change them into ads later
@samarthpatel8377 · 2 months ago
@bolon667 I think you are right. The comments I noticed earlier are gone now?
@someonetrustme161 · 2 months ago
so nobody's gonna talk about how we just got rickrolled at 3:43?
@narpwa · 2 months ago
my brain is exploding send help
@ricardocosta9336 · 2 months ago
Dude, no kidding, I came up with something similar a month ago. In concept. I'm afraid I have a limited number of insights in my lifetime, and without time to pursue them I will never make any difference in the world. 😢 But hey, that also proves, to me at least, that my math intuition is on point. 😅
@ccash3290 · 2 months ago
A lot of people have zero insights. It's important to work on your ideas to test them in reality
@anywallsocket · 2 months ago
If you thought of it, other people thought of it or will, so don’t worry about not being the one who gets credit; what matters is that the idea is in the memesphere
@scientificaly_restful_one · 2 months ago
Well, a year or so ago I had thoughts about going into ML, but you have lost me on this one. 👍 I guess it's only gonna get more complicated from now on.
@kamilbxl6 · 2 months ago
Nowadays it's easier to learn ML than ever. You should start with something simple enough that you understand around 80% of it, with only 20% being actually new to you. There are lots of free shared classes from MIT, Stanford, etc., lots of tutorials, examples, code documentation. First get a general yet simple view of NNs, then choose what you'd like to specialize in: image recognition, text or something else
@David-lp3qy · 2 months ago
MAMBA IF YOU CAN HEAR ME PLEASE SAVE US
@koktszfung · 2 months ago
Wouldn't this model be slow in operation if it has to train on the context?
@Acceleratedpayloads · 2 months ago
This looks like Block-Recurrent Transformers by DL Hutchins
@quocanhnguyen7275 · 2 months ago
I tried to read this when you wrote about it in your newsletter, but it was not an easy paper
@krollo8953 · 2 months ago
Lol that's an intense amount of memeage
@Vagabundo96 · 2 months ago
This is crazy
@FenrirRobu · 2 months ago
Though didn't they warn us against meta-optimizers due to the alignment becoming impossible?
@bloomp7999 · 2 months ago
did I understand this?
@bobsoup2319 · 2 months ago
Bro this model is too complicated to be simplified more. Keep up the complexity, it’s what makes it interesting
@4.0.4 · 2 months ago
It seems very convoluted, but I guess it should learn with less data? That could be good for startups that don't have big datasets.
@pladselsker8340 · 2 months ago
Imagine giving money to a service for a sense of security because it is now the status quo to let every substantial company out there infringe on your privacy rights. Just a thought. What parallel universe is this?
@-mwolf · 2 months ago
tell me the current paradigm is hitting a dead end without telling me
@LukasNitzsche · 2 months ago
Does this relate in any way to liquid time-constant neural networks?
@BooleanDisorder · 2 months ago
Next up is cisformers
@amafuji · 2 months ago
detransformers
@ginqus · 2 months ago
biformers
@anywallsocket · 2 months ago
formers
@BooleanDisorder · 2 months ago
@anywallsocket forms
@sarveshpadav2881 · 2 months ago
performers?
@simonesborrinpz · 2 months ago
good videos 👍
@DanielJoyce · 2 months ago
A single brain neuron needs something like 5 layers or so to encode its behavior. So this kinda maps each node now to something like a neuron. I know biological features map poorly to neural nets, but neurons in the brain change how and when they fire as the brain learns.
@pedrogorilla483 · 2 months ago
I watched half of the video and this is too technical for me. I’m skipping this one. Congrats to everyone who understands this video!
@bycloudAI · 2 months ago
it's like RNNs' hidden states are just ML models. thanks for watching till halfway tho
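To make that one-liner concrete, here is a minimal numpy sketch (my own toy illustration of the idea, not code from the paper; the projections and learning rate are made up): the layer's "hidden state" is the weight matrix of a tiny linear model that takes one gradient step per token on a self-supervised reconstruction loss.

```python
import numpy as np

rng = np.random.default_rng(0)
d, lr = 16, 0.1

# Learned (here: random) projections, analogous to K/V/Q in attention.
theta_k = rng.normal(scale=d**-0.5, size=(d, d))
theta_v = rng.normal(scale=d**-0.5, size=(d, d))
theta_q = rng.normal(scale=d**-0.5, size=(d, d))

def ttt_linear_layer(tokens):
    """Process a sequence; the 'hidden state' W is itself a linear model."""
    W = np.zeros((d, d))           # inner model's weights = the hidden state
    outputs = []
    for x in tokens:               # x: (d,) embedding of one token
        k, v, q = theta_k @ x, theta_v @ x, theta_q @ x
        # One gradient step on the self-supervised loss ||W k - v||^2
        # (the "test-time training" part: W learns from the context itself).
        grad = 2.0 * np.outer(W @ k - v, k)
        W = W - lr * grad
        outputs.append(W @ q)      # read out with the updated inner model
    return np.stack(outputs)

seq = rng.normal(size=(32, d))     # a toy "context" of 32 token embeddings
print(ttt_linear_layer(seq).shape) # (32, 16)
```

Swapping the linear inner model for a small MLP gives the TTT-MLP flavor mentioned in the comments above; either way, only W is carried forward, so the state size stays constant no matter how long the context gets.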
@MuhammadakbarAK47 · 2 months ago
Just watch it 3 times
@sashank224 · 2 months ago
@bycloudAI I'll explain, hold up, I'm getting what he's saying. You need to break it down in simple terms that relate to real-world apps. Visualize.
@homeyworkey · 2 months ago
@bycloudAI btw this was posted on r/singularity where there are more normies - obviously you need normies if you want growth though, but any technical video is automatically going to have a very niche audience, understandably so, so you probably don't mind that as well. I mean I watch your stuff and most of it goes over my head but it's interesting regardless, just letting you know the feedback here is kind of skewed.
@PhilsArtVibes · 2 months ago
No, no, no, I do not want to add neural networks to recursion, I JUST BEGAN TO UNDERSTAND RECURSION, DON'T DO THIS TO ME!!!
@pmosg9649 · 2 months ago
Great 😀
@GodbornNoven · 2 months ago
Nice explanations but go easy on the vocabulary. I don't reckon every average joe out there will understand all the terms. The pacing is too quick too.
@Wobbothe3rd · 2 months ago
The human brain is a recurrent neural network, not a transformer. Eventually, recurrent will win.
@athul_c1375 · 2 months ago
But who said the human brain is better than the transformer?
@notnotandrew · 2 months ago
Yo dawg, I heard you like ML models...
@anywallsocket · 2 months ago
Wouldn’t that take forever to train??
@jymcaballero5748 · 2 months ago
just give them more memory!
@kingki1953 · 2 months ago
You should consider banning bots on your channel.
@kingki1953 · 2 months ago
You just uploaded and 3 bots have already commented, the dark internet is scary 😢
@StefanReich · 2 months ago
@kingki1953 Actually the dark internet is really lame right now. You can spot these comments from a mile away: "Your videos are always so informative and interesting! Thank you for that!" "Thank you for your work! Your videos are always top notch!" "Always a pleasure to watch your videos! I will be looking forward to new episodes!"
@algorithmblessedboy4831 · 2 months ago
guys I'm in high school and I'm trying to choose a career path. My no. 1 choice, considering the things I like and that I'm good at, is becoming an AI researcher. Can anyone in the academic world tell me if it would be a fun job or not?
@user-vg2ui3wg8n · 2 months ago
It definitely is. But the field is getting increasingly complex, fast-paced, and hyper-competitive. I'd recommend studying computer science and mathematics, since you will not be able to compete in this field without a very strong mathematical background. Except for that, go for it. I'm a researcher in parallel processing and numerical high-performance computing. It is definitely fun and rewarding, but be prepared for a painful journey.
@Guedez1 · 2 months ago
Yeah, if you made up everything you said in the video I wouldn't be able to tell at all. Stuff is getting harder and harder to understand.
@donson3326 · 2 months ago
Short answer: no
@ONDANOTA · 2 months ago
why is every LLM's OUTPUT context window fixed to 4096?
@geli95us · 2 months ago
AFAIK, output context windows are not a thing for the models themselves; the model is just called once for every token it has to generate, and you can perform that process a million times if you want. However, it's not useful if the LLM outputs text up to the point where its prompt falls out of its context window, so in the early days the "output window" was just set to whatever the model's context window was. Nowadays it's probably capped for economic reasons: LLMs get more expensive the longer the input is, so by limiting the output window they force you to pay for tokens several times, once as the model's output and subsequent times as input to the next outputs.
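In code, that amounts to something like the toy loop below (illustrative only: `model.next_token_logits`, `tokenizer.eos_id` and the default numbers are made-up stand-ins, not any real API). The "output window" is just the loop's iteration cap on the serving side, while the context window limits how much of the running sequence gets fed back in.

```python
def generate(model, tokenizer, prompt, max_new_tokens=4096, context_len=8192):
    """Toy autoregressive decoding loop.

    The model only ever predicts the next token; the "output window"
    is simply how many times we let this loop run before stopping.
    """
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):          # the serving-side cap
        window = tokens[-context_len:]       # drop anything that no longer fits
        logits = model.next_token_logits(window)
        next_token = int(logits.argmax())    # greedy decoding for simplicity
        tokens.append(next_token)
        if next_token == tokenizer.eos_id:   # the model decided it is done
            break
    return tokenizer.decode(tokens)
```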
@spoonikle · 2 months ago
To stop it. While still giving enough space to make “satisfying” answers.
@falsechord · 2 months ago
fractal ai models
@multipurposepaperbox · 2 months ago
damn yeah that's AI stuff right hahaaa? tbh I understand a quarter of this, but I really enjoy a lot of your videos
@pauljones9150 · 2 months ago
I'm here for the waifu memes. Good video tho
@boricuaxflow9669 · 2 months ago
Are we all botted comments?
@mariusj.2192 · 2 months ago
The quadratic complexity is not the main problem of current LLMs. It's that they are dog sh*t at reasoning (and tasks that depend on it), and better scaling with context length won't solve that.
@punk3900 · 2 months ago
I hate such advertisement shockers that are not separated adequately from the main material. Not gonna subscribe to a channel that does that. 😢
@muscifede · 2 months ago
look at the amount of bots lol
@StefanReich · 2 months ago
This is nothing. Check out any popular video about trading
@AlphaProto · 2 months ago
This video was too much for me.
@09jake12 · 1 month ago
leenear
@themultiverse5447 · 2 months ago
what?
@mautkajuari · 2 months ago
First!
@Ps5GamerUk · 2 months ago
Nah, the bot beat you to first bro
@mikemaldanado6015 · 2 months ago
dude, are all your videos infomercials for half the video???
@mitulsolanki6066 · 2 months ago
would love to collaborate and learn with you
@lynx_pinata · 2 months ago
Bro your thumbnail and Fireship's thumbnail are looking similar. Someone has to change/alter their thumbnail
@aspenlog7484 · 2 months ago
Yeah I did not understand shit. Basically a better architecture
@alexijohansen · 2 months ago
“unlock linear complexity having expressive memory bla bla bla bla bla bla” was this written by ChatGPT?
@Miaumiau3333 · 2 months ago
It sounds human to me, even if it contains some technical jargon. ChatGPT writes differently
@Sculptoroid · 2 months ago
what a load of bollocks