
Mamba Might Just Make LLMs 1000x Cheaper... 

bycloud
150K subscribers
125K views

Check out HubSpot's ChatGPT at work bundle! clickhubspot.com/twc
Will Mamba bring a revolution to LLMs and challenge the status quo, or will it just be a cope that may not last in the long term? Looking at the trajectories right now, we might not need transformers if Mamba can actually scale, but attention is probably still here to stay.
check out my AI sites leaderboard: leaderboard.bycloud.ai/
Special thanks to
- LDJ x.com/ldjconfirmed
- Gifted Gummy Bee
for helping with this video!
Mamba: Linear-Time Sequence Modeling with Selective State Spaces
[Paper] arxiv.org/abs/2312.00752
[Code] github.com/state-spaces/mamba
Transformer: Attention Is All You Need
[Paper] arxiv.org/abs/1706.03762
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
[Paper] arxiv.org/abs/2401.09417
[Code] github.com/hustvl/Vim
Efficiently Modeling Long Sequences with Structured State Spaces
[Paper] arxiv.org/pdf/2111.00396.pdf
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
[Paper] arxiv.org/abs/2205.14135
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
[Paper] arxiv.org/abs/2307.08691
VMamba: Visual State Space Model
[Paper] arxiv.org/abs/2401.10166
[Code] github.com/MzeroMiko/VMamba
MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts
[Paper] arxiv.org/abs/2401.04081
MambaByte: Token-free Selective State Space Model
[Paper] arxiv.org/abs/2401.13660
Repeat After Me: Transformers are Better than State Space Models at Copying
[Paper] arxiv.org/abs/2402.01032
This video is supported by the kind Patrons & YouTube Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Music] massobeats - midnight
[Profile & Banner Art] / pygm7
[Video Editor] @askejm, Lunie

Science

Published: 23 Feb 2024

Comments: 323
@bycloudAI · 5 months ago
Check out HubSpot's ChatGPT at work bundle! clickhubspot.com/twc Also, I missed a paper that combined Mamba & attention; you can find it here: arxiv.org/abs/2402.04248. What an interesting timeline this is.
@eleven-nine · 5 months ago
JJK edit after joe mama joke. This video is a masterpiece.
@DivineMisterAdVentures · 5 months ago
Somebody (I nominate you) needs to chart knowledge half-life and relate to the landfill and SciFi and "human uselessness". Somehow this deserves at least multiple theses. If there was time.... I think that would be more telling than Kurzweil's Exponential Tech Acceleration chart. "The Matrix is a place to hide and play games...."
@skyhappy · 5 months ago
Can you share the video assets for the anime edit? I wanna try using it in the future. Also how did you get the voice?
@Nakamako1 · 4 months ago
mark your vids where the examples are. i for example always skip straight to examples when im learning
@xthesayuri5756 · 5 months ago
Bro dropped the hardest LLM anime edit and thought we wouldnt notice
@a_soulspark · 5 months ago
7:24 if you want to experience AI lobotomy
@zolilio · 5 months ago
Wasn't expecting that AT ALL 🤣🤣🤣
@JacksonMarshal · 5 months ago
This why he got a new sub 😂
@JorgetePanete · 5 months ago
wouldn't*
@C2WasAlreadyTaken · 5 months ago
Bruv I know. I made a clip and immediately shared. I almost feel like all technical information should be conveyed this way.
@sleepingArtist · 5 months ago
😂 I did not expect The JJK EDIT and died laughing
@Itachi_Uchia1 · 4 months ago
Whats JJK??
@guiAI · 4 months ago
Jujutsu Kaisen or smth like that @@Itachi_Uchia1
@tomerhorowitz4779 · 4 months ago
@@Itachi_Uchia1 Jujutsu Kaisen
@AkysChannel · 4 months ago
Yes 🤣 So on point
@itzhexen0 · 5 months ago
Good so I guess OpenAI no longer needs 7 Trillion dollars for chip factories.
@djecchi8768 · 5 months ago
😂
@francisco444 · 5 months ago
FYI: the 7T is about the entire infrastructure, not just chip factories, and the fact that OpenAI doesn't own much compute in comparison to Meta, Microsoft, or Nvidia.
@itzhexen0 · 5 months ago
@francisco444 Yeah, mass surveillance infrastructure. Not that I'm against that.
@jonathanberry1111 · 5 months ago
@@itzhexen0 You absolutely should be, it won't be about keeping you safe but them, and if they are safe, well a nation of sheep will beget a Government of wolves.
@itzhexen0 · 5 months ago
​@@jonathanberry1111 Well there are a lot of nutty people in society and I think it's needed. It probably won't be about keeping you safe either. But what else would they need 7 trillion dollars for? Chips for themselves? Because they won't be passing them out to everyone in the USA.
@635574 · 5 months ago
7:20 the greatest LLM anime of all time begins (JJK is within 2 letters of LLM)
@WoolyCow · 5 months ago
and by lack of competition...sadly the worst as well
@revimfadli4666 · 11 hours ago
LuLutsu Maisen
@OxidoPEZON · 5 months ago
That anime edit was one of the sickest media pieces I've seen, but unfortunately I have no friends in the intersection of Jujutsu enjoyers and AI research connoisseurs who would appreciate it wholly
@valentinovazzoler · 5 months ago
Yeah, when I saw it I was like: I need to show it to... Wait who will ever understand it among my friends? No one
@RllXeron · 5 months ago
But U have Us in comments section so we can laugh together 😂
@AshT8524 · 5 months ago
No one: Mamba: Nah! I'd win.
@JoshuaEworo · 5 months ago
didn’t expect lobotomy kaisen to make its way to the LLM and AI space😭😭 best thing ever
@lelouch1722 · 5 months ago
"Exponentially" should stop being misused for everything that is bigger than linear... Quadratic != exponential
@losttale1 · 5 months ago
what is x^3?
@varungawande9321 · 5 months ago
It isn't wrong to call a square a rectangle.
@nanubalagnanasai3006 · 5 months ago
@@WoolyCow Quadratic is polynomial, Exponential is exponential.
@WoolyCow · 5 months ago
@@nanubalagnanasai3006 yeah mb, i must be stupid lol
@jonathanduck5333 · 5 months ago
@@losttale1 cubic
@nawabifaissal9625 · 5 months ago
naaah lobotomy kaisen is taking over everything i swear 💀💀💀💀
@zrakonthekrakon494 · 5 months ago
Just for a few months, then all the lobotomies will forgor
@voidnull4282 · 5 months ago
Straight up stealing fireship viewers with these thumbnails ☠️
@felipearrais5415 · 4 months ago
I completely agree but at least the context doesn't let down
@Spyblox007 · 5 months ago
I love seeing this trend of LLMs getting quicker and using less resources. I think we are only a few breakthroughs away from a point where LLMs can begin running on mobile devices locally at reasonable speeds. Right now companies are spending major resources on making the models smarter through the models themselves. However, make the model small and quick enough, and you could run it multiple times, prompted by hard-coded logic, to possibly accomplish the same things as the larger models without the need for as much power or space (at the cost of time). This could allow an LLM to exist on robots without being connected to a service. The technology is in the works for quick instruction following for robots, so an LLM being able to feed the robot instructions makes the robot self guiding, which would be a sight.
@wlockuz4467 · 4 months ago
"Stand proud Transformer, you are strong." - Mamba
@awesomebearaudiobooks · 5 months ago
In the early 2000s, here in Russia, Mamba was a very popular dating site. Good to hear they are now at the frontier of AI development!!!
@MilkGlue-xg5vj · 5 months ago
Just exactly like how YouTube used to be a dating site too! The story repeats itself.
@16876 · 5 months ago
also 'cope' (the word in the thumbnail) is a 2019-ish 4chan troll word; this video is nostalgic in many aspects!
@ponponpatapon9670 · 5 months ago
@@16876 isn't it ironic how Twitter started abusing the FUCK out of 4chan lingo
@MilkGlue-xg5vj · 5 months ago
@@16876 No one asked nerd
@xthesayuri5756 · 5 months ago
it scales quadratically not exponentially
@bycloudAI · 5 months ago
oops i meant it metaphorically that was a bad word choice lol
@Alice_Fumo · 5 months ago
I guess ² is still an exponent by which we're scaling here. I'm sure anyone watching this video will know what was meant. A correction should probably still be made.
@LeoVital · 5 months ago
@@Alice_Fumo But exponential means that it scales according to the nth power. n² is polynomial, better than linear but not exponential like 2ⁿ.
@triforce42 · 5 months ago
Came here to say this. Quadratic and exponential is a huge difference.
@xxlvulkann6743 · 3 months ago
@@LeoVital *worse than linear
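To make the distinction in this thread concrete: quadratic growth (n², like pairwise attention over a length-n sequence) is polynomial, while exponential growth (2ⁿ) is a different class entirely. A minimal Python sketch, using arbitrary illustrative values of n:

```python
# Illustrative only: the growth classes discussed in this thread.
# n^2 (quadratic, e.g. pairwise attention scores over a length-n sequence)
# is polynomial; 2^n (exponential) grows incomparably faster.
for n in (10, 100, 1_000, 10_000):
    quadratic = n ** 2
    exp_digits = len(str(2 ** n))  # number of decimal digits in 2^n
    print(f"n={n:>6,}  n^2={quadratic:>12,}  2^n has {exp_digits:,} digits")
```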
@ambinintsoahasina · 5 months ago
The Gojo reference made me shout out loud like a little fangirl :')
@XashA12Musk · 5 months ago
can you give me the original source of that anime edit ?
@ambinintsoahasina · 5 months ago
@@XashA12Musk the manga is Jujutsu kaisen. He took from multiple parts. You can check chapter 75 for the "throughout the heavens and the earth" and chapter 221 for the "nah, I'd win"
@razieren7025 · 5 months ago
this is crazy, everytime i click i think im going to watch a fireship vid
@pattyspanker8955 · 5 months ago
bait
@lordm31 · 5 months ago
throughout youtube clickbait and interesting facts you are the honored one
@avizi_ · 5 months ago
The last thing I expected was a jjk edit
@Artorias920 · 5 months ago
awesome breakdown. When the other AI hype channels asked bycloud if he could go head to head with their surface level analysis, bycloud responded "Nah, I'd win" (DEEPFRIED BASS)
@david_n_nettey · 5 months ago
I was not expecting to get a lobotomy while watching an LLM news video today...
@dualasus12 · 4 months ago
Ngl the thumbnail tricked me, thought it was a fireship video, but it worked lol and I’m still watching.
@sarveshpadav2881 · 4 months ago
same!!
@MemesnShet · 5 months ago
Yes, but how much would it cost to port something like GPT-4 to Mamba, or can they even, or would they have to start from scratch? It probably won't be the only architecture to come out, so I imagine OpenAI is waiting for something that is very clearly way better in almost all categories compared to transformers
@wakannnai1 · 5 months ago
They'd probably have to start from scratch. However it could take much less time to train. If it takes 1/1000 of the time to train with similar real world performance, it might become worth it. Transformers are proven while novel architectures like mamba are not. OpenAI is selling chatgpt after all so it may not be worth it for them.
@JazevoAudiosurf · 5 months ago
i have some hope for byte mamba but the architecture has drawbacks and seems more like an intermediary step before something greater that builds on it
@user-wr2cd1wy3b · 5 months ago
My greatest hope is that it can really run 6B models like they're 2B. Was that for training or for actually running them? If it's for running them, then even the 40B param issue won't matter for local models; most consumer computers would gladly take 40B models that run like they're 20B
@bazookaman1353 · 5 months ago
I'm not sure if everyone reading this knows, though with cloud's audience it's probably most. Going from something exponential to something linear is a GODSEND. The title says 1000x, take that as you will, but even if it's just 10x, due to how exponentials work this would still save way more than 1000x in the future, because if it increased exponentially the computational costs would go way out of control, but with linear it's way WAY more manageable. If this is real, it will be a complete game changer.
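A rough back-of-envelope sketch of the savings being described here, assuming attention's cost grows with the square of context length while an SSM-style scan grows linearly; constants, memory traffic, and hardware effects are ignored, and the function names below are made up purely for illustration:

```python
# Back-of-envelope sketch (assumptions: cost ~ L^2 for attention's pairwise
# token comparisons vs ~ L for a recurrent/SSM-style scan; constants ignored).
def attention_cost(seq_len: int) -> int:
    return seq_len ** 2   # every token interacts with every other token

def ssm_cost(seq_len: int) -> int:
    return seq_len        # one fixed-size state update per token

for seq_len in (1_000, 10_000, 100_000, 1_000_000):
    ratio = attention_cost(seq_len) / ssm_cost(seq_len)
    print(f"context {seq_len:>9,}: quadratic vs linear ratio = {ratio:,.0f}x")
```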
@tyler.walker · 5 months ago
Technically, LLM context length increases quadratically, not exponentially.
@verigumetin4291 · 5 months ago
don't get your panties wet. It might just be smoke until tested to see if it works with extremely large models. But the thing about it understanding vision better, can't they just codify data into visuals and then have the LLM train on that, then you build another LLM that can translate the input and output and the problem is solved? But then again, even with the vision based MAMBA, large models still haven't been tested so who knows.
@ckpioo · 5 months ago
@@verigumetin4291 the thing with that method is that yes, it's possible, but the problem is similar to MoE
@mati_5555 · 5 months ago
Just wait until they discover Mamba No.5. There will be no going back...
@EvertvanBrussel · 5 months ago
I just want to see a 7B mamba model trained on the same data as a 7B transformer model and get to try them both and test them on certain abilities.
@megachelick · 4 months ago
many of us want it too
@Taddy_Mason · 5 months ago
Ngl, my bro is the Jay-Z of LLM education...out here dropping bangers.
@Steamrick · 5 months ago
Dangit, now I have Lou Bega's Mambo No. 5 stuck in my head!
@rileyretzloff8778 · 5 months ago
this was the most entertaining and somehow equally educational llm/ai video i’ve ever watched
@gemstone7818 · 5 months ago
that was a great part at 7:21
@chamba149 · 5 months ago
If you were one of the professors at my school I would never miss a class lol. You are great at breaking down concepts and making it funny. Keep up the good work!
@andrewshort6440 · 5 months ago
Such a great video, and FUNNY! Glad you're making these
@qwertyuuytrewq825 · 5 months ago
finally understood why AI cannot tell how many letters N the word banana contains )
@Hollowed2wiz · 5 months ago
damn, I just tested it with gpt4, and it said that there are 3 n in banana. It's so funny 🤣
@qwertyuuytrewq825 · 5 months ago
@@Hollowed2wiz what is interesting is that gpt4 can spell this word one letter at a time if asked and then give the right answer about the letter N count. So it seems that despite tokenization gpt4 knows something about spelling...
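A quick way to see what this thread is describing, assuming the tiktoken package is available (the exact splits depend on the encoding; cl100k_base is used here only as an example): the model receives whole-chunk token IDs rather than letters, so counting characters inside a token is guesswork, while spelling the word out first hands it per-letter pieces.

```python
# Sketch assuming the `tiktoken` package; token splits depend on the encoding.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

ids = enc.encode("banana")
print(ids)                              # a few integer IDs, not six letters
print([enc.decode([i]) for i in ids])   # the chunks the model actually sees

# Spelling it out gives per-letter pieces, which is why the
# "spell it first, then count" trick tends to work better.
print(enc.encode("b a n a n a"))
```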
@AlanMeigs · 5 months ago
Woooooowwww, 8ish minutes in was a mic drop I didn't expect. First time here, not the last.
@hoodie_cat · 5 months ago
You gained a like due to the high-quality of your video, however, when the JJK edit dropped, you gained my subscription and my worship. 🛐
@pareak · 5 months ago
Your videos are just the best. Humor and knowledge in its best combination
@zainkhalid3670 · 5 months ago
7:22 Fire 🔥🔥🔥🔥🔥
@joannot6706 · 5 months ago
I alone am the subquadratic one.
@ginqus · 5 months ago
These comments make me feel stupid 😭 All I understood is that mamba is faster than the traditional transformer thing (the party explanation was awesome, thank you), and that mamba abandons tokenization and uses... something else instead... I kinda wish you would summarize the video at the end for silly creatures like me. But for now it's time to rewatch everything!! 🥳
@Deductive · 5 months ago
This video is where I will start with my thesis.
@KodandocomFaria · 5 months ago
what if there is a 70B Mamba? could it surpass existing ones? I haven't seen anywhere a comparison of a Mamba 70B with any other big model. Perhaps it would be a decisive analysis to see how it performs
@Spyblox007 · 5 months ago
I assume that bigger models are still in the works. Most attention is on transformer-based models, so the money and resources to train a 70B model for Mamba are taking longer to gather. I'm definitely looking forward to seeing what becomes of it though.
@nosferiazafora · 5 months ago
This whole video is Gold.
@pebre79 · 5 months ago
Great content. Keep up the great work! I subbed
@salvogulizia421 · 5 months ago
the jjk edit killed me, bro i love you
@lilstringcheese · 5 months ago
I cannot explain how much I enjoyed that edit
@johngayman4100 · 5 months ago
Wtf did I just watch? This is like the best ever.
@nullbeyondo · 4 months ago
I expected "Are your hidden states linear because you're a State Space Model? Because I can't seem to figure out your next move." :D
@berkeokur99 · 5 months ago
Bro the JJK edit is superb
@deeplearning7097 · 4 months ago
Very nice. Thank you.
@Cdaprod · 5 months ago
Thanks for the papers in the description, I just put them in a urls=[] and hydrated my S3 and VDB with them 5:42 😎
@sla8tful · 4 months ago
I have no idea how LLMs work and I still understood some of it and the implications. Which is to say, this is a great video my dude.
@aminzarei1557 · 5 months ago
Sooner or later we'll need a byte-level model architecture for multi-modality. Hope the test results for this one are good 🙏. Cool video btw 👌
@maxvg9161 · 5 months ago
Great video, I already heard about Mamba, but didn’t get into it myself! The lobotomy Kaisen edit went really hard haha. Any chance you will be making a video about Liquid Neural Networks? Keep up the good work :)
@emmanuelikhide8998 · 1 month ago
Yoo that LLM'S Anime edit hit hardddd😂😂
@arjundeshmukh8773 · 5 months ago
I just wish to be this talented- amazing video
@yash1152 · 4 months ago
5:02 i am all in for non-sound memes. at least it doesnt make me a weirdo when watching without earphones in an open space.
@huangjason6557 · 5 months ago
Didn't expect to see some JoJo and Gojo references in an AI model video, awesome!
@absence9443 · 5 months ago
The edit made me understand it better than without :D
@chfr · 4 months ago
Cool reference to the router edit
@Dr_Birthday · 4 months ago
The LLM anime edit earned my subscription
@janniksco · 5 months ago
Thanks!
@chanxo643 · 5 months ago
the JJK edit was INSANELY funny!
@SianaGearz · 5 months ago
As an electrical engineer: REAL TRANSFORMERS HAVE WINDINGS.
@pablitocodes · 3 months ago
I've been attending parties with the mamba method forever. I may have been doing that wrong.
@DanteS-119 · 5 months ago
The editing in this video is chef's kiss for meme connoisseurs
@Gatrehs · 5 months ago
This and the LPU that Groq uses are going to be insane together.
@julianvandenberg2002 · 5 months ago
That thing was still created purely for a Transformer-style model. They will need to make a new arch
@svendpai · 5 months ago
the edit made my day
@bloodcraver9183 · 5 months ago
I would not have commented if it wasn't for that "nah I'd win" Mamba edit
@magistrcooldayn233 · 5 months ago
Damn this edit was fire
@anren7445 · 2 months ago
the LLM anime edit was fucking gold
@cefcephatus · 5 months ago
I imagine the Transformer guys are seeing Mamba as a big 2-way transformer and yanking it into the Transformer architecture, forming a multi-architecture transformer model.
@jefersonlemos4135 · 5 months ago
yeah, I understand you mamba, I too have a "lost in the middle" problem
@jasonhemphill8525 · 5 months ago
13:15 that’s the most passive paper name I’ve ever seen
@griffingibson4389 · 5 months ago
The JJK bit was great
@joey199412 · 5 months ago
I wonder if OpenAI switches to Mamba architecture if they will drop the "GPT" branding since technically the T will not apply anymore. I wonder if "GPT" will be like how Boomers would call every game console a "nintendo" and just used by the mainstream to mean every LLM, no matter the underlying architecture.
@LowestofheDead · 5 months ago
Lol Nintendo, but doesn't GPT = General Pre-Trained, not "-Transformer"?
@lordkacke6220 · 2 months ago
Bro. You make these LLM videos so interesting and funny. How do you come up with these? Keep it up
@ChristopherCricketWallace · 5 months ago
A MA ZING editing.
@BosonCollider · 5 months ago
Even if this doesn't replace transformers, this looks like a very promising way to replace tokenization/word vectors by having a layer read the bytes and output vector tokens
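A minimal PyTorch sketch of the idea in this comment (purely illustrative, not MambaByte's actual design; the class name and hyperparameters are made up): embed raw UTF-8 bytes, then downsample with a strided convolution so downstream layers see a shorter sequence of "vector tokens".

```python
# Illustrative sketch only: a byte-reading front end that outputs vector tokens.
import torch
import torch.nn as nn

class ByteToVectorTokens(nn.Module):
    def __init__(self, d_model: int = 256, downsample: int = 4):
        super().__init__()
        self.embed = nn.Embedding(256, d_model)  # one embedding per byte value
        self.pool = nn.Conv1d(d_model, d_model,
                              kernel_size=downsample, stride=downsample)

    def forward(self, byte_ids: torch.Tensor) -> torch.Tensor:
        x = self.embed(byte_ids)           # (batch, num_bytes, d_model)
        x = self.pool(x.transpose(1, 2))   # (batch, d_model, num_bytes // 4)
        return x.transpose(1, 2)           # (batch, vector_tokens, d_model)

text = "Mamba reads raw bytes."
byte_ids = torch.tensor([list(text.encode("utf-8"))])
print(ByteToVectorTokens()(byte_ids).shape)
```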
@danisekoi · 5 months ago
that gojo edit has to be the hardest thing ive seen in years wtf were you on when that came to your mind
@princejona · 5 months ago
GOD LEVEL EDIT, WHO IS THIS YOUTUBER?
@codersama · 5 months ago
bro the content on this channel is for such a niche audience, only 5 people on YouTube will get all the memes and understand the science
@ln2deep · 4 months ago
Dude... We are dangerously close to a bearish breakout.
@AlexLuthore · 5 months ago
That edit went so fucking hard lol😂
@danzmachinz2269 · 5 months ago
At the start of the vid , the guy says.. "That's optimum pride! mother's v@gina! optimum pride!" You're welcome 😅
@MoDs_3 · 5 months ago
What was so sick ! 😂😂 👏
@jaywessex5818 · 4 months ago
Dude that was such a sick anime cut in clip. How did you make that? D that script? All ai?
@GeekOverdose · 5 months ago
Lobotomy kaisen edit was PEAK
@jichaelmorgan3796 · 5 months ago
Imagine if the middle eastern investors invested trillions in obsolete technology
@question_mark · 5 months ago
bro that anime thing was just incredible
@nicdemai · 5 months ago
Never bet against self-attention! 🗿
@sinayagubi8805 · 5 months ago
Bro I was not prepared
@tristotech · 5 months ago
I do love Mamba for faster tokens/sec. But there's still a long road to make it able to extract key information from long text. For now it still feels like BART or Gemma 2B for short prompts
@richardnunziata3221 · 5 months ago
Without a seriously upscaled Mamba it will be going nowhere except for niche areas
@Ch0s0Kam0 · 5 months ago
7:22 NOOOOOOO NOT AGAIN
@phizc · 5 months ago
5:39 It's only 1000 times cheaper because the price was per 1000 tokens. If it was per 4000 tokens, it would get 4000 times cheaper, and so on. 😊
@RossPfeiffer · 5 months ago
Big hopes for mamba
@jarkkop1004 · 5 months ago
VMamba has updated the scaling chart at 9:35. Performance keeps increasing with increased model size, but not much
@Kinatera. · 5 months ago
the video was good and i liked your style, but then the JJK edit dropped, actual fire 🔥 i hope you don't mind me reposting the JJK edit section of your video; tell me if you want me to take it down
@richsoftwareguy · 5 months ago
Hardest AI channel 🪨 🤘 subbed...