
How Did Open Source Catch Up To OpenAI? [Mixtral-8x7B] 

bycloud
160K subscribers · 169K views

Sign up for GTC24 now using this link!
nvda.ws/48s4tmc
For the RTX 4080 Super giveaway, the full detailed plans are still being developed. However, it'll be along the lines of taking a photo of yourself attending a GTC virtual session, so you can sign up for the conference now to set an early reminder!
What is Mixtral 8x7B, and what is the Mixture of Experts (MoE) technique behind it that has beaten OpenAI's GPT-3.5, a model published around a year earlier? In this video, you will learn what Mixtral 8x7B is and how Mixture of Experts works, the approach that is making MoE the new rising standard of LLM format.
Mixture of Experts
[Paper] arxiv.org/abs/...
[Project Page] mistral.ai/new...
[Huggingface Doc] huggingface.co...
This video is supported by the kind Patrons & RU-vid Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Music] massobeats - waiting
[Profile & Banner Art] / pygm7
[Video Editor] @askejm

Published: 21 Oct 2024

Comments: 264
@bycloudAI · 8 months ago
Sign up for GTC24 now using this link! nvda.ws/48s4tmc RTX4080 Super Giveaway participation link: forms.gle/2w5fQoMjjNfXSRqf7 Oh, and check out my AI website leaderboard if you're interested! leaderboard.bycloud.ai/ Here are a few corrections/clarifications for the video: - It's named 8x7B rather than 56B because it duplicates roughly 5B of Mistral-7B's parameters 8 times (= 40B), plus an extra ~7B for attention and other shared components (40B + 7B = 47B). - The experts are not individual models, but only the FFNs from Mistral-7B. - Experts operate not just on a per-token level, but also on a per-layer level: between feed-forward layers, the router assigns each token to an expert, and each layer can have a different expert assignment. - The router is layer-based, and it decides which experts the next layer activates, so the routing looks different across layers 3:26. You can refer to this diagram imgur.com/a/Psl1Fi4 for an example of the activations that might happen for any given token, picking a new set of 2 experts at every layer. Special thanks to @ldjconfirmed on X/Twitter for the feedback.
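For anyone who wants to sanity-check that arithmetic, here is a tiny worked sketch using the rough figures from the comment above (they are approximations, not the exact official counts; the Mixtral paper itself quotes ~47B total and ~13B active per token):

```python
# Back-of-the-envelope check of the "8x7B != 56B" parameter math above.
# ~5B FFN params per expert and ~7B shared params are the rough figures
# from the pinned comment, not exact official numbers.
ffn_per_expert = 5e9
num_experts    = 8
shared         = 7e9   # attention, embeddings, norms, router, ...

total = num_experts * ffn_per_expert + shared
print(f"stored parameters ~= {total / 1e9:.0f}B")   # ~47B

# Only 2 of the 8 expert FFNs actually run for a token at each layer,
# which is why Mistral AI quotes ~13B "active" parameters per token
# even though ~47B are stored.
```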
@Psyda · 5 months ago
Who won?
@MCasterAnd · 8 months ago
I love how OpenAI is called "Open" yet it's just closed source
@boredSoloDev · 8 months ago
IIRC it was originally an idea for a non-profit, open AI system... But they realized how much mf money they could make, and now it's a private for-profit company that's being ridden by Microsoft.
@lemmyboy4107 · 8 months ago
@@boredSoloDev If nothing changed, then OpenAI is still a non-profit, but it has a for-profit child company which then, in theory, funds the non-profit.
@MrGenius2 · 8 months ago
it started open thats why it's like that
@caty863 · 8 months ago
@@lemmyboy4107 Name one project they have that is "open" and then go on with the rambling; oh, they are a non-profit!
@lemmyboy4107 · 8 months ago
@@caty863 wth is your problem? I answered a guy who said that they were non-profit. I answered that the parent is non-profit and a child company is for-profit. Nothing to do with open source, my guy. "For-profit" and "non-profit" are business terms and have massive legal implications. They are not decided by whether they have an "open source project", and they can't just name themselves non-profit either.
@alexander191297 · 8 months ago
Kinda like GLaDOS with its cores, where each model is a core. So maybe, Portal had some prescience regarding how modern AI will work!
@gustavodutra3633 · 8 months ago
No, Aperture Science is real, so Borealis must be too, so Half-Life 3 confirmed???? @@xenocrimson Actually, I think someone at Valve knew some stuff about AI and decided to put some references in Portal.
@Mikewee777 · 8 months ago
GLaDOS combined emotional personality cores with the remains of a corpse.
@choiceillusion · 8 months ago
Gabe Newell talking about BCI brain computer interfaces 5 years ago is a fun topic.
@starblaiz1986 · 8 months ago
SpaAaAaAaAaAaAaAaAace! 😂
@aloysiuskurnia7643 · 8 months ago
@@gustavodutra3633 Cave Johnson is real and he's been watching the future the whole time.
@andru5054 · 8 months ago
1/8th of neurons is me as fuck
@Mkrabs · 8 months ago
You are a mixture, Harry!
@zolilio · 8 months ago
Same but with single neuron
@dontreadmyusername6787 · 8 months ago
That means you have excellent computation time. I usually lag when prompted due to memory-intensive background tasks (horrible memories) that keep running all the time, since my neural net is mostly trained on traumatic experiences.
@dfsgjlgsdklgjnmsidrg · 8 months ago
@@Mkrabs gaussian mixture
@kormannn1 · 7 months ago
I'm 8/8 neurons with Ryan Gosling fr
@MonkeyDLuffy368 · 8 months ago
Mistral/Mixtral is not open source, it's open weight. We have no idea what went into the model nor do we have their training code. There is quite literally no 'source' released at all. It's baffling to see so many people call these things 'open source'. It's about as open source as adobe software.
@ghasttastic1912 · 8 months ago
Open weight is still better than OpenAI. You can theoretically run it on your PC.
@theairaccumulator7144 · 8 months ago
@@ghasttastic1912 You just need a workstation card
@ghasttastic1912 · 8 months ago
@@theairaccumulator7144 lol
@thatguyalex2835 · 7 months ago
@@theairaccumulator7144 Well, Mistral Instruct 7B runs fine on my 2018 laptop's CPU (95-99% of my 8 GB of RAM is used), but 8x7B probably will not.
@brianjoelbasualdo7436 · 8 months ago
It makes sense that "Mixture of Experts" is a powerful model. In some parts of the cortex of humans (and other higher mammals), neurons adopt a configuration where groups of neurons are relatively separated from the rest, in the sense that they form separate networks which aren't heavily connected to one another. In some books (pardon my English, I don't know the specific term), they are referred to as "cortical cartridges" ("cartuchos corticales" in Spanish), since they have a column-like disposition, with each column/network sitting beside the others. This is the case for the processing of visual information when guessing the orientation of an object. Different cartridges "process" the image brought by the retina, tilted by some degree: one cartridge processes the image tilted by 1°, the next one by 3°, etc. This way we can generalize the concept of orientation for a certain object, so that no matter its orientation, we can still recognize that object. For example, this allows us to recognize the same tea cup in different orientations, rather than seeing the same teacup in different orientations and thinking they are two distinct ones. I am not a software engineer, but it amazes me how the most powerful models are often the same ones used by biology.
@ErikPelyukhno · 7 months ago
Fascinating!
@thegiverion3982 · 8 months ago
>Mom can we have fireship? >We have Fireship at home. >Fireship at home:
@Steamrick · 8 months ago
Isn't GPT-4 basically 8x220B? I read that it's composed of eight very chunky LLMs working together. I have no idea how the input and output are generated between the eight of them, though, so there could be huge differences between how GPT-4 and Mixtral-8x7B work and I wouldn't know.
@Words-. · 8 months ago
Yeah, it's been rumored for a long time that GPT-4 already uses a mixture of experts...
@HuggingFace_Guy · 8 months ago
It's speculated that it's MoE, but we're not sure about that.
@w花b · 8 months ago
Since they don't wanna reveal their secrets, it's hard to know
@michaeletzkorn · 8 months ago
@@Words-. It's open information that DALL-E 3, Whisper, and other models are included alongside GPT for ChatGPT-4.
@petal9547 · 8 months ago
@@michaeletzkorn That makes it multimodal. MoE is a different thing. We know it's multimodal, and it's very likely it's MoE too.
@iAkashPaul · 8 months ago
Mixtral needs ~28GB of VRAM as an FP4/NF4 loaded model via TGI/Transformers. So you can attempt to load this with a 4090 or a lesser card using 'accelerate' and a memory-map config for plugging in system memory as well as VRAM in tandem.
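A minimal sketch of what that kind of NF4 + memory-map loading might look like with the Hugging Face transformers / bitsandbytes / accelerate stack; the 22GiB/48GiB split below is purely illustrative and would need tuning per machine:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"

# 4-bit NF4 quantization config, as mentioned in the comment above
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",                         # accelerate places layers on GPU and CPU
    max_memory={0: "22GiB", "cpu": "48GiB"},   # example VRAM/system-RAM memory map
)

inputs = tokenizer("Explain Mixture of Experts in one sentence.",
                   return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```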
@kiloabnehmen2592 · 2 months ago
I ran it with Ollama on my gaming laptop with just 2GB of VRAM; it's super slow but it works.
@iAkashPaul · 2 months ago
@@kiloabnehmen2592 Nice. Mixtral is overkill for most pedestrian use cases; you should definitely get better perf with the latest 7B Instruct 0.3, which has a longer context and better tool-calling support.
@diophantine1598 · 8 months ago
To clarify, Mixtral’s experts aren’t different entire models. Two “experts” are chosen for every layer of the model, not just for every token it generates.
@FelixBerlinCS · 8 months ago
I have seen several videos about Mixtral before and still learned something new. Thanks
@peterkis4798 · 8 months ago
To clarify, Mixtral implements layer-wise experts/routers and picks 2 of them based on the router output at every layer of each forward pass that generates a new token. That means layer 1 might run experts 4 and 5 while layer 2 runs experts 6 and 2, etc.
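A toy sketch of the per-layer top-2 routing these comments describe (simplified illustrative PyTorch, not Mixtral's actual implementation; dimensions and the per-token loop are for readability only):

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """One MoE feed-forward layer: a router picks 2 of 8 expert FFNs per token."""
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts, bias=False)   # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                          # x: (tokens, d_model)
        logits = self.router(x)                    # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = weights.softmax(dim=-1)          # blend the 2 chosen experts
        out = torch.zeros_like(x)
        for t in range(x.size(0)):                 # each token gets its own pair
            for w, e in zip(weights[t], idx[t]):
                out[t] += w * self.experts[e](x[t])
        return out

# Every transformer layer has its own router and experts, so the same token
# can be routed to experts (4, 5) in layer 1 and (6, 2) in layer 2.
```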
@titusfx · 8 months ago
What amazes me is how it's not discussed that, in the end, with the GPTs we are training OpenAI's expert models, because they use an MoE architecture. Now, with the mentions, we are training the model that selects the expert... a free community for a closed model. Everyone knows that they use the data to train models; the small difference is that the MoE architecture means several expert models, not just one.
@Slav4o911 · 8 months ago
That's why I don't use any of OpenAI's models; I'm not going to pay them and then train their models for free.
@XrayTheMyth23 · 8 months ago
@@Slav4o911 you can just use gpt for free though? lol
@Slav4o911 · 8 months ago
@@XrayTheMyth23 I don't have any interest in wasting my time with brain-dead models, because that's what "GPT for free" is. And I hope that your suggestion was just a joke.
@CMatt007 · 8 months ago
This is so cool! A friend recommended this video, and now I'm so glad they did, thank you!
@clarckkim · 8 months ago
An addition that you probably didn't know: GPT-4 is a Mixture of Experts, supposedly exactly 16 GPT-3s. That was leaked in August.
@MustacheMerlin · 8 months ago
Note that while we don't _know_ for sure, since OpenAI hasn't said it publicly, it's pretty generally accepted that GPT4 is a mixture of experts model. A rumor, but a very credible one.
@colonelcider8292 · 8 months ago
AI bros are too big brain for me
@GeorgeG-is6ov · 8 months ago
you can learn, if you care about the subject, educate yourself.
@ai_outline · 8 months ago
Study computer science bro! You’ll love it!
@colonelcider8292 · 8 months ago
@@GeorgeG-is6ov no thanks, I'm not learning another course on top of the one I am already doing
@w花b · 8 months ago
@@colonelcider8292 be greedy
@abunapha · 8 months ago
what is that leaderboard you showed at 0:54? I want to see the rest of the list
@Nik.leonard · 8 months ago
Running Mixtral (or dolphin-mixtral) on CPU+GPU is not that terrible. I've got 5-7 tokens per second on a Ryzen 5600X (64GB DDR4-3200 RAM) + RTX 3060 12GB with 4-bit quantization. I consider that "usable", but YMMV.
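A sketch of that kind of CPU+GPU split using llama-cpp-python with a 4-bit GGUF file; the file name and the number of offloaded layers below are illustrative assumptions, not a recommended configuration:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",  # assumed local 4-bit quant
    n_gpu_layers=12,   # offload what fits into a 12GB RTX 3060; the rest stays in RAM
    n_ctx=4096,
)

out = llm("Q: What is a Mixture of Experts?\nA:", max_tokens=128, stop=["Q:"])
print(out["choices"][0]["text"])
```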
@dockdrumming · 8 months ago
The Mixtral 8x7B model is really good. I have been using it with Python code to generate stories. It is also rather fast; I am seeing somewhat quick inference times on Runpod with 4 A40 GPUs.
@amafuji · 8 months ago
"I have no moat and I must scream" -OpenAI
@tonyhere7004 · 8 months ago
Hah excellent 😂
@guncolony · 8 months ago
This is super interesting because one could envision each expert eventually being hosted on a different cluster of machines in the datacenter. Hence you only need enough VRAM for a 7B model on each of the machines, meaning much lower cost, but the entire model performs as well as a 56B model.
@DanielSeacrest · 8 months ago
Well, a few slight inaccuracies here. For example, MoE is not multiple models joined together; that is a common misunderstanding. It can be multiple models stitched together, but that is not the original intention of MoE. The whole point of MoE is to reduce the number of parameters used at inference, not to have multiple different domain-specific models working together. A MoE model like GPT-4 is one model; it's just that during pretraining specific sections (or experts) of that model specialised in specific parts of its large dataset. I definitely think the "experts" of MoE really tripped people up as well; as I said, they were not supposed to be separate domain-specific models but just different sections of one model specialised to a specific part of a dataset. In Mixtral's case, to my understanding, they instantiated several Mistral-7B models and trained them all on a dataset with the gating mechanism in place, but the problem here is that there are a lot of wasted parameters that have duplicated knowledge from the initial pretraining of the 7B model. It would be a lot more efficient to train a MoE model from scratch.
@oscarmoxon102 · 8 months ago
This is such an interesting explanation. Thank you.
@DoctorMandible · 8 months ago
Open source didn't so much catch up as closed source temporarily jumped ahead. Open source was the leading edge before OpenAI existed. And now we are the leader again.
@johnflux1 · 8 months ago
Hey, I want to highlight that the router is choosing the best two models **per token**. So for a single question, it will be using many (usually all) of the models. You do say this in the second half of the video, but in the first half you said "the router chooses which two models to use for a given question or prompt". But the router is choosing which two models to use for each token.
@starblaiz1986 · 8 months ago
Not only that but it's also making that choice **per layer** for each token too. So one token will also have many (often all) experts chosen at some point in the token generation.
@Mustafa-099 · 8 months ago
The sprinkle of random memes makes your content fun to watch :)
@bibr2393 · 8 months ago
AFAIK VRAM requirements are not that high for Mixtral. Sure, it's not 13B level. You can run it at 6bpw exl2 (a quant around Q5_K_M for the GGUF file type) with 36 GB of VRAM, so an RTX 3090 + RTX 3060.
@JorgetePanete · 8 months ago
I like your funny words, magic man
@Octahedran · 8 months ago
Managed to get it running with 20 GB of VRAM, although just barely. It could not do a conversation without running out, and I had to do it on the raw Arch terminal.
@MINIMAN10000 · 8 months ago
@@JorgetePanete 6bpw means 6 bits per weight; an LLM is a collection of weights. exl2 refers to ExLlamaV2; like llama.cpp, it is used to run the LLM, only faster and smaller (not sure how that works, but it does). Quant, short for quantization, refers to shrinking the size of the weights, basically chopping off a bunch of information in the hope that none of it was important; it usually works fine depending on how much you chop off. In Q5_K_M, Q5 means 5 bits per weight, as opposed to the 16-bit precision commonly used in training; _K apparently refers to k-quants (clustering), and _S/_M/_L refer to small, medium, and large in increasing size. From what I can tell, the larger variants increase the precision of attention.wv and feed_forward.w2, which play a large part in quality. GGUF is a file type created specifically to pack the collection of files we used to have into one single file.
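A quick back-of-the-envelope on what those bits-per-weight figures mean for file size (approximate; real GGUF/exl2 files carry metadata and mixed-precision tensors, so actual sizes differ somewhat):

```python
# Rough file-size arithmetic for the quant levels mentioned above.
params = 47e9  # Mixtral-8x7B total weight count

for name, bits_per_weight in [("FP16", 16), ("Q8", 8), ("6bpw exl2", 6),
                              ("Q5_K_M", 5), ("Q4_K_M", 4)]:
    gigabytes = params * bits_per_weight / 8 / 1e9
    print(f"{name:>10}: ~{gigabytes:.0f} GB")

# FP16 lands around ~94 GB, while ~5-6 bpw lands around ~29-35 GB,
# which is why the 36 GB VRAM figure quoted above is enough.
```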
@brainstormsurge154 · 8 months ago
This is the first time I've heard anything about a mixture-of-experts model, and it's very interesting just from what's presented here. At 3:38 you talk about how the router chooses several experts based on context rather than subject. I'm curious whether that actually makes it work better than having just one expert per subject, such as programming, would. It would be interesting if someone got the model to behave that way and had the subject-based routing compete with the current context-based routing to see which one performs better. Makes me think about how our own brain compares. While I know this still runs on regular hardware and not neuromorphic hardware (it's getting there soon), it would be interesting nonetheless.
@chri-k · 8 months ago
It might be better to combine both and use two different sets of experts and two routers
@itisallaboutspeed · 8 months ago
as a car content creator i approve this video
@thatguyalex2835 · 7 months ago
Mistral's smaller model, 7B had some bad things to say about German cars though. So, maybe you can use AI to help diagnose your cars. :) This is a small snippet of what Mistral 7B said. BMW X5: Issues with the automatic transmission and electrical system have been reported frequently in the 2012-2016 models. Mercedes C-Class: Several complaints about electrical problems, particularly with the infotainment system and power windows in models from 2014 to 2020.
@itisallaboutspeed · 7 months ago
@@thatguyalex2835 I Have To Agree Man
@aviralpatel2443 · 8 months ago
Considering Mixtral came before the launch of Gemini Pro 1.5 (which also uses the MoE method) and is open source, is it safe to assume that Google might have taken inspiration from this open source model? If they did, dang, the open source AI models are upping their game pretty quickly.
@davids5257 · 8 months ago
I don't really understand the topic too much, but it sounds very revolutionary to me.
@hobocraft0 · 8 months ago
We really need a paradigm shift, where we're not having to multiply huge sparse matrices together, where the router is part of the architecture itself, kind of like how the brain doesn't 'run' the neurons it's not firing.
@vhateg · 8 months ago
Yes, but if this world were a simulation, the simulation would need to run the brain neurons that don't fire too, because it would have no way to know how they would behave. A computer is the simulator of a brain, not the simulation itself (that is the LLM). But there definitely can be optimizations to remove parts that are unused, as you said. I agree with you that there should be some huge shift. If sparser matrices are not enough, then removal of neurons might be the next step. It would change the topology of the network dynamically, and that is just too much to even imagine. 😂
@capitalistdingo · 8 months ago
Funny, I thought I had heard that there were cross-training advantages whereby training models with data for different, seemingly unrelated tasks improved their performance but this seems to suggest that smaller models with more focus are better. Seems like a bit of a contradiction. Which view is correct?
@MINIMAN10000 · 8 months ago
So the answer is that both are true, if we set aside the term "best": no one really defines "best" in LLMs, and the field changes too much for anyone to say which view wins. Certain things like programming have been shown to increase a model's ability to adhere strictly to structure (programming fails on any mistake but is very well structured), and that improves the model universally, not just at programming. Mixtral is 47B and is impressive, Mistral is 7B and is impressive for its size, and Miqu suggests Mistral Medium is a 70B model that is again impressive for its size, so we can't conclude one way or the other whether Mixtral is disproportionately good. But what it did prove, without a doubt, is that MoE works.
@DAO-Vision · 8 months ago
Mixtral is interesting, thank you for making this video!
@luciengrondin5802 · 8 months ago
Shouldn't an NN, in essence, be able to do something like that? I mean, shouldn't the training process naturally result in a segmentation of the network into various specialized areas?
@ai_outline · 8 months ago
Very good take! I'm wondering the same… but it makes sense if you ask me! For instance, think of residual learning. ResNet was introduced with the idea that skip-connections can ease training, since it is easier to learn the difference between the input and output than to learn the full mapping. A neural network could, in essence, learn that difference on its own, but without the skip-connection the number of possible computational paths is enormous! The skip-connection makes it possible to enforce such residual learning :D The same is analogous for mixture of experts: now you enforce specialised areas based on a condition (the input)! Sorry if you don't have a Computer Science background; I can try to explain in a different way to help you understand :)
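For readers without that background, a tiny sketch of the skip-connection idea mentioned above (illustrative PyTorch, not tied to ResNet or Mixtral specifically):

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The block only has to learn the difference F(x), not the full mapping."""
    def __init__(self, dim=256):
        super().__init__()
        self.f = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.f(x)   # output = input + learned residual
```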
@borstenpinsel · 8 months ago
Why? In our brains, certain skills being associated with certain parts is surely not a function of a neural network. If it was, everybody's brain would look different (light up in different parts in whatever tech they use to tell that speech is here and music is there). So isn't it more likely that evolution "decided": "the input from the eyes goes here, the input from the ears goes here... and here are bridges to combine the info"? Instead of dumping every single nerve impulse into a huge network and expecting some sort of stream?
@shouma1234 · 8 months ago
You’re right, but I think the point is the performance advantage of only having to run 2/7 of the model at a time instead of the whole model every time. More performance means you can make a bigger model in the long run at less cost
@MINIMAN10000 · 8 months ago
@@shouma1234 I assume you mean higher performance at a lower runtime cost. Because it still has the same training cost and the same size. You just iterate over a small portion of the whole model at a time. This means lower cost per token at runtime.
@THEWORDONTECH · 7 months ago
I was going to hit the subscribe button, but I'm already a subscriber. Solid content!
@diophantine1598 · 8 months ago
You should cover Sparsetral next, lol. It only has 9B parameters, but has 16x7B experts.
@grabsmench · 8 months ago
So a bot only uses 12.5% of their brain at any given moment?
@realericanderson · 8 months ago
1:18 the cod interface for the different experts got me good thanks cloud
@saintsscholars8231 · 8 months ago
Nice quirky explanation, thanks !
@JuanBojorquezmolina · 7 months ago
Wow this mistral model is impressive.
@ipsb · 8 months ago
I have a question: does the Pareto principle still hold true when it comes to these LLMs?
@David-lp3qy · 8 months ago
This smells like how sensory organs are specifically innervated to lobes dedicated to processing their particular modality of information. I wonder if having actual specialized experts would yield superior results than the current model
@planktonfun1 · 8 months ago
Looks like they used an ensemble learning approach; not surprised, since it's mostly used in competitions.
@razzledazzlecheeseontoast9808 · 8 months ago
Experts seem like the lobes of the brain. Different specialities working synergistically.
@animationmann6612 · 8 months ago
I hope that in the future we need less VRAM for better AI, so we can actually use it on our phones.
@Faceless_King-tc7kt · 8 months ago
what was the chart comparing the AI performances? may we have the link
@waterbot · 8 months ago
GTC is gonna be hype this year
@AngryApple · 8 months ago
Mixtral works incredibly well on Apple Silicon. I use a 64GB M2 Max.
@drgutman · 8 months ago
It didn't... Mixtral rickrolled me while I was working with it. I was talking with it in LM Studio about some Python code, and after a few exchanges, at the end of a reply I got a YouTube link. I thought, ohh, it's a tutorial or something. Nope: Rick Astley - Never Gonna Give You Up. So, yeah, powerful model, but tainted (no, it didn't solve my coding problem).
@kipchickensout · 8 months ago
I can't wait for these to run without too high hardware requirements, as well as offline... as well as without any black box restrictions (looking at you, OpenAI). They should be configurable
@JoeD0403 · 8 months ago
The problem with AI at the moment is the technology is ahead of practical usage. If social media existed in the early 80s, there would be videos made about every single PC model coming out, even though the most popular “computer” was the Atari 2600. The bottom line is, you need lots of proprietary data for the AI to process in order to generate any real value that can be monetized.
@holdthetruthhostage · 8 months ago
They will be launching Medium soon, which is even more powerful.
@potpu · 8 months ago
How does it compare with the miqu model in terms of architecture and running it on lower spec hardware?
@Slav4o911 · 8 months ago
Miqu is better than Mixtral; they say it should be almost as good as GPT-4 when ready. But it's still not ready; what was "leaked" by accident is more like an alpha version. I think the open source community will reach GPT-4 quality about 4-6 months from now. The open source community is much more motivated than the people who are working for OpenAI, so I have no doubt the real open source community will outperform them at most 1 year from now, and yes, I think by GPT-5 we will be much closer than now. Once the open source community outperforms the closed models, there is no going back; those closed models would never have a chance to catch up.
@Romogi · 8 months ago
All those layoffs will give many software developers plenty of time.
@sorucrab · 8 months ago
Damn, I signed up, but how am I going to follow along? I don't have an Nvidia GPU.
@LumiLumiLumiLumiLumiLumiLumiL · 8 months ago
Mistral Medium is even more insane
@favianeduardo4236 · 7 months ago
We finally cracked the code people!
@exploreit7466 · 8 months ago
I need 3D inpainting as soon as possible, but it's not working properly; please make that video again.
@deathshadowsful · 8 months ago
This router choosing an expert to maximize likelihood feels like a recursion of what makes up the small models. This is all imaginative right now, but it feels like that's how neurons would work in the brain too. What if this just kept on going and folding onto itself?
@mateoayala9569 · 7 months ago
The Mistral Medium model seems very powerful.
@seansingh4421 · 8 months ago
I actually tried the Llama models and let me tell you nothing even comes close to GPT-4. Only Llama 70B is somewhat alright
@Shnugs · 8 months ago
So what happens if the number of parallelized experts are ramped up to N? What happens if there are layers of routers? Where does the performance plateau?
@tiagotiagot · 8 months ago
I'm not sure if I'm misunderstanding something and not actually doing what I think I'm doing, but I have managed to make the GGUF variant run with 16GB of VRAM.
@juanjesusligero391 · 8 months ago
GGUF models are quantized models. That means less precision, hence lower quality in the model's answers (it's kinda like keeping fewer decimal places in a division). But the good part is that they require lower specs :)
@tiagotiagot · 8 months ago
@@juanjesusligero391 Ah, right, forgot about that detail
@pajeetsingh · 8 months ago
Every model is using Google's TensorFlow. That's some solid monopoly.
@orbatos · 8 months ago
This doesn't answer the stated question, so I will. The open source community currently includes most AI researchers and their advancements, is quicker to react to new developments, and isn't hiding implementations from each other for profit. This could change when companies start paying better, but the incentives are also different: corporate development is about displacing labour costs and shifting responsibility, not the future of technology.
@JamesRogersProgrammer · 7 months ago
Mixtral 8x7B runs fast on CPU. I am running a 5-bit quantized version on two different machines with no GPU but with 64GB of RAM and getting great performance out of it, using the Mixtral version of llama.cpp.
@garrettrinquest1605 · 8 months ago
Wait. Can't you just run Mixtral-8x7B on any machine with a decent GPU using Ollama? I thought you only needed something like 8-10 GB of VRAM to have it run well
@Slav4o911 · 8 months ago
No, it gobbles about 36GB of RAM + VRAM on my machine... it's not a 7B model but 8x7B, so it's actually heavier than a "normal" 32B model. I personally don't like it too much; it's too slow for me, and there are faster models with similar results.
@ImmacHn · 8 months ago
This is what we call distributed computing power. Lots of people solving small problems create a spontaneous order that far surpasses centralized organization.
@initialsjd5867 · 4 months ago
I have been running Mixtral on a Surface Laptop 5 with 32GB of RAM, a 12th-gen Core i7, and no dedicated GPU for a while now; it runs Fedora 39. The first prompt always generates quite fast, but the more you ask it in a single terminal window, the slower it gets over time; just opening a new window fixes that. Though I'm going to try it now on a 4080 Super with a 13th-gen Core i7 and 32GB of RAM. Fireship has a video about it in which he also goes over how to easily install it. The minimum is 32GB of RAM, I would say, although I noticed Fedora Linux doesn't use nearly as much RAM in the background as Windows 11, maybe a tenth.
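If you drive the same local Ollama setup from Python, it might look roughly like this (a sketch assuming the `ollama` Python package is installed and the `mixtral` model has already been pulled with `ollama pull mixtral`):

```python
import ollama

# Sends one chat turn to the locally running Ollama server.
resp = ollama.chat(
    model="mixtral",
    messages=[{"role": "user", "content": "Summarize what a Mixture of Experts is."}],
)
print(resp["message"]["content"])
```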
@dishcleaner2 · 8 months ago
I ran a GGUF quantized version on my RTX 4090 and it was comparable in quality and speed to ChatGPT when it first came out.
@Eric-yd9dm · 8 months ago
An rtx giveaway? Wouldn't a good bar of solid gold be cheaper?
@michael_hackson_handle · 8 months ago
Lol. I thought ChatGPT worked the way Mixtral does - having a router pick which part of the neurons to use for each topic. O_o Since it makes more sense.
@aniksamiurrahman6365 · 8 months ago
Let's see how long it takes for Mistral to become a for-profit company.
@szymex22 · 8 months ago
I found some smaller version of mixtral and it runs ok on CPU.
@SkyyySi · 8 months ago
4:08 Then... quantize it? It will be 4x smaller and somewhat faster, while losing about 1% - 2% in quality. A hell of a deal if you ask me.
@romainrouiller4889 · 8 months ago
@Fireship brother?
@amykpop1 · 8 months ago
Finally, hopefully OpenAI will stop their overly exaggerated "safety testing" and be pushed to release their newer models faster. I'm not saying that there should not be any safety testing at all, but trying to only please some useless bureaucrats who know nothing about AI and LLM's will just slow the process of developing accessible AGI.
@turbogymdweeb184 · 8 months ago
Inb4 governments ban the personal development of LLMs and other AI tools and only allow certain organizations that either know someone in the government or abide by unrealistic safety standards lmao
@JorgetePanete · 8 months ago
LLMs*
@LowestofheDead · 8 months ago
@@manitoba-op4jx OpenAI don't really care about ethics, or it's very unlikely that they do. Think about it: by refusing to release their models (supposedly because of "bias") they are now one of the most valued startups in the world at around $100Bn. Do you really think they just happened to care about ethics when it was the most financially convenient? They're not going to give away their product for free, and neither is Elon, even though he has opposite politics.
@robonator2945 · 8 months ago
They aren't doing safety testing at all, they're doing censorship. LLMs are not Skynet; they architecturally can't be. They have very fundamental architectural limitations. Their version of 'safety' is stopping their models from saying things they don't want them to (something which has been well documented by asking models even vaguely political questions; IIRC the exact percentage of how much whiteness was unacceptable to be praised was somewhere around 20%). This isn't stopping Skynet, it's breaking the model's leg before releasing it into the wild so that it's more controllable.
@gustavodutra3633 · 8 months ago
@@robonator2945 AI lobotomy.
@BeidlPracker-vb8en · 8 months ago
This is definitely too many cuts for my MySpace era brain.
@Oktokolo · 8 months ago
The real breakthrough will be experts in fields of knowledge - not experts selected by random syntax artifacts.
@legorockfan9 · 8 months ago
Shrink it to 3 and call it the Magi system
@tungvuthanh5537 · 8 months ago
Being a French startup also explains why it is named 8x7B
@exploreit7466 · 8 months ago
Heyyyyyyyyy bro, can you please make a video on 3D photo inpainting again pleaseeeeeeeeeee dude, I really need it and it's not working properly
@raunakchhatwal5350 · 3 months ago
Thanks!
@andreamairani1512 · 7 months ago
OpenAI shaking in their digital boots right now
@rafaelestevam · 8 months ago
The Bloke is already messing with quantizing it 😁 (is a v0.1 as I write)
@ThorpenAlnyr · 8 months ago
"We discovered CPUs people".
@pictzone · 8 months ago
Omfg this video is pure gold thank you thank you thank you
@reamuji6775 · 5 months ago
I wish someone would train an AI model with 3x7B parameters and call it MAGI
@nigel-uno · 8 months ago
Bycloud pretending to understand topics. Glad someone pointed out all the errors already.
@imerence6290 · 8 months ago
A year to get to GPT-3.5 while OpenAI is heading towards GPT-5 is impressive?
@GoshaGordeev-yg5bc · 8 months ago
what about miqu?
@MINIMAN10000 · 8 months ago
Mistral is 7B and good for its size. Mixtral is 47B but runs as fast as a 13B and is good for its size. Miqu is a preview build of Mistral Medium, a standard 70B model; no idea what people's opinions are on its quality.
@arnes.1328 · 3 months ago
good stuff
@Purpbatboi · 8 months ago
What is the 'B' in this? Gigabytes?
@capitalistdingo · 8 months ago
I think it means billion as in 7 billion parameters. But I say that with only a weak understanding of what that means so don’t take my word for it.
@squeezyDUB · 8 months ago
Yes it's billion
@alansmithee419 · 8 months ago
7 billion parameters: basically a list of numbers that determine the AI's behaviour. Depending on the implementation, it is likely either half precision (16 bits / 2 bytes per parameter) or using tensor processing, which is popular for AI (8 bits / 1 byte). So it could be either 14 or 7 gigabytes, depending. But yes, probably 7GB.
@troybird8253 · 8 months ago
IF you need to pay to use the model, it is not open source.
@bluesnight99 · 8 months ago
Me who knows nothing about ai: U were speaking english....
@nbshftr · 8 months ago
Hivemind vs big brain. The hivemind has a bunch of AIs that are each very good at certain skills; it's fast but maybe a little stupid sometimes. The big brain is one big AI that is good at a little bit of everything; it's consistent and smart but slow.
@amykpop1 · 8 months ago
@@nbshftr cheers for the explanation!
@alansmithee419 · 8 months ago
@@nbshftr Big brain also maybe not as good at individual tasks as each hive mind brain is at their specialised tasks, but is more generally intelligent.
@eaccer · 8 months ago
Open source has not caught up to OpenAI
@raid55 · 8 months ago
is this a ripoff of fireship's video and thumbnail? or are you just a fan?
@aniksamiurrahman6365 · 8 months ago
Wow!