
Llama 3.1 405B model is HERE | Hardware requirements

TECHNO PREMIUM
19K views

In this video, we dive into Meta's latest AI breakthrough: the Llama 3.1 405B model! Learn about its state-of-the-art capabilities, its training run on over 16,000 NVIDIA H100 GPUs, and support for a massive 128K-token context length. Discover how this model excels in general knowledge, multilingual translation, and more, pushing the boundaries of AI technology. Whether you're an AI enthusiast or a developer, this video covers everything you need to know about Llama 3.1's groundbreaking features and applications. Don't miss out on the future of AI!
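For a rough sense of scale behind those hardware requirements, here is a back-of-the-envelope estimate of how much memory the 405B weights alone need at different precisions. This is a minimal sketch: the bytes-per-parameter figures are nominal, and real deployments add KV cache, activations, and runtime overhead on top.

# Rough weight-memory estimate for Llama 3.1 405B at different precisions.
# Assumption: memory ~= parameter count x bytes per parameter; KV cache,
# activations, and framework overhead come on top of these numbers.

PARAMS = 405e9  # 405 billion parameters

bytes_per_param = {
    "FP16/BF16": 2.0,
    "INT8/Q8":   1.0,
    "INT4/Q4":   0.5,
}

for precision, bpp in bytes_per_param.items():
    gb = PARAMS * bpp / 1e9  # decimal gigabytes
    print(f"{precision:10s} ~{gb:.0f} GB just for the weights")

# FP16/BF16  ~810 GB just for the weights
# INT8/Q8    ~405 GB just for the weights
# INT4/Q4    ~202 GB just for the weights

Even the 4-bit figure is beyond any single consumer GPU, which is why the comments below revolve around unified memory, CPU offload, and aggressive quantization.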

Science

Published: 20 Oct 2024

Comments: 32
@MeinDeutschkurs 2 months ago
Please keep in mind that the context window also increases VRAM needs. 128k? We'll need something like an Apple M8 Extreme chip with X terabyte(s) of unified memory. The cool thing: it will cost something around 10k-15k instead of 200k.
@AKA96369 18 days ago
M8?! Is Extreme confirmed?!
@MeinDeutschkurs 18 days ago
@AKA96369, I don't know. I said "something like".
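The point in this thread about the context window is easy to quantify: KV-cache memory grows linearly with context length (and with batch size). A sketch under stated assumptions follows; the layer, head, and dimension values are the publicly reported Llama 3.1 405B shapes, with grouped-query attention and a 16-bit cache assumed.

# KV-cache size estimate for Llama 3.1 405B (grouped-query attention).
# Assumed shapes: 126 layers, 8 KV heads, head dimension 128, fp16 cache.

N_LAYERS    = 126
N_KV_HEADS  = 8
HEAD_DIM    = 128
BYTES_ELEM  = 2           # fp16/bf16
CONTEXT_LEN = 128 * 1024  # 128K tokens

# Per token we store one K and one V vector per layer per KV head.
bytes_per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_ELEM
kv_cache_gb = bytes_per_token * CONTEXT_LEN / 1e9

print(f"KV cache per token: {bytes_per_token / 1e6:.2f} MB")
print(f"KV cache at 128K context: ~{kv_cache_gb:.0f} GB")  # roughly 68 GB on top of the weights

So a full 128K context adds on the order of tens of gigabytes before the weights are even counted, which is why large unified-memory machines keep coming up in this discussion.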
@Ukuraina-cs6su 1 month ago
3:15 It looks like we've crossed the point where it was possible to run AI locally. Now you need a tiny supercomputer to operate cutting-edge models. =((
@Wingnut353 28 days ago
Not really, the smaller models are still making progress in quality, so we will see smaller models that are better than the very large Llama 405B... for sure.
@Those_Weirdos 2 months ago
I have 72GB of VRAM - can't wait to run the 405B-parameter model at 0.01 bpw. But I'm going to screw around with this on my 512GB RAM Epyc box. Expecting a couple of seconds per token; should be wicked awesome.
@lnstagrarm 2 months ago
Donate to me please 😢😂
@SalveMonesvol 5 days ago
How's the performance hit if you have just a couple of 3090 Tis with an NVLink bridge but a stupid amount of system RAM, like 2TB of 8-channel DDR5-6400? Is it still crap or does it become a decent option?
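For questions like the one above, single-stream decoding is mostly memory-bandwidth bound, so a crude upper limit on speed is bandwidth divided by the bytes that must be streamed per token. The sketch below uses assumed ballpark bandwidth figures, not measurements, and ignores compute and KV-cache traffic.

# Crude tokens/s upper bound for memory-bandwidth-bound decoding:
# each generated token has to stream (roughly) all the weights resident
# in a given memory pool through that pool's bandwidth.

def max_tokens_per_sec(weight_bytes: float, bandwidth_gb_s: float) -> float:
    """Upper bound: bandwidth / bytes touched per token."""
    return bandwidth_gb_s * 1e9 / weight_bytes

q4_weights = 405e9 * 0.5  # ~200 GB of 4-bit weights

# Ballpark sustained bandwidths (assumptions, not measurements):
pools = {
    "2x RTX 3090 Ti VRAM (if the model fit)": 2 * 1000,  # ~1 TB/s per card
    "8-channel DDR5-6400 system RAM":         8 * 51,    # ~410 GB/s total
}

for name, bw in pools.items():
    print(f"{name}: <= {max_tokens_per_sec(q4_weights, bw):.1f} tokens/s")

That works out to roughly 10 tokens/s if the quantized weights somehow fit entirely in VRAM versus about 2 tokens/s streamed from system RAM, before real-world losses, which is consistent with the single-digit expectations in this thread.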
@MrOktony 2 months ago
Looking forward to your quantisation results 😊
@InstaKane 2 months ago
Awesome! Great video, learned a lot, cheers 👍
@क्लोज़अपवैज्ञानिक
Can I use Amazon or IBM servers to run the 70B or 405B model?
@threepe0 2 months ago
There was a sticker that came with your mic cluing you into the fact that it's a side-address mic 😆 don't be a Yeti
@martianreject 2 months ago
10:04 "You are on a Quuee" 🤭
@proflead 2 months ago
Thanks for sharing!
@ThatGuyJoss 2 months ago
I cannot get it to install. I don't know what I'm doing wrong. I've gotten through basically every step except the very last one, where you type in "y" to confirm that you're okay with the file size.
@jamesmichaelcabrera9613 2 months ago
Just go to ollama.com, and once you've run the installer, go into your terminal and type ollama run llama3.1
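If the command line gives you trouble, the same model can also be called from Python once Ollama is installed and the model has been pulled. A small sketch using the official ollama Python package (installed with pip install ollama); note that the bare llama3.1 tag pulls the default 8B variant, while the larger ones use tags like llama3.1:70b or llama3.1:405b and need far more memory.

# Minimal sketch: calling a locally served Llama 3.1 model from Python.
# Assumes the Ollama daemon is running and the model has been pulled, e.g.:
#   ollama pull llama3.1        # 8B default tag
#   ollama pull llama3.1:70b    # larger variants

import ollama

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "In one sentence, what is Llama 3.1?"}],
)
print(response["message"]["content"])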
@mendodsoregonbackroads6632 2 months ago
I tried to run the 70B model with 16GB and it just crippled the machine and ran up 2.5GB of swap.
@mendodave 2 months ago
@avataros111 Yes, it did work. I don't have a fancy GPU on the M1, so it takes a little longer to run a query than someone with an external GPU module, but it works fine with the onboard GPU and Neural Engine, with no hiccups or spinning beach balls. The RAM spikes up to 14GB or 15GB or so but then goes back to normal after Llama is finished.
@Wingnut353 28 days ago
I've run 70B-based quantized models using about 13GB of VRAM and 45GB or so of system RAM... only about 1 token a second though.
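Splits like the one described above (a small amount of VRAM plus a larger pool of system RAM) are usually achieved by offloading only some transformer layers to the GPU. A hedged sketch using the llama-cpp-python bindings; the GGUF file name and layer count below are placeholders to tune for your own hardware, not a recipe.

# Partial GPU offload with llama-cpp-python (pip install llama-cpp-python,
# built with GPU support). Layers that don't fit in VRAM stay in system RAM.

from llama_cpp import Llama

llm = Llama(
    model_path="./llama-3.1-70b-instruct-q4_k_m.gguf",  # hypothetical local GGUF file
    n_gpu_layers=20,   # how many layers to keep in VRAM; raise until VRAM runs out
    n_ctx=4096,        # context window; larger values cost more memory
)

out = llm("Explain grouped-query attention in two sentences.", max_tokens=128)
print(out["choices"][0]["text"])

At roughly 1 token/s, as reported above, this is workable for batch jobs but painful for interactive use.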
@kumarsujal4969 2 months ago
Really nice, informative video.
@thecount25 2 months ago
Can I run it on my CPU? I have 44 cores and 512GB RAM.
@tavares1574 2 months ago
Try it with Ollama
@luisff7030 2 months ago
llama.cpp
@juliusvalentinas 1 month ago
An NVIDIA A100 GPU is ~30K USD, and you need many! Each GPU draws around 450W; it's nonsense in terms of both the electricity bill and the initial price.
@gileneusz 2 months ago
3:31 Wait 3 years and it will be possible 😃
@Ukuraina-cs6su 1 month ago
in 3 years you will get updated llamas
@gileneusz 2 months ago
8:42 Not possible; the smallest quant is Q2_K at 149.0 GB. It's possible to run on a 192GB Mac Studio, but it gives only ~2 t/s; better to just use HuggingChat...
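The 149 GB figure for the Q2_K quant is roughly what the arithmetic predicts. A Q2_K mix ends up a little under 3 bits per weight on average once per-block scales and the tensors kept at higher precision are counted; the bits-per-weight values below are approximations, not the exact GGUF layout.

# Sanity-checking the reported 149 GB Q2_K file size for a 405B model.
# Assumption: the Q2_K mix averages roughly 2.9-3.0 bits per weight overall.

PARAMS = 405e9

for bits_per_weight in (2.9, 3.0):
    size_gb = PARAMS * bits_per_weight / 8 / 1e9
    print(f"{bits_per_weight} bpw -> ~{size_gb:.0f} GB")

# 2.9 bpw -> ~147 GB
# 3.0 bpw -> ~152 GB   (the reported 149 GB sits right in this range)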
@cr8s_net 2 months ago
If anyone has about a quarter million they could loan me, I'll happily pay it back once I make all the Internet monies. Soon, I guess.
@ThatGuyJoss 2 months ago
Hugging Face Spaces has one running
@QorQar 1 month ago
Run it with airllm??????? How?
@ps3301 2 months ago
You need ~250GB of RAM to run the 4-bit model.
@dahee247 2 months ago
I think you can try it on Hugging Face