Llama3 speed test on Dual Nvidia 3090 

Lev Selector

Llama3 speed test on a Linux PC with two Nvidia RTX 3090 cards (24GB each, 48GB total).
Presented by Lev Selector - May 13, 2024
Slides - github.com/lselector/seminar/...
--------- My websites:
- Enterprise AI Solutions - EAIS.ai
- Linkedin - / levselector
- GitHub - github.com/lselector
--------- Contents of today's video:
We have tested llama3 (70b and 8b) on a desktop with two Nvidia RTX 3090 24GB video cards.
- CPU: AMD Ryzen™ 9 3900X (12 cores, 24 threads)
- RAM: 32GB
- WSL2 (Windows Subsystem for Linux).
- We used ollama to run llama3 8b and 70b
- Actual models were: llama3:latest, llama3:70b, llama3:70b-instruct-q4_K_M
- We also compared with performance on an Apple MacBook Pro M3 Max 128GB
The prompt was:
Please make a numbered chronological list of the last ten (10) US presidents in reverse order. The list should start like this: 1. Joe Biden (2021-present); 2. Donald Trump (2017-2021); 3. Barack Obama (2009-2017); the list should contain 10 rows. Important - make a fresh list. Disregard the chat history. Output only the list itself, nothing else. Output each list element on a separate line.
Results below show the total duration of the response (in seconds) and output speed (in tokens/s).
We compare the Linux machine (Windows WSL2) with the latest MacBook Pro M3 Max 128GB.
70b models:
- Linux: 9.7 s, 14 t/s
- Mac: 16.5 s, 8.6 t/s
8b models:
- Both: 2.6 s, 56 t/s
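The duration and tokens/s numbers above can be collected programmatically: ollama's REST API returns `total_duration`, `eval_count`, and `eval_duration` (all durations in nanoseconds) with each non-streaming response. A minimal sketch, assuming an ollama server is running on the default local port (the helper names here are illustrative, not part of ollama):

```python
import json
import urllib.request

def tokens_per_second(eval_count, eval_duration_ns):
    """Output speed: generated tokens divided by eval time (ollama reports ns)."""
    return eval_count / (eval_duration_ns / 1e9)

def benchmark(model, prompt, host="http://localhost:11434"):
    """Run one non-streaming generation and return (total_seconds, tokens_per_s)."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=payload.encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    total_s = body["total_duration"] / 1e9
    return total_s, tokens_per_second(body["eval_count"], body["eval_duration"])
```

For example, the Linux 70b run above (14 t/s) corresponds to roughly 140 tokens generated in 10 seconds of eval time.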

Published: May 12, 2024

Comments: 2
@lev-selector, 25 days ago:
When running the 70b model, you can see that it is loaded completely into the two GPUs (using most of their combined memory). If we repeat the same test with only one GPU, the generation speed drops 10+ times, to approx. 1 token/second. It is amazing how a thin, quiet Mac laptop can match the performance of a powerful and noisy desktop with a 1.5 kW power supply :)
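The single-GPU comparison described in this comment can be reproduced by hiding one card from the ollama server with CUDA_VISIBLE_DEVICES. This is a config sketch, not a verified command sequence; it assumes ollama is installed and running under CUDA on Linux/WSL2, and the prompt is abbreviated here:

```shell
# Hypothetical single-GPU run: expose only GPU 0 to the ollama server,
# then re-run the same prompt and compare the reported eval rate.
CUDA_VISIBLE_DEVICES=0 ollama serve &
sleep 5   # give the server a moment to start
ollama run llama3:70b --verbose "Please make a numbered chronological list..."
```

The `--verbose` flag makes `ollama run` print timing statistics, including the eval rate in tokens/s, after the response.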
@qGte, 25 days ago:
Did you use SLI or no?