Llama3 speed test on a Linux PC with two Nvidia RTX 3090 cards (24GB each, 48GB total).
Presented by Lev Selector - May 13, 2024
Slides - github.com/lselector/seminar/...
--------- My websites:
- Enterprise AI Solutions - EAIS.ai
- LinkedIn - /levselector
- GitHub - github.com/lselector
--------- Contents of today's video:
We have tested llama3 (70b and 8b) on a desktop with two Nvidia RTX 3090 24GB video cards.
- CPU: AMD Ryzen™ 9 3900X (12 cores, 24 threads)
- RAM: 32GB
- WSL2 (Windows Subsystem for Linux).
- We used ollama to run llama3 8b and 70b
- Actual models were: llama3:latest, llama3:70b, llama3:70b-instruct-q4_K_M
- We also compared with performance on an Apple MacBook Pro (M3 Max, 128GB)
The prompt was:
Please make a numbered chronological list of the last ten (10) US presidents in reverse order. The list should start like this: 1. Joe Biden (2021-present); 2. Donald Trump (2017-2021); 3. Barack Obama (2009-2017); the list should contain 10 rows. Important - make a fresh list. Disregard the chat history. Output only the list itself, nothing else. Output each list element on a separate line.
Results below show total duration of the response (in seconds) and output speed (in tokens/s).
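As a side note on how such speeds are computed: ollama's API response reports the number of output tokens (eval_count) and the generation time in nanoseconds (eval_duration), and tokens/s is simply their ratio. A minimal sketch, using hypothetical values rather than the actual measurements:

```python
def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert ollama's eval_count / eval_duration (nanoseconds) to tokens/s."""
    return eval_count / (eval_duration_ns / 1e9)

# Hypothetical example values (not the measured numbers):
# 140 output tokens generated in 10 seconds of eval time.
print(round(tokens_per_second(140, 10_000_000_000), 1))  # -> 14.0
```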
We compare the Linux machine (under Windows WSL2) with the latest MacBook Pro (M3 Max, 128GB).
70b models:
- Linux: 9.7 s, 14 t/s
- Mac: 16.5 s, 8.6 t/s
8b models:
- Both: 2.6 s, 56 t/s
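A quick arithmetic check on the 70b numbers above: the ratio of the two throughput figures gives the speedup of the dual-3090 box over the Mac.

```python
# Throughput figures from the 70b test above.
linux_tps = 14.0   # Linux, 2x RTX 3090
mac_tps = 8.6      # MacBook Pro M3 Max

speedup = linux_tps / mac_tps
print(f"{speedup:.2f}x")  # -> 1.63x
```

So for the 70b model the dual-3090 Linux box generates tokens roughly 1.6 times faster than the M3 Max, while for the 8b model the two machines are effectively tied.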