Amazing video, very useful. I was trying to find content on using Apple Silicon. I have an M3 Pro but am still having some problems; 7B models are very hard to run. Thanks for the show!!!
Dear Nono, First off, thanks a lot for your work, it's super helpful. Can you elaborate on what's going on when we're downloading the shards and loading the tokenizers and safetensors? To me it seemed like the Gemma model I had previously downloaded and cached was being downloaded again. Or is that just showing the progress of loading it into memory? How can I make sure the local, cached model is being used? Thanks again and all the best, Flo
5 months ago
Hey, Florian! The models and shards get downloaded to HuggingFace's local cache on your machine (on macOS, for me, that's ~/.cache/huggingface) and they shouldn't be re-downloaded on every execution. Transformers normally runs a quick check to verify the files are already there and only downloads what's missing. As you say, there's always a progress bar for loading the model into memory. 👌🏻 Nono
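If you want to be certain only the local cache is used and nothing is fetched from the network, you can pass local_files_only=True to from_pretrained (or set the HF_HUB_OFFLINE=1 environment variable). Here's a minimal sketch, assuming you've already downloaded google/gemma-2b-it once; the model name is just an example:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# local_files_only=True makes Transformers read from the local HuggingFace
# cache (~/.cache/huggingface by default) and raise an error instead of
# downloading if the files aren't there yet.
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it", local_files_only=True)
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", local_files_only=True)
```

If it errors, the files weren't in the cache and a normal from_pretrained call would have re-downloaded them.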
Nice information. I just had one quick question: what is the configuration of your M3 Max? How many GPU cores and how much RAM?
5 months ago
Hey! I have an Apple M3 Max 14-inch MacBook Pro with 64 GB of Unified Memory (RAM) and 16 CPU cores (12 performance and 4 efficiency). It's awesome that PyTorch now supports Apple Silicon's Metal Performance Shaders (MPS) backend for GPU acceleration, which makes local inference and training much, much faster. For instance, each denoising step of Stable Diffusion XL takes ~2s with the MPS backend and ~20s on the CPU. I hope this helps! Nono
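If it helps, here's a quick way to check whether PyTorch can see the MPS backend and to run a model on it. This is just a minimal sketch (google/gemma-2b-it and the prompt are placeholder examples, not the exact setup from the video):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Use the Apple Silicon GPU (MPS) when available, otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"
print(f"Using device: {device}")

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it").to(device)

# Inputs must live on the same device as the model.
inputs = tokenizer("Write a haiku about the M3 Max.", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```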
Thanks for this information. I created exactly the same environment, but for the GPU version, upon calling generate it returns the same prompt as output (nothing more), whereas this works perfectly fine when I use the CPU code. Here's my GPU code:

from transformers import AutoTokenizer, AutoModelForCausalLM
import time

start = time.time()

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b-it")
print(f"Tokenizer = {type(tokenizer)}")

model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it", device_map="auto")
print(f"Model = {type(model)}")

input_text = "Tell me ten best places to eat in Pune, India"
input_ids = tokenizer(input_text, return_tensors="pt").to("mps")
print(input_ids)

outputs = model.generate(**input_ids, max_new_tokens=300)
print(outputs)
print(tokenizer.decode(outputs[0]))

end = time.time()
print(f"Total Time = {end - start} Sec")