
Accelerate Your GenAI Model Inference with Ray and Kubernetes - Richard Liu, Google Cloud 

CNCF [Cloud Native Computing Foundation]

Generative AI has become increasingly prevalent in recent years and is reaching a critical point, as models are demonstrating human-level capabilities. However, serving these massive models has presented new technical challenges: they contain hundreds of billions of parameters and require massive computational resources. In this talk, we will discuss how to serve GenAI models using KubeRay on Kubernetes with hardware accelerators like GPUs and TPUs. Practitioners will learn how to get these large models into production on a performant and cost-effective Kubernetes platform. Ray is an open-source framework for distributed machine learning that enables ML practitioners to scale their workloads out to large clusters of machines. Ray Serve offers a scalable, framework-agnostic library for online inference suitable for large and complex models. The audience will learn how integrating Ray with accelerators can create a powerful platform for serving GenAI models.
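
As a rough illustration of the deployment pattern the abstract describes (serving a Ray Serve application on Kubernetes via KubeRay, with GPU workers), here is a minimal sketch of a RayService manifest. The application name, import path, image tag, and resource counts are illustrative assumptions, not taken from the talk:

```yaml
# Hypothetical RayService manifest for GPU-backed Ray Serve inference.
# Names, image tag, and replica/GPU counts are placeholders.
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: genai-inference
spec:
  serveConfigV2: |
    applications:
      - name: llm-app
        import_path: serve_app:deployment   # assumed module:object in the image
  rayClusterConfig:
    headGroupSpec:
      rayStartParams:
        dashboard-host: "0.0.0.0"
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray-ml:2.9.0
    workerGroupSpecs:
      - groupName: gpu-workers
        replicas: 2
        rayStartParams: {}
        template:
          spec:
            containers:
              - name: ray-worker
                image: rayproject/ray-ml:2.9.0
                resources:
                  limits:
                    nvidia.com/gpu: 1   # one GPU per worker pod, assumed
```

Applied with `kubectl apply -f rayservice.yaml` on a cluster where the KubeRay operator is installed, this asks the operator to stand up a Ray cluster with GPU worker pods and run the Serve application on it; TPU serving follows the same shape with TPU resource requests instead.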

Published: 4 Oct 2024
