Faster and Cheaper Offline Batch Inference with Ray 

Anyscale

The popularity of machine learning (ML) in the real world has exploded recently, with offline batch inference as one of the primary workloads. Yet existing production ML systems fall short for this workload in both scale and simplicity. To address this, the Ray community has built Ray Data, an open-source library for large-scale data processing for ML applications. In this talk we'll discuss:
• How to use Ray Data to run efficient inference over terabytes of data with a pretrained model (a minimal sketch follows the takeaways below)
• Why traditional data processing tools are difficult to use, expensive, and inefficient for modern deep learning applications
• How to easily leverage modern ML models, multiple times faster and cheaper than other common solutions (like Spark or SageMaker), using Ray Data
• How and why offline inference can be useful with LLMs and when building LLM applications
• An end-to-end demo of an offline batch inference use case with Ray Data
Takeaways:
• Ray Data is the best solution for offline batch inference/processing, particularly when working with unstructured data and with deep learning models
• Why a performant batch inference solution matters for LLM workloads, and how Ray Data can help
• User success stories of running batch inference with Ray Data
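To make the core pattern concrete, here is a minimal sketch of Ray Data batch inference with a class-based UDF. The Hugging Face sentiment model and the S3 paths are illustrative assumptions, not the model or dataset used in the talk:

```python
import numpy as np
import ray
from transformers import pipeline

class TextClassifier:
    def __init__(self):
        # The model is loaded once per actor and reused across all batches.
        self.model = pipeline("sentiment-analysis")

    def __call__(self, batch: dict) -> dict:
        preds = self.model(list(batch["text"]))
        batch["label"] = np.array([p["label"] for p in preds])
        return batch

ds = ray.data.read_parquet("s3://my-bucket/reviews/")   # hypothetical input path
ds = ds.map_batches(
    TextClassifier,
    concurrency=4,   # four model replicas running as actors
    num_gpus=1,      # one GPU per replica
    batch_size=64,
)
ds.write_parquet("s3://my-bucket/predictions/")         # hypothetical output path
```

Because the model is instantiated in `__init__`, each actor pays the load cost once and then streams batches through it, which is what lets this scale to terabytes of input.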
Find the slide deck here: drive.google.c...
About Anyscale
---
Anyscale is the AI Application Platform for developing, running, and scaling AI.
www.anyscale.com/
If you're interested in a managed Ray service, check out:
www.anyscale.c...
About Ray
---
Ray is the most popular open source framework for scaling and productionizing AI workloads. From Generative AI and LLMs to computer vision, Ray powers the world’s most ambitious AI workloads.
docs.ray.io/en...
#llm #machinelearning #ray #deeplearning #distributedsystems #python #genai

Published: 3 Oct 2024

Comments: 3
@Mohith7548, 8 months ago
Can you share the full code for the audio batch inference?
@AnnerdeJong, 5 months ago
Considering the 300 GB Ray vs Spark comparison (~15:30-18:30) - the Spark side seems to save all the prediction outputs (`...write...save()`), but I don't see that on the Ray side (`for _ in ds.iter_batches(...): pass`). Does Ray's `iter_batches()` automatically dump outputs somewhere? (e.g. when specifying `batch_format='pyarrow'`, does it get automatically cached or sth in the Ray object store, or sth similar?) If not - I'd argue it's not an entirely fair apples-to-apples comparison?
@fenderbender28, 5 months ago
In his code, the Spark writer is using `format("noop")`, which means it's not persisting the outputs anywhere either
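For readers comparing the two snippets, here is a minimal sketch of the consume-only pattern on both sides, assuming `ds` is the Ray Dataset with predictions and `df_preds` the equivalent Spark DataFrame (both names hypothetical). Ray's `iter_batches()` only streams batches to the caller; nothing is written or persisted unless a `write_*` method is called, so the empty loop plays the same role as Spark's noop sink:

```python
# Ray side: iterating forces full execution of the pipeline; each batch is
# streamed to the caller and immediately discarded.
for _ in ds.iter_batches(batch_size=1024, batch_format="pyarrow"):
    pass

# Spark side: the "noop" sink runs the whole job but discards the output,
# so neither framework pays for writing results.
df_preds.write.format("noop").mode("overwrite").save()
```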