
37C3 - What is this? A machine learning model for ants? 

media.ccc.de

media.ccc.de/v/37c3-11844-wha...
How to shrink deep learning models, and why you would want to.
This talk gives a brief introduction to deep learning models and the energy they consume for training and inference. We then discuss what methods currently exist for handling their complexity, and how neural network parameter counts could grow by orders of magnitude despite the end of Moore's law.
Declared dead numerous times, deep learning is now surrounded by more hype than ever. With Large Language Models and Diffusion Models becoming a commodity, we ask how bad their energy consumption really is, what we can do about it, and how it is possible to run cutting-edge language models on off-the-shelf GPUs.
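To get a rough sense of scale, a back-of-the-envelope estimate like the sketch below shows how training energy adds up. Every number in it is an illustrative assumption, not a figure from the talk:

```python
# Back-of-the-envelope estimate of training energy.
# All numbers below are illustrative assumptions, not figures from the talk.
GPU_POWER_KW = 0.4   # assumed average draw of one datacenter GPU, in kW
NUM_GPUS = 1_000     # assumed cluster size
TRAINING_DAYS = 30   # assumed wall-clock training time
PUE = 1.2            # assumed datacenter power usage effectiveness (cooling etc.)

energy_mwh = GPU_POWER_KW * NUM_GPUS * TRAINING_DAYS * 24 * PUE / 1_000
print(f"Estimated training energy: {energy_mwh:.0f} MWh")
# -> roughly 346 MWh under these assumptions
```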
We will look at the various ways people have come up with to rein in the resource hunger of deep learning models, and why we still struggle to keep up with the demands of modern neural network architectures. These range from low-bitwidth integer representations, through pruning of redundant connections and using a large network to teach a small one (knowledge distillation), all the way to quickly adapting existing models using low-rank adaptation.
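To make the last of these concrete, here is a minimal sketch of the low-rank adaptation idea in PyTorch. The class name and hyperparameters are our own illustration, not code from the talk: the pretrained weight matrix is frozen, and only a small low-rank update is trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Minimal sketch of low-rank adaptation on top of a frozen linear layer."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # pretrained weights stay frozen
        # Low-rank factors: only these are trained.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # y = base(x) + scale * x A^T B^T  -- the low-rank correction term
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

layer = LoRALinear(nn.Linear(4096, 4096), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable}")  # 65,536 vs. ~16.8M in the base layer
```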
This talk aims to give the audience an estimate of how much energy modern machine learning models consume, to allow for more informed decisions around their usage and regulation. In the second part, we discuss the most common techniques for running modern architectures on commodity hardware, outside of data centers. Hopefully, deeper insight into these methods will improve experimentation with, and access to, deep learning models.
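As one example of such a technique, a deliberately naive sketch of symmetric per-tensor int8 weight quantization might look like the following; real toolchains typically calibrate scales per channel on sample data:

```python
import torch

def quantize_int8(w: torch.Tensor):
    # Symmetric per-tensor quantization: map floats to integers in [-127, 127].
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    # Recover an approximation of the original weights.
    return q.float() * scale

w = torch.randn(4096, 4096)
q, scale = quantize_int8(w)
print(f"fp32: {w.numel() * 4 / 2**20:.0f} MiB -> int8: {q.numel() / 2**20:.0f} MiB")
print(f"max abs error: {(w - dequantize(q, scale)).abs().max():.4f}")
```

The 4x memory saving is what makes it possible to fit large models into the VRAM of off-the-shelf GPUs, at the cost of a small, bounded rounding error per weight.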
etrommer
events.ccc.de/congress/2023/h...
#37c3 #SustainabilityClimateJustice

Published: 12 Feb 2024

Comments: 9
@eldoprano 3 months ago
Love the Sakamoto at 15:57. A nice detail when talking about MoE
@hackjealousy 3 months ago
Excellent title.
@moccamixer 3 months ago
😂 I wonder how many got it 🤣
@keyworksurfer 3 months ago
@moccamixer Literally everyone; it's an insanely old and mainstream reference.
@jadeaffenjaeger6361 1 month ago
@keyworksurfer Everyone of a certain age... Not entirely sure how much sense it makes for people under 25.
@stuartwilson4960 2 months ago
This is inaccurate: if a company distributes training weights, they are giving away their training model. Inference is much the same thing as training; the only difference is that there is no comparison against expected output and no backpropagation. If you have an LLM inference model, you have an LLM training model.
@Eunakria 2 months ago
I think they're referring to companies only distributing quantized/pruned weights and keeping the original weights used for training private. It's not to say that you can't train off the quantized/pruned weights, just a point about the computational feasibility of either. And there's also something to be said about oversized models being easier to train.
@jadeaffenjaeger6361 1 month ago
The concern is that you typically lack the recipe to reproduce the training weights (practical considerations like required compute aside). So the weights are somewhat analogous to a compiled binary, rather than the actual source code for a program. It's a whole lot better than nothing, but means that significant portions of the training process of the foundational model (and, by extension, everything that is derived from it) are opaque to the public. I hope this clarifies the intent of the remark a little bit. (I'm the speaker)