
Trends in Deep Learning Hardware: Bill Dally (NVIDIA) 

Paul G. Allen School
20K subscribers
22K views

Published: Oct 3, 2024

Comments: 22
@FeintMotion · 10 months ago
Let's see Paul G. Allen's computer science lecture
@DestinyAdvisor · 5 months ago
Look at that clear, engaging explanation. The depth of knowledge presented. Oh, my god. It even includes practical examples.
@lance31415 · 10 months ago
Great to have this posted - not going to catch it all the first time through
@virushk · 9 months ago
Legend!
@alfinal5787 · 10 months ago
Great talk.
@goldnutter412 · 10 months ago
Still barely halfway through, this is epic
@EnginAtik · 10 months ago
If we can figure out the optimum topology of the NN by pruning, it could be possible to replace the whole NN with an analog circuit, which would be a MIMO analog controller. If that is ever possible, we might gain some theoretical understanding of what the NN is modeling by inspecting the transfer functions. Practical economic considerations like power consumption, speed, and industrial chip manufacturing at scale are important, but we also want to gain some theoretical understanding of why NNs are good at what they do: a table of values of the trigonometric functions can be unwieldy, but they are the result of the simple Pythagorean formula. Similarly with AI, we are basically generating huge tables of values, and the race to be the leader is directing us to deal with bigger and bigger tables. The tiniest insects can do real-time image processing, trajectory planning, and flight control at minuscule energy costs with hardly any digital computation involved. We need an Occam's Razor to start simplifying things, and pruning and sparsity techniques could be a first step for simplification.
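A minimal NumPy sketch (an editorial illustration, not from the talk or the comment) of the magnitude-pruning idea mentioned above: zero out the smallest weights of a toy layer and check how little of it needs to survive before the output drifts. The layer sizes and the 90% pruning ratio are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy fully connected layer: 256 inputs -> 128 outputs.
W = rng.normal(size=(128, 256)).astype(np.float32)

# Magnitude pruning: drop the 90% of weights with the smallest |w|.
threshold = np.quantile(np.abs(W), 0.90)
mask = np.abs(W) >= threshold
W_pruned = W * mask

x = rng.normal(size=256).astype(np.float32)
y_dense = W @ x          # original dense layer
y_pruned = W_pruned @ x  # pruned layer (still evaluated densely here)

print(f"surviving weights : {mask.mean():.1%}")
print(f"output drift (L2) : {np.linalg.norm(y_dense - y_pruned):.3f}")
```

In practice the pruned mask would then be stored in a sparse format (or mapped to dedicated sparse hardware) so the dropped connections cost neither memory nor arithmetic.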
@binjianxin7830 · 10 months ago
There is a ton of engineering know-how here, while others (Google) are trying to catch up by designing with machine learning but still seem to fall behind.
@stefanogrillo6040 · 10 months ago
FYI DNNs are multipurpose trained processors, but yeah you can call them what you want
@pandoorapirat8644 · 9 months ago
lol 26:04.7
@nauy · 8 months ago
If this is what NVIDIA thinks is the trend, they had better watch their back. They’re still stuck in the von Neumann box and not addressing the big elephant in the room. Someone will leapfrog them. It’s a matter of time.
@MCStairsOfDeath · 7 months ago
yep
@The_Uncertainity_Principal · 5 months ago
Could you explain this statement? Not disagreeing at all, just curious about your perspective.
@The_Uncertainity_Principal · 5 months ago
Is the ‘elephant in the room’ a fundamentally different architecture (i.e., Groq) vs. all this optimization at the edges?
@nauy · 5 months ago
@The_Uncertainity_Principal The computational principle of neural networks is fundamentally very different from that of digital computers. The von Neumann architecture in digital computers segregates the processor from memory. This is optimized for running a long stream of instructions on the same data. Compute is heavy on each piece of data and centralized. Information is localized in each piece of data and stored in a large memory connected by a bus to the processor. The capacity for this type of compute is a function of processor speed. In neural networks, compute is light on each piece of data and decentralized. Information is distributed over a large set of processors (nodes) and memory (weights). The capacity for this type of compute is a function of the connectivity, i.e., I/O bandwidth.

Currently, the weighted connectivity between nodes in most artificial neural networks (other than neuromorphic processors, e.g., from Rain) is simulated via matrix multiplication, which has O(n^3) time complexity. GPU or SIMD only improves it to O(n^2). In real neural networks (like the one in your head) or neuromorphic processors, the weighted connectivity is implemented via physical connections, which have O(1) complexity. Now, neuromorphic computers are better at solving this I/O bandwidth problem, but at the cost of being inflexible: the network architecture cannot be modified at any time. Also, current implementations are purely electrical, which puts a limit on the scale of the connectivity (impedance problem). I believe Rain processors have to use sparse connections (random subset and star) to get around this problem at even very modest network sizes.

There are optical systems in research stages that can do matrix multiplication in O(1). That’s the kind of leapfrog technology I was referring to.
@nauy · 5 months ago
@The_Uncertainity_Principal The elephant in the room is I/O bandwidth. To scale up neural networks, a new architecture is needed, because the current processor/memory architectures, even the SIMD versions, are ill-suited to the extremely distributed computation in neural networks.
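To make the “connectivity simulated by matrix multiplication” point in the replies above concrete, here is a rough NumPy/SciPy sketch (an editorial illustration, not from the talk or the thread): a dense matrix-vector product touches every potential node-to-node connection, while a sparse representation of a pruned or physically wired network only touches the connections that actually exist. The layer size and connection density are arbitrary assumptions.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(1)
n = 4096          # nodes per layer (arbitrary)
density = 0.01    # fraction of node pairs actually connected (arbitrary)

# Dense "simulated connectivity": one stored weight per potential node pair.
W_dense = rng.normal(size=(n, n)).astype(np.float32)

# Sparse connectivity: store only the connections that exist.
W_sparse = sparse.random(n, n, density=density, format="csr",
                         random_state=1, dtype=np.float32)

x = rng.normal(size=n).astype(np.float32)
y_dense = W_dense @ x    # work ~ n * n multiply-adds
y_sparse = W_sparse @ x  # work ~ number of stored connections

print(f"dense multiply-adds : {n * n:,}")
print(f"sparse multiply-adds: {W_sparse.nnz:,}")
```

The gap between those two counts is the arithmetic (and, more importantly, the memory traffic) that dense hardware spends on connections that carry no information, which is one way to read the I/O-bandwidth argument above.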
@goldnutter412 · 10 months ago
Big numbers are awesome, can't get over how cool this video is and how he explains it. Group theory, abstraction, and the 196,883-dimensional monster @3Blue1Brown