
Inside a Neural Network - Computerphile 

Computerphile
2.4M subscribers · 426K views

Just what is happening inside a Convolutional Neural Network? Dr Mike Pound shows us the images in between the input and the result.
How Blurs & Filters Work (Kernel Convolutions): • How Blurs & Filters Wo...
Cookie Stealing: • Cookie Stealing - Comp...
Rob Miles on Game Playing AI: • AI's Game Playing Chal...
Secure Web Browsing: • Secure Web Browsing - ...
Deep Learning: • Deep Learning - Comput...
/ computerphile
/ computer_phile
This video was filmed and edited by Sean Riley.
Computer Science at the University of Nottingham: bit.ly/nottscomputer
Computerphile is a sister project to Brady Haran's Numberphile. More at www.bradyharan.com

Published: 29 Jun 2016

Comments: 312
@pw7225 7 years ago
Dr Pound is the best lecturer here. Very clear, intelligently funny, interesting topics. Would deserve his own channel
@theREAL9er 8 years ago
The pictures he printed of the layers helped me grasp the concept so much better than other videos, so thank you
@drhasnainsikandar 2 years ago
Me too
@oliviamay 8 years ago
Loving these videos with Dr. Pound, keep it up!
@mohammaddawas481 5 years ago
This is the best explanation of what is going on inside a neural net! Now I can imagine it more clearly. Thanks a lot!
@ProphecySam 8 years ago
I've been studying neural networks for the last couple of months and haven't come across any resource that explains them this well. You have made it so easy with the visualization. I'd really appreciate more videos on topics like RNNs, how to set the number of layers, filters, etc. (hyperparameters).
@aplcc323 6 years ago
Computerphile, you single-handedly helped me regain my interest in computer science. Thank you very much for all your videos (:
@rhoneletobe 8 years ago
So useful. As a CS student, this was more helpful than a ton of other DLNN stuff I've seen online. Thank you!
@astropgn 8 years ago
In the last video I asked how the images were in these various convolutions. I knew that they wouldn't be nothing like the input image, but I was very curious to see the process anyway. And now you make a video answering exactly what I wanted! Thank you so much! :)
@tho207 8 years ago
what a fantastic explanation, I loved the digits convolution representation hope to see more videos about this! (RNNs)
@talhatariqyuluqatdis 8 years ago
This guy is my second favourite on computerphile. Lovin these demos
@dolibert 6 years ago
Mike and Rob, the stars of computerphile. Great content and nice puns. Keep it up guys
@Aarrmehearties 3 years ago
Massively interesting and well presented, even for my aging neural network!
@ehsankiani542 4 years ago
The best tutorial ever! Cheers, Mike!
@SomethingUnreal 8 years ago
It'd be really interesting to take a network trained to detect random objects as seen by a camera, then give it the live feed from a camera and watch the activation of each neuron in realtime as the object moves about in the camera's view, or rotates around the object, etc. I guess the earlier layers would change a lot, while the deeper layers (which have a better idea of what constitutes an object) would change less.
@Vulcapyro 8 years ago
Projects like this have been done, but only in the sense that they usually just output the most probable class(es) because that's usually the only real way to deal with the amount of information. For modern networks you should be able to visualize activations of a single layer in real-time, but the number of pixels you'd need for a given layer can range from thousands to millions. So doable, but probably not easy to visually parse just by looking at it.
@smileyball 8 years ago
Consider looking into visualization approaches (like saliency/heat maps/deconvolutional neural network) and approaches that focus on maximal activation (like Google's DeepDream)
@MadJDMTurboBoost 7 years ago
SomethingUnreal Id imagine if it was programmed properly and trained long enough, it may look similar to an fMRI.
@aungthuhein007 7 years ago
Would love to have someone like him as my professor in my life!
@displayoff 8 years ago
I love this man.
@davidm.johnston8994 6 years ago
Thank you Mike, and thank you Sean, this video is really helping me in my quest! I'm making a small game in which I'm trying to make an AI using the TensorFlow library.
@Kitsudote 2 years ago
Oh wow, this video made me understand neurological networks in an insanely deep way. Thank you!
@harborned 8 years ago
Fantastic video! Interesting to see "inside the mind" of a neural network
@cazino4 3 years ago
Excellent video! Visually seeing the neurons light up blew me away... It was like looking at an artificial, scaled-down brain being imaged...
@Pfaeff 8 years ago
Doesn't google use those captchas as a crowd sourced labeling technique for their own deep learning stuff?
@emosp0ngebob 8 years ago
there's a computerphile video on that too somewhere, again a google project. you get shown two words, and the computer knows one of them and not the other, so when you type the two words in the computer learns what a word is. that's for transcribing libraries and things... i cant remember which computerphile video it was though.
@zubirhusein 8 years ago
They do
@daniellewilson8527 4 years ago
I wonder if there’s a website that someone can go to to do the image things to help train the deep learning systems?
@kevintoner6068 4 years ago
wow.... I didn't expect to understand any of that, but it was all explained perfectly. It made sense. Awesome video
@hotfrost_ a year ago
Thank you so much. This was very helpful.
@shifter65 4 years ago
Love the visualization!
@TheCritic609 8 years ago
Great Video!
@zakeryclarke2482 7 years ago
Please do a video on the maths of forward and back propagation and how they are implemented
@Sebi0043 6 years ago
Thank you very much! Very helpful video!
@probE466 8 years ago
Could you maybe link to the actual code? Would be interesting to look at the implementation
@s.e.7268 3 years ago
it was very enjoyable, thanks for the video.
@Wurzelbrumpft1 8 years ago
love this series about machine learning
@Fireking300 8 years ago
Great video!
@punkkap 8 years ago
Brilliant video.
@SinanAkkoyun 5 years ago
Dr Mike Pound, please make a tutorial series on q learning! In depth
@heaslyben 8 years ago
That was really interesting! Thanks!
@hosmanadam 5 years ago
Very enjoyable, thank you!
@RG-jv2nv 8 years ago
This really clarified the previous video :)
@deepquest 7 years ago
It's a really good lecture for understanding what is going on inside a NN. I am using a NN for target classification in thermal images. Is a NN a good approach for that, or should I go for another option?
@DracarmenWinterspring 8 years ago
With all the edge detection going on, would it be harder to recognize a 4 if some versions had the top parts join at an angle, like the 4 in this font, versus the open version as in the video? Likewise a 7 with or without the strike through it? I mean, does it remember some kind of average of all the objects in a class or all of them / all of the sufficiently different ones (which might be hard for a large database)?
@Zorbeltuss 8 years ago
A thought that I've gotten when thinking about this and the previous episode, would it be possible to "reverse" the order of the convolutional neural network, getting a sort of idealized result, probably not extremely useful in most cases, but likely somewhat usable for seeing what extra data can be used to train it for more accurate results or perhaps some sort of data generation. Doing the same for a standard neural network would not result in any useable data I know, but it seems like it might be possible with the convolutional one.
@Embedonix 8 years ago
Can you share the caffe scripts you used please?
@shanesrandoms 8 years ago
Enjoying the neural net videos. Looks like ANNs are coming back after not really seeing much of them since the 90s. I remember my first exposure to the math and theory behind this was an assembly program on my 8-bit C64 back in the late 80s, creating a 3-layer backpropagation network
@crazygood150 8 years ago
who needs dual monitors when you have dual PC! Great video btw
@jambalaya201 3 years ago
How do you even visualize the output of the NN? Crazy, this perspective is so insightful.
@davidscarlett5097 8 years ago
How are the outputs of the multiple kernels at each layer managed? Are they somehow merged so that the kernels of the next layer all process the same input? Or do the 20 kernels of layer 2 operate on the 20 outputs of the layer 1 kernels respectively? And if the latter, then what happens when moving from a 20 kernel layer to a 50 kernel layer? Would some of the 20 kernels of the previous layer be duplicated twice, and others duplicated three times to make up the inputs to the 50 kernels in the new layer?
@haakonvt 7 years ago
As always, a splendid video! However, every single clip taken from the angle where the pictures of the convolutions are visible is out of focus. Pity
@jonathanstrasburg3609 6 years ago
I realize this isn't likely to get a reply this late, but I'm trying to replicate the configuration of this network. What activation function are you using for the first fully connected layer? Is it dotplus with a renormalization? I'm assuming FC2 is a softmax layer, so maybe they are both softmax.
@sedthh 8 years ago
please do more on this
@Kubaizi 6 years ago
very helpful, thanks
@mimArmand 5 years ago
Very cool! @5:24 Grayscale is quite a few bits deep; 1-bit depth would be black & white (which is not the case in your images, looks like you have at least 16 levels, if not standard 256-level, 8-bit grayscale)
@sirivellamadhuphotos 5 years ago
@ 5:58 I got the point i am searching for. Thank you very much..
@mikejones-vd3fg 8 years ago
So how do the neural networks do this? Are there speed advantages to this network vs just regular processing?
@bofk7306 8 years ago
Can you look at your last-but-one fully connected layer and calculate the typical "distance" between different digits? E.g. just Euclidean distance on the normalized terms in FC1. Would those distances depend on the neural network you're using, or would they be similar across all successful neural networks? That is, could you say something like: a 1 and a 7 are typically closer than a 0 and a 4.
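The measurement this question proposes is easy to sketch in plain Python. The activation vectors below are invented for illustration; a real experiment would extract the FC1 activations from the trained network for many examples of each digit and average the pairwise distances.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length activation vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical (made-up) FC1 activation vectors for three digits.
fc1_for_1 = [0.9, 0.1, 0.0, 0.2]
fc1_for_7 = [0.8, 0.2, 0.1, 0.3]
fc1_for_0 = [0.0, 0.9, 0.8, 0.1]

# On these toy vectors, a 1 and a 7 sit closer together than a 1 and a 0,
# which is the kind of comparison the question asks about.
print(euclidean(fc1_for_1, fc1_for_7) < euclidean(fc1_for_1, fc1_for_0))
```

Whether such distances transfer across independently trained networks is an empirical question; the representations are not unique, so only the relative ordering could plausibly be stable.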
@kachrooabhishek 2 years ago
i just love this guy
@ericklestrange6255 4 years ago
Amazing, I didn't know you could visualize the high-rank features
@CarterColeisInfamous 8 years ago
how were the kernels generated for this one?
@shiphorns 7 years ago
After watching this, the one thing I don't feel is completely explained is where the convolution kernel values come from. At first he says they are things "like Sobel edge detectors", but later says they are not manually entered, but rather learned values. That leaves the obvious question of how they are initialized. Do they start as just matrices with random entries? During the training, how are they adjusted? Is the "training" some kind of iterative search for kernel values that give the strongest response (e.g. the values that most consistently uniquely identify the one digit being learned and most strongly reject the other nine digits)? I could use a bit more explanation on what the training process looks like and how it adjusts all the kernels.
@vikassrivastava2680 7 years ago
HOw can i get these algos if i want to do it on my machine?
@caw25sha 8 years ago
I am one of those strange people who draws a horizontal bar through the number 7. How would you deal with that? Would you need a separate set of 7+bar training digits (in effect an 11th character) and then map both 7 and 7+bar back to 7?
@anassgxz 8 years ago
can you do a episode on histograms of oriented gradient
@SleeveBlade 8 years ago
really interesting! I would be interested tot see if it is possible to start from the final convolution and see which image fits it the best, as in 'what looks the most like a 2'.
@ruben307 8 years ago
It would be interesting to know if there can be totally different pictures that would just get the same number. Similar to hash collisions.
@black_platypus 8 years ago
Well sure, that's basically the same concept (irreversible / one-way transformations giving you an abstract result)
@quakquak6141 8 years ago
I saw somewhere a neural network that was trained to fool convolutional neural networks, sometimes it produced normal images (in this case it would have produced a 2) other times it produced something that looked almost like pure noise but it was still able to fool the networks
@Locut0s 8 years ago
Very interesting! I wonder if this gives some insight into how neurones in our brains work on a very basic level?
@cmptrn825 8 years ago
Points for making me look at my screen with my head turned 90 degrees to the left until I realize I look like a crazy person
@black_platypus 8 years ago
Now I see it, too... For some reason, YT gave me the video at a lower resolution (not watching it in full screen mode, I hadn't noticed), and I was thinking "I don't understand all the people complaining about the video being "wobbly", the video looks fine to me"... Then I saw I wasn't watching it at 50 fps, so I changed the quality. I, too, find it a bit weird. I guess stabilization doesn't quite kick in as hard when there are more frames to interpolate between?
@CarterColeisInfamous 8 years ago
13:16 what he wants to say is that if the images are segmented then its much easier. the segmentation problem is hard. That's why google captchas are all mushed up on each other. Google apparently fixed the segmentation problem by just training it to recognize multiple pairs of letters
@MrArunavadatta 7 years ago
excellent..
@thecakeredux 4 years ago
I imagined that each layer uses all its kernels on all the images of the previous layer. But that can't be right, hearing that the last convolutional layer here only outputs in a size of 50*4*4. Does that mean that there essentially are "kernel pipelines"? So kernel0 of layer1 will only be fed with the output of kernel0 of layer0?
@CarterColeisInfamous 8 years ago
14:36 google captcha api morphs the filter the more samples you get until the training data is useless, also those image based captchas are also being broken after all the success of imagenet
@SapphireCrook 8 years ago
I never knew YT even supported 50FPS. :O Also, cool computer learning. Today is a day of new smarts.
@germangb8752 8 years ago
I'm taking a two credit course in deep learning next week!
@slayer646464 8 years ago
amazing
@ahmadibraheem1141 4 years ago
Hey I have a question. After the first conv layer, we are left with 20 images of 24*24 pixels. Do these 20 images transform into one 24*24 sized image, to be given as an input to the next conv. layer?
@TheAbdelwahab83 2 years ago
No, after the first conv layer you have a volume (24*24*20), and it is the input to the next conv layer, whose kernels are of size (5*5*20). If you apply one such kernel to that input volume you get one image of size 20*20, and because you have 50 filters of (5*5*20), your output will be 20*20*50
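The shape arithmetic in this reply can be checked with a few lines of plain Python (a sketch of the bookkeeping only, not the video's actual Caffe model): a valid, stride-1 convolution with a k×k×D kernel shrinks each spatial side by k−1, and the output depth equals the number of filters, not D times the number of filters.

```python
def conv_output_shape(h, w, in_channels, k, num_filters):
    """Output (height, width, channels) of a valid, stride-1 convolution.

    Each of the `num_filters` kernels spans the full input depth
    (k x k x in_channels), so the output depth is just num_filters.
    """
    return h - k + 1, w - k + 1, num_filters

# The sizes quoted above: a 28x28 input -> 24x24x20 -> 20x20x50.
print(conv_output_shape(28, 28, 1, 5, 20))    # (24, 24, 20)
print(conv_output_shape(24, 24, 20, 5, 50))   # (20, 20, 50)
```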
@DrBlort 8 years ago
How do you decide what the convolution kernels should be? Is that important, or could they be defined randomly at the beginning?
@rory4987 5 years ago
Neural network weights are set randomly and then learnt
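"Set randomly and then learnt" can be sketched in a few lines of plain Python. This is a toy illustration only: the kernel starts as small random numbers, and a single plain gradient-descent step nudges it; the gradient here is made up, standing in for the real backpropagated one.

```python
import random

random.seed(0)

# A 5x5 kernel initialised with small random values (no structure yet).
kernel = [[random.gauss(0, 0.1) for _ in range(5)] for _ in range(5)]

def sgd_step(kernel, grad, lr=0.01):
    """One stochastic-gradient-descent update: w <- w - lr * dL/dw."""
    return [[w - lr * g for w, g in zip(wrow, grow)]
            for wrow, grow in zip(kernel, grad)]

# Pretend the training pass produced this gradient for the kernel.
grad = [[1.0] * 5 for _ in range(5)]
updated = sgd_step(kernel, grad)
# Every weight moves by about -0.01, i.e. a small step toward lower loss.
print(updated[0][0] - kernel[0][0])
```

Repeating such steps over many labelled examples is what gradually turns random matrices into edge-detector-like kernels.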
@the1exnay 8 years ago
Would I be wrong in thinking that if you gave a convolutional neural network the ability to control where to click and what to type, and gave it enough convolutions and kernels (perhaps beyond what current computers can handle), and trained it enough, then it would be able to solve any captcha, even a new one with a different interface that still used the same basic principles?
@dino130395 8 years ago
Seeing that a lot of people are confused by this video being 50fps, I'd want to clear that up. 50fps is a standard frame rate for television and video in general. 60fps is a standard for animated and generated images, like animations, or games. Sure, you can do either with both, but it's generally so that high-frame-rate TV broadcast are always 50, not 60 fps. The scale for TV: 25, 50, 100, 200 Hz The scale for Computers: 30, 60, 120 Hz (Hz = fps)
@HyzerFlip 8 years ago
Love the out of focus shots on the pictures...
@Remix00zero 8 years ago
So would it be possible to use convolutional neural networks for something general like arbitrary image matching or are they limited to narrowly trained applications like the one here?
@2Cerealbox 8 years ago
You can. I think google uses a neural net for their "visually similar images" feature.
@paulhendrix8599 8 years ago
nice picture.
@elanvanbiljon5237 8 years ago
Is there any chance you could upload a copy of the source code for the CNN some where? (or even pseudo code) I am sure many people would greatly appreciate it :D
@G3rain1 8 years ago
It would have been way more interesting to see different examples of the same number and how each translates into the same output.
@CDmc98 8 years ago
Shouldn't one be able to generate characters(letters, whatever) by going the other way around? I'm thinking what if you tell it to generate a picture from a fully connected layer?
@3zehnutters 8 years ago
Is there a GitHub link to the project, Mike?
@recklessroges 8 years ago
github or it didn't happen ;-) "I call GnuImageManip'Prog"
@CodeAbstract 7 years ago
Why do you need a GitHub page? He just literally explained the full architecture of his CNN (Convolutional Neural Network). If you want to test this for yourself, you can easily implement everything he said; just find a programming language supported by libraries that can complete the task. He even mentioned what he used, but you could also look at Lua with Torch, for example. All the layers he mentioned are already in those libraries, so you won't need to code any of them, just wire them up.
@tamebeverage 4 years ago
Excuse me if I have missed something obvious, but I'm not sure I understand what the input of, say, C2 is. Is it a sort of average of all of the images produced by C1?
@gryzman 8 years ago
What's the library/program Dr Mike is using please ?
@realcygnus 8 years ago
Big Corporate Top Secret
@MrVbarroso 8 years ago
It's called Caffe.
@kolby4078 8 years ago
using Caffe in Linux
@lohphat 7 years ago
How do you replicate the learned connections to other systems? How is the "knowledge" abstracted for transport, backup, and further improvement? With discrete programming, the instructions are compact and finite and are easily copied.
@thejll a year ago
What if the training images have digits drawn at different scales?
@feliceserra106 8 years ago
Why are there 4x4x50 neurons after the last conv-layer? I get 4x4x(20^2)x(50^4) neurons, if every 5x5 kernel runs over every image from the previous layer. I'm confused.. maybe the kernels in the following layers are 3-dimensional? Like 20 5x5x20 kernels in the second layer?
@feliceserra106 8 years ago
Now I understand. Thanks a lot!
@feliceserra106 8 years ago
Got it, thank you!
@tenalexandr1991 8 years ago
I have the same puzzle. Could you enlighten me?
@feliceserra106 8 years ago
In short: "Each kernel is 5x5xD, where D is the number of features in the previous layer". I don't know why their answers are not showing up on YouTube. Maybe a Google+ thing.
@iroxudont 8 years ago
>using windows as a host machine and linux as a guest
@ebolapie 8 years ago
Why anyone would use HyperV over KVM is beyond me.
@box8250 8 years ago
It might be university property
@michaelpound9891 8 years ago
Almost. Actually this was TeamViewer, accessing a machine on another campus that is significantly more powerful than the one I have in this office. Deep learning doesn't really work in a guest due to needing the graphics card, which is tricky to get. The machine to the left is also Linux, and is used for deep learning, but someone else was using it at the time of this video!
@iroxudont 8 years ago
Michael Pound Why not SSH
@beautifulsmall 3 years ago
Wrote Python convolution algos on bitmaps around this time just to teach myself Python; filters and convolutions are amazing to see in action. It's a little scary to see how far we are now in 2021. Covid hasn't stopped SW engineers.
@denischikita a year ago
You look like a smart guy. Well done
@TheGameFreak013 6 years ago
how do you know how many kernels, layers, etc. are best suited for your needs?
@m3n4lyf 5 years ago
That is an excellent question! Unfortunately it requires at least a moderate amount of knowledge in the subject matter to answer, so I doubt you'll be getting a satisfactory response from this resource any time soon.
@andreyguskov1697 7 years ago
If the first convolution layer has 20 filters and the second one has 20, does this mean that each C2 filter processes all 20 images from C1? That would make 400 images for C2's output
@JohnHollowell 8 years ago
I wonder, can you work backwards somewhat to get a general idea of what the original image looked like from the convolution layers?
@RoySchl 8 years ago
I don't think so; that would be like asking "what 5 numbers did I multiply to get 3600?" There is only one possibility if you do the multiplication, but many possibilities when you try to guess backwards, and with those convolutions it's the same thing, just exponentially worse. Basically you drop a lot of information
@DagarCoH 8 years ago
Well, I'd say yes, as if you have all the information every layer puts out, you just have to reverse the process the first layer did on the data it gave. Since you have many processes on the one image, there should be much redundancy and therefore a high certainty. If you however only have the output of the sixth convolution layer, I highly doubt that you could get much out of it.
@compuholic82 8 years ago
Partially. The problem is that (in general) a convolution is not a reversible operation. However, you can apply something that is known as a "matched filter" which is basically a convolution with the transposed filter kernel. If you go backwards through the network you can (to some degree) reconstruct the input signal. If you look at this paper you can see how the reconstructions look like: arxiv.org/pdf/1311.2901v3.pdf And just to prevent confusion: The author calls it "Deconvolution". But he isn't doing a "deconvolution" as he describes in his paper. He is applying a "matched filter".
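A one-dimensional toy version of the matched-filter idea mentioned in this reply (the signal and template here are invented, and nothing is taken from the cited paper): correlating a signal with a template peaks where the template's pattern occurs, which is how going backwards through a network can partially localise what excited a filter.

```python
def correlate(signal, kernel):
    """Valid cross-correlation of two 1-D sequences."""
    k = len(kernel)
    return [sum(s * w for s, w in zip(signal[i:i + k], kernel))
            for i in range(len(signal) - k + 1)]

signal = [0, 0, 1, 2, 1, 0, 0]   # a bump centred at position 3
kernel = [1, 2, 1]               # the same bump shape as a template
resp = correlate(signal, kernel)
# The response peaks exactly where the template lines up with the bump.
print(resp.index(max(resp)))  # 2
```

The reconstruction is only partial for the reason given above: correlation, like convolution, discards information, so the inverse is not unique.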
@andre.queiroz 8 years ago
people interested in this experiment, you can actually do it in the Machine Learning course (Stanford) on Coursera
@jacobdawson2109 8 years ago
I am curious: would it be possible to run this sort of neural network in reverse in order to produce the sort of "Deep Dream" images that you can see on the Internet? For instance, instead of asking the network "what digit does this image resemble?", ask "what does a 2 look like?"
@gpt-jcommentbot4759 a year ago
Yes, that's what Deep Dream is
@CarterColeisInfamous 8 years ago
source code please?
@EQuivalentTube2 8 years ago
The first layer looks like something Andy Warhol would do.
@pequenoZero 8 years ago
It would be nice, if you talked a bit about how much data is needed for a CNN to be any kind of useful. The datasets in this video seem extremely big. Specifically it would be nice to have an idea on how well it works on many "categories" with a low amount of data.
@AlexanderTrefz 8 years ago
Would you not have 11 output options? 0-9 and NaD (Not-a-Digit)?
@OddlyTugs 8 years ago
This is single-digit recognition; multi-digit/character recognition is a whole 'nother can of worms. When there are no activations on the output layer you know it is not a digit.
@AlexanderTrefz 8 years ago
Francois Molinier that makes sense.
@andre.queiroz 8 years ago
+Francois Molinier That's not how it works. The grayscale pixels represent the probability of it being a given number. If you input a non-digit you will probably see a few bright grays or a couple of almost-whites.
@marcoswappner8331 4 years ago
Let's see if someone can help me out here. The first layer here outputs 20 24×24 images (or a 20-channel image) after performing all the convolutions. The second layer will output 50 20×20 images. But how are they constructed? How do they combine the 20 channels from the previous layer? I mean, they are not applying all the filters to each of the 20 channels separately, as that'd give a 20-fold deeper output. Do they simply add the per-channel convolutions up? So channel 1 of layer 2 is the sum of the convolutions between kernel 1 of layer 2 and each of the 20 channels of layer 1?
@0xChRS 8 years ago
whoa why is it at 50 fps?
@ChristopherPuzey 8 years ago
europe is poor and can't afford the extra 10
@deadalusdx5637 8 years ago
muh socialism
@AuroraNora3 8 years ago
Except for Scandinavia
@CJSwitz 8 years ago
Based on PAL (25/50/100Hz), whereas NA is based on NTSC (30/60/120Hz). There is no technical reason for it anymore, but originally it was clocked off the AC power grid, which ran at 60Hz in NA and 50Hz in Europe.
@ChristopherPuzey 8 years ago
"So my monitor runs at 60 fps"
@Will-lt4by 7 years ago
Can someone explain how the final convolutional layer is 4x4x50? My understanding based on the previous Neural Network video is that the first convolution will produce an output of 24x24x20, but then wouldn't the next convolution, which has 20 kernels, produce 20 images of the first image layer of the 20 produced from the first convolution, and then another 20 on the second image layer of the 20 produced from the first convolution, such that at the end of the second layer you'd have a 20x20x400 output, and so forth until at the end you'd have 4x4x(some large number) not 4x4x50?
@kazedcat 7 years ago
You decide the depth on each layer. So the first layer will have 20 different 4x4x1 kernels but the second layer will have 20 different 4x4x20. Then after that he uses 50 kernels of 4x4x20 and then 50 kernels of 4x4x50 until the last layer before the fully connected network
@fasligand7034 7 years ago
Wow, I didn't realize that kernels also got multi-dimensional along the way. Thanks
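The point this thread settles on can be made concrete with a toy example in plain Python (toy numbers, not the network's real weights): a kernel spanning the full input depth produces one number per position by summing its per-channel products, which is why 50 such kernels yield 50 output channels rather than 50 times the input depth.

```python
def conv_at(volume, kernel):
    """Single output value of a kernel applied at one position.

    `volume` and `kernel` are both depth x height x width nested lists
    of the same shape; the products are summed over all three axes.
    """
    return sum(v * k
               for vch, kch in zip(volume, kernel)
               for vrow, krow in zip(vch, kch)
               for v, k in zip(vrow, krow))

# A 2-channel 2x2 patch and a matching 2x2x2 kernel (toy numbers).
patch  = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
kernel = [[[1, 0], [0, 1]], [[1, 0], [0, 1]]]
# Per channel: 1 + 4 = 5 and 5 + 8 = 13, summed to one value.
print(conv_at(patch, kernel))  # 18
```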
@MitchPleune 8 years ago
The blurry camera is annoying