
Filter Count - Convolutional Neural Networks 

Animated AI
11K subscribers · 15K views

Patreon: / animated_ai
Learn about filter count and the realistic methods of finding the best values
My Udemy course on High-resolution GANs: www.udemy.com/...

Published: 30 Sep 2024

Comments: 49
@bycloudAI · 1 year ago
This channel is YouTube's undiscovered gold mine, please keep up the amazing content!!
@ChrisRamsland94 · 1 year ago
For real, it's wild how much effort he puts into these.
@d_b_ · 1 year ago
3:24 "only 2 categories...so 512 features is enough" - this statement sounds like it comes from familiarity with the problem. Is there something more to it? Did you see that number of features used in past papers? Was it from your own experimentation against 256 or 1024 features? Is there some math that arrives at this? I'd like to understand this better, so any additional color you have on this would be helpful!
@animatedai · 1 year ago
You typically want more features than categories. So for something like ImageNet with 1000 categories, 512 wouldn't be enough; you'd want 2048 or higher. But this case only has 2 categories, so 512 easily meets that requirement. And the exact value of 512 came from NVIDIA's StyleGAN papers, which is what I based that architecture on. I don't remember them giving a reason for that value, but it gave them good results, and a higher value wouldn't fit into memory during training on the Google Colab hardware. It's more of an art than a science, so let me know if that doesn't completely answer your question. I'm happy to answer follow-ups.
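The "more features than categories" rule of thumb above can be sketched as a final classifier head. This is a hypothetical illustration with random weights, not the video's actual code: the last stage pools down to a 512-dimensional feature vector, and a linear layer maps it to the 2 category scores.

```python
import numpy as np

# Illustrative sizes: 512 pooled features feeding a 2-way classifier head.
num_features, num_categories = 512, 2

rng = np.random.default_rng(0)
features = rng.standard_normal(num_features)                  # pooled feature vector
weights = rng.standard_normal((num_categories, num_features)) # linustrative random head
bias = np.zeros(num_categories)

logits = weights @ features + bias   # one score per category
assert logits.shape == (num_categories,)  # 512 features -> 2 logits
```

The feature count (512) only needs to comfortably exceed the category count (2); the exact value is a heuristic choice.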
@d_b_ · 1 year ago
@animatedai Thank you, that helps
@hieuluc8888 · 2 months ago
0:16 If filters are stored in a 4-dimensional tensor and one of them represents the number of filters, then what does the depth represent?
@nikilragav · 2 months ago
2:13 How does it stay at the same size? Padding the edges of the original image?
@arjunpalusa9421 · 1 year ago
Eagerly waiting for your next videos.
@afrolichesmain777 · 2 months ago
It's funny you mention that the number of kernels is the least exciting part; my thesis was an attempt at finding a systematic way to reduce the number of kernels by correlating them and discarding kernels that “extract roughly the same features”. Great video!
@buddhadevbhattacharjee1363 · 9 months ago
Please create a tutorial on conv3d as well, and which would be better for video processing (conv2d or conv3d)?
@marceloamado6223 · 1 year ago
Thank you for the video. I used to really stress out about this; now I'm calmer knowing it's a conventional problem and that solving it by heuristics is the way.
@umamusume1024 · 6 months ago
When it comes to one-dimensional signals, such as time-domain signals collected from speed sensors, what is the difference between the visualization of a 1D CNN and a 2D one? Does it just change the height of the cuboid to 1? And what algorithms do you recommend for deep learning on one-dimensional time-domain signals? I would really appreciate your reply, because as a Chinese student doing an undergraduate graduation project, I can't find any visualization of 1D CNNs on the Chinese Internet.
@krzysztofmaliszewski2589 · 1 month ago
Which approach did you find more effective for your problem? Dense layers or Conv1D layers? Or did you go another way, e.g., LSTMs?
@MDTANJIDHOSSAIN-c3y · 7 months ago
Best video on CNNs.
@lionbnjmn · 1 year ago
Hey! Do you have any sources for your statements at 2:21 (about doubling channels when downsampling) and 2:50 (downsampling units should be followed by dimension-steady units)? I'm currently writing a paper and trying to argue the same point, but I can't find any real research on it :)
@animatedai · 1 year ago
It's just a common pattern that I've seen. There's no shortage of examples if you want to cite them, from the original ResNet all the way up to modern diffusion architectures.
@whizziq · 1 year ago
I have one question: what does the number of features mean? For example, the initial image is 512x512x3 (the 3 in this case being the red-green-blue colors). But what happens in the next layers? What are these 64, 128, and higher numbers of features? Why do we need so many instead of just 3? Thanks. Appreciate your videos!
@Anodder1 · 1 year ago
You have 3 dimensions: 2 spatial and 1 feature dimension. The 2 spatial dimensions encode where the information is, and the feature dimension encodes different aspects under which the information can be interpreted. In the beginning you have the 3 color channels, but the next layer has a much larger feature dimension, in which each index represents one particular aspect, like "how much red-green color contrast there is between left and right at this position". These aspects become higher-level, like "is this a circle", so the feature dimension needs to increase to cover all useful interpretations that could be applicable at that point. This agrees well with the shrinking spatial dimensions, because each pixel in a later layer represents a larger area of the original image, for which these many higher-level interpretations would be necessary.
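The trade-off described above (spatial dimensions shrink while the feature dimension grows) can be sketched as plain shape arithmetic. The layer widths here are illustrative, chosen to match the 512x512x3 example in the thread:

```python
# Typical shape progression: the first conv widens the feature dimension,
# then each downsampling stage halves H and W while doubling features.
shapes = [(512, 512, 3), (512, 512, 64)]  # RGB input, then first conv layer
h, w, f = shapes[-1]
while f < 512:
    h, w, f = h // 2, w // 2, f * 2       # halve space, double features
    shapes.append((h, w, f))

# shapes is now:
# [(512, 512, 3), (512, 512, 64), (256, 256, 128),
#  (128, 128, 256), (64, 64, 512)]
```

Each step trades spatial resolution for a richer per-position description, which is exactly the "fewer, larger-area pixels with more interpretations" picture above.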
@pi5549 · 1 year ago
2:08 I don't think we can go 512x512x3 to 512x512xN if filterSize > 1. If filterSize = 3 we'd be going to 510x510xN, right? Thought experiment: 5 items, slidingWindowLen 3 → 3 slide positions (123, 234, 345).
@pi5549 · 1 year ago
Hmm, I suppose a filter can extend beyond the image by a pixel; it might even collect useful information that tells it it's dealing with an edge. Solving a jigsaw puzzle, you usually collect the edge pieces and try to work with them first.
@animatedai · 1 year ago
This question is actually the perfect lead-in to my video on padding: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-ph4LrdntONo.htmlfeature=shared That's actually the video that directly follows this one in my playlist on convolution. You can see the full playlist here: ru-vid.com/group/PLZDCDMGmelH-pHt-Ij0nImVrOmj8DYKbB&feature=shared
@carlluis2045 · 1 year ago
awesome videos!!!
@KJletMEhandelit · 9 months ago
This is wow!
@tazanteflight8670 · 1 year ago
Why do your filter examples have a depth of 8?
@jntb3000 · 3 months ago
Because the input has a depth of eight.
@tazanteflight8670 · 3 months ago
@jntb3000 Eight... what? And why was 7 insufficient, and why is 9 too much?
@jntb3000 · 3 months ago
@tazanteflight8670 I believe the depth of each kernel must match the depth of the input. In his example, the input has a depth of eight, hence the kernel has a depth of eight. If the input only had a depth of 3 (like the RGB colors) then the kernel should have a depth of 3 as well. I guess we could also use a kernel depth of just one for all input sizes.
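The depth-matching rule in this reply can be checked with a minimal NumPy sketch, assuming an 8-deep input like the video's example. Each output value is a dot product over the full 3x3x8 neighborhood, so one filter produces exactly one 2D map:

```python
import numpy as np

# One filter whose depth (8) matches the input's depth (8).
rng = np.random.default_rng(0)
inp = rng.standard_normal((10, 10, 8))   # H x W x depth-8 input
filt = rng.standard_normal((3, 3, 8))    # kernel depth must equal input depth

out = np.empty((8, 8))                   # valid convolution: 10 - 3 + 1 = 8
for i in range(8):
    for j in range(8):
        # dot product over the whole 3x3x8 neighborhood -> one scalar
        out[i, j] = np.sum(inp[i:i+3, j:j+3, :] * filt)

assert out.shape == (8, 8)               # one filter -> one 2D feature map
```

A filter with depth 7 or 9 would not line up with the input's 8 channels, which is why the depth is forced rather than chosen.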
@hieuluc8888 · 2 months ago
I have similar questions; I thought there was only 1 filter?
@igorg4129 · 1 year ago
wow!
@kinvert · 2 years ago
Thanks for another great video!
@superzolosolo · 1 year ago
How did you go from 512x512x3 to 512x512x64 while still using a kernel size of 3x3? Wouldn't you have to use a kernel size of 1x1 and have 64 kernels? That is the only way I understand it so far; I'm hoping you explain it later on. Other than that this is super helpful, thank you so much 🙏
@animatedai · 1 year ago
You're correct that you need 64 kernels, but the size of the kernels doesn't matter. It's fine to have a kernel size of 3x3 and 64 kernels.
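The point above (filter count and kernel size are independent) can be sketched in NumPy. This is an illustrative toy, using a small 6x6 stand-in for the 512x512 image and zero-padding of 1 so the spatial size is preserved:

```python
import numpy as np

# 64 filters of size 3x3, each with depth 3 to match the RGB input.
rng = np.random.default_rng(0)
h = w = 6                                       # small stand-in for 512
inp = rng.standard_normal((h, w, 3))
filters = rng.standard_normal((64, 3, 3, 3))    # 64 kernels, each 3x3x3

padded = np.pad(inp, ((1, 1), (1, 1), (0, 0)))  # zero-pad H and W only
out = np.empty((h, w, 64))
for k in range(64):
    for i in range(h):
        for j in range(w):
            out[i, j, k] = np.sum(padded[i:i+3, j:j+3, :] * filters[k])

assert out.shape == (6, 6, 64)  # same spatial size, 64 feature channels
```

The kernel size (3x3) sets the neighborhood each output value looks at; the filter count (64) sets the output depth. They are separate knobs.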
@superzolosolo · 1 year ago
@animatedai I see now, thanks for clarifying
@adityamurtiadi911plus · 7 months ago
@animatedai Have you used padding of 1 to keep the dimensions of the output the same?
@bagussajiwo321 · 1 year ago
So if the filter value is 64, does that mean you stack the 512x512 photo 64 times? Like you stack that face 64 times? Or is there a different pixel value for every filter?
@bagussajiwo321 · 1 year ago
Take an example like a 3x3 matrix with 5 filters:
1 0 1
0 1 0
1 0 1
So is this value stacked 5 times because the filter value is 5?
@animatedai · 1 year ago
Have you seen my video on the fundamental algorithm? ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-eMXuk97NeSI.html You can think of each filter as a different pattern that the algorithm is searching for in the image (or input feature map). Each output value (each cube) represents how closely that area of the input matched the pattern. So you get a 2D output for each filter. And those outputs are stacked depth-wise to form the 3D output feature map. If you have 64 filters, you'll stack 64 of these 2D outputs together.
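The stacking described in this answer can be made concrete with a small NumPy sketch. `conv2d_single` here is a hypothetical helper standing in for the per-filter convolution, not a function from the video:

```python
import numpy as np

rng = np.random.default_rng(0)
inp = rng.standard_normal((8, 8, 3))          # small input feature map
filters = rng.standard_normal((64, 3, 3, 3))  # 64 different patterns

def conv2d_single(x, f):
    """Valid convolution of one filter over x -> one 2D 'match map'."""
    kh, kw, _ = f.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(x[i:i+kh, j:j+kw, :] * f)
    return out

maps = [conv2d_single(inp, f) for f in filters]  # 64 separate 2D maps
fmap = np.stack(maps, axis=-1)                   # stack them depth-wise
assert fmap.shape == (6, 6, 64)                  # 3D output feature map
```

Each filter's 2D map differs because each filter holds different weights, so the depth-64 output is 64 distinct "how well did this pattern match here" images, not 64 copies of the input.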
@bagussajiwo321 · 1 year ago
@animatedai I see!! Thanks for giving me the previous video's link. Sorry for my silly question 😅
@omridrori3286 · 1 year ago
Are these the animations you also use in the course?
@animatedai · 1 year ago
I don't use these in my current course, but I'm planning to incorporate them into a new course that I'm working on now.
@SpicyMelonYT · 1 year ago
How do you halve the resolution? The only way I can imagine is to have a kernel that is half the size of the input data plus one. Is that correct, or is something else happening?
@animatedai · 1 year ago
This will be covered in an upcoming video, but to give away the answer: you can pad with "SAME" (in TensorFlow or manually pad the equivalent of it in PyTorch) and use a stride of 2.
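The shape math behind this answer is simple: with TensorFlow-style "SAME" padding and stride s, the output size is ceil(n / s), so stride 2 exactly halves the resolution regardless of kernel size. A sketch of just that formula (not a framework call):

```python
import math

def same_out_size(n, stride):
    """Output size of a 'SAME'-padded convolution along one dimension."""
    return math.ceil(n / stride)

assert same_out_size(512, 2) == 256   # 512x512 -> 256x256 with stride 2
assert same_out_size(7, 2) == 4       # odd sizes round up
```

Without the padding, the kernel would also shave pixels off the edges; "SAME" padding isolates the downsampling so only the stride changes the size.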
@SpicyMelonYT · 1 year ago
@animatedai Oh I see, and that is still backpropagation-compatible? I guess it would be, but I have no clue how to do that little step backwards. I assume you just act like the data was always smaller for that step?
@animatedai · 1 year ago
Yes, that still works fine with backpropagation. Are you working with a library like TensorFlow or PyTorch? If so, they'll handle the backpropagation for you with their automatic differentiation. If you're using a kernel size of 1x1, it would work to act like the data was always smaller (specifically you would treat it like you dropped every other column/row of the data and then did a 1x1 convolution with a stride of 1). But for a larger kernel size like 3x3, all of the input data will be used so that won't work.
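The 1x1 special case mentioned above can be verified numerically: because a 1x1 convolution is a pointwise operation, it commutes with dropping every other row/column. A small NumPy check (illustrative shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
inp = rng.standard_normal((8, 8, 3))
w = rng.standard_normal(3)   # one 1x1 filter with depth 3

# 1x1 conv at stride 1 over the full input, then subsample the output...
full = np.tensordot(inp, w, axes=([2], [0]))   # shape (8, 8)
strided_out = full[::2, ::2]                   # what a stride-2 1x1 conv yields

# ...equals subsampling the input first, then a stride-1 1x1 conv.
sub_first = np.tensordot(inp[::2, ::2], w, axes=([2], [0]))

assert np.allclose(strided_out, sub_first)
```

For a 3x3 kernel the equivalence breaks, as the reply says: neighboring rows/columns that subsampling would discard still contribute to each output value.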
@SpicyMelonYT · 1 year ago
@animatedai Ah I see, that makes sense. I'm actually building it from scratch in JavaScript. It's pretty slow, but I'm doing it to get a better understanding of it, and I also find it fun. Also, thank you for the responses; that is really cool. I think what you're doing with these videos is really sleek and useful. I personally would like it if you went into more depth about the actual math and numbers, but I completely understand that your goal here is not that, and to give a more intuitive explanation for people. Keep it up!
@animatedai · 1 year ago
Good luck on your JavaScript CNN project! And thank you; I appreciate your support. The math is something I plan to cover; I even have the rough draft of the script written for a future video that goes over the math in detail. I just wanted to focus on teaching the intuition separately first so that it doesn't get lost in the calculation details.
@Looki2000 · 1 year ago
Imagine the ability to build your own TensorFlow neural network using such 3D visualization.
@conficturaincarnatus1034 · 1 year ago
The 3rd-dimension part seems a bit tedious; I believe 2D visualizations are more helpful in practice: just write down the feature count in text at the bottom and go to the next layer. And for building neural networks with 2D visualization I've recently found KNIME to be amazing, although you are abstracting entire layers into a single box lmao.