How does it work at inference time? I'm not able to get it. Each output will give a value in the range -1 to 1. Now how can I bring the bounding box back into the original image? Kindly tell me the mathematics: how do I compute it? This is where I'm stuck. Help me 🙏
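A rough sketch of the math, assuming the original YOLO (v1) parameterization: x and y are the box center as an offset inside its grid cell (0 to 1), and w and h are fractions of the whole image. The function name `decode_box` and its arguments are made up for illustration. If your implementation really emits values in -1 to 1, it will need its own rescaling into this convention first.

```python
def decode_box(row, col, x_cell, y_cell, w_rel, h_rel, S, img_w, img_h):
    """Map one predicted box from grid-cell coordinates to pixel coordinates.

    Assumptions (hypothetical helper, YOLOv1-style encoding):
      row, col        index of the grid cell that made the prediction
      x_cell, y_cell  center offset inside that cell, in [0, 1]
      w_rel, h_rel    box width/height as a fraction of the full image
      S               number of grid cells per side (7 in the paper)
    """
    # Center: move to the cell's top-left corner, add the offset,
    # then scale from grid units to pixels.
    cx = (col + x_cell) / S * img_w
    cy = (row + y_cell) / S * img_h

    # Size: already relative to the whole image, so just scale.
    bw = w_rel * img_w
    bh = h_rel * img_h

    # Convert center/size to corner coordinates for drawing.
    x_min = cx - bw / 2
    y_min = cy - bh / 2
    x_max = cx + bw / 2
    y_max = cy + bh / 2
    return x_min, y_min, x_max, y_max


# A box centered in the middle cell of a 7x7 grid on a 448x448 image.
print(decode_box(row=3, col=3, x_cell=0.5, y_cell=0.5,
                 w_rel=0.25, h_rel=0.5, S=7, img_w=448, img_h=448))
```

If the network ran on a resized copy of the image, use the original image's width and height as `img_w` and `img_h` so the box lands on the original.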
When you say "don't adjust the class probabilities or coordinates" if there is no object centered in that grid cell, you mean simply pass over that cell and move to the next, right? So you only backpropagate through the NN when there is an object centered in that cell. Am I getting it right?
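One way to picture this, as a simplified sketch rather than the actual training code: in the YOLO (v1) loss, the coordinate and class terms are multiplied by an "object is here" indicator, so empty cells add nothing for those terms, but they still add a down-weighted confidence term. The names `cell_loss` and `grid_loss` and the flat list layout are assumptions for illustration, with one box per cell and plain squared error on w and h (the paper uses square roots there).

```python
LAMBDA_COORD = 5.0   # weight on box coordinates (value from the YOLO paper)
LAMBDA_NOOBJ = 0.5   # weight on confidence in empty cells (value from the paper)


def cell_loss(pred, target, has_object):
    """Loss for one grid cell.

    pred and target are flat lists: [x, y, w, h, conf, class_0, class_1, ...].
    Simplification: one box per cell, squared error on w and h directly.
    """
    sq = [(p - t) ** 2 for p, t in zip(pred, target)]
    conf_err = sq[4]
    if not has_object:
        # Empty cell: coordinates and classes are ignored,
        # only the confidence is pushed toward its target (zero).
        return LAMBDA_NOOBJ * conf_err
    coord_err = sum(sq[0:4])
    class_err = sum(sq[5:])
    return LAMBDA_COORD * coord_err + conf_err + class_err


def grid_loss(preds, targets, obj_mask):
    """Sum the per-cell loss over all cells of the grid."""
    return sum(cell_loss(p, t, m) for p, t, m in zip(preds, targets, obj_mask))


target = [0.0] * 6
empty_a = cell_loss([1, 1, 1, 1, 1, 1], target, has_object=False)
empty_b = cell_loss([9, 9, 9, 9, 1, 1], target, has_object=False)
print(empty_a == empty_b)  # box values in an empty cell do not change the loss
```

So under this reading the cell is not skipped entirely: gradients for the coordinates and class probabilities are zero in an empty cell, while the confidence output still gets a gradient.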
Hey, I'm new to the field of convolutional neural networks. I have a presentation in school on YOLO and I need some help. Can someone please explain how the output of the convolutional layer works? The input to the first convolutional layer is a 448*448*3 tensor, and its output is a 224*224*64 tensor using a 7*7 filter. I understand that the depth is 64 because of the 64 different filters (features). Thank you!
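For the shapes, the usual output-size formula is floor((in + 2*padding - kernel) / stride) + 1 per spatial dimension. Below is a small sketch that applies it to the numbers in the question. The stride of 2 is what the YOLO paper lists for this layer; the padding of 3 is my assumption (it is the value that makes 448 come out as exactly 224). The function names are made up for illustration.

```python
def conv_output_size(in_size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (in_size + 2 * padding - kernel) // stride + 1


def conv_output_shape(in_h, in_w, num_filters, kernel, stride, padding):
    """Output shape (height, width, depth) of a conv layer.

    Depth equals the number of filters: each filter spans all input
    channels and produces one output channel.
    """
    out_h = conv_output_size(in_h, kernel, stride, padding)
    out_w = conv_output_size(in_w, kernel, stride, padding)
    return out_h, out_w, num_filters


def conv_param_count(in_channels, num_filters, kernel):
    """Weights plus one bias per filter."""
    return kernel * kernel * in_channels * num_filters + num_filters


# First layer from the question: 448x448x3 input, 64 filters of 7x7.
# stride=2 per the paper's layer listing; padding=3 is an assumption.
print(conv_output_shape(448, 448, num_filters=64, kernel=7, stride=2, padding=3))
print(conv_param_count(in_channels=3, num_filters=64, kernel=7))
```

Each of the 64 filters is really 7*7*3 (it covers all three color channels), slides across the image two pixels at a time, and writes one number per position, which is why the width and height halve and the depth becomes 64.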
Why do they use 2 bounding boxes for 1 cell? For localization, 1 bounding box per cell should be enough, shouldn't it? In OpenCV, for example, object detection draws only 1 bounding box around an object.
I guess the chance of more than 2 anchor boxes being in the same grid cell is relatively low if you use a large grid. Check out Andrew Ng's video on YOLO on the deeplearning.ai channel.
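On the two-boxes question, one detail from the YOLO (v1) paper may help: during training, only the predicted box in a cell with the highest IoU against the ground truth is made "responsible" for the object, which lets the two predictors specialize. Here is a minimal sketch of that selection, assuming boxes are given as (x_min, y_min, x_max, y_max) corners. The function names are hypothetical, not from any library.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes give zero intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def responsible_box(predicted_boxes, ground_truth):
    """Index of the predicted box that best overlaps the ground truth."""
    scores = [iou(box, ground_truth) for box in predicted_boxes]
    return scores.index(max(scores))


# Two candidate boxes from one cell: a wide one and a tall one.
wide = (0, 40, 100, 60)
tall = (40, 0, 60, 100)
truth = (35, 5, 65, 95)  # a tall object
print(responsible_box([wide, tall], truth))
```

At inference time you still end up with one drawn box per object, because low-confidence boxes are thresholded away and overlapping ones are merged by non-maximum suppression.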
I just had one question: when we know where the ground-truth center of the object is, why can't we scan just that area or the nearby area? Why do we scan the whole image?
Yes, you're correct that when we know where the ground-truth center is, we could just scan that area. The problem is generalization: the model would only be good at that specific instance. When the object happens to be located in another region of the image, as is often the case in the test set, the model fails completely. That defeats our training objective, and no real learning would have taken place in that respect. Hope that makes some sense. Thanks for reading.
This will make it easier (make sure you watch the previous videos as well to understand the building blocks): ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-9s_FpMpdYW8.html . Hope it helps!
YOLO is so fucking hilarious. It's a big "fuck you" to all those scientists who take things a bit too seriously. I love these kinds of things, and it gets me motivated in the science field, given that science is for the most part very dry and easily makes you depressed. Just thinking about the fact that "YOLO" will probably be mentioned in my master's thesis is so good :D 0:01 That picture is top notch.