This video presents one of the fastest object detection algorithms for videos, one that can be used in real-time applications. The algorithm is explained in a way that is easy for beginners. This is part 1; part 2 will be uploaded soon.
An early reply would be very helpful. The output feature size is 19x19. How do they create labels at this downsampled size, and how is it mapped back onto the original image size?
The 19x19 output gives information about each grid cell. Each cell corresponds to a pixel block in the input image, and the confidence score tells you whether an object is present in that block. If the confidence score is high, the class of the object is predicted by multiplying the confidence score by the class scores for that pixel block (cell). Then, for the object of the predicted class, the box estimate (anchor box) is taken into account: its centre, height and width. Finally, for each grid cell, you get the object class and its bounding box, if an object is predicted in that cell. Hope this is clear.
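To make the cell idea concrete, here is a rough sketch in Python. It is not the actual network code, only an illustration. The 16-pixel cell size is taken from the 304x304 input discussed in this thread; the function names and the 0.5 confidence cut-off are my own assumptions.

```python
# Illustrative sketch only: names and the 0.5 cut-off are assumptions,
# not taken from any particular implementation.

def cell_to_pixel_block(row, col, stride=16):
    """Return (x_min, y_min, x_max, y_max) of the pixel block a grid cell covers."""
    x_min = col * stride
    y_min = row * stride
    return (x_min, y_min, x_min + stride, y_min + stride)


def predict_cell(confidence, class_probs, conf_threshold=0.5):
    """Return (class_index, class_score) for one cell, or None if no object.

    The class score is the confidence score multiplied by the class score,
    as described above.
    """
    if confidence < conf_threshold:
        return None  # the cell is treated as containing no object
    scores = [confidence * p for p in class_probs]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]


# The last cell of a 19x19 grid covers the bottom-right 16x16 pixels
# of a 304x304 image.
print(cell_to_pixel_block(18, 18))
print(predict_cell(0.9, [0.1, 0.8, 0.1]))
```

The box centre, height and width predicted for the cell would then be scaled with the same 16-pixel factor to get back to original image coordinates.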
Which threshold are you talking about? If it is the IoU threshold, we generally use 0.5 or 0.6. If the data contains very complex samples, then even a lower threshold will do.
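In case it helps, this is roughly how an IoU check against a threshold looks. It is a minimal sketch, assuming boxes are given as (x_min, y_min, x_max, y_max); the function names are made up for illustration, and 0.5 is just the typical value mentioned above.

```python
# Minimal IoU sketch; the box format and function names are assumptions.

def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped to zero when boxes do not touch
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def boxes_overlap(box_a, box_b, threshold=0.5):
    """True when the IoU of the two boxes reaches the chosen threshold."""
    return iou(box_a, box_b) >= threshold


# Two 10x10 boxes shifted by half a width overlap by one third.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
print(boxes_overlap((0, 0, 10, 10), (5, 0, 15, 10)))
```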
It's one of the best explanations I have seen. Loved your explanation :). Can you tell me how we get 19x19 as the output with a 304x304 input image and a 16x16 grid size?
Sorry for the late reply; I was busy with a training programme with 1000+ participants on machine learning for computer vision. It is simple: 304/16 = 19 in both the x and y directions.
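The same arithmetic as a tiny Python sketch, in case you want to try other input sizes. The helper name is made up for illustration, and it assumes the input size is an exact multiple of the cell size.

```python
# Hypothetical helper: output grid size for a square input image.

def grid_size(input_size, cell_size):
    """Number of grid cells along one side of the image."""
    if input_size % cell_size != 0:
        raise ValueError("input size must be a multiple of the cell size")
    return input_size // cell_size


print(grid_size(304, 16))  # 304 / 16 = 19 cells per side
```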