Very good explanation. Hi Sir. I have been following your tutorial on how to train a custom YOLOv5 object detector, as I am doing a school project on vehicle detection. I am getting an error while training my model. Would you be able to help with this, please?
Many thanks for this great video! You mentioned that one can use any object detection model for yolo pose - could you elaborate on that? How could one plug in the smallest version of yolov7?
You would need to retrain the network with a different backbone. The authors have trained it for the YOLOv7-W6 model. To use a smaller YOLOv7 variant, you would need a config (.yaml) file corresponding to that model. You can then train it using the commands given here: github.com/WongKinYiu/yolov7/tree/pose I doubt a smaller model would give accurate results, though. If you don't need multi-person pose estimation, I would use MediaPipe.
📚 LINK TO BLOGPOST: learnopencv.com/yolov7-pose-vs-mediapipe-in-human-pose-estimation/ ▶ LINK TO YOLO MASTERCLASS PLAYLIST: ru-vid.com/group/PLfYPZalDvZDLALsG9o-cjwNelh-oW9Xc4
As of the January 2024 update, MediaPipe does support multi-person pose estimation, but it is limited to 5 people at a time. For further info, check out: developers.google.com/mediapipe/solutions/vision/pose_landmarker/
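For anyone who wants to try the multi-person mode, here is a minimal sketch using the MediaPipe Tasks Pose Landmarker, where the `num_poses` option sets the maximum number of people to detect. The model file name `pose_landmarker.task`, the function names `detect_people` and `to_pixels`, and the default of 5 people (taken from the comment above) are assumptions for illustration; you would need to download a Pose Landmarker model bundle yourself.

```python
from typing import List, Sequence, Tuple


def to_pixels(landmarks: Sequence, width: int, height: int) -> List[Tuple[int, int]]:
    """Convert normalized landmarks (objects with .x and .y in [0, 1]) to pixel coordinates."""
    return [(int(lm.x * width), int(lm.y * height)) for lm in landmarks]


def detect_people(image_path: str,
                  model_path: str = "pose_landmarker.task",  # assumed local model file
                  max_people: int = 5):
    """Return one list of (x, y) pixel keypoints per detected person."""
    # Imported lazily so to_pixels() can be used without MediaPipe installed.
    import mediapipe as mp
    from mediapipe.tasks import python
    from mediapipe.tasks.python import vision

    options = vision.PoseLandmarkerOptions(
        base_options=python.BaseOptions(model_asset_path=model_path),
        num_poses=max_people,  # maximum number of poses to detect (default is 1)
    )
    with vision.PoseLandmarker.create_from_options(options) as landmarker:
        image = mp.Image.create_from_file(image_path)
        result = landmarker.detect(image)
        # result.pose_landmarks holds one list of normalized landmarks per person.
        return [to_pixels(person, image.width, image.height)
                for person in result.pose_landmarks]
```

Calling `detect_people("crowd.jpg")` (a hypothetical image path) should give you one keypoint list per person found, up to `max_people`.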
Thanks for the kind words, Geoff! YOLO does not provide enough keypoints for face landmark alignment. MediaPipe has a dedicated face mesh model that gives 468 3D landmark points on the face. You can check out our blog post on creating Snapchat filters using MediaPipe to learn how to use the different points for your application: learnopencv.com/create-snapchat-instagram-filters-using-mediapipe/
As mentioned in the summary section, it's better to use YOLOv7 or another pose model, since MediaPipe is optimized for real-time performance, which makes it more suitable for video inference. Hope that helps!
@dj.qb91 For multi-person pose estimation, we're checking out MMPose next: github.com/open-mmlab/mmpose. You may also try it and compare it with YOLOv7. Here is the getting-started guide: mmpose.readthedocs.io/en/v0.29.0/get_started.html#inference-with-pre-trained-models
The pose solution consists of two models: a detection model (which detects the body) and a landmark model (which maps the landmarks). If you can make the detection model detect the body without its upper part, the solution should, in theory, work.
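To make the two-stage idea concrete, here is a rough sketch of how a detector and a landmark model can be chained. This is not the library's actual implementation: `two_stage_pose`, `detect_bodies`, and `predict_landmarks` are hypothetical names, and the image is assumed to be a plain list of pixel rows so the example has no dependencies. The point is that the landmark model only ever sees the crop the detector hands it, so swapping in a detector that finds partial bodies changes what the landmark stage receives.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1) in pixels
Point = Tuple[float, float]       # (x, y)


def two_stage_pose(
    image: Sequence[Sequence],
    detect_bodies: Callable[[Sequence[Sequence]], List[Box]],
    predict_landmarks: Callable[[Sequence[Sequence]], List[Point]],
) -> List[List[Point]]:
    """Run a body detector, then a landmark model on each detected crop.

    predict_landmarks is assumed to return points normalized to the crop
    (values in [0, 1]); they are mapped back to full-image pixel coordinates.
    """
    poses = []
    for x0, y0, x1, y1 in detect_bodies(image):
        # Stage 1 output: cut the detected body region out of the image.
        crop = [row[x0:x1] for row in image[y0:y1]]
        w, h = x1 - x0, y1 - y0
        # Stage 2: landmarks inside the crop, shifted back into image space.
        poses.append([(x0 + u * w, y0 + v * h) for u, v in predict_landmarks(crop)])
    return poses
```

You could plug in a custom detector (for example, one trained on lower-body crops) as `detect_bodies` and keep the landmark stage unchanged, though how well the landmark model copes with such crops would need to be checked on real data.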