YOLO-World - Real-Time, Zero-Shot Object Detection

YOLO-World is a zero-shot model, which means you can detect objects without training the model on them.
GitHub: github.com/Aar...
For queries: comment in the comment section or email me at aarohisingla1987@gmail.com
The YOLO-World builds the YOLO detector with the frozen CLIP-based text encoder for extracting text embeddings from the input texts, e.g., object categories or noun phrases.
The YOLO-World contains a Re-parameterizable Vision-Language Path Aggregation Network (RepVL-PAN) to facilitate the interaction between multi-scale image features and text embeddings. The RepVL-PAN can re-parameterize the user's offline vocabularies into the model parameters for fast inference and deployment.
The YOLO-World is pre-trained on large-scale region-text datasets with the region-text contrastive loss to learn the region-level alignment between vision and language. For normal image-text datasets, e.g., CC3M, we adopt an automatic labeling approach to generate pseudo region-text pairs.
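To make the region-text alignment idea more concrete, here is a minimal sketch in plain Python. It is a simplification under stated assumptions, not the YOLO-World training code: the function names, the `scale` temperature, and the toy 2-D embeddings are all hypothetical, and a plain softmax cross-entropy over cosine similarities stands in for the region-text contrastive loss described above.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def similarity_matrix(region_embs, text_embs, scale=1.0):
    """Cosine similarity between every region embedding and every text embedding."""
    regions = [l2_normalize(r) for r in region_embs]
    texts = [l2_normalize(t) for t in text_embs]
    return [
        [scale * sum(a * b for a, b in zip(r, t)) for t in texts]
        for r in regions
    ]


def region_text_contrastive_loss(region_embs, text_embs, targets, scale=10.0):
    """Mean cross-entropy of each region against its matching text index.

    Assumption: `targets[i]` is the index of the text that describes region i.
    """
    sims = similarity_matrix(region_embs, text_embs, scale)
    total = 0.0
    for row, target in zip(sims, targets):
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(s - peak) for s in row))
        total += log_sum - row[target]
    return total / len(targets)


# Toy example: two regions and two text prompts with made-up embeddings.
regions = [[1.0, 0.0], [0.0, 1.0]]
texts = [[1.0, 0.0], [0.0, 1.0]]
print(region_text_contrastive_loss(regions, texts, targets=[0, 1]))
```

With correctly matched targets the loss is close to zero; swapping the targets makes it much larger, which is the signal that pulls matching region and text embeddings together during training.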

Science

Published: 5 Oct 2024

Comments: 43

@TheJAM_Sr 5 months ago

I just wanted to say I found your channel this week and really appreciate your classes. I won’t even say they are tutorials because I can take what I learn and easily apply them to my project.

@CodeWithAarohi 5 months ago

Glad you like them!

@arnavthakur5409 7 months ago

Ma'am, your work is really incredible.

@CodeWithAarohi 7 months ago

Thanks a lot 😊

@harshays2873 4 months ago

Please make a video on training this model on custom data.

@rickyS-D76 3 months ago

Thanks. Do you have a detailed video on video object detection with labels and confidence scores, or any other resource that could be helpful? Thank you.

@CodeWithAarohi 3 months ago

@bb-andersenaccount9216 7 months ago

Good job. However, it is not clear whether, when setting the classes, you are giving a description prompt or just picking a pre-trained class as usual. The person class you show in the example might be a typical pre-trained class label instead of a description prompt, which makes the example confusing.

@CodeWithAarohi 7 months ago

Thank you for the feedback!

@jeffg4686 6 months ago

Oh nice. How do they come up with these ridiculous names... Is this actually better than Grounding DINO, or just faster? Also, do they have safetensors? Do certain model types not work with safetensors, or is this their new plan to infect all the computers?

@learn_with_gaddal 7 months ago

Awesome, thank you so much for sharing this information.

@CodeWithAarohi 7 months ago

You are so welcome!

@ezequieligomez2135 4 months ago

Is this pre-trained on O365+GoldG or on the COCO dataset? How would I specifically get the one pre-trained on O365+GoldG?

@soravsingla8782 7 months ago

Awesome

@p.logesharavind3528 7 months ago

This is really cool and interesting!

@CodeWithAarohi 7 months ago

Glad you like it!

@عدنانمهداوي-ن5ث 6 months ago

YOLO in real time is very slow. Do you know why?

@anamikamaurya22 6 months ago

My god... now programmers will become the creators of 2025.

@aneerimmco 3 months ago

Informative, thank you.

@CodeWithAarohi 3 months ago

Glad it was helpful!

@ShittheswaranSelvakumar 6 months ago

Nice explanation, ma'am. Thank you :)

@CodeWithAarohi 6 months ago

Most welcome 😊

@Sunil-ez1hx 7 months ago

Amazing video

@CodeWithAarohi 7 months ago

Thanks for the visit

@2xback2back14 7 months ago

Hello, can you please demonstrate how to give custom text in "text to image generation using stackGAN"? Even after 1000 epochs, my model doesn't seem to generate bird images. Please help me.

@CodeWithAarohi 7 months ago

I will try to cover this requested topic when I continue with the GAN playlist.

@hemachandhers 7 months ago

Can you put up a video on fine-tuning YOLO-World on a custom dataset, ma'am?

@CodeWithAarohi 7 months ago

ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-kl7yszVU6Tg.htmlsi=WRSX79c0QmuMBrWh

@Satchi017 7 months ago

@CodeWithAarohi Yes, how can I build a custom YOLO-World model for a totally new class, one that is not even in the large-scale vision-language datasets (Objects365, GQA, Flickr30K, and CC3M)?

@Satchi017 7 months ago

Sorry ma'am, the person class is in the pre-trained classes. I guess the example is biased. How can I detect the car FM antenna on your example image?

@CodeWithAarohi 7 months ago

@Satchi017 Check this: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-WbCgU4GrjV4.htmlsi=qbiPic5BmDPTUAPn

@Satchi017 7 months ago

@CodeWithAarohi Ma'am, I have viewed the video. Rather than detecting "hard hat" and "gloves", how can I detect the object (Red Probe/wire) in the image (a.jpg)?

@himanshudnk 7 months ago

I am still not clear on how YOLO-World differs from traditional YOLO models. It looks like we use a pre-trained model, give it the classes we want, and it detects them. YOLOv8, for example, is trained on 80 classes, so does YOLO-World simply have more classes?

@CodeWithAarohi 7 months ago

With YOLOv8, we can only detect the object classes the model was trained on. If the model is trained on the COCO dataset, you can only detect the 80 classes present in COCO. If you create a custom YOLOv8 model to detect 5 classes, it will only detect those 5 classes. In YOLO-World, you can write the name of any object you want to detect, and it can detect that object because YOLO-World is trained on images and their text descriptions.
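The difference between a fixed class list and free-text prompts can be sketched as follows. This is a hedged example that assumes the Ultralytics package (`pip install ultralytics`), its `YOLOWorld` class, and the `yolov8s-world.pt` checkpoint; the video itself may use a different setup. The helper names `parse_prompts` and `detect_with_prompts` are hypothetical and exist only for illustration.

```python
def parse_prompts(raw):
    """Turn a comma-separated prompt string into a clean, de-duplicated class list."""
    classes = []
    for part in raw.split(","):
        name = " ".join(part.split()).lower()  # collapse whitespace, lowercase
        if name and name not in classes:
            classes.append(name)
    return classes


def detect_with_prompts(image_path, raw_prompts, weights="yolov8s-world.pt"):
    """Run open-vocabulary detection for the given text prompts.

    Assumption: the Ultralytics YOLOWorld API is available; the import is
    done lazily so that parse_prompts works without the package installed.
    """
    from ultralytics import YOLOWorld

    model = YOLOWorld(weights)
    # The vocabulary is set from text at inference time, with no retraining.
    model.set_classes(parse_prompts(raw_prompts))
    results = model.predict(image_path)

    detections = []
    for box in results[0].boxes:
        label = results[0].names[int(box.cls)]
        detections.append((label, float(box.conf), box.xyxy[0].tolist()))
    return detections
```

A call such as `detect_with_prompts("a.jpg", "hard hat, gloves, red wire")` would return a list of `(label, confidence, [x1, y1, x2, y2])` tuples. Whether an unusual prompt is actually detected depends on how well the pre-trained model covers that concept.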

@iPrashantSmp 7 months ago

How can I find the list of pre-trained classes in the YOLO-World model?

@CodeWithAarohi 7 months ago

I am not sure, but YOLO-World is pre-trained on large-scale vision-language datasets, including Objects365, GQA, Flickr30K, and CC3M.

@pifordtechnologiespvtltd5698 7 months ago

Nice

@CodeWithAarohi 7 months ago

Thanks

@informative7410 6 months ago

How can I convert YOLO-World to TFLite?

@CodeWithAarohi 6 months ago

Haven't tried it yet.

@Hemamalini-f3i 7 months ago

How can I convert these detections into annotations?

@CodeWithAarohi 7 months ago

There is no need to convert the detections into annotations for custom object detection. But if you still want to do that, you can write a script to fetch the bounding box coordinates and store them in a file.
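A script of the kind mentioned in the reply could look like the sketch below. It assumes detections are already available as `(class_id, [x1, y1, x2, y2])` pairs in pixel coordinates and writes them in the common YOLO text label format (`class x_center y_center width height`, all normalized to the image size). The function names are hypothetical.

```python
def to_yolo_line(class_id, box_xyxy, img_w, img_h):
    """Convert one pixel-space box (x1, y1, x2, y2) to a YOLO label line."""
    x1, y1, x2, y2 = box_xyxy
    x_center = (x1 + x2) / 2 / img_w
    y_center = (y1 + y2) / 2 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def write_yolo_labels(path, detections, img_w, img_h):
    """Write one label line per detection to `path` and return the lines."""
    lines = [to_yolo_line(cls, box, img_w, img_h) for cls, box in detections]
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + ("\n" if lines else ""))
    return lines
```

For example, `write_yolo_labels("a.txt", [(0, [100, 50, 300, 250])], 400, 500)` would store a single line for class 0. Labels produced this way are only as good as the detections, so it is worth reviewing them before using them for training.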