
Gemini Demo But With GPT-4 Vision API 

Unconventional Coding
10K subscribers
2.4K views

Published: 28 Aug 2024
Comments: 15
@mallardlane8965 · 8 months ago
Better Demo than Google 🙂
@corvo1068 · 8 months ago
Your demo is better than the one from Google. It looks like they hand-selected screenshots to send and gave more hints in their prompts, but didn't include the whole prompt in the demo.
@robrita · 8 months ago
nice demo!! the most interesting part here I'd say is when to capture the screenshot - maybe when you pause talking?? 🤔 and maybe you can add multiple captures when there's movement, or diff images every second.
@unconv · 8 months ago
When it detects movement, it starts saving all the frames until movement stops. Then it splits the list of frames into six equal parts. Then it takes the sharpest frame from each part and makes a collage from them and sends that to ChatGPT. And yes, when talking stops, it sends a screenshot. If there was movement during talking, it sends the collage as well.
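The frame-selection step described above (split the captured frames into six chunks, keep the sharpest frame from each, tile them into a collage) could look something like the following. This is a minimal sketch, not the author's actual code: frames are assumed to be grayscale NumPy arrays, and gradient variance stands in for whatever focus measure the real implementation uses.

```python
import numpy as np

def sharpness(frame: np.ndarray) -> float:
    # Simple focus measure: variance of the vertical and
    # horizontal image gradients (blurry frames score low).
    gy, gx = np.gradient(frame.astype(float))
    return float(np.var(gx) + np.var(gy))

def make_collage(frames: list, parts: int = 6) -> np.ndarray:
    # Split the recorded frames into `parts` equal chunks,
    # keep the sharpest frame from each chunk, and tile the
    # picks into a 2x3 grid (one image to send to the API).
    chunks = np.array_split(np.arange(len(frames)), parts)
    picks = [max((frames[i] for i in idx), key=sharpness) for idx in chunks]
    rows = [np.hstack(picks[i:i + 3]) for i in range(0, parts, 3)]
    return np.vstack(rows)
```

Sending one collage instead of every frame keeps the request to a single image while still covering the whole span of the movement.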
@robrita · 8 months ago
@@unconv awesome!! great job!! I love the idea of making a collage - I didn't see that coming. keep it up bro!!
@YuraL88 · 8 months ago
Wow! Looks impressive! ❤
@Crovea · 8 months ago
that last blooper was funny :D
@avi7278 · 8 months ago
Are you sending all the frames to gpt-v? I have a function which compares subsequent frames in a video and only extracts the ones that meet a difference threshold - so, for example, out of a 25-second video it might pull out 7 frames that differ enough to be worth sending to the API.
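The threshold-based extraction described in this comment might be sketched as follows. This is an illustration under assumptions (frames as grayscale NumPy arrays, mean absolute pixel difference as the metric, and a hypothetical `threshold` value), not the commenter's actual function.

```python
import numpy as np

def extract_keyframes(frames: list, threshold: float = 10.0) -> list:
    # Always keep the first frame, then keep a frame only when its
    # mean absolute pixel difference from the last *kept* frame
    # exceeds `threshold` - near-duplicate frames are dropped.
    kept = [frames[0]]
    for frame in frames[1:]:
        diff = np.mean(np.abs(frame.astype(float) - kept[-1].astype(float)))
        if diff > threshold:
            kept.append(frame)
    return kept
```

Comparing against the last kept frame (rather than the immediately preceding one) prevents slow drift from slipping under the threshold across many consecutive frames.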
@robrita · 8 months ago
you can even diff screenshots every second; I think that would be sufficient.
@avi7278 · 8 months ago
@@robrita it depends. Where scrolling text is involved, for example, it is not sufficient, and there is no point introducing potential loss where there is no cost. A lot can happen in a second.
@robrita · 8 months ago
@@avi7278 of course there's a cost, why assume not?? more images will take longer to respond; it's unnecessary resource usage for most use cases.. it's not like your YOLOv8 on your PC 😆
@unconv · 8 months ago
It only sends a collage of 6 "strategically selected" frames during movement (one image). And one image after talking stops.
@DarkNetDragoon · 6 months ago
Will it work if I try to change the model to Gemini Vision with all the parameters?
@thenoblerot · 8 months ago
Great demo! Unrelated to this video... I tried your "ok-gpt" code with whisper (the tiny model) on a Pi 4. Recognition works fine, but latency is kind of a deal breaker :( I guess I have a reason to get a Pi 5 now!
@unconv · 8 months ago
Thanks! Good to know. Maybe I'll try to switch it to using the Whisper API (and leak all conversations to OpenAI lol)