Probably one of the best applied AI channels. No fluff, only useful information. Your series on vector similarity is how I finally understood the entire topic. Thanks and cheers, mate!
Nice approach. It would be interesting to investigate how effective, fast, and efficient this is versus, say, the scene detection built into ffmpeg, or looking for large discontinuities in perceptual hashes of frames that indicate a rapid change of scene. Alternatively, inspecting where the codec has written key frames might also work. I may have a look into this in my copious (ahem!) free time!
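In case anyone wants to try the perceptual-hash route, here's a rough sketch (assuming the `opencv-python`, `Pillow`, and `imagehash` packages; the 12-bit threshold is just a guess to tune per video):

```python
# Sketch: flag frames where the perceptual hash jumps sharply,
# treating large Hamming distances between consecutive hashes as scene cuts.
import cv2
import imagehash
from PIL import Image

def phash_scene_cuts(video_path: str, threshold: int = 12) -> list[int]:
    """Return frame indices where the perceptual hash changes abruptly."""
    cap = cv2.VideoCapture(video_path)
    cuts, prev_hash, idx = [], None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # OpenCV decodes to BGR; convert to RGB before handing to PIL
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        h = imagehash.phash(Image.fromarray(rgb))
        # Subtracting two ImageHash objects gives the Hamming distance
        if prev_hash is not None and h - prev_hash > threshold:
            cuts.append(idx)
        prev_hash, idx = h, idx + 1
    cap.release()
    return cuts
```

For comparison, ffmpeg's built-in detector can be tried with something like `ffmpeg -i input.mp4 -vf "select='gt(scene,0.4)',showinfo" -f null -`, which logs frames whose scene-change score exceeds 0.4.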
Yeah, we're building an abstraction in `semantic-chunkers` that will provide info like this by default; it will come with some other useful features for video too.
*Video Summary: Semantic Chunking for Efficient Video Processing*

This video demonstrates how to efficiently process videos using "semantic chunking," a method commonly used with text but applicable to other modalities like video.

*Key Points:*
* *Why Chunk Video? (**0:25**)* Recent multi-modal models like GPT-4o and Gemini 1.5 can process videos, but feeding in every frame is inefficient and expensive.
* *Semantic Chunking (**0:00**):* Uses image embedding models to identify semantically similar frames and split the video into meaningful "chunks" wherever the content changes.
* *Implementation (**1:59**):* The video uses the `semantic-chunkers` library and explores two examples: a simple video of a bunny and a butterfly with clear scene changes, and a more complex video of a man driving and interacting with his car in various settings.
* *Model Selection (**3:28**):* A Vision Transformer (ViT) model is tried first; while effective for broad classification, it may not be ideal for fine-grained semantic understanding.
* *Alternative Model: CLIP (**5:58**):* CLIP, a model trained on semantic similarity, proves more sensitive to nuanced content changes and yields more granular chunks.
* *Benefits (**11:29**):* Semantic chunking focuses processing on relevant frames, saving time and money when feeding video data to AI models.

*In conclusion, the video advocates semantic chunking as a valuable technique for processing videos intelligently and efficiently, especially when working with expensive and time-consuming AI models.*

I summarized the transcript with Gemini 1.5 Pro.
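To make the chunking idea above concrete, here's a minimal sketch of the splitting logic. This is not the `semantic-chunkers` API; the CLIP checkpoint (`clip-ViT-B-32` via sentence-transformers) and the 0.85 threshold are my own assumptions:

```python
# Rough sketch of embedding-based video chunking: embed sampled frames
# with CLIP, then start a new chunk whenever the cosine similarity
# between consecutive frame embeddings drops below a threshold.
import numpy as np
from PIL import Image
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("clip-ViT-B-32")  # CLIP checkpoint that encodes images

def chunk_frames(frames: list[Image.Image], threshold: float = 0.85) -> list[list[int]]:
    """Group frame indices into runs of semantically similar frames."""
    if not frames:
        return []
    emb = model.encode(frames, normalize_embeddings=True)  # shape (n_frames, dim)
    chunks, current = [], [0]
    for i in range(1, len(frames)):
        # Dot product of unit vectors = cosine similarity
        sim = float(np.dot(emb[i - 1], emb[i]))
        if sim < threshold:
            chunks.append(current)
            current = []
        current.append(i)
    chunks.append(current)
    return chunks
```

A lower threshold merges more frames into fewer, coarser chunks; a higher one splits more aggressively, which matches the ViT vs. CLIP granularity difference discussed in the video.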
Cool demo. Are there any JS/TS-based libraries that can achieve this kind of functionality? I've looked into transformers.js and CLIP, but they mostly offer image classification. Do you think "Xenova/clip-vit-base-patch32" could work for this?
Great video, this might be exactly what I'm looking for... Is the video semantic chunker able to detect when the content of a given slide in a presentation or course has changed (without needing to do text extraction)?
Thanks @jamesbringgs, super useful again! Any idea whether real-time AI video generation is feasible at this stage? I'm eager to animate someone talking via AI in real time. I hope we get there soon.
Really like the idea and how easy the library is to use!! 🔥 I'd love to know the algorithms behind the different types of chunkers implemented in the library.
Damn, this is awesome. I want libraries like these for TypeScript 😂 I'm so going to turn to the dark side and learn to build web services in both Python and TypeScript/Node hahaha
Thanks for the video! The trouble with the colors is caused by OpenCV and matplotlib: OpenCV uses BGR channel order while matplotlib expects RGB, so blue and red end up swapped.
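For anyone hitting this, the fix is a one-line conversion before plotting (the "frame.png" path here is just a placeholder):

```python
# Convert OpenCV's BGR output to RGB so matplotlib renders colors correctly.
import cv2
import matplotlib.pyplot as plt

frame_bgr = cv2.imread("frame.png")                      # OpenCV loads as BGR
frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)   # reorder channels for matplotlib
plt.imshow(frame_rgb)
plt.axis("off")
plt.show()
```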