
Fast intro to multi-modal ML with OpenAI's CLIP 

James Briggs
67K subscribers
13K views

Published: 8 Sep 2024

Comments: 23
@chanm01 · 1 year ago
This was sick. Thank you for so patiently explaining each step. You could have just run a bunch of stuff you pre-wrote in a notebook. Doing it this way instead makes it an accessible entry point for people who might be interested in getting into ML in a more serious way. Very humbled.
@xiaozaowang5106 · 4 months ago
Thank you so much!! This is exactly what I need.
@antonispolykratis3283 · 2 years ago
Time very well spent for me with this video, thank you so much.
@jamesbriggs · 2 years ago
awesome, glad it was helpful :)
@leonardvanduuren8708 · 2 years ago
I am blown away by your videos and am learning every second. You are simply the best out here in this area of computing. I may be starting academic research in computational linguistics regarding semantic change in loanwords. I would love to get in touch with you.
@jamesbriggs · 2 years ago
that's really cool to hear, thanks! Sounds like a fascinating topic to research. For getting in touch I usually recommend the Discord chat: discord.gg/c5QtDB9RAP, or there is my email in the "About" section of the channel.
@lovol2 · 4 months ago
Thank you
@dontolley1738 · 5 months ago
Thanks for the great video. I am curious as to what type of performance you get. Obviously the hardware makes a difference, but in general how long does it take to get your results?
@lee155912000 · 11 months ago
What would be outputted if you were to manually select a random point within the vector space? Would it return an incoherent image? Or would it throw an error?
@caiyu538 · 1 year ago
Great. Great
@avbendre · 1 year ago
this was awesome, how can I get the code please? thanks
@ayushranjan2494 · 7 months ago
What are you using as the IDE, since it suggests auto-completion? Does it use GitHub Copilot?
@avbendre · 1 year ago
When implementing this I got an error saying the images are on the CPU, so embedding them would not be possible. I was embedding the images in my Google Drive with the help of CLIP embeddings. Have you, or has anyone reading my comment, tried this? Please respond, thanks in advance.
@avbendre · 1 year ago
switched from Google Colab to a Jupyter notebook and it solved the issue, but I ended up using the CPU
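The error in the thread above is the classic PyTorch device mismatch: the model lives on the GPU while the input tensors stay on the CPU. A minimal sketch of the problem and the fix, where a plain `Linear` layer stands in for the CLIP encoder (the layer and names are illustrative, not the video's actual code):

```python
import torch

# Pick the GPU if available, else fall back to the CPU (the commenter's fallback).
device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for the CLIP encoder: a 512-dim linear layer, moved to `device`.
model = torch.nn.Linear(512, 512).to(device)

# New tensors live on the CPU by default; feeding `x` directly to a GPU model
# raises "Expected all tensors to be on the same device".
x = torch.randn(1, 512)

# Moving the input to the model's device fixes the mismatch.
emb = model(x.to(device))
```

The same pattern applies with Hugging Face CLIP: call `.to(device)` on both the model and the processor's output tensors before encoding.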
@smoreshark · 10 months ago
what website or app are you using in the getting started section? I'm very, very new to coding and stuff
@basi6621 · 1 year ago
great video, thank you. Have you ever tried image+text semantic search on an image+text dataset? Is that a good way to interpret the combination of these embeddings? E.g. image = 512-dim + text = 512-dim: which is the better way to combine those two embeddings? Can I just concatenate them and search the database with this concatenated vector embedding?
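The combination this comment proposes can be sketched as follows; the vectors are random stand-ins for CLIP's 512-dim image and text embeddings, and whether concatenation beats averaging is exactly the open question being asked:

```python
import numpy as np

# Random stand-ins for a 512-dim CLIP image embedding and text embedding.
img_emb = np.random.randn(512).astype(np.float32)
txt_emb = np.random.randn(512).astype(np.float32)

# The option the comment describes: concatenate into one 1024-dim vector.
combined = np.concatenate([img_emb, txt_emb])

# L2-normalize so cosine similarity over the joint vector stays well-behaved;
# the stored database vectors would need the same 1024-dim treatment.
combined /= np.linalg.norm(combined)
```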
@debayudhmitra9432 · 4 months ago
can you give the github code please
@venkatesanr9455 · 2 years ago
Thanks for the valuable videos. I have some doubts, kindly reply. 1. Can NER tags be used in semantic search or search engine/information retrieval tasks? Any links would be useful. 2. I have experience using sentence transformers; are OpenAI models too heavy, or too high-dimensional, for doing similarity search? 3. Can we apply this CLIP approach for mapping a query (text) to images (like bill images containing text), assisted with OCR results? Thanks in advance.
@jamesbriggs · 2 years ago
Hey Venkatesan!
1. You could use NER tags as part of metadata filtering paired with your semantic search - see here: www.pinecone.io/learn/vector-search-filtering/
2. OpenAI models tend to use higher-dimensionality vectors, ranging from (I think) 2048-dimensional to ~10K-dimensional; the out-of-the-box performance of OpenAI models is pretty incredible though.
3. I'm not sure about this. Vision models often struggle with text, but with a vision transformer I'd imagine this is not as major an issue, as you have the attention mechanism, which should help the model comprehend the image (and therefore written text in an image) as a whole - this is just speculation though, I'm not very familiar with vision models.
@venkatesanr9455 · 2 years ago
@jamesbriggs Thanks for your kind replies and inputs
@jamesbriggs · 2 years ago
@venkatesanr9455 you're welcome :)
@antonispolykratis3283 · 2 years ago
why +1 in (0, len(imagenette) +1) ?
@jamesbriggs · 2 years ago
I don't know why I wrote that; it should be without the '+1'. Thanks 😅
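The fix makes sense if the notebook samples with np.random.randint (an assumption; the dataset name is taken from the question). NumPy's randint excludes the upper bound, so len(imagenette) is already the correct high value:

```python
import numpy as np

# Hypothetical stand-in for the imagenette dataset from the video.
imagenette = ["img0.jpg", "img1.jpg", "img2.jpg"]

# np.random.randint excludes the upper bound, so len(imagenette) is the
# correct high value; len(imagenette) + 1 could return an out-of-range index.
idx = np.random.randint(0, len(imagenette))
sample = imagenette[idx]  # always a valid element
```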