
Fast intro to multi-modal ML with OpenAI's CLIP 

James Briggs
67K subscribers
13K views

Published: 8 Sep 2024

Comments: 23
@chanm01 · 1 year ago
This was sick. Thank you for so patiently explaining each step. You could have just run a bunch of stuff you pre-wrote in a notebook. Doing it this way instead makes it an accessible entry point for people who might be interested in getting into ML in a more serious way. Very humbled.
@xiaozaowang5106 · 4 months ago
Thank you so much!! This is exactly what I need.
@antonispolykratis3283 · 2 years ago
Time very well spent for me with this video, thank you so much.
@jamesbriggs · 2 years ago
awesome, glad it was helpful :)
@leonardvanduuren8708 · 2 years ago
I am blown away by your videos and am learning every second. You are simply the best out here in this area of computing. I may be starting academic research in computational linguistics regarding semantic change in loanwords. I would love to get in touch with you.
@jamesbriggs · 2 years ago
that's really cool to hear, thanks! Sounds like a fascinating topic to research. For getting in touch I usually recommend the Discord chat: discord.gg/c5QtDB9RAP, or there is my email in the "About" section of the channel.
@lovol2 · 4 months ago
Thank you
@dontolley1738 · 5 months ago
Thanks for the great video. I am curious as to what type of performance you get. Obviously the hardware makes a difference, but in general how long does it take to get your results?
@lee155912000 · 11 months ago
What would be outputted if you were to manually select a random point within the vector space? Would it return an incoherent image? Or would it throw an error?
@caiyu538 · 1 year ago
Great. Great
@avbendre · 1 year ago
this was awesome, how can I get the code please? thanks
@ayushranjan2494 · 7 months ago
What are you using as the IDE, since it suggests auto-completion? Does it use GitHub Copilot?
@avbendre · 1 year ago
When implementing this I got an error saying the images are on the CPU, so embedding them would not be possible. I was embedding the images in my Google Drive with the help of CLIP embeddings. Have you, or has anyone reading my comment, tried this? Please respond, thanks in advance.
@avbendre · 1 year ago
switched from Google Colab to a Jupyter notebook and it solved the issue, but I ended up using the CPU
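The error in the thread above is the classic PyTorch device mismatch: the model lives on the GPU while the input tensors stay on the CPU. A minimal sketch of the problem and the fix, where a plain `Linear` layer stands in for the CLIP encoder (the layer and names are illustrative, not the video's actual code):

```python
import torch

# Pick the GPU if available, else fall back to the CPU (the commenter's fallback).
device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for the CLIP encoder: a 512-dim linear layer, moved to `device`.
model = torch.nn.Linear(512, 512).to(device)

# New tensors live on the CPU by default; feeding `x` directly to a GPU model
# raises "Expected all tensors to be on the same device".
x = torch.randn(1, 512)

# Moving the input to the model's device fixes the mismatch.
emb = model(x.to(device))
```

The same pattern applies with Hugging Face CLIP: call `.to(device)` on both the model and the processor's output tensors before encoding.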
@smoreshark · 10 months ago
what website or app are you using in the getting started section? I'm very, very new to coding and stuff
@basi6621 · 1 year ago
great video, thank you. Have you ever tried image+text semantic search on an image+text dataset? Is that a good way to interpret the combination of these embeddings? E.g. image = 512-dim + text = 512-dim: which is the better way to combine those two embeddings? Can I just concatenate them and search the database with this concatenated vector embedding?
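The combination this comment proposes can be sketched as follows; the vectors are random stand-ins for CLIP's 512-dim image and text embeddings, and whether concatenation beats averaging is exactly the open question being asked:

```python
import numpy as np

# Random stand-ins for a 512-dim CLIP image embedding and text embedding.
img_emb = np.random.randn(512).astype(np.float32)
txt_emb = np.random.randn(512).astype(np.float32)

# The option the comment describes: concatenate into one 1024-dim vector.
combined = np.concatenate([img_emb, txt_emb])

# L2-normalize so cosine similarity over the joint vector stays well-behaved;
# the stored database vectors would need the same 1024-dim treatment.
combined /= np.linalg.norm(combined)
```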
@debayudhmitra9432 · 4 months ago
can you give the github code please
@venkatesanr9455 · 2 years ago
Thanks for the valuable videos. I have some doubts, kindly reply. 1. Can NER tags be used in semantic search or search engine/information retrieval tasks? Any links would be useful. 2. I have experience using sentence transformers; are OpenAI models too heavy, or too high-dimensional, for doing similarity search? 3. Can we apply this CLIP approach for mapping a query (text) to images (like bill images containing text), assisted with OCR results? Thanks in advance.
@jamesbriggs · 2 years ago
Hey Venkatesan!
1. You could use NER tags as part of metadata filtering paired with your semantic search - see here: www.pinecone.io/learn/vector-search-filtering/
2. OpenAI models tend to use higher-dimensionality vectors, ranging from (I think) 2048-dimensional to ~10K-dimensional; the out-of-the-box performance of OpenAI models is pretty incredible though.
3. I'm not sure about this. Vision models often struggle with text, but with a vision transformer I'd imagine this is not as major an issue, as you have the attention mechanism, which should help the model comprehend the image (and therefore written text in an image) as a whole - this is just speculation though, I'm not very familiar with vision models.
@venkatesanr9455 · 2 years ago
@jamesbriggs Thanks for your kind replies and inputs
@jamesbriggs · 2 years ago
@venkatesanr9455 you're welcome :)
@antonispolykratis3283 · 2 years ago
why +1 in (0, len(imagenette) +1) ?
@jamesbriggs · 2 years ago
I don't know why I wrote that; it should be without the '+1'. Thanks 😅
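The fix makes sense if the notebook samples with np.random.randint (an assumption; the dataset name is taken from the question). NumPy's randint excludes the upper bound, so len(imagenette) is already the correct high value:

```python
import numpy as np

# Hypothetical stand-in for the imagenette dataset from the video.
imagenette = ["img0.jpg", "img1.jpg", "img2.jpg"]

# np.random.randint excludes the upper bound, so len(imagenette) is the
# correct high value; len(imagenette) + 1 could return an out-of-range index.
idx = np.random.randint(0, len(imagenette))
sample = imagenette[idx]  # always a valid element
```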