Great video -- love how you covered the topic from the high level down to the actual implementation. Question: you used the ResNet model in this video -- if I wanted to use the OpenAI CLIP model instead, is it basically the same process (apart from switching the model name)? Do you think CLIP would be better for general-purpose text/image search?
Thanks for the comments. I used the simplest ResNet model as its feature vector has 512 dimensions; the more complex models have larger ones (2048 for ResNet-50), which would make for a larger Annoy index. (I have not tested whether that would be more accurate.) It is possible to use other networks, as long as you can read the values out of the feature layer. I've not looked at OpenAI CLIP; if you use it, you will need a way to get the feature-vector values out of the network.
Interesting video, I built out something similar using TensorFlow and scikit-learn's nearest-neighbour module. I've also tried cosine distance with good results. Is Spotify Annoy the same NN algorithm? Thanks
Hi, thanks. I don't think Spotify Annoy is the same as scikit-learn's NN algorithm -- Annoy is an approximate nearest-neighbour search, whereas scikit-learn's NearestNeighbors does exact search. There is a great presentation on Annoy here with more details of how the algorithm works: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-QkCCyLW0ehU.html
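To give a flavour of the difference: Annoy's core trick is splitting the points with random hyperplanes into a tree, then searching only the leaf the query falls into, while scikit-learn's NearestNeighbors searches exactly (KD-tree, ball tree or brute force). This is a toy sketch of that idea in NumPy, not the real Annoy implementation (Annoy builds many such trees and traverses several branches for better recall):

```python
# Toy sketch of Annoy's random-hyperplane tree idea (not the real algorithm).
import numpy as np

rng = np.random.default_rng(0)

def build(points, ids, leaf_size=16):
    """Recursively split point ids with random hyperplanes."""
    if len(ids) <= leaf_size:
        return ("leaf", ids)
    normal = rng.standard_normal(points.shape[1])  # random hyperplane normal
    side = points[ids] @ normal > 0
    left, right = ids[side], ids[~side]
    if len(left) == 0 or len(right) == 0:          # degenerate split, stop here
        return ("leaf", ids)
    return ("node", normal, build(points, left), build(points, right))

def query(tree, q):
    """Descend to the single leaf the query point falls into."""
    if tree[0] == "leaf":
        return tree[1]
    _, normal, left, right = tree
    return query(left if q @ normal > 0 else right, q)

points = rng.standard_normal((1000, 32))
tree = build(points, np.arange(1000))
candidates = query(tree, points[7])  # small candidate set, not all 1000 points
# Exact re-rank of just the candidates by Euclidean distance.
best = candidates[np.argmin(np.linalg.norm(points[candidates] - points[7], axis=1))]
```

Since the query point was itself indexed, it lands in its own leaf, so the re-rank finds it; the speed-up comes from only scoring the small candidate set instead of every point.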