
Vertex AI Matching Engine - Vector Similarity Search 

ML Engineer · 2K subscribers
14K views
Published: 27 Aug 2024

Comments: 73
@jobiquirobi123 · 2 years ago
Nice tutorial. Matching engine is really promising but it does require some setup, I will try to reproduce this tutorial and see what happens.
@ml-engineer · 2 years ago
I see many customers moving to Matching Engine, and they're all happy with it. Only the time to update the index could be quicker, but I guess this depends on the requirements. Getting new embeddings into the index in real time is not possible, though there are workarounds.
@rubenszimbres · 2 years ago
@ml-engineer Sascha, do you think online inference is possible by running Cloud Run / Cloud Functions against a Vertex AI endpoint, getting the embeddings and then submitting them to the Matching Engine ANN? I was wondering if this may be a solution...
@ml-engineer · 2 years ago
Hi Rubens, yes, embedding models are usually hosted with Vertex AI Endpoints or with Cloud Run (if you don't need a GPU). After you've run inference, you need to tell the Matching Engine index to take the new embeddings/vectors, stored on Cloud Storage, into account. And that's currently the bottleneck, as this indexing process takes quite a long time.
@alexchan5643 · 1 year ago
Thanks for the walkthrough. The documentation from GCP is quite messy. It doesn't seem to have great support for metadata filtering compared to other stores, only very basic operations. Any thoughts from your experience?
@ArmandoCuevas-sx5cf · 1 year ago
I would like to know the answer to this one too. I don't see support for metadata as Pinecone does.
@alexchan5643 · 1 year ago
@ArmandoCuevas-sx5cf Based on my further investigations over the past week, the metadata filtering is restricted to string matching only with key/value pairs (so no comparators on numeric values). The idea is to pair the Matching Engine IDs with another key-value store like Bigtable, where you could possibly do further complex filtering. Comparing this setup to Pinecone or Qdrant and considering the costs, I don't think I would use Matching Engine.
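The two-stage setup Alex describes (ANN IDs from the index, numeric filtering in a separate key-value store) can be sketched roughly like this. The dict standing in for Bigtable, the `refine` helper, and the example IDs are all hypothetical; only the overall pattern comes from the thread:

```python
# Sketch of the two-stage pattern: Matching Engine returns nearest-neighbor
# IDs (string-token filters only), and a separate key-value store handles
# numeric comparisons. A plain dict stands in for Bigtable; `ann_ids` is
# assumed to be the ordered ID list coming back from the index.

metadata_store = {  # stand-in for Bigtable: id -> metadata row
    "doc1": {"price": 12.5, "year": 2020},
    "doc2": {"price": 99.0, "year": 2023},
    "doc3": {"price": 40.0, "year": 2021},
}

def refine(ann_ids, predicate):
    """Keep only neighbor IDs whose metadata passes a numeric predicate."""
    return [i for i in ann_ids if predicate(metadata_store[i])]

# e.g. nearest neighbors from the index, already ordered by distance
ann_ids = ["doc2", "doc3", "doc1"]
cheap = refine(ann_ids, lambda m: m["price"] < 50)  # numeric comparator
print(cheap)
```

The ANN ordering is preserved; the second stage only drops rows, so the result is still distance-sorted.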
@ml-engineer · 1 year ago
Hi Alex, hi Armando. As Alex already said, Matching Engine supports string matching on metadata: cloud.google.com/vertex-ai/docs/matching-engine/filtering Pinecone is indeed more flexible on this point.
@ArmandoCuevas-sx5cf · 1 year ago
@alexchan5643 Thanks a lot, that's helpful, and you're right: having metadata filtering available is a big advantage for Pinecone.
@user-fw7bj3jj7x · 1 year ago
Amazing, thank you! I'm really keen to see that video about how to use Cloud Run to make the Vertex AI endpoint more accessible. Did you end up making that video?
@ml-engineer · 1 year ago
Hi, Google released public endpoints, so it is no longer required to use a VPC network, and therefore you don't necessarily need a Cloud Run service in front. Here is the documentation for the public Matching Engine endpoint: cloud.google.com/vertex-ai/docs/matching-engine/deploy-index-public In case you are still interested in the Cloud Run approach, here I have a sample implementation of an image similarity matching solution: github.com/SaschaHeyer/image-similarity-search/tree/main/query-service The critical part is in the cloudbuild.yaml, which contains the reference to the VPC network: github.com/SaschaHeyer/image-similarity-search/blob/main/query-service/cloudbuild.yaml Let me know if that helps.
@MOHAMMADAUSAF · 6 months ago
Hey, awesome starter. Just a question: given I have an index created from a bucket, if I were to add new files to the same bucket, will the index reflect the new data files, either by itself or by triggering? Simply put, how can I add new data from a bucket to an existing index without rebuilding the entire index again, something equivalent to Pinecone's or Weaviate's upsert functionality? The docs aren't helping me here.
@ml-engineer · 6 months ago
Hi Mohammad, I recommend using Vertex AI Vector Search / Matching Engine's streaming capabilities. This way you can simply send new data to the vector database via the SDK. Check out my sample repo to get you started: github.com/SaschaHeyer/Real-Time-Deep-Learning-Vector-Similarity-Search It's the same process as Pinecone's upsert.
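A rough sketch of what such a streaming upsert could look like. The dict shape below mirrors the IndexDatapoint request body; the commented-out SDK call (`MatchingEngineIndex.upsert_datapoints`) and the `INDEX_NAME` placeholder are assumptions based on the repo linked above, not verified here:

```python
# Build datapoint payloads locally, then (commented out) push them to the
# index via the streaming upsert. The dict keys follow the IndexDatapoint
# shape; treat the SDK call and INDEX_NAME as hypothetical placeholders.

def make_datapoint(dp_id, vector, restricts=None):
    """Assemble one upsert payload; restricts maps namespace -> allow tokens."""
    dp = {"datapoint_id": dp_id, "feature_vector": list(vector)}
    if restricts:
        dp["restricts"] = [
            {"namespace": ns, "allow_list": tokens}
            for ns, tokens in restricts.items()
        ]
    return dp

points = [make_datapoint("item-1", [0.1, 0.2], {"color": ["red"]})]

# from google.cloud import aiplatform
# index = aiplatform.MatchingEngineIndex(index_name=INDEX_NAME)
# index.upsert_datapoints(datapoints=points)

print(points[0]["datapoint_id"])
```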
@federicoph3407 · 1 year ago
Thank you for the tutorial! Is it possible to choose the machine type? I tried with 100 vectors (94 kb), and in the endpoint's basic info I see machine-type: n1-standard-16. In the documentation it seems that there is a default machine based on shard size. The documentation says: "When you create an index you must specify the shard size of the index", but there is no parameter that refers to shard size during Index creation. There is also written "you can determine what machine type to use when you deploy your index" but, same as before, there is no parameter that refers to machine-type. I am a bit confused :/
@federicoph3407 · 1 year ago
documentation: matching-engine -> create-manage-index?hl=en#create-index
@ml-engineer · 1 year ago
Hello Federicoph, that is indeed a good question that is covered neither in the video nor the article =). The machine type can be defined when deploying the index, as mentioned in the documentation for deploy_index. But if you actually check the gcloud command, there is nothing documented: cloud.google.com/sdk/gcloud/reference/alpha/ai/index-endpoints/deploy-index So I always fall back to the actual implementation, and there you can see the deploy_index method does indeed accept a machine type. See here: github.com/googleapis/python-aiplatform/blob/90bb8ef3d675af62b7cc1f0d2fdf99b476e8dde5/google/cloud/aiplatform/matching_engine/matching_engine_index_endpoint.py#L542 In your use case you can set it to the smallest machine. Also, because you only have 100 vectors, I recommend using the brute force algorithm. Let me know if that helps.
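For reference, a hedged sketch of the request body involved. The nesting follows the DeployedIndex REST shape referenced in the thread; the commented SDK call and all resource names and machine types below are placeholders, not a verified example:

```python
# Sketch of deploying an index with an explicit machine type. The dict
# mirrors the DeployedIndex REST body (dedicatedResources.machineSpec);
# the SDK call is commented out, and my_index / my_index_endpoint are
# hypothetical objects.

def deployed_index_body(index_resource, deployed_id, machine_type,
                        min_replicas=1, max_replicas=1):
    """Assemble a DeployedIndex-shaped request body."""
    return {
        "id": deployed_id,
        "index": index_resource,
        "dedicatedResources": {
            "machineSpec": {"machineType": machine_type},
            "minReplicaCount": min_replicas,
            "maxReplicaCount": max_replicas,
        },
    }

body = deployed_index_body(
    "projects/p/locations/us-central1/indexes/123",
    "my_deployed_index",
    "e2-standard-2",  # a small machine for a tiny (100-vector) index
)

# my_index_endpoint.deploy_index(
#     index=my_index,
#     deployed_index_id="my_deployed_index",
#     machine_type="e2-standard-2",
# )

print(body["dedicatedResources"]["machineSpec"]["machineType"])
```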
@ml-engineer · 1 year ago
Quick appendix: this is reflected in the API documentation as well: cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.indexEndpoints/deployIndex See the request body: cloud.google.com/vertex-ai/docs/reference/rest/v1/projects.locations.indexEndpoints#DeployedIndex and especially this part: cloud.google.com/vertex-ai/docs/reference/rest/v1/DedicatedResources
@tyronehou3553 · 10 months ago
Great tutorial! Can you update algorithm parameters like leafNodeEmbeddingCount and leafNodesToSearchPercent on the fly? I tried using the gcloud update index command, but nothing changes when I describe the index afterward, even when the operation is complete.
@ml-engineer · 10 months ago
Hi, no, they can only be set during index creation; it is not possible to update them. That's because an update would require a full index rebuild, which in the end is the same as creating a new index.
@ramsure9246 · 1 year ago
Thanks for the tutorial. Is there any LangChain-compatible retriever for this Matching Engine index?
@ml-engineer · 1 year ago
Yes, there is LangChain support for Matching Engine; the Google team implemented it a few weeks ago: github.com/hwchase17/langchain/pull/3104 I'm currently writing an article on it that will be published in the next few days.

from langchain.vectorstores.matching_engine import MatchingEngine

vector_store = MatchingEngine.from_components(
    index_id=INDEX_NAME,
    region=MATCHING_ENGINE_REGION,
    embedding=embeddings_llm,
    project_id=PROJECT_ID,
    endpoint_id=ENDPOINT_NAME,
    gcs_bucket_name=DOCS_BUCKET)

relevant_documentation = vector_store.similarity_search(question, k=8)
@nooralsmadi5017 · 1 year ago
Hi, how can I make it work from outside the network? I mean, send a request and get a response from outside the network?
@ml-engineer · 1 year ago
To send your request to the Matching Engine you need to be "inside" the network. This can be complicated if you want to integrate it into a service that is running outside of that network. There is one simple approach that I really like: you can implement a Cloud Run service that is part of the VPC network and takes your requests. This Cloud Run service can also be reached from outside the network. I have implemented exactly that in one of my other articles: medium.com/google-cloud/recommendation-systems-with-deep-learning-69e5c1772571
@akarshjainable · 1 year ago
Where did you mention the schema of the data file (the one with the input embedding vectors)?
@ml-engineer · 1 year ago
What do you mean by schema? The Matching Engine does not need a schema, as we just provide the embeddings. Can you rephrase 🙂 in case I misunderstood your question.
@akarshjainable · 1 year ago
@ml-engineer Aah, got it. So the embedding input vector file has to be in the format {"id": "string", "embedding": [vector]}
@ml-engineer · 1 year ago
Are you referring to this section of the video? ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-KMTApM5ajAw.html
@akarshjainable · 1 year ago
@ml-engineer Yes, precisely.
@ml-engineer · 1 year ago
Yes, exactly. Alternative file formats are CSV or Avro.
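For anyone else landing here, a minimal sketch of producing that file locally before uploading it to the bucket. The file name and the records are made up; only the {"id": ..., "embedding": [...]} JSON-lines shape comes from the thread above:

```python
# Write embeddings as a JSON-lines file in the shape the index expects,
# then read it back to sanity-check. The records and file name here are
# illustrative only.
import json
import os
import tempfile

records = [
    {"id": "a", "embedding": [0.1, 0.2, 0.3]},
    {"id": "b", "embedding": [0.4, 0.5, 0.6]},
]

path = os.path.join(tempfile.gettempdir(), "embeddings_0000.json")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")  # one JSON object per line

with open(path) as f:
    loaded = [json.loads(line) for line in f]
print(len(loaded))
```

After writing, the file would be copied to the Cloud Storage location the index reads from (e.g. with gsutil).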
@LucasGomide · 1 year ago
Great content. Can you tell me about some alternatives? I am studying some options, such as using pgvector with some model to generate embeddings vs. Matching Engine. I would like to understand the pros/cons of those approaches.
@ml-engineer · 1 year ago
Hi Lucas, Pinecone is also a highly recommended product. Or you can go open source with Faiss or Annoy, but this requires you to take care of the infrastructure yourself. If you want similarity search, I recommend going with either Matching Engine or Pinecone.
@anjanak8303 · 2 years ago
Thank you for the tutorial. With the Avro format there is an allow and deny option that you can set for the inserted embeddings. There is little documentation as to how to use this in a query. Could you help with this?
@ml-engineer · 2 years ago
Hello Anjana, are you referring to the filtering functionality? cloud.google.com/vertex-ai/docs/matching-engine/filtering
@anjanak8303 · 2 years ago
@ml-engineer Yes, the same. Could you tell me how to incorporate that into a query? I have an idea of how to have it inserted in the index, but it would be good if you could give some clarity there as well. Thanks for replying :)
@ml-engineer · 2 years ago
Got it. Yeah, it's not well documented. But if you check the proto file, you can get an understanding of how to use it when querying the Matching Engine. It's as simple as applying it to your query request:

namespace = match_service_pb2.Namespace()
namespace.name = 'color'
namespace.allow_tokens.append('red')

request = match_service_pb2.MatchRequest()
request.deployed_index_id = DEPLOYED_INDEX_ID
request.restricts.append(namespace)
@anjanak8303 · 2 years ago
@ml-engineer This worked!! I had not looked at the proto file in much detail. Thank you so much 😃
@ml-engineer · 2 years ago
@anjanak8303 Perfect. I'll add this to the article; I hope we can help more people who have the same question.
@kadapa-rl6jg · 1 year ago
Hi, can you please help me understand how to orchestrate Vertex AI through Cloud Composer?
@ml-engineer · 1 year ago
Hi, I have written a comparison article between Cloud Composer and Vertex AI Pipelines for orchestrating ML pipelines: medium.com/google-cloud/vertex-ai-pipelines-vs-cloud-composer-for-orchestration-4bba129759de In general, if you want to use Vertex AI's capabilities as part of Cloud Composer, you can simply use the Vertex AI SDK as part of your Composer tasks. Though I would highly recommend switching to Vertex AI Pipelines: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-gtVHw5YCRhE.html
@kadapa-rl6jg · 1 year ago
@ml-engineer My requirement is to orchestrate a Vertex AI pipeline through Cloud Composer via Terraform code.
@elijahdecalmer613 · 2 years ago
you are a legend
@ml-engineer · 2 years ago
¯\_(ツ)_/¯
@elijahdecalmer613 · 2 years ago
Excuse me, you briefly mention that there are workarounds to simulate real time indexing. Could you explain the options for this? Or point me to some docs. Beginner trying to work it out for a project :)
@ml-engineer · 2 years ago
The feasibility of this solution depends on the number of new vectors you get between the indexing updates. You store the vectors that need to be indexed in the next index-update round in Memorystore for fast, millisecond access. Then build, for example, a Cloud Run application that takes the vectors from Memorystore and calculates the distances yourself (it's just simple math). The same Cloud Run application also calls the Matching Engine, and in the end you combine the results if the distance is in your desired range. In the long term I hope for quicker index updates using GPUs.
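The workaround above, sketched end to end with stand-ins: a plain dict plays the role of Memorystore, `index_results` is assumed to be (id, distance) pairs returned by Matching Engine, and Euclidean distance is used as an example metric. All names and data here are hypothetical:

```python
# Hybrid search sketch: fresh vectors not yet in the index are kept in a
# fast store (dict stand-in for Memorystore), their distances are computed
# directly, and the results are merged with the ANN index's answer.
import math

fresh = {"new-1": [1.0, 0.0], "new-2": [0.0, 1.0]}  # not yet indexed

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hybrid_search(query, index_results, k=3):
    """index_results: list of (id, distance) pairs from the ANN index."""
    extra = [(dp_id, euclidean(query, vec)) for dp_id, vec in fresh.items()]
    # merge both sources and keep the k overall nearest
    return sorted(index_results + extra, key=lambda t: t[1])[:k]

merged = hybrid_search([1.0, 0.0], [("old-1", 0.5), ("old-2", 2.0)], k=3)
print(merged)
```

Once the next index update lands, the fresh store would be flushed so the same vectors aren't scored twice.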
@ml-engineer · 1 year ago
@@elijahdecalmer613 Google added streaming support which makes it easier to get new vectors into the index.
@ahmedmansouri2054 · 7 months ago
@ml-engineer If I want to update the index in real time, can I just add new files to the GCS folder where the vector data is stored, or do I have to add them programmatically?
@majidalikhani2765 · 1 year ago
Hey, what is the parameter that decides the number of neighbours returned? I tried changing num_neighbours to no avail; it only returns 10 neighbours.
@ml-engineer · 1 year ago
Hi Majid, you can define the number of neighbors you want to retrieve when calling the matching endpoint:

response = my_index_endpoint.match(
    deployed_index_id=DEPLOYED_INDEX_ID,
    queries=...,
    num_neighbors=NUM_NEIGHBOURS
)
@majidalikhani2765 · 1 year ago
@ml-engineer But in this tutorial you don't query this way. Instead, match_service.proto is used, which has a field num_neighbours = 3, but it always returns 10 neighbours.
@ml-engineer · 1 year ago
@majidalikhani2765 Yes, Google changed the way to get matching results since I released the video. No need for complex .proto file handling anymore; just use the SDK, in the same way as creating the index. Much easier.
@ml-engineer · 1 year ago
I will add the new way to the notebook in the next few days and publish an additional video.
@majidalikhani2765 · 1 year ago
@ml-engineer Google's documentation is very poor, smh. I got it working via the SDK, thanks.
@AyushMandloi · 9 months ago
What is the need for endpoints? And when will you be uploading more videos?
@ml-engineer · 9 months ago
Hi Ayush, what do you mean with your endpoint question? I'm recording 4 new videos about Generative AI on Google Cloud at the moment; they will be released in the next weeks.
@akarshjainable · 1 year ago
Can I do a batch prediction on the index? If yes, do I need a VPC network for that?
@ml-engineer · 1 year ago
You need a VPC network; this is a requirement to run queries against the index. Batch prediction over the complete index is not possible. This is due to the nature of the index: you only get the k nearest neighbors.
@akarshjainable · 1 year ago
Probably getting a bit greedy here, but do you have plans to upload a tutorial on two-tower models?
@ml-engineer · 1 year ago
No worries, I love all the comments here on YouTube. Yes, I'm releasing an article next week. It's a deep dive on how to use the two-tower algorithm + Matching Engine + Vertex AI Pipelines to build a deep learning recommendation engine.
@akarshjainable · 1 year ago
@@ml-engineer Thanks a ton
@ml-engineer · 1 year ago
The article is published: medium.com/google-cloud/recommendation-systems-with-deep-learning-69e5c1772571
@niladrishekhardutt · 2 years ago
Great tutorial! How does the deny list work? Let's say I have a class fruit which will ONLY have deny-list tokens (no allow), such as "apple", "mango", etc. How do I filter out "mango" in the query (search all fruits except mango)? I have tried the following method, but it does not work as expected.

JSON:
{"id": "1", "embedding": [0.002792, 0.000492], "restricts": [{"namespace": "fruit", "deny": ["mango"]}]}

Query:
deny_namespace = match_service_pb2.Namespace()
deny_namespace.name = "fruit"
deny_namespace.deny_tokens.append("mango")
request.restricts.append(deny_namespace)
@ml-engineer · 2 years ago
Hello Niladri, thanks a lot. (Anjana in the comments had a similar question about allow tokens.) Your JSON and query are definitely correct; I don't see any issues here. Did you make sure to update the index after adding the restricts filter to the JSON?
@niladrishekhardutt · 2 years ago
@ml-engineer Hey, thanks for the quick reply. Yes, I have completely overwritten the index twice now (just to be sure), but it still doesn't seem to work. Is there any requirement for the token to be on the allow list as well?
@ml-engineer · 2 years ago
Deny alone, without allow, is possible. See the documentation: cloud.google.com/vertex-ai/docs/matching-engine/filtering#denylist

{}                 // empty set matches everything
{red}              // only a 'red' token
{blue}             // only a 'blue' token
{orange}           // only an 'orange' token
{red, blue}        // multiple tokens
{red, !blue}       // deny the 'blue' token
{red, blue, !blue} // a weird edge case
{!blue}            // deny-only (similar to empty set)

See the following description: when a query denylists a token, matches are excluded for any datapoint that has the denylisted token. If a query namespace has only denylisted tokens, all points not explicitly denylisted match, in exactly the same way that an empty namespace matches all points. So the issue has to be somewhere else.
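The documented semantics above can be expressed as a tiny local predicate. This is only a sketch of the quoted rules for reasoning about expected results, not the service's implementation; the function name and example tokens are made up:

```python
# Local model of the documented restrict semantics: a datapoint matches a
# query namespace if none of its tokens are denylisted and its tokens hit
# the allow list (an empty allow list matches everything, so a deny-only
# query matches all non-denylisted points).

def namespace_matches(point_tokens, allow, deny):
    """point_tokens: set of tokens on the datapoint; allow/deny: query lists."""
    if point_tokens & set(deny):  # any denylisted token excludes the point
        return False
    if not allow:                 # deny-only / empty namespace: match all
        return True
    return bool(point_tokens & set(allow))

results = [
    namespace_matches({"apple"}, [], ["mango"]),      # deny-only {!mango}
    namespace_matches({"mango"}, [], ["mango"]),      # denylisted token
    namespace_matches({"red"}, ["red", "blue"], []),  # plain allow match
]
print(results)
```

Under these rules, Niladri's deny-only query should indeed match every fruit except "mango", which supports the conclusion that the issue lies elsewhere.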
@niladrishekhardutt · 2 years ago
@ml-engineer Unfortunately, this does not seem to be working :( I have looked at my JSON multiple times now and tried different variations, but it still fails. Do you have any ideas?
@federicoph3407 · 1 year ago
Hi @Niladri, did you solve your problem? If yes, can you explain how, please? If not, I have the same problem with the allow-list tokens; I opened an issue on GitHub and on googlecloudcommunity. Thank you in advance! @ML Engineer