
Ollama 0.1.26 Makes Embedding 100x Better 

Matt Williams
44K views

Embedding has always been part of Ollama, but before 0.1.26, it kinda sucked. Now it’s amazing, and could be the best tool for the job.
Yes, I know I flubbed the line about Bun. It's not an alternative to JS - it's a whole new runtime for JS/TS. It makes TypeScript, which is already a better JS, even better than it was.
Be sure to sign up to my monthly newsletter at technovangelist.com/newsletter
And if interested in supporting me, sign up for my patreon at / technovangelist

Science

Published: 21 Feb 2024

Comments: 212
@Slimpickens45 4 months ago
I am here for it. Lets goooo! And yes, videos on vector DBs would be amazing.
@dinoscheidt 4 months ago
Postgres pgvector or Redis. Done. Vectors in DBs are incredibly easy - despite all the very adversarial hype and marketing. What is hard is iterating, e.g. on the chunk size.
@Hypersniper05 4 months ago
Easy - just use plain JSON to store the embeddings and text locally 😊. Granted, it's not for scale, but for local projects it's fast enough.
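That plain-JSON approach can be sketched in a few lines of TypeScript (file and helper names are illustrative, not from the video):

```typescript
import * as fs from "fs";

type Entry = { text: string; embedding: number[] };

// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// The whole "database" is one JSON file -- no server, fine for local projects.
function save(entries: Entry[], path = "embeddings.json"): void {
  fs.writeFileSync(path, JSON.stringify(entries));
}

// Brute-force nearest neighbour: scan every entry, keep the best match.
function topMatch(query: number[], entries: Entry[]): Entry {
  return entries.reduce((best, e) =>
    cosine(query, e.embedding) > cosine(query, best.embedding) ? e : best);
}
```

The linear scan is O(n) per query, which is exactly why this stops scaling and a real vector DB takes over.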
@efexzium 4 months ago
Does anyone know where Ollama RAG code examples exist?
@technovangelist 4 months ago
Ollama itself doesn't do anything with RAG. RAG would be part of the solution you build with Ollama.
@MEDEBER-ENGINEERS 4 months ago
Definitely looking forward to vector DBs video.
@ChetanVashistth 4 months ago
You are a great teacher!! I want to see more of your videos. Thanks for your service 🙇
@joan_arc 4 months ago
Hi Matt, thanks for making these videos. They are very informative and helpful.
@guidoschmutz 4 months ago
Thanks a lot for all your videos; this one really helped me a lot. I just started with Ollama and local LLMs a week ago and was using llama2 for embeddings, and it was painfully slow - I didn't even know it could be faster until I watched this video yesterday evening. Just changed it to use "nomic-embed-text" and I love it :-) Thanks and keep up the good work! I also really like your humor!!!
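For anyone making the same switch: the model is just a parameter in the request to Ollama's embeddings endpoint. A minimal sketch (endpoint path, default port 11434, and response shape as of 0.1.26 -- check the current API docs):

```typescript
// Request one embedding from a locally running Ollama instance.
// POST /api/embeddings with { model, prompt } returns { embedding: number[] }.
async function embed(
  text: string,
  model = "nomic-embed-text",
): Promise<number[]> {
  const res = await fetch("http://localhost:11434/api/embeddings", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt: text }),
  });
  if (!res.ok) throw new Error(`embedding request failed: ${res.status}`);
  const { embedding } = await res.json();
  return embedding;
}
```

Swapping llama2 for nomic-embed-text is just a change of the `model` string; each call embeds a single string, so you loop over your chunks.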
@NLPprompter 4 months ago
Thank you, I really appreciate your work and support. Can't wait for the next video.
@rccmhalfar 4 months ago
Thanks for your superb videos, your content is so rich and well paced - would like to see more about model training using ollama and embedding
4 months ago
Thank you Matt for making these videos!
@sun33t 4 months ago
Thanks for posting these videos mate. I’m finding them so helpful in orienting myself in the world of ai tooling 🎉
@nicholasdudfield8610 4 months ago
Vids keep getting better - and thanks - I overlooked the embeddings due to gemma!
@archamondearchenwold8084 4 months ago
Your voice is amazing. I could listen to you present on anything man. Amazing video
@lucioussmoothy 4 months ago
Very informative and on point ..Keep up the good work Matt.
@joeburkeson8946 4 months ago
Looking forward to when tools to embed documents into models become available, thanks for all you do.
@brian2590 4 months ago
I jumped when I saw this. This is very exciting for me. Thank you!
@janduplessis1357 4 months ago
Hi Matt, love your content - super stuff, thank you. This is exactly what I was looking for, and you explain it so well. I am working on a RAG search project using open source for a big genomics project, providing specific information to users of the service - really detailed information about which test to request, etc. This video came just at the right time 👍
@technovangelist 4 months ago
Great. Maybe I should suggest it to my sister who does that kind of thing.
@SyntharaPrime 3 months ago
Thank you for your great effort
@HistoryIsAbsurd 4 months ago
Definitely still learning about this topic, so thank you for the vid! It would be interesting to dive in deeper.
@trsd8640 4 months ago
Great video! Embeddings take Ollama to the next level! And I love that you don't say a word about Gemma ;)
@Turbozilla 4 months ago
I'm loving your videos! I really like that they're to the point. Out of all the YouTubers doing videos in this AI/LLM space, I enjoy yours the most. Keep them coming! Tell your family this is more important! Lol 😮. I'm kidding. 😂
@LordOfRuin 4 months ago
Thank you! Swapping my langchain embedding model for nomic-embed-text really sped it up. This really is bigger news than Gemma.
@artur50 4 months ago
Had a ball of laughter at the end. Cheers!
@JoshuaMcQueen 4 months ago
Really nice video Matt. We're thinking about doing a similar video testing top 5-10 vector DBs
@c0t1 4 months ago
I really loved this video! Great and super timely topic. Yes on a Vector DB comparison video.
@marcosissler 4 months ago
Thank you Matt! 🎉
@disturb16 4 months ago
Could you share the source code of the examples you use in your videos?
@efficiencygeek 4 months ago
Yes, please, especially the python script.
@potatodog7910 4 months ago
That would be helpful
@jrfcs18 4 months ago
Please share the code you show in your example.
4 months ago
Thank you for the video. I was looking into calling embedding in golang since all embedding services were very slow. PS: I thought there was a surprise at the end since there was a silent part after you finished talking.
@technovangelist 4 months ago
There is a crowd of fans that love that at the end.
@joxxen 4 months ago
You are great, your content is great. Thanks
@vikrantkhedkar6451 3 months ago
Great video, I was really trying to find some open-source embedding model❤❤
@JulianHarris 4 months ago
This is absolutely brilliant. Also, to answer your question about looking at vector databases: I think a useful distinction is whether they support ColBERT-style embeddings, because ColBERT is clearly the way forward when you want high-quality embeddings.
@karanv293 4 months ago
This is such good content. Can you do a full video tutorial on a production case of the best RAG strategy? There are so many out there.
@riftsassassin8954 4 months ago
I personally struggle to understand and use embeddings effectively. This video is highly appreciated! please do go on a deep dive on the differences on vector db providers. I'll definitely like and share if you do!
@miikalewandowski7765 4 months ago
Haha 😂 I love the ending! Reminds me of Roy Andersson's brilliant movie "Songs from the Second Floor". Also great content. Keep it up 👌
@aisimp 4 months ago
Love the delivery. Got me laughing with “Hello World of RAG 😂” … totally agree 👍
@user-ne8kj2hx3j 4 months ago
Great video! would love to see the vector DB video as well
@martinisj 4 months ago
A video on vector databases would be great. As always, please do not forget to include a brief how-to, those well-thought snippets in your videos really do make a difference. Thanks!
@elanrider 4 months ago
All in for vector DBs!
@hossainmahi3559 4 months ago
Thanks a lot for your great videos! Please make a video on "how to" and "which" of vector databases.
@andrewowens5653 4 months ago
@Matt Williams. It would be nice if you could do a video to clarify exactly which extended instruction sets are needed on the CPU to support Ollama. My old i7 only supports first-generation AVX.
@ralphv.l8066 4 months ago
Thanks!
@technovangelist 4 months ago
OMG, this is way too kind. You need to let me know how I can help you in any way. thanks so much.
@user-xj5gz7ln3q 4 months ago
Great video as always. Question: How is it different when using embeddings via the Mistral 7B model compared to BERT? I have been utilizing the Mistral 7B model with a 4096 vector dimension, hoping to capture more contextual information compared to BERT's 1536 vectors. However, I didn't notice any speed difference between the two. Just curious if anyone else has tried it and noticed any pros or cons.
@brandonheaton6197 4 months ago
Definitely do the side by side for the db options in the context of ollama on something like an M2. Our work machines for the public school system are M2s with only 8 gigs of RAM, as a reference point. The potential for a local teaching assistant is definitely close
@Pablo-Ramirez A month ago
Hello, all your videos are very interesting. I have been working for some time with Ollama, models like Phi3 and Llama3, and some specific models dedicated to embedding. What I have not been able to solve: when there are several similar documents, for example procedures, how can I retrieve the correct data when they are so similar? It retrieves the information; however, it always mixes it up. Cheers and thanks for your time.
@colliander242 2 months ago
A great addition to Ollama. Hopefully, batching will be supported soon. As of now, it is one API call per string which makes it less suitable for larger data sets
@technovangelist 2 months ago
I’m not sure I see the issue. Any competent developer can work with this.
@artur50 4 months ago
If you could provide a full tutorial on that, it would be awesome.
@aminzarei1557 4 months ago
I usually use all-MiniLM-L6-v2 with its 384 dims, and it just works for most cases. Tiny but accurate and fast. But definitely gonna give Nomic a shot. Tnx 🙏
@rezkiy95 4 months ago
Your bunny wrote. On a serious note, great vids mate.
@Vedmalex 4 months ago
Cool! Good news! Let's discuss vector DBs and algorithms for vector search.
@piercenorton1544 4 months ago
Would love a video on db options
@gambiarran419 4 months ago
Fantastic video. Do you offer your time as a consultant/programmer? Your explanation of the subject matter is so clear.
@technovangelist 4 months ago
No, I'm focused on YouTube for a while. But thanks.
@sanjayojha1 4 months ago
Thanks for the update. I know mostly about vector DBs, but I would like to know the difference between a vector store and a vector DB - for example, the difference between Faiss and a proper vector DB like Qdrant.
@mosth8ed 4 months ago
When OpenAI first came out with plugins, I became interested in learning more about all this kind of stuff, but was quite dissatisfied with Python's speed at handling what I was trying to do. So I learned enough Rust to make a vectorizer that, when I loaded a project, would create embeddings of all the appropriate files for the project type using All-MiniLM-v12 (or v6 if I changed a setting), or, when I saved a specific file, would do that one as well, and upload them to a locally hosted Qdrant DB, which I gave a GPT plugin access to. So, if I wanted, I could ask anything about my current project and it would have all the current context. Once I finished it, I never used it again, but it was crazy fast, and a good learning experience.
@Persikys 4 months ago
It would be great to figure out what the difference is between all those vector DBs.
@makesnosense6304 4 months ago
OK, so the big question now is: can you use embeddings generated with one of these smaller models on a big model? Are they compatible, and how does this work?
@HoneyCombAI 4 months ago
Please make the video on different vector databases. I wouldn’t mind spending an hour watching the nuanced differences with a rubric defined early on!
@daryladhityahenry 4 months ago
Hi! Nice explanation. Now I know why people still use BERT for this. But I want to know something; hope you can enlighten me. In the example, the data is either text or PDF. What if it's from the web? I mean, the data is really contaminated by lots of other text: navigation text, title text, footers, ads, etc. We don't want that included in our vector DB, right? What kind of technique can we use to clean up the data? Or maybe split every sentence and then embed it, check whether it matches our needs, and put the fitting ones into the vector DB? But I'm afraid that ruins the data, because sometimes the information context spans more than a sentence, right? I'm really confused about this. Thank you :).
@JimLloyd1 4 months ago
Hey Matt, I'm excited that Ollama supports nomic-embed-text due to its large maximum sequence length of 8192 tokens. You mentioned "summaries and summaries of summaries". Summaries are really necessary when the max sequence length is 512 tokens, which is typical of most embedding models. I'm very curious to see if the 8K sequence length can significantly reduce the need for summarization. Thanks for your high quality videos.
@dawidw.6016 4 months ago
Very ❤ Professional
@preben01 4 months ago
Great video as always BUT - maybe I'm just not getting everything, but... Does this mean I don't have to use langchain and a local chromadb? Can I just send text chunks through the API? If so, can you have document collections? Can you remove embeddings if you need to update? Will embedding affect one model or all?
@technovangelist 4 months ago
Rag will always need a vector store whether that’s chroma or a json file or a kludgy Postgres or whatever. But for rag there was never a need for langchain. As things get much more complicated than rag, then lc has a place.
@mtprovasti 4 months ago
A DB comparison for a local instance? That would be interesting.
@prispeshnik-istini2 3 months ago
Hi. I have a lot of questions. I changed your code and now it works with CSV files, but now I have a question: where does the information, broken into pieces, go? How do I work with it? I will be grateful for your reply! Thanks!
@pixelplayhouse89 4 months ago
On Windows you need to upgrade your Ollama to 0.1.26 to use the Gemma model - I just figured that out while trying to delete and re-download the model all over again... so read the docs first, or you'll just waste your time. Btw, I had missed the new embedding model from nomic. Thanks for reminding us of this important feature. Great video as always, thanks.
@sam.sleepwell 4 months ago
Great content! Super useful embedding. Seems we need to use nomic from now on for the embeddings?
@technovangelist 4 months ago
Until there is a better one
@roopad8742 4 months ago
Is it just me, or does anyone else like the realistic pause scenes at the end of the videos 😂
@khangvutien2538 4 months ago
Thanks for sharing. If I understand well, Ollama is not Google Gemma but is working with them, and Ollama 0.1.26 uses the Gemma model for its nomic embedding. But I'm struggling to understand `splitIntoChunks()` in the video. In line 8, `chunks` is declared as `const`; in line 14, you push something into `chunks`. How can that work? Please help.
@technovangelist 4 months ago
Support for Gemma was added, but that is unrelated to embedding. Embedding is possible because of support for BERT models such as nomic-embed-text - that's a different model. As for the code: chunks is a const, so I can't reassign chunks, but I can add to the array that chunks represents. You can look more into TypeScript to see why this works.
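The `const` point from that reply fits in a few lines of TypeScript: `const` freezes the binding, not the array it points to:

```typescript
const chunks: string[] = [];   // `chunks` itself can never be reassigned...
chunks.push("first");          // ...but the array it refers to is still mutable
chunks.push("second");
// chunks = [];                // reassignment would be a compile-time error
```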
@mshonle 4 months ago
Really curious to know about chunking techniques where the chunk size varies based on its content, with the goal of producing more precise or relevant results for RAG queries. (I also totally thought you were going to do a Ferris Bueller at the very end.)
@technovangelist 4 months ago
There will be no naked showers in my videos, even with the camera on my face. Or did you mean "Oh, you're still here? Go home."?
@ilianos 4 months ago
That's a really interesting topic for me as well! I can recommend to look at advanced chunking strategies such as "semantic chunking" using NLTK or spaCy. You should read the article titled "How to Chunk Text Data - A Comparative Analysis" by Solano Todeschini.
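The core idea of sentence-aware chunking can be sketched without NLTK or spaCy (a naive regex splitter stands in for a real sentence tokenizer; the function name is illustrative):

```typescript
// Group whole sentences into chunks of roughly `maxChars` characters,
// so no chunk ever cuts a sentence in half.
function chunkBySentence(text: string, maxChars = 500): string[] {
  const sentences = text.match(/[^.!?]+[.!?]+/g) ?? [text];
  const chunks: string[] = [];
  let current = "";
  for (const s of sentences) {
    // Start a new chunk when adding this sentence would overflow the budget.
    if (current.length + s.length > maxChars && current) {
      chunks.push(current.trim());
      current = "";
    }
    current += s;
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}
```

Real "semantic chunking" goes further, splitting where the embedding of consecutive sentences diverges, but the grouping skeleton is the same.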
@stephenthumb2912 4 months ago
RAG is just the database for models. It'll exist in some form until we don't have any use for databases in general. There will always be a cost for keeping everything in memory and that includes LLM's and other DL models.
@technovangelist 4 months ago
There is a bit more to it. RAG is the technique; the database, specifically a vector DB, is a part of RAG, but not everything. And there are a lot of choices of vector DBs. You also have to decide how you want to manage embeddings, how you want to break down the source docs, and more. And there is always going to be a need for RAG, as long as we have internal company info and until we have a massive revolution in computing with much faster bus speeds. Gemini is showing with its massive context size that the need for RAG will not go away anytime soon.
@TimothyGraupmann 4 months ago
Look at that speed boost! It's like watching the Silicon Valley series and discovering the compression algorithm!
@technovangelist 4 months ago
I lived in a house just like that in Sunnyvale back in 96-99, just before moving to Seattle to join MSFT. The house had exactly the layout of the one in the show, and the roommates were just as odd.
@nuvotion-live 2 months ago
I keep hitting token count limitations when using embedding models. What am I doing wrong? What are the strategies to prevent that?
@technovangelist 2 months ago
How? You are splitting up your text into smaller chunks, right?
@SODKGB 3 months ago
Maybe you can answer this question for me. I know that we need to ingest content so it is searchable. In this video, where do your newly created embeddings go in order for Ollama to access the content? Wondering if it is possible to just add newly created embeddings into an existing gguf? Just want to make it easy to ingest and later ask and retrieve information using Ollama for Windows. Thanks.
@technovangelist 3 months ago
You wouldn't add the embeddings to a model directly, though you can create a dataset from your content and then fine tune the model on it if you like. You add the embeddings to a vector db for RAG.
@SODKGB 3 months ago
@@technovangelist thank you
@satyamgupta2182 4 months ago
Thank you for the video. But which model is doing the embedding? For example, say I want to interact with a specific model, "llama2", but I want to embed my text file using nomic in order to interact with it. That is how it works, right? But here you're not really specifying the model you want to chat with, only the model you want to embed with.
@technovangelist 4 months ago
I am specifying the model I want to use to embed the content that I want to ask llama2 about.
@laserboy23 3 months ago
I'm using langchain (JavaScript) 0.1.28 and Ollama 0.1.29. I create my embeddings for a PDF file using the nomic-embed-text model. Everything works fine! But when I start my query (using model llama or mistral) the following exception is thrown: "Error parsing vector similarity query: query vector blob size (6144) does not match index's expected size (3072)". Can you help? Many thanks in advance!
@technovangelist 3 months ago
I’m guessing you used llama2 or another model to do embeddings before. You need to redo all the embeddings
@user-jo3kt2hv9f 4 months ago
Yes, please - videos on vector DBs and knowledge graphs (Nebula, neo4j) would also be helpful.
@kvrmd25 4 months ago
Can you use NLU or tokenize text to split into chunks for better embeddings?
@mcpduk 28 days ago
old skool....loads of good frameworks make embedding VERY VERY simple
@technovangelist 27 days ago
It’s pretty hard to beat the simplicity without a framework. And most frameworks just complicate without benefit. Like langchain and llamaindex.
@geraldofrancisco5206 4 months ago
keep up
@sultansaeed7136 4 months ago
What about the most accurate embedding, the one that captures the semantic meaning of a text very well?
@yourspanishstories A month ago
What prompt did you use for the thumbnail of this video, man? "colorful llama in a library" 😂
@jimlynch9390 4 months ago
I'm not sure I understand what you are saying. To use the new methods, do we have to run a program to break a document we want to query into chunks? Or does Ollama do that for us? It seems to me that some models let you point to a book, PDF, or other text representation and ask questions. Oh, and I'd really like a comparison of the vector DBs.
@technovangelist 4 months ago
There are very few models that can point to a book or even a PDF and just answer questions about it. First, the context size isn't big enough, and then they tend to forget stuff in the middle. Google is promising that's not the case with their new models, but they promise a lot that doesn't ever come true. And usually there is irrelevant info in the doc anyway. RAG helps get the model the relevant content for the particular query.
@somasuraj 4 months ago
Can you make a video on how vector databases work - their internal workings?
@michaelthompson8251 23 days ago
Curious - maybe a benchmarking of War and Peace using the various databases, based on size and/or speed.
@technovangelist 23 days ago
it really should be a more recent long document. War and Peace is probably a part of every model already. But I need to find something written in the last year or so that is long and not part of the training data for every model.
@unclecode 4 months ago
Amazing, Just switched from OpenAI to this a few days ago. Everything was doable locally except for this embedding that required OpenAI for quick development. Now, we've got all the pieces in place. By the way, please make a video on vector database. Do we really need a cloud service, or can we find more efficient ways to run it on the server at scale?
@potter207 3 months ago
bunnys can fly
@MrMitdac01 4 months ago
Can you make an example of how Ollama can host an LLM on a local LAN network so others can use the LLM, please?
@andrebremer7772 4 months ago
I am not sure if that feature is that big of a deal honestly. I recently set up Llama-Index using HF embeddings on top of Ollama. Very straightforward. Just a handful of lines of code and given all the available integrations, you have document loading and indexing handled for you.
@technovangelist 4 months ago
Why require someone to use something extra if it is now built in?
@ClaudioBottari 4 months ago
A video about how to navigate all the possibilities we've got in the vector DB field would be very useful.
@vpd825 4 months ago
Like @Slimpickens45 says, please do a video on Vector DBs, but from the perspective of an Ollama user 🙏🏼
@theh1ve 4 months ago
So are these embeddings 'better' than some of the huggingface embeddings? Having said that, the more important question is what is in that flask - I think that's what we all want to know! 😊
@BR-lx7py 4 months ago
It's nice that these embeddings are generated much faster, but have you run any tests to see if they're any good?
@markbarton 3 months ago
So once we have the embeddings saved as vectors - in my case considering using Weaviate - do we have to use the same model in Ollama for the inference?
@technovangelist 3 months ago
No. Embeddings are just for finding similar text. Then you provide the source text to the model, not the embedding.
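That retrieval-then-prompt step, with hypothetical helper names and mock vectors (a dot product stands in for cosine similarity on normalized embeddings):

```typescript
type Chunk = { text: string; embedding: number[] };

// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

// Find the chunk nearest the query embedding, then put its *text* -- never
// the vector itself -- into the prompt handed to the chat model.
function buildPrompt(queryEmbedding: number[], question: string, chunks: Chunk[]): string {
  const best = chunks.reduce((a, b) =>
    dot(queryEmbedding, a.embedding) >= dot(queryEmbedding, b.embedding) ? a : b);
  return `Answer using only this context:\n${best.text}\n\nQuestion: ${question}`;
}
```

The query must be embedded with the same model as the stored chunks so the vectors live in the same space.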
@markbarton 3 months ago
@@technovangelist Ah - makes sense - so Weaviate will return the results, which in turn are passed to the model. Weaviate requires the query to be encoded using the same embedding model, which I assume all vector DBs would. A video on vector DBs would be very useful - especially trying to set up a local instance; after all, Ollama is very much geared around local LLMs, and a lot of vector DBs seem to be cloud-hosted only. In a way, what is more interesting is the best methods/prompts for feeding example search results to local LLMs - to demonstrate why it's a more powerful approach.
@DaveBriggs 4 months ago
Would you have to use an overlap when chunking?
@technovangelist 4 months ago
absolutely.
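A sliding-window splitter is one common way to get that overlap (the sizes here are illustrative, not a recommendation):

```typescript
// Cut `text` into windows of `size` characters; each window starts
// `size - overlap` characters after the previous one, so neighbouring
// chunks share `overlap` characters and nothing is lost at a boundary.
function chunkWithOverlap(text: string, size = 1000, overlap = 100): string[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const step = size - overlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk.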
@kabaduck 4 months ago
Super impressive that you're updating your previous videos with corrected content. I would love to see your workflow on this as a video - maybe you already did this?
@technovangelist 4 months ago
There isn't really a process. I mark the old one as having a correction and post a new one. Luckily nothing I have said has been wrong yet. A few people have said something was wrong, but no one has been able to point to any code or examples that prove their opinions.
@jeanchindeko5477 4 months ago
4:42 OK, I'll not say bunnies can fly or should fly! But Bun is definitely not an alternative to JavaScript; instead it's an alternative to Node.js, and the code you're showing is written in TypeScript, which is a superset of JavaScript that Bun natively supports. Other than that, thanks for this great informative and entertaining video.
@technovangelist 4 months ago
Omg, I flub one line in my script and it gets pointed out immediately. It used to be that hardly anyone saw these.
@technovangelist 4 months ago
But thanks for noticing. And watching. And being here.
@ischmitty 4 months ago
Your TypeScript embedding sample wasn't written to fire off the embedding calls in parallel. I'm not sure if that would make a huge difference locally, depending on Ollama's utilization of system resources. It certainly makes a massive difference when using an API like OpenAI's embedding model, where you can process each chunk in parallel.
@technovangelist 4 months ago
But Ollama runs on your local hardware and is meant for a single user, rather than having the $750k-per-day compute costs. Plus there are all the security and privacy risks with that.
@ischmitty 4 months ago
@@technovangelist Wasn't meaning to compare local vs OpenAI et al. - I agree with you on that. I was referring to writing asynchronous code to run the requests in parallel.
@technovangelist 3 months ago
But Ollama won't process things in parallel. Allowing for that would mean every request would be slower. If a process takes 75% of the system, running 2 or 3 of them with finite resources means everything runs slower.
@user-wr4yl7tx3w 3 months ago
how about looking at crewai and ollama together?
@fkxfkx 4 months ago
Maybe you could share with us the update procedure if we're running ollama webui for Windows out of local Docker - the best way to update it without screwing it up?
@technovangelist 4 months ago
Usually with Docker it's just a matter of pulling the container again. Why did you choose to use Docker on Windows?
@fkxfkx 4 months ago
@@technovangelist OK, that's not updating, but it will work 👍 I do so much with Windows, and so do my clients. It makes sense to keep Docker on Windows in the loop. And so much online is about Mac; this is an outlier.
@technovangelist 4 months ago
That’s the standard way to update docker containers. they are supposed to be immutable.
@fkxfkx 4 months ago
@vangelist Don't mean to be argumentative, but while images are immutable (below from Microsoft Copilot): Docker containers are dynamic and mutable instances created from images; containers have a writable layer where runtime changes can be temporarily stored; containers can hold runtime data, but their core image remains unchanged. I assume a new upgrade image used to rebuild the container would have accommodations to preserve existing downloads of models, etc., but I could be wrong. Demolishing all previous work just to install an upgrade would be unfortunate. The folks on their discord are being a little hazy about this, and it would be helpful to get a deterministic and clear statement of the situation. I'm just looking for a clear docker command to upgrade without losing my model downloads. 🤷‍♂
@henkhbit5748 4 months ago
I am not familiar with Ollama yet. I have been waiting for the Windows version... Does it only support specific embeddings? I use, for example, BGE embeddings for RAG. Is this possible? I also see in the comments that Ollama does not support multi-user inference concurrently. If true, then it's OK for testing but not for production. Btw: I prefer two-legged bunnies to flying bunnies 😉
@technovangelist 4 months ago
Ollama for now is focused on being the premier production-ready single-user AI application. There are plenty of folks who have shown how to achieve concurrent use of multiple models, but of course to enable max output that would have to involve multiple systems. Ollama can't magically produce cycles out of thin air. Or are you just asking for queueing? That's been there since day one.
@henkhbit5748 4 months ago
@@technovangelist If I have a chat bot application based on Ollama, is it possible for multiple users to access the application without waiting or getting into a deadlock?
@technovangelist 4 months ago
I guess it depends on how you build it.
@carterjames199 4 months ago
Please do a vector db comparison video
@technovangelist 4 months ago
It's being worked on now. Thanks.
@pablocosta7181 4 months ago
Hi Matt. You are really impressive. Could you share the source code of the video example with me? I'd be very happy.
@userou-ig1ze 4 months ago
Can't you read the file fully into RAM before processing? It sounds unbelievable that read/write speed is the limiting factor.
@technovangelist 4 months ago
I don’t think I understand the question. Can you clarify?
@userou-ig1ze 4 months ago
Mea culpa, I inferred at 5:50 that loading/processing the file would take most of the processing time, but I guess I was mistaken. Thanks for the reply, though, and for the continuous commitment and interaction with the userbase. Respect and thumbs up.
@seannewell397 4 months ago
Embedding is serialization to a common format for easier and faster comparisons. Sourcegraph uses it for Cody. Lmk what I'm missing, because that seems too simplistic.
@technovangelist 4 months ago
It's more than serialization - it's understanding the meaning of the text. You aren't comparing words but comparing the semantic meaning of those words.
@717Fang 3 months ago
I noticed the best AI for coding is Cody AI. I tried ChatGPT to convert assembly MASM code to port it to NASM; ChatGPT could not do it in 20 tries, Cody AI did it on the first try. I'm not sure how good Ollama is.
@technovangelist 3 months ago
If you are OK with online services then Cody is great. Codeium and Copilot are also amazing. The benefit of Ollama and all the tools that use it is security and privacy. Twinny is a great tool there, as is Continue. But you have to pick the right model. Some are better at certain languages than others, though finding one that supports more esoteric languages like what you are dealing with is tougher. And a model that does that well might not be as good with mainstream languages. There is no one 'best' solution.