
Replace Github Copilot with a Local LLM 

Matthew Grdinic
3.3K subscribers
124K views

If you're a coder, you may have heard of or already be using GitHub Copilot. Recent advances have made running your own LLM for code completions and chat not only possible, but in many ways superior to paid services. In this video you'll see how easy it is to set up, and what it's like to use!
Please note: while I'm incredibly lucky to have a higher-end MacBook and a 4090, you do not need such high-end hardware to use local LLMs. Everything shown in this video is free, so you've got nothing to lose trying it out yourself!
LM Studio - lmstudio.ai/
Continue - continue.dev/docs/intro

Category: Hobbies

Published: 27 Jan 2024

Comments: 198
@toofaeded5260 · 3 months ago
Might want to clarify to potential buyers of new computers that there is a difference between RAM and VRAM. You need lots of "VRAM" on your graphics card if you want to use "GPU Offload" in the software, which makes it run significantly faster than using your CPU and system RAM to do the same task. Great video though.
@chevon5707 · 3 months ago
On Macs the RAM is shared.
@MeisterAlucard · 3 months ago
Is it possible to do a partial GPU offload, or is it everything or nothing?
@matthewgrdinic1238 · 3 months ago
Yup. In LM Studio on PC you set the number of layers to offload; the higher the number, the more GPU resources are used. On ARM Macs, yes, RAM is shared, and your option is to enable Metal or not.
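If you want to see what layer-based offloading looks like outside the LM Studio UI, here's a minimal sketch using llama-cpp-python (LM Studio uses llama.cpp under the hood); the model file and layer count are placeholders you'd adjust for your own hardware.

# Hypothetical sketch: partial GPU offload with llama-cpp-python.
# n_gpu_layers controls how many transformer layers go to VRAM;
# -1 offloads everything, 0 keeps the whole model on the CPU.
from llama_cpp import Llama

llm = Llama(
    model_path="codellama-7b-instruct.Q4_K_M.gguf",  # placeholder GGUF file
    n_gpu_layers=20,  # raise until VRAM runs out, lower if it does
    n_ctx=2048,       # context window; larger values use more memory
)

out = llm("// a JavaScript function that debounces another function\n", max_tokens=128)
print(out["choices"][0]["text"])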
@JacobSnover · 3 months ago
@matthewgrdinic1238 I always enable Metal 🤘
@alexdubois6585 · 3 months ago
And also for that reason, you pay for Copilot (and give your data) or you pay for VRAM... Not really free. Return on investment might be a bit faster on a discrete GPU... that you can upgrade...
@phobosmoon4643 · 3 months ago
3:15 dude, you illustrated the 'break tasks into smaller steps and then build up' thing PERFECTLY. Well done! It's surprisingly hard to do this. I think about how to do it programmatically a lot.
@RShakes · 3 months ago
Your channel is going to blow up and you deserve it! Fantastic info, concise and even gave me hints to things I may not have known about like the LM Studio UI hint about Full GPU Offload. Also interesting take on paying for cloud spellcheck, I'd agree with you!
@programming8339 · 3 months ago
A lot of great knowledge compressed in this 5 min video. Thank you!
@RobertLugg · 3 months ago
Your last question was amazing. Never thought about it that way.
@Aegilops · 3 months ago
Hey Matthew. First video of yours that YouTube recommended, and I liked and subbed. I tried Ollama with a downloaded model and it ran only on the CPU so was staggeringly slow, but I'm very tempted to try this out (lucky enough to have a 4090). I'm also using AWS Code Whisperer as the price is right, so am thinking your suggestion of local LLM + Code Whisperer might be the cheap way to go. Great pacing of video, great production quality, you have a likeable personality, factual, and didn't waste the viewer's time. Good job. Deserves more subs.
@MahendraSingh-ko8le · 3 months ago
Only 538 subscribers? Such good content. Thank you.
@levvayner4509 · 3 months ago
Excellent work. I was planning to write my own vs code extension but you just saved me a great deal of time. Thank you!
@mikee2765 · 2 months ago
Clear, concise explanation of the pros/cons of using local LLM for code assist
@Gabriel-iq6ug · 3 months ago
So much knowledge compressed in only 5 minutes. Great job! I will give it a try to see if it would be possible to make it faster on Apple silicon laptops using MLX
@Franiveliuselmago · 3 months ago
This is great. Didn't know I could use LM Studio like this. Also FYI there's a free alternative to Copilot called Codeium
@therobpratt · 3 months ago
Thank you for covering these topics - very informative!
@matthewgrdinic1238 · 3 months ago
Glad it was helpful! I believe these tools will become commonplace in the next few years; it's fun to be here where it all starts.
@BauldyBoys · 3 months ago
I hate inline completion. Within a couple of days of using Copilot I noticed the way I was coding changed. Instead of typing through something, I would type a couple of letters and see if the AI would read my mind correctly. Sometimes it would, don't get me wrong, but it was overall a bad habit I didn't want to encourage. This tool seems perfect for me as long as I'm working on my desktop.
@andrii_suprunenko · 3 months ago
I agree
@RickGladwin · 3 months ago
Same. I trialed GitHub Copilot for a while but ended up ditching it. I found I was spending time debugging generated code rather than actually understanding problems and solutions. And debugging someone else’s code, for me, is NOT my favourite part of software engineering! 😂
@cnmoro55 · 3 months ago
When you really understand how copilot works, and how to actually BEGIN writing and structuring the code, in order to trigger the right completion, then you start speeding things up. At first, I was just like you, but then I got the hang of it, and man, copilot is awesome.
@Fanaz10 · 2 months ago
@cnmoro55 Yeaaah, it's amazing when it starts "getting" you.
@bribri546 · 3 months ago
Great video Matt! Silly I came across this video. I have been playing around with integrating LLMs with Neovim; some helpful content here! Hope all is well!
@matthewgrdinic1238 · 3 months ago
Had not heard of Neovim before, looks like an outstanding project!
@madeniran · 3 months ago
Thanks for sharing, I think this is very important when it comes to Data Security.
@square_and_compass · 3 months ago
Keep up the momentum and you will be arguably among the well-organized content creators, I really liked your explanation and demonstration process
@CorrosiveCitrus · 3 months ago
"Would you pay a cloud service for spell check?" Well said.
@howardjones543 · 3 months ago
Would you pay $1900 for a spell check co-processor? I know you can do plenty of other things with it, but that's still the price of a 24G RTX 4090. That's a lot of cloud credit.
@CorrosiveCitrus · 3 months ago
@howardjones543 Yeah, the hardware needs are expensive atm, but hopefully (and I would think most definitely) that will start to become less of an issue. I think for the people that already have the hardware today though, the choice is very obvious.
@skejeton · 3 months ago
Well said, but I wouldn't pay $2000 on a GPU to use LLM only on that computer
@howardjones543 · 3 months ago
@skejeton Sure, but you can play Starfield and encode video for significantly less. THIS is the application that requires all that VRAM - typical game requirements, even for heavy games, don't get near 24GB. The implication at the end of the video is that you would save $10/month with this free local LLM, but that's bending the truth a bit. If things like these new NPU-equipped processors and different models can remove the need for these gigantic GPUs, then it might be interesting.
@paulojose7568 · 3 months ago
@howardjones543 The benefit of freedom though (and privacy?)
@Gunzy83 · 3 months ago
Great video. Just earned my subscription. I'm a heavy Copilot user and have a machine with a great GPU (a little bit short of the 4090's VRAM though), so I'll be keen to see how your testing of completions goes (will have time to play myself when I'm back from a long-awaited vacation).
@matthewgrdinic1238 · 3 months ago
Thank you so much for the sub, it's very much appreciated! It's late so I can't recall if I mentioned it in this video, but somewhat shockingly the MacBook is actually faster than the 4090 with LLMs. Granted this isn't with the TensorRT-LLM framework (which should give a 2x bump), but for me it shows how lots of RAM and dedicated, modern hardware make inference not only possible, but surprisingly easy. This bodes well for the future of local AI, and I'm super excited to see the PC space evolve.
@TheStickofWar · 3 months ago
Been using gen.nvim and Ollama for a while on a MacBook M1 chip. Will try this approach
@mammothcode · 3 months ago
Hey this is excellent! This is exactly what i was looking for recently
@mammothcode · 3 months ago
Is there any way perhaps we can configure that vscode extension to point to a hosted runtime of the same llms? There are a couple of hosted llm providers that seem to be serving llms of our choice for very cheap prices
@matthewgrdinic1238 · 3 months ago
Not mentioned in the video, but Continue defaults to hosted LLMs. This video was to show how the local side works, but it's not required.
@paulywalnutz5855 · 3 months ago
Great content! Straight to it, and I learnt something.
@rosszeiger · 3 months ago
Great video and tutorial! Very well explained.
@masnwilliams · 3 months ago
Really love this format of video. Would love to get your thoughts on Pieces for Developers. We are taking a local-first approach when it comes to developer workflows, allowing users to easily download local LLMS like Llama2 and Mistral to use in our Pieces Copilot. We do also support the latest cloud llms as well.
@Mr_Magnetar_ · 3 months ago
Good video. It would be cool to implement a project/model (I don't quite understand this) that would know about the entire codebase of the project. For now, autocomplete performs the function of searching and copying a solution from StackOverflow. I think if an LLM knew all the code and understood what it does, we could get significantly better code.
@niksingh710 · 3 months ago
you are underrated. like it
@jf3518 · 3 months ago
Worked like a charm.
@juj1988 · 3 months ago
Thanks Matt!!!
@apolloangevin9743 · 3 months ago
Question: would you get much benefit by using multiple mid-range GPUs for the extra VRAM, for instance, I have a few 3060s I could use for a dedicated machine if I wanted to go down that path.
@DRAM-68 · 3 months ago
Great content. Your videos have gotten me interested in local AI processing. Next computer…M3 Ultra with max RAM.
@steveoc64 · 3 months ago
Ha, same here. I've been comfortable with 16/32 GB for my Macs. But now suddenly I can justify a 128GB monster.
@secretythandle · 1 month ago
To me the cost of Copilot is such a small factor; if you consider the cost of all the hardware required to run a half-decent model, and the ongoing electricity bills, you're paying FAAAR more for a local setup. But the real beauty of local LLMs is the privacy: being able to put whatever you want in there and not send it to Microsoft to use for later fine-tuning is a huge win and makes this investment worth it. Not to mention, there is no fear of that service one day just being taken away from you, or censored to the point that it's completely useless. Now if only GPUs weren't so damn expensive... thanks big tech.
@d3mist0clesgee12 · 3 months ago
Great stuff, thanks.
@AdamDjellouli · 3 months ago
Really interesting. Thank you.
@aGj2fiebP3ekso7wQpnd1Lhd · 3 months ago
I use a good portion of my available computer resources just developing. At $10/mo, it's cheaper than the additional hardware to self-host currently, plus I don't have to manage, configure, or maintain anything. Copilot gets smarter automatically with time. For example, it's made huge strides on PHP lately.
@ariqahmer · 3 months ago
I was wondering... How about having a dedicated server-like PC at home to run these models and have it connected to the network so it's available to most of the devices on the network?
@JasonWho · 3 months ago
Yessss, would love this. Lightweight app for devices, access to AI in VSCode or image gen locally without being on the physical machine.
@aGj2fiebP3ekso7wQpnd1Lhd · 3 months ago
That's what I would do and have a TPM or two in it. Used ones come down in price pretty quickly.
@connoisseurofcookies2047 · 3 months ago
You could implement a REST API for a home server. If it only runs on the local network and doesn't talk to the internet at all, maybe on an isolated VLAN, there shouldn't be any security issues to worry about.
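As a concrete sketch of that idea: Ollama already exposes a REST API on port 11434, so a home server mostly just needs to bind it to the LAN (e.g. by starting it with OLLAMA_HOST=0.0.0.0). The IP address and model name below are placeholders.

# Hypothetical client sketch: call an Ollama server running on another
# machine on the local network (server started with OLLAMA_HOST=0.0.0.0).
import requests

resp = requests.post(
    "http://192.168.1.50:11434/api/generate",  # placeholder LAN address
    json={
        "model": "codellama:7b",               # any model pulled on the server
        "prompt": "Write a Python function that reverses a string.",
        "stream": False,                       # one JSON object instead of a stream
    },
    timeout=120,
)
print(resp.json()["response"])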
@petercooper4536 · 2 months ago
Totally possible. I have Win11, Ollama running Dolphin Mixtral, ollama-webui running in Docker locally (looks just like ChatGPT), and a tunnel set up through Cloudflare to a subdomain in my own DNS. It's available externally, securely. All because work blocks OpenAI 😄 but I can access my own local LLM anywhere. The API could be exposed in the same way.
@KucheKlizma · 3 months ago
I learned a lot of things I didn't know before, I thought that hosting local llms is much more HW restrictive. Might give it a spin.
@clpr635 · 3 months ago
nice I liked that you kept it short
@AlbertCloete · 3 months ago
Interesting that you need such a powerful graphics card for it. I always thought you only need that power for training the model, not if you only want to use the trained model.
@matthewgrdinic1238 · 3 months ago
A great point - I've got a related comment on the same theme I need to reply to, but in short: I *happen* to have this beefy laptop, but it's not at all required for local inference! The biggest constraint right now is memory, but there are plenty of (much) smaller models that fit just fine on more modest hardware. Long-term, AI's here to stay, and AMD and Intel are already adding AI-specific hardware to their chips. Yes, dedicated hardware will be faster, but in 2 years local inference on CPU will be more than fast enough.
@gusdiaz · 3 months ago
It seems Tab autocomplete is now available in pre-release (experimental), would you be able to setup a tutorial as a follow up for this if possible? Thank you so much!
@coreyward · 3 months ago
I tried Dolphin Mixtral out on my M3 Max and it wasn't all that great at a ReactJS code exercise that I’ve used in interviews. It came back with code that didn't meet basically any of the requirements, so I nudged it the same I would have done for a candidate, but really couldn't produce anything better before it said it wasn't able to fix them with the context it had (incorrect). I tried the same prompt with OpenAI’s GPT-4 via ChatGPT and it did better in the initial shot but made some mistakes, which I again prompted like I would have done a candidate and it nailed the exercise on its 3rd response. It took devs with ~3-5 years of total experience (at least 1yr with React) around 25-45 minutes to complete this, so GPT-4 nailing it in about 2 minutes is pretty good.
@jumanjimusic4094 · 2 months ago
Mixtral, not Mistral.
@kenneth_romero · 3 months ago
Pretty cool. I wonder if you could get a model that's specific to one language, to make them small enough to run on most commodity hardware. Don't wanna shell out for a 4090 when Codeium is Copilot-like and also free.
@matthewgrdinic1238 · 3 months ago
An excellent idea, and the answer is yes! Now, I haven't heard of or tried language-specific models myself, but one project I'm super keen to see released is Nvidia's "Chat With RTX". The basic idea is training a model on data *you* provide - for example, a book on your favorite programming language. www.nvidia.com/en-us/ai-on-rtx/chat-with-rtx-generative-ai/
@spaceshipdev · 3 months ago
Codeium all the way for me, outperforms CoPilot AND it's free.
@NeverCodeAlone · 2 months ago
Very good. Thx a lot.
@addictedyounoob3164 · 3 months ago
I also strongly believe I've noticed that GPTs have been dumbed down due to censoring and other mechanisms we don't exactly understand. If OpenAI doesn't account for this, the open-source community might surpass them with this new type of mixture-of-experts architecture.
@matthewgrdinic1238 · 3 months ago
Well said.
@dlsniper · 3 months ago
Would this be possible to run on an AMD 7900XTX? It has 24GB of VRAM, but I'm not sure if CUDA is a must or not for these tasks?
@Phasma6969 · 3 months ago
Yes, there are alternatives that let you use a local inference server. Llama.cpp has one built in too, but I'd recommend another alternative. Some use Ollama as the inference endpoint. You could even use others like Fireworks or a custom endpoint.
@electroheadfx · 3 months ago
Hey, thanks for the video. With how much RAM did you buy your M3 Max?
@matthewgrdinic1238 · 3 months ago
Apologies for not listing this more clearly - 36GB.
@eotikurac · 2 months ago
i bought copilot about a year ago but was unable to find clients and i never really used it :( this is an interesting alternative
@juicygirls3989 · 3 months ago
Using Dolphin Mixtral with Ollama and the VS Code extension on a 4090 and it's working great. Must say I never tried Copilot or its alternatives. It also helps for boilerplate code; for more serious tasks it sucks, as expected.
@helloworld7796 · 3 months ago
Hey, I see in the comments people understand LLMs really well. I've never played with them; what would you recommend as a start? I am a software developer, so understanding it won't be an issue.
@3dus · 3 months ago
Dammm... you missed codellama 70b by one day. Nice video!
@merlinwarage · 3 months ago
Yeah. Run locally a 70b model as copilot xD
@cureadvocate1 · 3 months ago
@merlinwarage Running the 33b deepseek-coder model locally was slow enough. (The 6.7b model is REALLY good.)
@camsand6109 · 3 months ago
@merlinwarage If you have an Apple silicon Mac, totally possible.
@TheStickofWar · 3 months ago
@camsand6109 It's possible, but just not practical. Do you like waiting?
@matthewgrdinic1238 · 3 months ago
I know, right, lol. Downloading the smallest 2-bit version now.... Well, it runs (36 gig M3, ~16 tokens/second), but... huh. The results are total nonsense. I realize this is the 2-bit version, but still. I guess even though the model loads, it still needs loads of extra RAM, or the low quantization is more punishing than I've seen in other models. The good news: you can try it here: catalog.ngc.nvidia.com/orgs/nvidia/teams/ai-foundation/models/codellama-70b I will say: from a few tests I ran, I still prefer Dolphin Mixtral - didn't expect that, but it's a *crazy* good model.
@CaribSurfKing1 · 3 months ago
How many tokens can be used in the repo context window? Can it do /explain, i.e. understand a repo and app you have never seen before?
@praveensanap · 3 months ago
Can you make a video demoing how to fine-tune the model with private code repositories?
@Andy_B. · 3 months ago
Hi, one question: that local LLM code advisor you run, does it provide contextual coding help? Meaning, does it index/look over a bunch of files and give recommendations, or does it only work in one single (C++, Python...) file? Thanks
@ilearncode7365 · 3 months ago
This is an important question. With copilot, it is aware of stuff that I have written in other files for the same project. Dont know if it is just doing so by storing keystrokes, or if it knows to look at the entire project
@Andy_B. · 3 months ago
@ilearncode7365 Yes, in VS Code Copilot you just need to open all the files you want to have indexed by Copilot. Closed files won't be indexed.
@hmdz150 · 3 months ago
Ollama + Codellama
@UnrealFocus-le7ox · 3 months ago
great video subbed
@uncleJuancho · 2 months ago
this is the first time I watched your video and I can’t stop noticing that your mic is behind you
@D9ID9I · 3 months ago
So if you buy a CPU like the R7 8700G with 64GB RAM and dedicate the integrated GPU to model processing, it will have enough RAM to run complex models. And you can use an external GPU for the usual display tasks.
@coalacorey · 3 months ago
Interesting!
@RegalWK · 2 months ago
What about privacy? I mean about copyright? Can we use it with company/client code?
@subzerosumgame · 1 month ago
I have a MacBook Pro M3 11/14 with 36GB; would that make it work?
@SuperCombatarms · 2 months ago
Is there anything I can do with 12gb VRAM that is worth trying?
@DS-pk4eh · 3 months ago
If we forget about being online/offline, the local solution requires an investment in high-end hardware. A 4090 is around 1500 USD, and an M3 Max with 36GB could cost much more than an M3 Pro with less RAM. So the monthly fee for Copilot will cost you about $100 for a year, which means it will start to cost as much as a RAM/GPU upgrade on a MacBook after about 6-7 years, or it will cost you the same only after 15 years! You would imagine that Copilot performance will gradually improve with time (on the hardware level), so you will have better underlying hardware after 3 or 4 years, while you would be stuck with the same hardware if you purchased your own. However, if you already have this hardware, then it is a much easier decision. Do not forget, local AI will take some resources on your computer, so you will have less computer for other things.
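For anyone weighing this up, the break-even arithmetic is easy to sketch; the prices below are illustrative placeholders, not quotes.

# Rough break-even sketch: months of a subscription that equal a one-off GPU purchase.
def breakeven_months(hardware_cost: float, monthly_fee: float) -> float:
    return hardware_cost / monthly_fee

# Illustrative numbers only (ignores electricity and resale value).
print(breakeven_months(1600, 10))  # ~160 months (~13 years) vs. a $10/mo plan
print(breakeven_months(1600, 19))  # ~84 months (~7 years) vs. a $19/mo business plan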
@a.yashwanth · 3 months ago
There is a chance that the Copilot subscription cost will increase from the current $10 to maybe $15 or $20 or more. Each user apparently costs Microsoft $20 with the limited features it has now.
@berndeckenfels · 3 months ago
Does it make sense to share a GPU server remotely used by multiple developers - same model instance or dedicated?
@merlinwarage · 3 months ago
You can do that, but calculate the price of the server (for 4-5 developers you will need an 4090 at least, A100 for 10+ devs) + electricity cost vs the copilot's $10-19/month per user plan. Even with the $19 business plan you can use copilot for ~2 years with a 5 person team for the price of a server. ~4 years with the personal subscription.
@berndeckenfels · 3 months ago
@merlinwarage But I have to expose my code to Microsoft and can't train on my own codebases. But yes, I would expect more like 200 devs on a single A100, so it pays off. Most use it quite sparingly.
@morsmagne · 3 months ago
I'd have thought that the gold standard would be GPT-4 128k using Playground Assistants mode with Code Interpreter enabled?
@valdisgerasymiak1403 · 3 months ago
I am trying to find out how I can run local LLMs to replace Copilot completion - very useful to just write the small part of code where I begin. Any thoughts?
@matthewgrdinic1238 · 3 months ago
You bet:
1. Download LM Studio and grab a code-centric model that fits on your hardware (Mixtral Instruct or Code Llama are great places to start).
2. In VS Code, install the Continue extension.
3. Click Continue's extension tab on the left side of VS Code's interface, then click the little gear icon (at the bottom of the screen).
4. Add this within the JSON's models block:
{ "title": "Mixtral", "provider": "lmstudio", "model": "CodeLlama7b" }
You're ready to go, though please let me know if you run into any issues!
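One extra step that can save head-scratching: before pointing Continue at LM Studio, confirm the local server actually answers. A minimal sketch with the OpenAI Python client, assuming LM Studio's server is on its default port 1234 (adjust if yours differs); LM Studio generally replies with whichever model is currently loaded, so the model name here is just a placeholder.

# Hypothetical sanity check: talk to LM Studio's OpenAI-compatible local server
# before configuring the Continue extension to use it.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # key is unused locally

resp = client.chat.completions.create(
    model="local-model",  # placeholder; the loaded model answers
    messages=[{"role": "user", "content": "Write a one-line docstring for a bubble sort."}],
)
print(resp.choices[0].message.content)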
@KryptLynx · 3 months ago
I will argue, it is faster to write the code than to write code description for AI
@Bryan-zo6ng · 3 months ago
Can 3000 series run LLMs?
@a.yashwanth · 3 months ago
How much cpu/gpu usage does each code completion command take and how long?
@matthewgrdinic1238 · 3 months ago
A great question: on PC we can separate and scale between the two; on Mac it's all or nothing. At least on the Mac, then, a response from OpenOrca that generated at 28 tokens/sec pinned the GPU for the duration of that specific 3.2-second response time. I'd imagine that's the case for all responses.
@JohnWilliams-gy5yc · 3 months ago
Between M3 Ultra vs 4090 24GB, who wins on this LLM arena? How about the AI accelerator Intel Habana Gaudi3?
@D9ID9I · 3 months ago
Jetson AGX Orin 64GB wins, lol
@bowenchen4908 · 1 month ago
what is your machine's spec? I have a M2pro but I can't even run 3bit quantization :/
@elvemoon · 3 months ago
Can i run LM Studio on my desktop and connect to the server with my laptop using this plugin?
@matthewgrdinic1238 · 3 months ago
Unfortunately I haven't tried this yet, but I'd imagine that, just as you can point locally networked computers to, say, a web server, this would be no different.
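If someone does try it, the change is probably just configuration: if I remember Continue's options correctly (treat this as an assumption), a model entry can take an apiBase URL, so the laptop would point at the desktop's LAN address where LM Studio's server is listening. A sketch with a placeholder IP:

{
  "models": [
    {
      "title": "Desktop LM Studio",
      "provider": "lmstudio",
      "model": "CodeLlama7b",
      "apiBase": "http://192.168.1.50:1234/v1"
    }
  ]
}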
@CharlesQueiroz25 · 2 months ago
can I do it in IntelliJ IDE?
@MadHolms · 3 months ago
But Copilot does know the context of your project and applies suggestions based on that. Do the local model and Continue do this?
@airplot3767 · 2 months ago
With Continue, you need to manually choose which files to upload with each request. I guess copilot does this automatically
@-E42- · 3 months ago
Since I don't want all of my code to be transparent to Microsoft, GitHub Copilot is out of the question for me and a local LLM sounds like a great idea. But how do you feel about the user agreement of VS Code? It seems Microsoft gets to see and evaluate all of your code anyway. Data politics are, to me, a largely underconsidered aspect of AI tech.
@matthewgrdinic1238 · 3 months ago
A great point, and one I haven't considered deeply yet. At first blush then so long as the code's part of a virtuous feedback loop I'm ok with it. As well, Microsoft has a vested interest in the quality of AI beyond just siphoning data, so again, for now....I'm ok with it.
@aidencoder · 3 months ago
Hmm, while not spellcheck... people _do_ pay a cloud service for AI help with grammar and structure. It's closer to that, I think.
@rasalas91 · 3 months ago
5:32 - damn.
@jonathanozik5442 · 3 months ago
I'm so jealous of you having enough hardware to run Mixtral locally.
@burajirusan4146 · 3 months ago
Is a top-notch video card a requirement? Can high PC RAM run these offline AIs?
@merlinwarage · 3 months ago
For 7B models you can use any GPU with at least 8-10GB VRAM. For 13B-30B models, 10-16GB VRAM. For 40-70B models you'll need 16-24GB VRAM. The system RAM doesn't really matter in this case.
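A rough rule of thumb behind numbers like these: the weights take roughly parameter count × bits per weight / 8 bytes, plus headroom for the KV cache and runtime overhead; the biggest models only squeeze into 24GB with more aggressive quantization. The constants below are ballpark assumptions, not exact requirements.

# Ballpark VRAM estimate for a quantized model: weights plus ~25% headroom
# for KV cache, activations, and runtime overhead.
def est_vram_gb(params_billion: float, bits_per_weight: int, overhead: float = 1.25) -> float:
    weight_gb = params_billion * bits_per_weight / 8  # e.g. 7B at 4-bit -> ~3.5 GB of weights
    return weight_gb * overhead

for size in (7, 13, 34, 70):
    print(f"{size}B @ 4-bit: ~{est_vram_gb(size, 4):.1f} GB")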
@D9ID9I · 3 months ago
@merlinwarage iGPU shares system RAM.
@2xKTfc · 3 months ago
@merlinwarage For the $1500 a 4090 costs you can buy a LOT of DDR5 memory and make that available to the GPU - Windows does it automagically. It's nowhere near as fast as GDDR6 (like, not even close), but you can get 64GB of spare memory quite easily. I'd be curious about how usable (or unusable) that would be.
@gh0stcloud499 · 3 months ago
pretty interesting but I can't justify giving up that many resources just for code completions. Especially on a Mac where you will already need to dedicate a significant chunk if you are using Docker or some other virtualisation software. I guess on a windows/linux machine with a dedicated GPU this won't matter as much, unless you are a game developer.
@vmx200 · 3 months ago
Would a 3090 and 64GB of RAM be good enough?
@kotekutalia · 2 months ago
Why don't you just install and try?
@vmx200 · 2 months ago
@kotekutalia I will when I have time; I was just curious if someone out there knew.
@marcopfeiffer3032 · 3 months ago
It’s not exactly free if you need a potent gpu with lots of ram. My m1 with 16GB is already struggling with my Docker containers. I’d have to calculate how long I can use copilot until it is more expensive than a Mac ram upgrade.
@hope42 · 3 months ago
I have a 1660 with 6GB VRAM, an i7 10th gen, and 32GB RAM. Is there anything I can run?
@matthewgrdinic1238 · 3 months ago
VRAM is the key and yes, that should be more than enough to run Llama 7b.
@freddieventura4382 · 3 months ago
just quickly copy paste from Chat gpt?
@nasarjafri4299 · 3 months ago
Yeah, but doesn't a local LLM need at least 64GB of RAM? How am I supposed to get that as a college student? P.S. Correct me if I'm wrong.
@michaelcoppola1675 · 3 months ago
extra note: local works on airplane mode - useful for travel
@LeandroAndrus-fn4pt · 3 months ago
Can it do TDD like copilot?
@dwhall256 · 3 months ago
It is up to you, the driver, to make that happen.
@fandorm · 3 months ago
Well, it's free as long as you first pony up $2,379.99 for the GPU. It will take 230 months (almost 20 years) of copilot use but then it will be essentially free!
@matthewgrdinic1238 · 3 months ago
On the Windows side, LM Studio lets you run on CPU, "offload" to the GPU, or a combination of both. On CPU only, Llama 7B on a two-year-old Intel 10850H laptop pulled 5 tokens/second (tokens roughly equate to words). Humans generally read between 5 and 10 words per second, so even this "worst case" is nearly real-time. And that's just now - Intel and AMD are already starting to add dedicated AI hardware to their new chips. Faster helps but is not needed, and basically any future hardware is going to be far more performant. Copilot is much faster now, but that won't always be the case.
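To put those tokens-per-second figures in perspective, the wait for a typical completion is easy to estimate; the 150-token completion length below is an arbitrary example, and the speeds are the ones quoted in this thread.

# How long a 150-token completion takes at different generation speeds.
def seconds_for(tokens: int, tokens_per_sec: float) -> float:
    return tokens / tokens_per_sec

# 5 tok/s: CPU-only laptop; 16 tok/s: M3 on a 70B 2-bit quant; 28 tok/s: M3 on OpenOrca.
for speed in (5, 16, 28):
    print(f"{speed} tok/s -> {seconds_for(150, speed):.0f} s for a 150-token answer")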
@ilearncode7365 · 3 months ago
imagine paying microsoft to let you help them train their AI that they intend on replacing you with.
@tbird81 · 3 months ago
Yeah, but you'll be able to run GTA6 when it comes out on PC.
@fandorm · 3 months ago
@tbird81 🤣
@shoobidyboop8634 · 3 months ago
Is mac ram field-upgradeable, or does apple force people to buy their overpriced ram?
@angrygreek1985 · 2 months ago
"would you pay to use spellcheck?" Well, when the minimum hardware specification to run an LLM locally (well) is a $1600 USD GPU, then yeah, I would. For a hobbyist it would take years to make up the cost in paying for the cloud service, and by that time the GPU will be out of date.
@gwrydd · 3 months ago
Does it just support web development or is it like general purpose?
@steveoc64 · 3 months ago
Great video, thanks. I still find AI to be completely useless for any dev work outside of webdev. It has no idea what's going on when I code in Zig, for example :(
@matthewgrdinic1238 · 3 months ago
I've been dealing with insurance companies lately. When faced with composing yet another email to yet another repair shop, I finally gave in and had chat write one for me. Not only was the result more concise and clearly stated than what I was writing by hand, it was far faster. It's little things like that that are slowly winning me over and, more importantly, freeing up time to do more meaningful work.
@steveoc64 · 3 months ago
@matthewgrdinic1238 My commiserations that you need to deal with insurance companies at all... some of my first career projects were with insurance companies, so I feel your pain :) Yep, agreed, I am a big adopter of AI in my workflow too, and it's been invaluable for web work, particularly dealing with CSS! That's why I'm really keen on setting up a local model, so I can hopefully train it over time to be great at Zig development as well. The challenge here is that we are building a new language and std lib which evolve daily, so prior art is neither available nor helpful. Exciting times.
3 months ago
why tho
@kishirisu1268 · 3 months ago
Do you realize how much VRAM you need to run the smallest language model? I can say: 32GB! Go buy a consumer GPU with that much memory..
@2xKTfc · 3 months ago
The smallest language model takes like 1MB of space, so you're blatantly wrong. Note that you said "smallest" and not "best" or even "usable".
@MrBrax · 3 months ago
neat concept, but copilot for like 30 years is still cheaper than buying a 4090 haha
@YA-yr8tq · 3 months ago
As of now, and AFAIK, no tool tops aider-chat.
@dackerman123 · 3 months ago
But local is not free.
@YuriMomoiro · 3 months ago
I also recommend AMD cards, as they come with huge VRAM for much cheaper. Of course they are slower, but if you care more about quality than speed, they're worth considering.
@zizzyballuba4373 · 3 months ago
There's a perfect LLM in my brain, i'll use that
@tbird81 · 3 months ago
Hate to break it to you, it's not perfect.
@zizzyballuba4373 · 3 months ago
@tbird81 NOOOOOOOOOOOOOOOOOOOO ARGHHHHHHHHHH
@camsand6109 · 3 months ago
"Would you pay a cloud based service for spell check" grammarly not gonna like this take lol
@Hobbitstomper · 3 months ago
So, you are telling me if I don't want to spend $20/month on copilot, I should instead buy a $2000 graphics card or a $4000 MacBook.
@bcpr · 3 months ago
No…if you already have the tools, it’s a good alternative. If you don’t already have them, then pay for Copilot lol
@Hobbitstomper · 3 months ago
@bcpr Yeah, that kind of title and description would make more sense. The way he described it in the video: "superior to paid services [...] results are absolutely worth it [...] local LLM is free (as opposed to paying $20/month)", makes it sound like he's saying people should consider a $2000/$4000 upgrade over a $20/month plan.
@ScreentubeR · 2 months ago
I have 4090 which I bought for VR gaming. Now I know I can use it for local LLMs with good enough output for coding or other tasks, even better. Would I go and buy 4090 to be able to use local LLMs only, hell no.
@jeikovsegovia · 3 months ago
“sorry but i can only assist with programming related questions" 🤦
@sergeyagronov9650 · 3 months ago
could not reproduce the effects and i copied everything you wrote or said, can you please paste text here
@matthewgrdinic1238 · 3 months ago
You bet! The prompts, in order:
1. "Create a web page that contains a blue circle. The ball should be centered on the screen horizontally, and placed at the top of the page."
2. "Update this page so that the ball drops to the bottom of the page using a realistic gravity equation written in JavaScript. When the ball reaches the bottom of the screen, it stops."
3. "Update the logic so that when the ball reaches the bottom of the page it bounces up realistically like a rubber ball. On each bounce it loses some momentum until it stops completely."
Do keep in mind the only model I've found to be truly great at this specific task is Dolphin Mixtral Q3. That is, it basically one-shots everything on the first pass. It also helps, in LM Studio, to check the box in advanced settings that keeps the entire model in memory. Finally, and to be clear, the more of a conversation you have with the AI the better. That is, it's totally OK to have it clarify its reasoning, or tell it something doesn't work and ask it to fix it - it's very much a case of, well, chatting with it.
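For reference, the logic those three prompts ask the model to produce boils down to a small update loop: gravity accelerates the ball, it bounces with some energy loss, and it stops once the bounce is negligible. Here's a language-agnostic sketch of that loop in Python (in the video the model emits the HTML/JavaScript version):

# Minimal sketch of the bouncing-ball logic the prompts describe:
# gravity, a floor collision, and damping on each bounce until the ball rests.
GRAVITY = 9.8     # acceleration (any consistent units work)
DAMPING = 0.7     # fraction of speed kept after each bounce
DT = 1 / 60       # one animation frame
FLOOR = 0.0

y, velocity = 10.0, 0.0   # start 10 units above the floor, at rest
while True:
    velocity -= GRAVITY * DT            # gravity pulls the ball down
    y += velocity * DT
    if y <= FLOOR:                      # hit the bottom of the "page"
        y = FLOOR
        velocity = -velocity * DAMPING  # bounce back up with some momentum lost
        if abs(velocity) < 0.5:         # too slow to bounce again: stop
            break
print("ball at rest")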
@kesor6 · 3 months ago
So basically ... it will not work for most people and it is just better to use something like copilot. Got it.