Build Your Own GPT-4o Voice Assistant in Python with Groq, Llama3, OpenAI-TTS & Faster-Whisper ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-pi6gr_YHSuc.html
Try the updated tutorial with GPT 3.5 Turbo, OpenAI Whisper and an open sourced Bing AI API: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-aokn48vB0kc.html
It's actually easy you first generate a face of your choice on midjourney or dall-e and you import that picture in an app like facedance, ifunface, speakpick, etc..
I had some problems with the speak and talk part, so it ended up like a chatbot that works with "hotkeys"/command triggers for input to make specific things. Like, command trigger "music" opens a youtube playlist and things like that. I'm happy with the results :) edit: now It can "talk"... I generated some phrases on viocevox, downloaded the audio files and made it play along with the texts in the code at some key points
idk if i commented before but i really enjoy this. its simple to understand and easy to follow especially youre clean code and the use of comments makes it verry easy to code allong and to customize stuff as needed.
This is great, and I'd love to try it, but the text is so small and kind of blurred that it's a challenge to make out the code. Will you add it to the description or pinned comment? That'd be really helpful.
If you are looking for a topic for your next video, I would love to see you take this to a web interface using flask. I have been trying so many different ways from other videos but always end up in a dark rabbit how with a chatbot, unable to find something that works. It keeps recommending code that breaks in so many ways and loses the original function that was working.
Great example, well explained and acutally works. I have tried multiple youtube examples and forever end up in a rabbit hole spiral with chatgpt providing corrections to then only create further errors. I really liked how you explained each function and process. I was a great tutorial in provide clear and precise instructions that were very informative. Thank you.
@@Ai_Austin Hi, i just wanted to ask about an error I am getting! I have done pip3 to install everything, and when I run it I get an error saying pyaudio wants installed. I go and do everything "pip3 install pyaudio" "pip install pyaudio" etc. Nothing is working, it does like half of it then says that "Could not build wheels for pyaudio" blah blah blah. Then it says that there's an error with "#include 'portaudio.h' ". Do you know how to fix this?????
Great tutorial, Austin. Simple, to the point. Would it make sense to upgrade this to the Turbo model now? Also, could you do a tutorial about fine-tuning {prompt: x, response: y} to clone your friends using chat history data?
Absolutely. If you just change the engine variable in the open ai function of the code, you can just specify “gpt-3.5-turbo”. Then it will send your prompts to the new version of the API. Fine tuning is absolutely in the video pipeline. Have a few others ahead of it but will be creating a fine tuning tutorial here soon.
Sure, here's an improved version of your statement: "I used ChatGPT to analyze the script of this video and engaged in a conversation where ChatGPT3 generated a micro-detailed strategy to guide you through every last detail that you might need to know. In summary, during our conversation, you asked about creating a GPT-3 powered voice assistant with Python. I provided you with a step-by-step guide that covers everything from importing necessary libraries and setting up the OpenAI API key to defining functions for transcribing audio to text, generating responses, and speaking responses. We also discussed the importance of error handling and adding additional features to improve the accuracy and usefulness of the voice assistant."
I have just tried to do something like this for my program, but you are the first one, thank you very much, great job. Now I will use it for my program. Thank you.
I understood nothing, but damn IT stuff and programing is fascinating, it would probably take me 1000 years to learn it, that is why all I can do is admire people like you.
I bet you could learn it. Its not reserved for some super high iq humans. Checkout the free online book “automate the boring stuff with python”. Give yourself a month. Study it 1-2 hour 3 times a week, this program will look like fluent english!
WOW you are an incredible tutor i have been an instructor/teacher for 30 years now and i NEVER seen code writing and concepts explained so clearly and understandable like you just did here explaining and teaching code is not so trivial as many would thing and there are plenty examples for that on the net GREAT video (and note that im not even talking about the specific content itself) keep up the good work, SUBSCIBED
c'mon bro really? you've never in you're 30 whole years of teaching never seen it explained better amongst professional teachers? I mean sure the video is informative but c'mon.
I made a similar code, it s very easy, but you can improve yours by saving in a txt file all the questions and answers so it can memorize what you said before. You just have to give all the content of the file for each request
The big problem with this is that chatGPT is only relevant for many queries up to 2021. You really need to make this to interact with Bing Chat which has access to current data.
Great idea Mike, I got a Bing AI Voice Assistant Tutorial coming soon. You are right, having access to current data for our voice assistant is a huge improvement and I’m working on getting that out for you guys now! The bing voice assistant I am making will be completely free if you have beta access to bing as well. Unlimited questions.
@@Ai_Austin Sounds good, though I noticed this morning I already have both voice input and output available on Bing Chat. Don't know when Microsoft added that. Sadly you have to press the microphone icon to activate whereas it would be much more useful to be able to start with some sort of voice activation like Google assistant (especially if it could be customised). What we really need is something as interactive as in the movie 'Her' (I and many like me would pay a monthly fee for that btw) ....Keep up the good work
can you make this with GPT4all? would love to see a video on how to get this running on a offline system since you dont want to be depending on their model, if it gets out of hand we need backup models
this is a great tutorial!. I really love it if you upgrade it. What i mean by upgrade is that, import the python programme in to any type of device such as arduino or raspberry pi ( If possible). Make it wireless.
Hi, how do we change the voice to sound a bit like normal voice. And how do we make this work like google AI. For it to come up on our phones when we say 'Hey Genius' Or just call her name.
I watched your video and really enjoyed it. Please make another project like this where it will be a mobile application and whenever I call genius it will respond like Siri or google assistant. And if you make a video let me know with a little reply. In the end, I will say one thing, you are a wonderful teacher
the video was good and i followed it but al last what files Did you download while you were running the programm can you tell and if i want to convert the voice to jarvis's voice how can i do it
Thanks for the great video :) One nit-pick, the text is so small it's a struggle to read, I'm constantly leaning into the screen just to know what I'm looking at. There's heaps of dead space around your avatar, maybe consider zooming in a bit on your next vid.
with sr.Microphone() as source: recognizer sr.Recognizer() audio = recognizer.listen(source) its highlightin "sr" as an error and when i run it it says invalid syntax, and when i try to pip install the library it says that its already installed
Hello, loved the video works wonders. Would you be able to make a video series on how to add other features? such as opening apps, opening websites, setting alarms, adding a todo list & having it speak at cirten times of the day, say you want an alarm at 7am the bot would say good morning (name) today is (Date) with the weather being (weather info) & so forth I think it would be really cool
Had to get rid of the underscore in speech_recognition to get that to work. And I had to run pip install pyaudio to get it to work, but it works. Does this thing have contextual memory? Will it remember by conversations with it? I don't see any logging or context, so I don't think it does.
This code works, but it is not optimal. Using speech_recognition to detect the initial command is slow because it requires sending the audio to a server, waiting for the server to process it with a large model, and then receiving the result. Ideally, a pre-trained KWS model that can recognize a single command and runs locally should be used instead.
Very cool, make it and I'll use it, especially would love it if we could upload a Mid-journey etc talking avatar of our choice (or photo that could be adapted).
Nice. Is it possible to create a Telegram bot using OpenAI's latest model released 3 days ago? Using the chat endpoint? It would be nice if you create a tutorial for that.
Thanks Mate! Through this i was able to completly copy famous Chatbots like Siri or Alexa and thanks to the python statement "in", i was able to create a bot, who can filter my commands from whole and variable sentences. My Bot almost feels like a human teacher i can ask any question 😁 ... well ... almost ... davinci seems not to be able to tell the correct date and time since both is created from learning and not from actual live data (i asked GPT directly, Davinci refused to give me a usefull answer 😂)
Great video! I found it really informative and helpful. Thanks for sharing your knowledge and expertise with us. Looking forward to more videos like this in the future!
This is a good video it could be even better though with a release of GPT 3.5 turbo if you would take and show this again using GPT 3.5 turbo and whisper I think he would have a lot better response and a lot of people will really jump on wanting to do this. Thanks.
I have been researching Whisper. Its barrier to entry is a lot higher. Meaning if you want to run Whisper without having to pay for every question to transcribe, it needs to be done locally. Which puts you in the position of either needing a PC with 10+ GB of video ram. I also have not seen any evidence that the whisper api performs better in transcription than google speech recognition. OpenAI is the hype but I don’t want to make people feel obligated to shell out money for something that is currently possible for free. If one needed offline transcribing and has a beast of a pc to power the python program, Whisper would be a great choice.
I think today’s computers are probably powerful enough to handle text to speech I am a blind individual and I use several apps on my phone on my computer that dude just this kind of conversion and they’re not high power apps or high power computer. Some of them sent off to the Internet for processing but one of the things that could be done. If CPU horsepower is a real concern is push it off to the GPU most computers have Decent graphics processing units that would process much faster than a CPU ever could and it doesn’t take a lot of code to do that. I do think there’s a little more involved in writing code but I don’t think it’s any strong barrier. I think it’s just something Hass to be learn how to do. I’m in the process of trying to learn some of these things myself and I don’t see it as difficult as what you think it might be Again being blind it’s a little hard for me to quickly ramp up to the stuff but I’m getting there
I like your delivery style, however to be really effective the code needs to be legible, at least for those of us that are great coders. Even after magnification itsome of it was just a blur. It would be excellent if you could provide a file with the code in it.
I appreciate the feedback, is that happening even on 1080p with a computer monitor? Either way Ill make sure zoom in on the code and start linking a github repo for the projects. Thanks Nicholas!
@@Ai_Austin On my MacBook Pro Retina from 2015, the code is very readable at 1080p, only slightly blurred. Still, it would be convenient not having to write the code, but it might be a better learning experience writing the code myself.
Wow. Are you also able to create an artificially intelligent speech app that can describe pictures fed into it in order to help the blind understand what is happening in the picture
Yes we can modify parameters of the tts. Ask chatgpt how you can modify the parameters of the tts and you will have a little code snippet. just copy and paste the three lines after the initialization, you can modify the values for testing different voices and speech rates
i need help, it says "Python was not found; run without arguments to install from the microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases." what do i do??
Currently working on integrating some Amazon API's to make the a much more usable experience. Including no wake word. I have 0 technical background but in the last 6 hours with the help of chatgpt I have a working model
My goal exactly with these tutorials is you add your own preferences and upgrade upon these. Super cool to hear you’re doing it with no coding background man!
In case anyone is wondering as of today (2024) basically everything is outdated in this video unfortunately ;-; Hopefully this can save some people from trying and failing.
I've done the same in PHP using a few different APIs and streaming the data as to reduce the latency as much as possible, but its still laggy. Reducing the lag between a question and response is the tricky bit.
Id check out my new Bard voice assistant tutorial! Its faster than openai's api's and free. The past week I have been using Bard way more than chatgpt. Its just better for fact based responses that need to check recent internet data to verify its answers. And somehow faster than chatgpt without back-searching google.
Bro you have used gpt-3 not chat gpt. Chatgpt have extra features such as it can answer question based on previous question and responses. For example on chatgpt Question :-Creat a basic html page Answer:- *code* Question:- now add the background colour to black Answer:- * modified code with background colour black * But when u use gpt-3 for the same it will treat both the questions differently.. and not give the upgraded HTML code.
Sounds like you found some feature you want to add. Feel free to add a fork to the github repo linked in the new voice assistant if you wanted to actually contribute :)
That is a great question. Ive yet to find the need to learn whisper. Its my understanding that its superior for language translation and perception of accents. It also isn’t free like the speech recognition method i showed.
@@Ai_Austin I was testing Whisper over the weekend. It works great - English is excellent, while even small languages are acceptable with an editor. API is not that expensive, you can transcribe a movie for around 0,50 EUR. However, there is also possibility to install it on your server, running it locally and with that it will only cost the price of the infrastructure.
Is it possible to retain a session-like memory of previously asked questions with the API like we can do on the ChatGPT web interface? For instance if I ask "Where is the oldest tree located"? and follow it by "How tall is it?", can we make API responses retain the context?
Using 3.5-turbo it is possible to have contextual memory. It would definitely add some complexity and would potentially want to create a command to refresh memory if you did so.
@@arjund1173 ngl so I just copy and pasted my code into ChatGPT and told it that it wasn't hearing my voice and ChatGPT fixed it for me lol. Also make sure you have the packages installed too
All are actually pretty simple tasks with Python. Just a matter of adding a wake word for the new task and adding the few lines of code needed for each desired task you mentioned. ChatGPT could probably even do it for you!
Hey, that's a tutorial. But i got an error and no idear how to solve it. Can you help me? I got this message over and over: *An error occurred: cannot access local variable 'recognizer' where it is not associated with a value* *Say 'Jarvis' to start recording your question*
i got it to say "say "Genius" to start recording" but it still wont work due to gTTS not working, it says on line 66, col 35 that theres a problem and it cannot be defined