@murtazasworkshop Sir, I am thinking of making a vision virtual assistant, i.e. a virtual assistant that can see, and I want to use the LLaVA multimodal model. When I start the app, it starts capturing frames from the webcam at a certain interval. Then I speak to the assistant, and when I stop speaking, the last 60 frames are saved to a folder. After detecting that I am no longer talking, it takes these 60 frames and stacks them into a single image for convenience. This one image containing the last 60 frames is sent to the model along with the speech-to-text (STT) transcript as the prompt, and then we get the model's response and speak it using text-to-speech (TTS). So I am making a vision virtual assistant that uses the webcam ☺️😀 This is my idea, can you make it? Please ❤️☺️🙏❤️
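The "keep the last 60 frames, then stack them into one image" step could look roughly like this. It is a minimal sketch, assuming frames arrive as equally sized NumPy arrays (for example from a webcam capture loop, not shown here); `frame_buffer`, `FRAME_BUFFER_SIZE`, and `tile_frames` are hypothetical names, and the 10-column grid layout is an arbitrary choice.

```python
from collections import deque

import numpy as np

# Hypothetical rolling buffer: holds only the most recent 60 frames.
# When a 61st frame is appended, the oldest one is dropped automatically.
FRAME_BUFFER_SIZE = 60
frame_buffer = deque(maxlen=FRAME_BUFFER_SIZE)


def tile_frames(frames, cols=10):
    """Stack equally sized H x W x C frames into one grid image.

    Frames fill the grid row by row; unused cells in the last row stay black.
    """
    if not frames:
        raise ValueError("no frames to tile")
    h, w, c = frames[0].shape
    rows = -(-len(frames) // cols)  # ceiling division
    grid = np.zeros((rows * h, cols * w, c), dtype=frames[0].dtype)
    for i, frame in enumerate(frames):
        r, col = divmod(i, cols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = frame
    return grid


# Usage with synthetic frames standing in for webcam captures:
# frame i is filled with the value i so the ordering is easy to check.
for i in range(75):
    frame_buffer.append(np.full((48, 64, 3), i, dtype=np.uint8))

grid = tile_frames(list(frame_buffer), cols=10)
print(grid.shape)  # (288, 640, 3)
```

One design point to keep in mind: vision-language models such as LLaVA typically resize the input image to a few hundred pixels per side, so a 6 x 10 grid of full-resolution frames would shrink each tile considerably. Sampling fewer frames (say every 6th one) before tiling may preserve more detail per frame.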
@murtazasworkshop In your previous video, "AI 4 Everyone", you used an NVIDIA graphics card, where you installed everything and set up the paths. But I don't have a graphics card, just a normal laptop. I do know ANNs and CNNs. Should I continue further or not? Please reply.