Thank you for the great video. It was really helpful for getting everything set up. If I may ask: I have a 4090 graphics card, and I can see it maxing out my GPU usage, so CUDA should be working correctly. However, my prompts take anywhere between 20 seconds and 2 minutes to return, and after a few questions the chatbot stops responding altogether and just stays processing. Is this normal?
@TirendazAI Hi, I'm running Llama 3 with Ollama on this localhost port: 127.0.0.1:11434. But I'm confused: how do I load the model with transformers so I can follow your steps?