
SaraEye: the world's first ChatGPT with a sense of sight!

SaraAI
1.8K views

Introducing the world's first ChatGPT with a sense of sight!
Be amazed by our ChatGPT-powered voice assistant that not only hears but also sees and asks questions based on what it sees.
Is this the future of ChatGPT?
www.SaraAI.com
#AI #artificialintelligence #computer #vision #computervision #gpt4 #gpt #chatgpt #chatgpt4 #openai #google #texttospeech #speechtotext #voicerecognition #opencv

Published: 5 Oct 2024
Comments: 11
@2000ReRyRo 4 months ago
Why the weird delays in responding by the human? Why does the machine seem more natural than the human?
@monsieur3d985 10 months ago
Extremely interesting. What do you transmit to ChatGPT so that it can interact through vision? I guess you give it some indication of what your SaraKit "sees", is that right?
@ArturMajtczak 10 months ago
Exactly right. The cameras observe the environment, and in the background, a separate program identifies objects, people, motion, etc. This information is invisibly sent as prompts to ChatGPT, which then responds as you can see in the video.
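The mechanism described above (detections injected invisibly into the prompt) can be sketched roughly as follows. This is not SaraAI's actual code; the function and the detection format are hypothetical, chosen only to illustrate the idea:

```python
def build_vision_prompt(detections, user_utterance):
    """Prepend a hidden scene summary to the user's spoken words.

    `detections` is a list of labels from a local object detector
    (hypothetical format). The scene note is what gets sent to ChatGPT
    alongside the utterance; the user never sees or says it.
    """
    scene = ", ".join(sorted(set(detections))) or "nothing notable"
    return f"[Scene: the camera currently sees {scene}.]\n{user_utterance}"
```

For example, `build_vision_prompt(["cup", "person"], "What am I holding?")` yields a prompt whose first line tells the model what is in view, which is enough for it to "respond as you can see in the video".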
@monsieur3d985 10 months ago
@@ArturMajtczak This reminds me of SHRDLU, the program Terry Winograd developed at MIT in 1968 (built on around fifty nouns, verbs, and adjectives in a 3D world of blocks). I guess this analysis takes a lot of computing power, and that you do it on an external server from the pairs of images sent. I guess the Raspberry Pi and your SaraKit board are used only to position the motors, process the image and sound, and communicate with the server, is that right? Your approach is interesting. Do you think the GPT-4 Vision update uses a similar principle and communicates through prompts with the conversational system? (That system quickly reaches its limits, it seems to me.)
@ArturMajtczak 10 months ago
@@monsieur3d985 Sending images to a server and waiting for a response is indeed too slow and costly, so the image analysis is actually done on the Raspberry Pi itself, using a simple trained model. While this model might not recognize everything, it certainly has broad and sufficient capabilities. Image recognition isn't performed in real-time at 25 frames per second - that's not necessary at this stage. We just analyze changes in the background image, which takes about 100 to 600 ms. As I mentioned, this process runs in a separate thread and is efficient enough for our purposes. In terms of the GPT-4 Vision update, while it might use a similar principle of communicating with the conversational system through prompts, our approach focuses on local processing to avoid the delays and costs associated with server-based processing. This method, although it has its limitations, is quite effective for our current needs.
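The "analyze changes in the background image" gate described above can be approximated with simple frame differencing. A minimal sketch, assuming grayscale frames as NumPy arrays (the exact method SaraEye uses is not disclosed, so treat the thresholds and approach as illustrative):

```python
import numpy as np

def scene_changed(prev_frame, frame, threshold=25, min_fraction=0.02):
    """Cheap change gate between two grayscale frames.

    The heavier object-recognition model runs only when this returns True,
    which is why recognition never needs to keep up with 25 frames/second.
    """
    # Widen to int16 so the subtraction cannot wrap around on uint8 input.
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    # Fire when enough pixels moved beyond the per-pixel threshold.
    return np.count_nonzero(diff > threshold) / diff.size >= min_fraction
```

Running this in a separate thread, as the author describes, keeps the 100–600 ms recognition step off the audio/conversation path.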
@kam7847 1 year ago
Cool, where can I buy it?
@ArturMajtczak 1 year ago
We are still developing the project; it is not available for sale yet.
@2000ReRyRo 11 months ago
Kind of ironic that you call this a "natural" conversation. It is so UNnatural that I'm not sure which is the robot -- the little black thing sitting on the desk or the bigger white thing sitting on the chair.
@ArturMajtczak 10 months ago
@@2000ReRyRo heh heh
@yogi9704 5 months ago
What's the point of the camera, other than for asking the questions?
@ArturMajtczak 5 months ago
1. You don't need a wake word like "Alexa" or "OK Google"; just look at SaraEye, and it sees you and knows you are speaking to it. It's more natural, the way people communicate: when we are in a group and speak while looking at someone, that person knows we are speaking to them. 2. By looking at the device, and more importantly, with SaraEye looking back at you, a unique bond is formed that is hard to achieve by talking to a "speaker" like Alexa. :)
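The wake-on-gaze idea in point 1 amounts to a debounce over per-frame face detections: only start listening once a face has been looking at the camera for several consecutive frames, so a passing glance does not trigger it. A small sketch of that logic (the detector and frame format are assumptions, not SaraEye's actual code):

```python
def should_listen(face_frames, required=5):
    """Return True once a face is seen in `required` consecutive frames.

    `face_frames` is a list of booleans, one per camera frame, produced by
    a hypothetical local face detector. Requiring a short streak replaces
    the wake word: sustained eye contact means "I'm talking to you".
    """
    streak = 0
    for seen in face_frames:
        streak = streak + 1 if seen else 0
        if streak >= required:
            return True
    return False
```

In a real pipeline this would run on the live detector output; the point is only that gaze, held briefly, substitutes for saying "Alexa".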