No video :(

Code your online Multimodal ChatGPT App in Python (chat with Images and Audio)

Enric Domingo

Подписаться 528

Просмотров 4,2 тыс.

50% 1

Видео Поделиться Скачать Добавить в

Опубликовано:

28 авг 2024

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист

Посмотреть позже

Комментарии : 26

@DavidCarmonaMaroto 3 месяца назад

Grande Enric

@enmingwang6332 Месяц назад

Terrific tutorial, love it👍👍

@henkhbit5748 2 месяца назад

Great video! Maybe the next time use langchain for interfacing for better flexibility when using other multi modal LLM. Now you have Gemini flash/pro and Claude 3.5 sonnet multi models beside openai 4o model.

@enricd 2 месяца назад

totally! that's something I had in mind, but it would add some more complexity (although the interface and integration would be better) and also when dealing with Gemini 1.5 2 weeks ago, I found that langchain was not yet capable of dealing with upload files like videos, audios, docs and so, so it was still for the Gemini 1.0 capabilities but not for the 1.5. But yes, it would have some advantages and depending on the project, I would use it and also structure better the code with more abstractions and a better architecture. Here I wanted to show a basic and plain integration of it.

@nkwachiabel5092 2 месяца назад

Great Video! Very simple and straight forward. Question: Have you thought about how you can show those responses with images from the model? Also, how about file upload (csv, pdf, etc)? will they all still be in the same format?

@enricd 2 месяца назад

Thanks! GPT-4o doesn't produce images, it could generate the prompt for dalle-3 or any txt2img model to generate them. If you want to send it a document, you could use the langchain document loaders for it :)

@tonywhite4476 2 месяца назад

Nice app. Love the UI. I was wondering if the tts feature is waiting for the response to finish streaming before it converts it? In other words, is the streaming response feature increasing the audio response time?

@enricd 2 месяца назад

Thanks Tony! Yes, as I implemented it here, the TTS receives the response text when this is fully completed and then it starts to convert it to audio, and we receive the audio in the UI when this is fully completed. It's possible to do a more advanced strategy where you stream the text response to TTS (phrase by phrase, otherwise it could miss in properly taking the language, tone, and so) and then we could also stream the TTS audio response while it's generated, in order to reproduce it while it's generated and not to wait until the end of it. With these two stream workflows the final audio response would get to us so much faster (I'm pretty sure that this is what OpenAI is doing in the demo of chatgpt-4o and also in the current app probably)

@tonywhite4476 2 месяца назад

@@enricd I was wondering if there was a way to do it synchronously.

@AkulSamartha 27 дней назад

Can this be extended to chat with websites also. Please suggest

@aayushsmarten 8 дней назад

Wondering, if there is only this way to pass the image in the API? I mean, one with passing the URL from the internet, and second is what you have shown. In the shown method we convert into the byte string and so on. Wondering if there is any other way to send the local image to the API. Please share, thanks.

@enricd 8 дней назад

Hi aayushsmarten, here you can fin the OpenAI API docs: platform.openai.com/docs/guides/vision . As far as I know, there are only these 2 options to send images to the API, the direct URL and the base64 encoding of it. :)

@fallou_fall 3 месяца назад

Well done Sir Amazing work!

@danielmartosarroyo5969 3 месяца назад

Amazing 😮

@user-mo2en6us8x 2 месяца назад

Thank you for your video. I have a question whether the image generation is still not worked. Because gpt-4o generates text outputs, then maybe it can generate the encoded text of the output image that we want.

@enricd 2 месяца назад

Nice question! so the fact that a digital image can be encoded from its pixels values to the png or jpg compressed format and then those bytes to base64 to be sent through the internet, and then in the end decompressed back again, doesn't mean that it's possible to generate meaningful images directly in base64. It's something almost impossible, a 2D image of something needs to be generated in a 2D array pixel by pixel so its visual patterns make sense, and only then transformed or compressed to png/jpg and base64 :), but really good question to think out-of-the-box!

@sergiosobral5776 3 месяца назад

Hey, how do I remove the openKey entry? And go straight.

@enricd 3 месяца назад

Hi! what do you mean? the openai api key? you needed to get one on platform.openai.com creating an account first and charging some dollars first (5 dollars would be enough) so then you can consume tokens doing requests to the models

@sergiosobral5776 3 месяца назад

Sorry, I wrote it wrong. I have the API key, but I would like to remove this add the key section. How do I get it to the direct model when it runs? Without the need to validate the key.

@enricd 3 месяца назад

@@sergiosobral5776 so if you are cloning the code from the app and running it locally in your computer, you can directly assign the openai_api_key variable to your api key in plain text, and even remove those lines related to the verification of the key

@tonywhite4476 3 месяца назад

Why so many requirements/denpendencies?

@enricd 3 месяца назад

They are needed to build the entire website and their components :)