
Stable Diffusion AI Audiobook Player With Real Time Transcription Prompt And Image Generation 

Roundy Creations
2.1K views

Published: Oct 15, 2024

Comments: 33
@222inverter 2 months ago
Wow... looks really cool... keep up your great work...
@roundycreations 2 months ago
I will try :)
@kaizen9554 2 months ago
Great project. Hopefully we can try our hands at it when it’s shared. Cheers 🥂
@roundycreations 2 months ago
I will try to push an early version to GitHub ASAP with a simplified installation, cheers!
@chama6775 2 months ago
Wow, amazing, great work! Hope you will get more visibility 👌 Another step toward unlimited generated entertainment... But awesome work
@roundycreations 2 months ago
Thank you 🙌
@joshuaam7701 2 months ago
Amazing workflow, the images didn’t always match but amazing potential.
@roundycreations 2 months ago
Yes, the prompt generator is not aware of the context of the whole book, which would be the next challenge. Consistency and a perfect match to the current situation in the story would be an even harder challenge :)
@playthisnote 2 months ago
Nice, similar to what I made in Python.
@roundycreations 2 months ago
It's in Python; I will probably rewrite it in the Unity engine with C# for more flexibility
@SpencerThayer 2 months ago
An impressive first step.
@build.aiagents 2 months ago
Phenomenal
@johnstarfire 2 months ago
This is brilliant and shows how to combine several AIs to do things, and this will be the future. Think of when it will be possible to generate videos with consistent characters: it would generate movies from books. Maybe we'll need the power of quantum computers, but we are seeing where we are going.
@roundycreations 2 months ago
And a couple bazillion GBs of VRAM would be handy as well
@BeAsYouAre108 2 months ago
Wow. Please create a tutorial on how to do this.
@roundycreations 2 months ago
I will improve it a bit and upload it to GitHub some day; for now it's just bare bones
@CrudelyMade 2 months ago
I wonder if a top model like Grok might be able to read through and generate prompts per paragraph while keeping the whole book concept in mind. Those prompts might then be commented out, so they are not read aloud but are processed by the image engine. "On the fly" would require a very fast LLM, but if the graphic book can be pre-developed by the engines, this is a visual storybook generator that could do some pretty intense stuff. Like, if the LLM pre-read the book, created prompts for all the characters, and stored sample images of each character for reference throughout the book... I think you have most of the cogs; it's just very impressive to think where this can end up in a year.
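The pre-read idea in this comment can be pictured with a minimal Python sketch. It assumes the character descriptions were already extracted in an earlier pass (by an LLM or by hand); `BookContext`, `build_prompt` and the sample descriptions are hypothetical names for illustration, not code from the project in the video.

```python
from dataclasses import dataclass, field


@dataclass
class BookContext:
    """Book-wide information gathered once, before playback starts."""
    style: str                                   # global art style, e.g. "pixel art"
    characters: dict[str, str] = field(default_factory=dict)  # name -> visual description


def build_prompt(paragraph: str, ctx: BookContext) -> str:
    """Compose an image prompt for one paragraph using stored book context.

    Assumption: a real system would first summarise the paragraph with an LLM;
    here the raw paragraph text is used so the sketch stays self-contained.
    """
    lowered = paragraph.lower()
    # Reuse the same stored description every time a character is mentioned,
    # which is what keeps the character's look stable across paragraphs.
    mentioned = [desc for name, desc in ctx.characters.items() if name.lower() in lowered]
    return ", ".join([ctx.style, *mentioned, paragraph.strip()])


ctx = BookContext(
    style="pixel art",
    characters={
        "Hansel": "a small boy in a green cap",
        "Gretel": "a girl with braided hair",
    },
)
prompt = build_prompt("Hansel dropped pebbles along the path.", ctx)
```

Matching names by substring is the crudest possible lookup; it is only there to show where the stored character data would enter the prompt.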
@roundycreations 2 months ago
Yeah, I'm working on it :) GPT-4o mini was released with a very cheap API, so I'm probably going to employ it to manage the concept of the book, style, consistency, plot and character development; it will then pass that data to the local LLM for prompt generation. It wouldn't be fully local then, but the current setup already maxes out the GPU, so not much can be added to the current workflow in terms of local computation. Optionally, 4o mini can also do the prompt generation; this way I would save about 7-8 seconds of GPU work and could use the spare power for SD, with TurboXL models plus some ControlNets, AnimateDiff or some picture interpolation... a rabbit hole in general :) In case you haven't noticed, it doesn't do TTS: I load an audiobook that was generated beforehand, so I also need to transcribe it on the fly
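The flow described in this comment (transcribe the audiobook, generate a prompt, render an image) can be sketched as a small Python generator. This is only an illustration under stated assumptions: `illustrate`, `make_prompt` and `render` are hypothetical names, and the stages are passed in as plain callables so that a local LLM, a remote API or Stable Diffusion could be swapped in without changing the loop.

```python
from collections import deque
from typing import Callable, Iterable, Iterator


def illustrate(
    chunks: Iterable[str],
    make_prompt: Callable[[str], str],
    render: Callable[[str], str],
    window: int = 3,
) -> Iterator[tuple[str, str]]:
    """Yield (transcribed text, rendered image) pairs as chunks arrive.

    chunks      -- transcribed segments of the audiobook, in playback order
    make_prompt -- turns recent transcript text into an image prompt (LLM stand-in)
    render      -- turns a prompt into an image reference (image model stand-in)
    window      -- how many recent chunks the prompt generator gets to see
    """
    recent: deque[str] = deque(maxlen=window)  # rolling context, oldest chunk dropped first
    for text in chunks:
        recent.append(text)
        prompt = make_prompt(" ".join(recent))
        yield text, render(prompt)


# Stub stages so the sketch runs without any model installed.
frames = list(
    illustrate(
        ["a", "b", "c", "d"],
        make_prompt=lambda s: s.upper(),
        render=lambda p: f"img({p})",
        window=2,
    )
)
```

The rolling window gives the prompt generator only short-term context; book-wide style and character data would have to be supplied separately.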
@CrudelyMade 2 months ago
@roundycreations Well, your efforts are appreciated. :-) I would love to see the same story illustrated in pixel art or anime style, especially if they're European stories like the Brothers Grimm stuff. Once that works decently, it will be nice to have LLM agents code games based on the stories with the different graphic styles, leading to different gameplay concepts based on the same basic story. Like, Hansel and Gretel would be very different games if they were pixel style or action anime style. :-) My brain is years in the future enjoying things that might never be made. :-D
@roundycreations 2 months ago
@CrudelyMade "I would love to see the same story illustrated in pixel art or anime style, especially if they're European stories like the Brothers Grimm stuff." It's just a matter of which SD model is used, so that shouldn't be a big deal, I'd say. The challenge is keeping consistency and translating the context of the book into correct prompts for each paragraph. "will be nice to have LLM agents code games based on the stories with the different graphic styles, leading to different gameplay concepts based on the same basic story" I think we need to wait a little bit more. For now an LLM can maybe code Flappy Bird without errors lol. But looking at the speed of everything now, we might wake up one day and it will be there
@CrudelyMade 2 months ago
@roundycreations It'll be there because people like you are making the building blocks. ;-) I work in tech; I know we're years away, and it's fascinating to see the early development of concepts that'll end up in much greater things. Then I can say, "I used 8-inch floppy disks!" and "I remember when the guy first automated decent on-the-fly image generation for stories!" Your efforts are also great examples of how things can work together, and these concepts can often be applied to other projects, as it's easier to see outside the box when you watch someone outside the box. :-) "One day... we'll have a box so big, the whole universe will be inside of it... and then we'll climb out of the box."
@roundycreations 2 months ago
I don't really make these blocks, people way smarter than me do that ;) but you don't need to know how a LEGO brick is made to build a LEGO castle, I guess
@ziad_jkhan 2 months ago
Hopefully, it's an open-source project
@roundycreations 2 months ago
Yes, but I haven't released the source yet, as it's a big mess for now
@ziad_jkhan 2 months ago
@roundycreations Well, that could actually be a reason to open-source it and ask others to help clean up the code if they find it useful
@therobotocracy 2 months ago
Nice!
@therobotocracy 2 months ago
Do you have a discord or something like that?
@roundycreations 2 months ago
I added it to the video description