@@ValerioVelardoTheSoundofAI Hello, thanks to you from me too. I'm working on an architecture that controls a VST and compares its output to sounds from songs or samples whose sound design mostly stems from synths. The one problem I still have to sort out is how to speed up the intermediate render step for each input attempt. If you know whether there are ways to parallelize the VST -> DSP -> render pipeline (as far as I can tell, that always runs on the CPU), I would be very thankful.
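To make it concrete, this is roughly the kind of parallelism I'm after (just a sketch; `render_patch` is a hypothetical stand-in for whatever hosts the plugin headlessly, e.g., Spotify's pedalboard can load VST3s from Python):

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def render_patch(params):
    # Hypothetical stand-in: a real version would load the VST in an
    # offline host, apply `params`, and render audio. Dummy output here
    # because plugin hosting is host-specific.
    return np.zeros(44100, dtype=np.float32)

if __name__ == "__main__":
    # e.g., 32 candidate parameter sets to audition
    candidates = [{"cutoff": c} for c in np.linspace(0.0, 1.0, 32)]
    # each worker process hosts its own plugin instance, so the
    # offline renders run on separate CPU cores in parallel
    with ProcessPoolExecutor(max_workers=8) as pool:
        renders = list(pool.map(render_patch, candidates))
```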
You could perhaps have elaborated on the slide that presented different architectures such as GAN, AE, VAE, and VQ-VAE. I could look them up on the net to get an idea, but that slide was important for following what we were talking about, and it would have saved some distraction if you had spent one more minute on it. I guess a novice feels where the shoe pinches! I have done online courses on machine learning and audio signal processing, and I found your channel while going through deep learning. I appreciate the effort you have put into sharing so much in such a lucid manner.
Thank you so much for this series! You're helping so many people get introduced to DL for audio ... thank you. I was wondering about the data point count: why is the time-frequency representation more "compact"? If we take an FFT with a 512-sample window and no overlap (best-case scenario), we get 256 amplitude points and 256 phase points, i.e., 512 data points for every 512 samples. Isn't that the same number of data points per second?
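To make my counting explicit, here's a quick librosa check (assuming hop_length equal to the window, i.e., no overlap):

```python
import numpy as np
import librosa

# 1 second of (random) audio at 22050 Hz
y = np.random.randn(22050).astype(np.float32)

# 512-sample window, no overlap (hop_length == n_fft)
S = librosa.stft(y, n_fft=512, hop_length=512)

print(y.shape)  # (22050,)  -> 22050 real samples
print(S.shape)  # (257, 44) -> 257 complex bins per frame
# 257 complex values = 514 real numbers (magnitude + phase)
# per 512 input samples, so the raw STFT isn't more compact.
```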
Hello, can you please tell me how to show many librosa spectrograms in a matplotlib subplot structure? For example, if there are 7 different kinds of sound, I'd like a spectrogram of each sound, with all the spectrograms shown in a single figure like matplotlib subplots.
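In case it helps: librosa.display.specshow accepts a matplotlib Axes via its ax parameter, so something like this sketch should work (file names are hypothetical):

```python
import numpy as np
import librosa
import librosa.display
import matplotlib.pyplot as plt

files = [f"sound{i}.wav" for i in range(1, 8)]  # hypothetical paths

fig, axes = plt.subplots(nrows=7, ncols=1, figsize=(8, 14), sharex=True)

for path, ax in zip(files, axes):
    y, sr = librosa.load(path)
    # log-magnitude spectrogram of each sound
    S_db = librosa.amplitude_to_db(np.abs(librosa.stft(y)), ref=np.max)
    librosa.display.specshow(S_db, sr=sr, x_axis="time", y_axis="log", ax=ax)
    ax.set_title(path)

plt.tight_layout()
plt.show()
```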
Great video as always, Valerio! Do you know where I could listen to an example piece of generated audio that demonstrates what a problematic phase reconstruction sounds like?
Not sure what you are looking for, but probably the most outstanding job was done by Aiva Technologies. There is a free tool to compose your own music and to become aware of the challenges you can meet: www.aiva.ai/. The problem with "reconstruction" is highly related to the deep neural network itself (mainly the architecture, hyperparameters, proper datasets, etc.). We also need to remember the complexity of the signal spectrum: a "complicated" spectrum decreases the generation quality (the neural network may not be capable of approximating our signal function). Good luck!
I have a doubt: when you said we would be using variational autoencoders, it means the model would be predicting the spectrogram as an image. I don't know how you would convert that back into a WAV file. If there are any articles related to this, do share. And yes, I love your videos so much; I learn a lot from them!
Thank you Harish! Once you generate a spectrogram with a VAE, you can convert it back to a waveform using the inverse short-time Fourier transform. I'll cover this topic in upcoming videos in the series.
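Roughly along these lines (a minimal sketch; since a VAE typically outputs only magnitudes, Griffin-Lim estimates the missing phase and then applies the inverse STFT internally):

```python
import numpy as np
import librosa
import soundfile as sf

# hypothetical decoder output: a log-magnitude spectrogram
# of shape (1 + n_fft // 2, n_frames), here with n_fft = 512
S_db = np.load("generated_spectrogram.npy")  # hypothetical file

# back from decibels to linear magnitude
S_mag = librosa.db_to_amplitude(S_db)

# iteratively estimate the phase, then invert the STFT
y = librosa.griffinlim(S_mag, n_iter=32, hop_length=256, win_length=512)

sf.write("generated.wav", y, 22050)  # assumed sample rate
```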
Can you please make a video on the generated audio? I have worked with deep convolutional GANs (DCGANs) for generating images from random noise, but I've never trained one on audio signals.
Speaking of autoencoders, do you think you could eventually talk about audio clustering using the embeddings? That would be really cool! I've also seen some interesting loss functions applied to the embedding to enforce small distances between two samples of the same class (or similar classes) and large distances between very different classes.
Just to add: there's a paper called "SCAN: Learning to Classify Images without Labels" that is SOTA for image clustering. They do something cool where they feed an image plus an augmented version of the same image into the network and use that to make the embedding distances close. I wonder if something similar could be done with audio augmentation!
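Something like this, maybe (a very rough PyTorch sketch of the "pull augmented pairs together" idea, not the actual SCAN code; encoder is any embedding network, and for audio x_aug could come from pitch shifting, time stretching, added noise, etc.):

```python
import torch
import torch.nn.functional as F

def pair_similarity_loss(encoder, x, x_aug):
    """Pull the embedding of each clip toward the embedding of its
    augmented version (simplified SimCLR/SCAN-style pretext loss)."""
    z1 = F.normalize(encoder(x), dim=1)      # (batch, dim), unit norm
    z2 = F.normalize(encoder(x_aug), dim=1)  # (batch, dim), unit norm
    # cosine similarity of matching pairs; maximizing it
    # means minimizing its negation
    return -(z1 * z2).sum(dim=1).mean()
```

In practice you'd also need negatives (as in SimCLR's NT-Xent loss), otherwise the encoder can collapse everything to a single point.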
@@Erosis thank you for the suggestion! I'm definitely going to cover clustering / music similarity with embeddings. Thank you for the reference too :) I wasn't aware of the SCAN paper.
@@hijonk9510 Yes. Both the original image and the augmented image enter a network with the same weights and structure. Keep in mind, SCAN has more steps after that (mainly regarding the clustering).