Understanding Audio Signals for Machine Learning

Learn about digital audio signals. I explain the difference between analog and digital signals, and how to convert an analog sound into a digital format that can then be processed for machine learning. I also delve deeper into Analog-to-Digital Conversion concepts such as sampling, quantization, and aliasing.
Slides:
github.com/musikalkemist/Audi...
Join The Sound Of AI Slack community:
valeriovelardo.com/the-sound-...
Interested in hiring me as a consultant/freelancer?
valeriovelardo.com/
Follow Valerio on Facebook:
/ thesoundofai
Connect with Valerio on Linkedin:
/ valeriovelardo
Follow Valerio on Twitter:
/ musikalkemist

Science

Published: 27 Jul 2024

Comments: 75

@mudassirkhan9054 1 day ago

Hey Valerio, this is just amazing content. I like the depth and the way you explain in such simple terms; you satisfied my curiosity for this whole topic.

@adityajindal3738 3 years ago

Love the understanding, the clarity of content, and the excellent examples through applications! Love it.

@didismit1766 3 years ago

I honestly never take the time to comment or compliment on YouTube videos. You, my man, are simply amazing and I truly enjoy listening to you. Going all the way till your last video =)

@ValerioVelardoTheSoundofAI 3 years ago

Thank you for taking the time :)

@hydraulicgames2493 1 year ago

I was just curious about sound processing and found your lecture series. After I started watching, I binge-watched the whole series! Absolute piece of art! PS: I started watching with absolutely zero knowledge about the subject.

@user-ky9ur8gm2f 1 year ago

Bravo! Excellent presentation of the material!

@nakarosz 3 years ago

Amazing content Valerio! Thank you!

@chriskingston1981 2 years ago

Thank you so much for creating these videos, I am really enjoying them! I always worked with computers, music and sound when I was young, and still do. I have all the basic knowledge of programming, ML and music. But this has so much more depth; I didn't know I liked this stuff. Thank you for creating a new passion for me. AI with music❤️❤️❤️

@danielgurgel8080 3 years ago

Loving your content. Thanks a lot!

@rangiding99 3 years ago

Thanks for your wonderful job, beautifully done!

@theaihacker777 3 years ago

bro your content is so helpful. very concise and straight to the point.

@ValerioVelardoTheSoundofAI 3 years ago

Thanks!

@wehbihabli7425 3 years ago

appreciate all the details! great

@joeljoseph26 3 years ago

you are awesome! the best ML tutorial for audio signals

@ValerioVelardoTheSoundofAI 3 years ago

Thank you Joel!

@jancooqhedon895 3 years ago

I came here for language classification research, but now I'm amazed by the music thing.

@ValerioVelardoTheSoundofAI 3 years ago

That's an incredibly deep rabbit hole :)

@vigneshreddyjulakanti7583 1 year ago

Great one thanks ❤️

@parkerhyde_ 7 months ago

This is invaluable. Thanks so much

@hersheyscoco1 4 years ago

Great stuff Valerio, this is amazing content - very educational. When you cover the audio features, can you also cover MFCCs in depth, and how they are typically used? I have yet to see a good treatment of MFCCs that gives an intuitive feel for how they work.

@ValerioVelardoTheSoundofAI 4 years ago

Thank you! I intend to cover MFCCs quite in depth. Stay tuned :)

@fredrickpwol8639 2 years ago

Thanks a lot for this. It's helping me through a project I'm working on. I'm really grateful.

@ValerioVelardoTheSoundofAI 2 years ago

You're welcome!

@vandanagoyal3037 2 years ago

Thanks for the video

@michaelmanuel1676 2 years ago

You're amazing, please keep it up!!!

@phosphoricx 2 years ago

I’m not sure your aliasing demo actually has aliasing. Usually you would hear nasty artifacts when downsampling so much without an antialiasing filter. Audacity is likely applying an antialiasing filter to reduce the bandwidth of the signal before downsampling it.
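To make the artifact described above concrete, here is a minimal sketch of where a tone lands when it is sampled below twice its frequency with no filter in front. The tone (7000 Hz) and the target rate (8820 Hz, i.e. 44100 / 5) are assumptions chosen for illustration, not values taken from the video.

```python
import math

def aliased_frequency(freq_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone shows up after unfiltered sampling."""
    # Fold the tone around the nearest integer multiple of the sample rate.
    nearest_multiple = round(freq_hz / sample_rate_hz) * sample_rate_hz
    return abs(freq_hz - nearest_multiple)

sample_rate = 8820.0  # assumed target rate; its Nyquist frequency is 4410 Hz
tone = 7000.0         # assumed tone above that Nyquist frequency
alias = aliased_frequency(tone, sample_rate)

# Sample both tones at the low rate: the samples coincide up to a sign flip,
# so after sampling the 7000 Hz tone cannot be told apart from a 1820 Hz one.
n_samples = 50
tone_samples = [math.sin(2 * math.pi * tone * n / sample_rate)
                for n in range(n_samples)]
alias_samples = [-math.sin(2 * math.pi * alias * n / sample_rate)
                 for n in range(n_samples)]
max_diff = max(abs(a - b) for a, b in zip(tone_samples, alias_samples))

print(alias)     # 1820.0
print(max_diff)  # tiny, floating-point rounding only
```

A component that folds down like this is inharmonic with respect to the original material, which is why unfiltered downsampling tends to sound harsh rather than merely dull.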

@ankithooda1536 2 years ago

It is so so awesome.

@AhmetAksoy 4 years ago

Thank you very much. It is a very educational video. ( But in the audio there are some short bass bursts. )

@akashraj_vlogs 3 years ago

Appreciate your efforts in the field of knowledge.

@ValerioVelardoTheSoundofAI 3 years ago

Thanks!

@akashraj_vlogs 3 years ago

@ValerioVelardoTheSoundofAI Hey, can you please make a video on converting one instrumental sound to another, or one person's vocals to another's, in Python? Thanks in advance.

@juniorsilva5713 4 months ago

Thanks a lot! =)

@sabrinahuda7308 3 years ago

Hi Valerio! Do you perform the conversion in librosa, or does it need to be coded from scratch? And can I have an example of pseudocode or an algorithm for the ADC conversion?
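As a rough answer to the question above: real analog-to-digital conversion happens in the recording hardware, and audio libraries load files that are already digital. The two steps named in the description (sampling, then quantization) can still be simulated. The sketch below is only an illustration; `analog_signal` and `adc` are hypothetical names, and a math function stands in for the continuous sound.

```python
import math

def analog_signal(t: float) -> float:
    """Stand-in for a continuous signal: a 440 Hz sine in the range [-1, 1]."""
    return math.sin(2 * math.pi * 440.0 * t)

def adc(signal, duration_s: float, sample_rate: int, bit_depth: int) -> list:
    """Sample `signal` every 1/sample_rate seconds and quantize to signed ints."""
    max_level = 2 ** (bit_depth - 1) - 1           # 32767 for 16 bits
    n_samples = round(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        value = signal(n / sample_rate)            # sampling: read at t = n / sr
        value = max(-1.0, min(1.0, value))         # clip to the valid range
        samples.append(round(value * max_level))   # quantization: nearest level
    return samples

digital = adc(analog_signal, duration_s=0.01, sample_rate=44100, bit_depth=16)
print(len(digital))  # 441 samples for 10 ms at 44.1 kHz
```

The rounding step is where quantization error comes from: every sample is moved to the nearest of the available integer levels.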

@SuperLucasGuns 3 years ago

this is great stuff

@ValerioVelardoTheSoundofAI 3 years ago

Thanks!

@mattdistad1338 1 year ago

When you resample in Audacity, you are not hearing aliasing. Audacity uses a low-pass filter (as any good downsampler should) to avoid aliasing. What you're hearing is the high frequencies being filtered out.
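The difference between naive and filtered downsampling described above can be sketched in plain Python. This is not Audacity's resampler; the windowed-sinc filter, the 3500 Hz cutoff, the 101 taps, and the two test tones are all assumptions chosen for illustration.

```python
import cmath
import math

def lowpass_taps(cutoff_hz: float, sample_rate: int, n_taps: int = 101) -> list:
    """Hamming-windowed sinc low-pass FIR, normalized to unit gain at 0 Hz."""
    fc = cutoff_hz / sample_rate
    mid = (n_taps - 1) / 2
    taps = []
    for k in range(n_taps):
        x = k - mid
        ideal = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (n_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def tone_amplitude(samples: list, freq_hz: float, sample_rate: int) -> float:
    """Amplitude of one frequency component, measured with a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
              for n, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)

fs, factor = 44100, 5           # downsample 44100 Hz -> 8820 Hz
n_taps, n_out = 101, 882
n_in = n_out * factor + n_taps
# Test signal: 1000 Hz (should survive) plus 7000 Hz (above the new Nyquist).
signal = [math.sin(2 * math.pi * 1000 * n / fs) + math.sin(2 * math.pi * 7000 * n / fs)
          for n in range(n_in)]

# Naive: keep every 5th sample, no filter.
naive = signal[::factor][:n_out]

# Filtered: low-pass first, evaluating the FIR only at the samples we keep.
taps = lowpass_taps(3500, fs, n_taps)
filtered = [sum(t * signal[i * factor + k] for k, t in enumerate(taps))
            for i in range(n_out)]

new_fs = fs // factor
alias_naive = tone_amplitude(naive, 1820, new_fs)        # 7000 Hz folds to 1820 Hz
alias_filtered = tone_amplitude(filtered, 1820, new_fs)
kept = tone_amplitude(filtered, 1000, new_fs)
print(alias_naive, alias_filtered, kept)
```

In the naive version the 7000 Hz tone reappears at 1820 Hz at close to full strength; in the filtered version it is strongly attenuated while the 1000 Hz tone passes almost unchanged, which matches the "duller but clean" result described in the comment.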

@subramaniannk3364 3 years ago

A question: does the music player on the computer assume or know that audio files are sampled at a particular frequency? Would the music player work fine if the audio files were sampled at a different rate?
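On the question above: formats such as WAV store the sample rate in the file header, so a player can read it instead of assuming one. The sketch below uses Python's standard `wave` module to write a short tone at a chosen rate and read the rate back; `write_wav` and `read_sample_rate` are hypothetical helper names, and everything is kept in memory rather than on disk.

```python
import io
import math
import struct
import wave

def write_wav(sample_rate: int, freq_hz: float = 440.0, duration_s: float = 0.1) -> bytes:
    """Return the bytes of a mono 16-bit WAV file containing a sine tone."""
    n_frames = round(sample_rate * duration_s)
    frames = b"".join(
        struct.pack("<h", round(32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n_frames)
    )
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)              # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)    # stored in the header
        wav.writeframes(frames)
    return buffer.getvalue()

def read_sample_rate(wav_bytes: bytes) -> int:
    """Read the sample rate back from the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate()

print(read_sample_rate(write_wav(8000)))   # 8000
print(read_sample_rate(write_wav(44100)))  # 44100
```

If a player ignored the header and played 8000 Hz data at 44100 Hz, the audio would come out faster and higher in pitch.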

@avidreader100 3 years ago

At 15:45, the picture that appears seems to have an error. The numbers on the amplitude scale are out of order in binary notation.
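For reference, this is what an ascending binary amplitude scale looks like for a small bit depth. It is a sketch that assumes plain unsigned labels with the lowest amplitude at 0; the slide's intended scheme is not known, and real PCM data often uses two's complement instead. The function names are hypothetical.

```python
def quantization_levels(bit_depth: int) -> list:
    """Binary labels of all quantization levels, lowest amplitude first."""
    return [format(level, f"0{bit_depth}b") for level in range(2 ** bit_depth)]

def quantize(amplitude: float, bit_depth: int) -> int:
    """Map an amplitude in [-1, 1] to the nearest of 2**bit_depth levels."""
    n_levels = 2 ** bit_depth
    clipped = max(-1.0, min(1.0, amplitude))
    return round((clipped + 1.0) / 2.0 * (n_levels - 1))

print(quantization_levels(3))
# ['000', '001', '010', '011', '100', '101', '110', '111']
print(quantize(-1.0, 3), quantize(1.0, 3))  # 0 7
```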

@JogosEtudoMais 3 years ago

Valerio, thank you for this amazing work. You are helping me a lot; I am studying audio and you are answering all my questions. Do you have any book recommendations for me?

@ValerioVelardoTheSoundofAI 3 years ago

Thank you! Unfortunately, there aren't many resources about AI audio. I'm currently writing a book on the topic. A book with "traditional" DSP approaches is "Fundamentals of Music Processing" www.springer.com/gp/book/9783319219448

@JogosEtudoMais 3 years ago

@ValerioVelardoTheSoundofAI I'll buy yours when it is ready!

@ValerioVelardoTheSoundofAI 3 years ago

@JogosEtudoMais Thanks!

@ramportland 5 months ago

At 9:13 in the video, is it above or below the Nyquist frequency?

@heecheolcho3246 3 years ago

Thank you for the good work. I have a question: (16 x 44100 x 60) / (8 x 1024 x 1024) = 5.046844. Why 5.49 MB?

@ramkumarkoppu 1 year ago

I guess he intended to say 5.047 MB and made a typing error in the slide, which he then read out.

@arunmehta8234 3 years ago

I really enjoyed it. I couldn't understand the profile photo abstractness.

@mohamadqodosi7057 3 months ago

I got a little confused. Does aliasing mean we can hear frequencies higher than our hearing range after the signal is digitized?

@maryamashfaq6700 3 years ago

What is the memory storage format of audio and video files?

@Drew_7 1 year ago

Thank you for this amazing walkthrough; this is going to help me SO much with ML. Also, question for this section 17:37, do you know why we have to divide the bit depth and resolution sampling rate by 1048 window? In other words, why do we divide by 1,048,576 and then again by 8 bytes? Is there some resource on why this is default? (I'm assuming this has to do with the way computers work.)

@johnnyvishnevskiy8090 1 year ago

A bit can either be 1 or 0 (On / Off) A byte is a group of 8 bits. So if there are 8 bits, where each bit can only be 0 or 1, there are a total of 2^8 = 256 different values that a byte can represent. So let's consider bit depth * sampling rate = 16 * 44,100 = 705,600 bits per second. There are 1,048,576 bits in megabits (mega means 1 million, but the closest binary representation of that is 2^20 = 1,048,576). So 705,600 / 1,048,576 = 0.6729 megabits per second. Remember that 8 bits = 1 byte? 8 megabits is also 1 megabyte. So we just need to divide by 8. 0.6729/8 = 0.0841 megabytes per second = 5.0468 megabytes per minute. I'm assuming it's a typo in the video.
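The arithmetic in this thread can be checked with a few lines. The sketch below assumes mono, uncompressed audio; `audio_size_bytes` is a hypothetical helper name. Note that dividing by 1,048,576 (2^20) gives mebibytes, while dividing by 1,000,000 gives decimal megabytes, so the same minute of audio is about 5.05 MiB or 5.29 MB.

```python
def audio_size_bytes(bit_depth: int, sample_rate: int, duration_s: int,
                     channels: int = 1) -> float:
    """Uncompressed size: bits per sample x samples per second x seconds, over 8."""
    return bit_depth * sample_rate * duration_s * channels / 8

values_per_byte = 2 ** 8                      # 8 bits -> 256 distinct values
size_bytes = audio_size_bytes(16, 44100, 60)  # one minute of 16-bit, 44.1 kHz mono
size_mib = size_bytes / 2 ** 20               # binary "megabytes" (1,048,576 bytes)
size_mb = size_bytes / 10 ** 6                # decimal megabytes

print(values_per_byte)     # 256
print(f"{size_mib:.4f}")   # 5.0468
print(f"{size_mb:.3f}")    # 5.292
```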

@Drew_7 1 year ago

@johnnyvishnevskiy8090 You're a freaking legend, thank you so much. This all makes sense and now I'm gonna read it like 1,048 times. lol

@johnnyvishnevskiy8090 1 year ago

@Drew_7 np!

@adamsik1025 3 years ago

Is there any sense in buying headphones with a frequency range up to 80,000 Hz if humans are capable of hearing sounds "only" up to 20,000 Hz?

@maryamashfaq6700 3 years ago

Can you list all the digital audio formats that are saved in memory? Please answer.

@vijaykhandagale2591 3 years ago

Would you suggest some reading material to accompany your videos?

@ValerioVelardoTheSoundofAI 3 years ago

Yes, this great book: "Music Similarity and Retrieval" www.springer.com/gp/book/9783662497203

@vijaykhandagale2591 3 years ago

@ValerioVelardoTheSoundofAI Thanks for the suggestion.

@venkatesanr9455 4 years ago

Hi Valerio, I have one doubt. If an audio file has a sampling rate of 8000 Hz, is it correct to extract audio features after upsampling the audio to 44100 Hz or 32 kHz? Kindly give some suggestions on this.

@ValerioVelardoTheSoundofAI 4 years ago

I would stick with the files at 8 kHz, which has the advantage of resulting in lighter data.

@venkatesanr9455 4 years ago

@ValerioVelardoTheSoundofAI I believe you are saying it is better to keep 8000 Hz for audio feature extraction. Would it be wrong, or would there be side effects, if we changed the sampling rate to 44100 Hz and then extracted audio features? Kindly reply.

@imamuddin8042 4 years ago

@venkatesanr9455 If you upsample, that would not hurt your signal, but it would require more storage to accommodate the extra samples, as @Valerio mentioned. Moreover, if your signal is a band-pass signal on a carrier frequency (not baseband), there is a limit to upsampling as well; I mean you can't upsample your signal to as high a rate as you want.

@venkatesanr9455 4 years ago

@imamuddin8042 Thanks for your kind response, Sharif. I have recorded speech samples at a sample rate of 8000 Hz. While processing them I wondered whether upsampling the speech data to 44.1 kHz or 32 kHz and then extracting audio features would have any effect.

@imamuddin8042 4 years ago

@venkatesanr9455 Since you recorded a speech signal (which spans 20 Hz to 20 kHz) at a sample rate of 8 kHz, it already includes the aliasing effect of signal content above 4 kHz. So I would recommend putting an anti-aliasing low-pass filter with a cutoff at 4 kHz in the analog domain, before you sample your data. Or you can use a sampling rate of 44.1 kHz from the start, instead of thinking about upsampling later.

@hossien2843 3 years ago

Can you share the music?

@maryamashfaq6700 3 years ago

What digital format is used to save audio in memory?

@ValerioVelardoTheSoundofAI 3 years ago

By far the most relevant one is WAV.

@meedkal79 4 years ago

We have a project that recognizes speech. Can you help me with that?

@ValerioVelardoTheSoundofAI 4 years ago

If you need general feedback, I suggest you join The Sound of AI community (sign-up link in the description). If you need more involved help, I do consulting.

@meedkal79 4 years ago

Ok thank you