From my point of view, there is only one thing that could be improved here: you should publish a new video each day. No, just joking; I understand the effort it takes to make this priceless content. For me, you are one of the top ten YouTubers. Exciting and impressive as always. Applause also for the team who make the librosa package.
Absolutely amazing course! Valerio's serenity and clarity while narrating highly convenient slides and code samples make this course accessible even to idiots like me. Outstanding effort! Much, much appreciated.
Someone here is not afraid of a copyright strike! Joking. An introduction to librosa is something missing on YouTube, so it's great you are closing the gap!
I can't wait to stay up to date with this series... Thanks for the effort and time you are putting into this... Will be waiting for more on the freq part... Truly the Sound of AI...
Wuuah! I got it running. Thank you so much, Valerio! You know what? The biggest problem is not writing the code itself; fighting with package installation, virtual environments, and Jupyter notebooks took me hours. :-((
For the calculation of the amplitude envelope you should have taken the absolute value of the sound signal. That's why you see certain blue peaks above the redline in the graphs.
Hey. I tried to find a cause for the graphical overshoot of the signal plot librosa creates, compared to the MAX plot. It is visible in several places in the plots shown in the video. I could not find any possible cause, or fix. Is this a known issue? Is this something purely graphical, and can I be sure that there is no problem with the calculation of MAX, or the time-axis alignment of the signal and the AE function?
At 33:00, in the Duke Ellington plot, just ahead of the 5-minute mark, there is a peak where the marked max is lower than the actual max value. How can this be? There are other such peaks that are not captured in that plot as well.
I see this too. I think there may be a bug in librosa, because when I use matplotlib to plot the original signal, all the max points match. You can try it yourself:
    orignalTimestep = []
    for i in range(len(debussy_signal)):
        orignalTimestep.append(i*sample_duration)
    plt.plot(orignalTimestep, debussy_signal)
@@inverseai No, that's not true. The code is not considering negative values within the signal data. He should have taken the absolute value of the sound signals.
Great content! I just have a minor question: if fancy_ae_debussy is completely matched with ae_debussy then why are there some peaks not being covered by the red line?
Had the same problem. When applying the max() function to the whole signal, the max values were not included so I applied np.abs() to the signal and got the desired result. I guess the librosa.display.waveplot() has some logic to do something similar :)
first time here - loving the course - will definitely follow. One small bit of feedback if I may be so bold - surely you should be taking ABS() value of each sample rather than relying simply on the positive excursions of the waveform? Amplitude is dependent on both positive and negative excursions of the sample value.
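A minimal sketch of the fix several commenters describe above, assuming the envelope is the per-frame maximum with a frame size of 1024 and a hop length of 512 (values mentioned elsewhere in this thread). The function name, the `use_abs` flag, and the toy signal are hypothetical and only there to show the difference between the raw maximum and the maximum of the absolute value:

```python
import numpy as np

def amplitude_envelope(signal, frame_size=1024, hop_length=512, use_abs=True):
    """Return one envelope value per frame: the maximum within each frame.

    With use_abs=True the signal is rectified first, so large negative
    excursions are captured as well as positive ones.
    """
    values = np.abs(signal) if use_abs else np.asarray(signal)
    return np.array([
        values[i:i + frame_size].max()
        for i in range(0, len(values), hop_length)
    ])

# Toy signal (made up) whose largest excursion is negative.
toy = np.array([0.1, -0.6, 0.2, 0.05, -0.1, 0.3, -0.2, 0.0])

raw = amplitude_envelope(toy, frame_size=4, hop_length=2, use_abs=False)
rectified = amplitude_envelope(toy, frame_size=4, hop_length=2, use_abs=True)

print(raw)        # the raw max misses the -0.6 excursion in the first frame
print(rectified)  # the rectified max captures it as 0.6
```

If the waveform plot really is drawn symmetrically around zero, as another comment in this thread suggests, the rectified version is the one that should sit on top of every blue peak.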
Hi, I have a question. May I know how to determine the value for FRAME_SIZE? I ask because my red curve does not match the signal perfectly.
@@baghdadiabdellatif1581 basically the frame size; i.e. hop size = frame size, meaning no hops .. But I guess having a hop gives a better granularity. They are approximations anyway
How do you keep the same size between the ae vector and the signal vector? Shouldn't the hop_length jump reduce the size of the ae vector by a factor of hop_length? Debussy has 661500 samples and ae_debussy has 1292 values, but in the plot they appear to have the same number of points.
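A rough sketch of why the two vectors can share a plot even though their lengths differ, assuming a hop length of 512 and a sample rate of 22050 Hz (neither is stated in the question; they are chosen because they reproduce the 1292 figure). The envelope does have far fewer values, but each value is placed at the time of its frame, so both curves span the same time axis:

```python
import math
import numpy as np

num_samples = 661500   # length of the signal quoted in the question
hop_length = 512       # assumption
sr = 22050             # assumption

# One envelope value per hop, including the final partial frame.
num_frames = math.ceil(num_samples / hop_length)

# Time (in seconds) at which each envelope value is plotted.
frame_times = np.arange(num_frames) * hop_length / sr

duration = num_samples / sr

print(num_frames)        # number of envelope values
print(duration)          # signal duration in seconds
print(frame_times[-1])   # time of the last envelope value
```

So the size is reduced by roughly a factor of hop_length; only the x-axis (time) is shared.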
Nice tutorial. I agree with the comment by Wolfram Blechner that installing packages is the hard part. Without that, nothing can be done. So, I suggest providing info on that, or better yet showing at least one way of doing it. For example, after installing anaconda and jupyterLab I made a 'tutorial' environment that works for me using: conda create -n tutorial matplotlib numpy librosa ipython
Would you ever want to use the maximum absolute value instead of the raw maximum? Do you think those would be too similar where it doesn’t matter, or would it be useful on a more volatile signal (something where a digital process has introduced error)
Thank you for your video! It's very helpful. I have a question: why are there some signal values that seem higher than the plotted AE values? Shouldn't all the maximal values be included in the amplitude envelope? The code seems correct, though.
Hi Valerio, I know this is an old video, but I would like to point out that to convert frames to time you need to specify the sample rate. I'm assuming librosa uses a default that happens to match the audio you used, 22050?
Awesome tutorial! Quick question on the amplitude envelope plot. How come some of the amplitude spikes in the original waveform are not captured by the red AE line?
Lucas, as a test, I trimmed the 3 audio files down to just a couple of hundred samples around the big spike in Debussy (samples 355148:355348). The Debussy amplitudes in these 200 samples ranged from -0.62515 to +0.2832, but it looks like librosa.display.waveplot draws an envelope that is symmetric and ranges from -0.62515 to +0.62515. So I guess the waveplot tool is actually displaying an envelope of plus/minus abs(sample values) rather than drawing the actual amplitude values.
Thank you. I have a question, please: why is this method better than using hilbert()?
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.signal import hilbert, chirp

    duration = 1.0
    fs = 400.0
    samples = int(fs*duration)
    t = np.arange(samples) / fs
    signal = chirp(t, 20.0, t[-1], 100.0)
    signal *= (1.0 + 0.5 * np.sin(2.0*np.pi*3.0*t))
    # The amplitude envelope is given by the magnitude of the analytic signal.
    analytic_signal = hilbert(signal)
    amplitude_envelope = np.abs(analytic_signal)
    print(amplitude_envelope)
    plt.plot(t, signal, label='signal')
    plt.plot(t, amplitude_envelope, label='envelope')
    plt.show()
Hello, thank you very much. It has been very useful for me. I'm having trouble understanding alpha in librosa.display. How does it affect the color in this plot? I mean, why do some points change even though there isn't any overlap in the plot?
I am getting this error while trying to use ipd.Audio: "ValueError: rate must be specified when data is a numpy array or list of audio samples." How do I resolve it?
There is a small mistake: remove the quotation marks in the code. E.g., it should be ipd.Audio(debussy_file) instead of ipd.Audio('debussy_file').
In the final amplitude envelope plots, I notice that there are a few peaks in blue that go above the amplitude envelope that was calculated. How did that happen?
Hi, thank you for your efforts. It's a really awesome video and explanation. However, is it possible to split overlapping audio? For example, if two people in one recording speak at the same time, their voices overlap. Is it possible to split the audio and recognise which parts belong to speaker 1 or speaker 2? Thank you.
Hi, first I want to thank you so much, because I have watched all the videos up to this one and each is extremely beneficial. But now I have a problem. To import librosa, I first installed it from the Anaconda prompt using "conda install -c conda-forge librosa". After that, I tried to import librosa again, but I got an error in Jupyter: "ModuleNotFoundError: No module named 'librosa'". I searched for it on Google, but the only solution I can find is this same command ("conda install -c conda-forge librosa"). If you can help me, I will be thankful. Thanks in advance 🙏🙏
It is a great tutorial, but I have a question regarding librosa.frames_to_time. Since each frame contains 1024 samples, does the function always return the timestamp associated with the 1st sample of each frame? If so, suppose there is a frame where the maximum occurs at the 100th sample. If we now draw the amplitude envelope on the waveplot, we are assigning the 100th sample's value to the timestamp of the frame's 1st sample. Am I correct?
That's a good point! I checked the documentation, but it doesn't say how librosa.frames_to_time works under the hood (you should check the code to learn more). However, I think your assumption regarding getting the timestamp for the first sample of a frame is reasonable. If that's the case, your second assumption is also correct.
@@ValerioVelardoTheSoundofAI Thank you very much for your prompt response and advice. After digging into the implementation, I discovered that the 1st assumption is correct in this scenario because we have not passed anything to the n_fft parameter. Hence, it will be the 1st sample of each frame. However, if n_fft is not 0 or None, then it will not be the 1st sample.
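The behaviour described in this exchange can be sketched roughly as follows. This is a plain NumPy re-implementation written for illustration, not librosa's actual code; the function name is hypothetical, and the defaults (sr=22050, hop_length=512) and the n_fft // 2 offset reflect my understanding of the library, so check the source of your librosa version before relying on it:

```python
import numpy as np

def frames_to_time_sketch(frames, sr=22050, hop_length=512, n_fft=None):
    """Approximate frame-index to time conversion (illustrative only).

    Without n_fft, each frame maps to the time of its first sample.
    With n_fft, the timestamp is shifted by half a window (assumed behaviour).
    """
    offset = n_fft // 2 if n_fft is not None else 0
    samples = np.asarray(frames) * hop_length + offset
    return samples / sr

frames = np.arange(3)

start_times = frames_to_time_sketch(frames)                 # first sample of each frame
centred_times = frames_to_time_sketch(frames, n_fft=1024)   # shifted by 512 samples

print(start_times)
print(centred_times)
```

Under these assumptions the envelope value of a frame is indeed drawn at the frame's start time, so a peak that occurs later in the frame appears up to one frame length too early in the plot.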
There is another classical way of extracting envelope of a signal which uses Hilbert Transform. Please refer to this article. en.wikipedia.org/wiki/Analytic_signal
Can someone explain why we need to convert the frames array to time using the frames_to_time function? Can someone briefly describe what we are actually doing there and why it is necessary?
This is great!! Thank you!! A couple of questions if you have time, no worries if not!... Do you have a link for signing up to the soundofAI Google community? I've spent an hour already but haven't worked it out yet; I accidentally made my own community 😂 Google Plus seems to have disappeared. I'll have another go later. I'm working with fairly high-frequency animal calls and my amplitude envelopes are not looking quite right (they follow the signal both negative and positive). I'm not sure if it's due to the high frequency, or if maybe I need to make the amplitude of all the signals uniform before I start, or probably I've made a mistake somewhere. Not sure if I can post a screenshot; if I can, I will... (My sample frequency is 192 kHz, and I've also tried using a smaller frame size of only 64.) Strangely, the AE worked for one of my three signals but not the other two, so maybe I've just forgotten some code for the other two somewhere.
Hello! First of all, thank you for all the videos; your content is really interesting! I have two questions (maybe they are basic): 1) I have seen the video on leakage; however, I do not understand why high frequencies appear at a discontinuity. And why does this "discontinuity" appear (we do not replay the sample, even though it can't be separated into an integer number of periods)? 2) I have tried with 2 different samples (a piece of music that I have and the noise file from your website), but when I calculate the time for plt.plot(t, etc.) I have to calculate it specifically for each sample. I cannot use the same t, because then an error appears. Thank you again for all of your videos! :D
If I remember correctly, I picked these files from the GTZAN Genre Dataset. I think you can find a link to the dataset in a previous video in the series.
Thank you so much for your videos. I really appreciate the time you've taken for your explanations. I have one question on the amplitude envelope discussed here. I thought it would be the same as an envelope detector (such as a Hilbert transform or a rectifier with a low-pass filter), but in the video you get a scalar per frame, whereas an envelope detector would provide a vector of the length of the frame. I don't understand the difference between the two types of envelopes, concept-wise. Is the one in this video more for assessing loudness? If you have the time, I'd love to watch a video of yours on envelope detectors in Python.
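To make the contrast in this question concrete, here is a small sketch that computes both kinds of envelope for the same signal. It assumes a synthetic amplitude-modulated tone (the 440 Hz carrier, the 3 Hz modulation, and the 8000 Hz sample rate are all made up), a frame size of 1024, and a hop length of 512. The analytic-signal envelope is built with a NumPy FFT instead of scipy.signal.hilbert, and both function names are hypothetical:

```python
import numpy as np

def analytic_envelope(signal):
    """Per-sample envelope: magnitude of the analytic signal (FFT-based Hilbert)."""
    n = len(signal)
    spectrum = np.fft.fft(signal)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # and zero out negative frequencies.
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))

def framewise_envelope(signal, frame_size=1024, hop_length=512):
    """Per-frame envelope: one scalar (max of |signal|) for each frame."""
    rectified = np.abs(signal)
    return np.array([
        rectified[i:i + frame_size].max()
        for i in range(0, len(rectified), hop_length)
    ])

sr = 8000                                             # made-up sample rate
t = np.arange(sr) / sr                                # one second of audio
modulation = 1.0 + 0.5 * np.sin(2 * np.pi * 3 * t)    # slow amplitude modulation
tone = modulation * np.sin(2 * np.pi * 440 * t)       # modulated carrier

per_sample = analytic_envelope(tone)    # same length as the signal
per_frame = framewise_envelope(tone)    # one value per hop

print(per_sample.shape, per_frame.shape)
```

The first result has one value per sample and tracks the instantaneous amplitude; the second is a coarse, downsampled summary with one value per frame, which is closer to what a frame-level feature for a machine learning model needs.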
Thank you for the tip! I still haven't found a way of resizing the font in Jupyter. That's one of the n reasons why I prefer to make videos with Pycharm :D
That was great and laborious work, thank you a lot. But please, it would be really good if you showed how to read input data from multiple .wav files in an audio folder. Thank you.
Hi! I am excited about this calculation. Currently, the envelope is calculated over the whole piece, but I am wondering whether there is any way to get an array of envelope values for every second. I mean, in the first second the amplitude_envelope is [[0.00000000e+00, 2.32199546e-02, 4.64399093e-02, ..., 4.00544218e+01, 4.00776417e+01, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00], and the second is..., and so on. Thanks!
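One possible way to get a separate envelope array for each second, sketched under the assumption of a 22050 Hz sample rate, a frame size of 1024, and a hop length of 512. The function name is hypothetical and the random signal is only a stand-in for a loaded audio file:

```python
import numpy as np

def envelope_per_second(signal, sr=22050, frame_size=1024, hop_length=512):
    """Split the signal into one-second chunks and return one envelope array per chunk."""
    chunks = []
    for start in range(0, len(signal), sr):
        second = np.abs(signal[start:start + sr])   # rectify this one-second chunk
        chunks.append(np.array([
            second[i:i + frame_size].max()
            for i in range(0, len(second), hop_length)
        ]))
    return chunks

# Stand-in for real audio: a little over 3 seconds of random samples.
rng = np.random.default_rng(0)
fake_signal = rng.uniform(-1, 1, size=3 * 22050 + 100)

per_second = envelope_per_second(fake_signal)

print(len(per_second))        # number of one-second chunks (last one is partial)
print(len(per_second[0]))     # envelope values within the first second
```

Frames here never cross a second boundary, so the values near each boundary differ slightly from those of an envelope computed over the whole piece.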
For anyone in 2023 who is running into issues with librosa.display.waveshow(debussy) or librosa.display.waveplot(debussy): I was able to get it to work using numpy instead:
    import numpy as np
    import matplotlib.pyplot as plt
    plt.plot(np.linspace(0, len(debussy) / 22050, len(debussy)), debussy)
    plt.title("debussy")
    plt.ylim([-1, 1])