Speech Recognition And Summarization System In Python [Project Tutorial]

17K views

We'll build a system that can recognize speech in audio files, generate a transcript, and summarize the transcript.
By the end, you'll have a full speech recognition system that will run on your computer, and be able to transcribe and summarize podcasts, lecture notes, and meeting recordings.
All of the code will be written in JupyterLab using Python. You can read an overview of the project, and see the full code, here - github.com/dataquestio/projec... .
We'll start out by building a speech recognition system using a Python package called vosk for the recognition and pydub for loading the audio. Then, we'll use recasepunc to add punctuation to our transcripts. We'll then use a Hugging Face pipeline to summarize the transcripts.
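Here is a minimal sketch of the vosk + pydub step, for readers who want to see its shape before watching. It is not the tutorial's exact code. The function names `transcribe`, `chunk_offsets`, and `text_from_result`, the 45-second chunk size, and the 16 kHz / mono / 16-bit settings are assumptions. The model name is the one quoted in the comments below. vosk and pydub are imported inside `transcribe` so the two small helpers can be tried without either package installed.

```python
import json
from typing import List, Tuple

# Assumed audio settings: 16 kHz, mono, 16-bit PCM.
FRAME_RATE = 16000
CHANNELS = 1
SAMPLE_WIDTH = 2  # bytes per sample (16-bit)


def text_from_result(result_json: str) -> str:
    """Extract the recognized text from a vosk JSON result string."""
    return json.loads(result_json).get("text", "")


def chunk_offsets(total_ms: int, step_ms: int) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond windows that cover the whole recording."""
    return [(start, min(start + step_ms, total_ms)) for start in range(0, total_ms, step_ms)]


def transcribe(filename: str, model_name: str = "vosk-model-en-us-0.22", step_ms: int = 45000) -> str:
    """Transcribe an mp3 file with vosk, feeding the recognizer one chunk at a time."""
    # Imported lazily so the helpers above work without vosk or pydub installed.
    from pydub import AudioSegment
    from vosk import KaldiRecognizer, Model

    model = Model(model_name=model_name)  # look the model up by name
    rec = KaldiRecognizer(model, FRAME_RATE)
    rec.SetWords(True)  # include per-word timing in the results

    audio = AudioSegment.from_mp3(filename)  # pydub needs ffmpeg for mp3 decoding
    audio = audio.set_channels(CHANNELS).set_frame_rate(FRAME_RATE).set_sample_width(SAMPLE_WIDTH)

    pieces = []
    for start, end in chunk_offsets(len(audio), step_ms):  # len(audio) is in milliseconds
        # AcceptWaveform returns True once vosk has finalized an utterance.
        if rec.AcceptWaveform(audio[start:end].raw_data):
            pieces.append(text_from_result(rec.Result()))
    # Flush whatever audio is still buffered in the recognizer.
    pieces.append(text_from_result(rec.FinalResult()))
    return " ".join(piece for piece in pieces if piece)
```

Usage (not run here): `transcript = transcribe("marketplace.mp3")`. Feeding audio in fixed windows keeps memory use predictable on long recordings. The result still has no punctuation, which is what the recasepunc step adds.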
You'll need to install some packages and download 2 audio files, which is covered in the README - github.com/dataquestio/projec... .
We have some viewer Q&A at the end, which may help you if you have any issues with the project.
Chapters:
00:00 - Introduction
01:53 - Speech recognition using vosk
11:34 - Adding punctuation to our transcript with recasepunc
17:01 - A function to transcribe longer audio files
21:13 - Summarizing the transcripts using huggingface transformers
27:21 - Wrapping up with a project overview
28:22 - Q&A: Speech recognition using vosk
31:45 - Q&A: Adding punctuation to our transcript with recasepunc
35:03 - Q&A: A function to transcribe longer audio files
37:38 - Q&A: Summarizing the transcripts using huggingface transformers
39:26 - Q&A: Project overview
-----------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: bit.ly/3O8MDef

Published: 11 Jul 2024

Comments: 44

@aranlufthansa45 2 years ago

Really helpful so far. These videos are a blessing, especially because of the depth the tutorials go into. Although Dataquest is a little too expensive for my pockets, this YouTube channel is really a goldmine.

@Dataquestio 2 years ago

Glad to hear you like them, Aran! We regularly have sales and scholarships for Dataquest to make it more affordable.

@angelandaarvee6437 10 months ago

Nice video... you really are good at teaching. Nice work, thanks!

@stephanefedim6759 11 days ago

Thanks for this high-quality video. Can you explain how the splitting affects the semantics of the summarization? Assuming you didn't apply intelligent tokenization, batches of 850 tokens can cut across ideas that span the document as a whole.
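One common way to soften the boundary problem raised here is to let consecutive chunks overlap, so a sentence cut at the end of one chunk reappears at the start of the next. The sketch below is an assumption, not the tutorial's method. It splits on whitespace, as the comment suggests the video does, and `chunk_words` and its `overlap` parameter are hypothetical names. Words are not model tokens, so 850 words may exceed a summarization model's input limit.

```python
from typing import List


def chunk_words(transcript: str, chunk_size: int = 850, overlap: int = 0) -> List[str]:
    """Split a transcript into word chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = transcript.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this window already reaches the end of the transcript.
        if start + chunk_size >= len(words):
            break
    return chunks
```

Usage (not run here): with `from transformers import pipeline` and `summarizer = pipeline("summarization")`, call `summarizer(chunk_words(transcript, 850, overlap=50))` and join the `summary_text` fields. Overlap does not make the summarizer aware of the whole document. It only reduces how often an idea is split cleanly in half.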

@apostolistzimas 2 years ago

Hi, and thanks for the tutorial. Can you make a tutorial on how to add custom vocabulary to the dictionary of the vosk model you used in this tutorial? I have read the documentation, but I can't get my head around it. I am using Python in the PyCharm IDE, and I don't know whether it requires Linux. It's definitely something advanced.

@paulaganbi5236 7 months ago

Hey Vik. Thank you for this video. I'd really appreciate a quick response on this one, lol: is it possible to integrate a topic modeler with this script? If so, how would I achieve it? Thank you

@hamzafayaz775 1 year ago

Hi bro, can you make a video on transcription with speaker diarization, where it specifies what each speaker said? I need this, and I would be glad if you could make a video or guide me through it.

@tanishqhazari8760 1 year ago

Not able to load the mp3 file. Getting: [WinError 2] The system cannot find the file specified, even though the file is in the correct directory.

@petkovicjanuario9674 1 year ago

I was having the same problem and solved it by installing ffmpeg from conda-forge (I use an Anaconda environment, as suggested at the beginning of the Data Science Path). Run the following command in the Anaconda Prompt: conda install -c conda-forge ffmpeg
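A `[WinError 2]` on load, or the `ffprobe` FileNotFoundError reported further down this page, usually means pydub could not find the ffmpeg binaries it runs to decode mp3 files, not that the mp3 is missing. The sketch below is a quick check you could run before loading audio. `missing_audio_tools` and `REQUIRED_TOOLS` are hypothetical names, and the tool list assumes pydub's usual ffmpeg/ffprobe backend.

```python
import shutil
from typing import Iterable, List

# pydub shells out to these programs when decoding mp3 files (assumed ffmpeg backend).
REQUIRED_TOOLS = ("ffmpeg", "ffprobe")


def missing_audio_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> List[str]:
    """Return the names of the given command-line tools that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Usage: `print(missing_audio_tools())`. An empty list means both programs are on PATH. Otherwise, install ffmpeg (for example with the conda command above) and restart the Jupyter kernel so the updated PATH is picked up.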

@bravelionable 2 years ago

Thanks for this tutorial. Can you help with a function for removing silence from audio? Thank you.

@Dataquestio 2 years ago

Hi there - I can consider doing this in an upcoming video. Also take a look at this - github.com/WyattBlue/auto-editor . -Vik

@bravelionable 2 years ago

@@Dataquestio thank you 😊
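As a rough illustration of what "removing silence" means at the sample level, here is a pure-Python sketch that drops any long run of quiet samples and keeps short pauses between words. `strip_silence` and its amplitude `threshold` / `min_run` parameters are hypothetical, and the values you pass would need tuning for your recordings.

```python
from typing import List, Sequence


def strip_silence(samples: Sequence[int], threshold: int, min_run: int) -> List[int]:
    """Drop every run of at least `min_run` consecutive quiet samples.

    A sample is quiet when its absolute amplitude is below `threshold`.
    Shorter quiet runs are kept so natural pauses between words survive.
    """
    kept: List[int] = []
    quiet_run: List[int] = []
    for sample in samples:
        if abs(sample) < threshold:
            quiet_run.append(sample)
            continue
        # A loud sample ends the quiet run: keep the run only if it was short.
        if len(quiet_run) < min_run:
            kept.extend(quiet_run)
        quiet_run = []
        kept.append(sample)
    # Handle a quiet run that lasts until the end of the signal.
    if len(quiet_run) < min_run:
        kept.extend(quiet_run)
    return kept
```

With pydub, `audio.get_array_of_samples()` returns raw samples you could pass in. In practice, though, `pydub.silence.split_on_silence(audio, min_silence_len=..., silence_thresh=...)` works directly on an `AudioSegment`, using milliseconds and dBFS, and is usually the more convenient choice.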

@jordancarbone3474 2 years ago

Where is the model file pulled from? I get a FileNotFoundError when I try to run the 3rd line of code: model = Model(model_name="vosk-model-en-us-0.22")

@hareshsuppiah9899 2 years ago

I get the same error

@samuelmichael7909 2 years ago

You'll need to install some packages and download 2 audio files, which is covered in the README - github.com/dataquestio/project-walkthroughs/blob/master/speech_recognition/README.md

@hareshsuppiah9899 2 years ago

Managed to fix it by following the setup shown in this other YT video: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-3Mga7_8bYpw.html&ab_channel=BrandonJacobson

@nailouza557 2 years ago

@@samuelmichael7909 I've installed all the packages mentioned and downloaded the 2 audio files into the same repository, still getting the same error. Anyone else figure out how to pull the model file?

@Dataquestio 2 years ago

Hi there - the model files download automatically for me with vosk version 0.3.42 on mac. You can see the vosk code here that downloads the models - github.com/alphacep/vosk-api/blob/master/python/vosk/__init__.py . Is it possible that you have an older version of vosk? Are you using mac, linux, or windows? If the models don't automatically download, you can still get the models from here - alphacephei.com/vosk/models . You just have to put them in /usr/share/vosk, or in your home folder at .cache/vosk. Make sure to extract the zip file into a folder with the same name as the model.
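If you download a model by hand, the reply above says to extract it into /usr/share/vosk or ~/.cache/vosk, in a folder with the same name as the model. The sketch below uses a hypothetical helper, `find_vosk_model`, to check those two locations so you can confirm the folder is where you expect. It searches only the two directories named in the reply and is not a full copy of vosk's own lookup logic.

```python
from pathlib import Path
from typing import Iterable, Optional

# The two locations mentioned in the reply above.
DEFAULT_MODEL_DIRS = (Path("/usr/share/vosk"), Path.home() / ".cache" / "vosk")


def find_vosk_model(model_name: str, search_dirs: Iterable[Path] = DEFAULT_MODEL_DIRS) -> Optional[Path]:
    """Return the first extracted model folder called `model_name`, or None."""
    for base in search_dirs:
        candidate = Path(base) / model_name
        if candidate.is_dir():  # the zip must be extracted, not left as a .zip file
            return candidate
    return None
```

Usage (not run here): `path = find_vosk_model("vosk-model-en-us-0.22")`. If it returns a folder, `Model(model_path=str(path))` loads it explicitly, which avoids relying on the automatic download. If it returns `None`, the folder is missing, misnamed, or still zipped.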

@FranciscoHernandezAlejandre 11 months ago

Any solution for this? RuntimeError('Error(s) in loading state_dict for {}: \t{}'.format( RuntimeError: Error(s) in loading state_dict for Model: Unexpected key(s) in state_dict: "bert.embeddings.position_ids".

@a092devs 1 month ago

Did you fix your issue?
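A likely cause of the "Unexpected key(s) in state_dict: bert.embeddings.position_ids" error is that the recasepunc checkpoint was saved with an older `transformers` release that stored `position_ids` in the model's state dict. Newer releases no longer expect that key, so strict loading rejects it. This diagnosis is an assumption, not something confirmed in the video. Reported workarounds include installing an older `transformers` version, passing `strict=False` to PyTorch's `load_state_dict`, or removing the stale key before loading. The hypothetical `drop_position_ids` helper below does the last of these on a plain dict.

```python
from typing import Any, Dict


def drop_position_ids(state_dict: Dict[str, Any], suffix: str = "embeddings.position_ids") -> Dict[str, Any]:
    """Return a copy of `state_dict` without keys ending in `suffix`.

    The input dict is left untouched, so the original checkpoint data is not modified.
    """
    return {key: value for key, value in state_dict.items() if not key.endswith(suffix)}
```

Usage (an assumption about where recasepunc.py loads its weights): wherever the script calls `model.load_state_dict(state_dict)`, call `model.load_state_dict(drop_position_ids(state_dict))` instead.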

@AtticusDenzil 1 year ago

really cool shit!

@warrenjudsonmcsa2961 2 years ago

Hey, where is the marketplace.mp3 file supposed to live?

@Dataquestio 2 years ago

Hi Warren - this should go in the same directory as your code file. If you want to put it in a folder, just change the path when you open it. For example, use `AudioSegment.from_mp3("marketplace.mp3")` if it is in the same directory, or `AudioSegment.from_mp3("folder/marketplace.mp3")` if it is in a folder.

@warrenjudsonmcsa2961 2 years ago

@@Dataquestio I had it in the right spot, but I had to manually install ffmpeg to the bin folder to get AudioSegment to play "marketplace.mp3". Thanks for the response.
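Relative paths such as "marketplace.mp3" are resolved against the current working directory. In Jupyter, that is normally the folder the notebook lives in. The hypothetical helper below prints the absolute path it checked, which makes "file not found" errors like the ones in this thread easier to pin down. If the path it prints is correct and loading still fails, the problem is more likely a missing ffmpeg, as in the reply above.

```python
from pathlib import Path
from typing import Optional, Union


def resolve_audio_path(name: Union[str, Path], base_dir: Optional[Path] = None) -> Path:
    """Resolve `name` against `base_dir` (default: the current working directory)."""
    base = Path.cwd() if base_dir is None else Path(base_dir)
    path = (base / name).resolve()  # an absolute `name` ignores `base`
    if not path.is_file():
        # Report the absolute path so it is obvious where Python actually looked.
        raise FileNotFoundError(f"Audio file not found: {path}")
    return path
```

Usage (not run here): `AudioSegment.from_mp3(str(resolve_audio_path("marketplace.mp3")))`.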

@hopelesssuprem1867 2 years ago

Can I create a diarizer with vosk, i.e., get a conversation labeled by speaker (speaker1: text... and so on)?

@Dataquestio 2 years ago

Yes, vosk does have a speaker identification model - you can find that model here - alphacephei.com/vosk/models .

@hopelesssuprem1867 2 years ago

@@Dataquestio Thank you, but I can't understand how to use it. There is an example on GitHub, but it only shows x-vectors, and I don't know how to get speaker1, speaker2, and so on. I would be thankful for any hint in this direction.

@AkashBNHAI 4 months ago

@@hopelesssuprem1867 Hi, did you figure out how to add speaker identification and integrate it with the code? Please help out.
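The speaker model gives you one x-vector (a list of numbers) per recognized utterance. Turning those vectors into "speaker1 / speaker2" labels is a clustering step you have to add yourself. Below is a hypothetical, deliberately simple approach, not part of vosk. Each utterance is compared with the first x-vector seen for each known speaker using cosine similarity. It joins the best match if the similarity reaches a threshold and otherwise starts a new speaker. `label_speakers`, `cosine_similarity`, and the default threshold are assumptions to tune on your own audio.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_speakers(xvectors: Sequence[Sequence[float]], threshold: float = 0.5) -> List[str]:
    """Greedily assign a speaker label to each utterance's x-vector.

    Each speaker is represented by the first x-vector assigned to them. An
    utterance joins the most similar speaker if the similarity reaches
    `threshold`; otherwise it starts a new speaker.
    """
    references: List[Sequence[float]] = []
    labels: List[str] = []
    for vec in xvectors:
        scores = [cosine_similarity(vec, ref) for ref in references]
        if scores and max(scores) >= threshold:
            index = scores.index(max(scores))
        else:
            references.append(vec)
            index = len(references) - 1
        labels.append(f"speaker{index + 1}")
    return labels
```

In vosk's speaker example, each result's x-vector appears alongside the recognized text, so you can pair each label with that result's text to print "speaker1: ..." lines. Greedy matching against a single reference vector is crude. Averaging vectors per speaker, or using a proper clustering algorithm, should be more robust.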

@bensoul2687 1 year ago

I get an error (FileNotFoundError): the system cannot find the path specified.

@Dataquestio 1 year ago

Is the file at the path you specified? You want to make sure to download the file and move it to the right directory.

@petkovicjanuario9674 1 year ago

@@Dataquestio I use the Anaconda Environment and running "conda install -c conda-forge ffmpeg" in Anaconda Prompt solved it.

@maphilak 2 years ago

Please do a video on people counting in video/images using OpenCV and Python.

@Dataquestio 2 years ago

Thanks for the idea, Maphila! I'll consider it for an upcoming video.

@abhinavab5673 1 year ago

Frame-Rate = 16000
^
SyntaxError: cannot assign to operator
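This error comes from the variable name itself. Python identifiers can contain letters, digits, and underscores but not hyphens, so `Frame-Rate = 16000` is read as `Frame - Rate = 16000`, an attempt to assign to a subtraction. Renaming the variable fixes it. The names below follow the usual uppercase convention for constants. `CHANNELS` is included on the assumption that the audio is converted to mono.

```python
# Hyphens are not allowed in Python names; use underscores instead.
FRAME_RATE = 16000  # sample rate (Hz) the audio is converted to before recognition
CHANNELS = 1        # mono audio (assumption)
```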

@oxydol3456 2 months ago

Importing a predefined ASR model from vosk. I see.

@stonejack-xc3vp 1 year ago

6 de lai~

@casafurix 1 year ago

hmm nice rick-roll

@udiibgui2136 1 year ago

Hi! Thank you for the tutorial. I am getting:
/opt/anaconda3/lib/python3.8/site-packages/pydub/utils.py:170: RuntimeWarning: Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work
  warn("Couldn't find ffmpeg or avconv - defaulting to ffmpeg, but may not work", RuntimeWarning)
which results in FileNotFoundError: [Errno 2] No such file or directory: 'ffprobe' when I try loading the file. Any help?

@Dataquestio 1 year ago

You need to install ffmpeg - ffmpeg.org/download.html

@warrenjudsonmcsa2961 2 years ago

Hello, in the video you had `user/vik/.virtualenvs/voice/bin/python` as a path, followed by `recasepunc/recasepunc.py predict recasepunc/checkpoint", shell=True, text=True, input=text)` as the rest of the code. I have put the 2 files (checkpoint and recasepunc.py) in my env environment, but I keep getting a CalledProcessError. Here it is: CalledProcessError: Command '/Users/wjuds/anaconda3/envs/Dataquestvoice_20220702 recasepunc/recasepunc.py predict recasepunc/checkpoint' returned non-zero exit status 1. I read through Stack Overflow, but I'm even more confused. Please assist again. Thanks

@Dataquestio 2 years ago

Hi Warren - user/vik/.virtualenvs/voice/bin/python is the path to your python interpreter. From the path that you shared, it looks like you'd want to use `/Users/wjuds/anaconda3/envs/Dataquest/bin/python` as the interpreter. `recasepunc/recasepunc.py` is the path to your recasepunc file. It's a relative path, so it should be in the same directory as your code.

@samuelmichael7909 2 years ago

CalledProcessError                        Traceback (most recent call last)
in ()
      1 import subprocess
      2
----> 3 cased = subprocess.check_output("python recasepunc/recasepunc.py predict recasepunc/checkpoint ", shell=True, text=True, input=text)

1 frames
/usr/lib/python3.7/subprocess.py in run(input, capture_output, timeout, check, *popenargs, **kwargs)
    510     if check and retcode:
    511         raise CalledProcessError(retcode, process.args,
--> 512                                  output=stdout, stderr=stderr)
    513     return CompletedProcess(process.args, retcode, stdout, stderr)
    514

CalledProcessError: Command 'python recasepunc/recasepunc.py predict recasepunc/checkpoint ' returned non-zero exit status 2.

Please, how do I fix this?

@Dataquestio 2 years ago

Hi Samuel - did you download the vosk punctuation model from alphacephei.com/vosk/models/vosk-recasepunc-en-0.22.zip and unzip it into the directory with your code? It should be unzipped into a folder called recasepunc. You also need to install torch and transformers.
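Both errors in these threads fit the causes named in the replies: the command pointed at something other than a Python interpreter (an environment folder instead of its `bin/python`), the recasepunc files were not where the relative paths expect them, or torch/transformers were missing. Python itself exits with status 2 when it cannot open the script file it was given. The hypothetical `add_punctuation` helper below sidesteps the interpreter path by using `sys.executable`, the Python running your notebook, so the subprocess sees the same installed packages. It checks that both recasepunc files exist and surfaces recasepunc's own error output instead of a bare exit status. It assumes the `recasepunc/recasepunc.py` and `recasepunc/checkpoint` layout described above.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

RECASEPUNC_SCRIPT = Path("recasepunc") / "recasepunc.py"
RECASEPUNC_CHECKPOINT = Path("recasepunc") / "checkpoint"


def recasepunc_command(script: Path = RECASEPUNC_SCRIPT, checkpoint: Path = RECASEPUNC_CHECKPOINT) -> List[str]:
    """Build the recasepunc command, run with the current Python interpreter."""
    # sys.executable is the interpreter running this notebook/script, so the
    # subprocess sees the same installed packages (torch, transformers, ...).
    return [sys.executable, str(script), "predict", str(checkpoint)]


def add_punctuation(text: str, script: Path = RECASEPUNC_SCRIPT, checkpoint: Path = RECASEPUNC_CHECKPOINT) -> str:
    """Pipe `text` through recasepunc and return the cased, punctuated output."""
    for required in (script, checkpoint):
        if not Path(required).exists():
            raise FileNotFoundError(f"recasepunc file missing: {Path(required).resolve()}")
    result = subprocess.run(
        recasepunc_command(script, checkpoint),
        input=text,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        # Show recasepunc's own error message instead of only the exit status.
        raise RuntimeError(f"recasepunc failed (exit status {result.returncode}):\n{result.stderr}")
    return result.stdout
```

Usage (not run here): `cased = add_punctuation(transcript)`. Passing the command as a list without `shell=True` also avoids shell quoting problems when paths contain spaces.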