Real-Time Speech Recognition With Your Microphone [Beginner Tutorial With Full Code]

52K views

Build a real-time local speech recognition system that uses your microphone with Python and Jupyter. This will run on your own computer, without the need for a cloud service or a GPU.
By the end, you'll have a fully working Jupyter notebook that can record microphone audio, transcribe it, and display it. You'll also have ideas for how you can extend it.
The full code and a project overview are here - github.com/dataquestio/projec... .
Chapters
00:00 Project overview
02:14 Creating Jupyter widgets to start and stop recording
11:13 Recording from your microphone with pyaudio
20:08 Recognizing live speech with vosk
29:51 Project overview and use cases
------------------------------
Join 1M+ Dataquest learners today!
Master data skills and change your life.
Sign up for free: bit.ly/3O8MDef

Published: Jul 10, 2024

Comments: 59

@SashaBaych 8 months ago

Amazing! Thank you so much for this thorough and clear tutorial!

@benyusu8045 8 months ago

in thread Thread problem

@aparnnaperi 1 year ago

Thank you for this, it worked for me. The explanation was also very clear in the tutorial, keep up the good work.

@michaelcamangeg1199 1 year ago

Thank you!

@shyjukoppayilthiruvoth6568 1 year ago

great tutorial

@aparnnaperi 1 year ago

Hi, as mentioned in the video, the output does take a long time to get to the screen. Is there a tutorial on how to use the recasepunc model directly in the same notebook? Any help would be greatly appreciated. Thanks.

@meditationandrelaxationmus7158 9 months ago

Thank you, it was a great tutorial. Could you please make a similar one about speaker identification with vosk?

@wachsenmitaktien3593 2 years ago

Sounds really interesting - is this the level of project you work on with a Dataquest subscription, or is the members area more of a prequel to what you learn on the YT channel?

@Dataquestio 2 years ago

Hi Wachsen - on Dataquest, we have courses that teach you data concepts, as well as projects to help you apply your skills. We have both guided projects, which have more guidance than these projects, and portfolio projects, which are similar to the YouTube projects (with some added instructions, etc). So Dataquest both helps you learn all of the data skills, and has projects to pull it all together.

@PressF5 1 year ago

Can someone help me? Everything works fine, but the two widget buttons (record and stop) are not displaying in my JupyterLab.

@user-jb4kt6vu9s 7 months ago

Hi, can you tell me how to run more than two language models at the same time? A video tutorial on this would be a great help.

@hssp1534 1 year ago

How do I load the model if I have already downloaded it? After I entered the model name and ran the block of code, it started downloading the model separately. Please advise how to load a model that is already downloaded.

@anybcd 2 years ago

Thanks for this, I never knew widgets could be created in a Jupyter notebook.

@hopelesssuprem1867 2 years ago

Thank you so much for this tutorial. Please make a similar one about speaker identification with vosk.

@Dataquestio 2 years ago

Thanks for the idea! -Vik

@hopelesssuprem1867 2 years ago

@Dataquestio Thanks for your answer. I will watch it with great pleasure)

@franekpodlach7217 1 year ago

Nice

@caseykauf6615 1 year ago

Can you use pycharm instead of jupyter?

@user-sg9kf8tu3d 3 months ago

Sir, I am getting an error that the subprocess returned non-zero exit status 1. Please help me solve the problem. I need to show this project at my college in 2 days.

@SIR_Studios786 1 month ago

Where is the model downloaded?

@olivercarmignani9082 1 year ago

Really nice explanation video! I tried to use it in Visual Studio Code in a while loop, but without success. Which changes need to be applied?

@quinman16 1 year ago

🙏does this work with CircuitPython?🙏

@surajt2077 10 months ago

How would you modify this to work as a web app? Or on a similar client side, like a bot that joins Zoom calls? Or a browser plug-in that you can turn on and off to transcribe live? Curious, as I want to implement something like this.

@surajt2077 10 months ago

I know you said you can’t run on the cloud. But what if you create an endpoint that receives a boolean via a button or programmatic call that then launches this code to start transcribing by accessing the local microphone from the cloud? Is this kind of thing possible?

@hautboisjc 1 year ago

Hi Vik, thanks for doing this. However, it doesn't work for me :( When I hit the record button, it doesn't transcribe whatever I say. Instead, it shows "WARNING: reverting to cpu as cuda is not available"

@Dataquestio 1 year ago

Hey there - it's hard to diagnose the issue remotely, but I wouldn't worry about the warning. CPU inference can work with vosk. The most likely issues are that there's no function connected to the on_click event for the record button, or the thread hasn't started for recording. Adding print statements in the code can help you find which pieces are working/aren't.
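
The two failure points named in this reply can be checked with a few print statements. What follows is a minimal sketch, assuming a queue-plus-thread layout. The names `messages`, `start_recording`, and `record_microphone` are illustrative, and the audio read is replaced by a placeholder, so this is not necessarily the notebook's actual code:

```python
import threading
import time
from queue import Queue

messages = Queue()  # assumed convention: non-empty while recording should continue


def record_microphone(max_chunks=3):
    """Placeholder for the recording loop; a real version would read from the microphone."""
    print("record thread: started")
    chunks = 0
    while not messages.empty() and chunks < max_chunks:
        time.sleep(0.01)  # stands in for a blocking audio read
        chunks += 1
        print(f"record thread: got chunk {chunks}")
    print("record thread: finished")
    return chunks


def start_recording(button=None):
    """Handler meant to be passed to the record button's on_click."""
    print("handler: record button clicked")
    messages.put(True)
    thread = threading.Thread(target=record_microphone, daemon=True)
    thread.start()
    print("handler: thread started")
    return thread
```

In a notebook the handler would be attached with `record_button.on_click(start_recording)`, where `record_button` is an `ipywidgets.Button`. If "handler: record button clicked" never appears, the callback is probably not attached; if it appears but no "record thread" lines follow, the thread is not starting. In JupyterLab, output printed from widget callbacks may land in the log console rather than under the cell.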

@ohassairi 5 months ago

Can you add offline translation?

@pfuhad3760 1 year ago

Is Vosk the best open-source speech-to-text library? If there are others with better accuracy that don't need a GPU, can you please tell me?

@Dataquestio 1 year ago

As of now, I would use whisper instead of vosk. There is a version of whisper that runs on CPU.

@pylou7064 1 year ago

Hey, nice video ^^ So it's not real time, right? Would a 1 second delay work? And I have another question: is it possible to get other information, like when each word is pronounced? Thanks.

@Dataquestio 1 year ago

Yes, there is a short delay to process. I'll probably make another video at some point showing how to do this without vosk (so you can get more info, and do true real time).

@narayanasaicharan2217 1 year ago

@Dataquestio Eagerly waiting for the video😋

@sertacince6571 1 year ago

@Dataquestio As far as I can see, the video has not been published and I need it urgently. Could you explain roughly how to do it from here?

@rohitdarshan_dtu3520 1 year ago

Sir, I am getting an issue with the last lines of code. I would be very happy if you could clear it up; please look at it as soon as possible: model = Model(model_name="vosk-model-en-us-0.22")

@nishaldevadiga6766 8 months ago

Same. Edit: I found the solution. Your model is stored in the .cache folder, which you can find in your user folder: C:\Users\..\.cache\vosk\ Delete all the vosk model folders you have (literally delete all folders/files inside it), then run that segment of the program again. Hope it helps!
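
To check what is already cached before triggering a download, a small helper can look for the model folder. This is a sketch under assumptions: the `.cache/vosk` location under the user folder is taken from the comment above, `find_cached_model` is a hypothetical helper, and the usage lines assume the vosk `Model` constructor accepts a `model_path` keyword, which should be checked against the installed vosk version:

```python
from pathlib import Path


def find_cached_model(model_name, cache_dir=None):
    """Return the folder of an already-downloaded model, or None if it is absent."""
    cache_dir = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "vosk"
    candidate = cache_dir / model_name
    # A failed download may leave an empty or partial folder behind,
    # so only accept a directory that actually contains something.
    if candidate.is_dir() and any(candidate.iterdir()):
        return candidate
    return None


# Usage (requires vosk; the model_path keyword is an assumption to verify):
# from vosk import Model
# path = find_cached_model("vosk-model-en-us-0.22")
# if path:
#     model = Model(model_path=str(path))
# else:
#     model = Model(model_name="vosk-model-en-us-0.22")
```

A non-empty folder is only a rough check; it does not prove the model files are complete, which may be why deleting the cache and re-downloading resolved the problem above.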

@ohassairi 5 months ago

I tried it. It works, but some words get lost!! I can't figure out why.

@REALVIBESTV 1 year ago

I need something like this that can work in Unreal Engine 5.1

@sharliduravlog2 2 months ago

Is it offline or online?

@moses5407 1 year ago

On-device translation, even with a delay, would be great, too!

@Dataquestio 1 year ago

You can actually do this - github.com/mozilla/translate .

@hssp1534 1 year ago

I ran the code provided in the link, but it gives "OSError: [Errno -9998] Invalid number of channels". How do I resolve it? Please advise on a solution.

@dredmaster9343 8 months ago

I am getting the same error. Have you resolved it? If yes, please tell me how.

@hssp1534 8 months ago

@dredmaster9343 Nope, not yet. I gave up trying. I haven't revisited the code in a long time.

@stephenyipck 1 year ago

This is great but what if I want to use this code in a .py file?

@Dataquestio 1 year ago

Hi Stephen - you can write the same code in a .py file. JupyterLab also allows you to export notebooks as .py files that you can run.

@kilovolt2494 1 year ago

@Dataquestio Actually, that was part of my question. I got rid of widgets (for now) and made a .py file. It works perfectly fine, except for one little detail: when the main function stops, it doesn't terminate the script, so it hangs forever after printing "Stopped." I already tried joining threads and even explicitly saying 'quit' at the end, it still hangs. What can be the reason of that? Is that caused by vosk?
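
One common cause of this symptom in plain Python, independent of vosk, is a non-daemon worker thread that is still blocked (for example on a queue read with no timeout), which keeps the interpreter alive after the main function returns. Whether that is what happened here is not known. Below is a minimal sketch of a shutdown pattern that avoids it, with hypothetical names and a stand-in for the transcription work:

```python
import threading
from queue import Queue, Empty

stop_event = threading.Event()
recordings = Queue()
results = []


def transcribe_worker():
    """Consume audio chunks until asked to stop and the queue is drained."""
    while not (stop_event.is_set() and recordings.empty()):
        try:
            # A timeout lets the loop re-check stop_event instead of blocking forever.
            chunk = recordings.get(timeout=0.1)
        except Empty:
            continue
        results.append(chunk.upper())  # stand-in for real speech recognition


def main():
    # daemon=True means a stuck worker cannot keep the process alive on exit.
    worker = threading.Thread(target=transcribe_worker, daemon=True)
    worker.start()
    for chunk in ["hello", "world"]:
        recordings.put(chunk)
    stop_event.set()
    worker.join(timeout=2)
    print("Stopped.")
    return worker
```

In a real script it would also make sense to call `stop_stream()` and `close()` on the audio stream and `terminate()` on the PyAudio instance before exiting, so no audio resources are left open.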

@NickolayShmyrev 1 year ago

This tutorial is wrong, Vosk does not recommend using pyaudio due to latency issues. Our demos use sounddevice.

@Dataquestio 1 year ago

I wouldn't call the tutorial wrong. It works fine, and latency was not an issue from what I could tell. Both sounddevice and pyaudio are wrappers over portaudio, so they shouldn't function extremely differently (aside from the Python API being different). I couldn't find any references to sounddevice on the vosk documentation site - if that is indeed the recommended way to use vosk, I would advertise that fact somewhere.

@SARMADALHAFIDH 1 year ago

Is it free, or do I have to pay, please?

@nishaldevadiga6766 8 months ago

free

@SuperShank76 1 year ago

I took the trouble to watch the entire video but when I hit "Start Recording", there is no transcribing happening. There is no error either. Thumbs down.

@ramwarner5541 5 months ago

Sameee

@ivorpratap1479 1 year ago

Can anybody help? It says AttributeError. FYI: p = pyaudio.Pyaudio()
----> 3 p = pyaudio.Pyaudio()
      4 for i in range(p.get_device_count()):
      5     print(p.get_device_info_index(i))
AttributeError: module 'pyaudio' has no attribute 'Pyaudio'

@JonzieBoy 3 months ago

You have to be very careful with capitalization, Pyaudio is not the same as PyAudio
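
For reference, the class is `pyaudio.PyAudio` and the lookup method is `get_device_info_by_index`, so the snippet quoted in the question has a second misspelling besides the capitalization. A short sketch of the corrected loop follows, written as a function (a hypothetical `list_devices`) that takes the PyAudio instance as an argument:

```python
def list_devices(p):
    """Return (index, name) for every device a pyaudio.PyAudio instance reports."""
    devices = []
    for i in range(p.get_device_count()):
        info = p.get_device_info_by_index(i)  # "by_index", not "info_index"
        devices.append((i, info["name"]))
    return devices


# Usage (requires the pyaudio package and a working audio backend):
# import pyaudio
# p = pyaudio.PyAudio()   # capital P, capital A
# for index, name in list_devices(p):
#     print(index, name)
# p.terminate()
```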

@doritos7372 1 year ago

OSError: [Errno -9998] Invalid number of channels. I was getting this.

@hssp1534 1 year ago

I fixed it by using the right sound device. I was using my cell phone mic earlier, but then I switched to a headset and the error never appeared again.
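
The fix described here fits how devices are reported: each one advertises a maximum number of input channels, and opening a stream with more channels than that (or on an output-only device) is one way to get "Invalid number of channels". Here is a sketch of picking a usable input from the device-info dictionaries PyAudio returns. `pick_input_device` is a hypothetical helper, and the sample rate and buffer size in the usage lines are assumed values:

```python
def pick_input_device(device_infos, wanted_channels=1):
    """Return (device_index, channels) for the first device that can record, else None."""
    for info in device_infos:
        max_in = int(info.get("maxInputChannels", 0))
        if max_in >= 1:
            # Never request more channels than the device offers.
            return info["index"], min(wanted_channels, max_in)
    return None


# Usage with pyaudio (requires a working audio backend):
# infos = [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
# choice = pick_input_device(infos)   # None means no device can record
# if choice:
#     index, channels = choice
#     stream = p.open(format=pyaudio.paInt16, channels=channels, rate=16000,
#                     input=True, input_device_index=index, frames_per_buffer=1024)
```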