Real-time Speech to Text with DeepSpeech - Getting Started on Windows and Transcribe Microphone Free

132K views

Thank you very much for watching! If you liked the video, please consider subscribing to the channel :)
In this video I explain how to set up Mozilla's open-source DeepSpeech engine on Windows to recognize real-time microphone audio for free. The same process can also be used to transcribe an audio file.
DeepSpeech: github.com/mozilla/DeepSpeech
Examples: github.com/mozilla/DeepSpeech...
Common Voice: voice.mozilla.org/
Or follow me on:
Github: github.com/federico-terzi/
Twitter: twitter.com/terzi_federico
Website: federicoterzi.com

Published: 16 Jul 2024

Comments: 288

@FedericoTerzi 3 years ago

If you are interested in these topics, you can also follow me on Twitter :) twitter.com/terzi_federico

@jeongwonkim247 3 years ago

Was there a video on how to transcribe audio files into text? Please let me know, and thank you!

@ALZlper 3 years ago

I really like that you mention the platform at the end!

@dibu28 2 years ago

Thank you. Got DeepSpeech started in minutes.

@TTTrouble 2 years ago

Thanks so much for making this video, it was exactly what I was looking for!

@dayworkhard 3 years ago

Thank you for sharing. I donated my voice there. This is so cool!

@FedericoTerzi 3 years ago

That's great! :) We are one little step closer to an open voice model

@samriviera6299 3 years ago

Thanks for this video! I got everything working. As you said, it's not as good as proprietary solutions, but for simple commands like "start", "stop" or "turn on light" it should work. Looking forward to contributing.

@FedericoTerzi 3 years ago

Glad you liked it :)

@KuboF 3 years ago

Thanks for this short, straightforward, to-the-point video! From reading the manual I thought I would need to take a vacation just to learn how to run DeepSpeech; now I am very confident about doing it quite quickly!

@FedericoTerzi 3 years ago

Thanks! Running it is pretty easy with the prebuilt model. Things start to get really complex when you want to train your own :)

@KuboF 3 years ago

@FedericoTerzi Yeah, using the pre-built model is my first step toward training my own 😅

@FedericoTerzi 3 years ago

Good luck! If you succeed, please let me know how hard it was :)

@KuboF 3 years ago

@FedericoTerzi I very much hope I can one day 😅

@patataboom2645 3 years ago

So, have you finished? :))))

@Karma-vf2qu 4 years ago

Uuu, really good content here! Grandee

@FedericoTerzi 4 years ago

Thanks :)

@sisfabricio 1 month ago

Works on Windows after struggling for a while, many thanks

@shravanhegde2237 25 days ago

What struggles were they? Could you please tell me? I need to set it up for my project, so it would be really helpful.

@SivaShankarsss 3 years ago

I was looking for this kind of video. Currently I am working on creating an AI assistant. This will help me a lot.

@silversurfer8057 3 years ago

Really helpful for me (I think your video is the only one on the subject?). In addition to this, a tutorial on Mozilla's TTS would be great; I would like something more detailed for that. I currently don't understand how to use new datasets to get other voices. I guess you have to train a model with a dataset. A tutorial on this would be really cool! Maybe you have also dealt with it? In any case, DeepSpeech and TTS can theoretically be combined well.

@hitlab 4 years ago

Thanks for making this man!

@FedericoTerzi 4 years ago

You're welcome :)

@jane_shi 2 years ago

Thanks for your video! I used Python 3.8.6 and DeepSpeech v0.9.3 and it worked well!

@hssp1534 1 year ago

But I'm not able to find the deepspeech library in Jupyter. How did you install it?

@jane_shi 1 year ago

I just did what he showed in the video

@yacinemamdouh1271 3 years ago

Great video, I had some problems but now it works. Thank you

@FedericoTerzi 3 years ago

Thanks!

@khalidelgazzar 1 month ago

Great video. Thank you 😊

@christosangelopoulos 3 years ago

Job nicely done and presented, thank you.

@FedericoTerzi 3 years ago

Thanks!

@ilyasayusuf5447 3 years ago

Wow, great library, thank you

@LukeHildreth 3 years ago

Got this working on Windows! Thanks for the tut!

@FedericoTerzi 3 years ago

Glad to hear that :)

@sauravprashar 3 years ago

Could you please help me? I am getting a DLL error.

@LukeHildreth 3 years ago

@sauravprashar I'm actually not sure how to answer that. I'm pretty new to programming. Hope you find the answer!

@techtree1369 1 year ago

Thank you!

@sslaia 3 years ago

Excellent. It would be great if you could make a tutorial on how to train your own model. The big players have already done that for well-known languages. By contrast, this one could help with neglected languages like mine. So a tutorial on how to train your own model in a new language would be very helpful.

@FedericoTerzi 3 years ago

Thank you! Unfortunately, I don't know the model that well...

@marly1017 3 years ago

Can you please do a video about implementing this code in a project?

@bouchradahamni9881 3 years ago

Very nice. Please make a video on how to train your own model.

@maputo658 4 years ago

Super nice! Was able to follow it successfully, but on a Mac.

@FedericoTerzi 4 years ago

Glad to hear that :)

@LukeHildreth 3 years ago

I'm trying this too. How did you activate the script after setting up the virtual environment?

@Monsieur.Nobody. 3 months ago

Do you think we can run whisper or fast whisper LLM on ESP32s? Sort of in a form factor like the carputer or beepberry?

@SuperlativeCG 2 years ago

What if I have multiple wav files and I want to transcribe each one and output to a text file? How do I do that?
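
For the batch case asked about above, one possible shape is a loop that runs a transcription function over every .wav file in a folder and writes a .txt file next to each one. This is only a sketch: `transcribe_folder` and the `transcribe` callable are hypothetical names introduced here, and the callable stands in for whatever engine does the actual recognition (for example, a wrapper around an already loaded DeepSpeech model).

```python
from pathlib import Path
from typing import Callable, List


def transcribe_folder(folder: str, transcribe: Callable[[Path], str]) -> List[Path]:
    """Write one .txt transcript next to every .wav file in `folder`.

    `transcribe` is a placeholder (hypothetical): any function that takes
    the path of a WAV file and returns the recognized text.
    """
    written = []
    # sorted() keeps the processing order predictable
    for wav_path in sorted(Path(folder).glob("*.wav")):
        text = transcribe(wav_path)
        txt_path = wav_path.with_suffix(".txt")
        txt_path.write_text(text + "\n", encoding="utf-8")
        written.append(txt_path)
    return written
```

Keeping the recognizer behind a plain callable means the loop can be tried out with a dummy function first, before a real model is wired in.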

@stefang5639 3 years ago

Thanks, finally a good tutorial for DeepSpeech!

@wellingtonfurtado2074 3 years ago

Could you do a tutorial on how to use DeepSpeech in Unreal Engine?

@potpu 2 years ago

Hi Federico, thank you for your video. Do you know how to integrate DeepSpeech into Talon?

@ahmedsaeed5149 2 years ago

Thank you thank you thank you

@waquezemerson4863 2 years ago

Hi, can I ask how I can integrate this into my application? My application currently runs in an Ionic environment; is it possible to integrate this there?

@ariefsaferman 2 years ago

Does the VAD streaming work outside DeepSpeech? I want to use it in another ASR framework.

@Codacus 3 years ago

nice info

@chetanmundhe8619 4 years ago

Very nice video

@FedericoTerzi 4 years ago

Thank you :)

@yohannesayana9456 1 year ago

How can we build a speech-to-text model from scratch for other, less-resourced languages using DeepSpeech?

@explorefoodculture 2 years ago

Hi Terzi, can this software run on Mac? And can it translate movie videos into any language? Thanks in advance!

@sayyidumarshiddiq2397 2 years ago

What should I do if my laptop has Python 3.8 installed?

@Dumpitzz 4 years ago

"Scripts\activate" is not working. I get an error: "parameter wrong -850"

@skaterope 3 years ago

Thanks!

@FedericoTerzi 3 years ago

You're welcome :)

@sebastianochipocomancini1853 3 years ago

Hi! You are using a pre-trained model for this speech-to-text application. But what if you want to train the model with another dataset, for example in Spanish or Italian? What steps would you take to train the model to recognize speech in a language other than English?

@ThesongsIlikeThemost 3 years ago

Hi, you can find already trained models for Spanish, Italian, German, Polish, and French here: gitlab.com/Jaco-Assistant/deepspeech-polyglot

@sebastianochipocomancini1853 3 years ago

@ThesongsIlikeThemost Thank you so much, I finally found the Spanish model here: drive.google.com/drive/folders/1-3UgQBtzEf8QcH2qc8TJHkUqCBp5BBmO (a link that was on the URL you sent me). After replacing the .pbmm and .scorer files in the command line, it works fine for Spanish!
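
Swapping languages as described above comes down to passing different model and scorer paths to the same script. A minimal sketch of building that command from Python follows. The `-m` and `-s` flags are written here from memory of the DeepSpeech-examples microphone script, so treat them as an assumption and confirm with the script's `--help`; `build_mic_command` and the file names are hypothetical.

```python
import sys
from typing import List, Optional


def build_mic_command(model: str, scorer: Optional[str] = None,
                      script: str = "mic_vad_streaming.py") -> List[str]:
    """Build the argument list for the microphone example script.

    ASSUMPTION: the script takes the model via -m and the scorer via -s;
    check its --help output before relying on this.
    """
    command = [sys.executable, script, "-m", model]
    if scorer:
        command += ["-s", scorer]
    return command


if __name__ == "__main__":
    # Hypothetical file names for a downloaded Spanish model and scorer.
    print(build_mic_command("spanish_model.pbmm", "spanish.scorer"))
```

The resulting list can be handed to `subprocess.run` from inside the activated virtual environment.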

@samuelige9368 3 years ago

Can you use DeepSpeech for a diacritic system?

@fashadahmedsiddique8412 2 years ago

Hey, is it possible to use this in a Colab environment?

@liamblu 3 years ago

I get stuck at installing the requirements.txt:
ERROR: Could not find a version that satisfies the requirement deepspeech~=0.8.0
ERROR: No matching distribution found for deepspeech~=0.8.0
Edit: I already downgraded to Python 3.9.0, which is said to be compatible...

@chaitanyamalpure6226 3 years ago

Thank you for the video. Nice tutorial to get familiar with! Also, I have found a German pre-trained model. Could you please explain how to work with German or any other pre-trained model?

@FedericoTerzi 3 years ago

You should be able to simply pass the German model and scorer and you should be ready to go :)

@chaitanyamalpure6226 3 years ago

@FedericoTerzi Thanks a lot. It worked!!!

@izufarahiyahizzuddin2119 1 year ago

I already ran the code, but it cannot recognize my voice. Does anyone have a solution?

@niharjani9611 1 month ago

Hey, please help with my question: how many languages does it support? English, Spanish... could you provide a list? I tried to find it on GitHub and Reddit, but was unsuccessful!

@amrousimen7170 3 years ago

good video

@aznperswazinable 2 years ago

(deepspeech) C:\Users\user\Documents\deepspeech>pip3 install deepspeech
ERROR: Could not find a version that satisfies the requirement deepspeech (from versions: none)
ERROR: No matching distribution found for deepspeech
pip and pip3 are not working on version 3.10, any ideas?
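
The "(from versions: none)" message usually means pip found no prebuilt package for the running interpreter. A small check like the one below can make that visible before installing. The version bounds are an assumption, not something stated in the video: to the best of my knowledge the deepspeech 0.9.3 wheels cover CPython 3.5 to 3.9, which fits the reports here (3.8.6 works, 3.10 does not), but verify against the file list on PyPI. `wheel_likely_available` is a hypothetical helper.

```python
import sys
from typing import Tuple

# ASSUMPTION: deepspeech 0.9.3 ships wheels for CPython 3.5 through 3.9 only.
# Adjust these bounds after checking the release files on PyPI.
ASSUMED_OLDEST = (3, 5)
ASSUMED_NEWEST = (3, 9)


def wheel_likely_available(version: Tuple[int, int] = sys.version_info[:2]) -> bool:
    """Return True if `version` falls inside the assumed supported range."""
    return ASSUMED_OLDEST <= tuple(version) <= ASSUMED_NEWEST


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    if wheel_likely_available():
        print(f"Python {major}.{minor}: a deepspeech wheel is probably available")
    else:
        print(f"Python {major}.{minor}: pip will probably find no deepspeech wheel; "
              "try a virtual environment with an older Python")
```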

@Luc_Skywalker 2 years ago

ERROR: Cannot install deepspeech==0.9.3 and numpy>=1.15.1 because these package versions have conflicting dependencies. deepspeech 0.9.3 depends on numpy=1.12.0
I am unable to get this to work, any idea?

@waveNiaC 3 years ago

Can we somehow adjust the energy (loudness) level above which captured audio triggers the transcription? Right now every little sound triggers DeepSpeech, while we want it to be triggered only when a person speaks. Can an energy threshold be set somehow? I'm working on it, but I could save some time if there is already a solution. There seems to be a condition in vad_collector() that I am finding hard to understand. Thank you

@FedericoTerzi 3 years ago

Hey, yes, that's almost surely possible by playing around with the audio stream. I don't know exactly how, though.
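
One way to approach the loudness question is an extra energy gate applied to each audio frame before it reaches the recognizer. The sketch below computes the RMS level of a frame of 16-bit PCM samples and compares it with a threshold. This is not what the example script does; `rms`, `loud_enough` and the threshold value of 500 are hypothetical and would need tuning for a given microphone.

```python
import array
import math


def rms(frame: bytes) -> float:
    """Root-mean-square level of a frame of 16-bit signed PCM samples.

    Samples are read in the machine's native byte order.
    """
    samples = array.array("h")  # "h" = signed 16-bit integers
    samples.frombytes(frame)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loud_enough(frame: bytes, threshold: float = 500.0) -> bool:
    """Return True if the frame is at or above the (arbitrary) threshold."""
    return rms(frame) >= threshold
```

Frames that fail the check could be dropped or replaced with silence, so that only louder audio is passed on for voice-activity detection and transcription.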

@soulkingdom4600 3 years ago

What is the difference between Deep Speech and Deep Speech 2?

@balajicmb1132 2 years ago

Is there code available for an open-source speech-to-text library using Python, in PyCharm or another IDE, bro?

@murtazahussain8224 3 years ago

Is DeepSpeech compatible with the NVIDIA RTX 3090?

@shampoo1296 2 years ago

Help: Import Error: DLL load failed: no se puede encontrar el modulo especificado (the specified module could not be found)

@abhignaconscience358 3 years ago

At 5:04 you said you were going to show a nice little project. What is it?

@simgplusnervt4698 2 years ago

Nice video. Can you make a video about using it on Android?

@tommyboy3164 2 years ago

Was wondering if you could help. I'm getting this error: ERROR: Could not find a version that satisfies the requirement deepspeech (from versions: none). Also, where do you put the two model files after you download them?

@KPawan108 9 months ago

I am also getting the same error. Did you find the answer?

@weweweqeqeqe3240 2 years ago

Can this be used for movies?

@adribmahmud 2 years ago

Can you please make a video on how to train?

@sebastianochipocomancini1853 3 years ago

What should I do if I want to use an application like this one for another language, like Spanish?

@stefang5639 3 years ago

You can download the language model for other languages as well from the source shown in the video.

@doodlearsh739 2 years ago

Hi, I can't install requirements.txt with pip. Can you help me?

@tobiaskarl4939 3 years ago

Different numpy version requirements make it fail for me. deepspeech 0.9.3 numpy 1.14.4 pip 10.0.1 PyAudio 0.2.11 scipy 1.5.4

@watevakid 3 years ago

Hmmm, after I install DeepSpeech into my venv, I do not see "mic_vad_streaming"... any idea how to install it?

@FedericoTerzi 3 years ago

You have to download it from the DeepSpeech examples: github.com/mozilla/DeepSpeech-examples

@rosarangithalagahawatta6300 2 years ago

How can I download mic_vad_streaming?

@Piriponzolo 3 years ago

Hi, Federico. Congratulations on the video, very nice and interesting! Does Deep Speech also work for Italian?

@FedericoTerzi 3 years ago

Thanks a lot! Yes, there is an Italian model; the performance isn't the best, but it works: github.com/MozillaItalia/DeepSpeech-Italian-Model

@Piriponzolo 3 years ago

@FedericoTerzi Hi and thanks, Federico. I unpacked the zip, but then I got stuck. How do I go on?

@FedericoTerzi 3 years ago

After that, the process should be similar to the one in the video, although I have never tried running it directly (I have only done some tests with the Telegram bot that uses it). You're better off looking at the examples in the repo or contacting the maintainer, who seems very knowledgeable about it :)

@Mr_Yod 3 years ago

@FedericoTerzi Thanks: I'll try it. The other systems I have tried in Python are atrocious or require an internet connection (Google's). Still, being compatible only with Python 3.6 when we have been on 3.9 for a while now... =( EDIT: The link you posted says "Requirements: 'Python 3.7+'"

@istiyakahamedmilon6512 3 years ago

Can I use it to generate Bengali language?

@abdulbaqi6170 2 years ago

There is an article on the internet about how to make SRT files for movies via DeepSpeech. I can't get that working on Windows. Can you please make a video on how to convert audio files into text or SRT via DeepSpeech? It would be very useful and would increase your video views.

@tobiaskarl4939 3 years ago

1) Python 3.6.5 doesn't work. I updated to 3.6.7.
2) activate gives an error... Edit activate.bat in the Scripts folder and put a '.' after "delims=:" in line 4, then execute Scripts\activate.bat explicitly.

@ramsimmha8672 2 years ago

It's really cool! I tried this and it's working, but it's not printing the text it heard. Has anyone here faced this? Please help me fix it.

@imsteven3044 3 years ago

What is the function of the scorer?

@vasanthmaisa293 9 months ago

How did you get the mic_vad_streaming folder directly inside the deepspeech folder without doing anything?

@abdullamasud4278 4 months ago

He cut that part out of the video. After downloading the file, he simply copy-pasted it into the folder.

@droidsons1371 3 years ago

Nice tutorial! So I have a custom-trained language model which has a (.model) extension. How do I convert it into a .scorer file?

@FedericoTerzi 3 years ago

Thanks! They are two different things, you can't convert one into the other :)

@sibyllasystem1209 1 year ago

Hope we can use it in the Windows environment so that I can study foreign languages easily someday : )

@mytop5602 3 years ago

Amazing, thank you. Can you please make a new video on how to install it on Debian and train it?

@FedericoTerzi 3 years ago

Thank you! The installation process should be pretty similar on Debian, as long as you have the right Python version. Regarding training the model, that's very difficult and expensive to do...

@SivaShankarsss 3 years ago

How to train with an Indian accent?

@freegsbox 3 years ago

Awesome!! Can it recognize from files too? And how, please?

@FedericoTerzi 3 years ago

If I'm not mistaken, the script used in the video also accepts an argument for wav files :)
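
Besides the script's own file argument, a WAV file can also be transcribed directly through the `deepspeech` Python package. The sketch below assumes the 0.7+ style API (`Model`, `enableExternalScorer`, `sampleRate`, `stt`) and that `deepspeech` and `numpy` are installed; `read_pcm16_mono`, `transcribe_file` and the paths passed to them are hypothetical. The heavy imports are done inside the function so the WAV helper can be used on its own.

```python
import wave
from typing import Optional, Tuple


def read_pcm16_mono(path: str) -> Tuple[int, bytes]:
    """Return (sample_rate, raw_bytes) for a mono 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected mono, 16-bit PCM audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe_file(wav_path: str, model_path: str,
                    scorer_path: Optional[str] = None) -> str:
    """Transcribe one WAV file (sketch; needs deepspeech and numpy installed)."""
    import numpy as np
    from deepspeech import Model  # imported lazily on purpose

    model = Model(model_path)
    if scorer_path:
        model.enableExternalScorer(scorer_path)

    rate, pcm = read_pcm16_mono(wav_path)
    if rate != model.sampleRate():
        raise ValueError(f"file is {rate} Hz, model expects {model.sampleRate()} Hz")

    # The model takes the audio as an array of 16-bit integers.
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

A call would look like `transcribe_file("audio.wav", "model.pbmm", "model.scorer")`, with the three paths replaced by real files.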

@lemon3335 1 year ago

How to integrate into UE4

@danielwhite5997 3 years ago

I would like to use DeepSpeech on a website. Is there a good method for running this in a JavaScript environment?

@FedericoTerzi 3 years ago

There might be, but I don't have any experience with it! If you are OK with only supporting Chrome, then your best bet would be the Web Speech API (which is free and works great): developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API/Using_the_Web_Speech_API

@esakkisundar 3 years ago

@Federico Terzi, I'm from India. It is not recognizing the Indian English accent. Any thoughts on how to get DeepSpeech to recognize it?

@FedericoTerzi 3 years ago

Unfortunately, there's not much we can do about it. That model was trained on American English, so it struggles with other accents.

@1979gian 3 years ago

Hi Federico, thanks for the fantastic tutorial! I was wondering if you could kindly make one with the Italian model for beginners like me.

@FedericoTerzi 3 years ago

Hi Gianluca, thanks for the compliments! I can't promise anything since it's not my area of expertise, but I'll make a note of it :)

@robc3863 3 years ago

Thanks for the video! Is there any guidance on how to integrate DeepSpeech into an application on Windows? I'm sure that would be very useful for developers! Thanks!

@FedericoTerzi 3 years ago

Hey, if your app is written in Python, the integration would be pretty easy. Otherwise, your best bet is to look at "tensorflow-lite deepspeech", although I don't have any experience with that.

@robc3863 3 years ago

@FedericoTerzi Hi, thanks, but our app is C++, and so far I have not found any example of binding DeepSpeech to it. We also don't have many clients with NVIDIA GPUs...

@FedericoTerzi 3 years ago

NVIDIA GPUs are really not needed (as long as you are not training the model on the client's PC); the CPU will handle inference OK for most use cases. Regarding the lack of examples, I'm sorry about that, probably the recent Mozilla layoffs did not help the project...

@user-xk4sj2lz9h 3 years ago

What should I add to change the voice recognition language?

@FedericoTerzi 3 years ago

If you are lucky, you might be able to find a pretrained model for your language online. At that point, you can simply point the script to the other model. If you can't find one, then you could create your own model in theory, but that is very difficult in practice.

@patataboom2645 3 years ago

I'm working on a college project and I need to do speech-to-text in my language. Any idea how to use DeepSpeech in Romanian? I saw the language is available.

@mozes_ma 2 years ago

Hey, similar challenge here, any ideas so far?

@Cezar-on8lb 6 months ago

Hello! How does DeepSpeech compare with OpenAI Whisper?

@FedericoTerzi 4 months ago

No reason not to use Whisper today! It's amazing

@ragnov3286 2 years ago

Can you also integrate DeepSpeech into a web app with some API? Thanks

@FedericoTerzi 2 years ago

If you're using Chrome or Safari, you might want to check out the Web Speech API, which is much simpler for web apps :) developer.mozilla.org/en-US/docs/Web/API/Web_Speech_API

@rakeshkumarkuwar6053 3 years ago

Hello Federico, thanks for the video. I followed the steps, but after recognizing the audio it does not return any result on the command line. After "Recognized: " it's just blank. Sorry, I can't attach a screenshot. I'm using deepspeech 0.7.4 and it works fine for audio-file-to-text conversion, but the microphone application has this issue.

@FedericoTerzi 3 years ago

Hi, most probably you have a problem with your microphone; check if you can record correctly using another application.

@fablefoxweaver 2 years ago

I have the same problem. It listens fine (as in, it detects that I'm talking), but doesn't recognize any words. Using 0.9.3.

@jacobkelley257 3 years ago

So I followed everything you did. I originally started with Python 3.7 and it did eventually run into an error trying to install the requirements.txt, so I downgraded to 3.6.8, deleted the folder and started over. This time I got everything to work, and when I run mic_vad_streaming.py with the downloaded files it says "listening...", and whenever I speak it says "Recognized: " but nothing after that. It is clearly hearing me, because it only prints "Recognized: " when I say something, but then it doesn't print what I said. Any idea what it might be? I'm a beginner at Python and coding in general, but I tried to troubleshoot by changing line 194 to text = stream_context to see if my words were somehow in that, and it just says "Recognized: ". Not sure what that means.

@FedericoTerzi 3 years ago

Perhaps it does not hear you loudly enough; can you try with another microphone? If I recall correctly, there is a "device" option in the script to specify it.

@LukeHildreth 3 years ago

Is it possible to write these commands into a Python file and just run that?

@FedericoTerzi 3 years ago

Sure! You can simply edit that script file to fit your needs :)

@ritwikghorui2731 3 years ago

Thank you so much, but if anyone has done this in a Python file, please share the link. I'm facing some problems and I have a deadline coming up, so please help me.

@fahrul8025 3 years ago

Awesome video! I followed the instructions step by step, but at the last step I get "ModuleNotFoundError: No module named 'webrtcvad'". Can you help me with this problem? Thanks.

@FedericoTerzi 3 years ago

You might need to install the package with: "pip install webrtcvad"

@fahrul8025 3 years ago

@FedericoTerzi Awesome! It works now. But when I'm speaking there are no results. I'm sure my microphone works well. Any suggestions?

@jeongwonkim247 3 years ago

Was there a video on how to transcribe audio files into text? Please let me know, and thank you!

@FedericoTerzi 3 years ago

Yes, you can use the script to transcribe audio files as well, but be prepared for some not-so-good results. Check the script's "--help" option.

@DaeOh 1 year ago

Thanks. I can't find the follow-up video though

@DaeOh 1 year ago

Never mind, I used Whisper for this application!

@purushothaman2783 3 years ago

Please show how to use it as a Python API

@jargolauda2584 3 years ago

IBM ViaVoice already worked perfectly in 1998; I wonder what happened to it? With IBM ViaVoice you could speak and the text was fed into a text editor.

@FedericoTerzi 3 years ago

There are a ton of great (commercial) speech-to-text products out there. The biggest selling point of DeepSpeech, even though it doesn't perform as well as commercial alternatives, is that it's open source and free to use, which opens up a ton of possibilities by itself! Unfortunately, the future of DeepSpeech is uncertain at the moment, as Mozilla is cutting all non-essential projects...

@tamgaming9861 3 years ago

@FedericoTerzi Can you make a tutorial for Python 3.8 or higher? I can't downgrade Python, and the newer DeepSpeech versions use different file types now. It would be awesome if you could also show how to train your own model. I would love to do it on Ubuntu, because it's also free.

@murtazahussain8224 3 years ago

@FedericoTerzi Fed, can you help me with my project? Willing to pay or hire a developer if you can help.

@maxge8504 3 years ago

Interesting topic. I expected to use it in my project, but when I tested it, it didn't recognize my voice as well as yours :( It catches 50% of my words, and most of the time it writes a wrong one :( But thank you anyway!

@FedericoTerzi 3 years ago

Yeah, I've experienced the same problem myself. The model is not comparable with cloud-based solutions as of now, especially for non-native speakers like me :)

@drin1drin 2 years ago

How can I implement an Italian recognizer?

@FedericoTerzi 2 years ago

You might prefer Vosk with an Italian model for that :) alphacephei.com/vosk/

@mouradtoumi7296 3 years ago

I have no skills in Python. I'm trying to read from a wav file instead of the mic and display metadata. I tried the -f arg but it didn't work :( Any help?

@tamgaming9861 3 years ago

I haven't got it to work because I can't install Python 3.6; my Python is already newer. But from what I read, you need a specific WAV format. As far as I remember it was 8-bit, mono and 16 kHz, but I'm not sure. MP3 does not work so far. There are online tools that can convert from MP3 to WAV. Hope it helps.
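
When a WAV file gives no output, the file's format is worth checking first. To my knowledge the released English DeepSpeech models expect 16 kHz, mono, 16-bit PCM audio (16-bit rather than the 8-bit guessed above); treat that as an assumption and confirm it against the documentation of the model in use. The sketch below reads the WAV header with Python's standard `wave` module and lists any mismatches; `wav_problems` is a hypothetical helper.

```python
import wave
from typing import List


def wav_problems(path: str, expected_rate: int = 16000) -> List[str]:
    """List the ways a WAV file differs from mono, 16-bit, `expected_rate` Hz.

    ASSUMPTION: 16 kHz / mono / 16-bit is what the model expects; pass a
    different `expected_rate` if the model's documentation says otherwise.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels (expected mono)")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples (expected 16-bit)")
        if wav.getframerate() != expected_rate:
            problems.append(f"{wav.getframerate()} Hz (expected {expected_rate} Hz)")
    return problems
```

An empty list means the header matches; anything else points to a conversion step (for example with an audio editor) before transcription.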