DeepSpeech | Speech to Text | Common Voice | Donate Your Voice

In this video I talk about Mozilla's DeepSpeech and Common Voice
github.com/moz...
Donate Your Voice or Verify Others: commonvoice.mo...
Don't like Google/YouTube? Get my videos sooner on LBRY
lbry.tv/@tuxfoo:e
Follow me on Mastodon social.librem....
Support me on Liberapay liberapay.com/...

Published: 26 Aug 2024

Comments: 27

@marly1017 3 years ago

That's a great video. Can you do a video about implementing it in your project?

@liamblu 3 years ago

So do I understand this correctly: there is no actual application for Windows and Linux that uses DeepSpeech and is ready to transcribe my speech from the mic into text and write it to a text file? All of this is still in development?

@tuxfoo8224 3 years ago

Pretty much; DeepSpeech itself is just an engine with an API that other applications can use. There are real-world applications making use of it for things like transcribing voicemails, and applications for other languages such as Te Reo Maori. I am not aware of any applications using DeepSpeech that transcribe speech into a text file, but such a program should not be too hard to create if you know a programming language such as Python. I think once the dataset gets large enough to fine-tune without too much bias, we will see more applications making use of it.
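
As a rough illustration of the "should not be too hard" point, here is a minimal sketch that transcribes one WAV file into a text file. It assumes the `deepspeech` Python package (roughly the 0.7 to 0.9 API) and `numpy` are installed, and that you have downloaded a model file and, optionally, a scorer file; the function names and all file paths are placeholders, not something from the video.

```python
import wave


def read_wav(path):
    """Return (sample_rate, raw 16-bit PCM bytes) for a mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe_to_file(wav_path, txt_path, model_path, scorer_path=None):
    """Transcribe one WAV file and write the text to txt_path."""
    # Imported here so the WAV helper above works without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    if scorer_path is not None:
        # Optional language-model scorer shipped alongside the model.
        model.enableExternalScorer(scorer_path)

    rate, pcm = read_wav(wav_path)
    if rate != model.sampleRate():
        raise ValueError(f"model expects {model.sampleRate()} Hz, got {rate} Hz")

    # stt() takes the audio as a 1-D array of 16-bit integers.
    text = model.stt(np.frombuffer(pcm, dtype=np.int16))
    with open(txt_path, "w", encoding="utf-8") as out:
        out.write(text + "\n")
    return text
```

A call would look something like `transcribe_to_file("note.wav", "note.txt", "model.pbmm", "model.scorer")`, with those file names being hypothetical. Live microphone input would need an audio capture library on top of this.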

@dingamcamille2666 3 years ago

Hello, can I train DeepSpeech on my national language? And can any language in the world be used as a dataset? Thank you.

@tuxfoo8224 3 years ago

Yes, you can. There are already datasets for 60 languages in the Common Voice project: commonvoice.mozilla.org/en/datasets Adding a new one requires a lot of training data; there was a project here in NZ where it was used to create an application for Te Reo Maori. It is pretty hard to create a dataset by yourself, so if your language does not exist yet, asking them to add it and getting speakers to help train it might be the best option.

@wtfvids3472 4 years ago

So basically... use it for short commands? Or does it get better? Sorry, I am a lazy man who didn't watch the whole video.

@kafkaMt 3 years ago

I think I would do Spanish as my L1 and English as my L2, but I am not sure if reading with a foreign accent would be a good contribution. ED** seems like it is one of the main points xD

@tuxfoo8224 3 years ago

Foreign accents provide useful training data to prevent bias. Spanish is also needed, as there are only 290 validated hours in the Spanish dataset.

@what8586 3 years ago

Where can I get voices like the one in this video?

@tuxfoo8224 3 years ago

The dataset can be downloaded from here: commonvoice.mozilla.org/en/datasets

@kafkaMt 3 years ago

btw does anyone know if there is a problem opening the Common Voice platform from the Android version of Firefox? The page doesn't load on my mobile device.

@tuxfoo8224 3 years ago

Loads for me in Firefox on Android. They changed the URL, so that might be the issue: commonvoice.mozilla.org/en

@kafkaMt 3 years ago

@@tuxfoo8224 Yeah, well, on my desktop it works fine. Maybe I just need to update my Firefox version on mobile, because it did load fine in the Chrome browser app.

@monartalmada8962 3 years ago

There is now an Android app called CV Android you can use.

@DeeDeeCHAUNCEY 4 years ago

Is there a way to install it on Mac? I need it to use with this software: autoedit.gitbook.io/autoedit-3-user-manual/speech-to-text

@ashishkuwar6917 4 years ago

How do I deal with audio files above 16 kHz?

@saikrishnayallapu7185 4 years ago

I guess if it's VAD streaming, resampling has to be done...
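
To make the resampling suggestion concrete, here is a minimal sketch that converts a list of PCM sample values to 16 kHz using plain linear interpolation. The function name is made up for illustration, and the 16 kHz target is an assumption taken from the question. Linear interpolation applies no low-pass filter, so downsampling this way can alias; for real recordings a dedicated tool or library resampler is likely the better choice.

```python
def resample_linear(samples, src_rate, dst_rate=16000):
    """Resample a sequence of PCM sample values by linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)

    last = len(samples) - 1
    out_len = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out
```

For a 48 kHz recording, `resample_linear(samples, 48000)` yields one output value for every three input values.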

@murtazahussain8224 3 years ago

Is DeepSpeech compatible with the NVIDIA RTX 3090?

@tuxfoo8224 3 years ago

Should be, if there is a version of CUDA that is compatible with the graphics driver you are using. CUDA is only really required if you plan on training your own models.

@russianfool 3 years ago

Yes, it is. I'm running TensorFlow/Torch-based stuff on the new 30xx cards and it all works great.

@emanuelzamorano5392 9 months ago

Are you training the model? @@russianfool

@ytfp 4 years ago

This is promising, but there are a few glaring defects in this approach. First, I think the voice recognition is fairly good, but its pattern matching is horrible, as it does not even attempt to match output to an existing real-life word. To me this is the most glaring problem. There are instances where you might want a word not in the English dictionary, but these are very rare and the matching could be switched off for them. Words like proofsless, almuch, aveenes, vordwardine, watol etc. should not be returned. Although it shows the user what is actually being heard, it is counterproductive and might influence the narrator to alter his or her speech to accommodate, thus defeating the purpose. This model is obviously dumbed down on purpose and gives a misleading accuracy rating by not pattern matching its words.
There is still something not quite right about still requiring input when DeepSpeech has been available for so long and computers can feed DeepSpeech voice data at a much higher rate than humans. Again, I could see it if they were asking for contributions for transcription, but not voice data; there is plenty of that around. I see voice data as relevant only if it is for your own personal recognition.
Secondly, uploading your voice is really an intrusion into one's privacy and should be a choice. There already exist millions of hours of audio sources from TV, movies, interviews and commentary out there. It is quite suspect to want entirely new data. Asking people to transcribe already available sources would be much less intrusive on your privacy and would provide a wider range of ambient environments in which the voice was recorded. As an example: Sean Connery still has his same voice, give or take, but his voice has a whole different acoustic in a movie than it does when he gives a speech in an auditorium.
Although individual contributions are more consistent, this is also their failing, especially since no audio standard has been demanded of contributors (based on some of the horrible noise in some of those recordings).

@tuxfoo8224 4 years ago

I agree that pattern matching would help improve accuracy; hopefully it gets implemented. There are use cases where pattern matching could get in the way, so it should be an optional flag in the API. Contributions to the Common Voice dataset are completely voluntary, what people say is predetermined, and you can skip anything that you do not want to say. I think that varying audio standards are kind of the point: to train a voice model that will work for most people, it needs data of varying standards, accents, genders and ages. While there might be a lot of datasets around, most of them are not free, and some of the ones that are have already been used to train the model. Asking people to manually transcribe audio is a really big ask, especially considering that using machine learning to train a decent voice model can require 10,000-100,000 hours of recordings.
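
To illustrate what an optional "match to real words" step could look like, here is a small sketch that post-processes a finished transcript. It is not part of DeepSpeech: the function name, the `enabled` flag and the `cutoff` value are all made up for this example, and it uses `difflib` string similarity from the Python standard library as a stand-in for whatever matching a real engine would do.

```python
import difflib


def snap_to_vocabulary(transcript, vocabulary, enabled=True, cutoff=0.8):
    """Replace out-of-vocabulary words with their closest vocabulary entry."""
    if not enabled:
        # The "optional flag": return the raw engine output untouched.
        return transcript
    known = set(vocabulary)
    candidates = sorted(known)
    fixed = []
    for word in transcript.split():
        if word in known:
            fixed.append(word)
            continue
        match = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
        # Keep the original word when nothing is similar enough.
        fixed.append(match[0] if match else word)
    return " ".join(fixed)
```

With a vocabulary containing "avenues", a transcript word like "aveenes" is close enough to be replaced, while a word with no near match is left alone. A higher `cutoff` makes the step more conservative.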

@monartalmada8962 3 years ago

@@tuxfoo8224 Not only that, but there are still the copyright issues. Technically, you can't just go and download content from the internet if you don't own it or have the rights to do so.

@tuxfoo8224 3 years ago

@@monartalmada8962 That's what I meant by "free" (freedom-respecting), not gratis :) Unfortunately, most content is not free to use for any purpose.