transcription and speaker identification OpenAI-Whisper and Pyannote [Python]

Mastering Python

Published: 25 Oct 2024

Comments: 38

@chungrandy780 8 months ago

Is there a Colab version?

@userrjlyj5760g 11 months ago

Masha'Allah, God bless you, brother Mohamed... Thank you.

@hrishikeshnamboothiri.v.n2195 1 year ago

Try to include its requirements.txt as well... Thanks

@Yacine_zaki_abderrazzak 1 year ago

Thanks man, you deserve the best

@lawrencemedina5593 1 year ago

conda activate open_chatting does not work on my computer: "EnvironmentNameNotFound: Could not find conda environment: open_chatting. You can list all discoverable environments with `conda info --envs`."

@masteringpython 1 year ago

Install the conda toolkit, then create an environment called open_chatting by typing: conda create --name open_chatting. After that, install the libraries that I mentioned in the video, then run the code.

@bhuvneshsaini93 3 months ago

Please provide a requirements.txt; otherwise it's really hard to get this working.

@bootneck2222 1 year ago

Great video. Thank you. Can the output be displayed on screen whilst it is processing?

@Hirotodoroki 1 year ago

Trying to run this but getting "File contains data in an unknown format." Tried several files, and tried a wav file too, but no luck.

@masteringpython 1 year ago

I advise you to use Anaconda Python to create a development environment. Then install OpenAI Whisper and, after installing it, run a simple test to check that everything works correctly. Then install the pyannote library and run a simple test on it as well. (Read the installation guides carefully; maybe you missed something while installing a library.)
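
A minimal sketch of such a "simple test", assuming the libraries are importable under the top-level names whisper, torch and pyannote (these names are an assumption here, since the thread does not list the exact packages). It only reports whether each module can be found in the active environment; it does not load any model.

```python
import importlib.util


def check_packages(names):
    """Return {module_name: True/False} for whether each top-level module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed top-level import names; adjust to whatever the project actually imports.
    for name, found in check_packages(["whisper", "torch", "pyannote"]).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```

If a module is reported as MISSING, the environment that is currently active is probably not the one the libraries were installed into.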

@nadeembaig5943 5 months ago

@Hirotodoroki were you able to resolve the error (File Contains data in Unknown Format)?

@sakibzaman7719 1 month ago

Does it work with any other language?

@ryanschwartz3340 1 year ago

Nice video. Is the repo hard-coded to your directory structure? When I tried to change it, it said the format wasn't recognized.

@masteringpython 1 year ago

Do you mean the segment file?

@ApparaoMulpuri-d6m 11 months ago

Hi, thanks for the video. I need an approach for running this solution on a large audio file with a duration of 3 hours.

@KamilKaczmarekSolutions 11 months ago

chunks

@KamilKaczmarekSolutions 11 months ago

Chunks, and saving a .txt from each chunk to its own file. Add logic to check which chunks are already done, so that if you hit an error and want to come back, you don't have to start over and can just continue where it left off.
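
The chunk-and-resume idea above can be sketched roughly as follows. This is an illustration under stated assumptions, not the video's code: transcribe_chunk is a hypothetical callable that the caller supplies (for example a wrapper around Whisper that returns text for a time window), and the chunk_0000.txt naming scheme is invented for the example.

```python
from pathlib import Path


def plan_chunks(total_seconds, chunk_seconds=600):
    """Split [0, total_seconds) into consecutive (start, end) windows, in whole seconds."""
    return [(start, min(start + chunk_seconds, total_seconds))
            for start in range(0, total_seconds, chunk_seconds)]


def transcribe_resumable(total_seconds, out_dir, transcribe_chunk, chunk_seconds=600):
    """Transcribe chunk by chunk, skipping chunks whose .txt file already exists.

    transcribe_chunk(start, end) is a hypothetical callable returning the text
    for that window. Returns the indices of the chunks processed in this run.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    processed = []
    for index, (start, end) in enumerate(plan_chunks(total_seconds, chunk_seconds)):
        target = out_dir / f"chunk_{index:04d}.txt"
        if target.exists():
            continue  # finished in an earlier run, so do not redo it
        text = transcribe_chunk(start, end)
        tmp = target.with_suffix(".tmp")
        tmp.write_text(text, encoding="utf-8")
        tmp.replace(target)  # rename only after the text is fully written
        processed.append(index)
    return processed
```

Writing to a temporary file and renaming it means a crash in the middle of a chunk does not leave a half-written .txt that would be mistaken for a finished one. Two caveats: fixed-length cuts can split a word or a speaker turn, and speaker labels produced separately for each chunk are not guaranteed to refer to the same person across chunks.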

@ThePikkutyyppi 1 year ago

Can I use this program to split speakers into their own files? Or is this only for transcription?

@masteringpython 1 year ago

Read more about pyannote to see how to split speakers.

@ThePikkutyyppi 1 year ago

@masteringpython What? Where?
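
One possible starting point for splitting speakers, sketched under the assumption that the diarization step has already produced (start_ms, end_ms, speaker) tuples like the SPEAKER_01 segments quoted elsewhere in this thread. The function below only groups the time ranges per speaker; it does not touch audio.

```python
from collections import defaultdict


def group_by_speaker(segments):
    """Group (start_ms, end_ms, speaker) tuples into {speaker: [(start_ms, end_ms), ...]}."""
    grouped = defaultdict(list)
    for start_ms, end_ms, speaker in segments:
        grouped[speaker].append((start_ms, end_ms))
    return dict(grouped)
```

With the ranges grouped, a library such as pydub (an assumption; any audio library that slices by time would do) can cut each range out of the recording, since an AudioSegment is sliced in milliseconds, and the pieces for one speaker can be concatenated and exported to that speaker's own file.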

@leoncezammit2502 11 months ago

I'm really struggling to get this working. Would I be able to send you my output log?

@EhsanEslahchi 1 year ago

does this model work on languages other than English?

@masteringpython 1 year ago

Only English.

@PaweDuzy 9 months ago

@masteringpython Only English? What if I change model = whisper.load_model("small.en") to "small"? According to the Whisper GitHub documentation.
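
A sketch of the change suggested above. In Whisper's model naming, checkpoints ending in ".en" are English-only and the names without that suffix are multilingual. The helper names here are made up for illustration, and whether the rest of the video's pipeline behaves well on other languages has not been checked.

```python
def multilingual_model_name(name):
    """Map an English-only checkpoint name such as "small.en" to "small"."""
    return name[:-3] if name.endswith(".en") else name


def transcribe_any_language(path, name="small.en", language=None):
    """Transcribe path with the multilingual counterpart of the given checkpoint."""
    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model(multilingual_model_name(name))
    # language=None lets Whisper detect the language; pass a code such as "pl" to force it.
    return model.transcribe(path, language=language)["text"]
```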

@JasminePlows-r4y 1 year ago

Thanks for the demo. I am getting the following error, even while using your audio.mp3 file:

    end = int(millisec(j[3]))
    return (int)((int(spl[0]) * 60 * 60 + int(spl[1]) * 60 + float(spl[2])) * 1000)
    ValueError: invalid literal for int() with base 10: ''

@JasminePlows-r4y 1 year ago

@mamido mami Yes, I did that, still getting the same error

@auflute 1 year ago

same problem

@lunarl-l1k 1 year ago

same problem

@jbatista2008 1 year ago

From the error message and the code, it seems that the error is happening because the millisec function is trying to convert an empty string to an integer. The millisec function splits a time string, given in the format "hh:mm:ss.sss", into hours, minutes, and seconds, and then converts these components to milliseconds.

Here is an example of the string being parsed:

    ['[', '00:00:00.998', '-->', '', '00:00:20.622]', 'G', 'SPEAKER_01']

When this loop runs, it returns an empty 'end' string:

    for l in range(len(k)):
        j = k[l].split(" ")
        start = int(millisec(j[1]))
        end = int(millisec(j[3]))

The array position you want for 'end' is 4, not 3. Plus, it has a ']' symbol, so it must be cleaned up:

    for l in range(len(k)):
        j = k[l].split(" ")
        start = int(millisec(j[1].rstrip(']')))  # rstrip is a no-op here, kept for safety
        end = int(millisec(j[4].rstrip(']')))  # remove trailing ']'
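
Splitting on single spaces breaks whenever the spacing in the line changes, which is what shifted the index from 3 to 4 here. As an alternative sketch (not the code from the video), a regular expression can pull out both timestamps and the speaker label regardless of spacing. It assumes lines shaped like the example above, "[ 00:00:00.998 -->  00:00:20.622] G SPEAKER_01"; the names parse_segment and to_ms are made up for illustration.

```python
import re

# hh:mm:ss(.sss) captured as three groups
TIME = r"(\d+):(\d{2}):(\d{2}(?:\.\d+)?)"
# "[ start --> end] <track> <speaker>", tolerant of extra spaces
LINE = re.compile(rf"\[\s*{TIME}\s*-->\s*{TIME}\s*\]\s+\S+\s+(\S+)")


def to_ms(hours, minutes, seconds):
    """Convert hour/minute/second strings to integer milliseconds."""
    total = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
    return int(round(total * 1000))  # round to avoid float truncation


def parse_segment(line):
    """Return (start_ms, end_ms, speaker) for one diarization line."""
    match = LINE.search(line)
    if match is None:
        raise ValueError(f"unrecognized diarization line: {line!r}")
    h1, m1, s1, h2, m2, s2, speaker = match.groups()
    return to_ms(h1, m1, s1), to_ms(h2, m2, s2), speaker
```

For the example line this yields a start of 998 ms, an end of 20622 ms, and the speaker SPEAKER_01. A line that does not match raises a ValueError that names the offending line, instead of failing later inside int().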

@enriqueleonmacias249 1 year ago

Wow, the transcript takes about twice the duration of the file to process. I guess this solution wouldn't work for monitoring hours of call recordings unless you use GPU servers.

@masteringpython 1 year ago

It is recommended to use CUDA (an NVIDIA GPU) for speed; CPU is very slow.
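
A small sketch of how the device choice could be made at runtime, assuming PyTorch is the backend (Whisper is built on it). pick_device is a made-up helper name; it falls back to the CPU when PyTorch is missing or no CUDA device is visible.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable NVIDIA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"
```

The result can be passed on when loading the model, for example whisper.load_model("small.en", device=pick_device()). If it returns "cpu" on a machine that has an NVIDIA card, the installed PyTorch build may be a CPU-only one.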