This is unbelievably good timing for me! I just started researching how to do this with pyannote and whisper last night and gave up before starting to integrate the two, and woke up to this in my subscriptions 😅
Me too! I was building this from scratch using pyannote. I was considering using Whisper, but was still sorting out the diarization aspect of it. I was planning on doing diarization first, sorting out the gaps, etc. This may save me a considerable amount of time.
After identifying two speakers, I want to completely cut out speaker 2 and create an audio file with only speaker 1 segments joined together. Is this possible?
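It should be. Once the diarizer has labeled each segment with a speaker, you can collect just speaker 1's time ranges and splice those together. A minimal sketch, assuming the segments come back as dicts with "start", "end", and "speaker" keys (the WhisperX-style shape); the speaker labels, sample segments, and the gap threshold here are illustrative, not from the video:

```python
def speaker_intervals(segments, speaker, gap=0.5):
    """Return merged (start, end) intervals, in seconds, for one speaker.

    Segments closer together than `gap` seconds are joined so the
    exported audio isn't chopped into lots of tiny clips.
    """
    picked = sorted(
        (s["start"], s["end"]) for s in segments if s.get("speaker") == speaker
    )
    merged = []
    for start, end in picked:
        if merged and start - merged[-1][1] <= gap:
            # Close enough to the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]

# Illustrative diarization output for a two-speaker recording.
segments = [
    {"start": 0.0, "end": 2.0, "speaker": "SPEAKER_00"},
    {"start": 2.1, "end": 4.0, "speaker": "SPEAKER_01"},
    {"start": 4.2, "end": 6.0, "speaker": "SPEAKER_00"},
    {"start": 6.1, "end": 8.0, "speaker": "SPEAKER_00"},
]

intervals = speaker_intervals(segments, "SPEAKER_00")
print(intervals)  # [(0.0, 2.0), (4.2, 8.0)]
```

The actual cutting and joining can then be done with pydub or ffmpeg, e.g. slicing a pydub AudioSegment in milliseconds for each interval and concatenating the pieces before exporting.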
I would love it if you could do real-time audio transcription using Whisper that outputs exactly the speaker, the start and end times, and the transcription at that point.
@engineerprompt I noticed you had AI instructions in the video for "separate speakers". Would you be able to create a video showing how you got MacWhisper to do this, and what the export results look like? Thank you.
What if I want to use large-v3? Do I just change model = whisperx.load_model("large-v2", device, compute_type=compute_type) to model = whisperx.load_model("large-v3", device, compute_type=compute_type)?
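For what it's worth: recent WhisperX versions pass the model name string straight through to the faster-whisper backend, so swapping "large-v2" for "large-v3" should be all that's needed, assuming your installed version ships those weights. A small sketch with a typo guard (the KNOWN_MODELS set and load_model_checked helper are my own additions for illustration, not part of WhisperX):

```python
# Common Whisper checkpoint names accepted by the faster-whisper backend
# (illustrative list; your installed version may offer more).
KNOWN_MODELS = {"tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3"}

def load_model_checked(name, device="cuda", compute_type="float16"):
    """Load a WhisperX model, failing fast on an unknown model name."""
    if name not in KNOWN_MODELS:
        raise ValueError(f"unknown Whisper model: {name!r}")
    import whisperx  # imported lazily so the name check itself needs no GPU
    return whisperx.load_model(name, device, compute_type=compute_type)
```

Usage would be model = load_model_checked("large-v3"), with everything after load_model unchanged.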
Will the speaker ID diarization work across multiple audio files with the same speakers? For example, if I have multiple podcast episodes, will it always recognize the same speaker as the same ID?
Thanks for the video. I would like to see Meta SeamlessM4T for speech-to-text and the reverse. It supports more than 1,000 languages as well... I already use Whisper locally for speech-to-text.
That's exactly what I was thinking! 😀 SwiftKey belongs to Microsoft, but I guess if they integrated Whisper into it, there'd be a huge spike in the computational resources needed for such a new feature rollout...
Do we know how long it usually takes to get access to the diarization model? I submitted my company name and website, but the API calls still aren't working after about 30 minutes. Are those manually approved by the research team?
@Prompt Engineering How does the transcript get exported? I don't see where in the video you show exporting the transcript to JSON, TXT, SRT, or VTT.
Hey, I am getting an AssertionError in this cell; please let me know how to solve it. Thanks!

AssertionError                Traceback (most recent call last)
      1 embeddings = np.zeros(shape=(len(segments), 192))
      2 for i, segment in enumerate(segments):
----> 3     embeddings[i] = segment_embedding(segment)
      4
      5 embeddings = np.nan_to_num(embeddings)
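Hard to say without the full stack trace, but a frequent cause of that assertion in this embedding loop is input audio that isn't mono 16 kHz, which the speaker embedding model expects; re-encoding the file first (e.g. with ffmpeg: ffmpeg -i input.mp3 -ac 1 -ar 16000 audio.wav) often clears it. That's a guess, not a confirmed diagnosis. The downmix step itself is simple; a sketch in numpy (resampling to 16 kHz would still need ffmpeg or torchaudio):

```python
import numpy as np

def to_mono(waveform: np.ndarray) -> np.ndarray:
    """Collapse (channels, samples) audio to mono (samples,) by averaging."""
    if waveform.ndim == 1:
        return waveform  # already mono
    return waveform.mean(axis=0)

# Tiny illustrative stereo buffer: 2 channels x 2 samples.
stereo = np.array([[0.2, 0.4], [0.0, 0.2]])
mono = to_mono(stereo)
print(mono)  # [0.1 0.3]
```

If the audio is already mono 16 kHz, the next thing I would check is whether any segment's end time exceeds the file duration before it is passed to segment_embedding.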
I really wish you had shown more of the end results of the diarization. I can barely tell whether this will work for me, and I really wanted to make sure it was worth the time and energy to make this happen.