
Multi Speaker Transcription with Speaker IDs with Local Whisper 

Prompt Engineering
177K subscribers
37K views

Published: 25 Oct 2024

Comments: 59
@BenoitStPierre 11 months ago
This is unbelievably good timing for me! I just started researching how to do this with pyannote and whisper last night and gave up before starting to integrate the two, and woke up to this in my subscriptions 😅
@engineerprompt 11 months ago
glad it was helpful :)
@station2040 10 months ago
Me too! I was building this from scratch using pyannote. I was considering using whisper, but was still sorting out the diarization aspect of it. I was planning on Diarization first, sorting out the gaps, etc. This may save me a considerable amount of time.
@AngusLou 5 months ago
It is amazing, even better if you can do near real-time speaker diarization and speech-to-text.
@AbrahamKoshy 3 months ago
After identifying two speakers, I want to completely cut out speaker 2 and create an audio file with only speaker 1 segments joined together. Is this possible?
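A sketch of the segment-selection half of this, assuming WhisperX-style diarized segments (start/end in seconds plus a speaker label); the names below are illustrative, not from the video. The actual audio cutting would then use the kept ranges with a tool like ffmpeg or pydub.

```python
# Hypothetical sketch: collect the time ranges belonging to one speaker from a
# diarized transcript. The segment shape mimics WhisperX-style output.

def keep_speaker(segments, speaker):
    """Return (start, end) ranges in seconds for segments spoken by `speaker`."""
    return [(s["start"], s["end"]) for s in segments if s.get("speaker") == speaker]

segments = [
    {"start": 0.0, "end": 2.5, "speaker": "SPEAKER_00", "text": "Hello."},
    {"start": 2.5, "end": 4.0, "speaker": "SPEAKER_01", "text": "Hi there."},
    {"start": 4.0, "end": 6.0, "speaker": "SPEAKER_00", "text": "How are you?"},
]

ranges = keep_speaker(segments, "SPEAKER_00")
# ranges == [(0.0, 2.5), (4.0, 6.0)]
```

Those ranges could then be trimmed and concatenated into a single-speaker audio file.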
@MrJames-d5x 9 months ago
@PromptEngineering This was working really well, but it appears the latest WhisperX update broke the Colab notebook. Could you please update it? Thanks.
@adhirajsingh483 11 months ago
I would love it if you could do real-time audio transcription using Whisper that outputs exactly the speaker, the start and end times, and the transcription at that point.
@arthur3183 5 months ago
That's great, thanks! Could you please explain how you turn the results into a readable txt file, as in the original Whisper transcription?
@serroipjlan-lp4gv 9 months ago
How do you get the .txt, .srt, .json, .tsv, and .vtt files? The model works, but I can't find these files after running it. Thanks in advance.
@grantsolomon7417 10 months ago
Is there a way to have live, real time transcription with diarization?
@Tournicoton 5 months ago
Also interested!
@jamesonvparker 27 days ago
I would also like to know the answer to this
@MuhammadAdnan-tq3fx 20 days ago
Can we use our own fine-tuned model inside WhisperX?
@jorgemorales5584 1 month ago
How can I download the transcript in a .txt file? Thanks.
@bryguy7290 2 months ago
@engineerprompt I noticed you had AI instructions in the video for "separate speakers". Would you be able to create a video showing how you got Mac Whisper to do this, and what the export results are? Thank you.
@aa-xn5hc 11 months ago
Super useful, thanks!!!🙏🏻🙏🏻
@engineerprompt 11 months ago
Thank you
@SmogyKev 1 month ago
What if I want to use large-v3? Do I just change model = whisperx.load_model("large-v2", device, compute_type=compute_type) to model = whisperx.load_model("large-v3", device, compute_type=compute_type)?
@MartinRadio2 9 months ago
Will the speaker ID diarization work for multiple audio files with the same speakers? Like, if I have multiple podcast episodes, will it always recognize the same speaker as the same ID?
@akashsahu9787 4 months ago
No, the speaker ID is local to a single file only. It will not be able to map the speaker IDs of two different audio files to each other.
@pragati_agrawal 7 months ago
Hi... Can you please tell me how to transcribe, then translate, and then recognize multiple speakers?
@jasonhoneychurch1 4 months ago
I completed all the steps and it works, but am I missing something? Isn't the purpose to transfer it into a .txt file so I can read it?
@amortalbeing 10 months ago
Thanks a lot, this was really good.
@harrisonford4103 2 months ago
Can you detect how many speakers are talking at the same time?
@KunalMehta-u3s 2 months ago
Can we also label the speakers with their names instead of saying SPEAKER_00 and SPEAKER_01?
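A possible post-processing sketch for this, assuming WhisperX-style segments with generic speaker labels; the function name and mapping here are illustrative. The label-to-name mapping has to be supplied manually after listening, since the diarization has no notion of who the speakers actually are.

```python
# Hypothetical post-processing step: replace generic diarization labels
# (SPEAKER_00, SPEAKER_01, ...) with real names via a user-supplied mapping.

def rename_speakers(segments, name_map):
    """Return a copy of the segments with speaker labels replaced via name_map."""
    return [
        {**seg, "speaker": name_map.get(seg.get("speaker"), seg.get("speaker"))}
        for seg in segments
    ]

segments = [
    {"speaker": "SPEAKER_00", "text": "Welcome to the show."},
    {"speaker": "SPEAKER_01", "text": "Thanks for having me."},
]
renamed = rename_speakers(segments, {"SPEAKER_00": "Host", "SPEAKER_01": "Guest"})
# renamed[0]["speaker"] == "Host"
```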
@10dollarbanana 10 months ago
You’re the best
@oumaimamouelhi5161 6 months ago
Why is using WhisperX faster than directly using pyannote?
@Nihilvs 11 months ago
Very, very nice! I like it!
@thambidurai-qc7cx 5 months ago
How do I give custom instructions to my Whisper AI model for fine-tuning?
@Hasi105 11 months ago
Is real-time audio transcription possible?
@engineerprompt 11 months ago
Yes, I have done it via the API, will make a tutorial on it if there is interest.
@Hasi105 11 months ago
@@engineerprompt Would be nice to get this running locally for assistants like MemGPT, chat, or RP models.
@henkhbit5748 11 months ago
Thanks for the video. I would like to see Meta's SeamlessM4T for speech-to-text and the reverse. It also supports more than 1,000 languages... I already use Whisper locally for speech-to-text.
@opusmas7909 11 months ago
This is so good. I wish there were a keyboard that uses Whisper locally on mobile. There is one, but it's not multilingual.
@ilianos 11 months ago
That's exactly what I was thinking! 😀 SwiftKey belongs to Microsoft, but I guess if they integrated Whisper into it, there'd be a huge spike in the computational resources needed for such a new feature rollout...
@ilianos 11 months ago
What's the name of the one you found (not multilingual)?
@opusmas7909 7 months ago
Sorry, I only saw the message just now; the name is OpenAI Whisper Keyboard by Kai Soapbox. But now I use FUTO Voice Input.
@toastrecon 10 months ago
Do we know how long it usually takes to get access to the diarization model? I submitted my company name and website, but the API calls still aren't working after about 30 min. Are those manually approved by the research team?
@ruffy9937 9 months ago
Try making a new token; that seemed to work for me.
@frazuppi4897 11 months ago
Thanks a lot!
@Rollex-rr2xq 7 months ago
Can we save the model by running it on Colab, then download the saved model to my CPU machine to do the transcription? Is it possible?
@engineerprompt 7 months ago
Yes, I think it's possible, but you will need a GPU to run it.
@slofo22 11 months ago
@Prompt Engineering And how does the transcript get exported? I don't see in the video where you show exporting the transcript to JSON, txt, srt, or vtt.
@engineerprompt 11 months ago
The result is JSON; you will just need to write it to disk.
@slofo22 11 months ago
@@engineerprompt I'm a newbie, how do I do this?
@slofo22 5 months ago
@@iluminathy3210 yes
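A minimal sketch of writing the result to disk, as the reply in this thread describes: dump the raw JSON and also render a readable .txt. The `result` shape below mimics WhisperX-style output; in an actual run it would come from the transcription pipeline instead of being hard-coded.

```python
import json

# Assumed WhisperX-style result; a real run would produce this from the model.
result = {
    "segments": [
        {"start": 0.0, "end": 2.5, "speaker": "SPEAKER_00", "text": "Hello."},
        {"start": 2.5, "end": 4.0, "speaker": "SPEAKER_01", "text": "Hi there."},
    ]
}

# Raw JSON, exactly as returned.
with open("transcript.json", "w", encoding="utf-8") as f:
    json.dump(result, f, indent=2)

# Human-readable version, one line per segment.
with open("transcript.txt", "w", encoding="utf-8") as f:
    for seg in result["segments"]:
        f.write(f'[{seg["start"]:.2f}-{seg["end"]:.2f}] {seg["speaker"]}: {seg["text"]}\n')
```

The same loop can be adapted to .srt or .vtt by formatting the timestamps accordingly.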
@shikharmishra7208 8 months ago
Hey, I am getting an assertion error; please let me know how to solve it. Thanks:
AssertionError    Traceback (most recent call last)
      1 embeddings = np.zeros(shape=(len(segments), 192))
      2 for i, segment in enumerate(segments):
----> 3     embeddings[i] = segment_embedding(segment)
      4
      5 embeddings = np.nan_to_num(embeddings)
@TirtharajBiswas 11 months ago
What do you use for adding subs to your YouTube videos?
@engineerprompt 11 months ago
Descript.com
@carmenlanders6663 7 months ago
I really wish you had shown more end results of the diarization. I can barely tell if this will work for me. I really wanted to make sure it was worth the time and energy to make this happen.
@amin816 11 months ago
Hi, does speaker diarization work with other languages?
@spider279 10 months ago
It should normally work for all languages embedded in Whisper.
@Webwerp-hs3gp 5 months ago
The Discord link is expired, OP.
@J-io1uj 8 months ago
>says local whisper >shows only google colab
@themodfather9382 9 months ago
waste of time, need standalone app
@picklenickil 2 months ago
Care to venture how you would build something like this at scale?
@Slokingseba 9 months ago
whisperx.load_align_model returned: "No default align-model for language: sl". Does this only work for English? :)