This is unbelievably good timing for me! I just started researching how to do this with pyannote and whisper last night and gave up before starting to integrate the two, and woke up to this in my subscriptions 😅
Me too! I was building this from scratch using pyannote. I was considering using Whisper, but was still sorting out the diarization aspect of it. I was planning on doing diarization first, sorting out the gaps, etc. This may save me a considerable amount of time.
After identifying two speakers, I want to completely cut out speaker 2 and create an audio file with only speaker 1 segments joined together. Is this possible?
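It should be. Once the diarizer has labeled each segment with a speaker, you can collect just speaker 1's time ranges and splice those together. A minimal sketch, assuming the segments come back as dicts with "start", "end", and "speaker" keys (the WhisperX-style shape); the speaker labels, sample segments, and the gap threshold here are illustrative, not from the video:

```python
def speaker_intervals(segments, speaker, gap=0.5):
    """Return merged (start, end) intervals, in seconds, for one speaker.

    Segments closer together than `gap` seconds are joined so the
    exported audio isn't chopped into lots of tiny clips.
    """
    picked = sorted(
        (s["start"], s["end"]) for s in segments if s.get("speaker") == speaker
    )
    merged = []
    for start, end in picked:
        if merged and start - merged[-1][1] <= gap:
            # Close enough to the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(m) for m in merged]

# Illustrative diarization output for a two-speaker recording.
segments = [
    {"start": 0.0, "end": 2.0, "speaker": "SPEAKER_00"},
    {"start": 2.1, "end": 4.0, "speaker": "SPEAKER_01"},
    {"start": 4.2, "end": 6.0, "speaker": "SPEAKER_00"},
    {"start": 6.1, "end": 8.0, "speaker": "SPEAKER_00"},
]

intervals = speaker_intervals(segments, "SPEAKER_00")
print(intervals)  # [(0.0, 2.0), (4.2, 8.0)]
```

The actual cutting and joining can then be done with pydub or ffmpeg, e.g. slicing a pydub AudioSegment in milliseconds for each interval and concatenating the pieces before exporting.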
I would love it if you could do real-time audio transcription using Whisper that outputs exactly the speaker, the start and end times, and the transcription at that point.
@engineerprompt I noticed you had AI instructions in the video for "separate speakers". Would you be able to create a video showing how you got MacWhisper to do this, and what the export results look like? Thank you.
What if I want to use large-v3? Do I just change model = whisperx.load_model("large-v2", device, compute_type=compute_type) to model = whisperx.load_model("large-v3", device, compute_type=compute_type)?
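For what it's worth: recent WhisperX versions pass the model name string straight through to the faster-whisper backend, so swapping "large-v2" for "large-v3" should be all that's needed, assuming your installed version ships those weights. A small sketch with a typo guard (the KNOWN_MODELS set and load_model_checked helper are my own additions for illustration, not part of WhisperX):

```python
# Common Whisper checkpoint names accepted by the faster-whisper backend
# (illustrative list; your installed version may offer more).
KNOWN_MODELS = {"tiny", "base", "small", "medium", "large-v1", "large-v2", "large-v3"}

def load_model_checked(name, device="cuda", compute_type="float16"):
    """Load a WhisperX model, failing fast on an unknown model name."""
    if name not in KNOWN_MODELS:
        raise ValueError(f"unknown Whisper model: {name!r}")
    import whisperx  # imported lazily so the name check itself needs no GPU
    return whisperx.load_model(name, device, compute_type=compute_type)
```

Usage would be model = load_model_checked("large-v3"), with everything after load_model unchanged.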
Will the speaker ID diarization work across multiple audio files with the same speakers? For example, if I have multiple podcast episodes, will it always recognize the same speaker as the same ID?
Thanks for the video. I would like to see Meta SeamlessM4T for speech-to-text and the reverse. It supports more than 1,000 languages as well... I already use Whisper locally for speech-to-text.
That's exactly what I was thinking! 😀 SwiftKey belongs to Microsoft, but I guess if they integrated Whisper into it, there'd be a huge spike in the computational resources needed for such a new feature rollout...
Do we know how long it usually takes to get access to the diarization model? I submitted my company name and website, but the API calls still aren't working after about 30 minutes. Are those manually approved by the research team?
@Prompt Engineering How does the transcript get exported? I don't see where in the video you show exporting the transcript to JSON, TXT, SRT, or VTT.
Hey, I am getting an AssertionError in this cell; please let me know how to solve it. Thanks!

AssertionError                Traceback (most recent call last)
      1 embeddings = np.zeros(shape=(len(segments), 192))
      2 for i, segment in enumerate(segments):
----> 3     embeddings[i] = segment_embedding(segment)
      4
      5 embeddings = np.nan_to_num(embeddings)
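Hard to say without the full stack trace, but a frequent cause of that assertion in this embedding loop is input audio that isn't mono 16 kHz, which the speaker embedding model expects; re-encoding the file first (e.g. with ffmpeg: ffmpeg -i input.mp3 -ac 1 -ar 16000 audio.wav) often clears it. That's a guess, not a confirmed diagnosis. The downmix step itself is simple; a sketch in numpy (resampling to 16 kHz would still need ffmpeg or torchaudio):

```python
import numpy as np

def to_mono(waveform: np.ndarray) -> np.ndarray:
    """Collapse (channels, samples) audio to mono (samples,) by averaging."""
    if waveform.ndim == 1:
        return waveform  # already mono
    return waveform.mean(axis=0)

# Tiny illustrative stereo buffer: 2 channels x 2 samples.
stereo = np.array([[0.2, 0.4], [0.0, 0.2]])
mono = to_mono(stereo)
print(mono)  # [0.1 0.3]
```

If the audio is already mono 16 kHz, the next thing I would check is whether any segment's end time exceeds the file duration before it is passed to segment_embedding.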
I really wish you had shown more of the end results of the diarization. I can barely tell whether this will work for me, and I really wanted to make sure it was worth the time and energy to make this happen.