
Fine tuning Whisper for Speech Transcription 

Trelis Research

Get Life-time access to the ADVANCED Transcription Repo:
- trelis.com/advanced-transcrip...
Video Resources:
- Dataset: huggingface.co/datasets/Treli...
- Slides: docs.google.com/presentation/...
- Simple Whisper Transcription Notebook: colab.research.google.com/dri...
- Basic fine-tuning notebook: colab.research.google.com/git...
- PEFT Example: colab.research.google.com/dri...
Other links:
➡️ Trelis Resources and Support: Trelis.com/About
Chapters
0:00 Fine-tuning speech-to-text models
0:17 Video Overview
1:39 How to transcribe YouTube videos with Whisper
7:39 How do transcription models work?
20:08 Fine-tuning Whisper with LoRA
43:32 Performance evaluation of fine-tuned Whisper
48:32 Final Tips

Science

Published: 8 Jun 2024

Comments: 56
@SiD-hq2fo · 4 months ago
I can't thank you enough for the quality content you are providing. Please continue to upload such videos!!
@gautammandewalker8935 · 2 months ago
Great video! You are one of the best teachers I have ever heard.
@AbdennacerAyeb · 4 months ago
easy, simple, well organized. Thank you
@miblish5168 · 4 months ago
This video really saved my @$$. I had Whisper & Colab running a few months ago, but it broke. Your video and notebooks showed me why, and taught me several new tricks! Keep it up please.
@scifithoughts3611 · 2 months ago
@Trelis have you considered, instead of fine-tuning, using an LLM to correct the spelling of Whisper output? (Prompt it to fix "my strell" to "Mistral", etc.)
@scifithoughts3611 · 2 months ago
Or, as another alternative, prompt Whisper with the context and correct spellings of its common transcription mistakes?
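[Editor's note: a minimal sketch of that prompting idea, using the openai-whisper package's initial_prompt parameter; the file name and prompt text are illustrative.]

```python
import whisper

model = whisper.load_model("small")

# initial_prompt biases decoding toward the supplied context and spellings,
# which can fix recurring mistakes without any fine-tuning.
result = model.transcribe(
    "talk.mp3",  # hypothetical input file
    initial_prompt="Topics: Mistral, LoRA, PEFT, Whisper fine-tuning.",
)
print(result["text"])
```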
@anasdavoodtk3160 · 4 months ago
Great explanation. The drum story! Good work.
@heski6847 · 4 months ago
Great, thanks! I needed it.
@master2054 · 4 months ago
good job!!
@dachuandu6539 · 15 days ago
best explanation ever
@onursarikaya1385 · 2 months ago
Thank you! It's a great investment :)
@TrelisResearch · 2 months ago
you're welcome
@m_tron99 · 2 months ago
Great video. Can you do one on using WhisperX for diarisation and timestamping?
@seancarmody1506 · 23 days ago
Loved the video. I'm wondering if it's possible to do something similar using a vision model, say a ResNet trained for a certain task. Do you think it would be possible to train an adapter that allows the LLM to understand the ResNet features? I watched your LLaVA training video, but the concept seemed a little different than I expected.
@TrelisResearch · 23 days ago
I suppose the original ResNet didn't include attention, so that would probably be a disadvantage relative to the transformers used now. But yes, in principle you could attach a ResNet to the inputs of an LLM - but I think it would be done something like in my LLaVA / IDEFICS video.
@user-kr2ec9sd8u · 3 months ago
This video was very instructive, thanks! For my case, I need a model that recognizes items on a list consisting mainly of medical vocabulary, so a plain Whisper model doesn't get them. I will record the terms and their pronunciations later, but are they inserted in the "DatasetDict()" part of the code in place of Hugging Face's "common_voice"? Also, how is the trained model saved and used in a new project? Until now I've only used a simple model = whisper.load_model("small") line in my projects.
@TrelisResearch · 3 months ago
Your training data will need to be prepared and included in the Hugging Face dataset (like the new dataset I created). To re-use the model, it's easiest to push it to the Hugging Face Hub as I do here; then you can load it back down using the same loading code I used for the base model. Technically I think it's also possible to convert back to the OpenAI format and then load it with a code snippet like yours. See here: github.com/openai/whisper/discussions/830#discussioncomment-4652413
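[Editor's note: a rough sketch of that workflow with the datasets/transformers libraries; the file paths, transcripts, and Hub repo id are hypothetical.]

```python
from datasets import Audio, Dataset, DatasetDict
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Pair each recording with its corrected transcript; cast to 16 kHz audio.
ds = Dataset.from_dict({
    "audio": ["recordings/item_01.wav", "recordings/item_02.wav"],
    "sentence": ["metoprolol 50 mg", "amoxicillin 500 mg"],
}).cast_column("audio", Audio(sampling_rate=16000))
dataset = DatasetDict({"train": ds})  # used in place of common_voice

# After fine-tuning, push the model to the Hub...
# model.push_to_hub("your-username/whisper-small-medical")  # hypothetical repo id

# ...then reload it in a new project with the same loading code:
model = WhisperForConditionalGeneration.from_pretrained("your-username/whisper-small-medical")
processor = WhisperProcessor.from_pretrained("your-username/whisper-small-medical")
```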
@RustemShaimagambetov · 4 months ago
Great video! How much data (rows) do we need for training to get acceptable results? Are 5-6 rows enough?
@TrelisResearch · 4 months ago
Yes, even 5-6 can be enough to add knowledge of a few new words. I only had 6 rows. Probably 12 or 18 would have been better here.
@LinkSF1 · 4 months ago
Do you know if there's a way to downsample the frequencies? E.g. if I have a 24 kHz sample and want to downsample to 16 kHz, what would be the preferred way of doing this?
@TrelisResearch · 4 months ago
Howdy! You can check this vid: there's a part towards the middle where I show how to downsample.
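[Editor's note: two common ways to do the 24 kHz -> 16 kHz resample, as a sketch; file names are hypothetical.]

```python
import librosa
from datasets import Audio

# Single file: load at the native rate, then resample to 16 kHz.
y, sr = librosa.load("sample_24khz.wav", sr=None)
y_16k = librosa.resample(y, orig_sr=sr, target_sr=16000)

# Hugging Face dataset (assuming `dataset` has an "audio" column):
# re-casting the column resamples lazily whenever a row is accessed.
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
```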
@PierreDELOM · 4 months ago
Very instructive videos. Next one on diarization?
@TrelisResearch · 4 months ago
Interesting idea, I'll add it to my notes.
@user-yu8sp2np2x · 4 months ago
Recently I faced a situation where I fine-tuned a model on a training set and it returns good results on training-set or validation-set examples, but when I give it an input it has never seen, it tends to produce contextually irrelevant results. Could you suggest what one should do in such a case? One thing we can do is make our training dataset more extensive, but other than that, can we do something else?
@TrelisResearch · 4 months ago
Create a separate validation set using data that is not from your training or validation set (could just be wikitext) and measure the validation loss on that during training. If it is rising quickly, then you are overtraining and need to train for fewer epochs and/or with a lower learning rate.
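[Editor's note: a minimal sketch of wiring that in with the Hugging Face trainer; dataset names are placeholders, and you would still supply your data collator and compute_metrics as in the notebook.]

```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="whisper-ft",
    eval_strategy="steps",   # named "evaluation_strategy" on older transformers versions
    eval_steps=50,           # watch eval_loss; if it climbs quickly, back off
    num_train_epochs=2,      # reduce if you see overtraining
    learning_rate=1e-5,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    eval_dataset=heldout_ds,  # data outside your train/val distribution
)
trainer.train()
```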
@jetpro · 3 months ago
Do you know how to export it to ONNX and correctly use it in deployment? Helpful video!
@TrelisResearch · 2 months ago
I haven't dug into that angle for ONNX but here's the guide for getting back from huggingface to whisper and probably you can go from there? github.com/openai/whisper/discussions/830#discussioncomment-4652413
@mrsilver8151 · 2 days ago
Thanks for your great work. Is there any way to convert the fine-tuned model to run with faster-whisper, or is there another way to fine-tune for faster-whisper?
@TrelisResearch · 1 day ago
Yup - see here: github.com/SYSTRAN/faster-whisper/issues/248 and opennmt.net/CTranslate2/python/ctranslate2.converters.TransformersConverter.html. If you try it, could you let me know if that really gives a 4x speed-up on a GPU?
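[Editor's note: a sketch of that conversion path via the CTranslate2 Python API linked above; the repo id and output directory are hypothetical. faster-whisper also needs the tokenizer/preprocessor files copied alongside the converted weights.]

```python
import ctranslate2
from faster_whisper import WhisperModel

# Convert the fine-tuned Hugging Face checkpoint to CTranslate2 format.
converter = ctranslate2.converters.TransformersConverter(
    "your-username/whisper-small-finetuned",
    copy_files=["tokenizer.json", "preprocessor_config.json"],
)
converter.convert("whisper-small-ct2", quantization="float16")

# Load the converted model with faster-whisper.
model = WhisperModel("whisper-small-ct2", device="cuda", compute_type="float16")
segments, _ = model.transcribe("audio.mp3")
print(" ".join(s.text for s in segments))
```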
@javadasoodeh · 9 days ago
Thank you for your explanation. Imagine I'm going to start training Whisper on a low-resource language, and I don't have the entire dataset up front to feed into training. Suppose I do the same as you: correct the transcription, pair it with the audio, and give it to the model for fine-tuning. If I do this several times, won't the model forget its previous learning, or overfit? By and large, I would like to create a pipeline that collects pairs of audio and manual transcriptions and then fine-tunes the model each time. Could you guide me on what I need to do it this way?
@TrelisResearch · 7 days ago
To avoid forgetting and overfitting, you should blend about 5% original/English-type voice data into your new dataset.
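[Editor's note: a sketch of that 5% blend with the datasets library; the English Common Voice source is an assumption (it requires accepting its terms on the Hub), and `new_ds` stands for your prepared low-resource training split.]

```python
from datasets import concatenate_datasets, load_dataset

# Sample ~5% as many English rows as the new dataset has.
n = max(1, int(0.05 * len(new_ds)))
en = load_dataset("mozilla-foundation/common_voice_11_0", "en", split="train")
mix = concatenate_datasets([new_ds, en.shuffle(seed=42).select(range(n))])
mix = mix.shuffle(seed=42)  # interleave original and new examples
```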
@Rems766 · 2 months ago
I'm having trouble fine-tuning the large-v3 model. When I am evaluating, the compute_metrics function does not call the tokenizer method properly and it does not work. Any idea why?
@TrelisResearch · 2 months ago
Hmm, that's odd. I haven't trained the large model myself. I assume you tried posting on the GitHub repo? Any joy there? Feel free to share the link if you create an issue.
@tariqyahia9039 · 3 months ago
Question: does the training file have to be in VTT format, or can it be in .txt?
@TrelisResearch · 3 months ago
It has to have timestamps, so VTT (or SRT, which you can convert to VTT).
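[Editor's note: if you'd rather convert locally than with an online tool, a minimal sketch. SRT and VTT differ mainly in the WEBVTT header and the millisecond separator; file names are hypothetical.]

```python
import re

def srt_to_vtt(srt_path: str, vtt_path: str) -> None:
    text = open(srt_path, encoding="utf-8").read()
    # 00:01:02,345 -> 00:01:02.345 (VTT uses a dot before milliseconds)
    text = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", text)
    with open(vtt_path, "w", encoding="utf-8") as f:
        f.write("WEBVTT\n\n" + text)

srt_to_vtt("talk.srt", "talk.vtt")
```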
@user-xd1ic9qk8d · 1 month ago
Good job!! But I'm not finding the checkpoint folders.
@TrelisResearch · 1 month ago
They'll be generated when you run through the training. Also, you need to set output_dir to somewhere you want the files to be saved.
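[Editor's note: for reference, the relevant trainer settings as a sketch; values are illustrative.]

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="./whisper-checkpoints",  # checkpoint-500/, checkpoint-1000/, ... land here
    save_steps=500,                      # write a checkpoint every 500 steps
    save_total_limit=2,                  # keep only the two most recent checkpoints
)
```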
@imranullah3097 · 4 months ago
For a low-resource language, how do you train a tokenizer, add it, and then fine-tune Whisper?
@TrelisResearch · 4 months ago
Oooh, yeah, low-resource is going to be tough. The approach probably depends on the language and whether it has close relatives. Ideally you want to start with a tokenizer and fine-tuned model for a close language. If you do need to train a tokenizer, you can check out this guide: huggingface.co/learn/nlp-course/chapter6/2?fw=pt
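[Editor's note: a rough sketch of retraining the BPE on your own corpus, following that course chapter; the corpus file is hypothetical. Note the model's embeddings would then no longer match the new vocabulary, so you would also need to resize and re-train them.]

```python
from transformers import WhisperTokenizerFast

base = WhisperTokenizerFast.from_pretrained("openai/whisper-small")

def corpus():  # hypothetical text corpus in the target language
    with open("my_language_corpus.txt", encoding="utf-8") as f:
        for line in f:
            yield line.strip()

# Retrain the BPE merges on the new corpus, keeping the original vocab size.
new_tokenizer = base.train_new_from_iterator(corpus(), vocab_size=base.vocab_size)
new_tokenizer.save_pretrained("whisper-small-mylang-tokenizer")
```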
@imranullah3097 · 4 months ago
Kindly make a video on the following: HiFi-GAN with a transformer; multimodal (text + image).
@TrelisResearch · 4 months ago
Thanks, I'll add it to my list. I was already planning on multi-modal at some point; it will take me a bit of time before getting to it.
@AndrewBawitlung · 3 months ago
What should I do when my language is not in the Whisper tokenizer?
@TrelisResearch · 3 months ago
Probably imperfect, but maybe you could choose the closest language and then fine-tune from there.
@simonsu-yz9vo · 3 months ago
Is it possible to fine-tune for speech translation?
@TrelisResearch · 3 months ago
Yes, you just need to format the Q&A (the input/output pairs) for that.
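[Editor's note: a sketch of the formatting side; the source language here is an assumption. Whisper's decoder prompt carries the task, so you set task="translate" and tokenize the English target text as the labels.]

```python
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-small")

# Speech is (say) French; the label text is its English translation.
processor.tokenizer.set_prefix_tokens(language="french", task="translate")
labels = processor.tokenizer("the English translation of the clip").input_ids
```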
@AndrewBawitlung · 4 months ago
Can you compare it with XLS-R?
@TrelisResearch · 4 months ago
Thanks for the tip. It will be a while before I get back to speech, but I have noted it as a topic.
@_loong9906 · 14 days ago
Great video! But in my checkpoints there's no 'added_tokens.json' or 'config.json' and so on. What's happening? What did I miss?
@TrelisResearch · 12 days ago
You mean you are running training but not finding those files in your saved checkpoints, whereas you see them in my video when I do the same?
@sumitjana7794 · 3 months ago
I have transcribed text in .srt format; can I train with it??
@TrelisResearch · 3 months ago
Yes! And for this script you can just convert SRT to VTT losslessly using an online tool.
@sumitjana7794 · 3 months ago
Thanks a lot @TrelisResearch
@ivor1113 · 10 days ago
Can you share the code in ADVANCED-transcription?
@TrelisResearch · 10 days ago
Howdy! Yes, the code is in the ADVANCED-transcription repo, which you can buy lifetime access to (incl. future updates). If you buy and something is missing, you can create an issue in the repo.
@matbeedotcom · 4 months ago
Would a DPO method theoretically work for more effectively fine-tuning Whisper?
@TrelisResearch · 4 months ago
Yeah, DPO could be good for general performance improvement. For adding sounds/words, standard fine-tuning (SFT) is probably best.