This is great stuff. Explained like a pro. Could you please create videos along similar lines with slight modifications, like: 1. How to use a custom dataset 2. How to use a basic RNN and/or GRU (I tried but ran into multiple issues). These branch-offs will be very helpful for the overall understanding of how to modify the code to address custom problems. Thanks in advance :)
@@AladdinPersson I've watched several more videos, but I still need to catch up because I am a new subscriber. I'm still learning, and I have some questions regarding this topic. Please don't be mad if I ask a stupid question :-). Seq2Seq works for language translation because of its advantage of handling inconsistent input/output lengths. However, I didn't quite get how a German sentence of length 12 gets translated to an English sentence of length 16 (during training, the translated sentence length also varies, which really confused me. I have experience with LSTMs for text classification, and in that project the output of my LSTM always had the same length.) I understand there must be some way, but I couldn't quite get it. Could you please explain more about how this works?
Will the first output of the model be the &lt;sos&gt; token or not? In the intro you've shown that there is no &lt;sos&gt; in the output sequence. But @39:52 on line 174, you do output[1:] with the intention of skipping the &lt;sos&gt; token, which is contradictory. Shouldn't the loss compare the entire output sequence, i.e. output[:], with target[1:]?
When I run the code I get this error: `Traceback (most recent call last): train_data, valid_data, test_data = Multi30k.splits( AttributeError: 'function' object has no attribute 'splits'` (process finished with exit code 1). I have searched for solutions on the internet but nothing works. Could you please take a look at my error? I really appreciate your time.
I can't understand the intuition behind making batch_size the index-1 dimension of the shape (sequence_len, batch_size, word_size). The PyTorch docs say the LSTM uses this shape unless batch_first=True is set, but it seems confusing to me. (batch_size, sequence_len, word_size) seems more intuitive. Can anyone explain the first shape (when batch_first=False)?
Hi, I am not able to load the German tokenizer. OSError: [E050] Can't find model 'de'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
(Another option) If you are using Windows: 1. Right-click on Anaconda Navigator 2. Run as administrator 3. Open the Anaconda prompt 4. python -m spacy download de. This worked for me. (If you're on spaCy 3+, the 'de' shortcut was removed; use python -m spacy download de_core_news_sm instead.)
Very helpful content. I was looking for tutorials on attention and transformer networks and came across your work. You taught me a lot, sir, keep it up.
Hello, I have noticed that in the Seq2Seq class you pass the target as input, but how do we deal with this at validation time, when we don't want to pass target data?
I noticed the same thing, and I guess you are right. One should probably have `target = None` and then make a case distinction. In fact, I think there is a mistake in this function, because the prediction might actually be *longer* than the target sequence, in which case the rest of the prediction is just ignored. This has implications for the loss computation: the predictions should continue at least some number of steps T beyond the target sequence length, but this would also require the training batch to be padded to some higher number of tokens.
By far, this is the best presentation. In a way, I am struggling to keep track of all the shapes; I guess there is no easy way other than tracking all of them. I noticed that the shapes of inp_data and target change with every batch, yet within a batch, all samples have the same shape. How is that possible? Are they padded within each batch to have the same length?
Yes, you're correct: we need to pad each batch to the same length, and this comes from how a tensor is constructed. We can't have a variable length along one dimension of a tensor.
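A minimal sketch of the per-batch padding described above (the pad index 1 is an assumption for illustration; torchtext assigns the actual index from the vocab):

```python
# Pad every sequence in a batch to the length of the longest one,
# so the batch can be stacked into a single rectangular tensor.
PAD_IDX = 1  # hypothetical <pad> index, not torchtext's guaranteed value

def pad_batch(batch, pad_idx=PAD_IDX):
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_idx] * (max_len - len(seq)) for seq in batch]

# Three sentences of different lengths (made-up token ids)
batch = [[2, 5, 9, 3], [2, 7, 3], [2, 8, 6, 4, 3]]
padded = pad_batch(batch)  # every row now has length 5
```

Note that the padding length is per batch, which is why the shapes differ between batches but not within one.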
Thanks, your tutorial helped me a lot! But I got ImportError: cannot import name 'translate_sentence' from 'utils' (/opt/anaconda3/lib/python3.8/site-packages/utils/__init__.py). Do you have any idea how I can solve this problem?
@@AladdinPersson Hi Aladdin, in the character-level LSTM you unsqueezed the embedding output in the LSTM layer, but here it is not unsqueezed. May I know why?
Hey, here in the seq2seq forward we keep outputs[0] as zeros and target[0] = vocab.english.stoi["&lt;sos&gt;"]. Should we not set outputs[0, :, vocab.english.stoi["&lt;sos&gt;"]] = 1?
We keep outputs[0] as zero because doing it this way makes the indexing convenient (although you could quite easily change this to start at 0 too), and then we simply ignore this zero element by doing outputs[1:] when we send it to the cross-entropy loss. It would be helpful if you could refer to the lines on GitHub that you feel are wrong/confusing and propose your alternatives; it's a bit difficult to follow right now.
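A toy illustration of the alignment described above, with made-up token ids (the decoder never predicts slot 0, it only exists so that outputs[t] lines up with target[t]):

```python
# target starts with <sos>; outputs[0] is a dummy zero entry kept only
# so that the two sequences index the same time steps.
SOS, EOS = 1, 2
target = [SOS, 5, 7, EOS]   # what the decoder should produce
outputs = [0, 5, 7, EOS]    # slot 0 is the unused zero placeholder

pred = outputs[1:]  # drop the dummy slot
gold = target[1:]   # drop <sos>
# pred and gold are now aligned token-for-token for the loss
```

So outputs[1:] and target[1:] both skip position 0, which resolves the apparent contradiction with the intro.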
Is torch's GPU version necessary for running this? I got stuck at the last function: the program hangs in the epoch loop after the first iteration. I am running it in a Jupyter notebook with the utils file included in the same directory, using torch's CPU version.
@SIMAR PREET SINGH I don't think that means it's stuck; it's probably training and just taking a very long time for you since you're on the CPU. Try printing loss.item() and setting the batch size to 1: do you get anything printed? If you're not getting an error, it's most likely training.
So it doesn't work offline; I had to switch to Google Colab, where it worked fine and I got results. Thanks for the help, and thank you for the video lesson.
Hey, can this be used to build text-to-speech, since text and speech are both sequences? If not, I would love to see a video on how to implement a text-to-speech project using PyTorch.
I made a paper review of the first Seq2Seq paper but in retrospect I do not think it lived up to the standard I want to set so I unlisted it. I'll add it so it's accessible for members if you still want to take a look though
Let me know if it doesn't work, since YouTube told me it's still in beta, but the link is here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Ui-K4RUkC58.html
@@AladdinPersson I watched the video. I acknowledge that making a paper review is challenging at times when you have time constraints, but you did a decent job.
Loved your tutorial. I have a question though: when implementing the encoder, you said the shape of x is (seq_len, N). Shouldn't it be (input_size, seq_len, N), where input_size is the vocabulary size? Because we one-hot encode each word in the first place.
I got the following errors while implementing: 1. ImportError: cannot import name 'translate_sentence', 'save_checkpoint', 'load_checkpoint' from 'utils' 2. AttributeError: module 'torchtext.nn' has no attribute 'Module'. Has anyone else run into these errors? Can anyone please suggest a resolution for these issues?
When you call the build_vocab method for German and English, how can PyTorch know which language you are building the vocab for? You just pass train_data both times. Can someone explain? Thank you.
Good job! I have a question about the dataset: how do I get a dataset onto my computer, and how do I create a dataset for a new language? I want Vietnamese -> English, but I don't have a dataset and I don't know how to create one. Please support!
Unfortunately I didn't make an explanation of this, but it's on GitHub, and we're essentially doing what we did in the video, just one time step at a time. I think if you read through the code you will understand it; here is the code for it: github.com/AladdinPerzon/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/Seq2Seq/utils.py Let me know if you still have questions about this.
I am still stuck on the shapes of the target and output tensors. Don't you think both have the same shape, so that we don't need to reshape? If the target has shape (N, T, voc_size), the output will also have the same shape. Correct me if I'm wrong.
Could you please show how to actually use the model to translate sentences, without the function that you imported? It's not quite clear to me. Thank you!
Hi Aladdin, in your implementation you pass the context vector produced at the end of the encoder network to the first state of the decoder network; for each subsequent state of the decoder you use the decoder's own previous hidden and cell states as its context. In this lecture, ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-YAgjfMR9R_M.html&ab_channel=MichiganOnline at 4:34, the author uses both the decoder's previous hidden/cell states and the encoder's context vector as input to each decoder state. Don't you think that using the latter in our implementation would allow the model to give better results?
This video has been very helpful for me to be able to implement a seq2seq model for a (slightly different) time series forecasting task!! Thanks so very much!!
Hi, first of all thanks a lot for the awesome work you are doing. I implemented and tested your code on other sentences, and the model was not able to translate even a single word correctly.
Awesome video, man, thanks for explaining everything well. Just a quick question: you made the forward function of the seq2seq use target values. For training that's fine, but while predicting we won't have those, right? I understand we can basically use a while loop and stop when x == &lt;eos&gt;, but I'm curious how you implemented that: did you write the model again for testing, or do something like "if model.eval(), do this"? I was also wondering if there is a way to write the code so that I won't need to pass the target to the forward function of the decoder. If possible, please make a video on the testing part of the model. Once again, great video, man, one of the best explanations I have seen; it makes me understand not only the concept but also how to implement things. You are doing great work.
You're completely right that we need to modify our approach during evaluation. We do this by modifying our loop to predict one time step at a time until we either reach a max_length allowed or if we reach an EOS like you mentioned. If you want to see how I implemented it you can see the code for it here: github.com/AladdinPerzon/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/Seq2Seq/utils.py
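A minimal sketch of the evaluation-time loop described above: decode greedily one step at a time until `<eos>` or max_length. The `next_token` stub stands in for running the decoder and taking the argmax; all token ids are made up for illustration:

```python
# Greedy decoding sketch: feed the previous prediction back in until <eos>
# or until max_length steps have been produced.
SOS, EOS = 1, 2

def next_token(prev_token, step):
    # Stand-in for: logits = decoder(prev_token, hidden); logits.argmax()
    scripted_predictions = [5, 9, 7, EOS]
    return scripted_predictions[step]

def greedy_decode(max_length=50):
    tokens = [SOS]                      # start from <sos>, no target needed
    for step in range(max_length):
        tok = next_token(tokens[-1], step)
        tokens.append(tok)
        if tok == EOS:                  # stop as soon as <eos> is produced
            break
    return tokens
```

This is why no target has to be passed at inference: the decoder's own previous output plays the role the target tokens played during training.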
Thanks for your detailed sharing. I have a question: normally the batch size is the first dimension of the input, but in this seq2seq model it's the second dimension. Does anyone know the reason?
Great tutorial, but I could not try it with my own dataset. When I run the for loop over batches it returns "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn". I don't know why. Can you help me with it by mail?
I am much more active on YouTube, so it's better if you ask on this platform. I'm not sure what could be wrong with the data loading approach; I do have an entire video dedicated to this topic of loading text datasets. Wish you the best of luck.
Hey, can you please tell me how you deal with words that do not have any vector representation (OOV words) but still exist in your data? Great explanation, though!
We loaded the data using torchtext and it also takes care of OOV tokens; in this case we just used the default, which is "&lt;unk&gt;", but you can also specify it yourself, which you do in Field if I remember correctly.
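A sketch of the fallback described above: any word missing from the vocabulary maps to the `<unk>` index. The dict and indices here are illustrative, not torchtext's actual internals:

```python
# Out-of-vocabulary words fall back to the <unk> index during numericalization.
UNK_IDX = 0
stoi = {"<unk>": 0, "<pad>": 1, "hello": 2, "world": 3}  # toy vocab

def numericalize(tokens):
    # dict.get with a default implements the OOV fallback
    return [stoi.get(tok, UNK_IDX) for tok in tokens]

ids = numericalize(["hello", "zebra", "world"])  # "zebra" is OOV
```

The model then learns a single embedding for `<unk>` that stands in for every unseen word.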
It was a while ago that I did this; any specific timestamp? The first input shouldn't have a major influence because we're passing through the context vector, so if it does, it's probably just because the model needs more training.
I think standardizing your future videos on einsum and NamedTensor would be great! Your video on einsum was wonderful, and I think NamedTensor would be something you'd enjoy too. That, and maybe explore PyTorch Lightning?
So far I haven't used Lightning, but it seems like a great library. I just haven't come across its use cases yet; for example, I haven't had models train on a TPU or with 16-bit precision.
Hey man, I've noticed that in some implementations people use hidden_size as the embedding dimension instead of a separately set embedding_size. What is the reason for this?
Simplicity, I guess; they aren't really related, at least in my view. Could you provide an example where you've seen this, so I could give better reasoning?
@@AladdinPersson I first saw it here: pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html , the official tutorial. If you look under the encoder and decoder you will see hidden_size as the dimension of the embedding
Thanks for making these videos. I have a doubt: is it possible to use this same approach for any other language pair, and how hard is it to deploy this?
It was run on a GTX 1060 with 6 GB of VRAM so nothing extreme is needed. How much VRAM does your GPU have? I suggest decreasing batch_size from 64 to 2/4/8/16/32, embedding size to 200, hidden size to 256 and change num_layers to 1 if you have to. Let me know if you're able to run it then :)
@@AladdinPersson Oh, now I get it. I'm using Google Colab's GPU; the VRAM is 12 GB, I think. I'm trying to train on my custom dataset, which has 70k rows. Unfortunately I only have one GPU and can't load it all onto the GPU, so I have to use a subset of my current training data. Can you suggest any workaround for this issue? Also, I'm conducting an experiment with pre-trained embeddings and came across this paper: www.aclweb.org/anthology/N18-2084/ Do you think this will work?