This is great stuff. Explained like a pro. Could you please create videos along similar lines with slight modifications, like: 1. How to use a custom dataset 2. How to use a basic RNN and/or GRU (I tried but ran into multiple issues). These branch-offs will be very helpful for the overall understanding of how to modify the code to address custom problems. Thanks in advance :)
@@AladdinPersson I've watched several more videos, but I still need to catch up because I am a new subscriber. I'm still learning, and I have some questions regarding this topic. Please don't be mad if I ask a stupid question :-). Seq2Seq works for language translation because of its advantage of handling inconsistent input/output lengths. However, I didn't quite get how a German sentence of length 12 gets translated to an English sentence of length 16 (during training, the translated sentence length also varies, which really confused me. I have experience with LSTMs for text classification, and in that project the output of my LSTM always had the same length.) I understand there must be some way, but I couldn't quite get it. Could you please explain more about how this works?
Will the first output of the model be the &lt;sos&gt; token or not? In the intro you've shown that there is no &lt;sos&gt; in the output sequence. But @39:52 on line 174, you do output[1:] with the intention of skipping the &lt;sos&gt; token, which is contradictory. Shouldn't the loss compare the entire output sequence, i.e. output[:], with target[1:]?
When I run the code I get this error: `Traceback (most recent call last): train_data, valid_data, test_data = Multi30k.splits( AttributeError: 'function' object has no attribute 'splits'` (process finished with exit code 1). I have searched for solutions on the internet but nothing works. Could you please take a look at my error? I really appreciate your time.
I can't understand the intuition behind making batch_size the index-1 dimension of the shape (sequence_len, batch_size, word_size). The PyTorch docs say the LSTM uses this shape unless batch_first=True is set, but it seems confusing to me. (batch_size, sequence_len, word_size) seems more intuitive. Can anyone explain the first shape (when batch_first=False)?
Hi, I am not able to load the German tokenizer. OSError: [E050] Can't find model 'de'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
(Another option) If you are using Windows: 1. Right-click on Anaconda Navigator 2. Run as administrator 3. Open the Anaconda prompt 4. python -m spacy download de. This worked for me. (If you're on spaCy 3+, the 'de' shortcut was removed; use python -m spacy download de_core_news_sm instead.)
Very helpful content. I was looking for tutorials on attention and transformer networks and came across your work. You taught me a lot, sir, keep it up.
Hello, I have noticed that in the Seq2Seq class you pass the target as input, but how do we deal with this at validation time, when we don't want to pass target data?
I noticed the same thing, and I guess you are right. One should probably have `target = None` and then make a case distinction. In fact, I think there is a mistake in this function, because the prediction might actually be *longer* than the target sequence, in which case the rest of the prediction is just ignored. This has implications for the loss computation: the predictions should continue at least some number of steps T beyond the target sequence length, but this would also require the training batch to be padded to some higher number of tokens.
By far, this is the best presentation. In a way, I am struggling to keep track of all the shapes; I guess there is no easy way other than tracking all of them. I noticed that the shapes of inp_data and target change with every batch, yet within a batch, all samples have the same shape. How is that possible? Are they padded within each batch to have the same length?
Yes, you're correct: we need to pad each batch to the same length, and this comes from how a tensor is constructed. We can't have a variable length along one dimension of a tensor.
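A minimal sketch of the per-batch padding described above (the pad index 1 is an assumption for illustration; torchtext assigns the actual index from the vocab):

```python
# Pad every sequence in a batch to the length of the longest one,
# so the batch can be stacked into a single rectangular tensor.
PAD_IDX = 1  # hypothetical <pad> index, not torchtext's guaranteed value

def pad_batch(batch, pad_idx=PAD_IDX):
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_idx] * (max_len - len(seq)) for seq in batch]

# Three sentences of different lengths (made-up token ids)
batch = [[2, 5, 9, 3], [2, 7, 3], [2, 8, 6, 4, 3]]
padded = pad_batch(batch)  # every row now has length 5
```

Note that the padding length is per batch, which is why the shapes differ between batches but not within one.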
Thanks, your tutorial helped me a lot! But I got ImportError: cannot import name 'translate_sentence' from 'utils' (/opt/anaconda3/lib/python3.8/site-packages/utils/__init__.py). Do you have any idea how I can solve this problem?
@@AladdinPersson Hi Aladdin, in the character-level LSTM you unsqueezed the embedding output in the LSTM layer, but here it is not unsqueezed. May I know why?
Hey, here in the seq2seq forward we keep outputs[0] as zeros and target[0] = vocab.english.stoi["&lt;sos&gt;"]. Should we not set outputs[0, :, vocab.english.stoi["&lt;sos&gt;"]] = 1?
We keep outputs[0] as zero because doing it this way makes the indexing convenient (although you could quite easily change this to start at 0 too), and then we simply ignore this zero element by doing outputs[1:] when we send it to the cross-entropy loss. It would be helpful if you could refer to the lines on GitHub that you feel are wrong/confusing and propose your alternatives; it's a bit difficult to follow right now.
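A toy illustration of the alignment described above, with made-up token ids (the decoder never predicts slot 0, it only exists so that outputs[t] lines up with target[t]):

```python
# target starts with <sos>; outputs[0] is a dummy zero entry kept only
# so that the two sequences index the same time steps.
SOS, EOS = 1, 2
target = [SOS, 5, 7, EOS]   # what the decoder should produce
outputs = [0, 5, 7, EOS]    # slot 0 is the unused zero placeholder

pred = outputs[1:]  # drop the dummy slot
gold = target[1:]   # drop <sos>
# pred and gold are now aligned token-for-token for the loss
```

So outputs[1:] and target[1:] both skip position 0, which resolves the apparent contradiction with the intro.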
Is torch's GPU version necessary for running this? I got stuck at the last function: the program hangs in the epoch loop after the first iteration. I am running it in a Jupyter notebook with the utils file included in the same directory, using torch's CPU version.
@SIMAR PREET SINGH I don't think that means it's stuck; it's probably training and just taking a very long time for you since you're on the CPU. Try printing loss.item() and setting the batch size to 1: do you get anything printed? If you're not getting an error, it's most likely training.
So it doesn't work offline; I had to switch to Google Colab, where it worked fine and I got results. Thanks for the help, and thank you for the video lesson.
Hey, can this be used to build text-to-speech, since text and speech are both sequences? If not, I would love to see a video on how to implement a text-to-speech project using PyTorch.
I made a paper review of the first Seq2Seq paper but in retrospect I do not think it lived up to the standard I want to set so I unlisted it. I'll add it so it's accessible for members if you still want to take a look though
Let me know if it doesn't work, since YouTube told me it's still in beta, but the link is here: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Ui-K4RUkC58.html
@@AladdinPersson I watched the video. I acknowledge that making a paper review is challenging at times when you have time constraints, but you did a decent job.
Loved your tutorial. I have a question though: when implementing the encoder, you said the shape of x is (seq_len, N). Shouldn't it be (input_size, seq_len, N), where input_size is the vocabulary size? Because we one-hot encode each word in the first place.
I got the following errors while implementing: 1. ImportError: cannot import name 'translate_sentence', 'save_checkpoint', 'load_checkpoint' from 'utils' 2. AttributeError: module 'torchtext.nn' has no attribute 'Module'. Has anyone else run into these errors? Can anyone please suggest a resolution for these issues?
When you call the build_vocab method for German and English, how can PyTorch know which language you are building the vocab for? You just pass train_data both times. Can someone explain? Thank you.
Good job! I have a question about the dataset: how do I get a dataset onto my computer, and how do I create a dataset for a new language? I want Vietnamese -> English, but I don't have a dataset and I don't know how to create one. Please support!
Unfortunately I didn't make an explanation of this, but it's on GitHub, and we're essentially doing what we did in the video, just one time step at a time. I think if you read through the code you will understand it; here is the code for it: github.com/AladdinPerzon/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/Seq2Seq/utils.py Let me know if you still have questions about this.
I am still stuck on the shapes of the target and output tensors. Don't you think both have the same shape, so that we don't need to reshape? If the target has shape (N, T, voc_size), the output will also have the same shape. Correct me if I'm wrong.
Could you please show how to actually use the model to translate sentences, without the function that you imported? It's not quite clear to me. Thank you!
Hi Aladdin, in your implementation you pass the context vector produced at the end of the encoder network to the first state of the decoder network; for each subsequent state of the decoder you use the decoder's own previous hidden and cell states as its context. In this lecture, ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-YAgjfMR9R_M.html&ab_channel=MichiganOnline at 4:34, the author uses both the decoder's previous hidden/cell states and the encoder's context vector as input to each decoder state. Don't you think that using the latter in our implementation would allow the model to give better results?
This video has been very helpful for me to be able to implement a seq2seq model for a (slightly different) time series forecasting task!! Thanks so very much!!
Hi, first of all thanks a lot for the awesome work you are doing. I implemented and tested your code on other sentences, and the model was not able to translate even a single word correctly.
Awesome video, man, thanks for explaining everything well. Just a quick question: you made the forward function of the seq2seq use target values. For training that's fine, but while predicting we won't have those, right? I understand we can basically use a while loop and stop when x == &lt;eos&gt;, but I'm curious how you implemented that: did you write the model again for testing, or do something like "if model.eval(), do this"? I was also wondering if there is a way to write the code so that I won't need to pass the target to the forward function of the decoder. If possible, please make a video on the testing part of the model. Once again, great video, man, one of the best explanations I have seen; it makes me understand not only the concept but also how to implement things. You are doing great work.
You're completely right that we need to modify our approach during evaluation. We do this by modifying our loop to predict one time step at a time until we either reach a max_length allowed or if we reach an EOS like you mentioned. If you want to see how I implemented it you can see the code for it here: github.com/AladdinPerzon/Machine-Learning-Collection/blob/master/ML/Pytorch/more_advanced/Seq2Seq/utils.py
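A minimal sketch of the evaluation-time loop described above: decode greedily one step at a time until `<eos>` or max_length. The `next_token` stub stands in for running the decoder and taking the argmax; all token ids are made up for illustration:

```python
# Greedy decoding sketch: feed the previous prediction back in until <eos>
# or until max_length steps have been produced.
SOS, EOS = 1, 2

def next_token(prev_token, step):
    # Stand-in for: logits = decoder(prev_token, hidden); logits.argmax()
    scripted_predictions = [5, 9, 7, EOS]
    return scripted_predictions[step]

def greedy_decode(max_length=50):
    tokens = [SOS]                      # start from <sos>, no target needed
    for step in range(max_length):
        tok = next_token(tokens[-1], step)
        tokens.append(tok)
        if tok == EOS:                  # stop as soon as <eos> is produced
            break
    return tokens
```

This is why no target has to be passed at inference: the decoder's own previous output plays the role the target tokens played during training.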
Thanks for your detailed sharing. I have a question: normally the batch size is the first dimension of the input, but in this seq2seq model it's the second dimension. Does anyone know the reason?
Great tutorial, but I could not try it with my own dataset. When I run the for loop over batches it returns "RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn". I don't know why. Can you help me with it by mail?
I am much more active on YouTube, so it's better if you ask on this platform. I'm not sure what could be wrong with the data loading approach; I do have an entire video dedicated to this topic of loading text datasets. Wish you the best of luck.
Hey, can you please tell me how you deal with words that do not have any vector representation (OOV words) but still exist in your data? Great explanation, though!
We loaded the data using torchtext and it also takes care of OOV tokens; in this case we just used the default, which is "&lt;unk&gt;", but you can also specify it yourself, which you do in Field if I remember correctly.
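A sketch of the fallback described above: any word missing from the vocabulary maps to the `<unk>` index. The dict and indices here are illustrative, not torchtext's actual internals:

```python
# Out-of-vocabulary words fall back to the <unk> index during numericalization.
UNK_IDX = 0
stoi = {"<unk>": 0, "<pad>": 1, "hello": 2, "world": 3}  # toy vocab

def numericalize(tokens):
    # dict.get with a default implements the OOV fallback
    return [stoi.get(tok, UNK_IDX) for tok in tokens]

ids = numericalize(["hello", "zebra", "world"])  # "zebra" is OOV
```

The model then learns a single embedding for `<unk>` that stands in for every unseen word.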
It was a while ago that I did this; any specific timestamp? The first input shouldn't have a major influence because we're passing through the context vector, so if it does, it's probably just because the model needs more training.
I think standardizing your future videos on einsum and NamedTensor would be great! Your video on einsum was wonderful, and I think NamedTensor would be something you'd enjoy too. That, and maybe explore PyTorch Lightning?
So far I haven't used Lightning, but it seems like a great library. I just haven't come across its use cases yet; for example, I haven't had models train on a TPU or with 16-bit precision.
Hey man, I've noticed that in some implementations people use hidden_size as the embedding dimension instead of a separately set embedding_size. What is the reason for this?
Simplicity, I guess; they aren't really related, at least in my view. Could you provide an example where you've seen this, so I could give better reasoning?
@@AladdinPersson I first saw it here: pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html , the official tutorial. If you look under the encoder and decoder you will see hidden_size as the dimension of the embedding
Thanks for making these videos. I have a doubt: is it possible to use this same approach for any other language pair, and how hard is it to deploy this?
It was run on a GTX 1060 with 6 GB of VRAM so nothing extreme is needed. How much VRAM does your GPU have? I suggest decreasing batch_size from 64 to 2/4/8/16/32, embedding size to 200, hidden size to 256 and change num_layers to 1 if you have to. Let me know if you're able to run it then :)
@@AladdinPersson Oh, now I get it. I'm using Google Colab's GPU; the VRAM is 12 GB, I think. I'm trying to train on my custom dataset, which has 70k rows. Unfortunately I only have one GPU and can't load it all onto the GPU, so I have to use a subset of my current training data. Can you suggest any workaround for this issue? Also, I'm conducting an experiment with pre-trained embeddings and came across this paper: www.aclweb.org/anthology/N18-2084/ Do you think this will work?