
seq2seq with attention (machine translation with deep learning) 

Minsuk Heo 허민석
39K subscribers · 29K views

The sequence-to-sequence model (a.k.a. seq2seq) with attention has been performing very well on neural machine translation. Let's understand how it works!
Takeaways:
1. seq2seq
2. attention
3. teacher forcing (see the sketch after this description)
reference: arxiv.org/pdf/...
NEURAL MACHINE TRANSLATION
BY JOINTLY LEARNING TO ALIGN AND TRANSLATE
All my machine learning YouTube videos:
• Machine Learning
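Since the third takeaway, teacher forcing, is only named in the list above, here is a minimal illustrative sketch (not code from the video) of how teacher forcing during training differs from free-running decoding at inference time. The vocabulary, sizes, and `decoder_step` function are all hypothetical stand-ins for a real RNN decoder.

```python
# Minimal illustrative sketch (not the video's code): teacher forcing vs.
# free-running decoding. decoder_step() is a hypothetical stand-in for a real
# GRU/LSTM decoder step; it returns random logits so the loops can run.
import numpy as np

VOCAB = ["<sos>", "<eos>", "i", "am", "a", "student"]  # toy target vocabulary

def decoder_step(prev_token_id, hidden):
    """Stand-in decoder step: returns (logits over VOCAB, new hidden state)."""
    logits = np.random.randn(len(VOCAB))
    return logits, hidden  # a real decoder would also update `hidden` here

target_ids = [0, 2, 3, 4, 5, 1]       # <sos> i am a student <eos>
hidden = np.zeros(8)                  # pretend this is the encoder's final state

# Teacher forcing (training): feed the *ground-truth* previous token.
for t in range(1, len(target_ids)):
    logits, hidden = decoder_step(target_ids[t - 1], hidden)
    # the training loss would compare softmax(logits) against target_ids[t]

# Free running (inference): feed the model's *own* previous prediction.
prev = 0                              # start from <sos>
for _ in range(10):                   # cap the number of generated tokens
    logits, hidden = decoder_step(prev, hidden)
    prev = int(np.argmax(logits))
    if prev == 1:                     # stop at <eos>
        break
```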

Published: 28 Sep 2024

Comments: 46
@skymanaditya 3 years ago
Nice tutorial. For a moment I got confused by the output translations like Nan, Nul, which I mixed up with NaN (not a number) and NULL (empty state). :D
@TheEasyoung 3 years ago
Haha, sorry for the confusion... yes, indeed these nan, nul are confusing :)
@mohamedibrahimbehery3235 2 years ago
Man, these are 11 minutes of excellence right here... You practically explained a complex topic in just 11 minutes, unlike other channels. You're awesome, keep it up :)
@TheEasyoung 5 years ago
reference: arxiv.org/pdf/1409.0473.pdf
@imadsaddik 1 year ago
Thank you so much, your explanation was very clear
@jinhopark3671 4 years ago
Very helpful and easy to understand. Keep up the good work!
@superaluis 4 years ago
Your videos are awesome. Thank you!
@rishirajgupta6262 4 years ago
Thanks for this video. You saved me time.
@yashumahajan7 4 years ago
How is this fully connected layer working? How are we getting the relevant hidden state from this fully connected layer?
@ashishbodhankar1993 2 years ago
bro you are my hero!
@hafsatimohammed9604 2 years ago
Thank you so much! Your explanation helps a lot!
@dani-ev6qz 3 years ago
This helped me a lot, thank you!
@djsnooppyzatdepoet7568 4 years ago
Thank you, Sir. Very easy to understand. Thank you
@jinpengtian2072 4 years ago
very clear tutorial, thanks a lot
@ladyhangaku2072 4 years ago
Thank you, sir! One question: how are the attention weights calculated? I know the equation for it, but I don't really understand the dependencies between the last hidden states of the RNN and the decoder. Could you recommend something to read, or could you explain it here?
@TheEasyoung 4 years ago
The attention weights start out as random numbers and are incrementally updated during backpropagation. The research paper is a good resource :)
@coqaine 4 years ago
@TheEasyoung Why watch your videos if we need to read the original research paper anyway? :) Maybe you could make a separate video on the matrix and the q, k, v vectors. :)
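To make the reply above concrete: what backpropagation actually updates are the parameters of the small scoring network; the attention weights themselves are recomputed from those parameters at every decoder step by a softmax over the source positions. A minimal numpy sketch of additive (Bahdanau-style) attention, with made-up sizes:

```python
import numpy as np

hidden_size, src_len, attn_size = 4, 3, 5        # made-up toy sizes

# Learned parameters -- these are what backpropagation actually updates:
W_a = np.random.randn(attn_size, hidden_size)    # projects the decoder state s
U_a = np.random.randn(attn_size, hidden_size)    # projects each encoder state h_i
v_a = np.random.randn(attn_size)                 # scoring vector

h = np.random.randn(src_len, hidden_size)        # encoder hidden states h_1..h_3
s = np.random.randn(hidden_size)                 # current decoder hidden state

# Alignment scores e_i = v_a^T tanh(W_a s + U_a h_i), one per source position
scores = np.array([v_a @ np.tanh(W_a @ s + U_a @ h_i) for h_i in h])

# Attention weights: a softmax over source positions, recomputed at every step
weights = np.exp(scores - scores.max())
weights /= weights.sum()

# Context vector: weighted sum of encoder states, fed into the decoder
context = weights @ h
print(weights.round(3), context.shape)           # weights sum to 1; shape (4,)
```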
@sahiluppal3093 4 years ago
Nice explanation.
@sophies674 2 years ago
Most awesome 😱🎉😱😂explanation 😱🌼🎊🌼🎉🤗🤗
@anujlahoty8022 3 years ago
Very Nice Video.
@InNOutTube 4 years ago
who noticed point you for the 1 at 6.13 minutes :(
@dhanojitray9475 4 years ago
thank you sir.
@sahidafridi6379 4 years ago
🤣
@xtimehhx 4 years ago
Great video
@vybhavshetty9094 4 years ago
Thank you
@bhavyasri642 3 years ago
Great explanation, sir. Could you please share that Colab link?
@gamefever6055 3 years ago
nan and null value
@greyxray 3 years ago
Thank you for the great videos! Could you elaborate a bit on the implementation of the fully connected layer in the example with the attention mechanism (or point to a code example if one exists)? I am not quite sure what the dimensions of this part of the network are.
@TheEasyoung 3 years ago
Here is well-written code: colab.research.google.com/github/tensorflow/tensorflow/blob/r1.9/tensorflow/contrib/eager/python/examples/nmt_with_attention/nmt_with_attention.ipynb The dimension should be the RNN output dimension, which you can also see from the code.
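For readers with the same question, here is a sketch of an additive attention layer in the style of the linked notebook, with tensor shapes annotated. It is written from memory, so treat the exact names as assumptions: `units` is the free width of the fully connected layers, while the last dimension of the inputs is the RNN output size.

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention; `units` is the width of the fully connected layers."""

    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)   # maps decoder state -> units
        self.W2 = tf.keras.layers.Dense(units)   # maps encoder outputs -> units
        self.V = tf.keras.layers.Dense(1)        # maps units -> one score per position

    def call(self, query, values):
        # query:  (batch, rnn_units)            -- current decoder hidden state
        # values: (batch, src_len, rnn_units)   -- all encoder outputs
        query = tf.expand_dims(query, 1)                               # (batch, 1, rnn_units)
        score = self.V(tf.nn.tanh(self.W1(query) + self.W2(values)))   # (batch, src_len, 1)
        weights = tf.nn.softmax(score, axis=1)                         # (batch, src_len, 1)
        context = tf.reduce_sum(weights * values, axis=1)              # (batch, rnn_units)
        return context, weights

# toy shape check
attn = BahdanauAttention(units=10)
ctx, w = attn(tf.random.normal([2, 16]), tf.random.normal([2, 7, 16]))
print(ctx.shape, w.shape)   # (2, 16) (2, 7, 1)
```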
@gamefever6055 3 years ago
best explanation in 10 min
@anggipermanaharianja6122 4 years ago
Well explained, thanks for the effort to make it
@balagurugupta5193 4 years ago
Neat explanation! Thank you
@chanjunpark2355 5 years ago
Hello, I have a question. In the decoding part, the paper uses g(y_{t-1}, s_t, c): when generating an output-language word, the context vector, the previous output, and the hidden state at the current time step all go in as inputs. But in the diagram you explained, only the previous output (y_{t-1}) and the context vector (c) seem to go in. Which one is correct? It looks like s_t should also be an input in the diagram! I may be wrong, though.
@TheEasyoung 5 years ago
Thank you for the feedback. g denotes a nonlinear function, and its inputs are the previous output, the current RNN output, and the previous context vector. That looks the same as what I explain in the current video. I explain it as using the current state to build the context vector, which I thought would be more intuitive and easier to understand than the paper.
@chanjunpark2355 5 years ago
@TheEasyoung Yes, thank you. I enjoyed the good explanation.
@23232323rdurian 5 years ago
Do you happen to know of a good TOKENIZER for 日本語 Japanese text? There are no spaces between Japanese words, which greatly complicates tokenization. I've found a few (MeCab), but I haven't found anything truly effective (there is no NLTK Japanese tokenizer)... I'm building a synthetic text generator (word/phrase level) but I'm missing the Japanese tokenizer.... Thank you....
@whatohyou01 4 years ago
AFAIK MeCab is one of the best Japanese POS taggers, and it can be edited to add new data, but that's about it. Damn those Japanese texts without white space.
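For reference, a minimal sketch of word-level tokenization with MeCab's wakati (space-splitting) output mode, assuming the mecab-python3 package and a dictionary such as unidic-lite are installed:

```python
import MeCab  # assumed setup: pip install mecab-python3 unidic-lite

tagger = MeCab.Tagger("-Owakati")   # -Owakati outputs space-separated surface forms
text = "私は学生です"                 # "I am a student"
tokens = tagger.parse(text).strip().split()
print(tokens)                        # e.g. ['私', 'は', '学生', 'です']
```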
@Deshwal.mahesh 3 years ago
Can you explain Luong, Bilinear, Global, Local using the same methodology?
@TheEasyoung 3 years ago
Good suggestions! I am an engineer who mostly spends his time on work, so I can't guarantee whether I will create these videos, but thanks for suggesting good topics.
@kartikafindra 4 years ago
Thank you so much. What is the meaning of softmax?
@TheEasyoung 4 years ago
Softmax provides the probability of each class. Before softmax, the numbers are not 0-to-1 values. After softmax, the number on each class node is in the range 0 to 1, which you can consider a probability, and you use the class with the maximum probability as the prediction of your deep learning model.
@kartikafindra 4 years ago
Minsuk Heo 허민석 Thank you. Is that also called an activation function?
@TheEasyoung 4 years ago
No. An activation function is a different concept. Sigmoid, which is a popular activation function, is similar in the sense of giving a range of 0 to 1. Softmax is normally located after the activation functions. The big difference is that while an activation function only cares about one node, softmax cares about all nodes and normalizes the percentages over all of them.
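A tiny numpy illustration of the distinction described above (the logit values are just examples): sigmoid squashes each node independently, while softmax normalizes across all output nodes so they sum to 1.

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])                 # raw scores for 3 output classes

sigmoid = 1 / (1 + np.exp(-logits))                # per-node squashing: ~[0.88, 0.73, 0.52]
softmax = np.exp(logits) / np.exp(logits).sum()    # normalized over all nodes: ~[0.66, 0.24, 0.10]

print(sigmoid.sum())   # not 1 -- each node is treated independently
print(softmax.sum())   # 1.0 -- a probability distribution over the classes
```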
@kartikafindra 4 years ago
Minsuk Heo 허민석 Okay, thank you so much. :) Where can I find a dataset of translation pairs to try?
@TheEasyoung 4 years ago
There are many places. Here is one: www.tensorflow.org/datasets/catalog/wmt19_translate