
Q&A - Hierarchical Softmax in word2vec 

ChrisMcCormickAI
Published: 22 Oct 2024

Comments: 36
@abhijeetsharma5715 · 3 years ago
This was the best explanation of HS that I've seen! Very clearly explained. In my opinion, the most essential point is that even with HS we still have |V|-1 output units, but only log|V| of them need to be computed during training, since the remaining units are "don't-cares" and the loss can be computed from those log|V| output units alone. At test time, however, we would certainly need to compute all |V| softmax probabilities to make a prediction, but we don't really care about testing/prediction since our aim is just to train the embeddings.
@gemini_537 · 9 months ago
I like your comments, but I don't quite understand why only log|V| units need to be computed during training. Could you give an example?
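As a rough illustration of the point above (a minimal sketch with hypothetical names, assuming each word has already been assigned its Huffman path of internal nodes and 0/1 branch codes): for a single training pair, only the ~log2|V| node vectors on the target word's path enter the loss, and the other |V|-2 rows of the output matrix are never touched.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hs_loss(hidden, path_nodes, path_codes, node_vectors):
    """Hierarchical-softmax loss for one (input word, target word) pair.

    hidden       : hidden-layer vector for the input word, shape (d,)
    path_nodes   : indices of the ~log2|V| internal nodes on the target word's path
    path_codes   : the 0/1 branch taken at each of those nodes
    node_vectors : the output matrix, one row per internal node, shape (|V|-1, d)
    """
    loss = 0.0
    for node, code in zip(path_nodes, path_codes):
        p_right = sigmoid(node_vectors[node] @ hidden)  # P(taking branch 1 at this node)
        p_taken = p_right if code == 1 else 1.0 - p_right
        loss -= np.log(p_taken)                         # only these rows are ever read
    return loss
```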
@vgkk5637 · 4 years ago
Thank you Chris. Well explained, and just perfect for people like me who are interested in understanding the concepts and usage rather than the academic math behind it.
@ChrisMcCormickAI · 4 years ago
Thanks VeniVig! That's nice to hear, and I'm glad it provided a practical understanding.
@doctorshadow2482 · 6 months ago
Thank you. Good explanation. Some questions: 1. At 2:40, why are we interested in getting the sum to be 1, which softmax provides? What's wrong with using the existing output values? We already have the weight for 8 higher than the others, so we have the answer. Why do we need the extra work at all? 2. At 9:49, what is this "word vector"? Is it still a one-hot vector for the word from the dictionary, or something else? How is this vector represented in this case? 3. At 15:00, that's fine, but if we trained for "chupacabra", what would happen to those weights when we train on the other words? Wouldn't it just blend or "blur" the coefficients, making them closer to white noise?
@자연어천재만재 · 2 years ago
This is amazing. Although I am Korean and bad at English, your lecture made me smart.
@stasbekman8852 · 4 years ago
There is a small typo at 13:50 - it should be .72 instead of .62 so that it adds up to 1. And thank you!
@joyli9106 · 3 years ago
Thank you Chris! I would say it's the best explanation I've ever seen about HS.
@user-re1bi2bc8b · 4 years ago
Incredibly easy to understand thanks to your explanation. Thank you Chris!
@ariverhorse · 9 months ago
Best explanation of HS I have seen!
@hamzaleb9215 · 5 years ago
Always clear explanation and right to the point. Thanks Chris. Waiting for next videos. Your 2 articles explaining Word2Vec were just perfect.
@ChrisMcCormickAI · 4 years ago
Thanks for the encouragement, Hamza!
@Dao007forever · 2 years ago
Great explanation!
@j045ua · 4 years ago
These videos have been a great help for my thesis! Thank you Chris!
@amratanshu99 · 4 years ago
Nicely explained. Thanks man!
@nikolaoskaragkiozis5330 · 3 years ago
Hi Chris, thank you for the video. So, if I understand correctly, there are 2 things being learned here: 1) the word embeddings, and 2) the output matrix, which contains the weights associated with the output layer?
@mahdiamrollahi8456 · 3 years ago
Hello dear Chris, hope all is well. Thanks for your lecture, it was fabulous. I have some tiny questions: - For negative sampling, it is said that the negative samples are selected randomly. Does that mean we only need to update the parameters for those samples, instead of for all possible words as in softmax? (So in softmax we need to update the parameters for both the correct and incorrect classes, true?) - How do we calculate the output matrix? Where does it come from? - If we want to calculate the probabilities of all context words, we need to traverse the whole tree, right? Best wishes, Mahdi
@samba789 · 4 years ago
Great videos Chris! I absolutely love your content! Just a quick clarification, is the output matrix also going to be weights that we need to learn?
@corgirun7892 · 5 days ago
super amazing
@gavin8535 · 3 years ago
Nice. What does the vector 6 look like and where is it? Is it in the output layer?
@souvikjana2048 · 4 years ago
Hi Chris. Great video. Could you explain how the binary tree is trained? I can't seem to understand, for the input-context pair (chupacabra, active), how we select 0/1 at the root node or at the subsequent nodes below it.
@haardshah1676 · 4 years ago
I think you know the sequence of 0/1s for the context word. So for each node, you have a logistic regression model that takes as input the embedding of the input word and outputs the probability of 0/1 for that node. So for the example you describe, we know the true label for the root node should be "1", for the 4th node "0", and for the 3rd node "1".
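A rough sketch of that per-node logistic-regression update (hypothetical names, not the original word2vec code): one SGD step for a single (input word, context word) pair, using the known 0/1 branch codes along the context word's path as the labels.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hs_train_step(hidden, path_nodes, path_codes, node_vectors, lr=0.025):
    """One SGD step for a single (input word, context word) pair."""
    grad_hidden = np.zeros_like(hidden)
    for node, code in zip(path_nodes, path_codes):
        p = sigmoid(node_vectors[node] @ hidden)   # this node's predicted P(branch = 1)
        err = p - code                             # gradient of the log-loss w.r.t. the score
        grad_hidden += err * node_vectors[node]    # accumulate before touching the node's row
        node_vectors[node] -= lr * err * hidden    # logistic-regression update for this node
    return grad_hidden                             # flows back into the input word's embedding
```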
@utubemaloy · 4 years ago
Thank you!!! I loved this.
@guaguago2583 · 5 years ago
Your fan here, second comment :) I am a Chinese PhD student, very much looking forward to the next videos :D
@ChrisMcCormickAI · 5 years ago
Thank you! I'm hoping to upload a new video about every week or so.
@yuantao563 · 4 years ago
The video is great! Are there any rules for why each blue node corresponds to a given row in the output matrix? Like, why is the first blue node row 6? How is it determined?
@ChrisMcCormickAI · 4 years ago
Hi Yuan, It's just a byproduct of the Huffman tree building algorithm. If I recall correctly, I think it does result in the rows being sorted relative to the tree depth (the frequency of the word). This isn't important to the implementation, though.
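For reference, a rough sketch of a standard Huffman-code construction (hypothetical names; the original word2vec C code builds the tree with its own array-based routine, but the idea is the same): internal nodes simply receive ids in the order they are created while merging the two least-frequent subtrees, and that creation order is what ends up determining which row each blue node maps to.

```python
import heapq
import itertools

def huffman_codes(word_counts):
    """Return {word: (internal-node ids on its path, 0/1 branch codes)}."""
    tie = itertools.count()
    # Heap entries are (frequency, tie-breaker, subtree) so dicts are never compared.
    heap = [(count, next(tie), {"word": word}) for word, count in word_counts.items()]
    heapq.heapify(heap)
    node_id = itertools.count()  # internal nodes get ids in creation order
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        parent = {"id": next(node_id), "left": left, "right": right}
        heapq.heappush(heap, (f1 + f2, next(tie), parent))
    root = heap[0][2]

    codes = {}
    def walk(node, path, bits):
        if "word" in node:                      # leaf: record its path and code
            codes[node["word"]] = (path, bits)
            return
        walk(node["left"], path + [node["id"]], bits + [0])
        walk(node["right"], path + [node["id"]], bits + [1])
    walk(root, [], [])
    return codes

# Example: huffman_codes({"the": 100, "dog": 50, "active": 30, "chupacabra": 2})
```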
@abhijeetsharma5715 · 3 years ago
You can assign each blue node to any row in the output matrix. The order of assigning rows is unimportant, since this is not like an RNN output (i.e., it isn't a sequential output). Just like input units can be in any order in a vanilla neural net.
@ANSHULJAINist · 4 years ago
How do you implement hierarchical softmax for an arbitrary model? Do frameworks like PyTorch or TensorFlow have built-in implementations? If not, how can it be built to work with any model?
@8g8819 · 4 years ago
Great video series, keep it going !!!!
@ChrisMcCormickAI · 4 years ago
Thanks, giavo!
@anoop5611 · 3 years ago
What does the output vector list in blue contain? Something from the hidden-output weights?
@anoop5611 · 3 years ago
Okay, I missed the part that answers it. So a particular row of the output matrix would correspond to one of those non-leaf nodes, and the size of the row would be equal to the number of units in the hidden layer? Thank you Chris!
@libzz6133 · 2 years ago
At 10:53, we got that the label for node 4 is 0. What about the other labels, like the label for node 1?
@mikemihay · 5 years ago
Waiting for more videos
@maliozers · 5 years ago
First like, first comment :) Thanks for sharing, Chris.