My name is Grant Sanderson. Videos here cover a variety of topics in math, or adjacent fields like physics and CS, all with an emphasis on visualizing the core ideas. The goal is to use animation to help elucidate and motivate otherwise tricky topics, and to show how a change in perspective can make a difficult problem simple.
For more information, other projects, FAQs, and inquiries see the website: www.3blue1brown.com
I have a question. You said that the embedding of a word basically holds the dimensional components of a vector in the embedding space, and as you illustrated, these vectors originate from the origin, i.e. [0, 0, 0, ..., 0]. But how do you represent vectors that don't originate from the origin? For example, the vector you got in the video as a result of subtracting E(King) - E(Queen) doesn't start from the origin. How would that vector be represented?
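One way to see the answer: a difference like E(King) - E(Queen) is stored exactly the same way as any other vector, as a plain list of components; the "starting point" is only a drawing convention. A minimal numpy sketch (the embedding values here are invented purely for illustration, real models use thousands of dimensions):

```python
import numpy as np

# Hypothetical 4-dimensional embeddings; the numbers are made up.
E_king  = np.array([0.8, 0.3, 0.9, 0.1])
E_queen = np.array([0.8, 0.3, 0.1, 0.9])

# The difference is just another component vector; no "start point" is stored.
direction = E_king - E_queen   # array([ 0. ,  0. ,  0.8, -0.8])

# Drawing it from E(Queen) is a visualization choice: translating the same
# components to a different base point doesn't change the vector itself.
print(np.allclose(E_queen + direction, E_king))  # True
```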
I don't believe that's true. If it is true, it has to be a coincidence, and has nothing to do with pi. What if we use octal, for instance, instead of decimal notation? Decimal 3141 is octal 6105, decimal 31415 is octal 75267, and decimal 314159 is octal 1145457. Pi itself is approximately 3.1103 in octal. Base 10 is arbitrary, not a "natural" base. In every situation where pi can be used to describe nature, the fact doesn't depend on the numeric notation used.
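These conversions are easy to check; here's a quick sketch using Python's built-in oct() for the integers and a hand-rolled digit expansion for the fractional part of pi:

```python
import math

# Integer conversions from the comment, checked with the built-in oct().
for n in (3141, 31415, 314159):
    print(n, "->", oct(n))   # 0o6105, 0o75267, 0o1145457

# Fractional part of pi in octal, digit by digit: repeatedly multiply the
# remainder by 8 and peel off the integer part.
x, digits = math.pi - 3, "3."
for _ in range(6):
    x *= 8
    d = int(x)
    digits += str(d)
    x -= d
print(digits)  # 3.110375, matching the quoted 3.1103...
```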
If in machine learning there's such a thing as temperature, and there's also such a thing as entropy, then there must also exist a concept of heat, which would be a difference in entropy times the temperature.
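For reference, the thermodynamic relation this analogy is borrowing is below; whether anything in ML actually plays the role of heat is the speculative part, since the ML "temperature" is a sampling knob rather than a physical quantity.

```latex
% Thermodynamics: reversible heat transfer relates heat, temperature, entropy.
\delta Q = T \, dS
% For a finite change at constant temperature:
Q = T \, \Delta S
```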
I was writing about Gaussians when this got recommended. I didn't even say it out loud. We often joke about how Google listens through our phones. Can Google read minds too? I wasn't using Chrome, Google Search, Android, or even Windows.
Hi, great video as always. One thing I didn't get: why does the temperature parameter even matter if it doesn't change the order of the elements? The token with the maximum input will always end up having the maximum probability and will be chosen regardless of T.
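The order is indeed preserved, but the catch is that the next token is typically sampled from the resulting distribution rather than picked by argmax, so flattening or sharpening the probabilities changes behavior a lot. A minimal numpy sketch (the logits here are made up):

```python
import numpy as np

def softmax_with_temperature(logits, T):
    """Temperature-scaled softmax: divide the logits by T before exponentiating."""
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                # stabilize against overflow
    e = np.exp(z)
    return e / e.sum()

logits = [2.0, 1.0, 0.5]        # invented logits for three tokens

for T in (0.1, 1.0, 5.0):
    print(f"T={T}: {np.round(softmax_with_temperature(logits, T), 3)}")
# T=0.1 -> nearly all mass on the top token
# T=5.0 -> close to uniform; the *order* never changes, only the spread

# Generation usually samples from the distribution instead of taking argmax,
# so a flatter distribution makes non-top tokens genuinely likely to appear.
rng = np.random.default_rng(0)
token = rng.choice(len(logits), p=softmax_with_temperature(logits, 5.0))
```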
I managed to solve it on my own with a very different method. Right at the beginning I want to acknowledge that it's not a great method, because it doesn't prove the concept; it only manages to suggest a "proposition" for the answer. My method involved solving this problem in different dimensions. Instead of solving the 3D sphere with 4 points, I solved the 2D circle with 3 points, the 1D line with 2 points, and the 0D point with 1 point. I assumed the number of points should scale this way because a 0D point holds at most 1 point and a line holds 2. For a point the answer was obviously 100%, and for a line it was 50%. The only hard part of this method for me was the 2D version, because I'm not very good with probabilities, but after a while I realised that we just choose a random number between 0 and 1, and the probability is that number squared. I then squared the average of 0 to 1, one half, so the probability came out as a quarter, 0.25. The only thing left to do was notice the pattern: 0D - 1/1, 1D - 1/2, 2D - 1/4, 3D - 1/8 (1 over 2 to the power of the number of dimensions).
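For anyone who wants to sanity-check the 1/2^d pattern numerically, here's a Monte Carlo sketch (not from the video, just one way to test it): sample d+1 uniform points on the unit sphere in d dimensions and test whether the center lies inside their convex hull via barycentric coordinates.

```python
import numpy as np

rng = np.random.default_rng(42)

def center_inside(points):
    """True if the origin lies strictly inside the simplex on the d+1 points.

    Solve for barycentric weights w with sum(w) = 1 and sum(w_i * p_i) = 0;
    the origin is inside exactly when all weights are positive.
    """
    d = points.shape[1]
    M = np.vstack([points.T, np.ones(d + 1)])   # (d+1) x (d+1) linear system
    rhs = np.zeros(d + 1)
    rhs[-1] = 1.0
    try:
        w = np.linalg.solve(M, rhs)
    except np.linalg.LinAlgError:
        return False        # degenerate configuration (e.g. coincident points)
    return np.all(w > 0)

def estimate(d, trials=50_000):
    hits = 0
    for _ in range(trials):
        # d+1 uniform points on the unit sphere: normalize Gaussian samples.
        pts = rng.normal(size=(d + 1, d))
        pts /= np.linalg.norm(pts, axis=1, keepdims=True)
        hits += center_inside(pts)
    return hits / trials

for d in (1, 2, 3):
    print(f"{d}D: {estimate(d):.4f}  (prediction: {1 / 2**d})")
```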
Isn't this the gambler's fallacy? Sure, overall they'll land in the middle, but that means nothing when each ball falls one at a time. Even if I hit low 5 times, there's no guarantee I'll hit the middle again.
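It's actually the opposite of the gambler's fallacy: no individual ball is ever "due" for the middle, and none of them remembers the earlier balls. The peak emerges purely because independent binomial outcomes pile up. A quick simulation sketch (the peg count and ball count are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
LEVELS = 10                      # pegs each ball bounces off
BALLS = 100_000

# Each bounce is an independent left/right coin flip, so a ball's final bin
# is simply its number of rightward bounces: Binomial(LEVELS, 0.5).
bins = rng.binomial(LEVELS, 0.5, size=BALLS)

# Any single ball can land anywhere...
print("first 10 balls:", bins[:10])

# ...but the aggregate still piles up in the middle (law of large numbers),
# with no "correction" of earlier outcomes required.
counts = np.bincount(bins, minlength=LEVELS + 1)
for k, c in enumerate(counts):
    print(f"bin {k:2d}: {'#' * (60 * c // counts.max())}")
```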
But remember that the labels we put on these embeddings still come from our own use of language. The model just sees the numbers, and we are the ones who associate those numbers with things like "Italian-ness", for example. So it's very debatable whether the model "understands" anything, or is just making extremely complex statistical correlations.