I was actually expecting New York when they added America. As a child I always thought New York was the capital of the U.S.; I was around eight when I learned that it wasn't. Similarly, when people talk about Australia's cities, Canberra rarely comes up, but Sydney does a lot.
“What does the fox say?” “Don’t they go ‘ring ding ding’?” “Not in this dataset”
Train the same algorithm on songs instead of news articles and I figure you could get some really interesting results. Songs work on feelings, and that should change the connections between the words. I bet the technology could also be used to tell a lot about the perspective people take on things.
Songs also use specific rhythmic structures; assuming most of your data was popular music, I bet there'd be a strong bias toward word sequences that fit nicely into a 4/4 time signature, and maybe even some consistent rhyming structures.
I did this for my final project in my BSc. It's amazing. I found cider - apples + grapes = wine. My project attempted to use these relationships to build simulated societies and stories.
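For anyone who wants to reproduce that kind of arithmetic, it's a few lines with gensim (a sketch: glove-wiki-gigaword-100 is just one choice of pretrained vectors from gensim's downloader, and whether wine actually tops the list depends on which vectors you use):

import gensim.downloader as api

# one choice of pretrained vectors; any set from gensim's downloader works
model = api.load("glove-wiki-gigaword-100")

# cider - apples + grapes ~= wine
print(model.most_similar(positive=["cider", "grapes"], negative=["apples"], topn=3))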
Fun fact: a lot of the word2vec concepts come from Tomáš Mikolov, a Czech scientist at Google. The Czech part is kind of important here: Czech, as a Slavic language, is highly inflected - a single word has many different forms depending on its surroundings in a sentence. In an interview I read (it was in Czech and behind a paywall, so I can't give a link), he mentioned that this inspired him a lot - you can see words clustering by their grammatical properties when running on a Czech dataset, and it's easier to reason about such changes when a significant portion of them is exposed visibly in the language itself (and learned as a child in school, because the basics are needed just to write correctly).
I like this guy and his long sentences. It's nice to see somebody who can muster a coherent sentence of that length. So, if you run this (it's absurdly simple, right), but if you run this on a large enough data set and give it enough compute to actually perform really well, it ends up giving you for each word a vector (that's of length however many units you have in your hidden layer), for which the nearby-ness of those vectors expresses something meaningful about how similar the contexts are that those words appear in, and our assumption is that words that appear in similar contexts are similar words.
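If you want to see that per-word vector concretely, gensim exposes it directly (a minimal sketch with a toy corpus; real results need vastly more data):

from gensim.models import Word2Vec

# toy corpus; in practice you'd train on millions of sentences
sentences = [["the", "cat", "sat"], ["the", "dog", "sat"],
             ["the", "cat", "ran"], ["the", "dog", "ran"]]
model = Word2Vec(sentences, vector_size=50, min_count=1)

vec = model.wv["cat"]                     # one number per hidden unit
print(vec.shape)                          # (50,)
print(model.wv.similarity("cat", "dog"))  # cosine similarity of the two word vectors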
This was weirdly fascinating to me. I'm generally interested in most of the Computerphile videos, but this one really snagged something in my brain. I've got this odd combination of satisfaction and "Wait, really? That works?! Oh, wow!"
It was given 'oink' minus 'pig' plus 'fox' though, not fox + says. So we'd expect to see the same results as for cow & cat etc., with it "understanding" that we're looking at the noises the animals make. Obviously it's not understanding, just an encoding of how those words appear near each other, but we end up with something remarkably similar to understanding.
This thing would ace the analogy section of the SAT. Apple is to tree as grape is to ______. model.most_similar_cosmul(positive=['tree', 'grape'], negative=['apple']) = "vine"
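(That's gensim's most_similar_cosmul method; a runnable version, assuming pretrained vectors from gensim's downloader:)

import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")  # one choice of pretrained vectors
# apple : tree :: grape : ?
print(model.most_similar_cosmul(positive=["tree", "grape"], negative=["apple"], topn=1))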
Rob Miles and Computerphile, thank you... IDK why YouTube gave this gem back to me today (probably my incessant searching for the latest LLM news these days), but I am grateful to you even more now than I was 4 years ago... Thank you
This is basically node embedding from graph neural networks. Each sentence you use to train it can be seen as a random walk in the graph that relates each word to the others, and the number of words in the sentence can be seen as how far you walk from the node.

Besides "word-vector arithmetic", one interesting thing would be to use this data to generate a graph of all the words and how they relate to each other. Then you could do network analysis with it: see, for example, how many clusters of words there are and figure out their labels, or label a few of them and let the graph try to predict the rest.

Another interesting thing would be to embed sentences based on the embeddings of words. You would take a sentence and train a function that maps points in the word space to points in the sentence space by aggregating the word points somehow. That way you could compare sentences that are close together, and then do sentence-vector arithmetic (a rough version of the aggregation step is sketched below). This actually sounds like a cool project. I think I'm gonna give it a try.
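The crudest version of that aggregation is just averaging the word vectors, which already works surprisingly well as a baseline (a sketch, assuming pretrained vectors from gensim's downloader; word order is ignored entirely):

import numpy as np
import gensim.downloader as api

model = api.load("glove-wiki-gigaword-100")  # one choice of pretrained vectors

def sentence_vector(sentence):
    # average the vectors of the words we know; crude, ignores word order
    vecs = [model[w] for w in sentence if w in model]
    return np.mean(vecs, axis=0)

def cosine(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

s1 = sentence_vector(["the", "cat", "sat", "on", "the", "mat"])
s2 = sentence_vector(["a", "dog", "lay", "on", "the", "rug"])
print(cosine(s1, s2))  # similar sentences should land close together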
I love how this video seems to have time travelled from 1979 - the austere painted brick classroom, the shirt, the hair and beard, even the thoughtful and articulate manner seem to come from another time.
This was soooo interesting to me. I never dug deeper into how these networks work, but this gave me so many "Oh! That's how it is!" moments. When I watched the video about GPT-2 and he said that all the connections are just statistics, I noted that internally as interesting and "makes sense" but didn't really get it. With this video it clicked! So many interesting things, so thanks a lot. I love these videos. And seeing the math that can be done with these vectors is amazing! Wish I could like this more than once.
Is the diagram with angles and arrows going off in all directions just for us to visualise it, rather than how computers look at it? I didn't think they'd be calculating degrees; I thought it would be more of a number for how close the match is, like 0-100.
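For what it's worth, the arrows-and-angles picture is just for us; the computer never calculates degrees. It computes the cosine of the angle straight from the dot product, which is exactly the kind of "how close is the match" number you're describing, just on a -1 to 1 scale instead of 0-100. A numpy sketch:

import numpy as np

def cosine_similarity(a, b):
    # cos of the angle between a and b: 1 = same direction, 0 = unrelated, -1 = opposite
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([0.2, 0.9, 0.1])
b = np.array([0.3, 0.8, 0.0])
print(cosine_similarity(a, b))  # a single closeness score, no degrees anywhere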
I love that I have been thinking about modelling natural language for some time now, and this video basically confirms the direction I was heading. I had never heard of word embeddings, but it's exactly what I was looking for. Thank you Computerphile and YouTube!
I would suspect that this has to be very similar to how our own brains interpret language, but then again, evolution has a tendency to solve problems in very strange and inefficient ways.
@@maxid87 Mammals have a nerve that runs from the brain to the throat, but due to changes in mammal anatomy it always loops under a blood vessel near the heart and then back up to the throat. This is so extreme that in a giraffe the nerve is like 9 feet long or something. In general, evolution does a bad job at removing unnecessary features.
@@wkingston1248 how do you know that this is inefficient? Might seem like that at first glance but maybe there is some deeper reason for it? Are there actual papers on this topic that answer the question?
I doubt there is a lot of evolution at play in human language processing. It seems reasonable to assume that association (cat ~ dog) and decomposition (Tokyo = Japanese + city) play an important role.
You could make a game with that, some kind of Scrabble with random words: add and subtract words to get other words. Maybe with the goal of getting long words or specific words, or the shortest or longest distance from a specific word.
Thanks for a great explanation of word embeddings. Sometimes I need a review: I think I understand it, then after looking at the abstract n-dimensional embedding space in ChatGPT and variational autoencoders, I forget about the basic word embeddings. At least it's a simple 300-number vector per word that summarizes which high-frequency words tend to appear near it.
Me too. I loved the review after looking at how GPT-4 and its code/autoencoder set look under the hood. I also had to investigate the keywords being used, like "token", when thinking about multi-vector signifiers and the polysemy of glyphic memorization made by these massive AI databases. Parameters for terms and words went from 300 to 300,000 to 300,000,000 to 1.5 trillion to ♾ infinite. Meaning: Pinecone and those who've reached infinite parameters have created the portal to a true self-learning operating system, self-aware AI.
This is fascinating! Might we be able to represent language in the abstract as a vector space? Furthermore, would similar but slightly different words in different languages be represented by similar but slightly different vectors in this vector space?
It'd have been nice to hear about the research craze around more sophisticated approaches to NLP. It's hard to keep up with the number of publications lately claiming "state-of-the-art" models on the GLUE benchmark.
Duddino Gatto they mew, but they outgrow it pretty quickly. Humans don't babble like babies when we grow up either, but if that was the only thing our feline overlords responded to, we would.
This is one of the coolest things I've seen in a while. It makes me wonder: how small a neighbourhood of one word/vector should we take? And how does the implementation of context affect the choice of optimal neighbourhoods?
And contexts themselves vary from one person to another depending on how they have experienced life, so it would also be interesting to see a set of optimal contexts and how that would affect the whole thing.
Yep, if the mapping of images is just taking the value of each pixel and making an N-dimensional vector (where N is the number of pixels), then the picture with more brightness would be on the same line (as long as solid black pixels stay solid black, depending on the brightness filter applied).
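Easy to check with numpy (a sketch with a random stand-in image; uniform scaling keeps the vector on the same line through the origin, so the cosine stays at 1):

import numpy as np

img = np.random.rand(8, 8)   # a tiny grayscale "image"
v = img.flatten()            # 64-dimensional pixel vector
brighter = 1.5 * v           # uniform brightness scaling; black (0) stays black

cos = np.dot(v, brighter) / (np.linalg.norm(v) * np.linalg.norm(brighter))
print(cos)  # 1.0: the brighter image points in exactly the same direction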
There are a lot of words that appear in similar contexts but are very different in meaning; sometimes they're exact opposites of each other. This doesn't matter too much for word prediction, but it does for tasks that extract semantics. Are there techniques to get better semantic encoding out of the text, particularly separating synonyms from antonyms?
I kinda overlooked the importance of this video when it was released 3 years ago. Now it's basically the explanation of how ChatGPT does its thing... but with more data.
Amazing video! I appreciate every minute of your effort, really. I picture you back then, wondering "Will anyone notice this? Fine, I'll do it." Yes, and thank you.
If you train 2 networks on different languages, I'd guess the latent spaces would be similar. And the differences could be really relevant to how we think differently because we use different languages.
How is the size of the hidden layer chosen? Are there ways to calculate how big a layer is useful? Would selecting different sizes cause it to encode different data? In his example, if the hidden layer had 6 nodes, would it produce the categories of "noun, verb, adjective" etc, since that is likely the most descriptive thing you can do with so few categories?
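As far as I know there's no formula; the hidden-layer width is a hyperparameter you tune empirically (the word2vec papers typically used a few hundred), and nothing forces individual dimensions to line up with categories like noun/verb/adjective. In gensim it's just the vector_size argument (a sketch with a toy corpus):

from gensim.models import Word2Vec

sentences = [["the", "cat", "sat"], ["a", "dog", "ran"]]

# vector_size is the hidden-layer width; smaller just means a blunter representation
small = Word2Vec(sentences, vector_size=6, min_count=1)
large = Word2Vec(sentences, vector_size=300, min_count=1)
print(small.wv["cat"].shape, large.wv["cat"].shape)  # (6,) (300,)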
Yeah, but did you "learn" or just "understand while listening"? Those are not the same thing, although they may complement each other nicely in some cases.
The weights would be per-connection and independent of the input, so is the vector composed of the activation of each hidden layer node for a given input?
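Yes, and for a one-hot input with a linear hidden layer the two views coincide: the activation vector is exactly one row of the input weight matrix (a numpy sketch):

import numpy as np

vocab, hidden = 5, 3
W = np.random.rand(vocab, hidden)  # input-to-hidden weights, one row per word

one_hot = np.zeros(vocab)
one_hot[2] = 1.0                   # one-hot input for word #2

activation = one_hot @ W           # hidden-layer activations for that word
print(np.allclose(activation, W[2]))  # True: the activation IS row 2 of the weights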
Can someone explain the compression of information in that hidden layer? Is it basically that training the model allows the hidden layer to adjust and learn meaningful weights for each node, so that it doesn't need a vector as long as the input vector to produce a meaningful output? Or am I missing something here? Also, is there a way to conceptualize these, say, 1000-dimensional vectors, or is it just something you have to go with?
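On the visualization part: mostly you do just go with it, but the usual trick for eyeballing high-dimensional vectors is projecting them down to 2D with something like PCA or t-SNE (a sketch with scikit-learn, using random stand-in vectors):

import numpy as np
from sklearn.decomposition import PCA

vectors = np.random.rand(100, 300)  # stand-in for 100 word vectors
coords = PCA(n_components=2).fit_transform(vectors)
print(coords.shape)                 # (100, 2): points you can actually plot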
@0:56 A set of characters has no repetition and, unless otherwise specified, no ordering. So dom, doom, mod and mood all map to the same set of characters; a set is underspecific.
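You can check that directly in Python:

# sets drop repetition and order, so all four words collapse together
print(set("dom"), set("doom"), set("mod"), set("mood"))
print(set("dom") == set("doom") == set("mod") == set("mood"))  # True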
You can easily predict similar words WITHOUT word vectors or machine learning, just by using a CONCORDANCE over a (huge) text corpus. Not nearly as sexy as neural networks, but it works fine, and it's EASY to understand: word A is similar to word B *precisely* because lots of people USED A and B in similar contexts in your corpus. The drawback? Sequential, very SLOW compiling of very SMALL corpora compared to neural word2vec techniques. But it doesn't necessarily HAVE to be that way... the CONCORDANCE approach could (?) be parallelized too (a toy version below).
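A toy version of that concordance idea: count the words that appear within a window of each word, and call two words similar when their counts overlap (pure standard-library Python, with a made-up two-line corpus):

from collections import Counter, defaultdict

corpus = ["the cat sat on the mat", "the dog sat on the rug"]
window = 2
contexts = defaultdict(Counter)

for line in corpus:
    words = line.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - window), min(len(words), i + window + 1)):
            if j != i:
                contexts[w][words[j]] += 1

# "cat" and "dog" end up with nearly identical context counts
print(contexts["cat"])
print(contexts["dog"])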
oh, that's so neat. if I'm understanding right, this model has 300 dimensions on which every word falls, to varying degrees... what are those 300 dimensions?? if you look at each one, and ask it to order all the words on each, are there any patterns? like "oh, this must be the 'size' dimension: it's got the universe on one end, whales in the middle, and quarks on the other"? things like that?
I believe there's no privileged basis: there's no reason for the basis vectors to be particularly meaningful, since the vectors only matter relative to each other. You could rotate the whole set of embeddings however you want and get the same results. The principal components might be interesting, though.
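The rotation claim is easy to verify numerically (a numpy sketch with random stand-in embeddings; QR of a random matrix gives a random orthogonal rotation):

import numpy as np

rng = np.random.default_rng(0)
E = rng.standard_normal((100, 300))  # stand-in embedding matrix

Q, _ = np.linalg.qr(rng.standard_normal((300, 300)))  # random orthogonal matrix
E_rot = E @ Q                        # rotate every embedding the same way

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(np.isclose(cos(E[0], E[1]), cos(E_rot[0], E_rot[1])))  # True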
Bro I feel like we're going to start using AI jargon as popular lingo very VERY soon. lol "Not in this dataset" , "Elaborate" , "You need to fine-tune yourself buddy"
It's structurally the same as an autoencoder, but it's trained differently: an autoencoder has to reconstruct its input, whereas this network has to produce closely related, but different, words.