OUTLINE:
0:00 - Graph Neural Networks and Halicin - graphs are everywhere
0:53 - Introduction example
1:43 - What is a graph?
2:34 - Why Graph Neural Networks?
3:44 - Convolutional Neural Network example
4:33 - Message passing
6:17 - Introducing node embeddings
7:20 - Learning and loss functions
8:04 - Link prediction example
9:08 - Other graph learning tasks
9:49 - Message passing details
12:10 - 3 'flavors' of GNN layers
12:57 - Notation and linear algebra
14:05 - Final words
From the right level of mathematical precision, to the pedagogy of the content, to the voice of the speaker - it all fits like a charm. Chapeau!
What an amazing video, I’m subscribing for sure!! And I'll definitely be checking out the rest of your videos. I always struggle to learn from math to concept, but your approach is inverted in that regard, and it works so well for me!
I am curious how you'd represent the data for things like occupation and interests in a neural net. If you have a guaranteed range, numbers can be normalized to 0-1 (e.g. pixel data), so that is simple enough. But it seems to me that one-hot encoding might lose, or slow down learning of, "A is similar to B, but dissimilar to C" - though I have no practical insight into that. And now for a little snark: awfully convenient that Andy is bi, every guy in his social network is either gay or bi, and every girl is straight or bi. He's the main character of an LA musical, isn't he? ;P
Great point! While making this video I thought someone would ask about these variables. You're right - usually it'll be much simpler if the input features are ordinal or numerical. Otherwise, we can try one-hot encodings, which as you mentioned might cause problems related to sparsity. A common alternative is to use integer (or label) encodings, where each category is given an integer. Another alternative is to let the network learn the encoding itself through a learned embedding. See here: machinelearningmastery.com/how-to-prepare-categorical-data-for-deep-learning-in-python/ In practice we usually have to run experiments with these choices to determine which is optimal given the data distribution :) And hahaha, I meant for the network to just show their friendships rather than their preferences 😅😂
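For concreteness, a minimal sketch of that last option (a learned embedding) might look like this in PyTorch - the category names here are made-up placeholders, not taken from the video:

```python
# Sketch of a learned categorical encoding: each category gets a dense,
# trainable d-dimensional vector instead of a sparse one-hot.
import torch
import torch.nn as nn

occupations = ["doctor", "scientist", "artist", "engineer"]  # hypothetical
occupation_to_idx = {name: i for i, name in enumerate(occupations)}

# The embedding table is updated by backprop like any other weight.
embedding = nn.Embedding(num_embeddings=len(occupations), embedding_dim=8)

idx = torch.tensor([occupation_to_idx["doctor"]])
doctor_vec = embedding(idx)   # shape (1, 8); a dense, learnable feature
print(doctor_vec.shape)
```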
Amazing how you managed to include so much information in a relatively short video without compromising the depth of explanation. Subscribed and hoping for more content in the future.
Thanks Alex Foo for such great content. I am working on multivariate time series anomaly detection using GNNs, Transformers, and GANs - do you know of any resource where I can start? I searched a lot but couldn't find anything other than papers, which are not that useful. Thanks
Ah yes, multivariate time series anomaly detection is pretty specific, so there might only be papers currently, but you could check out this interesting GAT-based paper (arxiv.org/abs/2106.06947) with code (github.com/d-ailin/GDN) and this transformer-based method for forecasting (towardsdatascience.com/multivariate-time-series-forecasting-with-transformers-384dc6ce989b)
0:40 "more than five years" - yep, that checks out. Artificial Neural Networks go back to 1943, according to Wikipedia. Natural Neural Networks even further back. en.wikipedia.org/wiki/Artificial_neural_network
According to this guy, neural networks have been around for 5 years... I took a course in ML almost 15 years ago, and GNNs were a follow-up topic back then. If the author makes such an obvious mistake at the start of the video, what is the rest of the video actually worth?
Still not entirely clear to me. The main questions I have are: 1) What constitutes a training sample? In a convolutional neural net that would be a particular image, and the training set typically contains millions of them. But here we somehow only have a single graph (you can't have multiple, because that would change the architecture itself). So are you using the same training sample over and over again? And 2) How do you know how many rounds of message passing you have to do, and how do you know this process even converges?
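On question 1, one common setup (sketched below with entirely made-up toy tensors) is transductive node classification: the single graph itself is reused every epoch, and the effective "samples" are the labelled nodes selected by a training mask. On question 2, in most modern GNNs the number of message-passing rounds is a fixed hyperparameter (often 2-3 layers), so no convergence is required:

```python
# Hedged sketch: full-batch training on ONE graph, repeated every epoch.
import torch
import torch.nn as nn

n, d, c = 6, 4, 2                       # nodes, feature dim, classes
X = torch.randn(n, d)                   # node features (toy data)
A = torch.eye(n)                        # adjacency with self-loops
A[0, 1] = A[1, 0] = 1.0                 # one undirected edge, for example
A_hat = A / A.sum(dim=1, keepdim=True)  # row-normalised: mean aggregation

W = nn.Parameter(torch.randn(d, c))     # learnable weights
y = torch.randint(0, c, (n,))           # toy node labels
train_mask = torch.tensor([1, 1, 1, 0, 0, 0], dtype=torch.bool)

opt = torch.optim.Adam([W], lr=0.01)
for epoch in range(100):                # same graph reused each step
    logits = A_hat @ X @ W              # one round of message passing
    loss = nn.functional.cross_entropy(logits[train_mask], y[train_mask])
    opt.zero_grad()
    loss.backward()
    opt.step()
```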
Really great job. I was banging my head against this, but now I understand it well at an overview level. If possible, could you please make a more detailed video on message passing and KGCN (Knowledge Graph Convolutional Network)?
Overall a good video, thanks. It is excellent, but the weak point is the part discussing how the embeddings are generated after the message passing is done. That point about the embeddings went by too fast for me, and some more details and explanation there would help. Thanks again.
Great video, but I have one question: if the GNN were directed instead of undirected, how would a node's message be aggregated if the node were at the edge of the graph and had no incoming messages? Or would the node's representation just stay constant (unchanging)?
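One standard remedy (not specific to this video) is to add self-loops, so a node with no in-edges still receives its own previous state. A toy NumPy sketch:

```python
# Hedged sketch: directed aggregation with self-loops as a fallback.
import numpy as np

# Directed adjacency: A[i, j] = 1 means an edge j -> i (j sends to i).
A = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)  # node 2 has no incoming edges

H = np.random.randn(3, 4)               # current node states

A_loop = A + np.eye(3)                   # self-loops: every node gets >= 1 message
deg = A_loop.sum(axis=1, keepdims=True)
H_next = (A_loop / deg) @ H              # mean over incoming messages

# Without the self-loop, node 2's row in A sums to zero and its update
# would be undefined - or constant, if you simply keep the old state.
```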
12:18 I do not understand: the weights in the convolutional flavor are fixed based on the structure of the graph...? What does that mean? So they are not learnable? To my understanding they are independent of the structure and just the same for every connection, but not fixed. Do you mean that they are *shared* over all positions in the graph? 13:51 Why is the weight matrix dxd and not, more generally, dxk for a hidden size k?
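For reference, a minimal sketch of the convolutional flavour in the style of Kipf & Welling's GCN may help: the aggregation coefficients c_ij = 1/sqrt(deg_i * deg_j) are fixed by the graph structure (node degrees, not learned), while the weight matrix W is learned and shared across all nodes - and it can indeed be d x k for any hidden size k; d x d simply keeps the embedding size constant across layers:

```python
# Hedged sketch of one GCN-style layer (NumPy, toy data).
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
A_loop = A + np.eye(3)                   # add self-loops
deg = A_loop.sum(axis=1)
D_inv_sqrt = np.diag(deg ** -0.5)
C = D_inv_sqrt @ A_loop @ D_inv_sqrt     # FIXED, structure-derived coefficients

d, k = 4, 8
H = np.random.randn(3, d)                # node features
W = np.random.randn(d, k)                # LEARNABLE weights (random here)

H_next = np.maximum(C @ H @ W, 0.0)      # one layer with ReLU; output is 3 x k
```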
You recorded that video in an elevator, right? There's constant elevator music from a shopping center throughout the entire video. Be more careful next time.
This is great, but I don't understand how age etc. is transformed into a number and then combined with the weights. Also, what about when the data is e.g. text?
To me it seems that if you do too many rounds of message passing, the representations of all the people will tend towards a single value. Is that the case in practice?
It may be because I'm inexperienced with neural networks, but what does it mean for categorical data to be multiplied and added together? E.g. how is 0.25*doctor + 0.8*scientist actually represented in the network? Is it one-hot encoded, or something else?
Great question! Usually it'll be much simpler if the input features are ordinal or numerical. Otherwise, we can try one-hot encodings, which might cause problems related to sparsity. A common alternative is to use integer (or label) encodings, where each category is given an integer. Another alternative is to let the network learn the encoding itself through a learned embedding. See here: machinelearningmastery.com/how-to-prepare-categorical-data-for-deep-learning-in-python/ In practice we usually have to run experiments with these choices to determine which is optimal given the data distribution :)
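A tiny sketch of the one-hot case, using hypothetical categories, shows how a weighted combination like 0.25*doctor + 0.8*scientist becomes an ordinary vector:

```python
# With one-hot encodings, each category is a basis vector, so a weighted
# combination of categories is just a weighted sum of those vectors.
import numpy as np

categories = ["doctor", "scientist", "artist"]   # hypothetical categories
one_hot = np.eye(len(categories))

doctor = one_hot[0]        # [1, 0, 0]
scientist = one_hot[1]     # [0, 1, 0]

mixed = 0.25 * doctor + 0.8 * scientist
print(mixed)               # [0.25, 0.8, 0.0]
```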
I’ve been trying to give feedback on every SoME1 submission I watch, but I just don’t have much to say about yours since it’s mostly theory. It’s well made and everything’s reasonably clear, so overall it's pretty good. My only real suggestion is to add examples to some of the sections. In Message Passing, it was a little unclear how the messages were aggregating data, and since the envelopes just changed color, I didn’t get what was happening the first time I watched that section.
Agreed! I did consider adding examples to parts like message passing to make things clearer, though I eventually decided not to, as it seemed to distract from the introductory objective of this video. Might consider going into more depth in future videos :) Thanks so much for taking the time to watch this so closely and for the thoughtful feedback!
Gosh. How is such a high quality resource languishing in such relative obscurity? ~1k subs, 50k views. Maybe YouTube's GNNs need some tweaking. @AlexFoo do you have a Discord?
Such a great explanation of GNNs. The examples are easy to understand, so I could clearly get the concepts right!! Thanks for the wonderful video!!
This seems like it is 5 years behind other areas in progress. Has anyone tried applying transformer-like architectures to graphs? Looking only at immediate neighbours seems like a major handicap; allowing nodes to attend to all other nodes seems like an obvious improvement.
Ah yes, so far deep learning on graphs has had a big problem related to collapse, where every node learns the same vector of values. Progress in this area has been slow partly because of this problem, along with several other subtle problems listed in this blog post by Michael Bronstein: towardsdatascience.com/do-we-need-deep-graph-neural-networks-be62d3ec5c59
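A toy demo of that collapse (often called over-smoothing) may make it concrete - repeated mean aggregation on a connected graph drives every node's vector toward the same value:

```python
# Hedged toy demo of over-smoothing: with a row-stochastic aggregation
# matrix, many rounds of message passing wash out node identity.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_loop = A + np.eye(4)
P = A_loop / A_loop.sum(axis=1, keepdims=True)   # mean aggregation

H = np.random.randn(4, 3)
for step in range(50):                           # 50 rounds of message passing
    H = P @ H

# All rows are now nearly identical: node identity has been smoothed away.
print(np.round(H, 3))
print("spread between nodes:", float(H.std(axis=0).max()))
```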