Node Classification on Knowledge Graphs using PyTorch Geometric

DeepFindr

35K views

Published: 24 Aug 2024

Comments: 119

@prajwol_poudel 2 years ago

Do you need to build the F.softmax into the final classification layer of the model? I think the torch.nn.CrossEntropyLoss does this internally for us.

@DeepFindr 2 years ago

Yep that's right! There are two variants of Cross entropy (one with logits and one without). The regular one already applies softmax, I think I forgot that in this video. I'm used to this since it was not always included. :D

@elyseemanimpiregasana2117 2 years ago

@DeepFindr I think we used cross entropy since we have multi-class classification, and softmax is then the best activation function to produce the probability of each class at the final layer, since it assigns a high probability to the predicted class.
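A minimal sketch of why the extra `F.softmax` discussed in this thread is redundant, assuming the loss is `torch.nn.CrossEntropyLoss`, which expects raw logits. It uses plain Python instead of PyTorch so it runs anywhere; the logits and the target are made-up values, and `cross_entropy_from_logits` is a hypothetical helper that mirrors log-softmax followed by negative log-likelihood.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def softmax(logits):
    """Class probabilities derived from the log-softmax above."""
    return [math.exp(v) for v in log_softmax(logits)]

def cross_entropy_from_logits(logits, target):
    """Hypothetical helper: -log_softmax(logits)[target] for one sample."""
    return -log_softmax(logits)[target]

logits = [2.0, 0.5, -1.0]  # made-up raw model outputs for one node
target = 0                 # made-up true class index

loss_correct = cross_entropy_from_logits(logits, target)
# The mistake from the thread: softmax in forward(), then a loss that applies it again.
loss_double = cross_entropy_from_logits(softmax(logits), target)

print(f"loss on raw logits:       {loss_correct:.4f}")
print(f"loss on softmax(logits):  {loss_double:.4f}")
```

When softmax is applied twice, the inputs to the loss are squeezed into [0, 1], so with three classes the loss cannot fall below ln(1 + 2/e) ≈ 0.55 even for a perfect prediction. Training may still converge, but returning raw logits from `forward` is the safer choice.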

@kvnptl4400 1 month ago

Highly appreciate the effort. I really like how you started with the theoretical part and then worked on a real dataset. Thanks!

@MyGeoStats 3 years ago

Best PyTorch Geometric tutorial ever. Thank you for saving my research project.

@Braininfection 3 years ago

Thank you so much. I watched so many videos about NNs and yours is one of the most impressive so far. I know it's a very special topic but I hope that you will reach more people in the future! Your combination of theoretical aspects and practice is so helpful. I have seen that you just started. I hope you keep it up!

@DeepFindr 3 years ago

Thanks for the kind feedback! :)

@zjy1716 2 years ago

Thank you very much for this great tutorial! The best on YouTube

@simonwinkler2015 3 years ago

Never looked at it from that perspective, thanks man.

@hizircanbayram9898 3 years ago

Fantastic video! Thanks for this hands-on video. Subscribed!

@jff711 3 years ago

Very nice video, thank you!

@santiagoinfantino2368 3 years ago

Ohh dude, thanks for posting these videos, I'll use this knowledge for my research

@santiagoinfantino2368 3 years ago

How are the BoW embeddings calculated here? I have my own dataset and would like to know how to create the embeddings for the node features :D

@DeepFindr 3 years ago

Happy it helps :) In that dataset they are pre-calculated, so I don't know in detail. But I would assume they are just based on word counts. So basically search your text and count the occurrence of every word. These counts represent your feature vectors then. Is that what you are looking for? :D
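The word-count idea from this reply can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to build the dataset from the video: the two documents are made up, tokenisation is a plain whitespace split, and `build_vocabulary` and `bag_of_words` are hypothetical helper names. The `binary` flag switches from counts to 0/1 word-presence indicators.

```python
from collections import Counter

def build_vocabulary(documents):
    """Map every distinct lower-cased token to a column index (sorted for stability)."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}

def bag_of_words(document, vocab, binary=False):
    """Return one count (or 0/1 flag) per vocabulary word for a single document."""
    counts = Counter(tok for tok in document.lower().split() if tok in vocab)
    vec = [counts.get(tok, 0) for tok in vocab]  # dicts keep insertion order
    return [min(c, 1) for c in vec] if binary else vec

# Made-up texts, one per node.
docs = [
    "graph neural networks for graph data",
    "neural networks for images",
]
vocab = build_vocabulary(docs)
features = [bag_of_words(d, vocab) for d in docs]

print(list(vocab))  # ['data', 'for', 'graph', 'images', 'networks', 'neural']
print(features)     # [[1, 1, 2, 0, 1, 1], [0, 1, 0, 1, 1, 1]]
```

Each row of `features` would become the feature vector of one node; a real pipeline would usually also drop stop words and very rare words.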

@El_Pancho_Alvarez 3 years ago

This is wonderful! Could you make a video of node classification for fraud detection?

@DeepFindr 3 years ago

Hi! Thanks :) sounds interesting. Sure can do that. However there are a couple of vids on the list I first need to finish :) Do you know a specific dataset for the fraud detection? Best

@wilfredomartel7781 1 year ago

Nice explanation!

@peterstrom8522 3 years ago

Nice! Thanks!

@DeepFindr 3 years ago

Sure, no problem. I hope this is what you were looking for :)

@riyajatar6859 2 years ago

Thanks for such a nice explanation. Could you also make some videos and a notebook on inductive graph neural networks? That would be great fun

@DeepFindr 1 year ago

I'll note it down :) thanks!

@umarmalhi3072 3 years ago

Thanks man. It really helped me!

@trangquyen4307 8 months ago

Hi, can you show how to save the best model and use it to predict on a totally new test set?

@TorjusNilsen-bs5st 3 years ago

Softmax was used in the final layer, but PyTorch's page on CrossEntropyLoss says the "criterion combines LogSoftmax and NLLLoss in one single class". Doesn't this mean we should skip the F.softmax in our forward pass, as it's already included in the loss function? Also, in this setting all data was input at once. If the graph + node features are too large for this, is it possible to train using batches of smaller graphs/adjacency matrices? Thank you for a good intro to GCNs!

@DeepFindr 3 years ago

Hi! Yes, at some point softmax was included in the cross-entropy loss; I didn't know that when I made this video. So it's not needed anymore :) If you have a large graph that doesn't fit into memory, there are generally two approaches:
1. Use sampling (check out Cluster-GCN or the neighbor sampler in PyG). This is the recommended way.
2. Divide your graph into a couple of smaller graphs if possible. Batching is supported by PyTorch Geometric; however, I think an implementation for a shared edge matrix is not available.
Hope that helps
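The sampling idea from this reply can be sketched in a few lines. This is a simplified one-hop illustration in plain Python, not PyG's implementation (recent PyG versions ship loaders such as `NeighborLoader` and `ClusterLoader` in `torch_geometric.loader` for this purpose). The adjacency dictionary, the `fanout` parameter and the helper name `sample_subgraph` are assumptions made for the example.

```python
import random

def sample_subgraph(adjacency, seeds, fanout, rng):
    """Keep at most `fanout` randomly chosen neighbours per seed node.

    Returns the sorted node ids of the mini-batch and the sampled
    (neighbour, seed) edges along which messages would flow.
    """
    nodes, edges = set(seeds), []
    for seed in seeds:
        nbrs = adjacency.get(seed, [])
        for nbr in rng.sample(nbrs, min(fanout, len(nbrs))):
            nodes.add(nbr)
            edges.append((nbr, seed))
    return sorted(nodes), edges

# Made-up graph: node 0 is a hub with four neighbours.
adjacency = {0: [1, 2, 3, 4], 1: [0], 2: [0, 3], 3: [0, 2], 4: [0]}
rng = random.Random(0)  # fixed seed so the sample is reproducible

batch_nodes, batch_edges = sample_subgraph(adjacency, seeds=[0, 2], fanout=2, rng=rng)
print(batch_nodes)
print(batch_edges)
```

Only the sampled nodes and edges have to be held in memory for one training step, which is what makes this workable for graphs that do not fit on the device as a whole.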

@TorjusNilsen-bs5st 3 years ago

@DeepFindr Yes, thanks! I have a project with high-feature image data and a way to generate a graph representation from the images, in a semi-supervised problem. I wanted to try graph convolutions (out of interest, of course not much can compare to CNNs for computer vision) but realized GCNs lack generalization across graphs/Laplacians for batching, so I'm currently checking whether it can work for node classification (pixels) with batched images. Do you think it could be possible to train a CNN in a supervised way to create some kind of embeddings with a graph structure and use that to classify nodes semi-supervised? Sorry for a very general question, but I haven't been able to find a lot of discussion on this as GCNs are relatively new.

@DeepFindr 3 years ago

Hi, if I understood you correctly, you are trying to convert an image to a graph. You might want to check out this blog post, which also talks about graphs in computer vision: medium.com/@BorisAKnyazev/tutorial-on-graph-neural-networks-for-computer-vision-and-beyond-part-1-3d9fada3b80d If you want to use node embeddings for predictions, you could maybe use a couple of CNN layers on fixed grids of an image, and then use the corresponding grid cells as nodes and the activations as node features. E.g. divide your image into a 5x5 grid and then apply the filters only on each section of the grid. Then you would have 25 nodes which are connected according to their neighborhood in the grid. And then you could perform message passing to learn about the rest of the image. I don't know :D actually just a trivial idea. What classification would you perform on each of the nodes? Object detection?

@TorjusNilsen-bs5st 3 years ago

@DeepFindr Haha no no, I appreciate it :D My problem is semantic segmentation, i.e. classifying each pixel of an image. I was however interested in GCNs, so I wanted to try including them. The two possible ways I considered were: 1. Converting images to a graph directly (for which I have a method), and training a GCN similar to a CNN. The problem here is that each image will have a different adjacency/graph Laplacian, and learning on one subgraph/image might not generalize well to others, so it could be hard/impossible to train weights across batches. 2. Similar to what you mentioned, I considered training some CNN layers in a supervised way and trying to create some graph structure from their embeddings (or superpixels) to see if it can somehow boost learning.

@DeepFindr 3 years ago

@TorjusNilsen-bs5st Hi, yes, the first approach sounds a bit sparse to me. But regarding different adjacency matrices, I wouldn't consider this a problem. Because if you use molecules for instance, there might be ones with hundreds of atoms and very different adjacency matrices. But what would be the node feature information for the first case? The 3 color channels? Would you treat each pixel as a node? Thanks :)

@ShaliniRamnath 3 years ago

Thank you BOSS !!

@Fdutchman 1 year ago

Thank you very much :)

@user-rh7jb7pf7q 3 years ago

Thanks so much!

@aaff8573 1 year ago

Is GNNExplainer capable of handling heterogeneous graph structures?

@i_shy 9 months ago

Hello, is normalization of node features necessary?

@deadliftform4920 2 months ago

best

@bryancc2012 3 years ago

Detailed content, really appreciated! This is still more like supervised training, with the existing class information used to train and predict. If we only have citation information and the bag-of-words vectors (no class information at all), can we cluster the papers?

@DeepFindr 3 years ago

Thanks! Yes, this particular video is supervised. In the unsupervised situation we don't have the classes. Therefore another approach for optimizing the embeddings is required. Probably you are already familiar with these papers: paperswithcode.com/task/graph-clustering. The first one (Attributed Graph Clustering via Adaptive Graph Convolution) is actually also applied to the Cora dataset, where they try to partition the nodes into clusters. They use graph filters / a k-order graph convolution, so basically another (unsupervised) layer type to generate the embeddings. These embeddings are then used to perform the clustering. I hope this is what you were looking for :) You will probably have to create the new layer yourself; if you need help I can also make a video on that!

@JorGe-eu3wi 3 years ago

Great video! Can you show what the code would look like when using a dataset with more than one graph and how to use certain graphs only for training and others only for testing. Thank you!

@DeepFindr 3 years ago

Hi :) in part 3 of the GNN series I uploaded I show an example for molecules. Hope that helps!

@emreipek4485 6 months ago

Hello sir. I have a question. In CNN training, the convolutional layers (kernels) are trained in order to provide better feature extraction. Is the GCNConv layer also trained? As I understand it, GCNConv layers just pass messages between nodes and aggregate them, so there would be no parameters to train in GCNConv. Am I wrong? I guess only the linear layers are trained. Can you enlighten me, please?

@DeepFindr 6 months ago

Hi, there are also weights in GCNConv. For more details on this have a look at the Graph Attention video on my channel, where I discuss this in detail :)

@emreipek4485 6 months ago

@DeepFindr Thank you sir :)

@YeeYes 2 years ago

Wonderful video! I have a question: could you please tell me why the train mask is very small compared to the test mask?

@DeepFindr 2 years ago

Hi! The size should always be the same. Or do you mean the number of ones in the mask?

@DeepFindr 2 years ago

How many ones are there in each of the masks?
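To make the distinction in this exchange concrete: every mask has one entry per node, and what differs is the number of `True` entries. The sketch below uses plain Python lists in place of tensors. The counts (140 train, 500 validation, 1000 test, out of 2708 nodes) follow the standard Planetoid split of Cora; the index ranges are an assumption made for the example. On a PyG `data` object the same counts can be read with `int(data.train_mask.sum())`.

```python
NUM_NODES = 2708  # number of nodes in Cora

def make_mask(num_nodes, start, stop):
    """Boolean mask that is True for node indices in [start, stop)."""
    return [start <= i < stop for i in range(num_nodes)]

# Assumed index layout; only the counts are taken from the standard split.
train_mask = make_mask(NUM_NODES, 0, 140)
val_mask = make_mask(NUM_NODES, 140, 640)
test_mask = make_mask(NUM_NODES, 1708, 2708)

for name, mask in [("train", train_mask), ("val", val_mask), ("test", test_mask)]:
    # len(mask) is the mask size, sum(mask) the number of selected nodes
    print(f"{name}: size={len(mask)}, ones={sum(mask)}")
```

The small number of training labels is typical for this semi-supervised benchmark: only a few nodes per class are labelled, while the remaining nodes still take part in message passing.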

@user-te2mu9kb6n 1 year ago

Great video! How do I then convert these predictions back to a graph to visualize them?

@lolokay7044 2 years ago

Thank you so much for this, super helpful. Is there one on graph regression in gnn as well?

@DeepFindr 2 years ago

Hi! I haven't uploaded a video on graph regression yet, but changing from classification to regression is very simple. The GNN architecture stays the same, only the output layer needs to change (e.g. a single value per node) and you need to adjust the loss function to e.g. MSE, and that's it :)

@lolokay7044 2 years ago

@DeepFindr Oh, thank you! Do you have any particular dataset in mind that I can work on for graph regression?

@DeepFindr 2 years ago

Unfortunately there is no node-level regression dataset available in PyG and other libraries :/ If you want to perform regression in order to "practice", you can of course take any binary node-classification dataset, simply treat the class labels (0 and 1) as float values and perform node-level regression. Another option is, for example, traffic networks (predicting the travel speed for nodes), which is also a regression problem. However, it also requires considering a temporal component. I'm currently working on a video on that. :)

@lolokay7044 2 years ago

@DeepFindr Thank you so much for the replies, really appreciate your work. Looking forward to the video!

@cat-cu1cx 2 years ago

Thank you for this series! When you say 75% nodes are predicted correctly, that only includes within the 140 labelled datasets right? As we have no way to validate the rest of the nodes?

@DeepFindr 2 years ago

Yes exactly :)

@leotrisport 2 years ago

Hi, is the idea of sending the complete graph with some masked nodes what is called the transductive setting? Is it safe to assume that if we need another graph we have to train the whole thing again?

@DeepFindr 2 years ago

Hi! Generally larger graphs can only be trained in a transductive setting. There exist other layer types like GraphSAGE that are also able to train inductively. If you have many smaller graphs you can of course also train them inductively. I've also seen inductive variants of GAT for example. So there are certainly workarounds :)

@kk008 1 year ago

Now, if I want to take a deep dive into these predictions to understand the decision-making process, which GNN explanation model or specific technique should I use?

@DeepFindr 1 year ago

Hi, I have some videos on XAI on graphs. I would look into GNNExplainer as it is already implemented in PyTorch Geometric. There are some other libraries as well, such as DIG (Dive into Graphs)

@kk008 1 year ago

@DeepFindr thank u so much. I will also look into ur XAI videos. I'm really grateful.

@kk008 1 year ago

I have mailed you with some queries also.

@Missbeautyqueeb 2 years ago

How would you calculate the AUC score for this one? and plot a ROC curve? So for a node classification GNN

@DeepFindr 2 years ago

Hi, the easiest way is to convert the predictions and labels for each node of interest (after applying the test mask) to two arrays and feed them to sklearn's ROC-score or something like that. Let me know if this helps :)
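A sketch of the recipe from this reply: apply the test mask, then compute a one-vs-rest AUC per class. In practice `sklearn.metrics.roc_auc_score` with `multi_class="ovr"` is probably the function meant by "ROC-score"; the version below re-implements the metric in plain Python so the steps are visible. All probabilities, labels and mask values are made up, and `binary_auc` and `macro_ovr_auc` are hypothetical helper names.

```python
def binary_auc(scores, labels):
    """ROC AUC as the chance that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(probs, labels, mask):
    """One-vs-rest AUC per class on the masked nodes, averaged over classes."""
    rows = [(p, y) for p, y, keep in zip(probs, labels, mask) if keep]
    num_classes = len(rows[0][0])
    aucs = []
    for c in range(num_classes):
        scores = [p[c] for p, _ in rows]
        binary = [1 if y == c else 0 for _, y in rows]
        aucs.append(binary_auc(scores, binary))
    return sum(aucs) / num_classes

# Made-up class probabilities for five nodes and three classes.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
    [0.1, 0.6, 0.3],  # true class 0, but ranked low for it
    [0.3, 0.3, 0.4],  # training node, excluded by the mask
]
labels = [0, 1, 2, 0, 1]
test_mask = [True, True, True, True, False]

print(macro_ovr_auc(probs, labels, test_mask))  # 0.875
```

For a ROC curve, the same per-class scores and binary labels can be passed to a plotting routine, one curve per class.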

@jonimatix 2 years ago

I'm really enjoying learning about GNNs, so thanks a lot for your video, and keep posting :) One question: how is PyTorch Geometric different from GraphSAGE?

@DeepFindr 2 years ago

Thanks for your feedback! GraphSAGE is just one implementation of a GNN layer. It's especially intended for larger graphs as it uses sampling while training the model. Besides GraphSAGE there are many other layer types like Graph Conv, Graph Attention, Graph Isomorphism... All of those are available in PyTorch Geometric, along with many more, and GraphSAGE as well (it's called SAGEConv in PyG). Hope that answers the question :)

@jonimatix 2 years ago

@DeepFindr Yes, thanks a lot! Do you think you can create a tutorial on how to create a GNN from scratch (including dataset preparation, etc.) for product recommendations? That would be really helpful to further understand how to apply a GNN to a real problem. Thanks

@DeepFindr 2 years ago

Hi :) Actually something similar is already there. In my video series GNN Project I talk about all of this, e.g. dataset creation in part 2. But it's not for product recommendations unfortunately :/

@itprime2399 1 year ago

First of all, thank you so much Sir for such an amazing lecture on GNN. Sir, I want to extract every class data in the form of a graph after node classification. How can I do that?

@DeepFindr 1 year ago

You mean how to convert your data to a graph dataset?

@MdFarhan-gh7ez 2 years ago

Great video!! How do we install pytorch geometric packages for cuda version 11.3? I see that the colab notebook in my system is showing the version 11.3. Thanks!

@DeepFindr 2 years ago

Simply select your version here and give it a try: pytorch-geometric.readthedocs.io/en/latest/notes/installation.html If you have conda installed on colab, things have become much easier with PyG. :)

@chandrasutrisno 2 years ago

Are GNNs sensitive to imbalanced datasets?

@DeepFindr 2 years ago

Yes, it's the same as with all other neural networks. But you can try weighted loss functions to improve the learning.
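One common way to build the weights for such a weighted loss is inverse class frequency. The sketch below is one heuristic among several, shown with made-up labels; `inverse_frequency_weights` is a hypothetical helper name. In PyTorch the resulting list could be turned into a tensor and passed as the `weight` argument of `torch.nn.CrossEntropyLoss`.

```python
from collections import Counter

def inverse_frequency_weights(labels, num_classes):
    """weight[c] = N / (num_classes * count[c]); rarer classes get larger weights."""
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]

# Made-up training labels: class 0 is four times as frequent as class 1.
train_labels = [0] * 8 + [1] * 2
weights = inverse_frequency_weights(train_labels, num_classes=2)
print(weights)  # [0.625, 2.5]
```

In a node-classification setting the counts should come from the training nodes only, i.e. the labels selected by the train mask.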

@AI_ML_DL_LLM 2 years ago

Hi sir, when you mask out node A from the training, does it mean that other nodes will not see it at all in the training and message passing (both features and label of node A are totally ignored, as if it doesn't exist)? Or will the features of node A be passed on to other nodes, while in the loss function the label of node A is ignored? Thank you again

@DeepFindr 2 years ago

Hi! The mask is just applied to the loss. This is also called transductive learning, because the model has seen all node embeddings during training. If you want to train inductively, you need to use layers like GraphSAGE which will sample parts of the graph :) Best regards

@AI_ML_DL_LLM 2 years ago

@DeepFindr Many thanks, it was a great help
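The answer in this thread (the mask only affects the loss) can be checked with a toy example. This is a deliberately simplified sketch: the "model" is a single mean-aggregation step without learnable weights, its output is used directly as logits, and the three-node graph is made up. Node 2 plays the role of the masked-out node A.

```python
import math

def log_softmax(row):
    """Numerically stable log-softmax for one node's scores."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(v - m) for v in row))
    return [v - log_sum for v in row]

def propagate(features, neighbors):
    """One mean-aggregation step over each node and its neighbours (all nodes take part)."""
    out = []
    for node, nbrs in enumerate(neighbors):
        group = [node] + nbrs
        out.append([sum(features[j][d] for j in group) / len(group)
                    for d in range(len(features[node]))])
    return out

def masked_loss(logits, labels, mask):
    """Average negative log-likelihood over the nodes where mask is True."""
    kept = [i for i, keep in enumerate(mask) if keep]
    return -sum(log_softmax(logits[i])[labels[i]] for i in kept) / len(kept)

features = [[1.0, 0.0], [0.0, 1.0], [4.0, 1.0]]  # made-up node features
neighbors = [[1, 2], [0], [0]]                   # node 0 is linked to nodes 1 and 2
train_mask = [True, True, False]                 # node 2 ("A") is not a training node

logits = propagate(features, neighbors)              # node 2's features reach node 0
loss_a = masked_loss(logits, [0, 1, 0], train_mask)
loss_b = masked_loss(logits, [0, 1, 1], train_mask)  # different label for node 2

print(logits[0])         # mixes in node 2's features
print(loss_a == loss_b)  # True: node 2's label never enters the loss
```

So node A's features are passed on to its neighbours, while its label is ignored during training.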

@awadelrahman 3 years ago

at 3:30 I think you meant bidirectionally instead of uni-directionally?

@DeepFindr 3 years ago

Oh yes, absolutely :) I see you are a very attentive viewer, thanks!!

@awadelrahman 3 years ago

@DeepFindr Great videos! This is why :)

@alizindari4044 3 years ago

Hi, thanks for your great series on GNNs. Just a question: I have a case where I don't need the label for each node. I want to get the final feature vector of each node after training (for both training and test nodes), after applying forward propagation. Can you please tell me how to do it? I need to compare these new vectors with the feature vectors before training.

@DeepFindr 3 years ago

Hi :) do you mean that you want to access the learned representations instead of the actual prediction? In the forward function you can simply return the prediction and the propagated node embeddings. So basically calling your model would return pred, embedding = mymodel(inputs) Did I understand it correctly? :)

@DeepFindr 3 years ago

This way you can still train your model with the predictions and the loss function but also have access to the final embeddings

@alizindari4044 3 years ago

@DeepFindr Hi, I tried the code you suggested, but it returned just one value, the final prediction for each node with shape (2708, 7), whereas the final embedding should be (2708, 1433). Am I making a mistake?

@DeepFindr 3 years ago

@alizindari4044 Hi! Can you please send me a screenshot of the code to deepfindr@gmail.com? :) I'll reply via mail then. Thanks!
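The pattern suggested in this thread, returning both the prediction and the hidden representation from `forward`, looks roughly like this. The class below is a plain-Python stand-in for an `nn.Module` with fixed, made-up weights and no message passing, so it only illustrates the return signature and the resulting shapes; `TinyNodeModel` is a hypothetical name.

```python
class TinyNodeModel:
    """Stand-in for a trained model: 3 input features -> 2 hidden units -> 2 classes."""

    def __init__(self):
        self.w_hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up weights
        self.w_out = [[1.0, -1.0], [-1.0, 1.0]]               # made-up weights

    @staticmethod
    def _matmul(rows, weight):
        """Multiply a list of row vectors with a weight matrix."""
        return [[sum(r[i] * weight[i][j] for i in range(len(weight)))
                 for j in range(len(weight[0]))] for r in rows]

    def forward(self, x):
        # Hidden representation after a ReLU: this is the embedding to keep.
        embedding = [[max(0.0, v) for v in row] for row in self._matmul(x, self.w_hidden)]
        logits = self._matmul(embedding, self.w_out)
        return logits, embedding  # both are returned to the caller

model = TinyNodeModel()
x = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]  # two nodes with three input features each
pred, embedding = model.forward(x)

print(len(x), len(x[0]))                  # 2 3  (nodes, input features)
print(len(embedding), len(embedding[0]))  # 2 2  (nodes, hidden units)
print(pred)                               # [[-1.0, 1.0], [-1.0, 1.0]]
```

Note that the embedding has one row per node and as many columns as the layer it is taken from. It matches the input width (1433 for Cora) only if that layer happens to have that many channels, so a shape other than (2708, 1433) is not necessarily a mistake.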

@johanaabizmil955 3 years ago

Very good explanations! If I want to try it on my own text dataset, I have to build a graph with torch_geometric.transforms, right? Do you know a doc / tutorial for that?

@DeepFindr 3 years ago

Hi! In the documentation there is one page how to create your own dataset: pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html Is this what you are looking for? :)

@johanaabizmil955 3 years ago

@DeepFindr Yes, it is helping me, but I still don't understand where I can specify that I want a word as a node, for example

@DeepFindr 3 years ago

I also found this github project. Take a look at the python notebooks, he is also creating custom datasets there. github.com/khuangaf/PyTorch-Geometric-YooChoose If it doesn't help, let me know and I will make a video on how to do that :)

@DeepFindr 3 years ago

Especially take a look at the YouChooseDataset he creates :)

@johanaabizmil955 3 years ago

@DeepFindr Thanks, I'll let you know :)

@inFamous16 2 years ago

Hey, Can you please make a video on how to perform Node Classification on custom datasets? I currently have a csv file containing source_node, relation and destination_node. How to generate node_vector for the custom datasets?

@DeepFindr 2 years ago

Hi! I have two recent videos on that topic: how to convert a tabular dataset to a graph. :)

@inFamous16 2 years ago

@DeepFindr I have loaded the CSV into Neo4j and am able to access the KG, but I don't understand how to generate the node vectors. The Cora dataset comes with node vectors that were generated using a BoW model. Isn't there any way to calculate node vectors for a custom KG when we don't have a dictionary to generate BoW?

@DeepFindr 2 years ago

Each of the elements in your knowledge graph should have attributes, out of which you can build the node vector. If they don't have properties you can't build node features. The BoW node features are based on the publications in Cora. Therefore those are also simply properties of the nodes

@inFamous16 2 years ago

@DeepFindr OK sir, thank you for your valuable time
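For the CSV layout mentioned in this thread (source_node, relation, destination_node), the connectivity part can be sketched as below. The rows are made up and `triples_to_coo` is a hypothetical helper. The result is an edge list in COO form (`[sources, targets]`), which is the layout PyG expects for `edge_index` once it has been converted to a tensor. Node features are not produced here; as the reply above says, they have to come from node attributes.

```python
import csv
import io

def triples_to_coo(csv_text):
    """Turn (source, relation, destination) rows into integer COO edges."""
    node_ids, relation_ids = {}, {}
    sources, targets, edge_types = [], [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # setdefault assigns the next free integer id to names seen for the first time
        src = node_ids.setdefault(row["source_node"], len(node_ids))
        dst = node_ids.setdefault(row["destination_node"], len(node_ids))
        rel = relation_ids.setdefault(row["relation"], len(relation_ids))
        sources.append(src)
        targets.append(dst)
        edge_types.append(rel)
    return [sources, targets], edge_types, node_ids, relation_ids

# Made-up knowledge-graph triples.
csv_text = "\n".join([
    "source_node,relation,destination_node",
    "paper_a,cites,paper_b",
    "paper_b,cites,paper_c",
    "alice,wrote,paper_a",
])

edge_index, edge_type, node_ids, relation_ids = triples_to_coo(csv_text)
print(edge_index)    # [[0, 1, 3], [1, 2, 0]]
print(edge_type)     # [0, 0, 1]
print(node_ids)      # {'paper_a': 0, 'paper_b': 1, 'paper_c': 2, 'alice': 3}
```

The relation ids are kept separately because relation-aware layers need them as an extra input alongside the edge list.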

@siddhantmathur7319 3 years ago

Hey, I have a conventional CSV training dataset. Can you tell me how to integrate it with pytorch_geometric so that, rather than using the features of the datasets already present in the library, I can create my own custom datasets and apply a GNN to them for classification?

@DeepFindr 3 years ago

Hi! On the documentation website there is also a section on how to create a custom dataset: pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html Also you would need to connect your data points in the csv somehow. That means you need to create the adjacency information. Let me know if you need further help :)

@siddhantmathur7319 3 years ago

@DeepFindr Exactly! I am not able to connect the data points in the CSV to produce the adjacency info (edges, attributes, etc.) so that I can load them in this format:
    from torch_geometric.data import Data, DataLoader
    data_list = [Data(...), ..., Data(...)]
    loader = DataLoader(data_list, batch_size=32)

@DeepFindr 3 years ago

@siddhantmathur7319 I think I should make a video on this in the future. Further down is another comment with the same question; I linked an example repository there in which they go from a table to a PyTorch Geometric dataset. Hope that helps :) Otherwise let me know and I will make a video in the next couple of weeks!

@giannismanousaridis4010 3 years ago

In the GCN class, after calling super(), you've written "torch.manual_seed(42)". Is this used?

@DeepFindr 3 years ago

Hi! Yes I would say so (implicitly). Whenever there are random processes (like random weight initialization in a neural network) the seed ensures that the numbers are reproducible.

@DeepFindr 3 years ago

You could also place this after import torch :)
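What the seed buys you can be shown with Python's own random number generator; `torch.manual_seed` plays the same role for PyTorch's weight initialisation. The helper `init_weights` and the seed values are made up for this sketch.

```python
import random

def init_weights(n, seed):
    """Draw n pseudo-random 'weights' from a generator with a fixed seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

a = init_weights(4, seed=42)
b = init_weights(4, seed=42)
c = init_weights(4, seed=7)

print(a == b)  # True: same seed, same initial weights
print(a == c)  # False: a different seed gives different weights
```

Where the call is placed matters only in that it has to run before the random numbers are drawn, which is why both positions mentioned in this thread (inside the constructor before the layers are created, or right after `import torch`) give reproducible initial weights.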

@giannismanousaridis4010 3 years ago

@DeepFindr Thanks for the explanation

@ry_zakari 3 years ago

@DeepFindr Hello, thank you so much for these wonderful tutorials. I would like to talk to you in private; I'm currently a PhD student. My mail id: rufaig6@gmail.com

@mhadnanali 3 years ago

Make playlists, please. I visited your channel and the videos are scattered. You will get more views.

@DeepFindr 3 years ago

Actually I have Playlists for the different topics. Somehow the one for GNNs doesn't appear. I have to check. Thanks for the hint

@mhadnanali 3 years ago

@DeepFindr That would be nice. It's difficult to keep track. Sometimes you mention in the video "I explained in the previous video" and viewers have no idea which one was the previous video. Maybe you can add links in the description. Your content is great. I hope you get more views and stay motivated.

@DeepFindr 3 years ago

Hi! Sure I understand that :) Yep I should work a bit on the descriptions.

@DeepFindr 3 years ago

I just realized that the GNN Playlist was set to private. So now it should be in chronological order. Thanks again for your comment

@mhadnanali 3 years ago

@DeepFindr I just saw. That seems better. Thank you for the content and your work. More power to you.

@user-bl8hi7je1z 3 years ago

Hi again, could you please guide me on how to convert an image to a graph representation using Python? So far I used MATLAB, with SIFT to extract features and another function to extract edges. I have a lung dataset and I want to represent it as graphs; my goal is to study the spread of disease in this area, either erosion or dilation. I really appreciate your help in advance 🙂

@DeepFindr 3 years ago

Hi. I have never done that before, but intuitively I would write a function that creates a custom dataset for PyTorch. Here you would pass #pixels times the extracted node features as the node feature matrix. That means you re-arrange the shape of your image and add the node features. So x would have a shape of [pixels, node features]. Then you have to add the connectivity information somehow; the only way I can think of is to connect each pixel to all the surrounding pixels. For PyTorch Geometric you can create a COO format list of connection pairs. Best regards :)

@user-bl8hi7je1z 3 years ago

@DeepFindr Thanks a lot for your quick response, I have learned a lot from you. Regarding your answer, do you mean that you don't normally use images as graph representations, or that you just use built-in datasets? I'm searching for images for different test experiments, either medical or normal, which is why I'm thinking of converting the RGB images to graphs.

@DeepFindr 3 years ago

@user-bl8hi7je1z I meant that I never tried to convert images to graphs. Maybe you should try to use Convolutional Neural Networks instead, as they are better suited for images? Or do you want to do some fancy stuff with graphs? :D

@user-bl8hi7je1z 3 years ago

@DeepFindr The idea is to see how graphs behave compared with a CNN. Thanks again

@DeepFindr 3 years ago

@user-bl8hi7je1z OK, yeah sure. Sounds interesting :) But it should be pretty easy to convert the images to graphs. You just create a feature vector for each pixel, and this vector could for instance contain the color values (RGB). For the adjacency info you then simply take the 8 surrounding pixels (9 if you count the pixel itself), with exceptions for the borders. So you have a node feature matrix of size [# pixels x node features (RGB)] and an adjacency matrix of size [# pixels x # pixels]. I would write a function im2graph or something that does this for each of the images. Cool project! I would be interested to hear about the result :)
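A sketch of the im2graph idea from this reply, in plain Python. Each pixel becomes a node with its RGB values as features and is connected to the pixels in its 3x3 window (8 neighbours, fewer at the borders). Instead of a dense [# pixels x # pixels] adjacency matrix the function returns a COO edge list, which stays small for large images. The function name `image_to_graph` and the tiny 2x3 test image are made up for the example.

```python
def image_to_graph(image):
    """Convert an image (list of rows of (r, g, b) tuples) to node features and COO edges."""
    height, width = len(image), len(image[0])
    # Node features: one [r, g, b] row per pixel, in row-major order.
    x = [list(pixel) for row in image for pixel in row]
    src, dst = [], []
    for r in range(height):
        for c in range(width):
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # no self-loop
                    nr, nc = r + dr, c + dc
                    if 0 <= nr < height and 0 <= nc < width:  # skip outside the border
                        src.append(r * width + c)
                        dst.append(nr * width + nc)
    return x, [src, dst]

# Made-up 2x3 image with one RGB tuple per pixel.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(10, 10, 10), (20, 20, 20), (30, 30, 30)],
]
x, edge_index = image_to_graph(image)

print(len(x), len(x[0]))   # 6 3  (pixels, features per pixel)
print(len(edge_index[0]))  # 22 directed edges
```

Every connection appears in both directions, so the edge list describes an undirected graph.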