That I didn't understand it is more about me and my experience than it is about his explanation. I didn't understand it, but I have upvoted it anyway because it was interesting.
Hi Siraj, I've been waiting for this paper. What a pleasant surprise to learn it was published only a few days ago! I have downloaded the paper and will read it over my second, third, and fourth cup of coffee. You've done an excellent job presenting this very complex topic.
The rewards are the videos, the code, and the knowledge he shares here. Man, there are people that play fking video games and receive thousands of dollars in donations for that. Stop complaining, please; this is useful knowledge worth more than a dollar a month.
It's incredible that after so much has been done on a dataset like MNIST, you can still get state of the art if you come up with something clever. In short, a capsule network adds a third dimension to the network shape. Cool.
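To make that "third dimension" concrete, here's a minimal numpy sketch (my own illustration, not code from the paper): a conv layer emits one scalar per (row, col, channel), while a capsule layer emits a small vector per position, passed through the paper's squash nonlinearity so its length behaves like a probability.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Capsule nonlinearity from the paper: shrinks the vector's length
    # into [0, 1) while preserving its direction, so length can act as
    # the probability that the entity is present.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

# Conv layer: one scalar per (row, col, channel).
conv_out = np.random.randn(6, 6, 32)
# Capsule layer: an 8-D pose vector per (row, col, capsule) -- the extra dimension.
caps_out = squash(np.random.randn(6, 6, 32, 8))

print(conv_out.shape, caps_out.shape)   # (6, 6, 32) (6, 6, 32, 8)
print(np.linalg.norm(caps_out, axis=-1).max() < 1.0)  # every capsule length < 1
```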
I am so very grateful for your efforts to deliver all of this information. You're a very good educator. The way you explain these complex solutions is demystifying and easy to understand, and showing it in practice to validate what we came to understand thanks to you gives a sense of success, which is inspiring. It hasn't been long since I began watching you (summer 2017), but your passion for discovery and your success in uncovering (and teaching about) these developments has been an enlightening experience! Thank you!
Keep it up Siraj. To me you are like "La Mouche du Coche", energizing my will to carry on with the subjects that matter to us. Thank you!
Just stalked your LinkedIn profile; you seem to have a passion for teaching. I still wonder why you haven't pursued a PhD yet. Going to Stanford and doing a project with Andrew Ng on his Google Brain project sounds like so much fun for people like you. Anyway, thanks for the video. Whether you decide to continue your studies or do something else out there, please keep making useful educational videos like this. Thanks :).
To have a chance at the core of the Google brain team, you probably need to produce three or more sequences of work that beat some non-trivial state of the art in a huge way.
You are wrong: they did test it on CIFAR-10, with less promising results (~10% error when SOTA is ~3-4%). But this is not that important. They clearly state in the paper that this is not supposed to be a fully formed, amazing new architecture, but: "There are many possible ways to implement the general idea of capsules. The aim of this paper is not to explore this whole space but to simply show that one fairly straightforward implementation works well and that dynamic routing helps."
It would be interesting to see the results of a dynamic-routing capsule model being attacked by the pixel attacks at 1, 3, or 5 pixels, as done in the paper you mentioned, and how it fares compared to CNNs.
Guys just to be clear the image of the Neural Net at 6:44 is not cropped. It is the original image that is in the paper. The publishers themselves published a cropped image by mistake. (I find it quite funny actually)
Hey Siraj, thanks for the video! A note: the advantage of a CNN over an MLP is not computational complexity but statistical efficiency. We exploit the "translational symmetry" of images, teaching the net that, e.g., an eye at the top of an image is the same thing as an eye at the bottom of an image.
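That statistical efficiency shows up directly in parameter counts. A toy comparison (my own numbers, just to illustrate weight sharing):

```python
# A fully connected layer mapping a 28x28 image to a 28x28 feature map
# learns a separate weight for every (input pixel, output pixel) pair.
# A conv layer reuses one small kernel at every position, so the "eye at
# the top" and "eye at the bottom" are learned by the same 9 weights.
H = W = 28
mlp_params = (H * W) * (H * W)   # dense layer: 614,656 weights
conv_params = 3 * 3              # one shared 3x3 kernel: 9 weights
print(mlp_params, conv_params)   # 614656 9
```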
Siraj, absolutely love your videos and am incredibly impressed with how fast you can get videos out on novel concepts. If you'll allow me one critique, I do think that your videos would benefit if you spoke slower, especially during the sections where you are explaining code. Quite frequently, I slow down that part to .75x so that my brain can absorb the connection between your words and the code I am seeing. Keep up the amazing work!
He has been talking about capsules for quite some time now. I think it's still not the final solution to equivariance, but a small step in the right direction.
Man, I definitely give you props for the change in your video style. It's still you, but it's now a lot easier to understand and follow. Quick question, if I may: do you think capsule-based neural networks could be a way to crack some of the issues of 3D generation with conv nets?
Not that I don't like the content or anything, but not mentioning the first author at all is absolutely not fair in terms of attribution. This is regardless of who had the original idea. Someone actually did the work to make this paper happen and she deserves credit for that.
Hi Siraj, thanks for the precise summary of these concepts. I would recommend that maybe you set an estimated knowledge level of the intended audience for each video and explain your ideas for audiences at different levels, to avoid mixing easy and difficult material together.
Hinton's paper sounds quite similar to "Network in Network" by Lin et al., 2013, on arXiv. Like Hinton's paper, "Network in Network": 1) captures abstractions in nested neural bundles, and is less susceptible to overfitting than prior work; 2) uses "global average pooling", but does so over the classification layer, not per capsule or neuron bundle as in Hinton's paper. arxiv.org/abs/1312.4400
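For anyone unfamiliar with it, global average pooling is a one-liner; here's a tiny numpy sketch (my own illustration of the idea from that paper):

```python
import numpy as np

# Global average pooling as in "Network in Network": collapse each final
# feature map to its spatial mean, giving one confidence value per class
# and removing the need for a big fully connected classifier.
feature_maps = np.random.randn(7, 7, 10)   # 10 maps, one per class
logits = feature_maps.mean(axis=(0, 1))    # shape (10,)
print(logits.shape)
```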
Some corrections on the difference between AlexNet and VGG: the author of AlexNet wasn't a computer science guy, so he relied a lot on his sharp intuition. AlexNet, while very significant in its own right, is very arbitrarily put together. VGG was made by computer science folks; it is very ordered, has more layers with a consistent layout, and is much simpler overall in its structure. VGG is still often used in DL research. Besides, the number of neurons in each layer is 2^x, where x is an integer, suggesting that it corresponds to the number of GPU threads (different versions of VGG for different GPUs). Also, it's worth mentioning that GoogLeNet doesn't use fully connected layers at the end; it's purely convolutional. It's problematic in DL research because it works well but nobody really understands it. ResNet was a very deep network of 152 layers, and in theory shouldn't work at all, but I don't know the exact details.
Siraj, don't you think that scoring better and better on MNIST is a bad target? A 100% accuracy wouldn't make any sense because there are quite a few digits in MNIST which are genuinely ambiguous. Why should new models achieve a rate much higher than what the SOTA is? Shouldn't we move on to more serious baselines?
I've thought a lot about this before, and I've seen some of the digits you are talking about. The digits are ambiguous to you (and me), but obviously they aren't ambiguous to the algorithm. The question is resolved by finding out whether the classification correctly represents the original intention of the person who wrote the digits, and it is reasonable to assume that their intention is correctly reflected in the 'y' target. I've had to come to the realisation over the last 15 years that some of the algorithms I've put together are simply much better at the task I've set them than I ever would be. Not just faster, but more accurate. In fact, my current test for when I've perfected an algorithm is when I am repeatedly convinced that the system has gotten it wrong, but on investigation I'm wrong.
Durand D'souza from watching another talk on capsule networks, it seems that "state of the art performance on MNIST" in this case doesn't mean "higher accuracy", but rather "the same accuracy with less supervision". It's not that it's trying to get 100% accuracy, but instead it's getting similar accuracy to previous models, but only requires a fraction of labelled data compared to them. This is really helpful because for a more complicated problem, getting a large amount of high quality labelled data can be a real issue, so if we can get similar accuracy with lots of unlabelled data and a small amount of labelled data, that seems like a serious win.
Thanks for your videos; what you are doing is amazing. A small request: can you make a live video on recommendation systems or market basket analysis, like Apriori? Thanks a lot in advance.
Hey Siraj, I am a huge fan of your vids. You are doing an awesome job with your lucid explanations. I am quite new to machine learning (deep learning in particular). Is there any particular order you would recommend going through your videos in, so as to get a comprehensive view of the content? Also, I heard one of your videos in which you were talking about the intersection of AI and blockchain in the creation of DAOs. I am working on that right now. It's truly inspiring to see your enthusiasm. Hoping to see more videos on blockchains and DApps from your channel :) Once again, thanks for all the effort!!
I had a question: if a CNN does not capture spatial correlation, would that be because it uses only one type of convolution kernel (3x3 or 4x4)? Inception v3 uses 2x2, 3x3, and 4x4 kernels together, which could capture that eyes are above the nose. Does the Inception model also fail to capture spatial correlation?
So do the capsules store orientations of objects? I reckon the way humans recognise objects is like this: first we see some features, then we guess the object (also using context). Then we consider what other features the object should have and where, and we look to see if those features exist where we expect them to be. And if we don't recognise a feature, we might look at sub-features, and so on, going up and down the hierarchy until we can say "ah, that's a five-legged dog with carrots for eyes."
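That back-and-forth between levels is loosely what the paper's routing-by-agreement does: lower capsules send more of their output to the higher capsules whose guesses they agree with. A rough numpy sketch of the algorithm (shapes and iteration count are my own assumptions, not the paper's exact code):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    n2 = np.sum(s ** 2, axis=axis, keepdims=True)
    return (n2 / (1.0 + n2)) * s / np.sqrt(n2 + eps)

def route(u_hat, iters=3):
    # u_hat: prediction of each lower capsule i for each higher capsule j,
    # shape (num_lower, num_higher, dim).
    b = np.zeros(u_hat.shape[:2])                  # routing logits
    for _ in range(iters):
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)  # couplings
        s = (c[..., None] * u_hat).sum(axis=0)     # weighted vote per j
        v = squash(s)                              # higher capsule outputs
        b += (u_hat * v[None]).sum(axis=-1)        # reward agreement
    return v

v = route(np.random.randn(1152, 10, 16))
print(v.shape)  # (10, 16)
```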
Siraj, what do you think about using the Go language in ML (and AI in general)? Do you think it can take over from Python in this field once more libraries are available?
I would argue that this is more or less what Numenta have been working on for a while now (old stuff). Maybe you can point out some differences I didn't notice?
Hi guys, the author of this code is still updating his work, so if you are interested, please go to this repo to get the latest updates: github.com/naturomics/CapsNet-Tensorflow
Hey Siraj, great videos as always. But there is one thing I don't really like about your content: every time, you start by explaining the basics again. I know I can skip it, but I just feel it's not required; for example, anyone reading up on capsule networks already knows what a convolutional network is. So maybe skip that part. :-) Anyway, ciao, keep making great videos :-D
What are the weaknesses of this model? I assume that because it maximizes a prediction and maps it to a specific entity (capsule), it recognizes only one class per image, right?
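Not necessarily, if I read the paper right: the margin loss treats each class capsule's length as an independent "is this class present?" score, which is how the paper handles overlapping digits in its MultiMNIST experiment. A small numpy sketch of that loss (the example lengths are made up for illustration):

```python
import numpy as np

def margin_loss(lengths, targets, m_plus=0.9, m_minus=0.1, lam=0.5):
    # lengths: (num_classes,) capsule lengths in [0, 1); targets: 0/1 vector.
    # Each class is penalised independently, so several classes can be
    # "present" at once.
    pos = targets * np.maximum(0.0, m_plus - lengths) ** 2
    neg = lam * (1 - targets) * np.maximum(0.0, lengths - m_minus) ** 2
    return (pos + neg).sum()

lengths = np.array([0.95, 0.05, 0.80])  # two class capsules active
targets = np.array([1.0, 0.0, 1.0])
print(margin_loss(lengths, targets))  # ≈ 0.01, only class 2 is under-confident
```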
Great Video! You said that one big problem of some NN is when the image is shifted or displaced, rotated etc. Do you think this new technique can "interpret" CAPTCHAs?
Hey Siraj, could you please send a link to the webpage that you were using to demonstrate in this video? I couldn't find it in the description. Thanks!
One has to give credit to the original author of the open-source implementation. Dig deeper and you will find that this is not a scalable architecture, due to the primitive and inefficient dynamic routing algorithm. However, there is a new routing scheme, EM routing, which might improve on it!
Please take some extra time explaining the capsule networks more in-depth, you spent only about 5 minutes on them but about 15 on regular CNNs. Thanks for the video though!
Siraj, is there any guarantee that CapsNet leads to better overall performance in a deep Q-network? Can we apply it in deep reinforcement learning? What is your intuition?
I didn't see the History of Object Recognition Infographic in the above description. In case anyone else is looking for it, try here: github.com/Nikasa1889/HistoryObjectRecognition