Hi, it's a really clear and detailed lecture, thank you so much! Could you please talk more about agnostic PAC learning and its sample complexity? Thanks!
Thanks very much for the video. I still think the setting is quite problematic. By stating y_i = c(x_i), we are presuming that the true model is deterministic rather than probabilistic; in other words, that y_i is fully determined by x_i. This is a hugely unrealistic restriction in most applications.
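For context, the usual way this restriction is lifted is the agnostic setting, where labels need not be a function of x. A sketch of the two formulations, in standard notation that may differ from the lecture's:

```latex
% Realizable setting (as in the video): a target concept c gives
% deterministic labels, y_i = c(x_i), and the error of h is
%   err(h) = P_{x ~ D}[ h(x) != c(x) ].
% Agnostic setting: examples (x, y) ~ D are drawn jointly, so y need
% not be determined by x, and the learner competes with the best h in H:
\[
  \operatorname{err}(h) \;=\; \Pr_{(x,y)\sim D}\bigl[h(x)\neq y\bigr],
  \qquad
  \operatorname{err}(\hat h) \;\le\; \min_{h\in H}\operatorname{err}(h) + \epsilon .
\]
```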
11:41 Why are we considering all of the m points? He clearly said that this classifier correctly classifies the m points from the training data. Then he looked at the probability that it classifies a random point correctly: P(classify a random point correctly) = 1 - P(misclassify a random point) = 1 - epsilon. Now we want the probability that it classifies all the random points correctly, and I'd expect those random points to come from the test set. So why does he compute (1 - epsilon)^m, with m the size of the training set? Where am I going wrong?
Yes, the classifier classifies all m points in the training data correctly. But what he wanted to show is the reverse question: given any fixed classifier with true error rate epsilon, what is the probability that it correctly classifies m i.i.d. points? That probability is (1 - epsilon)^m.
@mohammadsadilkhan1875 Does that mean we can set m to n? Since we are talking about how good our hypothesis is on the real data set: if we have n data points in total (of which the training set is only m), and we care about the true error, shouldn't we take n?
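To make the quantity in this thread concrete: the m in (1 - epsilon)^m is the training-set size, and the probability is over the random draw of those m training points, for one fixed hypothesis whose true error is epsilon. A tiny simulation sketch (the function name and numbers are just illustrative):

```python
import random

def prob_survives(epsilon, m, trials=100_000):
    """Estimate the probability that a fixed hypothesis with true error
    `epsilon` classifies all m i.i.d. training points correctly.

    Each point is misclassified independently with probability epsilon,
    so the hypothesis "survives" iff all m draws come out correct.
    """
    survived = 0
    for _ in range(trials):
        if all(random.random() > epsilon for _ in range(m)):
            survived += 1
    return survived / trials

epsilon, m = 0.1, 20
print(prob_survives(epsilon, m))   # about 0.12 (varies run to run)
print((1 - epsilon) ** m)          # exact value: 0.9^20 = 0.1216...
```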
Not sure I understand why the hypothesis class of half-planes in 2D has VC dimension 3. A configuration like + - + (three points on a line) can't be classified correctly by any half-plane. Am I misunderstanding the definition of shattering?
24:56 answers my question: the "classifier" (the learner) gets to choose the set of points. Why does this make sense, though? Why isn't it the labeller that also chooses the set?
I have a question. How can we shatter a positive and a negative case that are located on a line? I mean, consider a straight line with three points on it, labeled + - +.
Shattering does not have to hold for ANY three points. There just has to EXIST a set of three points such that, however you label them, some classifier in the class separates them. Think of it this way: you and I are playing a game. I place the n points. You label them any way you like. I then choose a classifier matching your labeling. The maximum n for which I can always win is the VC dimension. (With three collinear points I lose to the + - + labeling, but I would never place them collinearly in the first place.)
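If it helps, here is a minimal sketch of exactly that game for half-plane classifiers in 2D. It uses scipy's linprog as a linear-separability test; the helper names `linearly_separable` and `shattered` are mine, not from the lecture:

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def linearly_separable(points, labels):
    """Return True if some line strictly separates the +1 from the -1 points.

    LP feasibility: find (w1, w2, b) with y_i * (w . x_i + b) >= 1 for all i.
    For a finite point set this is feasible iff the labeling is separable.
    """
    X = np.asarray(points, dtype=float)
    y = np.asarray(labels, dtype=float)
    # Each row encodes -y_i * (x_i1, x_i2, 1) . (w1, w2, b) <= -1.
    A_ub = -y[:, None] * np.hstack([X, np.ones((len(X), 1))])
    b_ub = -np.ones(len(X))
    res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * 3, method="highs")
    return res.success

def shattered(points):
    """True if half-planes realize all 2^n labelings of the given points."""
    return all(linearly_separable(points, labs)
               for labs in product([-1.0, 1.0], repeat=len(points)))

# Three points in general position ARE shattered by lines ...
print(shattered([(0, 0), (1, 0), (0, 1)]))   # True
# ... but three collinear points are NOT: the + - + labeling has no line.
print(shattered([(0, 0), (1, 0), (2, 0)]))   # False
```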
Minor quibble: in the proof of the error bounds, when you drew e^-epsilon and 1 - epsilon, I think you've drawn the mirror image; they should slope up and to the left. It also seemed a bit arbitrary to point out that e^-epsilon is greater than 1 - epsilon. There are infinitely many functions greater than 1 - epsilon. What made Leslie Valiant pick e^-epsilon specifically, in the first place?
He answers that around 13:20: he is basically working with a finite hypothesis space. Think of n-class classification, where there are only n possibilities (classes) for the hypothesis.
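For completeness, here is my reconstruction of the standard finite-class argument (the lecture's exact notation may differ). The choice 1 - epsilon <= e^-epsilon is not arbitrary: it is the bound that lets you solve for m in closed form.

```latex
% A fixed "bad" hypothesis (true error > epsilon) survives m i.i.d.
% training points with probability at most (1 - epsilon)^m. Union-bounding
% over a finite class H, and using 1 - epsilon <= e^{-epsilon}:
\[
  \Pr\bigl[\exists\, h \in H_{\text{bad}} \text{ consistent with the sample}\bigr]
  \;\le\; |H|\,(1-\epsilon)^m
  \;\le\; |H|\, e^{-\epsilon m} .
\]
% Setting the right-hand side to delta and solving for m gives the
% familiar sample-complexity bound, something the raw (1 - epsilon)^m
% would not yield in closed form:
\[
  m \;\ge\; \frac{1}{\epsilon}\Bigl(\ln |H| + \ln \tfrac{1}{\delta}\Bigr).
\]
```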