Great question! That's something I should have addressed in the video. Before you do anything, you first need to know whether or not you are overfitting. You can check this by comparing the cost on the training data with the cost on the validation data. If they are very similar, just leave your neural network the way it is. If the cost on the training data is significantly better than on the validation data, you are overfitting, and there are several things you can do to change that.

The easiest approach is to use a very large training set. A large set gives a more accurate representation of the overall population (all the data), so the network can find a relationship between inputs and outputs that actually generalizes. You can also slightly augment the data in order to, essentially, get more of it. For example, if you were trying to identify which number a handwritten greyscale (black and white) digit represents, you could rotate or scale each image by a small random amount and add it as another data point. If you do this too much, though, the network may start treating the rotation or scaling itself, rather than the shapes of the digits, as the primary relationship between pixels.

Still, we can do more than that. You can also use something called "dropout": during training you randomly don't use certain neurons and their synapses; they are "dropped out". Here is a good illustration of it on a neural network model: cdn-images-1.medium.com/max/1800/1*yIGb-kfxCAK0xiXipo6utA.png

Dropout serves the purpose of making sure that very complicated co-adaptations don't develop. Co-adaptations occur when many neurons work together to capture a single relationship; oftentimes that relationship is an overly complex one that is only present in the training data. A simpler way of achieving the same thing would be to reduce the number of parameters (number of neurons, hidden layers, etc.); however, that rules out a large neural network, which, despite its susceptibility to unneeded co-adaptations, offers practical benefits for certain datasets.
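If it helps to see dropout concretely, here is a minimal sketch of "inverted" dropout written with numpy. The function name, the keep_prob value of 0.8, and the shape of a are just placeholders for illustration, not code from the video:

import numpy as np

def apply_dropout(a, keep_prob=0.8, training=True):
    # a: hidden-layer activations, shape (batch_size, n_neurons)
    if not training:
        return a  # every neuron is used at test time
    # keep each neuron with probability keep_prob, drop the rest
    mask = np.random.rand(*a.shape) < keep_prob
    # scale the survivors up so the expected activation stays the same
    # between training and testing ("inverted dropout")
    return a * mask / keep_prob

During training you would apply this to each hidden layer's output in the forward pass (and reuse the same mask when backpropagating through that layer); at test time you skip it entirely.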
@@hwhd Great video, thanks a lot. Is there a way to use the early stopping technique? I mean keeping a sub-sample of the data as a validation set, training on the rest of the data, checking whether the error on the validation set starts increasing at some point, and then stopping the training at that point. Would the code be very complicated if we did this? I personally have no idea how to incorporate early stopping in Python. Thanks
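The code doesn't have to be complicated. Here is a rough sketch of what early stopping could look like in Python; it assumes an nn object with a train_step(X, y) method that performs one training update, a cost(X, y) method, and a weights list of numpy arrays -- those names are placeholders, so rename them to match your own network:

def train_with_early_stopping(nn, X_train, y_train, X_val, y_val,
                              max_epochs=1000, patience=10):
    # assumes nn has train_step(X, y), cost(X, y), and a list of weight arrays
    best_val_cost = float('inf')
    best_weights = None
    bad_epochs = 0
    for epoch in range(max_epochs):
        nn.train_step(X_train, y_train)      # train on the training split only
        val_cost = nn.cost(X_val, y_val)     # evaluate on the held-out split
        if val_cost < best_val_cost:
            best_val_cost = val_cost
            best_weights = [w.copy() for w in nn.weights]  # snapshot best model
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:       # no improvement for `patience` epochs
                break
    nn.weights = best_weights                # roll back to the best weights
    return nn

The patience counter just means you tolerate a few epochs of the validation cost getting worse (since it can fluctuate) before you actually stop and restore the best weights seen so far.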
Your videos are AWESOME. I have not been able to find anyone anywhere who both explains the mathematics behind NNs and shows, from the ground up, how to make a real NN using Python. Hats off man! I had a bit of a rough time seeing how you turned the maths into that chunk of code you used. In my opinion that could have been explained a bit more, but that might just be because I don't know the numpy lib well enough to understand the functions you are using xD Thanks again for the great video!