Hi, thanks for this video. Now I know why my classifier always predicted with such high confidence, whether it was correct or incorrect. Is there anything other than temperature that solves this? I would like to determine how confident the model actually is in its prediction. Is temperature the way to go?
Another technique is called label smoothing. It is related, but it is applied to the ground-truth labels rather than the logits. See proceedings.neurips.cc/paper/2019/file/f1748d6b0fd9d439f71450117eba2725-Paper.pdf There is also model calibration (temperature scaling is one calibration method), but I have not yet applied it to neural networks.
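To make the label smoothing idea concrete, here is a minimal sketch (my own illustration, not from the video): instead of training against a hard one-hot target, you mix it with a uniform distribution, so the model is never pushed toward 100% confidence. The function name and the epsilon value are just placeholders.

```python
import numpy as np

def smooth_labels(one_hot, epsilon=0.1):
    """Label smoothing: blend the one-hot target with a uniform
    distribution over the k classes. The target class keeps most of
    the mass; every other class gets epsilon / k instead of zero."""
    k = one_hot.shape[-1]  # number of classes
    return one_hot * (1.0 - epsilon) + epsilon / k

# one-hot target for class 1 out of 4 classes
y = np.array([0.0, 1.0, 0.0, 0.0])
print(smooth_labels(y, epsilon=0.1))  # -> [0.025 0.925 0.025 0.025]
```

The smoothed target still sums to 1, so it remains a valid distribution for cross-entropy training.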
Instead of using the exp function in softmax to make the logits positive, what if we shift the logits by the least logit value, e.g. [1, -2, 0] => [3, 0, 2]? This also preserves the relative ordering of the logits.
@krp2834 The min isn't differentiable at the points where the minimum switches between entries, though the function is differentiable elsewhere. The bigger problem: the minimum entry is guaranteed a "probability" of exactly zero, which may not be desirable. It also prevents you from using loss functions like KL divergence or cross-entropy, since they take the log of the predicted probabilities. Also, the inputs would no longer be "logits" in the proper sense; I suggest you review the definition of logit (log-odds).
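A quick sketch comparing the two (my own illustration; `shift_normalize` is a made-up name for the proposed scheme) shows the zero-probability issue directly:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def shift_normalize(z):
    """The proposed alternative: shift by the minimum logit so all
    values are non-negative, then divide by the sum."""
    shifted = z - z.min()
    return shifted / shifted.sum()

z = np.array([1.0, -2.0, 0.0])
print(softmax(z))          # every entry strictly positive
print(shift_normalize(z))  # -> [0.6 0.  0.4], the minimum always gets exactly 0
```

With softmax every class gets nonzero probability, so log-based losses stay finite; with the shifted version, cross-entropy on the minimum class would be log(0) = -inf.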