The sum of a discrete probability distribution should always be 1, correct? For the distribution shown at 6:00, sum_x Q(x) = 1 + epsilon, which doesn't equal 1. What am I missing?
At 10:55, when replacing the P in the KL divergence formula, isn't a "Q" missing in the second term (the Z part)? And if a Q were multiplied in, could we still treat this term as having no effect on the maximization and ignore it?
Z is a constant and Q is a probability distribution, so you can pull Z out of the sum, and summing Q over x gives 1. What's left is exactly the second term he writes down, the log(Z). Does that explain it?
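To make the reply concrete, here is a small numeric sketch (the distributions are made up for illustration) showing that since Q sums to 1, the Q-weighted sum of the constant log Z collapses to just log Z, which is why that term can be dropped in the maximization:

```python
import numpy as np

# Hypothetical unnormalized scores P_tilde and their normalizer Z
P_tilde = np.array([0.5, 1.0, 1.5])
Z = P_tilde.sum()

# A valid probability distribution Q over the same support
Q = np.array([0.2, 0.3, 0.5])
assert np.isclose(Q.sum(), 1.0)  # Q sums to 1, as a distribution must

# sum_x Q(x) * log(Z) = log(Z) * sum_x Q(x) = log(Z), because sum_x Q(x) = 1
weighted = (Q * np.log(Z)).sum()
assert np.isclose(weighted, np.log(Z))
```

So even with the Q multiplied in, the term is a constant independent of Q's shape and can be ignored when maximizing.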