Machine Learning and Artificial Intelligence are talked about a lot in the press, yet they are rarely taught in undergraduate courses or in high school. Luckily, there are fantastic resources available on RU-vid for the curious mind. This channel is an attempt to curate some of those courses. The courses listed here have served me well during my PhD in Deep Reinforcement Learning.
As a math student interested in information theory and neural networks, I discovered this gem of a lecture series when I was looking for videos to fall asleep to! In fact, I first finished the lectures while I was sleeping :D Now I've decided to start over properly, and I just finished watching this lecture and taking notes. I would love to send David an email when I finish the course. Thanks for leaving this behind, my man. Rest in peace.
I am a bit troubled by the result of the ticket lottery. I would expect that buying all tickets from the all-0s ticket up to those with 123 ones is what gives a 99% chance of winning (so the count should rather be the sum of (N choose n) with n going from 0 to 123). But here the teacher only considers the (1000 choose 123) tickets, which on their own do not represent a 99% chance of winning the game. I can't see where I failed to understand this result... @Jakob Foerster ?
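A quick numerical check may help here (a rough sketch: N = 1000 and the cutoff of 123 ones are taken from the comment above, while the bent-coin probability f = 0.1 is an assumption):

```python
from math import comb, log2

N, k = 1000, 123   # from the comment above
f = 0.1            # assumed probability of a '1' in the bent-coin lottery

# Chance of winning if you buy every ticket with at most k ones
p_win = sum(comb(N, n) * f**n * (1 - f)**(N - n) for n in range(k + 1))

# Number of tickets: the full sum versus just the single largest term
tickets_sum = sum(comb(N, n) for n in range(k + 1))
tickets_top = comb(N, k)

print(f"P(win)             = {p_win:.3f}")             # roughly 0.99
print(f"log2(sum of terms) = {log2(tickets_sum):.1f} bits")
print(f"log2(largest term) = {log2(tickets_top):.1f} bits")
```

The sum is dominated by its largest term, so on a log scale the two ticket counts differ by only a fraction of a bit out of several hundred, which may be why only the (1000 choose 123) term is kept.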
The clipping reduces as the lecture goes on and the information content is regained; I think maybe you shout when you are nervous. More interstitial melodies, perhaps. The beautiful entropy.
It's because the first question is basically asking what the MSB is. For example, if the answer is yes, we know the binary number is "1xxxxx", where each 'x' is still unknown. The next question determines the next digit: "10xxxx". The last question determines the LSB: "101010". I think the LSB (even or odd) is the easiest to think about intuitively: if you ask whether the number is odd and the answer is "yes", you know the number has a '1' as its LSB in binary.
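A minimal sketch of that question strategy (the 6-bit range and the example 101010 = 42 are from the comment; `answer_oracle` is just a hypothetical stand-in for whoever answers the questions):

```python
def guess_number(answer_oracle, n_bits=6):
    """Identify a number in [0, 2**n_bits) with n_bits yes/no questions,
    one per bit, from the MSB down to the LSB."""
    number = 0
    for bit in reversed(range(n_bits)):
        # Question: "Is bit `bit` of your number a 1?"
        # For n_bits = 6 the first question asks about the MSB, i.e. "is it >= 32?"
        if answer_oracle(bit):
            number |= 1 << bit
    return number

secret = 0b101010  # 42, the example from the comment
print(guess_number(lambda bit: bool((secret >> bit) & 1)))  # 42
```

Each answer resolves exactly one bit, which is why six questions always suffice for 64 equally likely numbers.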
Applying entropy for this weighing problem is peculiar, since the entropy depends on how we are defining the states of the system. For example, for the first weighing, it seems totally irrelevant whether the scale tips left or tips right. So in this view, it would be preferable to set up the first weighing so that it is equally probable for the scale to tip at all as opposed to being balanced. This would indicate that the most informative (greedy) first weighing is actually the case where we have a 50% chance of leaving the odd ball out, which is to weigh three against three and leave six aside. However, in that case I think there may be a conflation between the physical state of the scale and the epistemic state of the balls. The correct approach to a greedy solution is to maximize the epistemic entropy, which I believe is achieved by the 4 vs. 4 weighing.
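A small sketch comparing the splits mentioned above by the entropy of the first weighing's outcome, assuming the usual 12-ball setup (the odd ball equally likely to be any ball, and equally likely to be heavy or light):

```python
from math import log2

def outcome_entropy(k, n_balls=12):
    """Entropy (in bits) of the first weighing's outcome with k balls per pan.
    The pan holding the odd ball tips down or up depending on whether that
    ball is heavy or light, so P(left down) = P(right down) = k / n_balls."""
    p_balanced = (n_balls - 2 * k) / n_balls
    p_left = p_right = k / n_balls
    return -sum(p * log2(p) for p in (p_balanced, p_left, p_right) if p > 0)

for k in (3, 4, 6):
    print(f"{k} vs {k}: {outcome_entropy(k):.3f} bits")
# 3 vs 3 -> 1.500 bits, 4 vs 4 -> log2(3) ~ 1.585 bits, 6 vs 6 -> 1.000 bit
```

Under that description of the states, the 4 vs. 4 split does maximize the expected information from the first weighing, consistent with the comment's conclusion.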
*Higher rates with feedback?!* Seems possible to me! 11:38 With a feedback line, to correctly send N bits, we need to transmit N*(1+f) bits on average. In other words, the average number of transmitted bits needed per source bit is 1+f. As a result, the capacity of the feedback channel would be 1/(1+f), which is slightly more than (1-f)!! Am I the only one who sees it this way? What am I getting wrong here?
No, you have to send N * 1/(1 - f) bits on average. Because the probability of an erasure ("?") is f (i.e. the probability the symbol gets through correctly is 1 - f), the number of transmissions per bit follows a geometric distribution, so the average number of bits you have to send per source bit is 1/(1 - f), NOT (1 + f).
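A tiny simulation of the retransmit-until-it-gets-through scheme makes the 1/(1 - f) figure concrete (f = 0.1 is just an illustrative value):

```python
import random

f = 0.1                 # erasure probability, an illustrative value
trials = 200_000

total_sends = 0
for _ in range(trials):
    sends = 1
    while random.random() < f:   # keep resending the same bit until it gets through
        sends += 1
    total_sends += sends

print(total_sends / trials)      # ~ 1 / (1 - f) = 1.111..., not 1 + f = 1.1
```

So the achievable rate is (1 - f) source bits per channel use, which matches the erasure channel's capacity rather than exceeding it.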
41:10 Transition from random walk to Hamiltonian Monte Carlo
57:39 Overrelaxation (Adler's and Neal's methods)
1:07:50 Slice Sampling
1:22:10 Exact Sampling
Let's take an extreme example: you have two code words with equal probability, one is a="0", the other is b="1111111". If your message is abbaba, the output is "011111111111111011111110". If you choose a bit at random from the output, you can see that picking a "1" is much more likely, not because "b" appeared more often, but because the code word for b is much longer.
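A couple of lines reproduce the example (codewords and message taken from the comment):

```python
code = {"a": "0", "b": "1111111"}
message = "abbaba"

encoded = "".join(code[s] for s in message)
print(encoded)                             # 011111111111111011111110
print(encoded.count("1") / len(encoded))   # 21/24 = 0.875: a random bit is very likely a '1'
```

Even though 'a' and 'b' each occur three times as symbols, the bits belonging to 'b' outnumber those belonging to 'a' seven to one.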
Isn't he wrong in claiming the code words at 41:10 are uniquely decodable? Say, referring to 41:10, a is 1, b is 10, c is 100 and d is 000. Isn't it obvious that a is a prefix of b and c, with b being a prefix of c?
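Being prefix-free is sufficient for unique decodability but not necessary, and one way to settle this kind of question mechanically is the Sardinas-Patterson test; here is a rough sketch (the code {1, 10, 100, 000} is the one from the comment):

```python
def uniquely_decodable(codewords):
    """Sardinas-Patterson test: a code is uniquely decodable iff no 'dangling
    suffix' generated by the procedure below is itself a codeword."""
    code = set(codewords)

    def dangling(prefixes, words):
        # suffixes w such that some p in prefixes is a proper prefix of a word, word = p + w
        return {w[len(p):] for p in prefixes for w in words
                if w.startswith(p) and len(w) > len(p)}

    current = dangling(code, code)
    seen = set()
    while current:
        if current & code:
            return False
        seen |= current
        current = (dangling(code, current) | dangling(current, code)) - seen
    return True

print(uniquely_decodable(["1", "10", "100", "000"]))  # True: not prefix-free, yet uniquely decodable
print(uniquely_decodable(["0", "01", "10"]))          # False: "010" has two parsings
```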
I'm new to information theory. At 4:24, why is the received signal "approximately", and not "equal to", the transmitted signal + noise? Like, what else is there other than the transmitted signal and the noise? Thanks for sharing such helpful lectures.
It gets extremely difficult to model everything that affects the signal, so instead we just add a noise term, and not just any noise but noise assumed to be additive (hence the plus sign), white, and Gaussian (normally distributed). As you would know from statistics, a normal distribution has nice properties for estimation. This stuff gets really complex, but I guess you get the idea. Note that we can make other assumptions about the characteristics of the noise.
Received = Transmitted + Noise makes a *lot* of simplifying assumptions! In a real channel the noise might be multiplicative rather than additive. The transmitted signal might go through weird transformations: you might receive log(Transmitted), or the channel might be modeled as a filter that changes different parts of your transmitted signal in different ways!
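A minimal sketch contrasting the idealized additive-noise model with a slightly messier channel (all parameter values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

transmitted = rng.choice([-1.0, 1.0], size=n)     # e.g. a simple +/-1 signal
noise = rng.normal(loc=0.0, scale=0.5, size=n)    # additive white Gaussian noise

received_ideal = transmitted + noise              # the textbook model

# A less idealized channel: attenuation plus a crude smearing filter, then noise.
impulse_response = np.array([0.7, 0.2, 0.1])
received_messy = 0.8 * np.convolve(transmitted, impulse_response, mode="same") + noise
```

The "approximately" is presumably there to cover everything that the simple additive model leaves out.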