Wow, amazing video! How does the coding loop know when the encoding is finished, if we are iteratively sending the same frame's prediction back to the intra-prediction module? Also, how do the intra- and inter-prediction modules work together, since they will both try to predict pixel values for the same pixels (albeit using different approaches)?
The loop runs on a block basis: each frame is split into blocks, and these are processed sequentially. Once all blocks have been processed, the encoding of the frame is done. The intra and inter prediction modules do not interfere with each other. The encoder can try out all the different modes, but ultimately it has to decide on exactly one mode (intra or inter) per block, which is then signaled in the bitstream.
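To make the block loop concrete, here is a minimal, hypothetical sketch (all function names are made up for illustration). It assumes a very simple DC intra predictor that uses already reconstructed neighbours, a zero-motion inter predictor that copies the co-located block from the reference frame (real encoders search for motion vectors), plain SAD as the decision cost instead of a real rate-distortion cost, and a lossless stand-in for reconstruction. The frame is finished simply when the loop has visited every block, and each block gets exactly one mode.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]


def get_block(frame: Frame, y: int, x: int, size: int) -> Frame:
    # Cut a size x size block out of the frame.
    return [row[x:x + size] for row in frame[y:y + size]]


def sad(a: Frame, b: Frame) -> int:
    # Sum of absolute differences, used here as a stand-in for the RD cost.
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def intra_dc_prediction(recon: Frame, y: int, x: int, size: int) -> Frame:
    # Hypothetical DC predictor: average of the reconstructed row above and
    # column to the left; falls back to mid-grey (128) for the first block.
    neighbours: List[int] = []
    if y > 0:
        neighbours += recon[y - 1][x:x + size]
    if x > 0:
        neighbours += [recon[y + i][x - 1] for i in range(size)]
    dc = round(sum(neighbours) / len(neighbours)) if neighbours else 128
    return [[dc] * size for _ in range(size)]


def inter_zero_mv_prediction(ref: Frame, y: int, x: int, size: int) -> Frame:
    # Hypothetical inter predictor with motion vector (0, 0).
    return get_block(ref, y, x, size)


def encode_frame(cur: Frame, ref: Optional[Frame], size: int = 4) -> List[Tuple[int, int, str]]:
    # Assumes the block size divides the frame width and height.
    h, w = len(cur), len(cur[0])
    recon = [[0] * w for _ in range(h)]
    modes = []
    for y in range(0, h, size):          # blocks are processed sequentially...
        for x in range(0, w, size):
            orig = get_block(cur, y, x, size)
            candidates = {"intra": intra_dc_prediction(recon, y, x, size)}
            if ref is not None:          # inter needs a previously coded frame
                candidates["inter"] = inter_zero_mv_prediction(ref, y, x, size)
            # Try all modes, but keep exactly one per block.
            mode = min(candidates, key=lambda m: sad(orig, candidates[m]))
            modes.append((y, x, mode))
            # Lossless stand-in: a real encoder would add the decoded (quantized)
            # residual to the prediction. The reconstruction feeds later intra blocks.
            for i in range(size):
                recon[y + i][x:x + size] = orig[i]
    return modes                         # ...and the frame is done after the last block


cur = [[(x * y) % 256 for x in range(8)] for y in range(8)]
print(encode_frame(cur, ref=None))  # first frame: only intra is available
print(encode_frame(cur, ref=cur))   # static scene: zero-motion inter is a perfect match
```

The "feedback" to intra prediction is the reconstructed neighbourhood of the current frame, not the whole frame being re-encoded, which is why the loop terminates naturally.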
@AdmMusicc I am not sure if there is something that covers this with code. But this is always a good starting point: github.com/krzemienski/awesome-video?tab=readme-ov-file#books
I looked very hard for a video of this quality, thank you. Aren't techniques like RLE used before the entropy coding? Do you have books, articles, or other videos if we want to dig further?
Thank you! You are right. To use entropy coding efficiently, the data that you push into it is usually preprocessed and ordered in certain ways. This is somewhat similar to run-length encoding: certain bins that are put into the entropy coding engine can mean "all of the following coefficients in the block are 0" (or something similar; it really depends on the codec).
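As an illustration of that preprocessing, here is a small sketch of a JPEG/MPEG-2-style scheme: quantized coefficients are read in zigzag order, each non-zero value is stored as a (zero-run, level) pair, and a single "EOB" (end-of-block) symbol replaces all trailing zeros. The function names are hypothetical, and this is only one possible variant; newer codecs such as H.264 or HEVC signal the same idea differently (e.g. significance flags and the position of the last significant coefficient).

```python
from typing import List, Tuple, Union

Symbol = Union[Tuple[int, int], str]


def zigzag_order(n: int) -> List[Tuple[int, int]]:
    # Walk the anti-diagonals of an n x n block, alternating direction,
    # so low-frequency coefficients come first.
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        if s % 2 == 0:
            diag.reverse()
        order.extend(diag)
    return order


def run_level_encode(block: List[List[int]]) -> List[Symbol]:
    coeffs = [block[r][c] for r, c in zigzag_order(len(block))]
    # Index of the last non-zero coefficient (-1 if the block is all zero).
    last = max((i for i, v in enumerate(coeffs) if v != 0), default=-1)
    symbols: List[Symbol] = []
    run = 0
    for v in coeffs[:last + 1]:
        if v == 0:
            run += 1                 # count zeros preceding the next level
        else:
            symbols.append((run, v))
            run = 0
    symbols.append("EOB")            # "all following coefficients are 0"
    return symbols


block = [
    [12, 3, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(run_level_encode(block))
```

Sixteen coefficients collapse into four symbols, which are the kind of values an entropy coder can then represent compactly.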
Fantastic explanation of a complex topic. This is the best teaching material out there. Thank you very much, also for the interesting tool! Would LCEVC be a good follow-up video? To the best of my understanding, it introduces some new concepts. Probably there is already much more to say anyway. All the best for your predictable and unpredictable future frames of life.
Great to hear that it was helpful. Yes, the list of follow-up videos is long. I would first like to go a bit deeper into the details of video coding, though.
Hi! Is it true that B-frames are not well suited for high-motion content? For example, if dynamic B-frames/look-ahead is used, is the amount of motion the deciding factor for the number of consecutive B-frames the encoder chooses to use?
Hi. This question is hard to answer because it probably depends on the codec as well as the specific encoder implementation. But in general, for a "normal user" of an encoder this should not matter, because the encoder will choose the best coding structure for the content that comes in (if the encoder is allowed to choose the coding structure freely). There are certainly situations where B-frames are less effective (e.g. if consecutive frames are very dissimilar, which may happen with very high motion). But in those situations, motion compensation in general is not effective.
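To illustrate the idea of content-adaptive B-frame placement, here is a deliberately naive, hypothetical heuristic (not how any particular encoder such as x264 actually decides): a frame may become a B-frame only if its mean absolute difference to the previous frame is below an assumed threshold, up to an assumed maximum of `max_b` consecutive B-frames, and each run of B-frames is closed by a P-frame that serves as the forward reference. Frame types are listed in display order; the coding order would differ.

```python
from typing import List


def mean_abs_diff(a: List[int], b: List[int]) -> float:
    # Simple similarity measure between two flattened luma frames.
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def choose_frame_types(frames: List[List[int]], max_b: int = 3,
                       threshold: float = 10.0) -> List[str]:
    # Hypothetical heuristic; max_b and threshold are assumed parameters.
    types = ["I"]
    i = 1
    while i < len(frames):
        run = 0
        # Extend the B-frame run while frames stay similar and there is
        # still a later frame left to act as the closing P anchor.
        while (run < max_b and i + run + 1 < len(frames)
               and mean_abs_diff(frames[i + run], frames[i + run - 1]) < threshold):
            run += 1
        types.extend(["B"] * run)
        types.append("P")
        i += run + 1
    return types


static = [[0] * 16 for _ in range(8)]
high_motion = [[(k * 50) % 256] * 16 for k in range(8)]
print(choose_frame_types(static))       # similar frames: long B-frame runs
print(choose_frame_types(high_motion))  # dissimilar frames: no B-frames at all
```

Under these assumptions, the B-frame count falls as the frame-to-frame difference grows. Real encoders instead compare estimated coding costs of different structures over a look-ahead window.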
5:27 Not exactly a fair comparison, since near-transparent audio quality is compared with medium-quality video here. For transparent video and audio compression, the difference in compression ratio isn't that huge anymore. A big difference seems to be that audio quality below transparency quickly becomes unappealing (maybe partly because it's more densely filled with information we deem important?), while the same is not true for images or video, where we often don't really mind significant perceptual degradation in quality.
Hi! I am sorry if I offended any audio compression engineers here. I did not mean to say that audio compression is easy. It is definitely not. We could probably discuss all day what counts as good or bad quality in video compared to audio, and what is worse or comparable. But I think my main point still holds. I was just using very typical values from practical applications, although why they are typical is also debatable. Mostly, I think that because audio bitrates are low compared to video bitrates, they are typically chosen higher than actually necessary, since the main focus is on saving bitrate for the video. But of course I also get your point: this greatly depends on the application and on what you consider good/transparent quality for video and audio.
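For a feel of how compression ratios follow from such "typical values", here is a small worked calculation. The operating points are assumptions chosen for illustration, not necessarily the numbers shown at 5:27: CD-style PCM audio (44.1 kHz, 16 bit, stereo) coded at 128 kbit/s, and 1080p30 8-bit 4:2:0 video coded at 5 Mbit/s.

```python
def pcm_audio_bitrate(sample_rate_hz: int, bits_per_sample: int, channels: int) -> int:
    # Uncompressed PCM bitrate in bit/s.
    return sample_rate_hz * bits_per_sample * channels


def raw_video_bitrate(width: int, height: int, fps: int,
                      bit_depth: int = 8, chroma_factor: float = 1.5) -> float:
    # chroma_factor = 1.5 for 4:2:0: two chroma planes at a quarter of the
    # luma resolution each add 0.5 samples per luma sample.
    return width * height * chroma_factor * bit_depth * fps


audio_raw = pcm_audio_bitrate(44_100, 16, 2)    # 1_411_200 bit/s
video_raw = raw_video_bitrate(1920, 1080, 30)   # 746_496_000 bit/s

audio_ratio = audio_raw / 128_000     # assumed audio bitrate: 128 kbit/s
video_ratio = video_raw / 5_000_000   # assumed video bitrate: 5 Mbit/s
print(f"audio {audio_ratio:.1f}:1, video {video_ratio:.1f}:1")
```

With these assumed operating points, the video is compressed more than ten times as strongly as the audio. Choosing different target qualities for either medium shifts that gap considerably, which is exactly the point being debated above.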