Love your content :) I have one small question. At 4:10, Spill memory is 137MB and Spill disk is 77.2MB. If 137MB is spilled from memory, why is only 77.2MB written to disk? Shouldn't it be 137MB? Can you please clarify this?
Data written to disk is serialized, while the data in memory is in deserialized format. The serialized form is more compact, so the amount on disk is smaller. The major tradeoff is that the data has to be deserialized again when you read it back from disk. Please make sure to share with your network if you love this content ❤️
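To make the size difference concrete, here is a rough sketch in plain Python. It is only an analogy and not Spark's own serializer: `pickle` stands in for the serialization step, `sys.getsizeof` gives an approximate in-memory size, and the `rows` list is made-up sample data. The exact numbers will differ from what the Spark UI shows.

```python
import pickle
import sys

# Hypothetical sample data: 10,000 small integers standing in for rows.
rows = list(range(10_000))

# Approximate in-memory (deserialized) footprint:
# the list object itself plus every int object it references.
in_memory_bytes = sys.getsizeof(rows) + sum(sys.getsizeof(r) for r in rows)

# Serialized footprint: the byte stream that would be written to disk.
serialized = pickle.dumps(rows, protocol=pickle.HIGHEST_PROTOCOL)
serialized_bytes = len(serialized)

print(f"in memory : {in_memory_bytes} bytes")
print(f"serialized: {serialized_bytes} bytes")

# Reading the data back requires deserializing it again.
restored = pickle.loads(serialized)
```

The same objects take far fewer bytes once serialized, which is the same kind of gap as 137MB in memory versus 77.2MB on disk.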
Very well explained 👍 I have one doubt. A broadcast join also detects the small df if broadcast join is enabled, right? Do we need to specify which df is the smaller one in a broadcast join?