@@easewithdata Thanks! I've been following you for more than a month and it's been a great learning experience. We want you to make an end-to-end project in PySpark.
Hello, I understand the request, but it won't be possible to cover every issue/scenario in YouTube sessions. I will try to create a mini-series later that covers this topic. The easiest and most common way to create an OOM exception is to configure a driver with a small memory size, read a dataset bigger than that, and collect() it for display. collect() tries to fit all the data in driver memory, which results in an OOM. To fix this OOM, use take() in place of collect(). Hope this helps.
If you are working with the in-memory catalog, the metadata is lost once the compute or cluster is restarted. This is why it is recommended to use a permanent catalog, such as a Hive metastore.
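A small sketch of switching from the default in-memory catalog to a persistent one (assumes a Hive metastore is available to the session; the warehouse path below is a hypothetical example):

```python
from pyspark.sql import SparkSession

# Default: spark.sql.catalogImplementation = "in-memory" — table metadata
# lives only in the session and disappears when the cluster restarts.
# enableHiveSupport() switches to the Hive catalog, so metadata is stored
# in the metastore and survives restarts.
spark = (
    SparkSession.builder
    .appName("persistent-catalog")
    .config("spark.sql.warehouse.dir", "/tmp/spark-warehouse")  # hypothetical path
    .enableHiveSupport()
    .getOrCreate()
)
```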
Hi, where can I get the Spark session master details in local Spark? I am using local[8], and I can see only the driver using all 8 cores but no executors after defining it on the session. I believe it could be because of the master!
Hello, local execution runs on a single node only, which is the driver. It uses threads on your machine to execute tasks in parallel. If you need separate executors, you have to configure a cluster and point your master at it. Please check out the beginning of the series to understand more.