Good video. Do you have any videos on how Spark runs better or differently compared to Hadoop, and for which types of scenarios Spark is preferable to Hadoop?
In my Spark 2.4.3 job, after all my transformations, computations, and joins, I write my final DataFrame to S3 in Parquet format. But irrespective of my core count, the save action takes a fixed amount of time: with 8, 16, or 24 cores, the write always takes 8 minutes. Because of this, my solution is not scalable. How should I make it scalable so that overall job execution time becomes proportional to the number of cores used?
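For context, here is a minimal Scala sketch of the write pattern being described; `finalDf`, the S3 paths, and the app name are hypothetical placeholders, not the asker's actual code. One common reason a save time stays flat as cores increase is that the DataFrame ends up with only a few partitions, since each partition is written as one Parquet part-file by a single task:

```scala
import org.apache.spark.sql.SparkSession

object FinalWrite {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("final-write") // hypothetical app name
      .getOrCreate()

    // Placeholder for the result of the transformations, computations,
    // and joins described above.
    val finalDf = spark.read.parquet("s3a://my-bucket/input/") // hypothetical path

    // Write parallelism is capped by the partition count of finalDf:
    // if it has few partitions (e.g. after a coalesce, or a join with a
    // low spark.sql.shuffle.partitions setting), extra cores sit idle
    // during the save, which would produce a fixed write time.
    finalDf
      .repartition(spark.sparkContext.defaultParallelism) // match partitions to available cores
      .write
      .mode("overwrite")
      .parquet("s3a://my-bucket/output/") // hypothetical path

    spark.stop()
  }
}
```

Checking the number of partitions just before the write (e.g. `finalDf.rdd.getNumPartitions`) would show whether this is the bottleneck in the job described.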