Spark is a distributed computing system used within Foundry to run data transformations at scale. This series covers the core Spark concepts you need to know when working with data in Foundry.
This video builds on an understanding of data partitions (link below) to introduce shuffling, the process of redistributing data across partitions, and demonstrates how minimizing shuffling can reduce a job's compute cost.
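To make the idea concrete, here is a minimal sketch of a shuffle in plain Python (not Spark; all names here are illustrative). Records scattered across "map-side" partitions are redistributed by hashing their keys, so every record with the same key lands in the same target partition; a per-key aggregation can then run locally with no further data movement. This cross-partition movement is exactly the expensive step a well-designed job tries to minimize.

```python
# Toy illustration of a shuffle: redistribute (key, value) records so
# that all records sharing a key end up in the same partition.
# Plain Python, not Spark -- for illustration only.

def hash_partition(records, num_partitions):
    """Assign each (key, value) record to a partition by hashing its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions

# Input spread arbitrarily across two map-side partitions.
input_partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("b", 4), ("c", 5), ("a", 6)],
]

# The shuffle: every map-side partition sends each of its records to
# the reduce-side partition that owns that record's key.
shuffled = hash_partition(
    [rec for part in input_partitions for rec in part],
    num_partitions=2,
)

# After the shuffle, per-key sums can be computed locally within each
# partition, because no key spans more than one partition.
per_partition_totals = []
for part in shuffled:
    totals = {}
    for key, value in part:
        totals[key] = totals.get(key, 0) + value
    per_partition_totals.append(totals)
```

In real Spark, this redistribution happens implicitly whenever a wide transformation (such as `groupBy` or a non-broadcast join) needs co-located keys, and it involves serializing data and moving it over the network between executors, which is why it dominates the cost of many jobs.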
Spark Basics | Partitioning: • Spark Basics | Partitions
21 Aug 2024