Start a Spark Session: Set up the PySpark environment.
Create a List: Define the list with three elements.
Parallelize the List: Distribute the list across the cluster nodes.
Convert to DataFrame: Convert the distributed RDD to a DataFrame.
Perform Processing: Show the contents and perform any desired operations.
This video explains how to write your first program in PySpark.
Video Link: lnkd.in/gmE_dAcG
LinkedIn Profile of author:
/ sachin-saxena-graphic-...
Code Source Link:
lnkd.in/g67a4kY3
Explanation of the Code:
1. Spark Session: The SparkSession is created to provide an entry point for Spark functionality.
2. List Creation: A list of three elements is defined.
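3. Parallelize: The list is parallelized with numSlices=3, so the three elements are split into three partitions, one element per partition. On a cluster, Spark can then schedule each partition on a different executor, which is how the list gets distributed across the three nodes; the exact placement is decided by the scheduler.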
4. Convert to DataFrame: The RDD is mapped to a tuple format to convert it into a DataFrame. The column is named "element".
5. Display DataFrame: The contents of the DataFrame are printed using df.show(), which will display each element as a separate row.
6. Count: The total number of elements is counted and printed.
7. Further Processing: An optional step is included to filter the DataFrame for elements containing "1" and display the result.
8. Stop Spark Session: Finally, the Spark session is stopped to release resources.
3:54 Databricks source
6:00 Show the number of students in the file
16:00 Map and Flatmap in PySpark
29:00 GroupBy in PySpark
30:00 Show the total marks achieved by Female and Male students
32:00 Show the total number of students that have passed and failed
33:10 Filter the data, as 50+ marks are required to pass the course
40:00 Show the total number of students enrolled per course
51:00 Show the total marks that students have achieved per course
52:00 Show the average marks that students have achieved per course
55:00 Show the minimum and maximum marks achieved per course
57:00 Show the average age of male and female students
24 Oct 2024