
05 Understand Spark Session & Create your First DataFrame 

Ease With Data

This video explains how to create a Spark session, how to write DataFrame queries, what the Spark UI is, and how actions work in Spark.
Chapters
00:00 - Introduction
00:33 - Local env setup
01:07 - Understand Use Case
01:20 - How to create a SparkSession object?
03:30 - What is Spark UI?
04:15 - Write our First DataFrame
06:10 - How do Actions trigger a job?
09:30 - Spark Interactive Shell
10:21 - Bonus Tip
Local PySpark Jupyter Lab setup - • 03 Data Lakehouse | Da...
Python Basics - www.learnpython.org/
Code URL - github.com/subhamkharwal/pysp...
The series provides a step-by-step guide to learning PySpark, a popular open-source distributed computing framework that is used for big data processing.
New video every 3 days ❤️
#spark #pyspark #python #dataengineering
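
The chapters above (SparkSession, first DataFrame, actions) can be sketched roughly as follows. The builder chain matches the one quoted in a comment further down ("Spark Intro", local[*]); the sample rows, column names, and function names are hypothetical and stand in for the dataset used in the video. The pyspark imports sit inside the functions so that the definitions load even where PySpark is not installed.

```python
# Hypothetical sample rows; the video works with its own dataset.
SAMPLE_ROWS = [("001", "Alice", 34), ("002", "Bob", 45), ("003", "Cara", 29)]
COLUMNS = ["employee_id", "name", "age"]


def build_session(app_name="Spark Intro", master="local[*]"):
    """Create (or reuse) a SparkSession, the entry point for DataFrame code."""
    from pyspark.sql import SparkSession  # requires pyspark to be installed

    return (
        SparkSession.builder
        .appName(app_name)
        .master(master)  # local[*]: run locally using all available cores
        .getOrCreate()
    )


def first_dataframe(spark):
    """Build a small DataFrame and apply lazy transformations to it."""
    from pyspark.sql.functions import col

    df = spark.createDataFrame(SAMPLE_ROWS, schema=COLUMNS)
    # where() and select() are transformations: no Spark job runs yet.
    return df.where(col("age") > 30).select("name", "age")


def run_demo():
    """Run the whole flow; each action below triggers a Spark job."""
    spark = build_session()
    older = first_dataframe(spark)
    older.show()                        # action
    print(older.count())                # action
    print(spark.sparkContext.uiWebUrl)  # address of the Spark UI for this session
    spark.stop()
```

Calling run_demo() in a notebook cell and then opening the printed UI address should show one job per action, which is the behaviour the "How do Actions trigger a job?" chapter covers.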

Published: 15 Jul 2024

Comments: 29
@rdj127 (1 year ago)
Nice, please upload more.
@sureshraina321 (1 year ago)
Great content!
@easewithdata (1 year ago)
Thanks ❤️
@easewithdata (1 month ago)
The JDK version was updated in the Ubuntu library, which leads to issues such as "Java gateway process exited". As a workaround, you can use the PySpark Jupyter notebook image from Docker Hub, which works as expected. Just run the following command in Docker: docker pull jupyter/pyspark-notebook:spark-3.3.0
@user-cp2je2nt6x (8 months ago)
Subham, I installed the Jupyter Lab setup using your repo on GitHub and I am getting the following error: RuntimeError: Java gateway process exited before sending its port number. I am trying to fix it. Meanwhile, if you know the solution, please help.
@easewithdata (8 months ago)
Sure, I will check and let you know. Meanwhile, you can also use this notebook to get started - hub.docker.com/r/jupyter/pyspark-notebook
@AbhishekMahajan (1 year ago)
I am having an issue: RuntimeError: Java gateway process exited before sending its port number. I have checked Stack Overflow, but the solutions didn't work for me.
@easewithdata (1 year ago)
Can you tell me the step where you are facing this issue?
@AbhishekMahajan (1 year ago)
@easewithdata After installing and going through the whole process as you described, I get this error when I create the spark object.
@easewithdata (1 month ago)
@AbhishekMahajan Hello. Yes, I know: the JDK version was updated in the Ubuntu library, which leads to this issue. As a workaround, you can use the PySpark Jupyter notebook image from Docker Hub. Just run the following command in Docker: docker pull jupyter/pyspark-notebook:spark-3.3.0
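
Since the author attributes the gateway error to a changed JDK, it may help to look at which Java the Python process can actually see before switching to Docker. The following is a minimal diagnostic sketch, not a fix from the video. The function names are hypothetical, and the set of supported major versions (8, 11, 17) is an assumption based on the Spark 3.3 documentation; adjust it for your Spark version.

```python
import os
import re
import shutil
import subprocess

# Assumption: Java majors listed in the Spark 3.3 docs; adjust for your Spark version.
SUPPORTED_MAJORS = {8, 11, 17}


def parse_java_major(version_output):
    """Return the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = match.group(1), match.group(2)
    # Legacy numbering: "1.8.0_381" means Java 8.
    if first == "1" and second is not None:
        return int(second)
    return int(first)


def java_report():
    """Collect what this Python process sees when it looks for Java."""
    java_path = shutil.which("java")
    report = {
        "JAVA_HOME": os.environ.get("JAVA_HOME"),
        "java_on_path": java_path,
        "major": None,
    }
    if java_path is not None:
        # `java -version` writes its banner to stderr.
        result = subprocess.run(
            [java_path, "-version"], capture_output=True, text=True
        )
        report["major"] = parse_java_major(result.stderr + result.stdout)
    report["looks_supported"] = report["major"] in SUPPORTED_MAJORS
    return report


print(java_report())
```

If java_on_path is None, or the major version falls outside the supported set, that is a plausible cause of the error; other causes are possible and this check does not rule them out.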
@abhishekbm2568 (8 months ago)
Subham, if it is not too much to ask, could you please provide the data dump you are using in this video? It would be handy for practicing while watching the tutorial.
@easewithdata (8 months ago)
All notebooks are uploaded on GitHub - github.com/subhamkharwal/pyspark-zero-to-hero. I will upload the data to the same repo later. Thanks for following. Make sure to share with your network and tag us ❤️
@bhavishyasharma998 (1 month ago)
Everyone is having the same problem: "RuntimeError: Java gateway process exited before sending its port number". Please help.
@easewithdata (1 month ago)
Hello. Yes, I know: the JDK version was updated in the Ubuntu library, which leads to this issue. As a workaround, you can use the PySpark Jupyter notebook image from Docker Hub. Just run the following command in Docker: docker pull jupyter/pyspark-notebook:spark-3.3.0
@bhavishyasharma998 (1 month ago)
@easewithdata But then I'm not able to see the Spark UI; localhost:4040 is not working.
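
The thread does not resolve the Spark UI question. One likely explanation, offered here as an assumption, is that the UI runs inside the container and port 4040 was not published to the host when the container was started (Docker's -p option, for example -p 4040:4040, does this). The sketch below only checks from the host whether anything is listening on the port; the function name is hypothetical.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# 4040 is the port mentioned in the comment above; run this on the host machine.
print("Spark UI reachable:", is_port_open("localhost", 4040))
```

A result of False while a Spark session is active in the container suggests the port is not published, or that Spark picked a different port because 4040 was already taken; spark.sparkContext.uiWebUrl inside the notebook shows the address Spark actually bound.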
@hamzaiftikhar1220 (6 months ago)
Hi Subham, I am getting an "Invalid credentials" error when I paste the token from the Docker container into Jupyter Lab. Kindly guide me.
@easewithdata (6 months ago)
Hello Hamza, just copy the token from the logs and use it to set the password. Make sure not to copy any quotes or spaces. You can refer to: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-WhxljT3IfdM.html
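
The advice about stray quotes and spaces can be illustrated with a small helper that pulls the token out of a copied log line. This is a sketch under the assumption that the line contains a URL with a token= query parameter, as Jupyter server logs typically print; the function name and the sample token are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def extract_token(log_line):
    """Return the token from the first URL in log_line that carries one, else None."""
    for piece in log_line.split():
        cleaned = piece.strip("'\"")  # drop quotes copied along with the URL
        if "token=" not in cleaned:
            continue
        values = parse_qs(urlparse(cleaned).query).get("token")
        if values:
            return values[0].strip()
    return None


# Hypothetical log line with leading spaces and surrounding quotes.
print(extract_token('    "http://127.0.0.1:8888/lab?token=abc123def"'))
```

Pasting only the returned value into the token field avoids the whitespace and quote problems mentioned in the reply.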
@rudramitra7609 (6 months ago)
All is good, but most trainers forget to add the repo with the data we can practice on; it's a basic thing. Please add a Git link to all your videos if you can.
@easewithdata (6 months ago)
Hello Rudra, you can find the notebook links in the video description; they are already there for every coding session. All datasets are uploaded to Git as well. I will add the link to any video where I missed it. You can refer to: github.com/subhamkharwal/pyspark-zero-to-hero
@saketsourav_hjp (2 months ago)
I am getting an error while creating the Spark session: Java gateway process exited before sending its port number. I guess most of us are facing this issue. Can you help us?
@easewithdata (2 months ago)
Hello Saket, yes, this is due to the Java upgrade. You can set up the PySpark Jupyter notebook directly from Docker using the command below: docker pull jupyter/pyspark-notebook
@at-cv9ky (5 months ago)
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[3], line 1
----> 1 from pyspark.sql import SparkSession
      3 spark = (SparkSession
      4     .builder
      5     .appName("Spark Intro")
      6     .master("local[*]")
      7     .getOrCreate())
ModuleNotFoundError: No module named 'pyspark'
Kindly help me to solve this error. Thanks
@easewithdata (5 months ago)
It seems there is something wrong with your setup. You can also follow this setup for your Jupyter Lab installation: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-BeZRe0f23hI.htmlsi=DlPk-oUv_APrSadL
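
A quick way to narrow down a "No module named 'pyspark'" error is to check which Python interpreter the notebook kernel runs on and whether that interpreter can find the package. The snippet below is a minimal sketch using only the standard library; the helper name is hypothetical.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the current interpreter can locate a top-level module."""
    return importlib.util.find_spec(name) is not None


print("Kernel interpreter:", sys.executable)
print("pyspark importable:", module_available("pyspark"))
```

If the second line prints False, PySpark is probably installed for a different interpreter than the one shown, or not installed at all. Installing it with that exact interpreter (running it with -m pip install pyspark) is one common remedy, though the setup video linked in the reply may take a different route.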
@SanjeevKumar-dr6qj (1 year ago)
Please create all the videos.
@easewithdata (8 months ago)
New videos are published every 3 days. Thanks for following ❤️
@muralibestha3227 (8 months ago)
Hi Subham, where are the Python videos?
@easewithdata (8 months ago)
Do you mean the PySpark videos?