Building data pipelines with #python is an important skill for data engineers and data scientists. But what's the best library to use? In this video we look at three options: pandas, polars, and spark (pyspark).
Timeline:
00:00 Data Pipelines
01:11 The Data
02:32 Pandas
04:34 Polars
06:15 PySpark
09:15 Spark SQL
Follow me on twitch for live coding streams: / medallionstallion_
My other videos:
Speed Up Your Pandas Code: • Make Your Pandas Code ...
Intro to Pandas video: • A Gentle Introduction ...
Exploratory Data Analysis Video: • Exploratory Data Analy...
Working with Audio data in Python: • Audio Data Processing ...
Efficient Pandas Dataframes: • Speed Up Your Pandas D...
* RU-vid: youtube.com/@robmulla?sub_con...
* Discord: / discord
* Twitch: / medallionstallion_
* Twitter: / rob_mulla
* Kaggle: www.kaggle.com/robikscube
#python #polars #spark #dataengineering
5 июн 2024