
DuckDB vs Pandas vs Polars For Python devs 

MotherDuck
3.8K subscribers
15K views

In this video, @mehdio walks through DuckDB, Polars, and Pandas. We discuss the main features and dive into a pragmatic code example.
📓 Resources
GitHub repo of the tutorial: github.com/mehd-io/duckdb-pan...
DuckDB getting started video: • DuckDB Tutorial For Be...
➡️ Follow Us
LinkedIn: / 8192. .
Twitter : / motherduck
Blog: motherduck.com/blog/
0:00 Intro
0:34 What is DuckDB
2:46 What is Pandas
3:45 What is Polars
5:12 Code project
6:14 Install & dependencies
7:18 Versatility
8:18 Syntax
9:26 Performance
10:43 Takeaways
#duckdbvspandas #duckdbvspolars #dataengineering #polarsvsduckdb #polarsvspandas #pandasvsduckdb #pandasvspolars

Published: 5 Jul 2024

Comments: 16
@Shawn-cr8ep 1 year ago
DuckDB is the most underused and underrated Python library. I started using it a couple of weeks ago and I'm blown away by the efficiency increase over Pandas. Plus, SQL is easier, and it forces you to think in vectorized operations rather than being tempted by Pandas' built-in loop methods, which are super slow.
@porlando12 9 months ago
I appreciate the nods to the R community going on in here. Great video!
@matej6418 8 months ago
all 5 of them.
@MrRubix94 1 year ago
Well, I had just started to learn Polars, but your video and another one comparing DuckDB and Polars are making me doubt my choice… DuckDB seems MUCH faster. Besides, SQL knowledge can be leveraged for everything. Why would one use Pandas or Polars over DuckDB? Am I missing something?
@mehdio 1 year ago
I understand the doubt :) Apart from features, there is the debate about the DataFrame vs SQL approach. While both Polars and DuckDB support DataFrames and SQL, DuckDB is primarily designed to be used through SQL. So if you're a SQL lover, DuckDB is a no-brainer. Polars also has a SQL interface, but it's pretty recent.
@MrRubix94 1 year ago
@mehdio Hmm, I'm not really a SQL lover, I just want to use what works best as a data scientist. Manipulating a DataFrame is really convenient when exploring data. Maybe DuckDB + Polars? But I like simplicity, I would rather use one tool only. Choices, choices…
@incremental_failure 1 year ago
Same here. Just finished a rewrite from Pandas to Polars and it's already out of date. Although I'll likely be using Polars for the in-memory stuff and DuckDB for out-of-memory persistent data. The differences in speed are not gigantic if you consider the bigger picture, and Polars development is very active: they are getting faster with every minor version.
@armeyavaidya3464 9 months ago
Polars is best for continuous operations on columns. Also, it doesn't support indices, so you can't do (i at some point and j at some point).
@incremental_failure 9 months ago
@armeyavaidya3464 Indexes can be simulated by using a column as an index.
@kpyoutuber4671 5 months ago
Thank you for this valuable content! Can you also explain the Parquet dataset? I used to create partitioned Parquet datasets using Pandas and Polars. But I want to know how to read data from such partitioned Parquet datasets directly into a Polars lazy frame (not into Pandas, as the data size is larger than memory) to do some analytics.
    import polars as pl
    import pyarrow.parquet as pq
    # Read data written to parquet dataset
    pq_df = pq.read_table(r"C:\Users\test_pl", schema=pd_df_schema, )
    pl_df = pl.from_pandas(pq_df.to_pandas()).lazy()
Is there any better way to do this?
@motherduckdb 3 months ago
As per the Polars documentation (docs.pola.rs/py-polars/html/reference/api/polars.scan_pyarrow_dataset.html#polars.scan_pyarrow_dataset), you can use scan_pyarrow_dataset() to read from partitioned datasets.
@Emotekofficial 9 months ago
How about DuckDB and SQLAlchemy? Do they shake hands? Can I do ORM like this?
@motherduckdb 8 months ago
Yep, here are MotherDuck's instructions for it: motherduck.com/docs/integrations/sqlalchemy (it also works with vanilla OSS DuckDB, with the driver linked from there).
@JOHNSMITH-ve3rq 10 months ago
SQLite is faster yo
@shogun8-9 9 months ago
Not for analysis. SQLite is OLTP, not OLAP.
@allthingsdata 2 months ago
I guess I'm stating the obvious, but for anyone who doesn't use SQL for data operations, DuckDB is second-class. And I certainly don't like to use SQL for transformations and such.