Тёмный

DuckDB Tutorial - DuckDB course for beginners 

Data with Marc
Подписаться 27 тыс.
Просмотров 30 тыс.
50% 1

DuckDB Tutorial - DuckDB course for beginners
Discover the power of DuckDB and why it will be so important to you!
Materials: robust-dinosaur-2ef.notion.si...
Connect with me on LinkedIn 👇
/ marclamberti
TABLE OF CONTENT
0:00 Introduction
0:32 Who am I
0:53 Why DuckDB?
1:52 What is DuckDB?
4:34 What are the limits of DuckDB
5:35 Practice time
5:41 The setup
7:22 Install DuckDB and Imports
8:11 Create a DuckDB connection
9:10 Print the top 10 rows with Pandas
10:37 Print the top 10 rows with DuckDB
11:15 Retrieving data from a request
12:13 Use read_csv to load CSVs
13:03 Using DESCRIBE
13:15 Registering a Dataframe as a Table
14:15 Counting rows with DuckDB
14:18 Data cleansing
14:52 Querying Pandas Dataframes with DuckDB
15:55 Filtering on columns
16:46 What's a view?
17:06 Creating a table with types that skips invalid rows
18:40 The First-Form clause
19:07 Excluding columns with DuckDB
19:40 The Column Expression
21:52 Create a View
22:34 GROUP BY ALL
22:48 Querying the View
23:05 Revenue by City
23:24 Writing Parquet files with DuckDB
24:05 Querying Parquet files with DuckDB
24:35 DuckDB with S3
25:08 DuckDB Extensions
25:15 Conclusion

Опубликовано:

 

26 июл 2024

Поделиться:

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист
Посмотреть позже
Комментарии : 47   
@HitAndMissLab
@HitAndMissLab 7 дней назад
Brilliant tutorial, appreciate your work.
@MHelmers77
@MHelmers77 11 месяцев назад
Would love a full Udemy Course on DuckDB & dbt from you!
@shneor.e
@shneor.e 10 месяцев назад
I would buy it
@gatono25
@gatono25 Год назад
That's game changing, very nice video, Marc, Thank you.
@AliMasri91
@AliMasri91 Год назад
Thank you! Great content as always!
@adiliophi
@adiliophi Год назад
Awesome content! Thanks a lot!
@robertbutscher6824
@robertbutscher6824 6 месяцев назад
great tutorial, thanks for inspiration and your interstings lessons
@yangsong6111
@yangsong6111 Месяц назад
thx a lot, I learned a lot
@tech-n-data
@tech-n-data 10 месяцев назад
This is a great intro video, thank you.
@MarcLamberti
@MarcLamberti 10 месяцев назад
Thank you ❤️
@interestingamerican3100
@interestingamerican3100 Год назад
Dude.....I like your style! Subscribed and hit the bell....
@MarcLamberti
@MarcLamberti Год назад
Thank you 🙏
@rayusaki88
@rayusaki88 Месяц назад
Thanks Marc
@stnico
@stnico Год назад
Great, thanks!
@MarcLamberti
@MarcLamberti Год назад
You’re welcome ☺️
@mathieujasinski2383
@mathieujasinski2383 11 месяцев назад
I did smashed the like "booton", thanks brother, very helpful content!
@MarcLamberti
@MarcLamberti 11 месяцев назад
Thank you 🙏
@sigkalbar
@sigkalbar Год назад
Awesome ❤
@MarcLamberti
@MarcLamberti Год назад
Thank you 🙏
@davidjackson7675
@davidjackson7675 Год назад
Thanks.
@dataslinger6379
@dataslinger6379 Год назад
Thanks as always for the great content Marc! Just as a sanity check for myself, OLAP Databases don't necessarily need to be columnar based correct? Recent big data databases use columnar based storage, but any database that processes analytical workloads is considered an OLAP database even if it is row based correct?
@jerrynwaeze9269
@jerrynwaeze9269 Год назад
Hi This is the best video on duckdb atm. How do you get resources on it? I heard you can connect to a postgres db using a Postgres scanner. How does that work?
@olga6163
@olga6163 11 месяцев назад
Thanks, great lesson! Could you help to fix an error, it occurs when I try to run the sales table that we`ve created. The error is "Conversion Error: Could not convert DATE (0019-04-19) to nanoseconds" Don`t understand why it`s trying to get nanoseconds with the date format. Thanks!
@leonvictorrichard3959
@leonvictorrichard3959 9 месяцев назад
Would love an dbt+ duckdb course from u on Udemy. Big fan of yours 🎉
@ragibshahriar7959
@ragibshahriar7959 7 месяцев назад
Please make more videos
@user-fv1576
@user-fv1576 10 дней назад
do you have a video on your vscode config setup?
@LeSaboteur3981
@LeSaboteur3981 5 месяцев назад
like always... way to few likes for such a great video!🎉
@CaribouDataScience
@CaribouDataScience Год назад
Why not use the pandas SQL functions?
@LeSaboteur3981
@LeSaboteur3981 5 месяцев назад
for example i don't think it supports the nice extended Syntax like EXCLUTE. but the main reason is speed. duckDB will execute way faster, pandas SQL is still pandas in the end. (correct me if i'm wrong)
@aruniyer1833
@aruniyer1833 11 месяцев назад
I am getting this error Conversion Error: Could not convert DATE (0019-04-19) to nanoseconds and run conn.execute("from sales").df() when i create a sales table. Not sure if you have seen it?
@MarcLamberti
@MarcLamberti 11 месяцев назад
Let me double check that
@emilychase4610
@emilychase4610 11 месяцев назад
I'm getting this error too!
@aruniyer1833
@aruniyer1833 11 месяцев назад
I was able to solve it by dropping the non date values df = df.loc[df['Order Date'] != 'Order Date'] df['Order Date'] =pd.to_datetime(df['Order Date'],format='%m/%d/%y %H:%M') In the sql block i made this change to match df format strptime("Order Date", '%Y-%m-%d %H:%M:%S') Looks like the trycast is not working or some bug
@rashadnelson1873
@rashadnelson1873 Год назад
Oh man, oooofff! I tried following your video here and your blog with the stock analysis data. I've run into too many errors unfortunately. That's the thing about these dependencies - there's too many moving parts where errors are persistent. I can sort through them, but it would take up too much time. We all got work to do :-). For data analysis work, my go to still at this point is using an AWS S3 bucket to load the raw data, use an AWS Glue to create a database and an AWS Glue Crawler to upload the data while creating a table within that database. I can also use AWS Glue Studio to convert the data from .cvs format to .parquet format. From there, I can use AWS Athena to query the data.
@durgeshkshirsagar5160
@durgeshkshirsagar5160 7 месяцев назад
4:28 it is written redhshit :D Please don't mind. Thanks for the video.
@MarcLamberti
@MarcLamberti 7 месяцев назад
🫣🥹🥹🥹🥹
@durgeshkshirsagar5160
@durgeshkshirsagar5160 7 месяцев назад
Except that baziz, everything is perfectly fine.😀
@shogun8-9
@shogun8-9 10 месяцев назад
The files are not available anymore :(
@MarcLamberti
@MarcLamberti 10 месяцев назад
What files?
@shogun8-9
@shogun8-9 10 месяцев назад
@@MarcLamberti In order to work with this tutorial, you provided the sales dataset in the materials. There is a link in the description. On this page, there is a link to the kaggle dataset that you used in the video. However, this one got removed. So it is not possible anymore to follow this tutorial :(
@MarcLamberti
@MarcLamberti 10 месяцев назад
oh oh, let me check if I can fix that
@shogun8-9
@shogun8-9 10 месяцев назад
@@MarcLamberti Thank you! Let us know if it works again
@MarcLamberti
@MarcLamberti 10 месяцев назад
Here we go: www.kaggle.com/datasets/kushagra1211/usa-sales-product-datasetcleaned
@davidjackson7675
@davidjackson7675 Год назад
When I ran this Python/Duckdb it only shape returned (186862, 6): # with duckdb cur_time = time.time() df = conn.execute(""" SELECT * FROM '/kaggle/input/sales-product-data/*.csv' """).df() print(f"time: {(time.time() - cur_time)}") print(df.shape)
Далее
Why should you care about DuckDB? ft. Mihai Bojin
14:35
Наше обычное утро 💕
00:42
Просмотров 298 тыс.
Airflow на простом примере
9:53
Просмотров 2,3 тыс.
DuckDB Tutorial For Beginners
11:25
Просмотров 30 тыс.
What's new in Apache Airflow 2.6?
9:53
Просмотров 5 тыс.
Why use DuckDB in your data pipelines ft. Niels Claeys
22:26