How to Clean Data Like a Pro: Pandas for Data Scientists and Analysts 

TrentDoesMath
Subscribers: 257
Views: 2.5K

Published: Aug 28, 2024

Comments: 24
@newenglandnomad9405 12 days ago
Fantastic, easy-to-follow data cleaning video. I also appreciate you saying plainly that yes, it's 50 or so rows, but it could be 10k and the same techniques apply.
@kebincui 4 days ago
Excellent video❤, thanks for sharing
@trentdoesmath 3 days ago
Thanks for watching!
@mapletech_22 13 days ago
This is great. ❤❤🎉
@israsuazo3345 1 month ago
This is the first video I've watched that actually shows the Python libraries in action. Thank you for this.
@trentdoesmath 1 month ago
You're very welcome! I'm excited to hear about what you will build with them 🙂
@Carlos-wv4zk 1 month ago
Dude, I cannot explain how helpful this was, man! Seriously, you literally allowed me to pick up any dataset I download and immediately gave me the practical guidelines to clean/analyze it. Thank you!!
@trentdoesmath 1 month ago
You're very welcome!😎
@souravbarua3991 18 days ago
Very helpful and super simple explanation. Looking forward to your next advanced pandas videos with larger datasets. Thank you for this video.
@ChukwuemekaAmblessedchinenye 1 month ago
Wow, you are the real GOAT! The best video so far. Please make more videos like this.
@dogsapparatus7504 25 days ago
nice tutorial
@ImJordanHubbard-qg9qt 20 days ago
Actual, actionable real-life skills: not fluffy, fun Python skills, but actually valuable stuff we need to know!
@CaribouDataScience 1 month ago
You misspelled Tidyverse 😮
@trentdoesmath 1 month ago
🤣
@LivingG6170 1 month ago
Keep doing good work. Big help
@trentdoesmath 1 month ago
I appreciate the kind words 🙏 thanks for the support!
@tmb8807 1 month ago
Cool, thanks. Is Polars making much of an impact in your world? I've used it a bit and I think I prefer the more explicit syntax - besides the potential for enormous performance gains it brings.
@trentdoesmath 1 month ago
Hi tmb8807 :) I have followed a couple of tutorials on Polars, but haven't used it on anything in a professional setting yet 🤔 I'll test it out more extensively. Any good tutorials you'd recommend? Typically, when I've worked on projects that needed high performance I've used Apache Spark - but Polars could be a nice in-between for pandas and Spark? Thanks for the support!
@tmb8807 1 month ago
@trentdoesmath thanks for the reply. There are a few tutorials on YouTube; the one from Rob Mulla is what got me onto it. Because Polars can work with larger-than-memory data via the streaming API, I've seen it suggested it could replace Spark on a single node for some jobs, although I've not done that first hand! But it could potentially expand the 'in-between' area, as you say. The main reason I like it is that I just find the syntax much more consistent and readable (and easier to write as a result). Your mileage may vary on that, though, especially if you're extremely comfortable with pandas (it's a bit less "Pythonic", with more explicit methods for everything). Lazy evaluation and the query optimisation engine are a big selling point as well - they can greatly improve memory usage.
@trentdoesmath 1 month ago
Awesome! I'll check out the Rob Mulla stuff, thanks for the recommendation👍 For sure! It actually reminds me a bit of Scala 🤔... Very 'to the point'. Not sure if you have tried out Dask before, but it's yet another performance option out there.
@totoarifiyanto8679 1 month ago
Just like Thor said: "Another"
@kikiboy2545 1 month ago
Hi! Thanks for this video. I wanted to know, as a data scientist/analyst, why did you choose to use Jupyter and a .ipynb cleaning file? Why not use PyCharm and a .py file, for example? Is that just a matter of personal preference? Sorry, I am new to Python; I'm proficient in Stata but trying to make a shift.
@trentdoesmath 1 month ago
Hi @kikiboy2545 🙂 thank you for your question. TL;DR - I chose to use Jupyter as it is easier for me to demo and record the video with. To your point on creating a .py file - I would recommend this if you are creating cleaning logic that is going to be reused and shipped to 'production', as it is easier to test and maintain a straight Python script IMO. That being said, there is increasing support for the use of notebooks as the preferred environment - as examples, Snowflake, Databricks, Azure Synapse and more all support the use of reusable notebooks to contain all of your logic. I've worked in teams where notebooks are preferred for all data pipeline code due to how intuitive and approachable they are - but as I say, my personal preference is: use notebooks for exploration, and .py scripts for your production code 🙂 No need to apologize! I am glad to be part of your learning journey - keep pushing man! 😎
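To make the "easier to test" point concrete, here is a minimal sketch of what cleaning logic in a plain .py file could look like. It is not code from the video: the function name `clean_orders`, the column names, and the sample rows are all made up for illustration. Only standard pandas calls are used.

```python
import pandas as pd


def clean_orders(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a raw table (hypothetical schema: customer, amount)."""
    out = df.copy()
    # Normalize column names: strip whitespace, lower-case, spaces to underscores.
    out.columns = (
        out.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )
    # Trim stray whitespace in the text column.
    out["customer"] = out["customer"].str.strip()
    # Coerce amounts to numbers; values that cannot be parsed become NaN.
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")
    # Drop rows without a usable amount, then drop exact duplicate rows.
    out = out.dropna(subset=["amount"]).drop_duplicates()
    return out.reset_index(drop=True)


def test_clean_orders() -> None:
    # Made-up sample data: one duplicate row and one unparseable amount.
    raw = pd.DataFrame(
        {
            " Customer ": [" Ann", "Bob ", "Bob ", "Cy"],
            "Amount": ["10.5", "20", "20", "n/a"],
        }
    )
    cleaned = clean_orders(raw)
    assert list(cleaned.columns) == ["customer", "amount"]
    assert cleaned["customer"].tolist() == ["Ann", "Bob"]
    assert cleaned["amount"].tolist() == [10.5, 20.0]


if __name__ == "__main__":
    test_clean_orders()
    print("all checks passed")
```

Because the logic lives in a function that takes and returns a DataFrame, a notebook can import it for exploration while a test runner checks it on small hand-built inputs.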
@trentdoesmath 1 month ago
What are some data cleaning techniques that you have used? 🤔