Solving 100 Python Pandas Problems! (from easy to very difficult) 

Keith Galli
Subscribe 222K
83K views

In this tutorial, you'll gain hands-on experience with the Python pandas library and build the data manipulation and analysis skills that matter in data science. You'll learn how to create, modify, and analyze DataFrames, handle missing data (NaNs), clean messy data, and generate some visualizations. By tackling a variety of problems, from basic data handling to advanced DataFrame techniques, you'll build a solid foundation in managing and interpreting real-world data sets using pandas.
Repo we're working off of (credit to Alex Riley who put repo together):
github.com/ajcr/100-pandas-pu...
My code solutions (use repo above for blank starting template):
github.com/KeithGalli/100-pan...
Hope that you enjoy this video. If you do, make sure to like it and subscribe to not miss future videos like this!
Video Timeline!
0:00 - Intro & Setup
2:14 - Problems (1-3) Initial pandas setup
4:42 - Problems (4-10) DataFrame operations
4:52 - 4) Create a dataframe from dictionary
5:24 - 5) Display dataframe summary
5:41 - 6) First 3 rows of the dataframe
6:02 - 7) Select ‘animal’ and ‘age’ columns
7:42 - 8) Data in specific rows and columns
9:06 - 9) Rows with visits greater than 3
9:57 - 10) Rows with NaN in age
10:56 - 11) Cats younger than 3 years
11:35 - 12) Age between 2 and 4
12:45 - 13) Change age in row ‘f’
15:56 - 14) Sum of all visits
16:41 - 15) Average age by animal
20:21 - 16) Modify and revert rows
24:06 - 17) Count by animal type
25:28 - Quick review
26:17 - 18) Sort by age and visits
28:07 - 19) Convert 'priority' to boolean
29:42 - 20) Replace 'snake' with 'python'
30:53 - 21) Mean age by animal and visits
33:49 - Advanced DataFrame techniques
33:57 - 22) Filter duplicate integers
43:18 - 23) Subtract row mean
45:42 - 24) Column with smallest sum
50:39 - 25) Count unique rows
53:17 - 26) Column with third NaN
1:10:27 - Solution review for 26
1:17:13 - 27) Sum of top three values
1:24:01 - 28) Sum by column condition
1:40:11 - Recent problem review
1:42:53 - 29) Count differences since last zero
1:56:19 - 30) Locate largest values
2:08:38 - 31) Replace negatives with mean
2:17:43 - 32) Rolling mean over groups
2:23:10 - Series and DatetimeIndex
2:23:12 - 33) DatetimeIndex for 2015
2:27:56 - 34) Sum values on Wednesdays
2:45:04 - 35) Monthly mean values
2:46:16 - 36) Best value in four-month groups
2:50:26 - 37) DatetimeIndex of third Thursdays
2:59:03 - Cleaning Data
2:59:40 - 38) Fill missing FlightNumber
3:02:45 - 39) Split column by delimiter
3:06:47 - 40) Fix city name capitalization
3:08:30 - 41) Reattach columns
3:13:11 - 42) Fix airline name punctuation
3:17:45 - 43) Expand RecentDelays into columns
3:27:31 - MultiIndexes in Pandas
3:27:34 - 44) Construct a MultiIndex
3:30:37 - Solution review
3:32:44 - 45) Lexicographically sorted check
3:32:58 - 46) Select specific MultiIndex labels
3:34:23 - 47) Slice Series with MultiIndex
3:35:24 - 48) Sum by first level
3:37:47 - 49) Alternative sum method
3:40:08 - Additional solution insights
3:41:22 - 50) Swap MultiIndex levels
3:45:27 - Minesweeper problems
3:45:44 - 51) Generate coordinate grid
4:00:28 - 52) Add 'safe' or 'mine' column
4:03:04 - 53) Count adjacent mines
4:27:33 - Review solution to 53
4:33:02 - Skipped problems 54 & 55
4:33:11 - Plotting
4:33:12 - 56) Scatter plot with black x markers
4:41:26 - 57) Plot four data types
4:52:50 - 58) Overlay multiple graphs
5:03:11 - 59) Hourly stock data summary
5:14:12 - 60) Candlestick plot
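To give a feel for the early problems in the timeline (4 through 11), here is a minimal sketch. The sample data and variable names are hypothetical, chosen only to resemble the animal/age/visits columns named in the problem titles; the video's own solutions may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data in the spirit of the early puzzles.
data = {
    "animal": ["cat", "cat", "snake", "dog", "dog", "cat"],
    "age": [2.5, 3.0, 0.5, np.nan, 5.0, 2.0],
    "visits": [1, 3, 2, 3, 2, 4],
    "priority": ["yes", "yes", "no", "yes", "no", "no"],
}
labels = ["a", "b", "c", "d", "e", "f"]

df = pd.DataFrame(data, index=labels)   # 4) DataFrame from a dictionary
first_three = df.head(3)                # 6) first 3 rows
animal_age = df[["animal", "age"]]      # 7) select two columns
many_visits = df[df["visits"] > 3]      # 9) rows with visits greater than 3
missing_age = df[df["age"].isna()]      # 10) rows with NaN in age
# 11) cats younger than 3 years (combine two boolean masks with &)
young_cats = df[(df["animal"] == "cat") & (df["age"] < 3)]
```

Each filter builds a boolean mask and passes it back into the DataFrame, which is the pattern most of the first twenty problems rely on.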
------------------
Practice your Python Pandas data science skills with problems on StrataScratch!
stratascratch.com/?via=keith

Published: Jul 4, 2024

Comments: 90
@JatinKumar-cn9wt 2 months ago
Are you crazy man , 5 hour + course only for pandas , man your dedication for teaching is amazing
@KeithGalli 2 months ago
I appreciate the support!
@raphaelmatthew5165 2 months ago
Please guys give this video a like if you haven't, it takes a lot of work to create such a masterpiece. Welcome back Keith🎉.
@KeithGalli 2 months ago
Woo!!! 5 hours of Pandas practice, what could be better. Hope you all enjoy!
@jonpounds1922 2 months ago
Will watch this on repeat until I am an expert. Thank you.
@KeithGalli 2 months ago
@@jonpounds1922 haha my man 💪
@simonmasters3295 2 months ago
It will not make you an expert. Consider the examples trivial. Average age of each animal (a SQL "group by", for instance...): try doing it this way with a million animals and computing the variance at the same time.
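For scale, a grouped mean and variance like the one described above can be computed together with groupby and agg. A minimal sketch, assuming a synthetic frame of one million rows; the column names and the random data are hypothetical and not taken from the video:

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): one million animals.
rng = np.random.default_rng(0)
n = 1_000_000
animals = pd.DataFrame({
    "animal": rng.choice(["cat", "dog", "snake"], size=n),
    "age": rng.uniform(0, 15, size=n),
})

# One groupby, two aggregations: mean and sample variance of age per animal.
stats = animals.groupby("animal")["age"].agg(["mean", "var"])
print(stats)
```

The result has one row per animal and one column per aggregation, which is roughly what a SQL GROUP BY with AVG and VAR_SAMP would return.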
@PaYaMv2 2 months ago
You are a sight for sore programming eyes Keith! We cannot thank you enough for this!!
@foland2619 2 months ago
Awesome work and skills Keith. Thank you, great effort
@d.g0101 2 months ago
This is what I was looking for. Thanks Keith
@renatolippi 2 months ago
Excellent! Thank you very much for this video!! Please more with this format 👏
@Kidpambi 2 months ago
It is great to have you back teaching 🎉
@vedantlssj2 28 days ago
This is the kind of content that makes YouTube the great source of learning it is!
@ryandavis280 1 month ago
OMG keith you are a lifesaver! thank you!
@JW-pu1uk 2 months ago
Dude I freaking LOVE your content. I am so stoked to see this video and have it bookmarked for the rest of my data science career lol
@KeithGalli 2 months ago
haha love that! Glad you like the content :)
@abdulbasitnisar 1 month ago
Thank you so much!! What you are doing is actually life-changing for people like me who are self-learning this! Thank you!!!
@Soso65929 2 months ago
more of this buddy enjoyed each second
@vishalcrazy5121 1 month ago
Thank you for this Keith .
@chetan8577 2 months ago
This video would be really helpful. Keep up the great work!😊
@loganmclaughlin1288 2 months ago
Love the long form !
@tdcode 1 month ago
Man, you're crazy 🤣🤣🤣🤣🤣🤣🤣🤣🤣. This is awesome! Thanks for a colossal and great video!!!🎉🎉
@ngoclinhvu5381 2 months ago
5 hours of pandas puzzles??? Just what I need!
@ngoclinhvu5381 2 months ago
never thought I'd ever say that in my life tbh
@KeithGalli 2 months ago
@@ngoclinhvu5381 Haha very fair. I found these exercises very educational for me personally, so hope that you do as well!
@conykuo4308 1 month ago
5 hours pandas video is crazyyyy. Must give a thumb up!
@paraglide01 24 days ago
Thanks man, I was just looking for getting into Pandas.
@NuanceWebsites 1 month ago
Bro, you are a genius!!!
@yaj_at8787 1 month ago
thanks for this...needed
@jdmcivicrrr 2 months ago
This is awesome. Thanks. ❤
@KeithGalli 2 months ago
You are very welcome!
@sofiarequena2616 2 months ago
Great video!!
@FIBONACCIVEGA 2 months ago
Such a good videos!!!
@Aya_Chagra 2 months ago
Very Nice ⚡thanks a lot ⚘
@APP-ld6jf 2 months ago
Looking forward to numpy puzzles now!!
@chamaraweerasinghe5836 2 months ago
Thanks Lot Keith...😘
@second1799 1 month ago
This is awesome! can you do for other libraries too please!!!
@dianapestriaeva9853 2 months ago
pure gold 🤩
@chandrasekars8904 2 months ago
This is really an excellent channel on Python like "techie talkee"
@souravbarua3991 1 month ago
Thank you for making such wonderful videos on Python. 🙏 Please make some videos on PySpark also.
@klausditrich7323 2 months ago
I'm too old for all the Minecraft or Fortnight streams, so here I'm and loving it :-)
@KeithGalli 2 months ago
Hahaha love this
@MrBeavis2014 2 months ago
thank you very much
@ikersanchez8222 2 months ago
I love your content
@AgustinGonzalez-tz3yr 2 months ago
19:30 I do this a lot, by passing a dict to the agg function after grouping (it allows you to assign multiple operators to several cols at once). E.g. df.groupby("animal").agg({"age": "mean"})
@KeithGalli 2 months ago
This is super useful, thank you for the tip!
@thiagosiqueira4690 1 month ago
this works too, df.groupby('animal').mean('age')
@AgustinGonzalez-tz3yr 1 month ago
@@thiagosiqueira4690 I think that doesn't work for grouping by multiple columns and adding a specific function for every column
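The dict form of agg discussed in this thread can be sketched as follows. The sample frame is hypothetical; the point is that each column gets its own aggregation, and that selecting the column explicitly before calling mean() avoids relying on how mean() interprets a positional argument.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "animal": ["cat", "cat", "dog", "dog", "snake"],
    "age": [2.0, 4.0, 5.0, 7.0, 1.0],
    "visits": [1, 3, 2, 2, 1],
})

# A different aggregation per column, passed as a dict to agg.
per_column = df.groupby("animal").agg({"age": "mean", "visits": "sum"})

# A single-column mean: select the column first, then aggregate.
age_only = df.groupby("animal")["age"].mean()

print(per_column)
print(age_only)
```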
@soroushnazari5596 2 months ago
Great to have you back Keith! Going to watch it over the next couple of days and it’s gonna be my sort of bible I guess for future reference haha
@KeithGalli 2 months ago
Love that! I found the exercises very educational myself.
@user-ed3bs1vq8b 2 months ago
The answer to question 23 is incorrect: it says the ROW mean, while df.mean() gives you the column mean.
@sssimp4216 2 months ago
Thank you 😭🩵🩵
@AchuVlogs 2 months ago
Awesome!! Can you please make one for Pyspark? :)
@aloSolo 2 months ago
You look great 🎉 and thanks for posting this video.
@KeithGalli 2 months ago
Thank you! 😊
@rohitsharma-mg7hd 2 months ago
Another solution to puzzle 26:
for i in [0, 1, 2, 3, 4]:
    a = df.iloc[i].sort_values(ascending=True).index[7]
    print(a)
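The loop above appears to rely on every row having the same number of non-missing values and on missing values keeping their column order after sorting. A sketch of an alternative that counts NaNs directly, using a small hypothetical frame rather than the puzzle's own data:

```python
import numpy as np
import pandas as pd

nan = np.nan
# Hypothetical data: rows have different numbers of non-missing values.
df = pd.DataFrame(
    [
        [0.1, nan, nan, 0.3, nan, 0.4],
        [nan, nan, 0.2, nan, 0.5, nan],
        [nan, 0.7, nan, 0.9, nan, nan],
    ],
    columns=list("abcdef"),
)

# Running count of NaNs along each row; the first column where the
# count reaches 3 is the column holding the third NaN.
third_nan = (df.isna().cumsum(axis=1) == 3).idxmax(axis=1)
print(third_nan)
```

Note that idxmax returns the first column even when a row has fewer than three NaNs, so this sketch assumes every row has at least three.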
@sairajtrimbake2801 1 month ago
Legend
@weiwei2587 2 months ago
Great tutorial!
@jehfodelrey 2 months ago
Thanks for this. Do you know some app or website, to practice or improve my scripting skills?
@panth5501 1 month ago
Great content, solution of 23 I believe is wrong.
@Cynosure11 2 months ago
Thank you for your video, Keith! Question for you: do you think it's too late to get a data science job in 2024?
@KeithGalli 2 months ago
The job market is challenging right now, but data science positions aren't going anywhere. You definitely can still get a data science job in 2024. That being said, I wouldn't only look for data science positions. There are a lot of software engineering & data engineering roles that use a similar skillset and can be less competitive to land. I'd recommend keeping track of the most popular skills on job openings for all these types of roles, and tailoring what you learn moving forward based on that. I also recommend trying to network with people who are working at companies you find interesting. You'll give yourself a much better chance at landing a data science job if you are referred by someone already at a company you are applying to. Job postings on a site like LinkedIn can be really difficult to progress through because so many people apply.
@StalkedHuman 2 months ago
What do you think is a good method to concatenate a string value from one DataFrame column to another DataFrame column by index key? For example, df_1 rows 10, 20, 26, 30, 40, column 5 (string), concatenated to df_2 rows 9, 19, 25, 29, 39, column 1?
@sarvanandgaikwad3048 2 months ago
Do the same for all the popular libraries.
@user-me4pb8qs2t 2 months ago
🎉🎉Cool!!!!
@nacef7606 2 months ago
For the 22nd quiz, it could be done as simply as: for i,k in df.items(): df=set(k)
@KeithGalli 2 months ago
A couple of issues with the set solution. One is that a set is not guaranteed to preserve the order in which numbers are inserted, so even though it worked in this example, if we changed the numbers to [3,3,2,2,1,1,8,8,9], the output could show {1,2,3,8,9} instead of [3,2,1,8,9]. Another issue with the set solution is that if a number reappears later on in the list, the set drops the repeat. So if we had [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7, 2, 2] as our input, your solution would output {1,2,3,4,5,6,7} while the correct solution would be [1,2,3,4,5,6,7,2]. Hope this is helpful!
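The behaviour described in this reply (drop a value only when it repeats the row immediately above, keep order, keep later reappearances) can be sketched with shift. The column name "A" is an assumption; the input lists are the ones from the reply.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2, 3, 4, 5, 5, 5, 6, 7, 7, 2, 2]})

# Keep a row only if its value differs from the value in the row above.
# shift() moves the column down by one, so the first row compares to NaN.
deduped = df.loc[df["A"].shift() != df["A"]]
print(deduped["A"].tolist())

# Same idea on the second list: order is preserved.
other = pd.Series([3, 3, 2, 2, 1, 1, 8, 8, 9])
other_deduped = other[other.shift() != other]
print(other_deduped.tolist())
```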
@sauravgoyal19 1 month ago
Could you share the VS Code extension you use that provides function descriptions as you type?
@chalijutt3478 20 days ago
Could somebody explain Q23? I think the way he does it is wrong, because "df.mean()" gives the mean of each column, not each row, and the subtraction then removes each column mean from its own column. We have to use "df.mean(axis=1)" and also take care of the axis in the subtraction. I did it like this: "df.subtract(df.mean(axis=1),axis="index").multiply(-1)". Please correct me if I am wrong.
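The difference between column means and row means raised here can be checked on a tiny frame. A minimal sketch with hypothetical data; it computes element minus row mean, and whether a final sign flip is wanted depends on how the puzzle statement is read.

```python
import pandas as pd

# Hypothetical 2x3 frame for illustration.
df = pd.DataFrame({"x": [1.0, 4.0], "y": [2.0, 5.0], "z": [3.0, 9.0]})

col_means = df.mean()          # one value per column (the default, axis=0)
row_means = df.mean(axis=1)    # one value per row

# Subtract each row's mean from every element in that row.
# axis=0 aligns the Series of row means with the row index.
centered = df.sub(row_means, axis=0)
print(centered)
```

After this operation every row of `centered` has mean zero, which is a quick way to confirm the subtraction ran along the intended axis.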
@derekborders9647 11 days ago
Weird to see you open VSCode with the ui when the cli command is the easiest one to add to your other flow once you’re in that directory after cloning. code . Opens the current directory in VSCode.
@brigitayantie 2 months ago
You really so serious learn and post this class
@saikumar7247 1 month ago
sir could u make same like numpy video
@DataScience-oj4hc 15 days ago
1:10:03
@reach2puneeths 2 months ago
Nice video and content. Can you also come up with similar video of pyspark.
@KeithGalli 2 months ago
Thank you! As of now I don't have immediate plans to make a PySpark video, but I'll look more into it.
@starlordhero1607 1 month ago
Bro, can you do it for other libraries like numpy, seaborn, and matplotlib. Please !!!!!
@dsisimridijsbs1969 17 days ago
Hi guys, is there something similar or equivalent for SQL and scikit-learn? Thank you in advance!
@realzeejay 2 months ago
great video! however, regarding the usage of the terminal to create directories etc at 0:59 , can anyone recommend some youtube videos or sources to get more familiar with it? thanks a bunch! good luck getting good at pandas everybody :)
@meeFaizul 2 months ago
Finally❤😂
@Intellectualmind4 2 months ago
Great job boss 🎉🎉🎉🎉🎉
@adamlongon53 2 months ago
Wow OMG ...
@edwinroman30 2 months ago
🎉🎉🎉🎉
@sebastianalvarez1537 2 months ago
pants
@KeithGalli 2 months ago
pants
@andreogimenes 1 month ago
49 seconds theres a disgusting sound!
@sarveshpadav2881 2 months ago
24:27 We can use groupby to count the animals in the following way: df.groupby('animal')['animal'].count()
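The groupby count suggested here gives the same numbers as value_counts on the column. A minimal sketch on hypothetical data; the main visible difference is ordering, since groupby sorts by group label while value_counts sorts by count.

```python
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({"animal": ["cat", "cat", "snake", "dog", "dog", "cat"]})

by_group = df.groupby("animal")["animal"].count()   # sorted by label
by_value_counts = df["animal"].value_counts()       # sorted by count, descending

print(by_group)
print(by_value_counts)
```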