
Brandon Rhodes - Pandas From The Ground Up - PyCon 2015 

PyCon 2015
Views: 214K

"Speaker: Brandon Rhodes
The typical Pandas user learns one dataframe method at a time, slowly scraping features together through trial and error until they can solve the task in front of them. In this tutorial you will re-learn how to think about dataframes from the ground up, and discover how to select intelligently from their abilities to solve your data processing problems through direct and deliberately-chosen steps.
Slides can be found at: speakerdeck.co... and github.com/PyC..."

Published: 28 Sep 2024

Comments: 143
@X3n0n36 8 months ago
This has to be one of the best video introductions to Pandas out there. The simple movie dataset, together with the well-thought-out exercises, made it easy to test my knowledge and see the power of the library. My applause to Brandon; I unlocked an awesome new tool to use.
@crazystrum2005 8 years ago
Wow...this guy is a good teacher! Extremely valuable tutorial on pandas.
@jlpicard7 8 years ago
No kidding! While some like to present to make themselves feel smart, this is one of those rare folks who make YOU feel smarter when they're done. For example, I will always remember how // works because of the clever way he described it.
UPDATE: All the positive reviews are well justified. Just a small issue with what he says between about 36:30 and 37:20. When I execute this command, I get the following:
C:\apps\Anaconda3\lib\site-packages\ipykernel\__main__.py:2: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  from ipykernel import kernelapp as app
Given this is over a year old at the time I'm watching this, just giving a heads up that we need to start using the more current way to sort.
@bendo425 7 years ago
Excellent training and teaching! I wish Brandon was featured at every PyCon conference.
@tossimmar 7 years ago
Absolutely excellent tutorial. If you're trying to become proficient in manipulating datasets in Python, this is your video. Really, really good.
@fernandesfd 9 years ago
Excellent introduction to Pandas!!! It's a long one, but it's certainly worth it. I recommend watching it first, then going back to rework the examples later.
@240018 5 years ago
37:10 sort is replaced with sort_values in newer versions
@artemfediai7206 3 years ago
And order for Series is replaced by sort_values as well.
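Editor's sketch of the renames mentioned in the comments above. The tiny titles frame is a made-up stand-in for the tutorial's data, and its column names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the tutorial's titles data.
titles = pd.DataFrame({
    "title": ["Hamlet", "Macbeth", "Othello"],
    "year": [1996, 1948, 1995],
})

# Older pandas: titles.sort('year'). Current pandas: sort_values.
by_year = titles.sort_values(by="year")

# Older pandas: series.order(). A Series now uses sort_values as well.
years_sorted = titles["year"].sort_values()

# Sorting by the index is a separate method.
by_index = by_year.sort_index()

print(by_year)
```

If the video's sort(...) or order() calls fail on a recent pandas, swapping in these two methods is usually the whole fix.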
@senses0 6 years ago
16:00 -- Loading titles.csv and cast.csv
19:32 -- Basic dataframe operations: len(df), df.head(), df.tail(), selecting columns
    28:08 - Compute on series data
    30:13 - Filter data
    36:53 - Sorting
    38:08 - Basic maneuvers cheat sheet
    40:06 - Filtering dataframe based on null values
45:15 -- String operations
48:30 -- series.value_counts(), series.sort_index()
53:57 -- Some plot basics
1:00:00 -- Indexes
    1:05:07 - df.set_index(['column'])
    1:06:30 - df.loc[]
    1:10:42 - multiple indexes
    1:12:20 - Multiple lookup in a multi-indexed dataframe
    1:13:18 - df.reset_index()
    1:51:41 - Index append example
1:21:43 -- Grouping
    1:28:42 - Groupby custom columns that are instantly created
1:33:47 -- Stacking and Unstacking operations
1:55:22 -- Cheat sheet
1:55:36 -- Exercises
2:06:28 -- Datetime operation
2:08:29 -- Merge operation
    2:14:44 - Merging a table with itself
2:12:22 -- Pivot operation
2:19:20 -- Data cleanup operation
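Editor's sketch touching three of the topics in the outline above (indexes, grouping on a computed key, unstacking). The miniature cast frame and its column names are assumptions for illustration, not the tutorial's real data.

```python
import pandas as pd

# Hypothetical miniature of cast.csv; the column names are assumptions.
cast = pd.DataFrame({
    "title": ["A", "A", "B", "B", "C"],
    "year": [1994, 1994, 1999, 1999, 2003],
    "type": ["actor", "actress", "actor", "actor", "actress"],
})

# Indexes: set_index, label lookup with .loc, then reset_index.
by_title = cast.set_index("title")
rows_for_a = by_title.loc["A"]       # both rows whose title is "A"
restored = by_title.reset_index()    # "title" becomes a column again

# Grouping on a key computed on the fly: the decade of each year.
decade = cast["year"] // 10 * 10
counts = cast.groupby([decade, "type"]).size()

# Unstacking moves the "type" index level into columns.
table = counts.unstack("type", fill_value=0)
print(table)
```

The result has one row per decade and one column per type, which is the shape that makes per-decade comparisons and plots straightforward.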
@the1gofer 3 years ago
VIP
@Sayerpp 6 years ago
For anyone having trouble getting the .csv files: Don't download the zip from the slide url, it's incomplete. Download it from github.com/brandon-rhodes/pycon-pandas-tutorial and follow the readme.
@gaulinmp 9 years ago
Great tutorial! When you 'unstack' the last time (at 1:50:00) Pandas is doing what you want (many columns with 1 row of data). Note that's why when you did g['extra'] it added a 'row', which was actually a column (how the square bracket assign normally works).
@AsifMehedi 9 years ago
The GitHub link to the materials: github.com/brandon-rhodes/pycon-pandas-tutorial/
@tydal6516 3 years ago
You da man
@grigorytrofimov6513 3 years ago
Effective tutor. I really liked the way he talks about Pandas.
@04nikunj 5 years ago
18 downvotes? Please explain what you missed in this absolutely *free* & *amazing* tutorial.
@kamillearner8552 3 years ago
I am enjoying it so far, but after watching another course I understand that the "div" operator (mentioned here around the 29th minute) is floor division rather than truncation (although the two give the same results for positive numbers). What's more, Python 3.6+ (at least as far as I know) has two operators that complement each other, "//" and "%": together they recover the original number. For a fraction n/d, where n is the numerator and d is the denominator: n = d * (n // d) + n % d. I hope it helps!
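Editor's sketch checking the identity from the comment above. The helper name check_identity is made up for illustration. The identity is a long-standing property of Python integers, so it is not limited to 3.6+.

```python
import math

def check_identity(n, d):
    """Return True when n == d * (n // d) + n % d for integers n and d."""
    quotient, remainder = n // d, n % d
    return n == d * quotient + remainder

# Floor division rounds toward negative infinity...
print(7 // 2, -7 // 2)        # 3 -4
# ...while truncation rounds toward zero, so they differ for negatives.
print(math.trunc(-7 / 2))     # -3

# The identity holds for positive and negative operands alike.
print(all(check_identity(n, d)
          for n in range(-20, 21)
          for d in (-7, -3, -1, 1, 2, 5)))   # True
```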
@Simon-kc4ml 1 year ago
1:37:46 my respect for the man elevated immensely... 1:49:14 even more respect....
@MrPortraitsofpast 8 years ago
i usually don't trust anyone who pronounces it "dah-ta", but this guy is cool
@defoezhang590 5 years ago
Best pandas indicator ever
@Thankful_n_Grateful 8 years ago
I aspire to be a web developer (web application developer / tool builder). While I'm just starting out, I'm wrestling with Python and data presentation in HTML. Hopefully this video will assist in that area. Thanks for sharing...
@gmacie 7 years ago
Informative, Entertaining and Clever. What more can you ask for?
@Tmmmey 6 years ago
For a collection of 18+ pandas video tutorials, and more, see this GitHub repo: github.com/tommyod/awesome-pandas
@KhalidElMouloudi 6 years ago
Thanks!
@timjohnston7958 7 years ago
Outstanding Tutorial!!!
@MarkButterworth13 9 years ago
Finally got there.... DON'T go to the releases directory as per the slides in the video! Go to github.com/brandon-rhodes/pycon-pandas-tutorial and download the zip with the button on the right. Then follow the "Detailed instructions" at the bottom, which include the data file downloads. BTW I have no idea where the slides are; they are not on Speakerdeck (and its search facility is crap: it does an OR, so you get long, long listings).
@MrHomniel 8 years ago
Great stuff! Thanks Brandon
@artemfediai7206 3 years ago
I love this guy. He is like the Sigmund Freud of Python.
@ksauri658 8 years ago
Anyone else find that they have fewer titles than he does? Followed the instructions on his github..
@Rusianskatingfan 9 years ago
Tom Cruise comes out in the summer, this guy is awesome!
@rangavembar 5 years ago
Brilliant!!
@robertramirez2167 4 years ago
I am getting an error when running the data frames in the first example, even when using encoding='utf-8'. Anyone else have this issue?
@gourishpisal2796 4 years ago
Where can I get the above dataset? Can anyone share the link?
@suileungmak9325 7 years ago
The tutorial is great. However, when I try to run BUILD.py to convert the data files, I get the following error: No such file or directory: 'genres.list.gz'. Any idea how to solve it? Would anyone be willing to send me the csv files? Thanks!
@ajeetpisal3156 4 years ago
Where can I get the dataset for the above video?
@sharadbags 8 years ago
I am not able to see the data/titles.csv file in the github link given by Brandon. Has it been moved somewhere else? Can someone upload those files and send a link to it?
@gimboland 8 years ago
You have to follow the instructions in the README.md, and run build/BUILD.sh, which downloads the raw data from IMDB and creates titles.csv and cast.csv for you.
@X3n0n36 8 months ago
I'm going to make a silly comment, please don't take it seriously, but I looked up the most-cast actor as 'Himself' and the top result turned out to be 'Adolf Hitler'. Where is his IMDb page?
@7QHook 9 years ago
The ftp berlin site with the 4 imdb files wants authentication - popup asking for username and password. Any Help?
@7QHook 9 years ago
7QHook the files are also here: ftp://ftp.funet.fi/pub/mirrors/ftp.imdb.com/pub/ no authentication asked for.
@smikketabito2813 9 years ago
I can't even do the first thing in the ipython notebook. Hitting enter does absolutely nothing but jump to the next line and wait for me to type more. What is wrong here? Why can't I do a single thing?
@dheerajmummareddy2286 9 years ago
+Smikket Abito try pressing Shift+Enter
@zhiminghe7757 7 years ago
awesome
@001zeal 8 years ago
I have an error on the first exercise: File data/cast.csv does not exist. Where do I get that?
@muchuang923 8 years ago
+Yasha Jain You can go to github.com/brandon-rhodes/pycon-pandas-tutorial and follow the instructions.
@001zeal 8 years ago
+Mu Chuang Hi, I did convert them into .csv but it still gives me an error.
@famgiu 6 years ago
Is this video still up to date for understanding Pandas, or can someone suggest something better? I can't find data/titles.csv (the zip file disappeared...)
@famgiu 6 years ago
Found the solution: everything is explained in the readme file
@KhalidElMouloudi 6 years ago
The FTP links in the readme file didn't work for me (error 550). Any idea how I can get them?
@KhalidElMouloudi 6 years ago
For anyone having the same issue, just delete the file name from the URL in order to gain access to the FTP folder directly.
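Editor's sketch of the "delete the file name from the URL" tip above, done programmatically. The helper name parent_directory_url is hypothetical, and the example URL is an assumption pieced together from a mirror directory and a file name that appear in other comments on this page; whether that server still answers is not something this sketch checks.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit

def parent_directory_url(file_url):
    """Drop the final path component so the URL points at the folder."""
    parts = urlsplit(file_url)
    folder = posixpath.dirname(parts.path).rstrip("/") + "/"
    return urlunsplit((parts.scheme, parts.netloc, folder, "", ""))

# Assumed example URL: mirror directory plus one of the list files.
example = "ftp://ftp.funet.fi/pub/mirrors/ftp.imdb.com/pub/actors.list.gz"
print(parent_directory_url(example))
# ftp://ftp.funet.fi/pub/mirrors/ftp.imdb.com/pub/
```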
@bdurbin 9 years ago
how do we get data/titles.csv?
@chandrakant72 9 years ago
Brent Durbin Look at BUILD.sh in Brandon's repo and you will know how to get it. You have to install imdbpy which is listed as one of the packages in the requirements file and then get the files via ftp, then convert to csv.
@bdurbin 9 years ago
Brent Durbin The problem was that I was downloading from the wrong place. The correct link should be: github.com/brandon-rhodes/pycon-pandas-tutorial/releases/download/v1.0/Pandas-Tutorial.zip
@jacksondice5435 6 years ago
Sorry, I'm really confused by this. It says: "To convert these into the CSV files that the tutorial needs, run the BUILD.py script with either Python 2 or Python 3. It will create the three CSV files in the data directory that you need to run all of the tutorial examples. It should take about 5 minutes to run on a fast modern machine:"
$ python build/BUILD.py
I don't get how I'm supposed to do this. It isn't BUILD.py, it's BUILD.sh... How do I "run it" with python3?
@fidelfs 5 years ago
I cannot find "data" folder.
@hirenpatel5448 5 years ago
Go to the build folder and try to run the BUILD shell script. It will take up to 10 mins.
@dermaddin85 8 years ago
I watched nearly all available introductions to pandas here on YouTube... If you found this video, you can stop searching. Brandon is an awesome teacher and the lecture itself is pure gold. The 2.5 hours are well spent.
@rockykamen-rubio1600 3 years ago
The link he gives in the slides uses code that is deprecated. Here is the current link: github.com/brandon-rhodes/pycon-pandas-tutorial
@husseinyoussef6998 3 years ago
Thanks bruv
@SumitKumar-lz5yv 2 years ago
Where can I find the datasets used in this tutorial?
@damionperphius7775 3 years ago
Anyone watching this in modern times: I found I had to use pd.read_csv() rather than pd.DataFrame.from_csv() (I think from_csv is deprecated). sort() has been replaced with sort_values() to sort by columns, and sort_index() to sort by index.
@damionperphius7775 3 years ago
You may need to use encoding='latin-1' also.
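Editor's sketch of the read_csv call described in the two comments above. To stay self-contained it reads from in-memory bytes; the two sample rows are made up, and in the tutorial the first argument would be a path such as 'data/titles.csv'.

```python
import io
import pandas as pd

# Made-up sample rows, encoded as Latin-1 the way an older export might be.
raw = "title,year\nAmélie,2001\nLéon,1994\n".encode("latin-1")

# pd.DataFrame.from_csv is gone from current pandas; pd.read_csv replaces it.
# Note that from_csv used the first column as the index by default, while
# read_csv does not unless index_col=0 is passed.
titles = pd.read_csv(io.BytesIO(raw), encoding="latin-1")

print(titles)
```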
@yashguleria926 3 years ago
How do I get the data? I cannot access the links shared on GitHub! :(
@supreethmeka4179 9 years ago
Can someone please upload the datasets (as .csv files in a zip file) and share the link? This will be really helpful.
@JeffWeakley 9 years ago
+David Backus thanks David. This looks like a great tutorial but needs this data.
@LYCANCURSE 7 years ago
Thanks a lot
@sundeepb 4 years ago
@@davidbackus9253 Thank you for linking this and for a great session that is still relevant 5 years later. Could you also link the exercise notebooks, please?
@hankhausrecordings3688 4 years ago
@@sundeepb I typed in the URL from the beginning of the session and found it is still live here. The datasets are not in the zip file but are still present at the link provided in an earlier comment by someone else. github.com/brandon-rhodes/pycon-pandas-tutorial/releases
@sundeepb 4 years ago
@@hankhausrecordings3688 Thanks for the link. I missed that it was in the talk.
@piyush6631 5 years ago
This is *the best* tutorial on pandas that I've taken. My Coursera course didn't cover pandas properly, so I was actually learning it in bits and pieces. This tutorial is to the rescue. The most in-depth tutorial on pandas. This is premium education. Okay, I'm gonna stop now lol
@apoliahx 2 years ago
This is the best comment on YouTube that I've seen. This comment is to the rescue. The most in-depth comment on YouTube. This is premium entertainment. Okay, I'm gonna stop now lol
@OneEyedMonkey9000 4 years ago
The GitHub repository contains all of the notebooks, but not the data, presumably because it is massive. It does include a script to download it for you (see the README). It takes ages to download, so run it in the background while you are watching this.
@snk2288 7 years ago
By far the best tutorial on Pandas. Brandon's energy is remarkable. Not once did I get bored or lost during the session.
@sabertoothwallaby2937 1 year ago
Sarcasm?
@nettlesome7125 8 years ago
Incredible lecture, with accompanying exercises thoughtfully conceived and well-integrated. Bravo!
@neerajkrishna1983 5 years ago
I've seen many tutorials on Pandas, but this one stands out. Brandon's a phenomenal teacher. A truly gifted one. I wish for many more tutorials from him.
@scfu 6 years ago
I don't think the BUILD.py handles utf-8 correctly in recent versions of python (the resulting csv files didn't contain utf-8 text)
@Nestorghh 9 years ago
Thank you for posting this tutorial. I've been struggling with pandas but hopefully now I get to know a bunch of things from this talk.
@revolution77N 9 years ago
Brandon is a legend! This is the best teacher EVER!!! Thank you so much!!!
@quentinsf 9 years ago
A truly excellent tutorial - many thanks, Brandon!
@TheJamAttack 3 years ago
If you get an encoding error even whilst using encoding='utf-8', try using encoding='ISO-8859-1'; it fixed the problem for me.
@tomparatube6506 3 years ago
Hi Jamie, hope you can help me with the 3rd csv output. After running BUILD.py, it output the first 2 csv files OK ("titles.csv" and "release_dates.csv"), but hung on the 3rd file. It gave the error lines below. I could open BUILD.py in Notepad++, but only understood some of the 230 lines of Python code. Any idea what I should do? Thanks for any tip!
Finished writing "release_dates.csv"
Reading 'actors.list.gz
UnicodeEncodeError Traceback (most recent call last)
C:\Ana3\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20
     21 class IncrementalDecoder(codecs.IncrementalDecoder):
UnicodeEncodeError: 'charmap' codec can't encode character '\x99' in position 38: character maps to
@tomparatube6506 3 years ago
Hi Jamie: never mind. Somehow BUILD.py still output a 3rd file named "cast.csv". I then added encoding='ISO-8859-1' to the end as advised, and could read the file. But while "actors.list.gz" and "actresses.list.gz" run to 339 and 196 MB, the size of "cast.csv" is only 88 KB. Thanks anyway!
df = pd.read_csv('data/cast.csv', encoding='ISO-8859-1')
@TheKukuga 3 years ago
Got this error when I tried to run BUILD.py. Is this normal?
UnicodeEncodeError: 'charmap' codec can't encode character '\x99' in position 38: character maps to
The 3 csv files are there, but I'm not sure if they're going to be the same.
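Editor's sketch of the read-side workaround suggested at the top of this thread: try UTF-8 first and fall back to ISO-8859-1 if decoding fails. The helper name read_csv_with_fallback and the sample bytes are made up for illustration. This only addresses errors when reading a CSV; it does not address the UnicodeEncodeError that BUILD.py raises while writing.

```python
import io
import pandas as pd

def read_csv_with_fallback(data, encodings=("utf-8", "ISO-8859-1")):
    """Parse CSV bytes, trying each encoding in turn.

    Returns the DataFrame together with the encoding that worked.
    """
    last_error = None
    for encoding in encodings:
        try:
            frame = pd.read_csv(io.BytesIO(data), encoding=encoding)
            return frame, encoding
        except UnicodeDecodeError as error:
            last_error = error
    if last_error is None:
        raise ValueError("no encodings were given")
    raise last_error

# Made-up sample: the byte 0xE9 ("é" in Latin-1) is not valid UTF-8 here.
sample = b"name,character\nJos\xe9,Himself\n"
cast, used = read_csv_with_fallback(sample)
print(used)
print(cast)
```

ISO-8859-1 maps every byte to some character, so the fallback never raises a decode error; the trade-off is that text which was really in another encoding will load with wrong characters rather than fail loudly.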
@vector4067 8 years ago
Brandon and Jake made the PyCon 2015 a perfect one.
@RAL2010 3 years ago
@23:20 The B key works if Settings/Text Editor Key Map is set to 'emacs' in JupyterLab. It does not work in 'default' mode.
@don0071 6 years ago
If you are working with the latest Anaconda, the sort method has been deprecated; please use sort_values as described at pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.sort_values.html
@juliannavas9561 4 years ago
@2:05:03 He says we need to unstack first for the type, do the fraction, and then unstack for n in order to get the correct fractions; otherwise we have to do the steps 3 times. However, I checked, and unstacking everything at once gives the same results. Is it an upgrade to pandas that allows us to do it now (2019) when it couldn't be done before (2015)?
@TheDeclancox 8 years ago
Kudos Brandon, a really excellent tutorial. A great intro to pandas and great inside info. Thank you!
@rubenknudsen3939 9 years ago
Great tutorial, finally a Pandas tutorial at a reasonable pace :)
@TheBirdBrothers 8 years ago
great tutorial, thank you, lively and full of choice insights :)
@josefernandosabando283 7 years ago
Really good tutorial! I didn't know anything about pandas and could follow and complete all the exercises. Nice work Brandon :)
@YathinBM 6 years ago
Can the Jupyter Notebook of this course be shared here? I would like to practice the exercises in this session.
@lapatria100 2 years ago
Was looking forward to this but can't get it up and running. :(
@shivshankar53 6 years ago
Excellent video... I wish I could find more training materials of such high quality.
@debigam 9 years ago
Best pandas tutorial and awesome notebooks for practice I've seen. Thank you very much Brandon Rhodes!
@lsagar 5 years ago
I cannot find the data folder in the GitHub repo
@netzhauptschalter 8 years ago
Start this tutorial right from Brandon's github - This way you will get the video, all notebooks, and the right input data for your DataFrames. github.com/brandon-rhodes/pycon-pandas-tutorial
@turhangokhan 8 years ago
Wow... what a great crash course to Pandas
@mihaiphelps5035 6 years ago
There is no 'data/titles.csv' and 'data/cast.csv' file in the tutorials folder. Where do I get them from?
@KhalidElMouloudi 6 years ago
Read the README file. You'll see a couple of FTP links. Paste them in your browser, but before hitting enter, delete the file name from the URL (otherwise you'll get an error). This way you'll get access to the folder containing the data in LIST form. Follow the README instructions to convert them to CSV.
@TimXSummers 8 years ago
Great tutorial on Pandas, takes it at a good pace with all the exercises and solutions available. Great teaching style, thanks a load
@ElderLT 9 years ago
Great stuff! The amount of work put in is incredible! I am only into my first hour, but I really love that we have exercises (a bunch of them!!) to work with and seal pandas into one's head. Thank you so much!
@jdmccarty 9 years ago
Anyone else get this error?
$ python BUILD.py
  File "BUILD.py", line 4
    ^
SyntaxError: invalid syntax
@originalusername143 8 years ago
+jdmccarty If you are running it in Git Bash, you should be running BUILD.sh, not BUILD.py. I had the exact same problem. Navigate to the build folder:
$ BUILD.sh
@smitshah1737 7 years ago
36:46 people have names but movies have a title......hahaha....awesome tutorial..!!
@rajrajvir378 5 years ago
Where do I get this file from? The one where he is typing all this.
@waynewang8951 9 years ago
This is the best Pandas tutorial I found on YouTube! Well worth the time to watch!
@alexmtl70 8 years ago
Wonderful tutorial.
@LookNumber9 9 years ago
Very helpful, Brandon. Thank you!
@nguyentony4761 8 years ago
Excellent job.
@thomasnn 6 years ago
I thought I knew Pandas well, but still learned so much!
@enes-the-cat-father 5 years ago
This is a great exercise. On the other hand, the guide on GitHub is not for Windows users. As a beginner, I spent 3-4 hours trying to import the dataset and failed. The guide on GitHub assumes everyone is using Linux or Mac OS. With the help of an expert, I was able to import and use this dataset. It's so great!
@olawaleojodu6704 3 years ago
How were you able to import the dataset?
@enes-the-cat-father 3 years ago
@@olawaleojodu6704 yes, but I don't even remember anything about what was the problem.
@archana6951 5 years ago
This video was super helpful, thank you!
@iloveno3 5 years ago
Thanks a lot, useful and entertaining...
@mickeykedia 9 years ago
Outstanding video. One of the best 'Introductions' to Pandas!
@NajeemMuhammed 7 years ago
He's a great teacher! It was a well spent 2.5 hrs.
@sudiptochatterjee 7 years ago
Brandon Rhodes, thank you for the amazing session.
@me4901 5 years ago
Still one of the best tutorials for Pandas.
@musclesmalone 3 years ago
Trying to do the accompanying exercises, but any question relating to the "cast.csv" dataset takes an extremely long time to execute, as this file seems to be massive. Anyone else have the same problem, and is there a way around it to make doing the exercises quicker? Awesome tutorial by the way!
@tomparatube6506 3 years ago
Sample "cast.csv" down to 1/2 or 1/3 of its original size using the pd.DataFrame.sample() method. Row index numbers will be different from his example, though.
@musclesmalone 3 years ago
@@tomparatube6506 That's very helpful. thank you!
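Editor's sketch of the down-sampling suggestion above. The generated frame is a made-up stand-in for cast.csv, and the fraction and seed are arbitrary choices for illustration.

```python
import pandas as pd

# Made-up stand-in for the large cast table.
cast = pd.DataFrame({
    "name": [f"Person {i}" for i in range(1000)],
    "n": range(1000),
})

# Keep a random half of the rows; random_state makes the draw repeatable.
small = cast.sample(frac=0.5, random_state=0)

# The sampled rows keep their original index labels, which is why row
# numbers differ from the video. reset_index(drop=True) renumbers from 0.
small_renumbered = small.reset_index(drop=True)

print(len(cast), len(small))
```

Counts and rankings computed on the sample will differ from the full data, so exercise answers will not match the video exactly.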
@2441139ify 8 years ago
%%time is not giving the time in jupyter
@赵潜-x2l 8 years ago
It is amazing. I can use the skill in my work. I love it.
@markhou 2 years ago
That guy is awesome
@videosbuff 2 years ago
Awesome lecture!!
@geoffcounihan7093 7 years ago
great stuff, thanks for the lesson!
@ginebro1930 7 years ago
Amazing presentation!!
@TomHines 7 years ago
Brandon... excellent talk.
@magicbreifcase 5 years ago
Brandon marry me
@frankservant5754 4 years ago
THANK YOU
@kylemizui7536 3 years ago
Gold
@AsifMehedi 9 years ago
Excellent tutorial. Particularly helpful for me was the explanation of unstack and stack.