
How slow is iterating over a pandas DataFrame? 

Visual Studio Code
512K subscribers
7K views

You can do it, but it's a lot slower than you think, especially once you see how much we can speed it up.
00:00 - Intro
00:55 - Iterating with iterrows
02:59 - Vectorization explained
03:28 - Iterating with Series Apply (vectorized)
05:17 - PURE SPEED
06:02 - How could we make it even FASTER?
👩‍💻 Example code from video: github.com/burkeholland/itera...
🐼 More pandas: pandas.pydata.org/
Theme: GitHub Dark Dimmed
#jupyterinvscode, #jupyter, #python
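The chapters compare `iterrows`, `Series.apply`, and faster alternatives. Below is a minimal sketch of that kind of comparison, assuming tweets that contain a score such as "4/6". The sample data and the helper names (`attempts_from_text`, `with_iterrows`, `with_apply`, `with_str_accessor`) are hypothetical; the column name `tweet_text` is borrowed from the comments, and the video's own code is in the linked repo. Timings depend on your machine and pandas version, so read them as relative only.

```python
import time

import pandas as pd

# Hypothetical sample data: each tweet carries a score such as "4/6".
df_tweets = pd.DataFrame(
    {"tweet_text": [f"Wordle {i} {i % 6 + 1}/6" for i in range(10_000)]}
)


def attempts_from_text(text: str) -> float:
    """Return the character just before the first '/' as a float."""
    slash_index = text.index("/")
    return float(text[slash_index - 1])


def with_iterrows(df: pd.DataFrame) -> pd.Series:
    # iterrows builds a new Series object for every row, which is the slow part.
    values = [attempts_from_text(row["tweet_text"]) for _, row in df.iterrows()]
    return pd.Series(values, index=df.index)


def with_apply(df: pd.DataFrame) -> pd.Series:
    # Series.apply calls the Python function once per element (a str).
    return df["tweet_text"].apply(attempts_from_text)


def with_str_accessor(df: pd.DataFrame) -> pd.Series:
    # One call on the whole column via the .str accessor;
    # assumes the score digit sits directly before the '/'.
    return df["tweet_text"].str.extract(r"(\d)/", expand=False).astype(float)


for fn in (with_iterrows, with_apply, with_str_accessor):
    start = time.perf_counter()
    result = fn(df_tweets)
    elapsed = time.perf_counter() - start
    print(f"{fn.__name__}: {elapsed:.4f} s (mean attempts {result.mean():.3f})")
```

All three produce the same values; they differ in how much Python-level work happens per row. On object-dtype strings the `.str` methods still loop internally, so they are not vectorized in the NumPy sense, and it is worth measuring rather than assuming a speedup.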

Published: 13 Aug 2024

Comments: 18
@unknown3158 2 years ago
While vectorization does help, a lot of the extra time comes from the 'attempts' values being added individually. If you start with an empty list, append to it inside the loop, and assign the list to the 'attempts' column once the loop is done, the total time drops significantly, and there are probably further ways to optimize it. Bottom line, you are not really comparing the best version of each implementation. Either way, it was a good video that sheds some light on vectorization, which helps a lot, especially if you use the built-in pandas functions.
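A minimal sketch of the two patterns this comment contrasts. The per-cell `.at` write is an assumed stand-in for "adding attempts individually" (the video's exact assignment code is not shown on this page), and the data and `parse_attempts` helper are hypothetical.

```python
import time

import pandas as pd

# Hypothetical data; 'tweet_text' and 'attempts' are the names used in the comments.
df = pd.DataFrame({"tweet_text": [f"Wordle {i} {i % 6 + 1}/6" for i in range(5_000)]})


def parse_attempts(text: str) -> float:
    # Digit immediately before the first '/'.
    return float(text[text.index("/") - 1])


# Pattern 1 (assumed): write each value into the DataFrame inside the loop.
start = time.perf_counter()
slow = df.copy()
slow["attempts"] = 0.0
for idx, row in slow.iterrows():
    slow.at[idx, "attempts"] = parse_attempts(row["tweet_text"])
print(f"per-cell writes: {time.perf_counter() - start:.4f} s")

# Pattern 2 (the comment's suggestion): append to a plain list, assign the column once.
start = time.perf_counter()
fast = df.copy()
attempts = []
for _, row in fast.iterrows():
    attempts.append(parse_attempts(row["tweet_text"]))
fast["attempts"] = attempts
print(f"list then assign: {time.perf_counter() - start:.4f} s")
```

Both loops still pay the `iterrows` cost, so this isolates only the cost of writing into the DataFrame row by row.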
@Thomas-gi9vy 2 years ago
Where was this 3 weeks ago, when I was doing a data visualization project and needed this?
@datanasov 2 years ago
This is actually not a vectorized function! You're implying it with the ": pd.Series" annotation, but if you print(type(tweet_text_series)) inside the function you can see that it's actually just a string, and apply runs the function for every row of the series. You can also see that there is no vectorization speed improvement, because you can get the same performance with a plain loop:

    cnt = 0
    smm = 0
    # Series.items() yields (index, value) pairs; iteritems() was removed in pandas 2.0
    for index, row in df_tweets['tweet_text'].items():
        slashIndex = row.index('/')            # position of the first '/'
        attempts = float(row[slashIndex - 1])  # the digit just before it
        smm += attempts
        cnt += 1

It just leaves out the slowest operations, but the loop itself is not one of them (so the improvement does not come from vectorizing). Sometimes vectorization does lead to drastic speed improvements, but this is not an example of that. The pandas vectorized functions for strings live under df_tweets['tweet_text'].str (e.g. .str.extract(...), .str.slice(...), .str.index(...)). Apply is usually no faster than a well-written for loop or list comprehension and is NOT vectorized.
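A small sketch of both points in this comment, on a hypothetical three-row stand-in for `df_tweets` (only the column name comes from the thread): first, that `Series.apply` hands the function one `str` at a time; second, the `.str` accessor route the comment recommends, here using `.str.extract` and assuming the score digit sits directly before the '/'.

```python
import pandas as pd

# Hypothetical stand-in for df_tweets; only the column name comes from the comment.
df_tweets = pd.DataFrame({"tweet_text": ["Wordle 1 4/6", "Wordle 2 3/6", "Wordle 3 6/6"]})

# 1) What apply actually passes to the function: one element at a time.
seen_types = set()


def record_type(value):
    seen_types.add(type(value))
    return value


df_tweets["tweet_text"].apply(record_type)
print(seen_types)  # {<class 'str'>}

# 2) The .str accessor: a single call over the whole column.
attempts = df_tweets["tweet_text"].str.extract(r"(\d)/", expand=False).astype(float)
print(attempts.sum(), attempts.count())
```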
@nathanielvolk515 1 year ago
This has saved me so much time!!!
@troysincomb 2 years ago
I've found that .itertuples gives similar times to apply. So if anyone has a use case where they need to iterate, .itertuples is the way to go.
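A minimal sketch of the `itertuples` approach this comment suggests, on hypothetical sample data. `itertuples` yields lightweight namedtuples rather than building a `Series` per row as `iterrows` does; whether it matches `apply` on your data is worth timing yourself.

```python
import pandas as pd

# Hypothetical sample data.
df_tweets = pd.DataFrame({"tweet_text": ["Wordle 1 4/6", "Wordle 2 3/6", "Wordle 3 6/6"]})

total = 0.0
count = 0
# index=False drops the index from each tuple; columns become attributes
# when their names are valid Python identifiers.
for row in df_tweets.itertuples(index=False):
    text = row.tweet_text
    total += float(text[text.index("/") - 1])  # digit just before the first '/'
    count += 1

print(total / count)
```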
@AnthonyShaw 2 years ago
Superb!
@adheesh2secondsago630 2 years ago
Hey bro, a vim tutorial would be appreciated too (;
@BurkeHolland 2 years ago
Noted
@jackflitcroft881 2 years ago
One thing I've run into is having conditionals in the function you're applying. For example, I want to apply some string functions, but only to rows where a regex matches the string. How might you do that?
@Adam.Netuddmeg 2 years ago
When you assign values to the column, select a subset of rows with your condition, so the calculation only runs for those rows. Afterwards you can fill the NaN values with whatever you want.
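A minimal sketch of the mask-then-fill approach described in this reply, on hypothetical data: build a boolean mask with `.str.contains`, compute only for the matching rows via `.loc`, then `fillna` the rest. The regex, column names, and the default of 0 are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data: only some rows contain a score.
df = pd.DataFrame({"text": ["score 4/6", "no score here", "score 2/6", "nothing"]})

# Boolean mask: rows where the regex matches.
mask = df["text"].str.contains(r"\d/6", regex=True)

# Compute only for matching rows; rows outside the mask end up as NaN.
df.loc[mask, "attempts"] = (
    df.loc[mask, "text"].str.extract(r"(\d)/6", expand=False).astype(float)
)

# Fill the non-matching rows with whatever default makes sense.
df["attempts"] = df["attempts"].fillna(0)
print(df)
```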
@StefanoVerugi 1 year ago
List comprehension allows if-else and looks better
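A minimal sketch of the list-comprehension alternative mentioned here, with the if-else as a conditional expression inside the comprehension. The data, the `SCORE` pattern, and the default of 0.0 are hypothetical.

```python
import re

import pandas as pd

# Hypothetical data: only some rows contain a score.
df = pd.DataFrame({"text": ["score 4/6", "no score here", "score 2/6", "nothing"]})

SCORE = re.compile(r"(\d)/6")

# Search each string once, then use if-else inside the comprehension.
matches = [SCORE.search(t) for t in df["text"]]
df["attempts"] = [float(m.group(1)) if m else 0.0 for m in matches]
print(df)
```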
@ebukaezike9308 2 years ago
What font are you using?
@BurkeHolland 2 years ago
Fira Code
@asimrahal 1 year ago
What is the interviewer's name?
@chinmayk8004 2 years ago
You should quit Microsoft and launch a programming school... I'd totally join, even though I know all this... It's weirdly ASMR-ish and therapeutic to see elegant code.
@BurkeHolland 2 years ago
This might be the nicest thing anyone has ever said to me
@mustafahany8693 2 years ago
Terminal still not working on Windows 10
@jobinbaby.k.b8373 2 years ago
Visual Studio Code Israel 🇮🇱🤝🤝🤝👩‍💻👩‍💻👩‍💻👩‍💻👩‍💻👩‍💻👩‍💻👩‍💻🖥️💻