Profiling and optimizing your Python code | Python tricks

(For more, visit pythontutorials.eu!) In this video, I show how you can profile Python code using the cProfile module, and how you can use this information to optimize your code, sometimes resulting in massive performance improvements.
The Jupyter notebook is available from osf.io/upav8/
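
As a rough illustration of what profiling with cProfile looks like, here is a minimal sketch. The function `slow_sum` and the helper `profile_call` are hypothetical names made up for this example and are not taken from the video or the notebook.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Hypothetical workload: sum the squares below n in a plain loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    stream = io.StringIO()  # collect the report instead of printing directly
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time and show only the five most expensive entries.
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_call(slow_sum, 100_000)
print(report)
```

The report lists, per function, the number of calls and the time spent, which is the information used to decide where optimizing is worth the effort.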

Published: Jan 5, 2018

Comments: 139

@senajoaop 5 years ago

Had to do a fast and superficial analysis in a very long code. This video made it possible, thanks a lot pal.

@simonbrecher878 5 years ago

Good video, but it is actually quadratic (polynomial), not exponential. Quadratic is n^2, polynomial n^k, where k is a constant and exponential is k^n. n is length of input. You would not be able to do even near 5000 in exponential problem.

@daviddvorak3278 2 years ago

The final solution is also not linear, but n*log(n), since python sort is not linear.

@eilonavizemer7755 3 years ago

Great video! here is another, perhaps easier, solution to make this code's complexity linear: 1. lowercase all the movies 2. convert your movie list into a set (sets in python avoid duplicates) 3. convert the set back into a list and return it.

@liorbm1 2 years ago

It will be nice to see perf difference between his final code to your idea..

@vermarajat2596 2 years ago

i think conversion from set to list will take more time.

@cosminturtureanu692 1 year ago

The goal is to find the duplicates, not to remove them

@mstr_rprochowicz 4 years ago

It helped me a lot in tracking expensive functions that were unnecessarily used 2 million times in a loop. Thanks for this useful tutorial!

@Drahagoon 4 years ago

Awesome video! Well explained, with a simple, clear and typical hands-on example illustration. Great work.

@legau2k 6 years ago

Awesome video. It was great watching you go step by step through the code optimization. Your solution for finding duplicates also was very clever and elegant. Worth a subscribe ^.^

@sheikhakbar2067 3 years ago

This channel needs a couple of million subscribers... I always come back to it to learn those marvellous tips and tricks!

@balmittal1770 1 year ago

Nice code optimization, especially the last one.

@MsSuyash1995 5 years ago

I came here to get a glimpse of the cProfile module but am leaving impressed with your final solution... I loved how you combined the zip() function after sorting the list... And, a great job in illustrating why profilers are an important tool in a programmer's armamentarium...

@MrAmbarish710 3 years ago

Man your videos are really really helpful! Best explanation of cProfile, profiling and optimization in python. Please keep posting videos...

@sailalmishra4860 5 years ago

Buddy this is Amazing, You should not be on such low subscriber count.. God bless

@mervynwinn1852 4 years ago

i love the way you say "popping"

@AyushMandowara_xx7 4 years ago

This helped me optimize code by about 50-75% depending on the file contents being scanned. Earlier it was consuming about a minute on a large scan, while now it takes about 20 at max. The average speed is reduced to 10secs from 25secs. All I did after analyzing was change my Pandas Series objects (generated from Google Spreadsheets) to tuples (lists would also have the same effect but my data never changes in a single run). Using cprofiler I could see that Pandas library was consuming loads of resources just to fetch values based on an index number. Thanks a ton!!

@shivan2418 4 years ago

In case anyone in the future reads this: I found that this method executes even faster than the method he ended up with.

from collections import Counter

def find_duplicate_words_counter(src='movies.txt'):
    return [movie for movie, count in Counter([movie.lower() for movie in read_movies(src)]).items() if count>1]

@Alister222222 3 years ago

Was going to post this as well - converting the list into a Counter (e.g. a special dict type from the collections module) and running a comprehension to get back everything that had a count above 1 does seem to be the cleanest way to get to the solution, and I am pleased it is also the fastest!

@droit19 2 years ago

@Alister222222 - I tried this and it was 0.13 seconds faster, or 44% faster, than the zip method
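
A comparison like the one reported in this thread can be reproduced with `timeit`. The sketch below is only an assumption about how such a measurement might be set up: the titles are synthetic stand-ins for movies.txt, and the two function names are hypothetical. Actual timings depend on the machine and the data, so none are shown here.

```python
import random
import timeit
from collections import Counter


def duplicates_counter(titles):
    """Count every lowercased title and keep those seen more than once."""
    counts = Counter(title.lower() for title in titles)
    return [title for title, count in counts.items() if count > 1]


def duplicates_sorted_zip(titles):
    """Sort the lowercased titles and compare each one with its neighbour."""
    ordered = sorted(title.lower() for title in titles)
    return [a for a, b in zip(ordered[:-1], ordered[1:]) if a == b]


# Synthetic data (assumption): 25,000 titles drawn from 20,000 possible names.
random.seed(0)
titles = [f"Movie {random.randrange(20_000)}" for _ in range(25_000)]

for func in (duplicates_counter, duplicates_sorted_zip):
    seconds = timeit.timeit(lambda: func(titles), number=20)
    print(f"{func.__name__}: {seconds:.3f} s for 20 runs")
```

Both functions find the same titles, but the sort-and-zip version lists a title once per extra repetition, while the Counter version lists each duplicated title exactly once.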

@ranelpadon8834 2 years ago

Good analysis and build up of improvements. Thanks!

@DaanWaardenburg 4 years ago

Keep coming back here when my code starts running slow :P

@sschmachtel8963 4 years ago

Yeah I can imagine :-) First time here. Coming back to remind yourself that errr... yeah ... why on earth does it take so long and frustrate me ... how am I ever going to find out why this code is slow. Is it me or is it some crazy circumstance that goes on in the libraries that I use. While waiting for your code to finish you can actually study the problems and get them fixed. But I do think not by hand, just by profilers

@arjunkirpal9776 6 years ago

Thank you Sebastiaan! Would love more Asyncio videos!

@prakharchaurasiya8107 4 years ago

Finally some optimization that is not too complex. Thanks.

@migovas1483 4 years ago

This was great and clear, right to the point!!

@0versun0 6 years ago

More more more videos. Your video is very helpful! Keep going

@abhishekpandey7096 5 years ago

Hey🌏🌏🏕️

@tonyradice4166 1 year ago

Outstanding presentation!!!

@razintailor 3 years ago

Great explanation. Lucid and fundamental. It is indeed helpful.

@marazDNG 2 years ago

Great video man!

@hamol3d 3 years ago

Great Video! Thank you.

@ganeshchaudhari8087 1 year ago

Well explained! Thanks for this video....

@rgrapey 6 years ago

Clear and informative!

@vinitkumar2923 1 year ago

Great video and explanation. Thanks for sharing this.

@MultiRick15 2 years ago

Wow! Great explanation.

@farooqseeru948 5 years ago

Brilliant. Clear explanation.

@nutcrackeroverdrive 6 years ago

Thanx, Sebastiaan, very useful and helpful video.

@maedehshahabi4744 1 year ago

Thank you sir for your clear explanation.

@thecaveofthedead 5 years ago

Excellent tutorial. Thanks.

@babuasian 6 years ago

Appreciate it. Really useful for most of the programs..

@botenbireu7875 6 years ago

Thank you a lot! very clear explanation!!!

@sm3801_smo 6 years ago

I initially subbed because of your Biological Psychology videos, but I didn't know you're into programming, very useful video!

6 years ago

+Samuel Muñoz thanks ! Yes, the Bio Psy lectures are something new for me. Most of the videos are about Python and/or OpenSesame

@15kasturi 1 year ago

I just subscribed to you after watching this video, very informative and nice goggles!

@yildirimicen766 2 years ago

Hi Mr. Mathot, you are great, I love your Python sessions... :)

@danyalt8221 2 years ago

It Was Great! Thank You.

@marveltv5341 3 years ago

Careful... he is a hero 🙌

@hrithiksharma2047 3 years ago

Great tut bro! Thank you

@Grimlor 6 years ago

I've found this so useful! Thank you for this video. By analyzing my code and applying a little tweak, I've already managed to save 0.8 seconds of runtime. And I've only just started! :D

6 years ago

+Grimlor glad to hear it!

@sashkazayebashka 5 years ago

Great video. Thank you man!

@onlymusic2005 4 years ago

Real treasure... bunch of thanx

@Julien-hg8jh 3 years ago

15:30 autocorrection! Nice video BTW :D

@jeremyalvaprathama4069 4 years ago

Awesome work! I just subscribed

@kpespinosa 5 years ago

great explanation! cheers

@blanky_nap 6 years ago

Great video!

@JakobRobert00 3 years ago

Thank you, this helped me so much :)

@user-fi8ii5fx3b 3 years ago

This is super nice video, thank you sir

@benedictcoltman1983 3 years ago

Superb! Thanks

@TheFilipo2 6 years ago

Thank you, this was super helpful!

6 years ago

Good to hear!

@acho8387 4 years ago

very good video! thanks!

@mohammedgt8102 1 year ago

Awesome video.

@deividaspelakauskas9394 3 years ago

Underrated.

@ke30_ 3 years ago

I love this so much

@Nobrezando 1 year ago

You earned my follow at 15:33

@pygemssoftware4254 2 years ago

Great work and explanation. I would like to email my eyes to you as token of my appreciation😃

@siddharthindora7182 4 years ago

Great Video...Thanks for explanation :)

@dhananjaykansal8097 5 years ago

YOU ARE JUST AWESOMEEEEEE

@alvaromartin6301 4 years ago

Excellent content! New sub.

@haonanqiu4251 3 years ago

thanks a lot!

@xanterx 4 years ago

Love your shades 🤘

@steelcock 2 years ago

Mindblowing

@parietal100 6 years ago

Thank you Sebastian

@mahesh9762132636 4 years ago

This guy is crazy in coding , concepts and thinking...he brought down the execution time from 6 sec to .002 sec......this is insane ... tremendous work done bro...

@fuanka1724 6 years ago

Loved this. Optimization is really important to me. Thanks.

@emasmach 5 years ago

Nice. Excellent.

@svalaboj 3 years ago

your video is very useful, thanks for the same.

@graycybermonk3068 5 years ago

You will kill me. Really Awesome.

@user-ot1uk8iy1t 2 years ago

thanks!

@fantasdeck 2 years ago

I like how you edited your video to hide the little typo you made. But, cool tool. Will be using...

@nikithar3628 6 years ago

Awesome

@chunceywei8284 6 years ago

Thank you

@norwegiandud 3 years ago

Helpful video, thanks! Just one 🐛 with the 007-method (or weird feature). If there are movies that are represented more than two times they appear as duplicates in the duplicates list. E.G. 'the phantom of the opera' appears five times in the TXT file, and four times in the list of duplicates. Now if this is a 🐛 or feature ... depends on who you ask.

@vyl6781 5 years ago

Saved my sanity.

@deepak1725 5 years ago

Very Very nice

@luciano_remes 2 years ago

Your last solution runs in N log N time complexity, but you could actually make it faster by just using a set of found movies. It would run in linear time and be way simpler:

found = set()
duplicates = []
for movie in movies:
    if movie not in found:
        found.add(movie)
    else:
        duplicates.append(movie)

@ikramu5719 4 years ago

Thank you for that explanation. Neat solution with the zip and slices too! ps The link for the movies file is now out of date though.

@7aygames35 3 years ago

The 22 people who disliked are those who were writing bad code and when it was pointed out to them, they just got angry

@AmrXcellent 3 years ago

Good video, but if I understand correctly the final code change does not account for a movie title that is duplicated more than once. So the first two iterations of the code are doing more. All in all a nice video, I learned something new watching it, so thank you for that.

3 years ago

That's correct: triple duplicates are not caught with this method. And thank you!

@deadman1999 1 year ago

yes, I was thinking the same thing, the final code was so fast because it only checked its 1st neighbor, taking into account that there were only 1 duplicate.

@vaibhavjain1914 3 years ago

Bruh in this video you are teaching code optimization but looking at your choice of wearable I feel I am learning how to assassinate enemy but amazing video 😀

@BullishBuddy 2 years ago

👍👍

@neelojp8460 6 years ago

thank you so much for your videos they are really very helpful! Do you have any own books about python ?

6 years ago

+post fix Thank you! No, I'm afraid that I do not have any Python books myself. But there are plenty of good free Python books out there, such as Byte of Python.

@neelojp8460 6 years ago

thank you for your answer, dank je wel :-).... you should write one about the tricks which you show us here... and here is the link for the Byte of Python for all others: www.gitbook.com/book/swaroopch/byte-of-python/details

@DragonRazor9283 3 years ago

from 6 seconds to 0.007 seconds wow!

@Nobrezando 1 year ago

"Well, you can see that our code it's taking about 0.00023 to execute. But if you're not satisfied with that..." lmao

@anumsheraz4625 4 years ago

is there any tool to identify how much memory is consumed by the code ?

@bunlonglay463 3 years ago

Hey, shouldn't your last solution, where you sort the movies list, be O(n log(n)) and not O(n) as you said? Sorting the movies list takes O(n log(n)) time. Also, when you use zip with slices of the movies list, copies of movies are created. This is also inefficient. Could someone maybe confirm what I said? Anyway, great video explaining the profiler
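
On the point about slices: list slices do create new lists, and one possible way around that is `itertools.islice`, which iterates over the existing list from a given index. The sketch below is an assumption about how the neighbour comparison could be written without slice copies; `find_duplicates_no_copies` is a hypothetical name, not the function from the video. The sort still costs O(n log n).

```python
from itertools import islice


def find_duplicates_no_copies(titles):
    """Compare neighbours of a sorted list without slicing it."""
    ordered = sorted(title.lower() for title in titles)
    # islice walks the same list starting at index 1 instead of copying it.
    return [a for a, b in zip(ordered, islice(ordered, 1, None)) if a == b]


print(find_duplicates_no_copies(["Alien", "Brazil", "alien", "Casablanca"]))
# prints ['alien']
```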

@fcoignmo 6 years ago

Where did you get the "movies.txt" file (link)? Thank you for the video, great work.

6 years ago

My reply is a bit late, but I got this data from here: osf.io/r73y9/

@Jure1234567 3 years ago

Can I do it with wxwidgets classes and multithreading?

@mariusnorheim 6 years ago

Hi Sebastian, I tried running the profile decorator, but 1) I'm using python 2.7 and 2) I'm running it in atom, not jupyter, so I get an error message. Would be awesome if you could post the code for python 2.7 as well in the file

@mariusnorheim 6 years ago

Actually got it to work. It seems that you'll have to encode the Unicode strings to byte strings, and use io.BytesIO, instead of io.StringIO.
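
For readers on Python 3, a profiling decorator of this kind can be sketched as follows. This is an assumption about its general shape, built only from the standard `cProfile`, `pstats` and `io.StringIO` pieces mentioned in this thread, and is not necessarily the exact code from the notebook. `count_vowels` is a hypothetical function used only to show the decorator in action.

```python
import cProfile
import functools
import io
import pstats


def profile(func):
    """Decorator that prints a cProfile report each time func is called."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        profiler.enable()
        try:
            return func(*args, **kwargs)
        finally:
            profiler.disable()
            stream = io.StringIO()  # text buffer that collects the report
            stats = pstats.Stats(profiler, stream=stream)
            stats.sort_stats("cumulative").print_stats()
            print(stream.getvalue())
    return wrapper


@profile
def count_vowels(text):
    """Hypothetical example function to profile."""
    return sum(text.count(vowel) for vowel in "aeiou")


print(count_vowels("profiling and optimizing your python code"))
```

Nothing in this sketch depends on Jupyter: it uses only the standard library, so it should behave the same in a plain script or another editor.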

@Excess-qn7qh 3 years ago

Does the @profile annotation only work with Jupyter?

@drewduncan5774 6 years ago

12:54 Quadratic, not exponential.

4 years ago

NO it's in O(n*ln(n)) because of the Sort()

@stephenaiesi6073 4 years ago

With Big O notation we are really only concerned with the term with the highest power. An algorithm on the order of O(3x² + 2x + 11) is usually reduced down to O(3x²). I've seen books drop the coefficient as well, but that has a fairly large impact on the accuracy of the expression in my opinion. So in terms of Big-O, an algorithm on the order of a quadratic equation is usually considered to be on the order of its highest term.

If you think about comparing two algorithms, one operating at O(3x² + 2x + 11) to one that runs at O(3x²), let's see how different they really are. Given the following equations:

f(x) = 3x² + 2x + 11
g(x) = 3x²

Let's see how they correspond given a single input (n=1):

f(1) = 16
g(1) = 3

The ratio between these two results is 5.33 and would go to show that quadratic and exponential are not swappable in this context.

Now let's scale it to 100 inputs, n=100:

f(100) = 30211
g(100) = 30000

Now they are operating at a ratio of 1.007. Not identical, but damn near close depending on the precision needed. In terms of making algorithms efficient with computers, 100 inputs is not considered much anyway.

Now let's scale it to 1,000,000 inputs:

f(1,000,000) = 3000002000011
g(1,000,000) = 3000000000000

Ratio of 1.00000066.

The difference in comparing these two with and without the extra terms is often negligible when comparing them to algorithms on the order of a different exponential power. Run the same experiment comparing f(x²) and g(x³), with and without extra quadratic terms, and you can see that dropping the lower terms, though not exact, is definitely enough to compare the efficiency of algorithms.

So as the size of the inputs grows, particularly towards quantities where optimization is necessary, we are usually dealing with such vast amounts of data that including the lower terms of the quadratic formula in our assessment of an algorithm's efficiency does not necessarily provide extra insight.

ps: I'm fully aware this isn't the case in every domain, but it is for the most part how it is done and definitely applies to the kinds of problems in this video.
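
The arithmetic in this comment can be re-checked with a few lines of Python. The sketch below only evaluates the two expressions from the comment, f(x) = 3x² + 2x + 11 and g(x) = 3x², at the same three input sizes and prints their ratio.

```python
def f(x):
    """Full cost expression from the comment above."""
    return 3 * x**2 + 2 * x + 11


def g(x):
    """The same expression with only the highest-order term kept."""
    return 3 * x**2


for n in (1, 100, 1_000_000):
    print(f"n={n:>9,}  f(n)={f(n):,}  g(n)={g(n):,}  ratio={f(n) / g(n):.8f}")
```

The ratio approaches 1 as n grows, which is the reason the lower-order terms are dropped.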

@_treed1 4 years ago

Lol these comments. It's a loop in a loop which is n * n so n^2 tadah

@__gavin__ 4 years ago

@stephenaiesi6073
> I've seen books drop the coefficient as well but that has a fairly large impact on the accuracy of the expression in my opinion.
Big O notation has a formal mathematical definition. A function f(x) is said to be O(g(x)) if |f(x)| <= A*|g(x)| for all x >= x_0, where A and x_0 are some constant values. Hence, when considering big O notation, it really doesn't matter if you drop the 3 or not. If f(x) = O(3x^2) then all we are saying is that there exists some x_0 such that for all x>=x_0, |f(x)|

@calebmunuru3598 4 years ago

Stephen Aiesi Thanks mate. This is a really good explanation

@adityakushwaha3654 3 years ago

But how do you know which code will be more efficient wrt present code ?

@yildirimicen766 2 years ago

Hi Mr. Mathot, how about the following with "combinations" (you can even omit "movies.sort()"):

from itertools import combinations

# find duplicates in list of movies
movies = ['abc','abc','xyz','ddddd','ddddd','star wars']
print([m for m,n in combinations(movies, 2) if m==n])
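
One thing worth keeping in mind with this approach: `combinations(movies, 2)` yields every unordered pair, so the number of comparisons is n(n-1)/2 and grows quadratically with the list length. The sketch below counts the pairs with `math.comb` (available from Python 3.8); the list sizes 100 and 5,000 are arbitrary values picked for illustration.

```python
from itertools import combinations
from math import comb

movies = ['abc', 'abc', 'xyz', 'ddddd', 'ddddd', 'star wars']
pairs = list(combinations(movies, 2))

# The number of generated pairs matches n * (n - 1) / 2.
print(len(pairs), comb(len(movies), 2))   # prints 15 15
print([m for m, n in pairs if m == n])    # prints ['abc', 'ddddd']

# Pair counts for larger, hypothetical list sizes.
for size in (100, 5_000):
    print(size, comb(size, 2))
```

For 5,000 titles that is already about 12.5 million comparisons, so for long lists the sorted or set-based approaches discussed elsewhere in these comments are likely to scale better.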

@vanglequy7844 3 years ago

13:30 Who else paused the video and challenged themselves? But beware of the tendency to jump into redesigning the solution before profiling.

@xspager 5 years ago

Awesome explanation but when you removed the function you also changed the way you do the searching, you stopped looping over all the movies and used the "in" operator

@vadimturov7808 5 years ago

It's like forensics

@alishermatkurbanov9205 5 years ago

What if the list has more than 1 duplicate, e.g. [1, 3, 1, 4, 1, 4, 4, 5] -> sorted [1, 1, 1, 3, 4, 4, 4, 5] -> zipped, something like this: [(1, 1), (1, 1), (1, 3), (3, 4), (4, 4), (4, 4), (4, 5)], so 1 and 4 will be added to duplicates twice. Shouldn't duplicates be a list of unique items?

5 years ago

That's correct. Triplicates will end up twice in the list of duplicates, which may not be what you want. An easy trick to get around that would be to use a set comprehension (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-uTUV2eONqSQ.html), rather than a list comprehension. Because sets by definition consist of unique items.
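
The set-comprehension variant mentioned in this reply could look roughly like the sketch below. It assumes the sort-then-zip approach discussed in these comments; `unique_duplicates` is a hypothetical name, and the input is the list of numbers from the question above.

```python
def unique_duplicates(items):
    """Return each duplicated item once, however often it repeats."""
    ordered = sorted(items)
    # Curly braces make this a set comprehension, so repeated matches collapse.
    return {a for a, b in zip(ordered[:-1], ordered[1:]) if a == b}


numbers = [1, 3, 1, 4, 1, 4, 4, 5]
ordered = sorted(numbers)

# List comprehension: triplicates show up twice.
print([a for a, b in zip(ordered[:-1], ordered[1:]) if a == b])  # prints [1, 1, 4, 4]

# Set comprehension: every duplicated value shows up once.
print(unique_duplicates(numbers))  # prints {1, 4}
```

A set has no order, so wrap the result in `sorted(...)` if the duplicates need to be listed alphabetically.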