The Fastest Way to Loop in Python - An Unfortunate Truth

Подписаться 228 тыс.

Просмотров 1,4 млн

50% 1

What's faster, a for loop, a while loop, or something else?
We try several different ways to accomplish a looping task and discover which is fastest.
― mCoding with James Murphy (mcoding.io)
Source code: github.com/mCodingLLC/VideosS...
SUPPORT ME ⭐
---------------------------------------------------
Patreon: / mcoding
Paypal: www.paypal.com/donate/?hosted...
Other donations: mcoding.io/donate
BE ACTIVE IN MY COMMUNITY 😄
---------------------------------------------------
Discord: / discord
Github: github.com/mCodingLLC/
Reddit: / mcoding
Facebook: / james.mcoding

Наука

Опубликовано:

18 дек 2020

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист

Посмотреть позже

Комментарии : 2,2 тыс.

@AlexPadula 3 года назад

You know, if the time difference between a while loop and a for loop matters to your application, Python might not be the most appropriate language.

@abebuckingham8198 2 года назад

Comparisons are expensive and iterating is cheap. That's true in assembly, C and Python.

@firstname4337 2 года назад

WRONG -- he just showed that numpy works fine

@HiltownJoe 2 года назад

@@firstname4337 Yes, because if you use numpy you use C. That is the whole point of numpy, to not use python for operations that are repeated millions of times. Alex' point stands, because in the numpy case we use Python for everything that is not time sensitive, but for the thing that is time sensitive we use a small C monster wrapped in a thin fluffy Python blanket.

@boredhero 2 года назад

@@HiltownJoe Have you heard of PyPy. I ran his source file in pypy3 and got less than a quarter of a second for all methods except sum generator and sum list comp (0.72 seconds for those)

@coreylazy 2 года назад

Clearly not a protip. A professional understands the constraints of their environment and the project they're working on. The choice of programming language is not (always) up to the individual writing the application.

@nathandecena7694 3 года назад

"The Fastest way to Loop in Python is not to Loop in Python" Great sentence.

@karx11erx 3 года назад

Talk about understanding the letters, but not the meaning.

@samuelhulme8347 3 года назад

To loop,or no to loop that is the question

@nathandecena7694 3 года назад

@@samuelhulme8347 Hahaha DEEP

@mohamedfilali977 2 года назад

@@samuelhulme8347 ok, nice job, now im lost ..lool

@Nerketur 2 года назад

"a strange language. the only winning way is not to use it. how about a nice game of chess?"

@peterwilson8039 Год назад

I'm a physicist and my experience is that when you spend the majority of your time writing the code, and running it takes only a few seconds, reducing the development time is much more important than reducing the run-time, and that's why I like Python.

@mCoding Год назад

So true, and something many eager-optimizers need to hear! They will certainly learn from experience as they discover where they spend their time.

@yutubl Год назад

Hi Peter, well I've been a software developer delievering a (Basic) scriptable technical measurement data PC-application NI DIAdem, and we put most of our development effort into giving our users like you (engineers & physicists) optimized script commands, which behave similar to that of Python. So using already implemented loop command may give you better performance than own loops as described here.

@FlanPoirot Год назад

that's the thing tho, they're not mutually exclusive, you can have a very easy to write and a decently performant language (look at julia) why python is slow is not bc it can't go faster or bc it's a sacrifice for expressiveness (as there are python implementations that are way way faster but that lack total compatibility with libs) The reason why python is slow is because it's poorly designed. It uses linked lists for lists instead of vectors (log of n vs log of 1), it compiles to an intermediate language but it leaves it at that, there's no JIT in Cpython, the global interpreter lock impedes people from writing super performant code, etc, etc Python is slow bc of the choices they've made over the years, they're consistently made choices that affect performance, they never took the time to think about it and see how they could implement abstractions that would not take a big toll on performance, they just added things on top as they saw fit. But that's not a big surprise once you see that python was a pet project of Guido and he just made it bc he could and people happened to pick it up and spread it (even tho it's performance and features were worse back then)

@CottidaeSEA Год назад

You're using Python for simpler scripts, like what it was made for. That's a valid use case. Making large systems in Python is just stupid for the simple reason that it will be expensive to run.

@spl420 Год назад

That's depends on task, but when we talking about science - it's 100% true.

@tomaskot9278 3 года назад

Fun fact: A good C compiler would detect this common pattern (summing sequential numbers) and replace the for loop with the direct calculation of the result. And if even the number of elements was hard-coded, the compiler would even simply replace the whole code with the hard-coded numeric answer.

@Czeckie 3 года назад

I always wondered what's the point of this? Are series we have summation formulas for really coming up in a real software? As far as I can tell the only application is to show it in a presentation about optimizing compilers. And frankly, this is even a wrong showcase of optimization, cause the useful techniques are general and pliable enough for compiler to reason about. This is just a hardcoded nonsense.

@vinno97 3 года назад

@@Czeckie this exact problem isn't common. But good compilers can recognize loads of (small) patterns in a codebase that, when combined, can amount to a large speedup. Sometimes some optimized chunks can again be combined into a new, smaller and/or faster, chunk.

@joestevenson5568 3 года назад

@@oj0024 Yes, if you write shitty enough code the compiler cannot fix it for you because it can’t make assumptions about what you’re trying to accomplish. This is not a revelation or even a shortcoming, its a feature.

@oj0024 3 года назад

@Matt Murphy He is right that compilers can do it, but if we look at the two most modern compilers right now only one of them can detect it and only if you write the code in a non-obvious way. The second part showed that compilers don't do it with constants either, they just execute the code internally. I'm not saying that compilers aren't doing great optimizations, just that the claim that "A good C compiler would detect this common pattern" is wrong. Especially since such optimizations don't gain anything (I'm talking about the pattern transformation, not the constant evaluation). They can only be done in very few cases, and you usually assume that the programmers know what they are doing. (e.g. the only reason to implement the summation code would probably be for benchmarking purposes)

@oj0024 3 года назад

@@joestevenson5568 clang didn't detect the pattern when using a normal for loop, it only detected it if you write the non obvious and usually not used while(n--) loop.

@daifee9174 3 года назад

pro tip how to boost python: use C

@gutoguto0873 3 года назад

or C++, either choice is good.

@embeddor2230 3 года назад

C++ should be the better option here, cause of the OOP support.

@mCoding 3 года назад

Or any compiled language, really.

@Jonas_Meyer 3 года назад

Assembler the only real choice.

@leopoldbroom7679 3 года назад

@@Jonas_Meyer You're not a real programmer until you can program in assembly

@FaranAiki 3 года назад

What I did not know about this video, is, 100_000_000. I did not know if python has a feature to add underscore between number. Thanks, anyway :)

@mCoding 3 года назад

I add secret lessons to all my videos!

@lawrencedoliveiro9104 3 года назад

Added in Python 3.6.

@monojitchatterjee3185 3 года назад

Same! I didn't know this too!

@loutragetadk453 3 года назад

Same

@rogervanbommel1086 3 года назад

Same

@kitrodriguez992 3 года назад

I'm in my 3rd year in Computer Science and trust me, I've watched SO many youtubers that do this and you're by far the smallest for an unknown reason. You explain things really well, you don't beat around the bush and overall just make quality content. :D I'm now subbed and I'll be coming back for more :D

@mCoding 3 года назад

Thanks so much for the kind words!

@deedewald1707 3 года назад

Excellent content and testing summaries !

@krele14 2 года назад

Then again, he's the only one that would place "just know your result ahead of time lul" as a viable suggestion.

@julianmahler2388 2 года назад

5:08 It's not about NumPy being primarily written in C (so is CPython), but about Python' dynamic vs. NumPy's static typing. In a dynamically typed language like Python, every time the interpreter encounters an expression like c = a + b, it will first have to check a's type (and find out it's an integer). Then it checks b's type (and finds out it's an integer too). Then it checks if there's any known operation to add (+) two integer types (of course there is). If you do this one million times in a row (e.g. in a loop), the same checks will be performed over and over again. It's like asking "Is a still an integer? ... What about b? ... Do I know how to add two integers?" one million times in a row. And of course all of that happens at runtime. That's why Python is so damn slow in that regard.

@dariuszspiewak5624 Год назад

Yeah... that's the price you pay for flexibility. But there are ways to overcome this. They all require some kind of hinting that the structures you use contain only elements of the same type.

@sumnerhayes3411 Год назад

> In a dynamically typed language like Python, every time the interpreter encounters an expression like c = a + b, it will first have to check a's type (and find out it's an integer). Then it checks b's type (and finds out it's an integer too). Then it checks if there's any known operation to add (+) two integer types (of course there is). A dynamically typed language implementation doesn't *have* to do things this way, that's just the naïve approach (albeit the easiest approach out of the box for new implementations). A smarter specializing JIT can create precompiled versions of functions and expressions for commonly used types (e.g. integers), and can determine that they are invariant and avoid having to re-check the types each time you hit the expression. The default CPython implementation does things the way you describe, but alternate implementations like psyco and PyPy take the specializing approach in their JITs. Not surprisingly, PyPy executes these for and while loops on the order of 50-100 times faster than CPython does on my machine (and ~10 times faster than numpy). $ python3 benchmark.py while: 5.254076757992152 for: 2.9640753349813167 sum_range: 1.1016407490242273 sum_numpy: 0.6697274879843462 $ pypy3 benchmark.py while: 0.05818561598425731 for: 0.07444041498820297 sum_range: 0.07667819698690437

@abhishankpaul Год назад

I hope this problem can be partly solved if people declare data types of nearly all the variables as they need to do in C before hand like I do most of the time

@kylehart8829 Год назад

@@sumnerhayes3411 There are tons of other reasons why dynamic typing is pointless and bad. It saves you having to *gasp* tell the program the type of a variable one single time; and it costs you dearly. All python projects should enforce static typing, even though that doesn't gain performance. All languages should be statically typed, period, because it is infinitely more readable and provides no development benefits at all.

@grawss Год назад

@@kylehart8829 I would have disagreed prior to learning Python, back when it was just dogma in my head. Interesting how that works. However, the learning process was made easier by removing a step. I think a better solution would be to make it extremely clear how to toggle Python into enforcing static typing, and why that's a huge benefit.

@SteinGauslaaStrindhaug 3 года назад

Half through the video I thought: actually for this simple sum you don't even need to loop just the formula. And you didn't disappoint. While this might seem like a very synthetic example which won't often be possible in real life, you'd be surprised how many times I (a programmer who is not even very good at maths) have been able to replace large loops and delete large chunks of code and replace it with a simple formula. Not usually this clean and simple, but quite often you find that the code is looping through a lot of data calculating stuff and then filtering out stuff, and if you simply rearrange things you might be able to do the filtering before the loop which might even reduce the loop to a few statements without a loop. Very often this is not obvious in the code but very obvious when sketching out the desired results on paper. Pencil and paper is definitely not obsolete.

@mCoding 3 года назад

Indeed! Never underestimate math! The feeling of deleting 30 lines of scratch and replacing it with 3 lines of math is indescribable.

@OatmealTheCrazy 3 года назад

@@mCoding it's the moments when you're lying in bed and it kinda hits you out of nowhere that are the best Of course, it is kinda a let down when the divine revelations are actually wrong though

@ttt69420 2 года назад

Doing it on paper with a pencil probably triggers a different part of your thinking process from all the years of doing math that way.

@delqyrus2619 2 года назад

You can actually be even faster than with math, if you know how CPUs work. If you instead of calculating (n * (n -1)) // 2 just calculate (n * (n -1)) >> 1 it should (in theory) be a few clockcycles faster. While most math operations are pretty trivial for the CPU(+, -, *) division is a pretty annoying task, so you might spare some time with just bitshifting.

@timewave02012 2 года назад

@@delqyrus2619 Bit shifting is math too. No different than multiplying and dividing by powers of ten being easy in decimal. A good compiler will optimize it though, so it's probably best not to bother.

@NickByers-og9cx 3 года назад

You want to be careful when timing functions, your processor likely speeds up or down based on system load and ambient temperature changes. Although you can still capture order of magnitude changes very easily like you did. If you're on a laptop though where cooling and power are limited, take extra care, and run multiple simulations in different orders at random times!

@mCoding 3 года назад

I cannot stress this enough! Benchmarking is such a difficult topic even though it seems so simple.

@luziferius3687 3 года назад

One thing you can do is to disable CPU boost clocks during benchbarks. This way, you get stable clocks regardless of load created by the tested function. For example on Linux, you can just write 0 to /sys/devices/system/cpu/cpufreq/boost to disable the boost or write 1 to enable it again. This’ll get you much more stable results that are comparable even if ambient temperature changes.

@NickByers-og9cx 3 года назад

@@luziferius3687 that's a good approach! Myself I usually try to do perf stat whenever possible, and count the actual CPU cycles it takes. Run each command say 3 or 5 times and take the median!

@Teriton 3 года назад

@@mCoding Obviously the best way is to time it yourself with a stopwatch

@MisterAssasine 3 года назад

imo it would have been better to use a smaller n in the function and a higher number of runs

@opticalmoose8091 2 года назад

Sometimes it’s just impossible to express something in numpy - iterative algorithms, for instance. I had a very hot piece of code, which implemented an A* pathfinding algorithm, which was slowing my entire program down. I looked for solutions and found a package called Numba - it allows you to decorate a single function and it would try to compile it the best it can. There are various optimization settings to consider, and in order to get *significant* improvements in performance (like, order of magnitude and more) you basically had to rewrite your program as if you were writing in C, but using Python’s syntax - so, no dynamic typing, using very limited subset of built-in types and refrain from using custom classes as much and possible. But it didn’t require too much work, and it was a great benefit in the end - I ended up with a blazing fast function without needing to write it in actual C (not very hard) and then figure out how to couple this with Python (quite finicky if you’ve never done it). So yeah, I’d recommend looking at Numba if you need a more general solution.

@NJ-wb1cz Год назад

Or just ditch python and go for java or other jvm languages or go or rust etc

@pavelperina7629 Год назад

@@NJ-wb1cz Eh, rust is likely one of the hardest languages to learn (after "modern" C++ with template metaprogramming). Julia seems quite easy to learn and pretty fast (if strongly typed it's close to C, maybe twice or three times slower, but not two orders of magnitude like Python). On the other hand I'm not sure how to structure code which is too complex for a single file and too simple to put it into package. It was basically developed for data crunching. Also year or two ago Rust was lacking some libraries, especially for plotting graphs (C++ lacks them too), painting (Cairo) and even exporting data as images (no support for writting of 16bit PNG files)

@julians.2597 4 месяца назад

@@NJ-wb1cz right, who would do something as stupid as just adding another library as dependency when they could add an entirely different development stack just to optimise one function.

@NJ-wb1cz 4 месяца назад

@@julians.2597 why is adding libraries stupid? That's a strange stance. Sure, moving to another language is an option, but you'll likely use libraries there as well.

@deemon710 2 года назад

Oh snap! I thought I was saving on resources by not using additional modules like numpy but now I see it's ridiculously faster. I wasn't aware pre-built functions were that much better either. Thanks for the reveal!

@eric_d Год назад

He actually did say in the video that Numpy uses a lot more resources, and with a bigger number, you may not have enough memory. Your original thought was correct, that you do save resources by not using additional modules. You may save time, but not resources. Also, if you compile your code, those additional modules will need to be compiled in, which will make the executable larger, and require more resources as well. As we are quickly approaching the limits of how far we can go with hardware (CPU technology is pretty much capped out, unless someone figures out how to make atoms smaller), we need to be more mindful of the amount of resources we use. Programmers need to stop being lazy. They have to stop thinking that they can bloat their code with hundreds of modules because CPU's will be faster by the time their code is in use. Just because speed and bandwidth are available, doesn't mean they should be used. For example, I know of some websites that include 50-100MB of Javascript code for each page view. Sure, dial-up is a thing of the past in most parts of the world, and people have faster internet connections than ever before. The page loads pretty fast on a gigabit connection. What about the people who have much slower connections? Is it really necessary to design your website where every single page view uses an entire day's worth of bandwidth on a dial-up connection? If you have several pages of the same website open in different browser tabs, your memory utilization goes through the roof! I've seen a single browser tab on one of those websites using over 2.5GB of RAM! I haven't even checked how much unnecessary Javascript is downloaded when using something like Facebook, but I've seen a single tab with Facebook open using over _5GB RAM_!!! This kind of waste needs to stop. Even if it does save a second or two here and there, it could cost several more seconds (or longer) in other areas. I've always hated importing modules in any language I've ever programmed in. I would much rather write my own code to do what I need done, whenever possible. I realize that's not always possible, but it's always better than adding bloat to a program that's not needed. Look at what's known as the "demo scene". It's something that started LONG ago, where people try to make the most exotic program possible in assembly language that fits within a 64KB executable. People have fit 10 minute long full screen videos with music into 64KB! They have created complete first-person-shooter games in 64KB! These programs run plenty fast on hardware from over 30 years ago! If more of todays programmers took the time to write good code, we wouldn't even need computers as fast as we have. Sorry, that started off as a 1 line reply, and ended up going 100 directions. There's still soo much more I want to say about the topic, but who is going to read it anyway?

@moustafael-shaboury2659 Год назад

@@eric_d I will! I'm very new to programming and I really appreciated reading that. Thank you for the reminder :)

@saivardhanchowdary7918 3 года назад

The RU-vid algorithm got you mate. All the best!

@mCoding 3 года назад

Thanks for the support!

@hemanthkotagiri8865 3 года назад

To the point, quality content. Subscribed!

@mCoding 3 года назад

Thanks for the support!

@Connor-of6mu 3 года назад

I really like your channel! These are very interesting insights into the Python language in small videos which makes them easy to digest. Keep up the good work!

@mCoding 3 года назад

Awesome, thank you!

@CodeDisease 3 года назад

“Python is an incredibly slow language” You just insulted my entire race of people... but yes.

@CodeDisease 3 года назад

I guess I’ll be first then...

@suryanshthakur5820 3 года назад

Your guess right

@soupnoodles 2 года назад

Its not incredebily slow, compared to others its slower yes but it can acomplish the tasks its meant for with no issues Machine learning, Automation, Data Science and Cyber Security

@CodeDisease 2 года назад

@@soupnoodles Yes but you can do all of that much faster with C, you're better off using C.

@CodeDisease 2 года назад

@@soupnoodles Also why did you like your own comment?

@GuilhermeFArantesSouza 3 года назад

"The best way to loop in python is to not loop", hahaha, loved it. Also got curious about doing the numpy sum but with the built-in range (because of the memory problem), how much would that impact the performance?

@mCoding 3 года назад

In my experience, I've never hit a RAM limit because of this, but in memory constrained systems this is definitely something to watch out for. I wish numpy has something like std::ranges::iota to use as an array-like.

@aquilesviza5550 3 года назад

@@mCoding There is a library called dask, is used in distributed computing and has support for numpy arrays. working with blocks consume less memory but might be a bit slower.

@tonyflury2291 Год назад

i think numpy.arange(...) creates a potential memory issue (it allocates the a large array in this case), but the Python3 builtin range(...) doesn't allocate a large array - it is a lazy iterator that simply tracks the last value generated so it knows what to generate next.

3 года назад

This is the reason why python is such a good programing language. You can write your program with easy and pretty syntax and use packages wrapping C code to do the complex and time consuming tasks. This just shows that you can use python and have fast running code. I would also add that if your program's only focus is speed and performance then don't use python, simple as that.

@mCoding 3 года назад

People too often leave out the speed of development in their calculation of speed!

@waldolemmer 2 года назад

Python isn't special in this regard

@imrobbinganyonewhotalkstom4881 Год назад

People oftentimes forget lua.

@HydratedBeans Год назад

A faster processor is cheaper than another developer to speed up development. That’s why I use python.

@NJ-wb1cz Год назад

@@HydratedBeans unless you depend on oython-specific libraries I don't think the development speed is noticably greater compared to, say, kotlin. And a loosely typed language will always have more mistakes and harder to statically check

@sploofmcsterra4786 Год назад

This was a valuable lesson to learn when I was coding a neural network. I wasn't aware of this fact and was trying to parallelise, before realising numpy automatically did that. Turning everything into vector operations led to a speed increase that was crazy.

@andreabradpitto 3 года назад

This is very informative and well presented. Also, it seems like you are very talented in explaining stuff, but this is only the first video of yours I have watched, so I'll have to do some further benchmarking ;) Thank you!

@DeuxisWasTaken 7 месяцев назад

Very good video. Note: NumPy isn't just C, it's highly optimised C with SIMD capabilities. The only way you can go faster than that is with GPGPU like OpenCL, CUDA, Vulkan or Metal. (And even that isn't assured since GPGPU has initial overhead.)

@brianchandler3346 3 года назад

Expertly presented. I never would have thought to look at it where C was used behind the scenes. I will definitely try and make the better to worst progression habit when writing Python.

@chrisray1567 3 года назад

Great video. I’ve always found list comprehensions to be faster than the equivalent for loop or while loop.

@simonmultiverse6349 3 года назад

There is a language called Lython, so you don't try to loop in Python; instead you poop in Lython.

@supergamerfr 3 года назад

This channel is gonna blow up great work

@jetison333 3 года назад

Very informative video, keep at it and im sure you'll grow a ton.

@mCoding 3 года назад

I appreciate that!

@shingo371 3 года назад

Awesome content, got very surprised when I saw your low sub count

@mCoding 3 года назад

I appreciate that!

@anb1142 3 года назад

me too

@christianhill8651 3 года назад

You just convinced me to learn algorithms and functions. This was an amazing illustration.

@TARS.. 3 года назад

This channel is a gold mine, thank you so much for these. Keep them up!

@mCoding 3 года назад

Thanks, will do!

@GuRuGeorge03 3 года назад

Oh this is an amazing recommendation by youtube. I came to the same conclusion as you while having massive performance issues in my machine learning models in non obvious parts of the code. Once I noticed that it is the loops that are awfully slow I tried refactoring them as much as possible and ended up using a mixture of things u've shown here. I didn't even know or speculate that it's because of C but now everything is much clearer in my head. Thanks!

@mCoding 3 года назад

You should also check out Jax, Numba, and Cython for potentially enormous speedup for little effort if you are doing ML.

@robertmielewczyk4219 3 года назад

you're not supposed to do ML with for loops in theory you can and it's very undestandable if you do it that way but any solution using matrices/tensorflow will be faster. Like he says in the video numpy is fast

@dddbra1748 3 года назад

I think your channel deserves more subscribers.

@mCoding 3 года назад

I appreciate that!

@TheSandkastenverbot 2 года назад

Thanks for the information! It's really helpfull to get a glance into the Python interpreter's inner workings

@Jcewazhere 3 года назад

In my one of my programming classes we learned about loop unrolling. This is much more handy to know about. :)

@laughtale1607 3 года назад

That's why Gauss exists : simple use of (N-1)N/2 in this case

@mCoding 3 года назад

And he came up with this at such a young age too!

@bFix 3 года назад

wait isn't it: (N+1)*N/2?

@laughtale1607 3 года назад

@@bFix He takes sum to N-1 in the example

@Samsam-kl2lk 3 года назад

@@laughtale1607 Yes, so it is n*(n-1)/2

@laughtale1607 3 года назад

@@Samsam-kl2lk yup

@npip99 3 года назад

Fun fact: If you write this in C, gcc/clang will automatically optimize the for-loop sum into n(n-1)/2 for you.

@underfilho 2 года назад

really?

@googleuser4203 3 года назад

A lot of important tips! Thank you. Subscribed!

@aidanwelch4763 5 месяцев назад

Thanks for this, I was trying to explain to people in JS that essentially always calling a native function is faster than writing a more optimized(in terms of Big-O) way of doing something. They were asking Big-O proof and didn't believe me that it's just not relevant. Looping through a string 10 times with a native regex check is faster than iterating through it once in pure JS.

@sergeyshchelkunov5762 3 года назад

Very informative. Excellent insight into Py inner workings. Great incentive to employ Numpy (and Multiprocessing) whener possible. Give a simple yet splendid example of why knowing math is important for any programmer who tries to use Py for data science type applications.

@ahmadhafian3785 2 года назад

It would be interesting to see how recursive function would perform

@danejohnson8657 2 года назад

Python doesn't have tail-call optimization, so you probably just overflow the stack.

@AeryelleCat 2 года назад

That is ridiculously clear, it blew me away. Straight to the point, extremely clear explanation, very methodical, and everything is shown perfectly. Just purely amazing tbh.

@Parthsean 3 года назад

This is probably the first coding channel that I looked up just for fun. Great work. Can you cover advance pythonic syntaxes and topics?

@mCoding 3 года назад

I'm always open for suggestion! What do you want to see?

@Parthsean 3 года назад

@@mCoding Nothing in particular just any fancy pythonic things which help improve performance and code readability. Also, one thing you can do is make a mess of a code and start refactoring it. That would be amazing to see.

@tharfagreinir Год назад

There's a fun story about that summation formula. Apparently mathematician Carl Friedrich Gauss came up with it at a young age when his teacher set his class the task of adding up all the numbers from 1 to 100 and was very surprised by Gauss' solution. Regardless of the veracity of the story, the formula is indeed often referred to as the Gauss summation formula. Mind you, a lot of things in math are named after Gauss as he was extremely prolific.

@magicmulder Год назад

You left out one important aspect; the teacher wanted to keep the kids busy for a while and Gauss came up with the solution within minutes.

@thomas-sk3cr 7 месяцев назад

Another important aspect, the formula is actually (n+1)*n/2.

@MattDunlapCO 7 месяцев назад

@@thomas-sk3cr Exactly. I was looking for this correction.

@adrianmizen5070 6 месяцев назад

@@thomas-sk3cr which actually illustrates a pitfall of using external math formulas instead of letting the computer do the work, that they are going to be less clear about what the code is supposed to do. getting the wrong answer really fast is worse than getting the right answer slowly

@thomas-sk3cr 6 месяцев назад

@@adrianmizen5070 not sure if I understand you correctly. I do think that it's definitely better to use a formula if there is one. But it should be properly documented of course and you have to make sure that it's the right one. Maybe we can agree on that? I don't think you should reimplement well-known formulas with loops just because it's more clear to the reader at first glance :)

@supermariobrosz8 3 года назад

Great video! Another idea is to time yield in functions and generator expressions in comparison to the traditional loops

@mCoding 3 года назад

Great suggestion!

@ferociousfeind8538 2 года назад

yes, graph the size of n against the time it takes different strategies to compute the sum >:3 make the computer dance for my entertainment!

@SanixDarker 2 года назад

Good content ! Liked and subscribed !

@thelazyrabbit4220 2 года назад

Dayumn... TIL something new as well as some new trick. You definitely earned my sub!

@edwinjonah 3 года назад

Awesome video! A few days ago I was comparing the execution time on using a loop vs. the Gaussian formula, it's nice to see some other options in between.

@mCoding 3 года назад

Thanks for the support!

@lb2040 2 года назад

@@mCoding Very interesting video indeed, something I never really thought about … but the math-guy in me needs to say it: You got the formula wrong… 😬

@fanoflinoa6109 2 года назад

Great comparative video The math solution if you include n in the addition is ((n+1)*n)/2 Because you used range() and the while loop was set to < and not

@aioia3885 2 года назад

Flooring the division is very much recommended, because otherwise python will convert the result to a floating point number which may cause you to lose precision

@fanoflinoa6109 9 месяцев назад

@kurogami2771 it does in this case as we are adding up a series of numbers from 0 to n, which by definition excludes negative values. As for flooring, it is true that we should avoid using floats when not needed even for just one operation so we can either floor or int() the results.

@abebuckingham8198 2 месяца назад

@@aioia3885 Either n or n-1 is even so it's never a float. Using floor isn't required.

@aioia3885 2 месяца назад

@@abebuckingham8198 edit: "either n or n-1 is even so it's never a float" is not correct because in python division with / will always result in float conversion (see type(1/1)). the rest of this reply is just trying to clarify what I now realize was not necessary to clarify in my comment I said "flooring the division is very much recommended" but what I really meant was not using the floor operator on the result but instead using the integer division operator. in python if you do a/b the result will always be a float bu if you do a//b then the result will be an integer if both a and b are as well. doing a//b is basically like doing int(a/b) except that there will be no conversion to a float and so no precision will be lost. for example, int((10**18+2)/2) evaluates to 5e17, because the division was a floating point one, but (10**18+2)//2 does result in the expected value of 5*10**17+1

@MrMonishSoni 3 года назад

This Video is so awesome, never came across such a beautiful way to understand loop in such depth. Thank you so much for making this video. ❤❤❤💕💕💕

@mCoding 3 года назад

Glad you enjoyed it!

@maxteer2800 3 года назад

Great stuff! I never considered that while loops would be slower than for loops, but even as you started I was guessing it would!

@mCoding 3 года назад

Glad you liked it!

@cottawalla 2 года назад

It was the one addition statement executed many times that slowed the python "for" loop. The conclusion is that looping can be very fast but executing python statements anywhere in the code is very slow.

@kenhaley4 3 года назад

Another reason the numpy approach is faster: The elements of a numpy array must all be of the same type, and there's no overhead to check (and convert to numeric, if necessary) the type of each array element.This probably contributes much more to the increase in speed than the language difference (C vs. Python.)

@vishnukumar4531 2 года назад

Other than that, the sum itself can be computed parallelly and fused together later, thus we are not forced to iterate serially element by element!

@dmitripogosian5084 2 года назад

In compiled languages loops are often optimized on compilation stage, such as being unrolled where necessary, with particular emphasis on efficient cache/prefetch from memory utilization. One should look on original Fortran code of LAPACK routines where some loop optimizations were done 'by hand'

@shinobuoshino5066 6 месяцев назад

@@dmitripogosian5084 loop unrolling is never efficient when cache and prefetch is the topic unless you explicitly wrote "from 0 to 4" just to avoid copy pasting code 4 times by yourself, and even that may not be worth unrolling depending on body of the for loop which most compilers can decide by themselves if you do profile guided optimization pass. There's a reason why -O2 more performant for most software than -O3 and unrolling is one culprit of that.

@fillfreakin2245 Год назад

If you're using a counter within the while loop, then yes that counter is slower than the for loop. If you don't, and you use the while loop as intended and doing the same stuff inside the loop, then it is essentially the same speed. def while_loop(n=100_000_000): s = 0 while s < n: s += 1 return s takes about the same time as: def for_loop(n=100_000_000): s = 0 for i in range(n): s += 1 return s Use the right loop for the right task. If you have to put an incremental counter in your while loop then you're using the wrong loop.

@ssfjor 2 месяца назад

Comparing the sum with the counter limit certainly won't give the correct result in s: We are calculating the sum of all integers and analyzing various ways to do that. I. e., if the limit where 10, with 1,2,3,4 we reach it, way before reaching the correct sum 55.

@dragonsage6909 2 года назад

Wow .. I learned about 5 new things from this.. using timeit is a great idea.. thank you!

@dwarftoad Год назад

I think numpy also will generally, if it can, use parallel CPU operations (SIMD) to process the array, which is probably why it's faster than standard sum() (and also why it makes sense to create the array of data with arange() even if each value could be calculated in the loop. But definitely good to be aware of this memory requirement.)

@spacelem 2 года назад

My own results were 6.06 seconds for the while loop, 4.25 seconds for the for loop, and 569 ms for the numpy loop. I tried it in Julia (almost identical code except for the end tags), and the results were 1.3 ns for both the for and the while loop. Checking the lowered code it turned out it had optimised the loop away completely, so we got sum_math for free! So instead to force it to do the actual sum, I modified it to s += i * (rand() < 0.5), and it came in at 576 ms. So even with a call to rand() and a branch, Julia was still nearly 10x faster than raw Python, and was about on par with NumPy (which didn't have a call to rand() ). If I force Julia to construct the range object with collect(1:n), then I don't need so that it matches NumPy, then it gets even faster at 290 ms, but now it allocates 763 MiB of memory, so clearly there are pros and cons of using range objects, although I'd stick with the range object for most usages. So, if you like Python but you want C performance (and you like to use arrays a lot and don't like to add all the extra numpy syntax just to do that), maybe check out Julia.

@incremental_failure Год назад

Most like myself use Python for the libraries, nothing else comes close except for maybe C++. It would take a lifetime to recreate all the libraries in Julia.

@spacelem Год назад

@@incremental_failure there has a lot of ongoing work to make native Julia libraries (mostly in the scientific and statistical communities), but Julia can also call C, C++, and Fortran code, and the PyCall package already allows you to embed Python in Julia, so you're never left in the lurch if you choose to use Julia. I've already seen Julia be hugely faster and more flexible when doing a lot of ODE modelling than when using R and calling deSolve which is written in Fortran.

@FallinPython 3 года назад

Thanks for the tips! The comparison with NumPy is not fair because it is made for array computing and hold an unique data type. By the way, you deserve far more subscribers.

@mCoding 3 года назад

Thanks!

@TranquilSeaOfMath 2 года назад

Nicely demonstrated and explained.

@micirei 3 года назад

Your channel is great! I suggest you use a high pass filter for your voice to cut off the low frequencies that make your voice sound muddy, great job so far.

@mCoding 3 года назад

Thanks for the tip! I have no idea what I'm doing video editing so tips like this can really help improve my video quality!

@minghaoliang4311 3 года назад

Can you do a comparison of recursion over loop?

@mCoding 3 года назад

I have many video ideas in the works, this is definitely a contender.

@ikickss 3 года назад

Usually recursion is slower and takes up more memory. And many "big" problems cannot be coded recursively due to stack size limitations. But if compiler/interpreter can do tail recusion optimization, it's a very nice option. Unfortunately, not all problems are applicable to it.

@shukterhousejive 3 года назад

The CPython interpreter doesn't do tail recursion so even if you write it out correctly it's still faster to loop, save the recursion for when you forget how to convert your code into a loop correctly

@sbwlearning1372 3 года назад

@Sec Coder Check out Lisp or Prolog. The recursion is conceptual in terms of the problem or model. Imagine a family tree. The definitions and relationships are recursive. Mom and Dad produce offspring each of which in turn can become Mom's or Dads who produce offspring. Comes into its own when these Relationships are highly complex ( Imagine every leaf on a real tree was in fact another tree and so on and so forth. The Mantlebrot set is recursive ) Essentially they become huge lists or recursive data structures. Those languages became prioritized and optimised around lists and recursive data structures.( With a few constructs and language primitives not found in the likes of C or Pascal such as "is a type of " "is connected to" "is related to", "is derived from" etc etc ) Pythons modelling capabilities Tuples Lists etc etc Means most of it can now at least be modelled easily before being translated to something else for speed. Ps for fun check out Occam . It's editor was indent sensitive just like Python It also let you mix sequential and parallel programming .!!!! Oh lordy 😁

@fat_pigeon 3 года назад

@Sec Coder It lets you express common algorithms in a purely functional way, i.e. without needing to mutate variables. Doing so makes your code easier to analyze mathematically and makes it more composable.

@lakshyachopra_ 2 года назад

This video was very informative. Thanks a lot James!

@mCoding 2 года назад

Glad it was helpful! You are very welcome!

@leobozkir5425 3 года назад

WHOA 45K?! The first time I saw your channel you had 12K subs. C0ngrats and keep it up!

@KonradKeck 2 года назад

Thanks for that! I had no idea about the ratios. Two ideas though: I'm doing statistics and ETL jobs with pandas. I would've been interested in other methods as well like recursion or doing the same with pandas instead of numpy (whether it's any slower than pure numpy), pandas + apply + lambda, etc. Second, going one step further and optimising performance for if-elif-else / mapping function would be great for another video.

@BartMassey-PO8 3 года назад

Rust with 64-bit numbers looks to be about 50× faster than numpy on my box (once I convinced the compiler to stop solving the problem at compile-time without losing optimization). Rust with BigUint bignums looks to be about 5x faster than numpy. The obvious for loop with pypy3 is about 12x faster than numpy. So there's still plenty of room for speedup.

@bva0 3 года назад

Cool video. I would add that numba and cython are also super fast alternatives.

@ButtKraken01 2 года назад

This was a great explanation and earned you my subscription!

@ColinBroderickMaths 2 года назад

Just wanted to note that using an iteration index in a while loop isn't really good practice anyway, since that alone would almost always mean a for loop is the correct pattern. So it's not really surprising that the while loop was slower in this experiment. I'd also like to warn people that numpy is only faster if you are operating on large sets or large arrays of numbers. If you apply a numpy function to a single number, it's almost always much slower than the equivalent function in Python, presumably due to some overhead.

@rursus8354 Год назад

That isn't the point. The point is the efficiency of raw Python.

@OMGclueless 3 года назад

It's also worth taking a second to remember that even in its slowest form, the python code is adding up 100 million numbers in 20 seconds. It's slow, but "slow" is a relative term.

@mCoding 3 года назад

This is a good point that everyone should keep in mind!

@carmelo5991 8 месяцев назад

Is slower than other languages but still enough because computers are very fast

@edwinontiveros8701 7 месяцев назад

On any CPU capable of many teraflops, 20 seconds for 100 million numbers means eons. It's like walking and taking a step only every 5 seconds, short distances It's not that bad, but will badly scale once you need to travel farthest.

@maxch.9135 2 года назад

you are the only programmer that i pressed liked, subscribed and the bell icon :)

@thebiomicroguy 3 года назад

Neat, beautiful, concise and to the point. You nailed it. Thanks.

@mCoding 3 года назад

You're very welcome!

@zf0666 3 года назад

Good example of how being good at math and algorithms beats all micro optimizations of the code itself :)

@mCoding 3 года назад

Exactly!

@JensRoland 3 года назад

No loop unrolling variant? Nice roundup of methods, and nice video in general!

@mCoding 3 года назад

You're right, I totally forgot about some "optimizations" one could try. Maybe a later video. Thanks for the support!

@superscatboy 3 года назад

Unroll to 100,000,000 lines of additions, then reduce them to a single value and have the function just return that value. Boom, orders of magnitude faster!

@n00blamer 3 года назад

@@mattmurphy7030 Unrolling is more about reducing dependencies than iteration overhead, the multiple accumulators are summed in the end. Branch predictor doesn't play much role here as this loop is fully predicable, only the last iteration flushes the pipeline after the loop gets up to speed. Data dependencies can potentially stall the ALU, but if the CPU has a really short pipeline it won't have any effect.. this is highly dependent on the microarchitecture so arguing one way or another is a bit pointless unless we specify the hardware..

@smjure 3 года назад

Awesomely clear presentation, thanks a lot 👍

@mCoding 3 года назад

You are welcome!

@MeHdi.fz28 10 месяцев назад

This video was very helpful, nicee👌🏻👌🏻🔥

@SillyLittleMe Год назад

If you are writing pure and iterative heavy Python code, then PyPy is your best bet for performance. I ran the same test as mentioned in the video on Python 3.10 and PyPy 3.9 and my god the difference was staggering! Here are my results : Python 3.10: for loop : 6.6 secs while loop : 11 secs PyPy3.9: for loop : 0.22 secs while loop : 0.36 secs PyPy is really meant for long running code and real world results show that.

@davidbellamy1388 3 года назад

Your lessons are awesome. They have a Corey Schafer feel to the presentation in their brevity and clarity.

@mCoding 3 года назад

Thank you! I've seen many of Corey's videos, so I'm glad to hear that!

@daveys 5 месяцев назад

I’m a Python novice and I found this to be really interesting and useful. Many thanks!

@itsamelara 3 года назад

great video, great channel, I'm subscribing!

@mCoding 3 года назад

Awesome, thank you!

@TheParkitny 3 года назад

There is another option. Often you can use the numba library to get a speed boost over numpy.

@mrtnsnp 3 года назад

The numba package may be useful in some cases.

@mCoding 3 года назад

Absolutely, if you can use Numba to @jit your problem, them it will very likely speed up the solution! I'll probably cover this in a future video.

@mrtnsnp 3 года назад

@@mCoding I did a small test. from numba import njit @njit def int_sum(n=1_000_000_000): s: int = 0 for i in range(n): s += i return s timeit.timeit(int_sum, number=1) 5.373000021791086e-06 timeit.timeit(int_sum_p, number=1) 44.33714248199999 int_sum_p is the same function, without decorator. The result is very close to C speed. But obviously, the best improvement comes from doing the maths properly.

@dmccallie 3 года назад

Numba is amazing, if you can limit the code in the @jit’ed routine to simple loops and math and/or numpy operations. You can’t use things like lists or dicts

@gregorymorse8423 3 года назад

Should do all the examples with numba this is also interesting. Cython also. Masterclass in squeezing loop performance, could also show sum(x for x in range(n)). Ideally use matplotlib to show the data for different sum sizes and prove the obvious that it is linear

@mrtnsnp 3 года назад

@@gregorymorse8423 When trying to make a comparison with compiled C++ code, I discovered that clang removes the loop altogether, applying the maths that the programmer should have done in the first place. And I suspect that numba applies the same optimisation, otherwise the total time does not make sense with the clockspeed I have.

@sophiaelementaris4203 3 года назад

You video was amazing man, more one sub.

@acemanftw 3 года назад

Great video! Subbed

@mCoding 3 года назад

Awesome, thank you!

@logicprojectsspeed2023 3 года назад

Great video. Commenting mostly to help the algorithm push this. Would have loved to see a comparison with straight C thrown in there given that C code was the source of the speedup. What's the cost of the python to C transitions?

@mCoding 3 года назад

A C compiler would probably compute the answer at compile time so there is nothing to compare! A Python to C transition would be a huge undertaking even for a small project.

@oj0024 3 года назад

Here are the results on my pc: python with for loop: 8.74s c with gcc and no compiler optimizations: 0.235734s c with gcc and full compiler optimizations: 0.048770s (gcc doesn't detect the pattern and actually executes the entire code) c with clang and full compiler optimizations: 0.000001s (clang detects the pattern and optimizes it using the Gaussian formula)

@alejandromorales8784 2 года назад

@@oj0024 wow, there is absolutely nothing to compare. I knew python was slow and C was fast, but I didn't know that the difference is to this extent

@tonyflury2291 Год назад

@@mCoding Nuitka will take Python code and transpile to C (which uses the Python standard Library to do the object iterations) - on some applications it is 2x faster than normal Python (and it is a work in progress).

@andreiiaz2097 3 года назад

Python is kinda like the doctor: It's the last place you want to be, but you go when you need it, because otherwise things might get more complicated

@akshaybodla163 3 года назад

I personally like to start with python. Its vast libraries and simple structure is easy for me to test programs. Later when I decide on a specific implementation, i'll code that in C++.

@andreiiaz2097 3 года назад

@@akshaybodla163 Yeah, I can see that, it's nice, but how do you transfer a python library to C++?

@antonliakhovitch8306 3 года назад

@@andreiiaz2097 A lot of C/C++ libraries have Python bindings so you can just use the same ones a lot of the time.

@maythesciencebewithyou 3 года назад

In reality, python is the first thing most people will use because it is easy to use. Python is only usefull because if works as a glue for other languages. Easy access to libraries written in more difficult languages.

@jaredgreathouse3672 3 года назад

Such a fact. I do statistics research using Stata. My rule is Stata where I can, Python where I must.

@sebastienollquist1318 2 года назад

I was so much expecting this formula to come up at the end :p

@Shaddymaze Год назад

Big thanks for explaining this clearly!

@reprC 3 года назад

Cool video. I’d be curious how `numba` stacks up in this, both the JIT and AOT variants

@youuuuuuuuuuutube 2 года назад

Numba would make it closer to the unoptimized C/C++ code. I used it to render fractals and I got it to run 50x faster than unoptimized Python, and almost the same speed as the C code. The thing is ... that you can make the loop a lot faster in C too because you can use SIMD intrinsics, so you could do 4 or 8 or even more operations in a single instruction.

@tobiasgrun5116 2 года назад

Just tested.. The JIT execution time is similar to the math version. To all Numba / LLVM developers: Great job!

@yolamontalvan9502 3 года назад

That’s why I would use C to program a missile.

@mCoding 3 года назад

Watch out for those null pointers though, wouldn't want your missile program to CRASH!

@bamberghh1691 3 года назад

Usually Ada or SPARK are used in high reliability systems like aircraft instead of c since they are much safer

@mCoding 3 года назад

I had heard about this before, thanks for sharing, it makes a lot of sense in safety critical applications. I know NASA still uses C for (as least some?) space shuttles!

@mwanikimwaniki6801 3 года назад

@@bamberghh1691 I knew ADA is used for missiles... But is there a specific reason why.

@arpandhatt6011 3 года назад

@@mwanikimwaniki6801 Ada has been specifically designed to catch as many mistakes at compile time. It’s static analysis tools are extremely thorough, and in general, it’s designed to prevent undefined behavior, from its type system, to its syntax. It’s an interesting language.

@artar4363 3 года назад

Didn't knew that Python is SO slow with those loops! Good video!

@chiragsingla. 2 года назад

btw it's doing 100M iterations so it won't affect too much in small iterations

@mahdiashrafmahir1534 Год назад

Thanks for the video, it was insightful

@nvmcomrade 3 года назад

Half way through the video I thought to my self why even loop to find an answer to a simple arithmetic progression and then I was surprised that this was addressed as well. Wow, impressive.

@mCoding 3 года назад

Thanks!

@lepepito 3 года назад

It serves the purpose to explain the problem of loops and also the process of finding different solutions to a problem It’s a nice exemple

@emilemil1 2 года назад

It would be a good idea to show examples that does a bit more work within the loop, to show that if your loop body is costly then how you loop is largely insignificant.

@K9Megahertz 2 года назад

Yeap, I was thinking the same thing. These tests really don't tell you anything meaningful.

@a_maxed_out_handle_of_30_chars 2 года назад

learned something new, thanks :)

@qorbanimaq 3 года назад

This is the first time I subscribe to a channel based on seeing the first video I watch from that channel. Great job!

@mCoding 3 года назад

Such an honor! Thank you!

@qorbanimaq 3 года назад

@@mCoding THANK YOU for creating such quality content.

@Generlc_Human 3 года назад

when you dont know how to code but you know english: "i know some of these words"

@andrewkim5556 3 года назад

Wow excellent!

@mCoding 3 года назад

Glad you like it!

@battlemode 2 года назад

Very helpful, thank you

@edwardwong654 Год назад

Quite interesting. I will probably never need to know this but if I get a chance to show off, I will find a way, haha. Thanks and very interesting.

@WhiteThunder121 3 года назад

"Is there anything o beat numpy?" Try compiling numpy top native machine code using numba: @nb.njit(cache=True, parallel=True) def numba_sum(): return np.sum(np.arange(100_000_000)) > 0.00064 seconds Or suing your GPU (Cupy): def cupy_sum(): return cp.sum(cp.arange(100_000_000)) > 0.001747 seconds Cupy should be an order of magnitude faster than you CPU, but only if you do some more complex computation than calculating the sum of the array.

@mCoding 3 года назад

Good point! Maybe some day I'll do a comparison of which C library for Python is the fastest... so many choices though.

@WhiteThunder121 3 года назад

@@mCoding Would be interesting to watch. Good video btw. I did not expect the "i = i+1" to have such a huge impact when iterating.