
Compiled Python is FAST 

Doug Mercer
11K subscribers · 88K views

Sign up for 1-on-1 coaching at dougmercer.dev
-----------------------------------------
Python has a bit of a reputation -- fast to write, but slow to run.
In this video, we focus on a simple-to-understand dynamic programming problem that would be terribly slow in native Python or NumPy. We show that Python can achieve (and actually exceed) C++-level performance with the help of just-in-time and ahead-of-time compilers such as mypyc, Cython, Numba, and Taichi.
Also, I finally got a camera, so, uh... face reveal, I guess.
#python
Chapters
---------------
00:00 Intro
01:07 The Problem
02:38 numpy
03:08 mypyc
04:08 cython
06:46 numba
07:58 taichi
09:47 Results
11:48 Final Thoughts
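Several comments below identify the problem as the longest common subsequence (LCS). As a concrete reference point, here is a minimal pure-Python sketch of the textbook bottom-up table. It is our illustration, not the code from the video, and the name `lcs` is our own.

```python
def lcs(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (O(m*n) DP table)."""
    m, n = len(a), len(b)
    # dp[i][j] = LCS length of the prefixes a[:i] and b[:j]; row 0 and column 0 stay 0.
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1  # extend the diagonal match
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])  # drop a char from one side
    return dp[m][n]


print(lcs("ABCBDAB", "BDCABA"))  # 4 (e.g. "BCBA")
```

Each cell reads its upper, left, and upper-left neighbours, and CPython interprets every one of the m·n loop iterations, which is the kind of hot loop the compilers in the video target.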

Published: 31 May 2024

Comments: 543
@dougmercer 2 months ago
If you're new here, be sure to subscribe! More Python videos coming soon =]
@thesnedit5406 2 months ago
You're very underrated
@FabianOtavo 1 month ago
Mojo and Codon (Exaloop)?
@flutterwind7686 2 months ago
Numba and Cython are an easy way to improve performance beyond what most people need from Python, and they don't require much boilerplate either.
@dougmercer 2 months ago
Absolutely!
@emilfilipov169 21 days ago
@@dougmercer Taichi doesn't look very boilerplate-heavy either, since it just uses a decorator.
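To give a feel for how little boilerplate Numba needs, here is a hedged sketch of the LCS table under `@njit`. The fallback decorator is our own addition so the snippet still runs (slowly, as plain Python) when Numba isn't installed, and `lcs_numba` and `to_codes` are hypothetical names, not the video's code. Strings are converted to integer arrays first because Numba works best with NumPy arrays.

```python
import numpy as np

try:
    from numba import njit  # JIT compiler; optional for this sketch
except ImportError:  # fallback: run the same function as plain Python
    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]        # used as @njit
        return lambda func: func  # used as @njit(...)


@njit
def lcs_numba(a, b):
    """LCS length for two 1-D integer arrays (e.g. character code points)."""
    m, n = a.shape[0], b.shape[0]
    dp = np.zeros((m + 1, n + 1), dtype=np.int64)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i, j] = dp[i - 1, j - 1] + 1
            else:
                dp[i, j] = max(dp[i - 1, j], dp[i, j - 1])
    return dp[m, n]


def to_codes(s):
    # Convert a str to an int64 array of code points so Numba can type it.
    return np.array([ord(c) for c in s], dtype=np.int64)


print(lcs_numba(to_codes("ABCBDAB"), to_codes("BDCABA")))  # 4
```

With Numba installed, the first call compiles the function and later calls reuse the machine code; the Python body itself is unchanged apart from the decorator.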
@megaspazos1496 2 months ago
Great video, I enjoyed it! In my eyes the video actually shows how fast C++ is: an unoptimized line-by-line translation from Python to C++ can be as fast as compiled Python optimized with an HPC library.
@dougmercer 2 months ago
Absolutely. C/C++ with gcc -O3 is basically magic.
@BartekLeon-jx5jv 2 months ago
@dougmercer I am pretty convinced that Taichi under the hood creates a 1D array and not a 2D one. Doing vector hits the performance quite a bit (while not the most reliable test, changing vector to normal vector gave a ~10% boost). Although both C++ versions were faster than Taichi for me (compiled with MSVC, release). There are still some minor things, but they shouldn't influence anything, since in my case it was ~40-50% in std::max and 20-30% in creating the vector. All in all, nice video showcasing the tools.
@BartekLeon-jx5jv 2 months ago
Ah, also... just out of curiosity:

import numba

@numba.njit
def lcs2(a, b):
    m, n = len(a), len(b)
    dp = [0] * (n + 1)
    prev_row = [0] * (n + 1)  # Temporary storage for the previous row
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[j] = prev_row[j - 1] + 1
            else:
                dp[j] = max(prev_row[j], dp[j - 1])
        for j in range(1, n + 1):
            prev_row[j] = dp[j]
    return dp[n]

Less memory allocation / no 2D array. Testing this against C++ / Taichi would be a nice one :) [and you have some vectorisation you can throw there]
@ruroruro 2 months ago
@@BartekLeon-jx5jv It's not a 1D array, but a homogeneous ND array. It's somewhere between vector and int[A][B]. It is represented as a flat array in memory, but unlike int[A][B], the data type, number of dimensions, sizes of those dimensions, and iteration strides are dynamic. Also, it's not just Taichi that's using ndarrays; NumPy and Numba are also using ndarrays here.
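To make the "flat array plus dynamic strides" description concrete, this small NumPy sketch inspects a 2-D `int32` array. It illustrates NumPy's memory layout only and makes no claim about how Taichi stores its data.

```python
import numpy as np

# A 2-D ndarray is one contiguous buffer plus shape/stride metadata.
dp = np.arange(12, dtype=np.int32).reshape(3, 4)

print(dp.shape)                  # (3, 4)
print(dp.strides)                # (16, 4): 16 bytes to the next row, 4 bytes to the next column
print(dp.flags["C_CONTIGUOUS"])  # True

# Element [i, j] lives at flat offset i * n_cols + j in the underlying buffer.
i, j = 1, 2
flat = dp.ravel()  # a view (no copy) because the array is C-contiguous
print(dp[i, j] == flat[i * dp.shape[1] + j])  # True
```

Because shape and strides are runtime values, the same code can index arrays of any size or dtype, which is the flexibility (and small overhead) that distinguishes an ndarray from a fixed `int[A][B]`.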
@BartekLeon-jx5jv 2 months ago
@@ruroruro That's what I meant, in a sense. Although it all still boils down to: are you allocating once, or are you allocating N times (in the case of vector)?
@mr_voron 11 months ago
This channel is highly underrated. Excellent analysis.
@dougmercer 11 months ago
Thanks for the support Maks! =]
@s8r4 7 months ago
I've also had some fun using various methods to speed Python up, and this video is a great overview of the major ways of going about it. While it's a big departure, I've found Nim to have the most Python-like syntax while being as fast as things get (it compiles to C, among many other languages). I've seen that you know about the true power of Python already, but James Powell did a great talk on this exact topic titled "Objectionable Content"; big recommend. Thanks for the video!
@dougmercer 7 months ago
I'll check it out! Also, I have looked at Nim in the past. It seems nice. Eventually I may do another video on this topic and branch out to other languages (Nim, Julia, and now Mojo). Thanks for the idea, the video rec, and the thoughtful comment =]
@dhrubajyotipaul8204 2 months ago
Thank you for making this. Trying out mypyc, Cython, and Numba right now! :D
@dougmercer 2 months ago
Enjoy! And good luck =]
@Masterrex 6 months ago
Subbed, nicely done. I can tell you were having fun. IMO, don't worry so much about the glitzy graphics; your storytelling is great!
@dougmercer 6 months ago
Thanks so much =]
@onogrirwin 2 months ago
Damn, this is a high-effort channel. Your stock footage game is especially on point. Hope you pop off big time :)
@dougmercer 2 months ago
That's so nice! Thanks =] 🤞
@ethanymh 11 months ago
Love this video so much! The quality of content, animation, and visualization is unmatched...
@dougmercer 11 months ago
Thank you so much!
@stereoplegic 2 months ago
After reading the other comments while thinking up my own, I feel compelled to echo this sentiment first. Fantastic job, @dougmercer, both technically and visually. I loved it all.
@dougmercer 2 months ago
Thanks @stereoplegic! That means a lot =]
@jcldc 6 months ago
Nice video. I have just learned Cython and achieved a 500x speedup vs. pure Python (+NumPy) in one of my programs. It's worth mentioning that with Cython you can automatically parallelize your loop by using the prange statement instead of range.
@dougmercer 6 months ago
500x is great! And good point on prange. I should have covered the parallel aspect of all the solutions (Numba, Taichi, and Cython) more, but I glossed over it due to the serial nature of the example problem. Thanks for the comment =]
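A single LCS table is serial, but many independent LCS computations are not. The following is a hedged sketch of a parallel loop using Numba's `prange` (an analogue of Cython's `prange`, not Cython itself) over a hypothetical batch of equal-length candidate sequences. The `try/except` fallback is our own addition so the sketch runs serially without Numba; `lcs_len` and `lcs_batch` are illustrative names.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:  # serial fallback so the sketch runs without Numba
    prange = range

    def njit(*args, **kwargs):
        if args and callable(args[0]):
            return args[0]
        return lambda func: func


@njit
def lcs_len(a, b):
    """LCS length keeping only two rows of the DP table."""
    m, n = a.shape[0], b.shape[0]
    prev = np.zeros(n + 1, dtype=np.int64)  # row i-1 of the table
    curr = np.zeros(n + 1, dtype=np.int64)  # row i being filled; curr[0] stays 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev, curr = curr, prev  # reuse the two buffers instead of allocating
    return prev[n]


@njit(parallel=True)
def lcs_batch(query, candidates):
    """LCS length of `query` against each row of a 2-D `candidates` array."""
    out = np.empty(candidates.shape[0], dtype=np.int64)
    for k in prange(candidates.shape[0]):  # iterations are independent
        out[k] = lcs_len(query, candidates[k])
    return out


query = np.array([ord(c) for c in "ABCBDAB"], dtype=np.int64)
cands = np.array([[ord(c) for c in s] for s in ("BDCABA", "ABCBDA", "XXXXXX")], dtype=np.int64)
print(lcs_batch(query, cands))  # [4 6 0]
```

The parallelism comes from the independence of the batch items, not from the table itself, which stays serial inside `lcs_len`.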
@MrXav360 10 months ago
I learned C++ in the last month (coming from a Python background!) and tried my luck at coding real-time animations of fractals. I wanted to compare with Python's performance, but now I am scared I learned C++ for nothing... Thanks! (Just kidding, I loved learning C++ and I am glad I did. It's super impressive, however, to see that we can achieve similar performance with these packages in Python! Thanks for the video.)
@dougmercer 10 months ago
Taichi is great for fractals! I like that it has good built-in infrastructure for plotting to a canvas. That said, I'm sure you'll find a use for your newfound C++ knowledge =]
@user-yk8yb5xy8r 2 months ago
My favourite was Numba, as we were able to achieve our goal with very little code. There are certain shortcut algorithms that can be applied to make up for the functions it doesn't support.
@YuumiGamer1243 2 months ago
I was already aware of Numba, but it's good to see all the others like this. Enjoyable video, and I was happy you showed most of the code while somehow making it feel like a documentary.
@dougmercer 2 months ago
That's an awesome compliment. I'm gonna put "Code Documentarian" on my resume. Thanks for watching and commenting =]
@alexsere3061 1 month ago
Dude, the quality and depth of this video is insane. I feel like I have a deeper understanding of the strengths and limitations of Python, and I have been using it for about 7 years. Thank you
@dougmercer 1 month ago
Glad it was helpful =]
@matswikstrom7453 6 months ago
Wow! Really informative and interesting. Thank you! I am now a subscriber 😊👍
@dougmercer 6 months ago
Thanks so much =]
@pietraderdetective8953 9 months ago
This is very high-quality content, mate! Well done! A question: for a gamedev use case, can we just use the tools mentioned to speed things up? I've seen horrible performance when someone is using a Python-based game engine (like pygame, etc.).
@dougmercer 9 months ago
Thanks! =] Yes, you should be able to accelerate a pygame-based game with these tools. You can't speed up pygame's own functions and methods, but you can speed up your code between those calls. It'll be best suited for larger, number-crunchy parts between method calls rather than quick little one-off operations. Let me know if you end up tweaking something and seeing a boost in performance!
@giannisic1544 7 months ago
Brilliant video and useful content. It's a pity there are so few of us... Glad the algorithm suggested this video.
@dougmercer 7 months ago
Thanks! Glad you found it helpful =]
@dar1e08 2 months ago
Easily the best video I have seen on performance Python. Subbed.
@dougmercer 2 months ago
Thanks so much! I should have another performance-related video out in mid-April, so see ya then =]
@EdeYOlorDSZs 2 months ago
Crazy good video! I'm gonna check out Taichi for sure
@dougmercer 2 months ago
Thanks =]
@Finnnicus 11 months ago
Good content, great presentation. Love the style!
@dougmercer 11 months ago
Thanks Finnnicus! Much appreciated =]
@josebarria3233 5 months ago
Gotta love mypyc. I've been using it in my project and have never felt disappointed.
@beaverbuoy3011 1 month ago
Super enjoyable video, thank you, this was very helpful!
@dougmercer 1 month ago
Thanks! Glad it was helpful!
@miriamramstudio3982 2 months ago
Text on the screen was definitely engaging ;) Thanks
@dougmercer 2 months ago
Yay! Success =]
@billyhart3299 2 months ago
Great video, man. I'm going to try this on my web server project that uses NumPy quite a lot.
@dougmercer 2 months ago
Numba should work great! You may just need to tweak your implementation slightly to use the subset of NumPy features supported by Numba.
@billyhart3299 2 months ago
@@dougmercer Have you tried anything that helps with matplotlib?
@dougmercer 2 months ago
Hmm. Hard to say. Could try mypyc; maybe it'll just magically work. Alternatively, though this might be a bit disruptive, you could swap out CPython for PyPy (a JIT-compiled replacement for the CPython interpreter). In the video I'm working on now, PyPy was shockingly convenient and fast.
@dougmercer 2 months ago
What are you plotting, out of curiosity? Maybe do a quick sanity check on whether the amount of data you're plotting has exceeded the usefulness of matplotlib. If it's a scatter plot with millions of points, maybe you should use something like datashader.
@billyhart3299 2 months ago
@@dougmercer I'm using it to make histograms of images that have been turned black and white and converted to 8-bit PNG files, to convert them to stippling.
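For an 8-bit grayscale image, the histogram itself can be computed directly in NumPy, leaving matplotlib only the job of drawing 256 bars. A minimal sketch, assuming the image is already loaded as a `uint8` array; the random image below is a stand-in, and the plotting call is left as a comment so the snippet doesn't require matplotlib.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(480, 640), dtype=np.uint8)  # stand-in for a real image

# One count per possible pixel value 0..255; minlength keeps unused values as zeros.
counts = np.bincount(img.ravel(), minlength=256)

print(counts.shape)              # (256,)
print(counts.sum() == img.size)  # True

# Plotting the precomputed counts is then a single cheap call, e.g.:
#   import matplotlib.pyplot as plt
#   plt.bar(np.arange(256), counts, width=1.0)
```

`np.bincount` makes a single pass over the pixels in compiled code, so the cost of drawing no longer scales with the image size.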
@cmilkau 2 months ago
PyPy is a JIT for full Python with special bindings for NumPy and SciPy. You can use it for any Python code, but for max performance you might need to write critical parts of your code in RPython, a subset of Python that can be statically compiled to a native binary. The example subsequence code is valid RPython, btw.
@dougmercer 2 months ago
PyPy is fantastic. I'm actually going to cover it in my next video!
@guowanglin4537 4 months ago
Well, I use Numba in my research on the human genome, and it was really fast!
@dougmercer 4 months ago
That's awesome! I love Numba: super convenient and fast
@sdmagic 2 months ago
That was exceptional. Thank you very much.
@dougmercer 2 months ago
Thanks for watching and commenting!
@ManuelBorges1979 2 months ago
Excellent video. 👏🏼 Subscribed.
@dougmercer 2 months ago
Thanks Manuel! Glad to have you =]
@pranavswaroop4291 2 months ago
Just excellent in every way. Subbed.
@dougmercer 2 months ago
=]
@enosunim 2 months ago
Thanks! This is really great info!
@dougmercer 2 months ago
Glad it was helpful!
@ThisRussellBrand 25 days ago
Beautifully done!
@dougmercer 25 days ago
Thanks Russell =]
@user-np9il4is1t 9 months ago
Love this video! It was amazing and useful!
@dougmercer 9 months ago
Thanks so much!
@NicolauFernandoFerreiraSobrosa 2 months ago
Very cool video! Did you consider compilation time in the C++ tests? I use Numba daily, and the first run is always slow due to the JIT.
@dougmercer 2 months ago
I did not count compilation time for the C++ times, but did include JIT time for the first run of Numba. However, it doesn't have a big impact, because we are typically doing hundreds or thousands of runs and adding up their times (so the slow first run accounts for only a small part of the overall time).
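One way to see the effect described above is to report the first call separately from the rest. This is a minimal timing sketch of our own, not the video's benchmark harness; `time_calls` and the `slow_square_sum` workload are hypothetical names. With a JIT such as Numba, the first number would include compilation, while for plain Python the two numbers should be close.

```python
import time


def time_calls(func, *args, repeats=100):
    """Return (first_call_seconds, mean_seconds_of_remaining_calls)."""
    if repeats < 2:
        raise ValueError("need at least 2 repeats to separate the first call")

    start = time.perf_counter()
    func(*args)  # with a JIT, this call would include compilation time
    first = time.perf_counter() - start

    rest = []
    for _ in range(repeats - 1):
        start = time.perf_counter()
        func(*args)
        rest.append(time.perf_counter() - start)
    return first, sum(rest) / len(rest)


def slow_square_sum(n):  # hypothetical stand-in workload
    return sum(i * i for i in range(n))


first, steady = time_calls(slow_square_sum, 10_000, repeats=20)
print(f"first call: {first:.6f}s, steady-state mean: {steady:.6f}s")
```

If a one-off compile cost c is spread over N runs, it adds only c/N to the average, which is why summing many runs makes the JIT warm-up nearly disappear.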
@sageunix3381 1 month ago
C code with limited branching will usually be faster in most applications, but if you want code to be ridiculously fast, use assembly. Inline assembly is cool too; it works directly with C. However, speed often comes at the cost of convenience.
@UndyingEDM 6 days ago
The video editing is top-notch too!
@dougmercer 6 days ago
Thanks =]
@jamesarthurkimbell 2 months ago
Nice video! Well done
@dougmercer 2 months ago
Thanks for watching!
@luaguedesc 2 months ago
Great video! Did you compile the C++ code with optimization flags?
@dougmercer 2 months ago
Yup! You can check out the C++ code/compile command here: gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3
@rverm1000 13 days ago
That's nice of you to point these libraries out.
@dougmercer 13 days ago
Thanks!
@chkone007 11 months ago
That was funny. I did both C++ and Python, but now I'm more on the C++ side. I had in mind the meme "look what they need to mimic a fraction of our power". I didn't test it, but I bet if you set the proper compilation options, C++ will be faster again. To my understanding this is what Taichi does: it generates SIMD code for your current hardware, under the hood, via the LLVM optimizer, based on the data structure (Taichi is tailored for sparse data structures). As you work with dense data, Halide would give you [maybe] better results. In all cases, the code generated by the Python front end could be generated by C++; the Python will always have an overhead. This is what machine learning people do: they don't care about Python performance, because all the computation, which takes 90% of their frame time, is implemented in CUDA and C++. The Python is only there to provide data to the lower-level system.
@dougmercer 11 months ago
> "look what they need to mimic a fraction of our power"
Haha, true! In another comment, I said I loved that even if I write terrible C++ it still turns out pretty fast. That said, the same argument could be reversed if we consider productivity and third-party library access. If an application is 95% high-level glue and one hot spot, I'd rather write the majority in Python and the hot spot in an AOT- or JIT-compiled variant of Python than write my entire app in a low-level language. The overhead would be worthwhile from a productivity perspective.
> Proper compilation flags
Do you have flags you want me to try in particular? I did -std=c++11 -O3, but maybe I'm missing something.
> SIMD
Since this is all sequential, can SIMD help? I thought SIMD was for packing multiple copies of the same operation into a single instruction (but again, I'm not a C++ dev).
> the Python just provides an interface to a lower level language.
True! And I'm OK with that! I definitely agree that well-written native code in a lower-level language will outperform code generated from Python. That said, for all but the most trivial algorithms, I can't write well-written C++. So, if I can get even a 95% solution for free from these high-level LLVM interfaces, then I'm stoked!
@chkone007 11 months ago
@@dougmercer ( : That reminds me of a benchmark done by Microsoft, Debug C++ /NoSIMD vs. Release C# SIMD, and they noticed C# was faster :D Yeah, sure... The point of Python is not to be faster; it's mostly to be gentle with programmers who aren't long-bearded engineers. The users are mostly scientists and data analysts.
> Productivity
For this example I see no productivity difference between C++ and Python. But personally I'm more productive in C++ with Eigen and a few other libs, just as an experienced Python dev will be faster with NumPy and their other favorite libs.
> Proper compilation flags
I don't know what your compiler is, but for Visual Studio: /Ot (favor speed), /Oi (enable intrinsics). To increase STL speed, disable C++ exceptions and "Basic Runtime Checks", /GS-, /GR-... To help intrinsic generation, /Zp8 or /Zp16 (here you're processing int), but we can process. And based on your hardware, /arch:AVX, ...
> SIMD
You have gather and scatter instructions that could help; you'd need to profile ( :
> Improve
On both sides, I'd bet we can improve performance by using only the types we need. If your number cannot go higher than 100, just use a byte/uint8_t, etc.
As I said, the video was funny. The point is not to say Python is faster than C++, but more "if you're careful, you can get performance better than or close to baseline C++".
@dougmercer 11 months ago
I'm using g++; I'll try to find the analogs for the compiler flags you recommended. And true, a uint8 is enough. I'll mess around with that too. In any case, thanks for the comments! I'd definitely like to learn more about C++, but I don't get the opportunity very often.
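The "use only the type you need" tip can be automated for the LCS table, because no cell can exceed the length of the shorter input. A sketch using NumPy's `np.min_scalar_type`; `dp_table` is a hypothetical helper, and a narrower type is guaranteed to save memory but not guaranteed to run faster.

```python
import numpy as np


def dp_table(a, b):
    """Allocate an LCS table using the narrowest unsigned dtype that cannot overflow.

    An LCS is never longer than the shorter input, so min(len(a), len(b))
    bounds every cell of the table.
    """
    bound = min(len(a), len(b))
    dtype = np.min_scalar_type(bound)  # e.g. uint8 when bound <= 255
    return np.zeros((len(a) + 1, len(b) + 1), dtype=dtype)


small = dp_table("x" * 100, "y" * 1000)
large = dp_table("x" * 1000, "y" * 1000)
print(small.dtype, small.nbytes)  # uint8 101101
print(large.dtype, large.nbytes)  # uint16 2004002
```

Picking the type from a proven bound, rather than hard-coding `uint8`, keeps the table from silently wrapping around when inputs grow past 255 characters.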
@user-zi2zv1jo7g 1 month ago
@@chkone007 OK, I get the point, but there's a lot of production code written in Python. Most code doesn't require performance, and for the few bits that do, you can write a C extension or simply use C++ and Python together.
@chkone007 1 month ago
@@user-zi2zv1jo7g I kind of strongly disagree. Have you ever experienced a slow UI, a stuttering app, a lagging game...? If yes, you've already met a programmer who said "most code doesn't require performance". Saying code does not require performance just means you consider your time more valuable than the user's time. As developers we don't own time; the time is not ours, it's the user's time. That's what makes the difference between a smooth app and slow, memory-heavy software, like everything web-based: Slack, etc., and all the Chromium stuff. Most devs said "it's just a chat app, I don't need C++, just make it Chromium-based." Consequences... My Mac/PC uses 8 GiB for doing nothing, just running a VM. And from an industrial point of view, you can release your startup with Python code and say "oh, I don't care, it's CUDA under the hood". You just expose yourself to a competitor who implements their stuff in C++/CUDA directly, and this competitor will explode their profitability because their AWS bill will be much cheaper. We always require memory-efficient and fast code. If none of those arguments convince you, consider the CO2 argument: it's more eco-friendly for your PC, your server, or the N instances of your program running on AWS. I love Python for prototyping and accelerating my exploration of ideas, but I cannot be serious with it for my clients. I know lots of "AI startups" are like that: download the model from the researcher, create a Docker image, build a website => step 2 => profit. Most of them rely on Python, but any competitor with cheaper infrastructure can scale more and be more efficient. I have Facebook in mind: developed in PHP, fine, cool, but at the beginning each new user cost more than the previous one... FB wasn't able to scale. They created the "HipHop" compiler from PHP to C++, and then the company became profitable: each new user became cheaper than the previous one. Conclusion => performance always matters.
Don't get me wrong, that doesn't mean I over-engineer everything to save 1 byte or 1 picosecond in the median. But keep in mind the quote "early optimization is the root of evil" was written at a time when everybody was writing C and assembly code... Code is different today with Python, JavaScript, ... "early non-optimization is the root of evil".
@famaral42 8 months ago
Thanks for the analysis, I got motivated to look at Numba and Cython more carefully. Taichi looked cool, but not having it in the Anaconda repo is a negative point for me. Have you tried running this code with Torch?
@dougmercer 8 months ago
Oh interesting, I didn't realize Taichi wasn't on conda-forge. I wonder if they'd accept a PR 🤔. For what it's worth, you can pip install it (and that's possible even if you're using an environment.yml). I did not try Torch, but I suspect it would be very slow. The reason is that the main use case for Torch is parallel computing via tensors. Since this problem is inherently not parallelizable, my guess is it'd be super slow in Torch.
@famaral42 8 months ago
@@dougmercer Thx for the insights
@Iejdnx 2 months ago
5k subs? I swear I thought you had like 1 million because of how good this video was. I'm subscribing
@dougmercer 2 months ago
Thanks =] I appreciate it. It's been a slow grind, but the past few days the algorithm has blessed me with some impressions, so I hope it keeps going 🤞
@cmleibenguth 7 months ago
Interesting results!
@dougmercer 7 months ago
Thanks! I was surprised too
@mariuspopescu1854 2 months ago
So, I'm not a big Python guy, so I was curious. I repeated your experiment for C++ vs. Numba. The only real difference: for the C++, I rewrote it a bit (used auto and changed the indexing to be more C-like), and I wrote the function as a template in which the sizes m and n were the template parameters. This allowed me to change from a vector to a stack-allocated array, the main benefit, I believe, being that the whole memory is contiguous, which allows better caching. The C++ version was about 1.5x faster than Numba on my machine. I really enjoyed this video, though! It made me question my biases, and I think there's a lot to be said for letting compilers/optimizers do the thinking for you. I think this was really insightful, and I think I'm gonna give Numba a go for many of my future quick projects.
@dougmercer 2 months ago
Oh, that's awesome! I think that's the fastest anyone has gotten it so far! Someone else in the comments encouraged me to try a 1D vector of size (m+1)(n+1) and index into it with arithmetic; that gave me a roughly 1.1-1.2x speedup over the original C++. So I guess much of the remaining speedup came from data locality. Very cool that it was another ~0.3x boost. I'm glad you found the video interesting =]
@roshan7988 11 months ago
Great video! Super underrated channel. Love the graphics
@dougmercer 11 months ago
Thanks Roshan! Means a ton to hear that =]
@atharv9924 7 months ago
@Doug: Your channel's popularity should be at least 100x more!!!
@dougmercer 7 months ago
Thanks so much! Fingers crossed the channel does grow 100x 🤞. At that point I could probably make videos full time 🤯
@JohnMitchellCalif 2 months ago
Interesting and useful! Subscribed.
@dougmercer 2 months ago
Thanks! And welcome =]
2 months ago
Very useful. A quick question: what was the optimization level for compiling the C++ code? It can really make a difference.
@dougmercer 2 months ago
I used -O3. Another commenter recommended using a 1D array and handling indexing through arithmetic, and that does speed up the C++ by about 1.1-1.2x (still pretty similar to the ndarray approach from Taichi). Here's the C++ code and build script if you want to play around with it yourself =] gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3
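The 1D-buffer trick mentioned above can be sketched in Python too: one flat list of size (m+1)(n+1), with cell (i, j) stored at index i*(n+1) + j. This is our illustration of the indexing arithmetic, not the C++ in the gist; in plain CPython it is not expected to be faster, since the benefit of one contiguous allocation shows up in compiled code.

```python
def lcs_flat(a, b):
    m, n = len(a), len(b)
    width = n + 1                 # row length of the logical (m+1) x (n+1) table
    dp = [0] * ((m + 1) * width)  # one allocation instead of m+1 separate rows

    for i in range(1, m + 1):
        row, prev_row = i * width, (i - 1) * width  # start offsets of rows i and i-1
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[row + j] = dp[prev_row + j - 1] + 1
            else:
                dp[row + j] = max(dp[prev_row + j], dp[row + j - 1])
    return dp[m * width + n]


print(lcs_flat("ABCBDAB", "BDCABA"))  # 4
```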
@user-by8fp5uw2o 2 months ago
Consider using Golang if you want speed + simple to learn (mostly, ofc). Python is fantastic at some tasks, but if you're really trying to get the best of both worlds (fast to write and fast to run), then Golang could be a great fit
@dougmercer 2 months ago
I do plan to do a project in Go sometime soon
@khawarshehzad487 11 months ago
Amazing content, engaging presentation and, sadly, an underrated channel. Subbed!
@dougmercer 11 months ago
Thanks so much! Be sure to share with friends/coworkers you think might enjoy this, and hopefully the channel will grow over time 🤞
@khawarshehzad487 11 months ago
@@dougmercer Keep up the good work, it sure will 🙌
@ianposter2161 5 months ago
Hey, thanks for an amazing video! Which one would you suggest so that I can just take my regular Python code with dataclasses and get a performance boost with no tweaks whatsoever?
@dougmercer 5 months ago
Thanks for watching! =] I'd try mypyc first. The others are way more disruptive and would probably require changes to your code.
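mypyc compiles ordinary type-annotated modules, so the main requirement is that the code type-checks with precise annotations. A sketch of what such a module might look like, with a dataclass since the question mentions them; the file name `lcs_typed.py` is hypothetical. It runs unchanged under CPython, and running `mypyc lcs_typed.py` would build it as a C extension that a later `import lcs_typed` should pick up.

```python
# lcs_typed.py: plain Python with precise type hints (hypothetical module name).
from dataclasses import dataclass


@dataclass
class Alignment:
    a: str
    b: str
    length: int


def lcs(a: str, b: str) -> int:
    m, n = len(a), len(b)
    prev: list[int] = [0] * (n + 1)  # previous row of the DP table
    for i in range(1, m + 1):
        curr: list[int] = [0] * (n + 1)
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[n]


def align(a: str, b: str) -> Alignment:
    return Alignment(a, b, lcs(a, b))


if __name__ == "__main__":
    print(align("ABCBDAB", "BDCABA"))  # Alignment(a='ABCBDAB', b='BDCABA', length=4)
```

Because the source stays valid Python, the compiled and uncompiled versions can be benchmarked side by side without touching the calling code.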
@ianposter2161 5 months ago
@@dougmercer Thanks for your answer! I was thinking about something. Nowadays we almost always use type hints because they are great, but only for clarity and for type checkers like mypy. So we are not getting any performance benefit out of them, although I think we could! Cython translates Python to C and forces us to write statically typed Python for that, which type hints could also be used for... Turns out that Cython supports type hints as well! Then we have stuff like MonkeyType that lets us automatically type-hint code based on runtime behavior. Nice for annotating legacy code.
1) We write Python code with type hints
2) If needed, apply MonkeyType to add them everywhere
3) Compile with Cython
4) Get C-like performance
I wonder why this isn't actually practiced. Do you have any idea?
@dougmercer 5 months ago
Mmm, for using type hints to achieve better performance through compilation, I think there's a high-level design question: should your code (1) look/feel like vanilla Python, (2) are you OK with using non-standard Python features, or (3) are you willing to use syntax that only works in your special language, as long as it still vaguely resembles Python and interoperates with it? I think mypyc is the closest to achieving the goal of speeding up vanilla Python. Cython's pure Python mode is pretty OK, but you need to add extra metadata to make it performant (e.g., the locals decorator). Cython also has its own type system rather than using Python's built-in types (e.g., cython.int vs. int). Cython as a language (in non-Python mode) isn't really Python any more, but interoperates with it well. Some other languages (e.g., Mojo) claim to have a "Python-like" syntax and support interacting with Python, but the code isn't really Python.
@ianposter2161 5 months ago
@@dougmercer Yeah, it would be amazing if we could just write vanilla Python with standard type hints and compile it with Cython. Apparently Cython somewhat supports it. YouTube blocks my comment if I paste a link, but you can search this on Google: "Can Cython use Python type hints?" Today type hints are everywhere and we don't get any performance benefit out of them at all, which feels weird.
@dougmercer 5 months ago
It's hard to say. When I was experimenting with this problem, I remember not observing any speedup when adding vanilla Python type hints, and it wasn't until I started adding things like the @locals decorator that I really noticed any improvement. Let me know if you do any testing that shows a meaningful speedup!
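For reference, here is a hedged sketch of the kind of Cython pure-Python-mode code described above, using the `@cython.locals` decorator to give the loop variables C integer types. The `try/except` shim is our own addition so the file still runs as plain Python when Cython isn't installed; it is not part of Cython, and the sketch makes no claim about how much speedup compiling it would give.

```python
try:
    import cython  # real Cython: decorators are no-ops until the file is compiled
except ImportError:
    class cython:  # minimal stand-in providing only what this sketch uses
        int = int

        @staticmethod
        def locals(**_types):
            return lambda func: func


@cython.locals(m=cython.int, n=cython.int, i=cython.int, j=cython.int)
def lcs(a: str, b: str) -> int:
    m, n = len(a), len(b)
    prev = [0] * (n + 1)  # previous row of the DP table
    for i in range(1, m + 1):
        curr = [0] * (n + 1)
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[n]


print(lcs("ABCBDAB", "BDCABA"))  # 4
```

The decorator is the "extra metadata" that plain type hints don't provide: it tells Cython which locals can become C variables once the module is compiled.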
@lapppse2764 2 months ago
10:48 I think it would be nice to indicate on the left that lower is better (I've usually seen it done in benchmarks). Thank you for the video! About C++, I think you might have used SIMD instructions.
@dougmercer 2 months ago
Good point, I definitely could have made the metrics' interpretation clearer. As for SIMD, it's hard to parallelize this because it's an inherently serial problem (everything requires previous solutions).
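The serial dependence is visible in the standard LCS recurrence, written here with 1-based strings $a$ and $b$ of lengths $m$ and $n$ (our notation, not the video's). Cell $(i, j)$ needs $(i-1, j-1)$, $(i-1, j)$, and $(i, j-1)$, so a row cannot be filled before the previous row, and each cell in a row waits on its left neighbour:

```latex
L_{i,j} =
\begin{cases}
0, & i = 0 \text{ or } j = 0,\\
L_{i-1,j-1} + 1, & i, j > 0 \text{ and } a_i = b_j,\\
\max\left(L_{i-1,j},\; L_{i,j-1}\right), & i, j > 0 \text{ and } a_i \neq b_j.
\end{cases}
```

The answer is $L_{m,n}$.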
@abhisheks5882 9 months ago
This channel is a hidden gem
@dougmercer 9 months ago
Thanks 💎 =]
@BaselSamy 3 months ago
Wonderful video, even for a beginner like myself! I wonder if you could share the animation tool you used? I feel it would be awesome for my presentations :))
@dougmercer 3 months ago
Thanks! I primarily used DaVinci Resolve, but used the Python library `manim` (community edition) for the code animations.
@BaselSamy 3 months ago
Thanks! @@dougmercer
@abc_cba 1 month ago
If you don't keep uploading content consistently, you'd be committing a felony. Subbed!!
@dougmercer 1 month ago
I'm gonna try! Hahaha. Thanks for subbing =]
@ButchCassidyAndSundanceKid 5 months ago
Was your Taichi arch based on CPU or GPU when you carried out the benchmark testing?
@dougmercer 5 months ago
The LCS dynamic program was on CPU. The visualization I showed at the beginning of the section, a kind of warping fractal, was on GPU.
@ButchCassidyAndSundanceKid 5 months ago
@@dougmercer Thanks. Taichi certainly looks promising, but I still prefer Numba for its simplicity, i.e., adding a couple of decorators without altering the code too much. Have you tried Spark and Dask? They're both parallel programming libraries.
@dougmercer 5 months ago
Yup, both are great! Since this problem couldn't be easily parallelized, I didn't mention them. And I agree, in general Numba will be easier than Taichi by a long shot. I just thought Taichi was kind of neat, so I included it in the video ¯\_(ツ)_/¯
@ivolol 11 months ago
Would be interested to see what PyPy and Nuitka do for it as well.
@dougmercer 11 months ago
If this video ends up getting some more views, maybe I'll do another pass at adding other options. I have a *guess*, though... PyPy would speed this up significantly, probably on par with Numba. I've heard good things about it, *but* it didn't install on the first try when using conda on my M1 Mac, so I skipped it ¯\_(ツ)_/¯ Nuitka would only speed things up a little bit. From what I've read, Nuitka is more about compatibility (it supports *all* Python language constructs) and about making standalone, portable builds. For Nuitka, speed is secondary to those concerns.
@Daekar3 2 months ago
I feel like this is one reason why my PC is literally god-tier compared to what I went to college with, but the day-to-day experience really isn't any different. My games are prettier and my SSD is bigger, but the mechanics of using the OS are NOT orders of magnitude better.
@etiennetiennetienne 2 months ago
There are also ways to write C++ directly in Python, I think, for instance cppyy or Torch extensions
@dougmercer 2 months ago
True! Through C/C++ extension libraries, you can directly write/link C/C++ libraries and write your own Python interface to them. cppyy, ctypes, cffi, pybind11, and Cython are all fair game for this.
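As a tiny illustration of the ctypes route (calling an existing compiled C function from Python), this sketch calls the C standard library's `abs`. It assumes a POSIX system such as Linux or macOS; ctypes ships with the standard library, unlike cppyy, cffi, or pybind11.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If find_library returns None, CDLL(None) on
# Linux/macOS falls back to the symbols already loaded into the process (libc included).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature `int abs(int)` so ctypes converts arguments and results correctly.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))  # 42
```

For your own C/C++ code the pattern is the same: compile it into a shared library, load it with `ctypes.CDLL`, and declare `argtypes`/`restype` for each function you call.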
@RobertLugg
@RobertLugg 2 месяца назад
How did you make those amazing looking bar charts?
@dougmercer
@dougmercer 2 months ago
Hah, *very carefully* in DaVinci Resolve (Fusion page) =P I manually drew the graph using rectangles, then applied noise + displace to make it more irregular, faded it out with noise + the "painterly" effect from Krokodove to give it the watercolor appearance, and added a paper texture and lens blur. One of my favorite animations I've made =]. Thanks for commenting on it
@thesnedit5406
@thesnedit5406 2 months ago
The theme, info, ambience and the whole vibe of the video is so good. Subscribed!
@dougmercer
@dougmercer 2 months ago
That's like the best compliment =] thanks!
@valdarbien3252
@valdarbien3252 9 months ago
Nice work Doug, keep it going. In addition to all the Python accelerators you described, there are now much better options like Julia. The following two lines of high-level code are 650X faster than Python and 5.8X faster than C++. The base version without the Tullio.jl package is also 1.43X faster than C++:

    function lcs(a, b)
        dp = Matrix{Int32}(undef, length(a)+1, length(b)+1)
        @tullio dp[i+1, j+1] = a[i] == b[j] ? dp[i, j] + 1 : max(dp[i, j+1], dp[i+1, j])
        return last(dp)
    end

Look how nice that line is! No loops, no ifs, nothing; the mathematical formula as is in code.
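For comparison, here is a sketch of the same recurrence written as plain Python loops. Like the Julia snippet, it assumes the problem is the length of the longest common subsequence. Loops like this are slow in CPython, and they are exactly the kind of code Numba or Cython are meant to compile. `lcs_length` is a hypothetical name.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (full DP table)."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[:i] and b[:j]. Row 0 and column 0 stay 0
    # because an empty prefix shares nothing with the other string.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                # Matching characters extend the diagonal neighbour's LCS.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # Otherwise drop one character from either string.
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


print(lcs_length("ABCBDAB", "BDCABA"))  # prints 4
```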
@dougmercer
@dougmercer 9 months ago
That is nice! I haven't played with Julia since undergrad, but it's definitely a cool language! I'll have to find an excuse to dive back into it!
@MrNolimitech
@MrNolimitech 2 months ago
Once you reach a 100x speedup, I don't think it really matters that you can do better (except maybe in some cases). Most of the time, it's only because the code is wrong. People who are new to Python (or even pros) think it's slow because they heard it somewhere. But in fact, it's only because they can't write better code. They duplicate everything. They initialize the same thing multiple times. They repeat themselves. They use multiprocessing or threads with one huge function (method) that does everything inside, instead of separating things and using the CPU/GPU for specific calculations. These are good libraries, but I hope people will try to optimize their code with better lines before using them.
@dougmercer
@dougmercer 2 months ago
I agree! there is usually a lot of room to make your algorithm/implementation better
@overbored1337
@overbored1337 1 month ago
Python is super slow by default. The only skill issue is actually the choice of Python when performance matters, because it was never designed for speed, or power draw, and optimizing it goes against its fundamentals. If it does not fit, as is, then use another language instead of a shoehorn.
3 months ago
Nice. Thanks!
@dougmercer
@dougmercer 3 months ago
No prob! Glad it was helpful
@user-up8fm3vb1r
@user-up8fm3vb1r 1 month ago
Amazing work, as someone who has to use python against my will, I enjoy your videos
@dougmercer
@dougmercer 1 month ago
Thanks =]. What's your preferred language if Python is against your will?
@user-up8fm3vb1r
@user-up8fm3vb1r 1 month ago
@@dougmercer Haskell is my love, and I like lambda calculus, so I am writing an interpreter and compiler for my own LC implementation for fun (in Haskell).
@dougmercer
@dougmercer 1 month ago
@@user-up8fm3vb1r Very cool. I haven't touched Haskell much, but I've been learning OCaml for fun recently and enjoying it
@user-up8fm3vb1r
@user-up8fm3vb1r 1 month ago
@@dougmercer glad to see you join the functional land.. enjoy!!
@IamusTheFox
@IamusTheFox 2 months ago
I'm enjoying the video; serious question though. How can a JIT be faster than C++? Did you have the C++ optimizer on? Never mind, I found a comment where you said that you used -O3. Great work. I feel like anyone who complains about your C++ isn't being fair. While I may have done it another way, it's valid
@dougmercer
@dougmercer 2 months ago
Probably means that I left some performance on the table in the C++, or the JIT pulled some tricks that most people wouldn't pull when writing it natively. Someone else in the comments found that using a flat 1D array gave the C++ a 1.1-1.2x speedup. That probably puts it on par with the Numba/Taichi ndarray approaches That said, the point of the video still stands-- for at least this particular problem, there are several approaches for getting performance on par with native C++
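Here is a sketch of the flat 1D layout: the (n+1) x (m+1) table lives in one buffer, and cell (i, j) is found at index `i * width + j`. It is written in Python for readability and again assumes the problem is LCS length. In C++, the equivalent change would be replacing a vector of vectors with a single `std::vector<int>`, which keeps the whole table contiguous in memory. The function name is hypothetical.

```python
def lcs_length_flat(a: str, b: str) -> int:
    """LCS length with the DP table stored as one flat, row-major buffer."""
    n, m = len(a), len(b)
    width = m + 1                   # number of columns in each logical row
    dp = [0] * ((n + 1) * width)    # one allocation instead of n + 1 rows
    for i in range(1, n + 1):
        row = i * width             # start of row i in the flat buffer
        prev = (i - 1) * width      # start of row i - 1
        for j in range(1, m + 1):
            if a[i - 1] == b[j - 1]:
                dp[row + j] = dp[prev + j - 1] + 1
            else:
                dp[row + j] = max(dp[prev + j], dp[row + j - 1])
    return dp[n * width + m]
```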
@IamusTheFox
@IamusTheFox 2 months ago
Absolutely! Fantastically well done. I'm really quite impressed by what you did.
@dougmercer
@dougmercer 2 months ago
Thanks =]
@MaxShapira2real
@MaxShapira2real 11 months ago
You should put out an advanced Python course. Great job buddy!
@dougmercer
@dougmercer 11 months ago
Maybe one day! Thanks Max!
@AndersonPEM
@AndersonPEM 1 month ago
[Tries with Rust] the result shows up before you even start the program 😂
@aangtonio5570
@aangtonio5570 1 month ago
Thank you Doug for this awesome video! By the way, just curious: has anyone tried some of this with Pygame? I know Python is not a common language in the video game industry, but maybe some of this could do it some justice (and bring some good surprises).
@dougmercer
@dougmercer 1 month ago
You can definitely use Cython or Numba to help speed some things up with pygame. I found a few old reddit threads that included demos and discussions by searching "Numba pygame reddit".
@BrunoGallant
@BrunoGallant 2 months ago
Great production value. Thanks for the tips. Grumpy Linux sysadmin here, definitely does not want to learn C++. With good speed, Python is perfect.
@dougmercer
@dougmercer 2 months ago
Glad it was helpful! And definitely, I'm a big fan of "good enough" speed, and generally I can get that with Python
@PySnek
@PySnek 1 month ago
What about Nim?
@ThatJay283
@ThatJay283 2 months ago
with the c++ version, did you compile it with -O3 optimisations enabled?
@dougmercer
@dougmercer 2 months ago
Yup! gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3
@ThatJay283
@ThatJay283 2 months ago
@@dougmercer thanks! I just managed to get it 169% faster (see fork). Still, the speed improvements offered by numba, pyx, and taichi are really impressive :)
@dougmercer
@dougmercer 2 months ago
Very cool! Yesterday I implemented the 1D index approach (not nearly as cleverly-- just hand jammed the indexing arithmetic in line) and I got about 1.1-1.2x speed up. Does the noexcept make a difference in performance? Or is there something else causing the extra 0.4ish speed up 🤔
@lchunleo
@lchunleo 9 months ago
Good work
@dougmercer
@dougmercer 9 months ago
Thanks =]
@legion_prex3650
@legion_prex3650 2 months ago
Love your channel! Nice '80s sound!
@dougmercer
@dougmercer 2 months ago
Thanks! I had fun choosing music for this one =]
@system64_MC
@system64_MC 3 months ago
What happens if you use the -O2 or -O3 optimisation flag for the C++ implementation?
@dougmercer
@dougmercer 3 months ago
I did compile with -O3 for my C++ test
@dougmercer
@dougmercer 3 months ago
gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3
@system64_MC
@system64_MC 3 months ago
@@dougmercer Oh, you did. It's surprising that Python can be faster than C++!
@dougmercer
@dougmercer 3 months ago
Definitely surprising! That said, I'm sure someone could write faster C++! But, it did beat my first attempt at translating the code into C++ ¯\_(ツ)_/¯
@timlambe8837
@timlambe8837 7 months ago
Really interesting video. I'd love to learn more about it. Maybe I will be laughed at for this statement, but even with this video I feel like bringing Python to C-level performance is quite an effort. Isn't it worth learning C/C++ for special tasks? How would you evaluate the developer experience of „Make everything possible with Python“ versus „Learning C/C++ or Rust“? Thanks a lot!
@dougmercer
@dougmercer 7 months ago
You're right! It's not easy to get C++ performance in Python. I think these tools are appropriate when there are a few "hot spots" in your code, but the majority of your application benefits from Python's ecosystem. It's possible to directly build C extensions and call them from python, but I think these tools are way easier. For some (new) projects, it might make sense to write the whole thing in Rust from the start. In practice, most of my projects use a lot of Python libraries, and my team is not very flexible (they mostly only know Python), so it'd be pretty disruptive if I wrote a critical component in a different language and with different tooling. Good question! (Sorry I don't have a good answer =P)
@timlambe8837
@timlambe8837 7 months ago
@@dougmercer that is indeed a good answer, thanks. Since I work in the data analysis field (geospatial), I love Python for its possibilities. I was wondering if it makes sense to learn another language like C++ for intensive calculations. But I think I will try your tools 😊 Many thanks!
@helkindown
@helkindown 2 months ago
Great video! From what I've tested, your C++ code is good enough. The main bottleneck of your code seems to be the dp result variable. I was able to double the speed (from 3.78832 to 1.77546 seconds) by replacing the dp 2D array with two 1D arrays, a "current row" array and a "previous row" array, and swapping references at each iteration. This is probably because the code has fewer cache misses when it isn't fetching new rows of the dp array, which are filled with zeros anyway. I did not test this with the Python code, but the same speedup should be obtainable by using two variables (or a tuple of 2 arrays) to keep up with C++.
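Here is a Python sketch of the two-row idea, assuming the task is LCS length. Only the previous and current rows are kept, and the two lists are swapped by reference after each row instead of being copied. The function name is hypothetical, and no timing claim is made for the Python version.

```python
def lcs_length_two_rows(a: str, b: str) -> int:
    """LCS length keeping only the previous and current DP rows."""
    m = len(b)
    prev = [0] * (m + 1)  # row i - 1 of the full table
    curr = [0] * (m + 1)  # row i; index 0 is never written, so it stays 0
    for i in range(1, len(a) + 1):
        ai = a[i - 1]
        for j in range(1, m + 1):
            if ai == b[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        # Swap references instead of copying: the old previous row becomes
        # scratch space that the next iteration overwrites.
        prev, curr = curr, prev
    # After the final swap, prev holds the last computed row.
    return prev[m]
```

Keeping two rows cuts memory from O(nm) to O(m), but it only yields the length. Recovering the subsequence itself needs the full table or a divide-and-conquer method such as Hirschberg's algorithm.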
@dougmercer
@dougmercer 2 months ago
Good point! I may have to re-run this experiment at some point-- I wonder how Numba/cython would perform with that more memory efficient approach 🤔
@OliverBatchelor
@OliverBatchelor 3 months ago
Taichi for the win. You didn't even use GPU programming with it, which is all I do - the inter-op with torch is excellent and works the same way as the ndarray.
@dougmercer
@dougmercer 3 months ago
Taichi was super fun. I did use GPU (well, metal) for rendering the fractal animation. Was pleasantly surprised at how easy it was.
@OliverBatchelor
@OliverBatchelor 3 months ago
@@dougmercer Sorry, that possibly came out the wrong way. I meant that you did a great job demonstrating it *even without* using the GPU!
@dougmercer
@dougmercer 3 months ago
Oh, I see now-- hah! Thanks =] I definitely would like to try using Taichi for an ML project. Taichi + Torch seems like a great fit. Do you have any open source projects you've done with it? (I have skimmed through the docs section involving torch, but haven't looked at real projects). I also thought it might be fun to make a "shader" to process video (but I can't for the life of me figure out how to extend Davinci Resolve with Python code, so that's kind of an unrelated blocker).
@OliverBatchelor
@OliverBatchelor 3 months ago
@@dougmercer Yep! A few now. Most of them are for bits and pieces I do at work and are largely undocumented, e.g. an HDR camera ISP pipeline or a spatial subdivision grid for distance queries. By far the biggest one so far is a Taichi library for Gaussian Splatting rasterization, which I called taichi-splatting (distinct from the original taichi_3d_gaussian_splatting, which it was originally derived from but is very different from now!). It has a few rough edges, but I think it has enabled quite a clean yet performant implementation. I replied yesterday, but my comment is nowhere to be seen, I think because I put a link in it, so I haven't this time! I must admit that before watching this video I did not realise that the CPU implementation in Taichi performed so well, especially with the outer loop serialised!
@dougmercer
@dougmercer 3 months ago
Oh right! I saw your posts in the Discord. I read through the readme a bit. It looks very interesting-- I'll take another look at the source code sometime tomorrow. And sorry about the link issue! For some reason it's not showing up in comments held for review on the mobile app. I'll check in a browser tomorrow and hopefully approve it (if not, YouTube totally ate it-- sorry)
@varunbhaaskar3338
@varunbhaaskar3338 1 month ago
How many of them are production ready? Is there anything like this that is production ready?
@dougmercer
@dougmercer 1 month ago
I would say Cython and Numba are definitely "production ready"
@imadlatch7206
@imadlatch7206 3 months ago
We just use PyPy as the interpreter, no need for anything else
@dougmercer
@dougmercer 3 months ago
Yeah, pypy is a great option
@Caspar__
@Caspar__ 2 months ago
But most of the time I use Python libraries. Can I just-in-time compile those as well?
@dougmercer
@dougmercer 2 months ago
I do not believe any of these options will compile or JIT third-party libraries. If I'm wrong, hopefully someone will correct me. That said, you can try using a different Python interpreter altogether. PyPy would JIT whatever code it runs (but you need to use the PyPy interpreter instead of CPython)
@Caspar__
@Caspar__ 2 months ago
@@dougmercer Thanks a lot : )
@janAkaliKilo
@janAkaliKilo 2 months ago
Another option: learn Nim. It is an easy-to-learn language with a Pythonic syntax. Because Nim is a compiled language, its speed is on par with C, C++, and Rust.
@dougmercer
@dougmercer 2 months ago
I've been meaning to give it a shot... It definitely seems very approachable
@Petch85
@Petch85 7 months ago
Great video. I will give numba a try... I use numpy all the time, and that is super fast for my work. But I always end up needing to plot some numbers and save them as a PNG file or something. I use matplotlib, and most of the time I can read and manipulate my data in less than 0.1 sec. But then making the plot takes maybe 1 sec, and saving the PNG file also takes 1 sec. Is there anything I could do? (I have more than one data file and need more than one plot saved... I know 3 sec does not seem like a long time, but it adds up)
@dougmercer
@dougmercer 7 months ago
Hmm, I don't have any sure-fire recommendations. Could potentially try using multiprocessing if your plotting function is easy to map over an iterable of inputs? That way you can maybe speed up by the number of cores your CPU has.
@lbgstzockt8493
@lbgstzockt8493 2 months ago
Are you showing the plot? There is a way to not show the plot windows but still save to a file, it is still slow but much less than two seconds.
@Zeioth
@Zeioth 2 months ago
I'm missing nuitka on that comparison, but very cool.
@dougmercer
@dougmercer 2 months ago
I've never tried it! Does it work well? I'll have to mess with it sometime 🤔 That said, I am working on a video where I cover one library that I wanted to include in this video (PyPy).
@cleteblackwell1706
@cleteblackwell1706 2 months ago
Can you do these kinds of comparisons for building flask apps?
@dougmercer
@dougmercer 2 months ago
Hmm, what specifically did you have in mind? As an aside, I typically use FastAPI for Python web projects, but have used Flask in the past
@cleteblackwell1706
@cleteblackwell1706 2 months ago
Either is fine. Maybe an api that calls a couple other APIs and reads from a database. That would be your typical business api.
@budidarmawan6959
@budidarmawan6959 2 months ago
this is a very nice video.
@dougmercer
@dougmercer 2 months ago
Thanks =]
@ethan91372
@ethan91372 2 months ago
4:00 where do you get this footage?
@dougmercer
@dougmercer 2 months ago
Storyblocks
@nathan22211
@nathan22211 2 days ago
I feel like you could get similar performance using lupa + lua_importer or nimporter/nython. Both Lua and Nim are similar in difficulty to Python, though I think Nim is somewhat like Rust when it comes to how you code in it.
@dougmercer
@dougmercer 2 days ago
This is my first time hearing about either of those. Very interesting 🤔
@jimmysaxblack
@jimmysaxblack 3 months ago
fantastic thanks a lot
@dougmercer
@dougmercer 3 months ago
Glad it was helpful =]
@BobbyMully
@BobbyMully 2 months ago
75% of use cases you run into, it'll be fine to just use Python.
@dougmercer
@dougmercer 2 months ago
Definitely agree
@nevokrien95
@nevokrien95 1 month ago
More like 90%... I am trying to find an excuse to use C, and it's actually very hard to find something that doesn't already have optimized code for you
@dougmercer
@dougmercer 1 month ago
@@nevokrien95 same-- I need to carve out time to learn Go this year but have literally no reason to do so ¯\_(ツ)_/¯
@nevokrien95
@nevokrien95 1 month ago
@dougmercer I am using it to write a proxy server that switches between VPN connections. Go lets you do networking stuff Python just can't.
@rm9050
@rm9050 6 months ago
Is Taichi useful for loading CSVs like pandas? I discovered Dask and it's fantastic
@dougmercer
@dougmercer 6 months ago
Hmm, I might be wrong, but I don't believe Taichi has any filesystem support. I believe the simple thing to do would be to read data in Python and pass it to Taichi for processing. That said, I love Dask and Pandas! They rock!
@stereoplegic
@stereoplegic 2 months ago
Polars is faster than Pandas with almost identical API, right?
@dougmercer
@dougmercer 2 months ago
Yes, it is. I'm actually working on a video that talks about trying to read a very large CSV file and do some basic number crunching with it. (The one billion rows challenge, 1brc, but in Python) Spoiler alert, Polars and Duckdb are great choices.
@incremental_failure
@incremental_failure 2 months ago
Polars is by far the fastest to load CSV. It might even be faster when you load in polars and convert to pandas.
@arta6183
@arta6183 5 months ago
Can you also share the C++ code? It's very easy to write slow C++ code. If the code involves vectors, then AVX optimizations can drastically improve performance on x86 CPUs.
@dougmercer
@dougmercer 5 months ago
Hey @arta6183 - Sure! Here's a link to the code and compile command in a gist - gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3 Note-- this algorithm is inherently *not* parallelizable unless you do some really wonky stuff (wave front optimization). So, I'm not sure if AVX will help. That said, I would love to see you squeeze 10x more performance out of it and share a gist back to me. Like I said in the video -- I only know the absolute basics of C++, so my C++ code is *bad*.
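The wavefront idea can be made concrete with a sequential sketch that fills the table one anti-diagonal at a time, again assuming the task is LCS length. It does not run anything in parallel. It only shows why every cell on one anti-diagonal can be computed independently, which is the property a parallel or SIMD implementation would exploit. The function name is hypothetical.

```python
def lcs_length_wavefront(a: str, b: str) -> int:
    """LCS length, filling the DP table one anti-diagonal (i + j = d) at a time."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    # Cell (i, j) reads (i-1, j-1), (i-1, j) and (i, j-1), which all lie on
    # earlier anti-diagonals (smaller i + j). Cells sharing the same d
    # therefore never depend on each other.
    for d in range(2, n + m + 1):
        i_lo = max(1, d - m)  # keep j = d - i <= m
        i_hi = min(n, d - 1)  # keep j = d - i >= 1
        # This inner loop is the part a parallel runtime could split across
        # threads or vector lanes; here it simply runs sequentially.
        for i in range(i_lo, i_hi + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]
```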
@Big_bangx
@Big_bangx 3 days ago
What about optimizing your C++ implementation instead to go faster ?
@marcelobravo3074
@marcelobravo3074 5 months ago
this is gold
@dougmercer
@dougmercer 5 months ago
Thanks! Glad you liked it =]
@sootguy
@sootguy 2 months ago
what about pypy?
@dougmercer
@dougmercer 2 months ago
I'm working on a video that uses it right now =]
@markkim5117
@markkim5117 4 months ago
WOW I'm impressed!
@dougmercer
@dougmercer 4 months ago
Thanks! =]
@Uveryahi
@Uveryahi 2 months ago
Came for the video, stayed for the stock footage inserts x)
@dougmercer
@dougmercer 2 months ago
=] I also used Nosferatu in my other video called "Your code is almost entirely untested"... I wonder what it means that I keep putting horror movie clips into my Python explainers 🤔
@dudaseifert
@dudaseifert 1 month ago
If it ran faster than your c++ code, there is a problem with your c++ code. It's basically impossible to run faster
@Angel33Demon666
@Angel33Demon666 1 month ago
How does this compare with Julia? I found that it's fast right out of the box
@dougmercer
@dougmercer 1 month ago
I didn't try Julia, but I've used it a bit in the past and it is quite fast. In a future video, I'd like to throw Julia and Nim into the mix
@mayankmaurya8631
@mayankmaurya8631 2 months ago
Ideally, C++ can't be slower than any implementation in any language, because in C++ you can literally write hardware-level controls. What I'm saying is that your C++ code was not very well written. The machine code that taichi or numba produced could also be produced by C++, so it was not a good comparison.
@dougmercer
@dougmercer 2 months ago
@mayankmaurya8631, I think you missed the point of the video. For a Python developer trying to accelerate a hot spot in their code, numba, cython, and taichi were found to be just about as fast as C++. So, rather than hand-writing C++ and complicating their build system, they can pip install numba and get just as good performance for very little work.
@vonnikon
@vonnikon 1 month ago
"Hand writing C++"? Python is not "hand written"? Most of the fast Python solutions presented in the video resulted in messy code and/or dependency/compatibility issues. C++ has none of those problems. I suppose that's the real point of the video. Can you make Python run fast? Yes, but it is easier and more maintainable to get the same result using C++.
@dearheart2
@dearheart2 1 month ago
I wish all videos (not just YouTube) had voice and music as separate channels. I hate music in educational videos.
@francescotomba1350
@francescotomba1350 2 months ago
Did you compile with -O3 in c++?
@dougmercer
@dougmercer 2 months ago
Yup! gist.github.com/dougmercer/1a0fab15abf45d836c2290b98e6c1cd3 Some people in comments have gotten between 1.1-1.7x speed up through other improvements, but it doesn't really change the narrative much: these compiled Python tools frequently give good enough performance
@francescotomba1350
@francescotomba1350 2 months ago
@@dougmercer thank you! I think it's really problem-dependent. In some codebases I worked on, I got, for example, a 40x speedup over Cython or Numba by embedding very, very small pure C functions using ctypes.
@dougmercer
@dougmercer 2 months ago
Oh definitely agree. Squeezing out performance is always "it depends" and "did you profile it?"
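On the "did you profile it?" point: for Python-level hot spots, the standard library's `cProfile` and `pstats` modules are a reasonable first step before reaching for a compiler. Sampling native code with a tool like `perf` is a separate step. The sketch below profiles a hypothetical workload; `slow_part`, `fast_part`, `workload`, and `profile_report` are placeholders for your own code.

```python
import cProfile
import io
import pstats


def slow_part(n: int) -> int:
    # Hypothetical hot spot: a deliberately loop-heavy function.
    total = 0
    for i in range(n):
        total += i * i
    return total


def fast_part(n: int) -> int:
    # Hypothetical cheap step, included for contrast.
    return n * 2


def workload() -> int:
    return slow_part(200_000) + fast_part(10)


def profile_report(func, top: int = 5) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)  # enables profiling only around this call
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return stream.getvalue()


print(profile_report(workload))
```

Sorting by cumulative time surfaces the functions whose call trees dominate the run. Those are the candidates worth handing to Numba or Cython.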
@francescotomba1350
@francescotomba1350 2 months ago
Yes, in my case there were two issues, the first was that cython for some things relies on the python interpreter if data and objects are not managed in the most cythonic way, the second was cache misses. I was working on a kd-tree implementation and a tiny detail on how nodes are managed let me cut out on cache misses during tree traversal. For that purpose I used perf to sample from the process but I know for sure that there are many other options for doing that.
@francescotomba1350
@francescotomba1350 2 months ago
@@dougmercer Moreover, numba is a lifesaver if you need performance on the fly without many refactors.
@tzimisce1753
@tzimisce1753 2 months ago
One language to rule them all.
@dougmercer
@dougmercer 2 months ago
=]