Thank you for the very helpful talk. I had already spent a bit of time trying to speed up my code, which makes heavy use of Pandas. I had made some good progress by using cProfile, avoiding iterrows and apply, and working with Pandas Series columns as much as possible. Based on this talk, I tried line_profiler for the first time, and it is really informative! Also, vectorization with NumPy arrays was so easy and made a huge difference. Thanks again.
Very informative and helpful talk for data scientists. I have been moving more toward vectorization after organically noticing the steep performance hit from iterrows and apply compared to vectorized operations.
Wow, I'm really going to take the NumPy array vs. Series performance gap to heart. All those index alignment operations must add significant overhead compared to strictly elementwise operations. I have a nasty Python Enum.Flag columnwise aggregation in my code that's taking a substantial amount of time. Thanks for the solid tips.
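A minimal sketch of what I mean (hypothetical column names): operating on the underlying NumPy arrays via `.to_numpy()` skips the Series index-alignment machinery while producing the same values.

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame, just to illustrate the two code paths.
df = pd.DataFrame({"a": np.arange(1_000_000), "b": np.arange(1_000_000)})

# Series arithmetic goes through pandas' index-alignment machinery.
series_sum = df["a"] + df["b"]

# Dropping to the raw NumPy arrays skips alignment entirely.
array_sum = df["a"].to_numpy() + df["b"].to_numpy()

# Same values either way; the array version just avoids the overhead.
assert (series_sum.to_numpy() == array_sum).all()
```

The payoff grows with the number of elementwise operations chained together, since each Series operation pays the alignment cost again.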
Hi, this is a great talk. I was puzzled by the first part on vectorization: how on earth can the function accept a scalar or a vector without crashing? Then I realized that the haversine function contains only operations that are already defined in NumPy. Any function that is not built from array-native operations would not work this way. It may be worth pointing that out. :-)
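To illustrate the point, here is a sketch of a haversine function in the same spirit as the talk's (details assumed, not taken from the talk): because every operation in the body is a NumPy ufunc or plain arithmetic, the same function body handles scalars and arrays alike.

```python
import numpy as np

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in km. Works on scalars or arrays
    because every operation here broadcasts via NumPy ufuncs."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 6371 * 2 * np.arcsin(np.sqrt(a))

# Scalar in, scalar out:
d_scalar = haversine(40.0, -74.0, 42.0, -71.0)

# Arrays in, array out -- the very same function body:
d_vec = haversine(np.array([40.0, 48.9]), np.array([-74.0, 2.3]), 42.0, -71.0)
```

Swap in something like `math.sin` or `math.atan2` and the array case breaks immediately, since the `math` module only accepts Python scalars.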
Well, I would strongly suggest that data scientists extend their teams with professional programmers if code performance becomes an issue. All Python libraries are, at some level, built on C code, and shortening the path to the native platform libraries only improves performance up to a point. The reason lies in the C compilers and the coding itself: C compilers accept dozens of flags that affect performance, and a solution can be written in dozens of ways. Both factors combined offer huge potential for optimization. Recently I tuned a C program that processes 20 GB on average (unfortunately sequentially, on one processor core) from 40 minutes down to 10 minutes without major source code changes. I could have done even better, but I hit the I/O throughput limits of the server...