No video :(

Evolution of a Median Algorithm in C++ - Pete Isensee - CppCon 2023

Подписаться 152 тыс.

Просмотров 7 тыс.

50% 1

cppcon.org/
---
Evolution of a Median Algorithm in C++ - Pete Isensee - CppCon 2023
github.com/Cpp...
You won’t find std::median in ＜algorithm＞. Should it be? Come on a journey to explore various implementations of median(), starting from specific use cases to progressively more generic versions. We’ll tune for performance and integrate more advanced features like execution policies and concepts. We’ll discuss the pros and cons of various approaches, and you can make your own decision about whether this algorithm and related statistical functions belong in the Standard Library.
---
Pete Isensee
Pete Isensee has been in the digital entertainment business since 1994. He's programmed video games, shipped three generations of Xbox consoles at Microsoft, created VR experiences at HBO, published dozens of technical articles, and taught C++ tips & tricks to game developers worldwide. Pete is currently a software engineering director at Reality Labs Research, helping unlock the potential of virtual and augmented reality. He enjoys jazz, wines and backpacking.
---
Videos Filmed & Edited by Bash Films: www.BashFilms.com
RU-vid Channel Managed by Digital Medium Ltd: events.digital...
---
Registration for CppCon: cppcon.org/reg...
#cppcon #cppprogramming #cpp

Опубликовано:

26 авг 2024

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист

Посмотреть позже

Комментарии : 33

@StefaNoneD 5 месяцев назад

This talk was surprisingly good!

@user-ce9jx1bq5x 6 месяцев назад

I really love this presentation’s format! It feels very accessible.

@guiorgy 6 месяцев назад

Great talk, accessible to all levels of programmers

@ankos314 6 месяцев назад

Such a clear and helpful talk!

@CharlesHogg 5 месяцев назад

What a beautiful talk, and so well delivered too! One change I'd suggest is to keep the return type the same. When we exit the domain of the input types, the code gets harder to reason about. There will always be integral values that can't be represented in a floating point of the same size. If we started with integers, then let's stick with them! And if that means the median of 4 and 5 is 4, then so be it.

@rickjames1220 5 месяцев назад

Great talk! 🙂

@debajyotimajumder472 5 месяцев назад

Great talk. Loved the talk

@hymen0callis 6 месяцев назад

I really liked this speaker. Relatively simple topic, but presented in an entertaining and informative fashion. I took inspiration from this.

@kodirovsshik 5 месяцев назад

Awesome talk!

@johnnycashcow1130 6 месяцев назад

Awesome talk! Wish there were more talks like tgis

@CharlesHogg 5 месяцев назад

Good point about the permutation being both surprising, and necessary for performance. But, good news: copying is also O(N)! So you can take the container by value, and still have an O(N) algorithm without mutating the user's data.

@AlfredoCorrea 5 месяцев назад

I think that if the element type is arithmetic (midpoint is well defined) one can devise an algorithm that is N log N in the average case, by doing bisection on guesses for the median, without modifying the container or using extra space. The problem then would be how to call these algorithms when you have a non-modifying version and a modifying versions, or if just using overloads with the same name. Maybe the presented algorithm should be called median_range and the non modifying version should be called arithmetic_median_by_bisection, since they return different things.

@isodoubIet 3 месяца назад

Taking the range by value might still mutate the underlying container if it's a borrowed range.

@Antonio-yy2ec 6 месяцев назад

Pure gold

@adityatrivediii 6 месяцев назад

nice talk!

@onebronx 5 месяцев назад

Very nice and concise talk, with a smooth progression to the point. Love it. Wouldn't be better to pass a "midpoint policy" as a template parameter of the function, with the std::midpoint as a default policy?

@AlfredoCorrea 5 месяцев назад

I sounds to me that, for consistency, the max_element (even case) should be brought next to the mid element by an extra swap or by reverse first element. This in turn can simplify the result type since both iterators will be next to each other. The cost of the extra swap could be amortized by subsequent median calls on half ranges (quartiles) or by a total sort if it is necessary in an eventual next step.

@bruderdasisteinschwerermangel 6 месяцев назад

very interesting talk, and I yeah I think a percentile/median function in the standard library would be very nice to have. But to close off the story he started... if a beginner straight out of university wanted to write a template like this, I'd probably recommend them to stick to the 5 lines of code from the beginning. Generic algorithms are great but you shouldn't write a generic algorithm if you don't actually have a use for it.

@aaardvaaark 5 месяцев назад

Loved that. Was disappointed there was no time for questions at the end as they would've been interesting to hear. Near the start of the video, I paused it and googled algorithm for median, and there was mention of a "median of medians" algo being the optimum one. I'm left wondering if it would be faster than (and just as flexible as) the one presented.

@sinom 5 месяцев назад

nth element is a generalization of the median of medians! So if you give it a mid point it will be exactly the same as median of medians, while also supporting other split points with other inputs.

@axelriet 6 месяцев назад

You just “stole” one of my favorite interview questions 😊 Besides that the one answer that everyone is looking for is that of an indexed skiplist. Bonus point if it’s a *circular* indexed skiplist so you can compute the *moving* median, trimeans etc in O(logN) for any window size. I have a working implementation in C# that is fast enough to handle 15,000 book updates per second over 1,500 currency pairs from Binance (did this for an elusive trading bot that I never productized)

@fareloz 6 месяцев назад

14:45 But what about std::next, std::prev for iterators? In the code at this point we use operator+() instead which is not generic

@Roibarkan 6 месяцев назад

The presented algorithm calls nth_element, and thus requires that the range is a std::random_range. The std::random_access_iterator concept (which lies at the heart of std::ranges::random_access_range) is fine with using operator+() as means of increment. So, I believe there’s no value in using std::next() (nor std::ranges::next()) - the more generic and works for ‘lesser’ iterator categories (e.g. input, output, etc.) in this scenario.

@fareloz 6 месяцев назад

@@Roibarkan Agreed!

@borone8573 6 месяцев назад

I’m concerned about passing range as universal reference. It means that range must be copied or moved, doesn’t it? And it seems like we don’t copy anything, so const& won’t compile

@Roibarkan 6 месяцев назад

I believe the universal reference is always a reference, and won’t cost an extra copy. It’s not ideal because it will bind to a temporary container, and cause the algorithm to return dangling iterators into the container having been destructed. I think this issue is also relevant to most std::ranges algorithms (such as nth_element) so we’re in good company. I believe Nico Josuttis discusses this lifetime issue in his great back-to-basics talk: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-26aW6aBVpk0.html

@AlfredoCorrea 5 месяцев назад

@Roibarkan It is not a matter of “ideal” or not, Universal references are necessary because they can take subranges that are temporaries but do not own the data or even have well defined iterators even after their destruction. @borone8573 And no, they can bind to an existing object without copying or moving

@fareloz 6 месяцев назад

13:45 So the basic naïve algorithm works 1s on 10M data and is run once a day. Do we really need to squish out every millisecond out of it? This is an example of when generic algorithm is worse solution, because it requires much more time to develop (hitting a lot of snags and investigation time), much more expertise and the outcome is much more cumbersome to read and maintain. Sometimes small unoptimized ad-hoc function is much better if it does the job for the task required.

@Roibarkan 6 месяцев назад

I think Pete’s goal wasn’t to show how a new-hire should develop on their first task at a company, but rather how a library such as the STL could solve an entire category of tasks (the median algorithm) for the entire c++ community

@doroke 5 месяцев назад

I have profiled library code from signal analysis that spent 99% of its time calculating medians. It is absolutely important to give developers of all skill levels access to the lowest complexity implementations that exist!

@BilingualHobo 5 месяцев назад

I recently had to rewrite a median filter algorithm that was taking 30ms to process ~10M pixels. The slowdown was the time it took to sort a 9 element kernel. And next I need to update it to get

@sinom 5 месяцев назад

As he also said in this talk, in some cases the naive algorithm was also just incorrect or had UB. If you're never gonna call it with an empty vector, only want to pass in vectors and not any other containers, and don't care about it being correct in the even case then sure, the original is fine. But if you care about any of those then you should at least use part of the generalization. Also if you can use a better standard algorithm (like in this case nth element with median of medians instead of sort) I don't see too much of a reason not to.

@TheOnlyAndreySotnikov 5 месяцев назад

The generic version should be something that flows under the fingertips of an experienced developer without thinking twice. It requires much more time to develop, only for a newbie. This is why newbies have to learn. The same is true for readability. If you think it's challenging to read, you should probably learn. Your complaint is akin to: "Let's not use integrals in calculus; they are too difficult to read for me."