
Back to Basics: Concurrency - Arthur O'Dwyer - CppCon 2020 

CppCon
149K subscribers
99K views

cppcon.org/
github.com/CppCon/CppCon2020/...
---
One of C++11's flagship features was the introduction of std::thread, along with a complete suite of synchronization primitives and useful patterns such as thread-safe static initialization. In this session, we'll motivate C++11's threading model and explain how to use std::thread effectively. We'll compare and contrast the C++11 synchronization primitives (mutex, condition variable, reader-writer lock, and once-flag) as well as the primitives that are new in C++20 (semaphore, latch, and barrier). In particular, we'll show how to make a mutex and a condition variable work together.
When using threads, it's important to avoid shared mutable state. We'll show how to tame that state via the "blue/green deployment" pattern, and briefly discuss how to use std::future and std::async to safely handle threads that produce answers.
Attendees will leave this session with a strong grasp on "multithreading tactics" in C++11 and beyond.
---
Arthur O'Dwyer is the author of "Mastering the C++17 STL" (Packt 2017) and of professional training courses such as "Intro to C++," "Classic STL: Algorithms, Containers, Iterators," and "The STL From Scratch." (Ask me about training your new hires!) Arthur is occasionally active on the C++ Standards Committee and has a blog mostly about C++. He is also the author of "Colossal Cave: The Board Game," an avid club juggler, and the recipient of four
---
Streamed & Edited by Digital Medium Ltd - events.digital-medium.co.uk
events@digital-medium.co.uk
*-----*
Register Now For CppCon 2022: cppcon.org/registration/
*-----*

Science

Published: 4 Jul 2024

Comments: 72
@kamilziemian995 · 1 year ago
I love "Back to Basics" talks. You can learn so much from them.
@stemei86 · 1 year ago
I love the bathroom analogy. Two persons trying to use the toilet without synchronization can lead to very undefined behavior.
@aprasath1 · 22 days ago
Wonderful talk; it gave a very nice overview of all the constructs in just one hour. Amazing.
@kajonkeatirattanasinchai1368
Not just informative and educational, but also entertaining! Thanks for the talk. :-)
@CppCon · 1 year ago
Glad you enjoyed it!
@strakhov · 2 years ago
Thank you very much Arthur, a very well prepared talk and slides!
@mauricio-poppe · 3 years ago
Just what I needed, thank you!
@NKernytskyy · 3 years ago
Excellent lecture for a noob like myself. Concurrency 101 primer done right!
@guanwang · 3 years ago
Awesome talk!! Learnt a lot. Thank you! 👍
@CppCon · 3 years ago
Glad to hear it!
@on2k23nm · 6 months ago
Back to Basics is excellent !
@chaicblack7415 · 3 months ago
Very great talk! Thank you very much, Arthur!
@jjp8710 · 3 years ago
This is an amazing talk, well explained and very useful. Thanks!
@CppCon · 3 years ago
Glad it was helpful!
@Gloryisfood · 2 years ago
Good material, learnt a lot. Thanks!
@CppCon · 2 years ago
Glad to hear it!
@intvnut · 1 year ago
The blue/green pattern at the end of the talk sounds a lot like the Read-Copy-Update pattern used in the Linux kernel. RCU does a bit more, by tracking readers and serializing writers with a mutex. The blue/green pattern is more like a CAS-based optimistic copy/update pattern which won't tell you how many readers are outstanding unless you hold onto the old `blue`. (That could matter if you want to determine when "everybody" sees the new setting.) As used here, the `shared_ptr` avoids the ABA problem we normally have to worry about with CAS based optimistic updates, so that much is nice.
@azoller · 10 months ago
Thanks, Arthur!
@think2086 · 3 years ago
As for why we call it "blocking," I finally understood the other day: think of it as traffic. There are only so many pipelines in the CPU. If a software thread occupies one of them, like a car occupies a lane of a highway, instead of going away while waiting, so other code on another software thread can use those hardware threads in the mean time, it rather just sits there and blocks all the work piling up behind it. It'd be like you stopping your car in the middle of the freeway to text back your Tinder crush right away, instead of pulling over to the side of the road first. In either case, you're not making forward progress as you wait to finish the texting session, but in the former case you also block everyone else that could have used that lane.
@WyMustIGo · 3 years ago
I agree that busy waiting is foolish, but... this would prevent the compiler optimization: `volatile std::atomic<bool> ready;`
@retropaganda8442 · 3 years ago
I think one important point could have made it into this introduction: describe what happens when an exception is thrown from within a thread. Most of the time you probably don't want the whole process to be shut down; you want to handle the exception in the original/main thread. But that's not what happens by default, and the way to pass an exception from one thread to another is non-obvious, so it's interesting to teach. That's why I think it's important enough to be covered in a first introduction.

Random minor observations: 24:56 You skipped over the opportunity to talk about the `mutable` keyword. On the screen, the num_tokens_available function is const, but you're modifying the mtx_ member variable, so I think it's not going to compile unless mtx_ is declared mutable. 34:36 You mention that spurious wakeups are not a concern for now, but you actually have the perfect example case on the screen: people might be tempted to replace the `while` with an `if`. I think people should be taught to always use the other overload of the `wait` function, the one that takes the condition predicate as a second argument in the form of a lambda, because it does the while-loop by itself; that way there's no risk that someone unaware of spurious wakeups misuses the condition variable.
@DamianReloaded · 3 years ago
Good talk. Thanks!
@HasanTezcan1905 · 2 years ago
I am not sure if someone else has commented on this, but I think the 'optimization' mentioned at 5:30 into the talk might be wrong... the initialization should come after sleep(200).
@MarcoBergamin · 3 years ago
Nice talk!
@mckdoeful · 3 years ago
very good talk
@thorsten9211 · 3 years ago
Thank You!
@MaceUA · 3 years ago
18:00 does the compiler really have a right to optimize away the checks on the atomic variable, even when it is shared with another thread?
@Quuxplusone · 3 years ago
I had thought so, but it appears I was wrong; the compiler cannot hoist the atomic read out of the loop. The advice at @17:54 is still correct - you should still be very careful and beware of situations where the compiler might reorder or coalesce or eliminate atomic accesses in general - but this particular while-loop is NOT actually such a situation. For some (mostly theoretical) examples of how a compiler might optimize atomic access patterns, see JF Bastien's N4455 "No Sane Compiler Would Optimize Atomics" www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
@nickknight738 · 3 years ago
@@Quuxplusone Herb Sutter in some of his presentations promotes spinlocks - which I was surprised about, but after watching him I bought into it. He places requirements on the lock, i.e. he limits what code the lock covers. Using a printf that might block would be bad inside a spinlock, though.
@creativeprocessingunitmk1587
The read is prevented from being moved by the default memory order, "sequentially consistent". The presenter was incorrect.
@Peter_Cordes · 4 months ago
No, Arthur got this wrong. Compilers have to assume that other threads *are* changing the values of std::atomic objects between reads, because concurrent read + write is not data race UB. (For non-atomic variables, data race UB is what allows this transformation, which compilers do in practice.)

(6.9.2.3) Forward progress / 18: "An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads *in a finite period of time.*"

(33.5.4) Order and consistency [atomics.order] / 11: "Implementations should make atomic stores visible to atomic loads *within a reasonable amount of time.*"

These are only "should" requirements in the ISO C++ standard, but this transformation would make an implementation useless in practice. (The reason those requirements aren't stronger than "should" is to allow running multi-threaded programs on a heavily loaded OS with a scheduler that doesn't guarantee fairness, e.g. one that can starve a thread indefinitely if other high-priority (or real-time-priority) threads are taking all the CPU time. Not because the authors of the standard think it's reasonable for a compiler to statically decide that a loop should never see a store if it wasn't visible before the first iteration.)

All of this applies even with memory_order_relaxed, @creativeprocessingunitmk1587. This hypothetical transformation wouldn't reorder the load wrt. any other accesses, so seq_cst isn't involved.
@antonfernando8409 · 2 years ago
Pretty cool talk.
@MaceUA · 3 years ago
58:00 is there a typo in the `while` loop condition, or have I misunderstood something? If `compare_exchange_strong` returns `true`, the loop will start all over. But `true` result would mean that we have successfully inserted `green` into `g_config`, so we don't need to iterate anymore. Thus, shouldn't it be `... while (!g_config.compare_exchange_strong(...))` (with a negation), so that we would finish the loop as soon as it returns `true` (and continue iterating while it returns `false`)?
@Quuxplusone · 3 years ago
You're correct, there's a typo on slide 49. The while-loop condition should have a "!" at the front. Thanks!
@mworld · 7 months ago
It's very handy when the data you want to work on is already broken up, e.g. multiple files. Just process each file into its own location in memory, then aggregate the multiple data structures back together. Sometimes it makes sense not to write everything to the same data structure (array, vector, etc.) from all your threads. Mutexes/locks are slow; my suggestion is to avoid them if possible.
@SamWhitlock · 3 years ago
Does std::latch imply a memory fence for the threads arriving at it? (not that that's necessarily a great pattern)
@avedissimracing9628 · 2 years ago
Thank you for the great talk, there's a whole bunch of useful things here. Are there typos on the slide at 41:25? The capture of conn_ is missing in the lambda, and there's an extra asterisk after return: it's a reference, not a pointer, so you don't need to dereference it.
@Quuxplusone · 2 years ago
You're right that the lambda is missing a capture: it should be [&], not []. But the asterisk in `return *conn_;` is correct: conn_ is an optional, and we want to return a Connection&, which means we need to crack open the optional to get at its actual Connection value. We can do that either as `return conn_.value()` (which will check conn_.has_value() and throw if it doesn't) or `return *conn_` (which doesn't bother with the check).
@PeteBrubaker · 1 year ago
Also why do you say "puttr" when "pointer" is the same number of syllables?
@greatbullet7372 · 3 years ago
Time-sharing is basically another term for scheduling.
@jackw7714 · 3 years ago
At 16:51, can the compiler really bring the atomic read out the loop? There's a memory barrier, so surely not? Obviously, if it was just a non-atomic bool it could though.
@BlairdBlaird · 2 years ago
There is no memory barrier (there's an atomic load, which is a very different thing), but even if there were, it would not make a difference, because "ready is never set" is a completely valid execution: atomics work in terms of happens-before and happens-after, and only ensure consistency within those bounds (as well as the access itself being atomic, obviously). Here the code will *probably* happen to work, because compilers currently don't optimize much around atomics (it's fiddly to optimize and pretty unlikely to yield much benefit), but per the standard it's absolutely not *guaranteed* to work.
@creativeprocessingunitmk1587
The read is prevented from being moved by the default memory order, "sequentially consistent". The presenter was incorrect.
@SheelByTorn · 1 year ago
Why are we calling mtx.lock(); mtx.unlock(); side-by-side inside the threadB at 19:49?
@amimf · 3 years ago
Great talk, but is it really true that a spinning lock on an atomic is undefined behavior as mentioned at 17:25 ?
@creativeprocessingunitmk1587
The read is prevented from being moved by the default memory order, "sequentially consistent". The presenter was incorrect.
@konstantinburlachenko2843 · 2 years ago
Does thread::join guarantee that a memory fence will be inserted into the thread for which join is called?
@tourdesource · 1 year ago
Can I quote you on the volatile thing? The way you said that cracked me up.
@MagnificentImbecil · 2 years ago
This is a great presentation, thank you. On Slide 49 ("The blue/green pattern (write-side)"), I wonder whether an ABBA bug exists, with the possible result that:
- thread B loads the `g_config` global shared_ptr into the `blue` local shared_ptr, then makes a local copy of `*blue`, then modifies the local copy, then compares `g_config` against `blue`, finds them equal, and modifies `g_config`;
- but in the meanwhile, after thread B has loaded `g_config` into its `blue`, other threads have made several modifications, with the caveat that the latest `ConfigMap` happens to reside in memory at the same address as an older `ConfigMap` -- thus thread B "finds them equal".

Perhaps a solution is to introduce an ever-incrementing version number. Comparing such version numbers is safer than comparing addresses of objects: the former cannot be re-used, the latter can be.
@Quuxplusone · 2 years ago
There *is* a (typo-) bug in the blue/green example (see MaceUA's comment "58:00 is there a typo..."), but it's not an ABA problem. Your scenario can't happen because it depends on the idea that thread A can ever allocate the latest `ConfigMap` at the same address as the `*blue` originally observed by thread B. But that would imply that the object at `*blue` had been destroyed and deallocated... and that can't have happened, because thread B has been holding a shared_ptr on `blue` the entire time. If variable `blue` were a raw pointer, then we'd have that ABA problem, and worse, because `*blue` might get deallocated by some other thread right in the middle of our copy-constructing it into `*green`! By making `g_config` and `blue` into shared_ptrs, we eliminate that particular possibility.
@akyeren · 2 years ago
That is solved by the compare_exchange_strong call in the while condition. The function checks whether the current representation of the config matches the expected one (blue); if so, it modifies the value and returns true, but if other threads have already changed the config, the function returns false. Actually there's a typo in Arthur's code: it should be `while (!g_config.compare_exchange_strong(...));`, so that when the modification fails, it retries.
@konstantinburlachenko2843 · 2 years ago
Do std::atomic writes guarantee a memory fence?
@videofountain · 2 years ago
On Slide 6 ... there is a cacheLine (without a number), cacheLine1, and cacheLine2. Is cacheLine(without a number) intentional?
@steveneumeyer681 · 2 years ago
don't use volatile ever - can you explain why not?
@shargor · 4 months ago
Any references to the "bigger boat"?
@Zettymaster · 1 year ago
Is it really UB if I spin on an atomic? The comparison operators have an implicit load with a memory order of std::memory_order_seq_cst, so the compiler is not allowed to optimize the load away. If it were a non-atomic bool I would 100% agree, but not with atomics.
@Peter_Cordes · 4 months ago
No, Arthur got this wrong. Compilers have to assume that other threads *are* changing the values of std::atomic objects between reads, because concurrent read + write is not data race UB. (For non-atomic variables, data race UB is what allows this transformation, which compilers do in practice.)

(6.9.2.3) Forward progress / 18: "An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads *in a finite period of time.*"

(33.5.4) Order and consistency [atomics.order] / 11: "Implementations should make atomic stores visible to atomic loads *within a reasonable amount of time.*"

These are only "should" requirements in the ISO C++ standard, but this transformation would make an implementation useless in practice. (The reason those requirements aren't stronger than "should" is to allow running multi-threaded programs on a heavily loaded OS with a scheduler that doesn't guarantee fairness, e.g. one that can starve a thread indefinitely if other high-priority (or real-time-priority) threads are taking all the CPU time. Not because the authors of the standard think it's reasonable for a compiler to statically decide that a loop should never see a store if it wasn't visible before the first iteration.)

All of this applies even with memory_order_relaxed. This hypothetical transformation wouldn't reorder the load wrt. any other accesses, so seq_cst isn't involved. YouTube deletes comments with links, but search Stack Overflow for "Why set the stop flag using memory_order_seq_cst, if you check it with memory_order_relaxed?"
@konstantinburlachenko2843 · 2 years ago
It's most likely that the code on slide 10 is incorrect - there is no `volatile` type specifier on the `result` variable. Please correct me if I am wrong.
@Quuxplusone · 2 years ago
You are, in fact, wrong about that. :) C++11 introduced a formal memory model that specifies exactly when one event is guaranteed to happen-before another. On slide 10, we are guaranteed that the write happens-before the thread exits (because they happen on the same thread); the thread exit happens-before `join` returns (because thread::join() is a synchronization point); and the return from `join` happens-before the read of `result` (again because they happen on the same thread). So the read and the write definitely don't race with each other; the write definitely happens first.

Adding `volatile` to this code wouldn't make it any more or less correct; and vice versa, adding `volatile` to wrong code generally doesn't make it correct. As to what the compiler/hardware have to do - memory fences, cache coherency, and so on - to make that read load the correct value from memory, "that's not my department"; all I'm telling you is what the correct value _must be_ in this case, according to the C++ Standard. On the topic of `volatile` specifically, see quuxplusone.github.io/blog/2022/01/28/volatile-means-it-really-happens/
@PeteBrubaker · 1 year ago
"If you have to ask the question, you probably shouldn't be doing it yourself." How else are you supposed to learn? Not a great response there...
@myown236 · 3 years ago
How to notify worker threads to stop working and terminate?
@retropaganda8442 · 3 years ago
Using a condition variable
@Peter_Cordes · 4 months ago
With an atomic that you check every so often. (Or with C++20 std::jthread, which has cancel functionality.) Arthur is wrong about compilers being allowed to assume the flag doesn't change: compilers have to assume that other threads *are* changing the values of std::atomic objects between reads, because concurrent read + write is not data race UB. (For non-atomic variables, data race UB is what allows this transformation, which compilers do in practice.)

(6.9.2.3) Forward progress / 18: "An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads *in a finite period of time.*"

(33.5.4) Order and consistency [atomics.order] / 11: "Implementations should make atomic stores visible to atomic loads *within a reasonable amount of time.*"

These are only "should" requirements in the ISO C++ standard, but this transformation would make an implementation useless in practice. (The reason those requirements aren't stronger than "should" is to allow running multi-threaded programs on a heavily loaded OS with a scheduler that doesn't guarantee fairness, e.g. one that can starve a thread indefinitely if other high-priority (or real-time-priority) threads are taking all the CPU time. Not because the authors of the standard think it's reasonable for a compiler to statically decide that a loop should never see a store if it wasn't visible before the first iteration.)

Search Stack Overflow for "Why set the stop flag using memory_order_seq_cst, if you check it with memory_order_relaxed?"; you only need "relaxed" memory ordering for both setting and checking such a flag.
@konstantinburlachenko2843 · 2 years ago
I do not agree with slide 14. Busy waiting is a solution if you busy-wait with thread yielding. If acquiring a std::mutex requires a kernel-space object, then a system call is typically ~1000 clocks. The lock is not free. So the question is how std::mutex is implemented.
@creativeprocessingunitmk1587
The read is prevented from being moved by the default memory order, "sequentially consistent". The presenter was incorrect.
@Peter_Cordes · 4 months ago
Arthur got this wrong. Compilers have to assume that other threads *are* changing the values of std::atomic objects between reads, because concurrent read + write is not data race UB. (For non-atomic variables, data race UB is what allows this transformation, which compilers do in practice.)

(6.9.2.3) Forward progress / 18: "An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads *in a finite period of time.*"

(33.5.4) Order and consistency [atomics.order] / 11: "Implementations should make atomic stores visible to atomic loads *within a reasonable amount of time.*"

These are only "should" requirements in the ISO C++ standard, but this transformation would make an implementation useless in practice. (The reason those requirements aren't stronger than "should" is to allow running multi-threaded programs on a heavily loaded OS with a scheduler that doesn't guarantee fairness, e.g. one that can starve a thread indefinitely if other high-priority (or real-time-priority) threads are taking all the CPU time. Not because the authors of the standard think it's reasonable for a compiler to statically decide that a loop should never see a store if it wasn't visible before the first iteration.)

All of this applies even with memory_order_relaxed, @creativeprocessingunitmk1587. This hypothetical transformation wouldn't reorder the load wrt. any other accesses, so seq_cst isn't involved.

Re: mutex: on Linux / glibc, std::mutex is "light weight", making no system calls in the uncontended case. It spin-waits for a short time, then makes a futex system call to sleep until notified. (The same API that C++ .wait() and .notify_one() use.) So if there's contention you get futex system calls; if not, you just get atomic RMWs in user space on the mutex object. This is usually better than a spin-wait loop if you don't manually fall back to C++20 .wait().

IIRC, MSVC on Windows has a "heavy" std::mutex that always calls into the kernel for unique_lock, but maybe not for some other ways of using it. And yes, that's slow; with Spectre / Meltdown mitigations it's probably even worse than 1000 cycles.
@FrankMadero · 2 years ago
@45:57 he says never to use volatile, but it is common to make a variable volatile when it is mapped to, say, a chip-select pin for some type of communication on bare metal. I don't understand his response.
@darkopz · 2 years ago
Am I the only one who questions why people pronounce "ptr" as "putter", when I'm pretty sure it stands for "pointer"?
@abhi-5783 · 2 months ago
This was a bit fast paced. Too many concepts, too little time and examples. Better cover it in 2 hour talks.