Killing C++ Serialization Overhead & Complexity - Eyal Zedaka - CppCon 2022

cppcon.org/
---
github.com/CppCon/CppCon2022
One of the hottest topics in C++ is data serialization, which is a major enabler for inter-process communication, persisting the state of the program, or storing information in a convenient way. That’s why there are countless data serialization frameworks out there, even ones made for cross-programming-language communication.
When considering a serialization framework we face many considerations: convenience of use, performance, size of the library, size of the data, whether the library is applicable for embedded, support for cross-language communication, build system dependencies (for those that require generating code at build time), and many more.
If I got you wondering about the above, and you are looking for some modern C++20 zero overhead macro-less serialization that requires close to nothing from you as a developer, together with almost unmatched performance, freestanding/embedded support out of the box, and a fair chance to blow your mind, this session is for you, hope to see you there!
---
Eyal Zedaka
Eyal Zedaka is a technical leader and C++ instructor, with more than 10 years of experience in C++, operating systems and device security. He is currently with Microsoft as principal manager, responsible for OS application sandboxing through virtualization. In the past, Eyal was manager of device security engineering of the Magic Leap augmented reality device, where he led security features development and security research of the device, as well as the security architecture of the device and SoC from the requirement stage. Eyal designed C++ freestanding libraries for trusted execution environments and worked closely with embedded teams regarding C++ use in lower level areas. He has spoken at CppCon 2021 about using coroutines to implement C++ exceptions in freestanding environments and embedded systems.
---
Videos Filmed & Edited by Bash Films: www.BashFilms.com
YouTube Channel Managed by Digital Medium Ltd: events.digital-medium.co.uk
#cppcon #programming #serialization

Published: 10 Jul 2024

Comments: 43

@piotrarturklos 1 year ago

TLDR: There are tricks that we can use in C++20 to approximate the reflection that's needed to do serialization without adding the serialize functions for each struct. It blows my mind that we can do this already. Also, older serialization libraries are inefficient for various reasons.

@mapron1 1 year ago

Yeah, but the problem is that it basically doesn't support MSVC (the compiler crashes on it; the issue has been present since September). Need to wait till the library becomes less 'bleeding edge' :D

@kodirovsshik 1 year ago

What tricks are you guys talking about?

@tilkku 1 year ago

Good TLDR. The Reflection and Structures chapters were really interesting. I had already read P1061 but didn't realize it would be useful in cases like this too.

@mike200017 1 year ago

I've known about this trick for a few years now (works in C++17, just uglier) and it's super useful to automatically generate things like string conversion functions, serialization, hash functions, lexicographic ordering operators, equality operators, just lots of boiler-plate that is trivial (and annoying) to write for any "aggregate type".

@scotthinton4610 6 months ago

Very cool! I'm learning a lot from this.

@ninepoints5932 1 year ago

With libraries that lean heavily on the compiler, it would be nice (critical even) to see compile time benchmarks, not just runtime.

@dandymcgee 6 months ago

I had the same thought!

@stavb9400 1 year ago

Nice presentation. Funny enough, I used a similar approach for a parser 2 years ago and it seems to be the fastest way to go. For the time being I'm staying with slow protobuf, as JSON formatting and Java and Python interfacing are a must. Nice presentation of the benchmarks, which puts everything into perspective.

@keris3920 3 months ago

I just started parsing the header files for the structure definitions and generating code for other languages from it. I learned that I can generate handlers and transformation routines this way as well, whereas those are limited in a json-like format.

@JohnDlugosz 1 year ago

How is the template zpp::bits::members implemented? The usage (at 42:37) is to look at a member of that type named members, but that would be the constructor, so is it legal (and rational) to define a static const integer member with that name? Later, you used 'using serialize =' for a different purpose. Are these two uses of 'serialize' part of a general-purpose customization template that supports different facets?

@eyalz7766 1 year ago

members is defined like this: template <std::size_t Count> struct members { constexpr static std::size_t value = Count; }; So you are right that the slide has a mistake: the check should be "std::same_as" and then simply return "Type::serialize::value". For the second part, "using serialize = " is where I intend the customization to come from most of the time, so yes.

@eLBehmo 1 year ago

The compact binary representation is a nice trick. Should be a good fit for temporarily storing some objects in a stupid key-value-store system. All the tricks should be possible in C++17, maybe even C++14.

@mike200017 1 year ago

Definitely possible in C++17, just uglier. The most interesting applications, IMHO, are large systems of concurrent tasks with pretty strict control over software versioning (maybe all running off the same build) and platforms.... cough... mapreduce.... cough... Basically volatile data pipelines. Issues of stability over time, platforms and languages make this format not really appropriate for things like persistent storage and interoperability, in which case it's generally worth the serialization overhead (everything has a price after all).

@peregrin71 1 year ago

If I understand correctly, this serialization would only work between systems where types have the exact same size and endianness on the serializing and deserializing ends. And thus, for example, it would not work between 32 and 64 bit systems (where ints have different sizes). Or have I missed something?

@eyalz7766 1 year ago

It’s unlikely for ints to have different sizes, but the intention is to use the “sized” integers between systems, such as std::uint32_t and friends. Endianness is also configurable.

@peregrin71 1 year ago

@eyalz7766 No, ints will have different sizes on x86 and x64 architectures, for example. The C++ standard does not require them to be 4 bytes.

@eyalz7766 1 year ago

I said unlikely, not never - I do not myself, however, know of a platform where plain int is 8 bytes on a 64-bit system. In any case, the way to go is the sized types, as I mentioned in the comment above, if you’re worried about this.
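
To make the point about sized integers and byte order concrete, here is a small sketch that writes a `std::uint32_t` in little-endian order regardless of the host. The helper names `to_little_endian` and `from_little_endian` are hypothetical and this is not how zpp_bits implements its endianness option; it only illustrates why a fixed-width type plus an agreed byte order gives the same bytes on every platform.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// std::uint32_t is exactly 32 bits wherever it exists, unlike plain int,
// whose size the standard leaves to the implementation.
static_assert(sizeof(std::uint32_t) * 8 == 32 || sizeof(std::uint32_t) == 4);

// Split the value into bytes with shifts, so the result does not depend on
// how the host stores integers in memory.
constexpr std::array<std::byte, 4> to_little_endian(std::uint32_t value) {
    return {
        static_cast<std::byte>(value & 0xff),
        static_cast<std::byte>((value >> 8) & 0xff),
        static_cast<std::byte>((value >> 16) & 0xff),
        static_cast<std::byte>((value >> 24) & 0xff),
    };
}

// Rebuild the value from the same byte order.
constexpr std::uint32_t from_little_endian(const std::array<std::byte, 4>& bytes) {
    return std::to_integer<std::uint32_t>(bytes[0])
         | (std::to_integer<std::uint32_t>(bytes[1]) << 8)
         | (std::to_integer<std::uint32_t>(bytes[2]) << 16)
         | (std::to_integer<std::uint32_t>(bytes[3]) << 24);
}

static_assert(from_little_endian(to_little_endian(0x11223344u)) == 0x11223344u);
```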

@franziskusloibl2032 1 year ago

Have I missed the point? What about associations?

@Voy2378 1 year ago

Zero overhead manual code seems wrong; I would presume checking the size once is much faster than doing a check for every member.

@eyalz7766 1 year ago

You would have to iterate twice to check the size. I doubt that presumption, but the approach presented can support this strategy if it turns out to be more efficient, although it’s definitely more complicated in terms of the metaprogramming required. Also, as an edge case, accumulating the sizes in advance could overflow and would potentially require an additional check to protect against that.

@PaulMetalhero 1 year ago

Very interesting project. Repo link?

@mapron1 1 year ago

eyalz800/zpp_bits, it's on one of the first slides. BTW, it does not work with MSVC; I tried to do some dirty hacks to fix it, without success.

@PaulMetalhero 1 year ago

@mapron1 thanks

@billp37abq 1 year ago

National Security Agency software mandates in the ~180s required that no software other than their apps be present in their apps. How can C++ software that is not referenced by an app be removed?

@HamzaHajeir 10 months ago

Perhaps you mean 851ns (nanoseconds)? Because 851ms is very long time IMO, for embedded systems it's a killer :)

@theoneandonlyyoko 9 months ago

the benchmark was done 10 million times

@marcbotnope1728 1 year ago

Sorry, but nanopb is an excellent solution for an embedded system.

@eyalz7766 1 year ago

However nanopb is C and cannot take C++ classes and serialize them in protobuf. I haven't tested nanopb to find out if it can generate better/faster code than zpp::bits, also zpp::bits protobuf implementation is very simple and does not try to be the fastest protobuf implementation out there, I found less need to optimize it when it was already faster than original protobuf for what I've tested, maybe one day I will try to find how I can optimize it further.

@marcbotnope1728 1 year ago

@eyalz7766 you will not be sending classes to a small embedded system (MCU) with 32k Flash and 16k RAM. You will be sending PODs over "serial" that you have packetized. Nano PB is like a few K of flash and a few K of RAM, and can fit almost anywhere you are likely to need this kind of serialization. The first zpp_bits example is std::string.. there is no std:: anything on an MCU.

@eyalz7766 1 year ago

@marcbotnope1728 I guess I don't see the issue, using classes such as std::string has nothing to do with how much flash or ram you have, if you go for freestanding mode and not linking with the standard library. Also you don't "send the class", you send the data referred to by the class, for std::string it's size and sequence of bytes. In runtime, the std::string is not any more than a pointer, size, capacity structure (ignoring SSO optimization), with fancy functions that manipulate it and mostly inline nicely into compact assembly. As long as you can resolve the situation like I did in the last demo - to work in a freestanding mode, then a lot of the STL library is going to be free in any embedded environment.

@HamzaHajeir 10 months ago

@marcbotnope1728 I've found recently that nanopb is complex and limited due to its C nature, and perhaps all modern MCUs I know of do support C++. After my research I went with protozero, and I haven't seen a significant increase in memory (perhaps in the 20s of kB). Lots of modern MCUs come with 256+ KB of flash, reaching several MBs (the famous old STM32F103 has 128 KB of flash). nanopb does serialize twice (the first pass is to measure the size); they stated that they made it less runtime-efficient in order to optimize for size. protozero supports parsing through a view of the input data, which avoids a copy, so I was hoping protozero would join the benchmark (so Eyalz would optimize the protobuf further xD). My only reservation about zpp might be the lack of multiple-language support.

@lexer_ 1 year ago

Very interesting topic, very interesting approach, but sadly the presentation is pretty bad. I bet this could have been under 20 minutes long without losing a single bit of information and still be easy to follow. For every minute of content there is at least one minute of unnecessary teasing of what will come next, for example, which was probably meant to improve engagement but actually does the exact opposite.

@2718281828459045236 1 year ago

Well, he uses spaces instead of tabs - clearly he's into redundancy 👎

@dexterman6361 1 year ago

I would like to respectfully disagree. As a noob, the pacing was fun and perfect

@piotrarturklos 1 year ago

I don't think the testing is bad because serialization is a topic where performance is important, and this is the whole point of the talk, so constant measurements are good to see. Otherwise half the time people would be wondering if the next thing he talks about is hurting performance or not.

@perfectionbox 1 year ago

For sure. Too many folks don't understand that anything longer than five minutes causes a "life's too short to watch this" reaction

@mapron1 1 year ago

I disagree, it's not 10/10 great, but it's a pretty OK presentation. I've seen much worse at CppCon :) And I am not a noob, just an average developer (and I wrote a serialization library too).

@yoniyash2839 1 year ago

Great talk!! Well done! Protobuf is probably one of the most popular serialization libraries although it's very large and inefficient, Google should fix it.

@mike200017 1 year ago

Protobuf is designed for a different design space. Its objectives are mostly to be low bandwidth, inter-operable between languages and platforms, and to be stable over time. So, you should never expect performance comparable to an inlined series of memcpy's. But it's definitely true that the public google protobuf library is really bloated and much slower than it could be. In terms of serialization speed, as the presentation kinda showed, most of the issue is just that protobuf wire-format is slow to generate (bit-fiddling with everything) compared to just a raw memcpy. Deserialization is much worse, and it's mostly dominated by dynamic memory allocations since basically every field or repeated field in a protobuf class (the generated C++ code) is stored by pointer.
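
The "bit-fiddling" mentioned above can be seen in protobuf's varint encoding, where each output byte carries 7 bits of the value and the top bit says whether more bytes follow. The sketch below contrasts that with a raw fixed-size copy. The function names `encode_varint` and `encode_raw` are hypothetical, and this is a standalone illustration of the encoding rule, not code from the protobuf library or from zpp_bits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Varint: emit 7 bits at a time, lowest bits first, setting the high bit on
// every byte except the last. A 64-bit value needs at most 10 bytes.
constexpr std::size_t encode_varint(std::uint64_t value, std::array<std::byte, 10>& out) {
    std::size_t count = 0;
    while (value >= 0x80) {
        out[count++] = static_cast<std::byte>((value & 0x7f) | 0x80);
        value >>= 7;
    }
    out[count++] = static_cast<std::byte>(value);
    return count;
}

// Raw copy: always 4 bytes in host byte order, no per-byte branching.
inline std::size_t encode_raw(std::uint32_t value, std::array<std::byte, 10>& out) {
    std::memcpy(out.data(), &value, sizeof(value));
    return sizeof(value);
}
```

For example, 300 takes two bytes as a varint (0xAC 0x02) but four as a raw `std::uint32_t`, which is the size-versus-speed trade-off the comment describes.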