I like and use several PIC MCUs, but I will NEVER FORGIVE and UTTERLY DESPISE any MCU like the PIC12F675 that does NOT use the MAX clock internally. NO FORGIVENESS (especially for 8-pin devices like the 675/683 etc.). EVIL EVIL EVIL!!!
15:48 - Well, if the real answer is 157 and the computed answer is 160, I'd say the error is +3, not -3. So I'd call r the "correction" or something like that rather than the error. Or perhaps the "residual" - maybe that's why R and r are used for it in the first place.
I understand these are for educational purposes only. I can't imagine there would be any need to use these techniques for real-world use cases because they primarily shuffle around pointers and functions that are basically accessible without needing to shuffle them. *shuffle = "punning"
In my world, branchless means ... no decisions at all, just memory assignments, math operations and bitwise trickery that yield 1 or 0 by just running the operation itself if a decision is to be made. For complex scenarios you may have multiple decisions taken at once by extracting the bits from the result: say you need 4 boolean solutions, your math will yield a 4-bit integer, each bit being one of the needed decisions.... this is how we used to do "simd" in the age of stupid cpus ;). Did it take a lot of time to think up the mathematical equivalent of decisions like this? Yes! A heckton! But! If the function was run millions of times a second for no apparent reason, it was well worth it if it could be done at all. Using simd instructions has nothing to do with branchless programming; it is about parallel execution, data-level parallelism. Yes, it can make the code much faster, but it is a different animal entirely, with its own bag of tricks, and back when branchless programming was actually needed these fancy instructions were nothing but wet dreams! Nowadays compilers will do a good enough job and cpus are so much faster that it is not worth the time to do it unless you are writing said compiler...
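A minimal sketch of the "several decisions in one integer" idea described above, in C. The function names `classify` and `clamp` and the choice of the four conditions are hypothetical, picked only for illustration. Each comparison yields 0 or 1, the results are packed into a 4-bit mask, and the mask bits then select a value by arithmetic instead of `if`. Whether the compiler actually emits branch-free machine code depends on the target and optimization level, so treat this as an illustration of the source-level technique only.

```c
#include <assert.h>

/* Hypothetical example: pack four yes/no decisions about x into one
 * 4-bit mask. In C, each comparison evaluates to 0 or 1. */
static unsigned classify(int x, int lo, int hi)
{
    unsigned below  = (unsigned)(x < lo);  /* bit 0: x is under the range */
    unsigned above  = (unsigned)(x > hi);  /* bit 1: x is over the range  */
    unsigned is_odd = (unsigned)(x & 1);   /* bit 2: lowest bit of x      */
    unsigned is_neg = (unsigned)(x < 0);   /* bit 3: x is negative        */
    return below | (above << 1) | (is_odd << 2) | (is_neg << 3);
}

/* Use two of the mask bits to clamp x into [lo, hi] with arithmetic only. */
static int clamp(int x, int lo, int hi)
{
    assert(lo <= hi);                      /* precondition, not a data decision */
    unsigned m = classify(x, lo, hi);
    int below = (int)(m & 1u);
    int above = (int)((m >> 1) & 1u);
    /* Exactly one of the three multipliers is 1, the other two are 0. */
    return below * lo + above * hi + (1 - below - above) * x;
}
```

For instance, `classify(5, 0, 10)` sets only the "odd" bit, and `clamp(12, 0, 10)` picks `hi` because only the "above" bit is set.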
For those unable to see their own named general purpose register names (labels) in watches and variables, do this: Click the toolbox (Project Properties) icon from the Dashboard (extreme upper left icon), then select 'mpasm (Global Options)' and check the box for 'build in absolute mode', apply and exit. Then debug. It's taken me a couple of years to get around to finding this fix; I have searched countless Microchip forums and watched as many YT videos to no avail. Hope this helps you.
Hmmm. Well, I guessed correctly about Dekker, and would not have had to ask my earlier question if I had bothered to watch the whole thing first. He was indeed Dijkstra’s colleague, another one of those early titans of computing science…
Sorry I’m so late to the party! Just discovered your marvelous AVX512 introduction. Is TJ Dekker the same Dekker who derived the correct mutual exclusion algorithm at the Dutch Mathematical Center in the 1950s, when EW Dijkstra was building the THE operating system?
I tried this example of SOA when loading 50 large textures in my game written in C. The rest of the variables were primitives, and were initialized first. It was so fast! But I found that the loading speed increased dramatically if I loaded the textures first. I’m talking about going from 5 seconds to load down to under 1 sec. I learned that positioning of variable initialization is important too 😂 It also makes the code so much easier to read.❤
I’m an undergraduate in CpE and I was working on an autonomous robot, and I had this really over-engineered way to do course correction. I watched this video, and just today I was looking over my system and noticed that the techniques in your video were perfect for this application. I reduced it from a 50-line solution of ugly nested if-elses, about 4 deep at max, to 2 lines. Thank you!
The best way I know to do the branchless translation from lower case to upper case is to use a 256-byte translation table where every byte contains its offset from the beginning, except those from offset 'a' to 'z', which contain 'A' to 'Z', so the loop's body reduces to: d[i] = tbl[d[i]]; Another thing to remember when you try to minimise branches is the "hack interval": as long as a and b are known to be non-negative integers with a <= b, the condition a <= x && x <= b is equivalent to (unsigned)(x-a) <= (unsigned)(b-a). In the general case this replaces 2 branches with one. When used in a loop, in most cases a and b don't change inside the loop, so the compiler moves the calculation of (b-a) outside of the loop; here we have constants, so (b-a) is computed at compile time. Here, if we want to avoid the translation table, the body of the loop can reduce to: d[i] -= ((unsigned)(d[i] - 'a') <= (unsigned)('z' - 'a')) << 5; It should be easily vectorisable.
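Both variants from the comment above can be sketched in C as follows. The function names (`init_upper_table`, `upper_table`, `in_range`, `upper_range`) are hypothetical, and the code assumes an ASCII-compatible character set, where 'a'..'z' are contiguous and differ from 'A'..'Z' by 32. One small deviation from the comment: `in_range` subtracts after converting to `unsigned`, which avoids signed overflow for extreme inputs while giving the same result otherwise.

```c
#include <assert.h>
#include <stddef.h>

static unsigned char tbl[256];

/* Identity table, except 'a'..'z' map to 'A'..'Z' (ASCII assumed). */
static void init_upper_table(void)
{
    for (int c = 0; c < 256; ++c)
        tbl[c] = (unsigned char)c;
    for (int c = 'a'; c <= 'z'; ++c)
        tbl[c] = (unsigned char)(c - 'a' + 'A');
}

/* Table variant: one load per byte, no comparison in the loop body. */
static void upper_table(unsigned char *d, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        d[i] = tbl[d[i]];
}

/* Range trick: a <= x && x <= b as a single unsigned comparison.
 * Values below a wrap around to huge unsigned numbers and fail the test. */
static int in_range(int x, int a, int b)
{
    assert(a <= b);
    return ((unsigned)x - (unsigned)a) <= ((unsigned)b - (unsigned)a);
}

/* Table-free variant: the comparison yields 0 or 1, shifted to 0 or 32,
 * which is subtracted only from lower-case letters. */
static void upper_range(unsigned char *d, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        d[i] -= (unsigned char)(((unsigned)(d[i] - 'a')
                                 <= (unsigned)('z' - 'a')) << 5);
}
```

`init_upper_table()` has to run once before `upper_table()` is used; the table costs 256 bytes of memory, which is the trade-off against the arithmetic version.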
I've actually never thought about branchless programming in the context of making a program faster. I only thought of it in the context of cryptographic systems, where you want to ensure every execution has the same speed and heat profile, or cryptographic systems which just don't allow for branching.
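As an illustration of the cryptographic use mentioned above, here is a minimal sketch in C of a byte comparison whose running time does not depend on where the inputs differ. The name `ct_equal` is hypothetical. This only shows the source-level idea: an optimizing compiler is free to transform it, so real code should rely on a vetted library routine rather than on a sketch like this.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the first n bytes of a and b are equal, 0 otherwise.
 * The loop always visits all n bytes; there is no early exit on mismatch. */
static int ct_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    assert(a != NULL && b != NULL);  /* depends on pointers, not on secret data */
    unsigned char diff = 0;
    for (size_t i = 0; i < n; ++i)
        diff |= (unsigned char)(a[i] ^ b[i]);  /* accumulate any differing bit */
    /* diff == 0: (0 - 1) wraps to all ones, so bit 8 is set    -> 1
     * diff in 1..255: result is 0..254, bits 8 and up are zero -> 0 */
    return (int)((((unsigned)diff - 1u) >> 8) & 1u);
}
```

An ordinary `memcmp`-style loop may return at the first mismatch, which is what leaks timing information about how many leading bytes matched.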
Your accent can be mistaken (by some American like myself) for Australian, and with those trees in the background, I just imagine what you're saying to be like "And we can see these Koala bears teaching their young some integer division using coconuts. Yep, and whenever a coconut rolls off of the tree branch, it sets the carry flag to one." Or perhaps it IS Australian. I was like, he sounds British, in his earlier videos, but now I definitely hear something my brain identifies as Australian or similar.