The Truth about the Fast Inverse Square Root on the N64 

Kaze Emanuar
269K subscribers
248K views

Patreon: / kazestuff
Streams: / @kazeclips
🐦 / kazeemanuar
MERCH: kazemerch.mysp...
Discord: / discord

Published: 28 Sep 2024

Comments: 619
@TypingHazard 10 months ago
Is "now try it with the fast inverse sqrt" the programmer version of how every musician content creator is asked/forced to attempt Rush E and other meme songs
@Jay-kc2pm 10 days ago
Free bird!
@ENCHANTMEN_ 10 months ago
It's crazy to think just what a modern computer might be capable of if we had the time and expertise to optimize code to this degree...
@mariotheundying 10 months ago
Where is that lost time going to? A big company can do it
@adicsbtw 10 months ago
modern CPUs actually have built-in optimized instructions for exactly these types of things. For example, Intel CPUs have an operation that does 8 inverse square root operations with perfect accuracy in just one clock cycle. These types of operations are called SIMD, and are majorly underutilized. Edit: not quite perfect, 2^-14 is the maximum accuracy they guarantee. For a game though, that difference will be practically indistinguishable.
@clouds-rb9xt 10 months ago
Right? Game developers need to pay attention. We wouldn't have so many issues with modern games being unoptimized if they used more advanced optimization. Doom Eternal is a perfect example of that
@vesuvianprime 10 months ago
@@mariotheundying Layers and layers and layers of virtualization, security, memory safety, compatibility, APIs, threading, and more before you even get close to running instructions on the bare metal
@mariotheundying 10 months ago
@@vesuvianprime a big company has all the time, all the people and all the money in the world for that, I'm talking about stuff like Microsoft
@possible-realities 10 months ago
Nice video! Just one small point: If you want to use invsqrt(x) to calculate sqrt(x), you can use x*invsqrt(x) instead of 1/invsqrt(x). That might save a few cycles? But I still agree that the quake3 fast inverse square root algorithm is probably not that useful on N64.
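The identity behind this suggestion is sqrt(x) = x * (1/sqrt(x)), so a multiply can stand in for the division. A minimal C sketch, assuming the classic Quake III constant 0x5f3759df followed by one Newton step; the names `rsqrt_approx` and `sqrt_via_rsqrt` are made up for illustration and are not taken from Kaze's code:

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, normal floats.
   Assumption: classic Quake III magic constant plus one Newton step. */
float rsqrt_approx(float x)
{
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);               /* reinterpret the float's bits */
    i = 0x5f3759dfu - (i >> 1);             /* initial guess via the bit hack */
    memcpy(&y, &i, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);   /* one Newton iteration */
}

/* sqrt(x) = x * (1/sqrt(x)): a multiply instead of a divide. */
float sqrt_via_rsqrt(float x)
{
    return x * rsqrt_approx(x);
}
```

A side effect of the multiply form in this sketch is that x = 0 yields 0, because the estimate for 1/sqrt(0) stays finite; 1/rsqrt(x) has no such property.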
@KazeN64 10 months ago
oh true, somehow i just forgot
@nicholaswallingford3613 10 months ago
to do regular sqrt you would just use a different magic constant. float fast_sqrt(float x) { int i = *(int *)&x; i = 0x1fbd1df5 + (i >> 1); return *(float *)&i; }
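The one-liner above reinterprets the float through pointer casts. A hedged restatement of the same idea, using `memcpy` for the bit reinterpretation and the constant quoted in the comment (the name `fast_sqrt_approx` is illustrative):

```c
#include <stdint.h>
#include <string.h>

/* Rough sqrt(x) for positive, normal floats: halving the bit pattern
   roughly halves the exponent, and the constant restores the bias. */
float fast_sqrt_approx(float x)
{
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);
    i = 0x1fbd1df5u + (i >> 1);
    memcpy(&y, &i, sizeof y);
    return y;
}
```

With no Newton step afterwards, the result is only accurate to within a few percent.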
@andersama2215 10 months ago
If I understood that table correctly from the start of the video, does that imply that the 0 newton iteration would take 6 cycles to complete just the fast inverse portion? How long does * take, may deserve another video...
@andersama2215 10 months ago
Did a little reading, it sounds as though hardware implementations of sqrt may have taken a few different routes, common ones apparently were a lookup table for rough approximation followed by a number of newton iterations, or alternatively some process similar to long division. Without digging too deeply, my guess is that might be why the cycle counts for division and sqrt are the same. Since long division's one of the slower approaches to dealing with floats, it could be that the fast inverse sqrt is the way to go, since the n64's hardware was developed before the algorithm was discovered it could be that its implementation could be beaten via software. Newton iterations roughly double the precision of the result so a better initial guess can rapidly decrease the time cost.
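The "roughly double the precision" point can be checked with a small C sketch. It assumes the standard Newton update for 1/sqrt(x), y <- y * (1.5 - 0.5 * x * y * y); the function names are illustrative:

```c
/* Refine an estimate y of 1/sqrt(x) with n Newton iterations.
   Each step computes y <- y * (1.5 - 0.5 * x * y * y). */
double rsqrt_refine(double x, double y, int n)
{
    for (int k = 0; k < n; k++) {
        y = y * (1.5 - 0.5 * x * y * y);
    }
    return y;
}

/* Relative error of an estimate against a known reference value. */
double rel_error(double estimate, double reference)
{
    double e = estimate / reference - 1.0;
    return e < 0.0 ? -e : e;
}
```

Starting from the guess 0.7 for x = 2 (about 1% off the true 0.70710678...), one step leaves roughly 0.015% error and two steps leave well under one part in a million, which is why a better initial guess saves whole iterations.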
@Gideon_Judges6 10 months ago
I'm kind of surprised it was made popular by Quake III. I could've sworn Mike Abrash put it in the original Quake, but it's been years since I saw the source.
@KazeN64 10 months ago
i thought it was from doom before i made this video lmao
@arciks11 10 months ago
​@@KazeN64 You accidentally still said Doom in one part of the video.
@KazeN64 10 months ago
ahhh shit, i thought i edited it out haha
@TheGershon 10 months ago
@@KazeN64 Twice actually, lol. 5:00 and 8:30
@proxy1035 10 months ago
@@KazeN64 ah yes, DOOM with all of its real time reflections and shading and floating point numbers that it definitely used. /s seriously though, i thought it was funny. because DOOM is pretty much always the first game people think of when IdSoftware is mentioned. so sometimes people think the FastInverseSqrt is also from DOOM, even though the game doesn't even use floats at all because they were way too slow at the time.
@ethanpayne4116 10 months ago
Silas' idea with the error cancelling is very cool, there are probably many other examples where we can reduce the error of one problem by dividing it into two sub-problems with opposite error
@Sh1penfire 10 months ago
This feels like reading a data book wrong every time with the wrong indicator Perfectly balanced as all this should be
@M0liusX 10 months ago
Error is a fickle issue. While I don't know for sure, usually these types of strategies have smaller error, in return they have larger error when values get extremely big or small.
@ethanpayne4116 10 months ago
@@M0liusX I definitely believe that, there are usually very specific cases where one algorithm is better than another, and in general there is no "optimal" algorithm which always works best for all situations, it always depends on the specific example.
@Creabsley 10 months ago
We were doing this analogue style in the 1930s with balanced cables. Letting you run small voltages hundreds of feet with no interference.
@sjurursteinholm5368 9 months ago
We use similar algorithms in land surveying to estimate coordinates with high accuracy. By using a GPS reading and comparing it to a GPS reading at a control station, we can see which satellites are visible in both readings. By subtracting the differences between the readings, the accuracy of the initial GPS reading goes from 5-10m (15-30 feet) to 2-3cm (1-1.5 inches). This only works when the same satellites are visible in both readings; if the control reading is too far away, then the algorithm wouldn't work. But this cancels out errors like the density of the atmosphere, refraction errors, and random errors, since you work with more data.
@Flamefreeze1 10 months ago
The truth about why my dad never came back from the grocery store 😢😢😢 Edit: Yooo my mind was blown w/ Silas’s idea. Mathematically it seems obvious but getting that much accuracy improvement with the 2 fourth-root calcs multiplied together is insane! Thanks for the good content as always!
@howisthis8849 10 months ago
don't worry, he's just building up speed for 12 years
@thehedgehoggamer8471 10 months ago
Wha
@wfzyx 10 months ago
I know the project is aimed at maintaining real hardware compatibility, but maybe consider to patch a n64-emu and give one extra rambus to the console to see how far your game can go?
@m4r_art 10 months ago
I don't know where your journey takes you, but the amount of data you took upon you, is painfully large. You are like a character from literature, the giver. In the story a certain character carries the memory of the world as it was. At this point it's safe to bet you are in the top 5 most knowledgeable n64 programming/development person in the world.
@forasago 10 months ago
He has the relative luxury of only concerning himself with one set of hardware (the N64) and one set of software (the relevant programming language(s)) for decades. Almost no (game) programmer out there enjoys this kind of laser focus. Instead it's 2-3 things at a time and every other year one thing is discarded and another thing added.
@chungus1149 10 months ago
I used to think there was no reason for a Mario 64 sequel to exist, but seeing how much there was to improve on the base engine, and how many level possibilities it opens up, it's obvious they should have made one
@leroymilo 10 months ago
Damn, Kaze's take on fast inverse square root is really interesting.
@johanngambolputty5351 10 months ago
8:57 ooo, reminds me of Romberg integration (more generally Richardson extrapolation), add together approximations at different step sizes to cancel out one error term in the taylor expansion. Takes me back to intro to scientific computing :)
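For readers who have not met Richardson extrapolation, here is a small C sketch of the idea applied to numerical differentiation rather than to the video's square root code. The test function f(x) = 1/x and all names are illustrative assumptions:

```c
/* Central-difference estimate of f'(x) with step h; error is O(h^2). */
double central_diff(double (*f)(double), double x, double h)
{
    return (f(x + h) - f(x - h)) / (2.0 * h);
}

/* Richardson extrapolation: combine the estimates at h and h/2 so the
   h^2 error terms cancel, leaving an O(h^4) error. */
double richardson_diff(double (*f)(double), double x, double h)
{
    double coarse = central_diff(f, x, h);
    double fine = central_diff(f, x, 0.5 * h);
    return (4.0 * fine - coarse) / 3.0;
}

/* Example function with a known derivative: f(x) = 1/x, f'(1) = -1. */
double reciprocal(double x)
{
    return 1.0 / x;
}
```

At x = 1 with h = 0.1 the plain estimate is off by about 1%, while the combined estimate is off by only a few parts in 100,000, because the two leading error terms have a known ratio and cancel.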
@notarandom7 10 months ago
babe wake up, Kaze just uploaded a new video
@Longboost 7 days ago
8:46 calling an inverse fourth root a "fourth inverse square root" really made me lose track of what was happening for a moment, lol. Great video though
@burkeychathouse5537 6 months ago
At 7:23: I’m not an expert on this, but wouldn’t any number 1.17549435082e-38 or smaller be rounded up due to floating point accuracy, eliminating any potential issue?
@KazeN64 6 months ago
that number is the lowest possible floating point number before the exponent hit -127 - if the number was smaller than this, it'd loop around and you'd get a number closer to the max float representable
@quelfth4413 10 months ago
At 1:30 you seem to be suggesting that the best way to get sqrt(x) from 1/sqrt(x) is to take the reciprocal via a division, but you could also just multiply by x since x/sqrt(x) = sqrt(x). I don't know if this changes anything you're saying.
@cubedude8690 10 months ago
it does not
@Gunbudder 10 months ago
when i was tutoring, i told students that there is a time and place to use the fast inverse square root. the place is on 32 bit windows PCs and the time was 1999. The trick only works for single precision IEEE 754 floats, and essentially any modern PC made after probably 2007 or so is so fast that the gains of the float hack aren't worth it lol. it's important to teach though because i think it demystifies the IEEE 754 standard and helps students understand that it's essentially just scientific notation but with 2 instead of 10 as your base. and if you go into embedded systems, you do need to understand how that stuff works because you will eventually need to convert a 754 float to a DEC float (or vice versa) and that has its own little hacks.
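The "scientific notation with base 2" view becomes concrete once the three bit fields of a float are pulled apart. A minimal C sketch, assuming 32-bit IEEE 754 single precision; the struct and function names are illustrative:

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8 bits, stored with a bias of 127 */
    uint32_t mantissa;  /* 23 bits, implicit leading 1 for normal numbers */
} FloatFields;

/* Split a single-precision float into its three IEEE 754 fields. */
FloatFields decode_float(float x)
{
    uint32_t bits;
    FloatFields f;
    memcpy(&bits, &x, sizeof bits);
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.mantissa = bits & 0x7FFFFFu;
    return f;
}
```

For example, 6.5 is 1.625 x 2^2, so its exponent field is 2 + 127 = 129 and its mantissa field encodes the fractional part 0.625 as 0x500000.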
@kitlith 10 months ago
The same trick works for doubles too, but it needs a different magic constant (which has been listed on wikipedia). Otherwise, yeah, its role is obsoleted by the dedicated reciprocal square root approximation instruction, where available.
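A hedged sketch of the double-precision variant in C. The 64-bit constant below is the one commonly cited for this purpose (it is an assumption here, not something stated in the comment), and the function name is illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Approximate 1/sqrt(x) for positive, normal doubles.
   Assumption: commonly cited 64-bit magic constant, one Newton step. */
double rsqrt_approx_f64(double x)
{
    uint64_t i;
    double y;
    memcpy(&i, &x, sizeof i);
    i = 0x5FE6EB50C7B537A9ull - (i >> 1);
    memcpy(&y, &i, sizeof y);
    return y * (1.5 - 0.5 * x * y * y);
}
```

The structure is identical to the 32-bit version; only the integer width and the constant change, because the exponent bias and mantissa length differ.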
@tompov227 5 months ago
this is a great example of how optimization is a very specific problem that always requires profiling before you can say something is gonna be a "good optimization"
@ZintomV1 10 months ago
Another fun and educational video Kaze, thank you!
@mickpatel5126 10 months ago
Am I the only one who doesn’t understand most of this stuff? Yet I’m always excited to see them. And when I see “vroom” I get happy too
@Alexaction223 10 months ago
Is there any way to upgrade the ram's speed on a N64? I'm a bit curious just how much horsepower is locked away behind having 4 miners share one pickaxe.
@jimmyhirr5773 9 months ago
The N64 uses a type of RAM called RDRAM. That's where you would need to start your search. There are faster versions of it that were made for PCs, but they were uncommon and the last ones were made around 2003. The PS2 also uses RDRAM, so that might be an easier source to find. After you find some RDRAM, you would need to find out how to make it work. The RDRAM is connected to the RCP, so you would need to find some way to speed up the RDRAM without affecting the RCP's timings.
@BottomOfTheDumpsterFire 10 months ago
It just hit me that 4rt(x)^2 is sqrt(x) Silas was onto something
@rosly_yt 10 months ago
Fast Inverse Square Root my beloved
@smokeydops 10 months ago
Im using fixed-point so i kinda need the algorithmic versions, not the mantissa hack. Interested in researching these... In a few years, when i have time to spare this issue
@thephoenixsystem6765 10 months ago
The maths sounded nice, but I can appreciate the music at the end from a technical standpoint. And it also sounds great lol - did you make it?
@higherquality 10 months ago
that's jaw on the floor type stuff
@igorgiuseppe1862 10 months ago
7:32 nice detail on the map
@gunnmetal115 10 months ago
This is beautiful. I've no idea what you're saying but it sounds great.
@Magikarp-4ever 10 months ago
Thank you for reminding everyone it also screws up the rendering temporarily! Most forget that
@GamerOverThere 10 months ago
Couldn’t you multiply x * invsqrt(x) instead of doing 1 / invsqrt(x) ? I think that would be a little faster but I’ve never programmed for the N64 specifically.
@christophclear1438 10 months ago
This appears to make no sense since x * invsqrt(x) = sqrt(x)....
@zhewaxen3047 10 months ago
Yes it is faster. Kaze said in another comment that he simply forgot that.
@GamerOverThere 10 months ago
@@christophclear1438 in the context of vid, it was trying to use the invsqrt(x) to find the sqrt(x).
@christophclear1438 10 months ago
@@GamerOverThere I see, sorry. I thought the sqrt instruction only took a few cycles.
@VideoGameBoxReviews 10 months ago
Every time you upload I get amazed. Just hoping to play the mod at some point too.
@Netsuko 10 months ago
I didn't understand a single thing in this video, but it was fun to watch!
@DessertArbiter 10 months ago
I can't wait for the optimized robot uprising
@tiaraguy7705 10 months ago
I came here to hear about the ram bus again
@RedBerylFTW 10 months ago
Not only is your explanation easy to understand, you also are a chad innovator. 🍻
@Hugo-xr1mg 10 months ago
If it's a inverse square root, should we say cylindrical top for it?
@adamih96 10 months ago
You can use the inverse square root algorithm in more cases. The other best example is probably physical simulations, more exactly, simulations of gravity, electromagnetism, etc. since those follow the inverse square law. It's probably obsolete by now though, and i have no idea why you would ever want to build physical simulations on an N64 lmao.
@macksnotcool 9 months ago
Unrelated to inverse square root but you could probably use 2D SDFs for a higher texture quality on certain types of clip-art-like textures.
@KazeN64 9 months ago
the n64 is hardware limited to a small set of texture formats so there isnt much playroom there
@macksnotcool 9 months ago
@@KazeN64 Hmm... I believe it has less to do with the texture format and more to do with how the texture is rendered. I still think the odds of this working (and working well) aren't completely likely but it could be worth a try.
@DragAmiot 10 months ago
Bro the inside of the factory looks soooooooo goood
@Gameboygenius 10 months ago
Fast inverse square root _with 0 Newton iterations?_ cue the meme... Watch out, we've got a badass over here!
@GameGearZero 10 months ago
Hi Kaze, very cool video. i have a question about a mario 64 port for the playstation classic, it runs very well (60fps) but in underwater scenes the framerate drops below 60fps. i watched your videos and asked myself: is it possible to build a port for the playstation classic with your enhanced version of the source code and will it reduce or completely remove the stutters (framerate drops)?
@Mireneye 10 months ago
My best guess.. Some optimizations might be helpful but Kaze has made a lot of very N64 hardware specific improvements, and since the hardware is different, It's not at all certain they would work as well.
@Clancydaenlightened 10 months ago
1:11 well use builtin sqrt, then run a parser in cpu cache that grabs this result, and performs inversion within cpu cache, so you only would calculate the square root and write to cache buffer periodically when needed
@Clancydaenlightened 10 months ago
So u get best accuracy with more vroom vroom Less ram latency
@l30n.marin3r0 9 months ago
Make the bus go vroom vroom...OH SHIT, Oh god...that was fantastic xD
@_ipsissimus_ 10 months ago
this is relevant to my interests.
@shadou1234567 10 months ago
as a starting coder, i really do find this all really interesting, but boy it's hard to understand. I love how dedicated you are, and i believe people could write college theses about your optimizations alone, but being truthful, i don't even get why the square root is needed here. Oh well, there is always more to learn, and i hope in the future i can write code that is optimal enough so it doesn't clog weaker machines
@bm1259 10 months ago
The square root isn't a coding thing really its a math thing. vectors are things that have magnitudes and directions. Lets say we have a vector (3,4) if we wanted to find the size of it we would basically use Pythagoras to find it out thats why the square root is needed here. As for the reason he needs to find the square root he does mention in the video that its for vector normalization.
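To make the normalization step concrete, here is a C sketch of why 1/sqrt shows up: dividing each component by the length is the same as multiplying by 1/sqrt(x*x + y*y + z*z). It assumes the Quake III bit hack with two Newton steps as the reciprocal square root; the `Vec3` type and function names are illustrative, not from the game's source:

```c
#include <stdint.h>
#include <string.h>

typedef struct { float x, y, z; } Vec3;

/* Approximate 1/sqrt(v) for positive, normal floats
   (assumption: Quake III constant, two Newton steps for extra accuracy). */
float inv_sqrt(float v)
{
    uint32_t i;
    float y;
    memcpy(&i, &v, sizeof i);
    i = 0x5f3759dfu - (i >> 1);
    memcpy(&y, &i, sizeof y);
    y = y * (1.5f - 0.5f * v * y * y);
    y = y * (1.5f - 0.5f * v * y * y);
    return y;
}

/* Scale a non-zero vector to length 1: v / |v| = v * (1 / sqrt(v . v)). */
Vec3 normalize(Vec3 v)
{
    float len_sq = v.x * v.x + v.y * v.y + v.z * v.z;  /* Pythagoras */
    float inv_len = inv_sqrt(len_sq);
    Vec3 out = { v.x * inv_len, v.y * inv_len, v.z * inv_len };
    return out;
}
```

For the (3, 4) example, the squared length is 25, the length is 5, and the normalized vector comes out close to (0.6, 0.8).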
@blar2112 10 months ago
All my homies hate inverse square root, all my homies just use a constant that's randomly generated at the beginning.
@MonochromeWench 10 months ago
It is not surprising the unique challenges of PC games that rely heavily on an awful x87 floating point unit do not necessarily translate to a completely different architecture
@franwex 10 months ago
I didn’t understand anything. But I loved the video.
@KoltPenny 7 months ago
What will happen once all the Nintendo 64's in the world decay naturally?
@Meleeman011 10 months ago
you're amazing thank you for making this
@GriffinForte 6 days ago
Briluh i thought the title was the truth about the square root of the Nintendo 64
@ruslankudriachenko5673 10 months ago
You probably meant 33 milliseconds :)
@NeverSnows 10 months ago
YOOOOOOOO WE COOKING WITH GAS!
@bioman1hazard607 10 months ago
Show this to John Carmack, I'd bet he'd get a kick out of it
@robertmcknightmusic 10 months ago
lol I watched that stand up maths video too
@mahatmagandhiful 10 months ago
Quick question from a rando: Are you an emulator developer or something? This video showed up seemingly randomly in my recc's, and while I find it interesting enough on its own, I get the feeling I'm not your usual target audience...
@KazeN64 10 months ago
i'm a modder actually! i am doing all this for my big mod so that it can run at the best framerate.
@reeyees50 10 months ago
If only they knew that shit in 1998
@DarkFusion28 8 months ago
Anyone know the name of that song that starts at 8:52? It sounds so familiar
@Ghennesph 8 months ago
rambus go vroomvroom
@springstudios9590 10 months ago
Any release window timing for return to yoshis island 64
@orestes1984 10 months ago
I dunno how you do it Kaze but this looks like a PS2 game by now
@snork_games 10 months ago
love watching these videos
@sahilhossain8204 10 months ago
Lore of The Truth about the Fast Inverse Square Root on the N64 momentum 100
@handsoffmymacaroni102 10 months ago
The N64 really is the most interesting console.
@backfromcuba 3 months ago
i know it's been said before but your levels are gorgeous. shame the n64 wasn't used better in its time :/
@catdisc5304 10 months ago
Leaving this comment so i can come back in 5 years or so and be like "OMG I FINALLY UNDERSTOOD!" just like with the OG fast inverse sqrt
@frognik79 10 months ago
Are there any system components that Mario64 doesn't use that could be hijacked into doing things they're not supposed to do?
@ChrisEbz 10 months ago
Question for Kaze or anyone else: why is making an N64 hack console compatible difficult? Figured I would ask here, so excuse me being somewhat off topic. I understand emulators aren't completely accurate. What makes these instructions compatible, and what makes an emulator that doesn't emulate perfectly different? I searched this a lot and community forums are so terribly toxic and get offended by everything! It's a real turnoff, but I don't see the answer besides "wahh stop offending the creators" or "duh the emulators are different" 😮‍💨🥴. Of course I mean no disrespect for the hard work put into this! :) Legit prefer using a console but never get a proper answer. Kaze is amazing, much respect, so figured I'd ask here.
@KazeN64 10 months ago
I think there's multiple things making it a lot harder to develop for the n64 directly:
1. Emulators are easier to test on. That means that if you do make a change or you develop a tool, you will test it on emulator first. By default, emulators will work already and then you have to put in additional work to make it go well on console
2. A real console can't be run alongside a debugger (so any incompatibilities are very hard to find)
3. Emulators are a lot less picky than real consoles. Many exceptions might not be emulated correctly, so when console would just freeze, emulators might just produce a slightly wrong result or even just work. (e.g. you can see this with ROM reads - they need to be 8 bytes aligned on N64, but can have any alignment on emulator)
4. Not every creator has access to an n64 to begin with to even test that their mod works on it.
5. the n64 has a pretty tight performance budget and emulator just doesn't
@ChrisEbz 10 months ago
@@KazeN64 Thanks for the wonderful answer, much appreciated. This really sums everything up nice and neatly. Hopefully this helps others as well who have the same question. I really respect the hard work put into hacking/production so naturally I'm not trying to stir the pot :) haha. This puts things in perspective very well. Your second point seems very important concerning debugging, this makes a lot of sense. The trial and error of using a real console definitely suggests much more intense work. Not having access to the hardware seems to be a bigger issue than I thought, understandable. Thanks Kaze much appreciated.
@Nickps 10 months ago
So, I noticed at 8:54 that you used pointers to alias the floats like the original algorithm, which is UB. So, the language lawyer in me was wondering why you didn't use a union instead. Does the compiler generate worse code in that case?
@KazeN64 10 months ago
i dont think it'd make a difference, even a memcpy works the same. i didn't even realize this was UB at the time (and i do plenty of floating point bithacks in this codebase so i'm not sure this type of UB can be avoided without compromising performance)
@Nickps 10 months ago
@@KazeN64 Yeah, I guess that makes sense. After all, C++'s bit_cast is just a constexpr memcpy so compilers should know how to optimize it. The only other problem I can think of with pointer casting is that if floats and u32s have different alignments on N64, one of the pointers might end up being unaligned, but since everything ended up working I guess that's not the case. Or maybe it is but MIPS doesn't care?
@KazeN64 10 months ago
they are both forced to be 4 byte aligned and have 4 byte size so the casting would never cause that on the n64
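The alternatives discussed in this thread can be put side by side. A short C sketch of the two commonly recommended ways to read a float's bits without the pointer cast; it assumes a 4-byte `float`, and the function names are illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits via memcpy: well-defined in C and C++,
   and typically compiled down to a plain register move. */
uint32_t bits_via_memcpy(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret via a union: allowed in C (C99 and later), but reading the
   inactive member is undefined behaviour in C++. */
uint32_t bits_via_union(float f)
{
    union { float f; uint32_t u; } pun;
    pun.f = f;
    return pun.u;
}
```

Both return 0x3f800000 for 1.0f; whether either generates different code from the pointer cast on a given N64 toolchain would need checking in the compiler output.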
@kurtanaika 9 months ago
LVIV POLYTECHNIC MENTIONED RAHHHHHHHHHH🔥🔥🔥🔥🔥🔥🔥🗣️🗣️🗣️🗣️🗣️🗣️
@Gamers_of_Oz 10 months ago
Man I wish I was smart to understand this love the work though seeing N64 games optimised is amazing.
@UltravioletNomad 10 months ago
I love how talking about CPU cycles is starting to sound more like skill point drain. Its like coding the game is a game itself.
@TjMastery 10 months ago
for him? yes it is my friend
@Pedritox0953 10 months ago
Great video!
@Clodd1 10 months ago
4:25: Me!
@duxnihilo 10 months ago
2:04 Is this why the n64 barely has any 60 FPS games?
@KazeN64 10 months ago
yeah, most games that run at 60fps are static camera games that can afford to run with no zbuffer. that lets you read half the memory reads/writes when rendering a pixel.
@razorblade413 10 months ago
so at the end, is kaze sticking to the silas approach or not?
@trashtrash2169 10 months ago
I don't think so, it was just a really cool idea that popped up. Could be wrong, though.
@razorblade413 10 months ago
@@trashtrash2169 ok thanks man. I had to watch the video like 3 times to understand something of it. Really dense but cool stuff.
@gsestream 10 months ago
if you take taylor expansion of the sqrt and invert it.. its something like 1/sqrt(x)=inv(1 + x/2 + ...), shiftrgt-1 is the div-by-2, doom? it was quake. want to get the best guess, compute the derivative f'(x) = 0, to get the best guess limits
@gsestream 10 months ago
sqrt{1+x} = 1 + x>>1 - x^2>>3 + x^3>>4 - {5*x^4}>>7 + {7*x^5}>>8, abs{x}<1
1/sqrt{1+x} = 1 - x>>1 + {3x^2}>>3 - {5*x^3}>>4 + {35*x^4}>>7 - {63*x^5}>>8, abs{x}<1
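Reading each ">>n" in the comment as a division by 2^n, the truncated Taylor series for sqrt(1+x) and 1/sqrt(1+x) around x = 0 can be evaluated with a handful of multiplies and adds. A C sketch using Horner's rule; the function names are illustrative, and the series only converge for |x| < 1:

```c
/* Fifth-order Taylor polynomial of sqrt(1 + x) around x = 0 (|x| < 1).
   Coefficients: 1/2, -1/8, 1/16, -5/128, 7/256. */
double sqrt1p_taylor(double x)
{
    return 1.0 + x * (0.5 + x * (-0.125 + x * (0.0625
               + x * (-5.0 / 128.0 + x * (7.0 / 256.0)))));
}

/* Matching polynomial for 1/sqrt(1 + x).
   Coefficients: -1/2, 3/8, -5/16, 35/128, -63/256. */
double rsqrt1p_taylor(double x)
{
    return 1.0 + x * (-0.5 + x * (0.375 + x * (-0.3125
               + x * (35.0 / 128.0 + x * (-63.0 / 256.0)))));
}
```

The approximation is good for small x (sqrt(1.21) comes out within about 2e-6 of 1.1) and degrades quickly as |x| approaches 1.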
@gsestream 10 months ago
best sqrt is no sqrt required at all, like distance comparisons/sorting, for sorting object/vertex/triangle distances in power of 2 order
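The "no sqrt at all" case is easy to show: when distances are only compared or sorted, squared distances give the same ordering. A minimal C sketch with an illustrative `Vec3` type and function names:

```c
typedef struct { float x, y, z; } Vec3;

/* Squared Euclidean distance: no square root needed. */
float dist_sq(Vec3 a, Vec3 b)
{
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

/* Returns 1 if p is closer to a than to b. Because sqrt is monotonic,
   comparing squared distances gives the same answer. */
int closer_to_a(Vec3 p, Vec3 a, Vec3 b)
{
    return dist_sq(p, a) < dist_sq(p, b);
}
```

This only helps when the actual length is never needed; normalization still requires the (inverse) square root.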
@gsestream 10 months ago
you could always also do the sine trick, and store some function/derivative values in the range [0,2] of sqrt() and 1/sqrt(), then quadratic interpolate between the range, very fast combined with bit shifts, div by 4 (or shiftrgt-2, also in the FPU)
@gsestream 10 months ago
yep the bit shift + quadratic derivative point sample interpolation approximation, gets you to below 1e-5 accuracy, for both normal sqrt and inverse sqrt, and is super fast, only bit shift, plus and multiply operations. if you divide by 2 (or 4) to get the argument value to between 1 and 2, you get the best sqrt approximation. below 1 values the quadratic approximation gets worse.
@gsestream 10 months ago
also you can replace sqrt in distance by for example ray-to-plane distance, which is only dot div dot operation, no sqrt required, at all.
@shadowmen11 10 months ago
6:54: Poland on top! 💪 Sorry, i had to
@boomloom2943 10 months ago
What sm64 romhack you playing
@angeldude101 10 months ago
His own.
@Enlightenment0172 10 months ago
Dude, Nintendo's gotta hire you. Secondly, if Nintendo reached out, would you accept?
@KazeN64 10 months ago
how much are they paying
@stevenharbinger2427 6 days ago
you video make me not sad and i need that because it's on biggly
@hornylink 9 months ago
>why aren't you using the famous algorithm that optimized quake for x86 instruction sets?
>10 minutes explaining that mario 64 isn't quake and an N64 isn't x86 in a way normal people would understand
this channel always fun, people always stupid.
@603840Jrg 10 months ago
I like to think that whatever piece of his soul John Carmack lost when Oculus got acquired by FB/Meta ended up possessing Kaze
@mrmimeisfunny 10 months ago
I didn't actually expect it to work at all. Because I remember you said that square roots on the N64 are relatively fast. Of course Kaze will find a way to eek out that tiny extra bit of performance and then some.
@Brad_Script 8 months ago
you're forgetting this is the reverse sqrt not just sqrt
@mrmimeisfunny 8 months ago
@@Brad_Script I didn't forget. I just thought that it's not worth it compared to the performance gains on old Pentiums.
@arciks11 10 months ago
2:00 So they gave it a race car engine and a can of beer for a gas tank?
@Nerdule 10 months ago
The gas tank's just fine, actually, especially with the Expansion Pak ... The problem is that the hose from the gas tank to the engine is the size of a curly straw.
@benjaminoechsli1941 10 months ago
Cutting corners in the strangest of ways.
@RedstoNeman0 10 months ago
yeah that's pretty much a good metric for why the n64 is slow lol
@someoneelse4811 10 months ago
Yeah, I remember some N64 dev from the time made a similar analogy.
@ssl3546 10 months ago
They started with designs used for SGI workstations and cut stuff waaay down. They were not going to make a new CPU design for the N64 and clocking it slower wouldn't have saved any money. It's not like today where we have ARM cores at every performance level you might want.
@hans_maier_w 10 months ago
everyones gangsta till kaze finds an one cycle improvement in the most cracked function ever existing
@Stratelier 10 days ago
Never underestimate the value of a single cycle to a bedrock function that gets called endlessly. Like back when one of my teenage hobby projects was a primitive 3D renderer (for modding a game with), written in BASIC (specifically VB4), with basically zero access to any external, actual 3D APIs. Displaying textures in real-time was far outside my abilities but I had the coordinate transformations for wireframe rendering optimized as much as I can think of, including a few scenarios where I resorted to the oft-maligned GOSUB/RETURN type calls instead of making a function call to handle it, simply because it was the faster mechanism.
@simjans7633 10 months ago
Both editions of the book Hacker's Delight mention the fast inverse square root (or as they call it, an Approximate Reciprocal Square Root Routine) and give various improvements of the algorithm. In the books they already mentioned FISR without Newton iterations: > deleting the Newton step results in a substantially faster function with a relative error within ±0.035, using a constant of 0x5F37642F.
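A sketch of what "deleting the Newton step" looks like in C, using the constant quoted in the comment. The function name is illustrative, and the code is only meant for positive, normal floats:

```c
#include <stdint.h>
#include <string.h>

/* Reciprocal square root estimate with no Newton step, using the
   constant 0x5F37642F quoted above. */
float rsqrt_no_newton(float x)
{
    uint32_t i;
    float y;
    memcpy(&i, &x, sizeof i);
    i = 0x5F37642Fu - (i >> 1);   /* the whole algorithm is this one line */
    memcpy(&y, &i, sizeof y);
    return y;
}
```

For x = 1 this returns roughly 0.966 and for x = 2 roughly 0.716 (true value 0.7071), both within the ±0.035 relative error the quote mentions.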
@HappyLittleBoozer 10 months ago
It's amazing to me how you can make deeply complex topics so easy to understand by explaining them based on a use case. Programming is like black magic to me, yet I can follow your videos along without any issue. God bless.
@torvusbolt201 10 months ago
All of your videos are so incredible. I love how you mix maths and humour in the way you do. Even if I can't comprehend everything, I love each and every second
@RatcheT2497 10 months ago
have you thought about writing some small research papers for these findings and experiments? like, even if they're extremely specific for your use case, they're still cool as hell and might even help someone some day
@caliburnleaf9323 10 months ago
For the graph at 9:19, it probably would have been more clear if you'd labeled it as "Error (%) vs Cycles," since that's what the numbers actually represent. In both cases, a lower number is better, which is the inverse of what is implied by "Accuracy vs Performance" (which suggests that a higher value is more accurate or has higher performance).
@prgnify 10 months ago
I was certain it was absolutely "useless" in the Nintendo 64 hardware, I'm amazed you actually found a place to use it! Also, you know your audience very well, @06:58 I chuckled and @07:23 I almost laughed. Great content!
@Gameboygenius 10 months ago
Really. N64 is not my platform, but if it was Gameboy code I might legit have paused the video to count the cycles.
@noobtracker 9 months ago
5:02 Ah yes, Quake's Fast Inverse Square Root algorithm, famously used in DOOM /j
@pockpock6382 10 months ago
Kaze's coding gives me the feeling of when the Vtech kicks in
@drewynucci9037 10 months ago
I wonder if you’ll ever do optimization for the n64 bios… I know Nintendo didn’t give very many developers access to the bios but there are a few games which load a different bios into the n64 and allow even more optimized code to be run for whatever game was developed…
@watchm4ker 5 months ago
You mean the RSP microcode, and that's a whole different problem. I don't know how documented the microcode is. Nintendo certainly didn't want developers messing with it; they wanted them to just use the ones Nintendo provided.
@drewynucci9037 4 months ago
@@watchm4ker yeah, I meant the microcode… I’d be so interested to see what kinds of things could be done with that explored
@watchm4ker 4 months ago
@@drewynucci9037 having looked at a few more videos... Yeah, it's pretty wild. Look up F3DEX3.
@cozmictwinkie9260 10 months ago
RAM Bus is my favorite recurring character on this show, glad to see Them back!
@DelayRGC 10 months ago
The inverse square root sure went through a journey, didn't it? From being cumbersome to calculate, to an ingenious bit hack, to becoming its own CPU instruction.
@SilicatYT 10 months ago
Kaze Emanuar: The only person that can explain to me how to optimise a 26 year old game in high technical detail I can't even begin to understand, while keeping me invested until the end.
@cubedude8690 10 months ago
i used to watch your videos
@colonthree 10 months ago
I made my own using weighted quadratic beziers. It's only 4% less accurate than the sine and cosine operations using the squirt, at a fraction of the performance cost. I know it can be improved, but so far so good. :3c
@cubedude8690 10 months ago
I didn't even bat an eye at "squirt" the first time I read this comment because that's exactly how I say it in my head
@Armameteus 10 months ago
To summarize:
_Script kiddie:_ "HAY, you should use this really famous algorithm because it's a more efficient way of performing floating point calculations!"
_Nintendo:_ "Yeah, we thought of that, dude. We stuck a chip in the system that does that _specific_ calculation all on its own because doing it any other way was hella inefficient."
_Kaze:_ "Amateurs..."
_Nintendo/kid:_ "What was that?"
_Kaze, LVL. 99 Script Wizard:_ "AMATEURS!"
@TjMastery 10 months ago
u mean level 199
@Mr_Yeah 10 months ago
7:48 There is research for that by Chris Lomont in 2003. However, the number he found was 0x5f37642f.
@KazeN64 10 months ago
he might have used a different error measuring technique. I used "Maximum relative error" as a measure of error.
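As a rough illustration of what measuring "maximum relative error" can look like, the sketch below sweeps inputs and records the worst deviation of the bit-trick estimate from the exact 1/sqrt(x), for any magic constant you pass in. Everything here is an assumption made for the example: the names `rsqrt_estimate` and `max_relative_error`, the sweep range, and the step count are hypothetical, and this is not the measurement code used for the video or in Lomont's paper. It relies on the fact that the estimate's error pattern repeats every time x grows by a factor of 4, so sweeping x over [1, 4) is enough; choosing x = s*s makes the exact answer simply 1/s, so no square root call is needed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: bit-trick estimate of 1/sqrt(x) with a
 * caller-chosen magic constant and no Newton step.
 * Assumes x is a positive, normal 32-bit IEEE 754 float. */
static float rsqrt_estimate(float x, uint32_t magic)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* float bits -> integer */
    bits = magic - (bits >> 1);       /* the magic-constant step */
    memcpy(&x, &bits, sizeof x);      /* integer -> float bits */
    return x;
}

/* Hypothetical helper: largest |estimate / exact - 1| seen while sweeping
 * x over [1, 4) in `steps` samples. With x = s*s, the exact value of
 * 1/sqrt(x) is 1/s, so estimate/exact equals estimate * s. */
static double max_relative_error(uint32_t magic, int steps)
{
    double worst = 0.0;
    for (int i = 0; i < steps; i++) {
        double s = 1.0 + (double)i / steps;   /* s in [1, 2) */
        float x = (float)(s * s);             /* x in [1, 4) */
        double rel = (double)rsqrt_estimate(x, magic) * s - 1.0;
        if (rel < 0.0)
            rel = -rel;                       /* absolute value */
        if (rel > worst)
            worst = rel;
    }
    return worst;
}
```

Running `max_relative_error` on two candidate constants with the same step count gives a like-for-like comparison under this one metric. A different metric, such as average error or error after a Newton step, could well favour a different constant.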