Optimizing Loops In Go | Prime Reacts

ThePrimeTime
589K subscribers
84K views
Published: 26 Sep 2024
Comments: 185
@abdia-lb2yx · 7 months ago
I ain't gonna lie, having influencer-style videos in the coding world (just entertainment, not educational) helps me stay on track when I don't feel motivated to study. Keep it up bro.
@Lewini · 7 months ago
Hard agree. Keeps my algorithms in check
@cyrilemeka6987 · 7 months ago
Agreed.
@isthatjake · 1 month ago
It'll help with work, too
@DjoumyDjoums · 7 months ago
My guess on why 3 is slower than 2: index bounds checks. Compilers can very often eliminate them when you loop on the actual length of the array, not a pre-cached numerical value.
@grifon97 · 7 months ago
Sounds like a very good guess to me! :D
@Nenad_bZmaj · 7 months ago
@chigozie123 Not true.
@Pipe0481 · 7 months ago
@chigozie123 That's just completely false lol
@chigozie123 · 7 months ago
@Pipe0481 I should clarify that I'm comparing it to what some C/C++ compilers do. If you look up optimizations done by the Go compiler, you will find that the only optimization even worth mentioning is the conversion of "zeroing out a slice" to a simple memclr instruction. In C, we get that for free using `memset`. Maybe also function inlining, but that's nothing given that in C, the `inline` keyword is free to use. That's just the tip of the iceberg. Compare that list to what GCC does, and you will see why it's difficult to claim that the Go compiler is an optimizing compiler. The main reason why I said that Go doesn't do optimizations is that in C, there is no reason why the second and third code snippets should not run at the same speed. The compiler should have optimized the len call out of the second, to make it pretty much the same as the third.
@Nenad_bZmaj · 7 months ago
@chigozie123 The for loop with the range clause is clearly more optimized by the compiler than the standard for loop with a condition clause, as he has shown. I agree with the len call - it should have been optimized out - but he measured with the time package, not a benchmark, and did not show us standard deviations. In Go you can opt out of inlining. I simply disagree with the statement that the Go compiler doesn't do exhaustive optimizations, as long as they are worth doing and as long as they would not bloat the compile duration.
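The bounds-check guess above is easy to poke at yourself; here is a sketch of my own (not the code from the video) with the two loop shapes the thread is comparing. Recent Go compilers can report remaining bounds checks via `go build -gcflags=-d=ssa/check_bce`.

```go
package main

import "fmt"

// sumLen loops directly on len(a); the compiler can usually prove
// i < len(a) holds and drop the per-iteration bounds check on a[i].
func sumLen(a []int) int {
	t := 0
	for i := 0; i < len(a); i++ {
		t += a[i]
	}
	return t
}

// sumCached loops on a pre-read length; elimination then depends on
// the compiler proving n can never exceed len(a).
func sumCached(a []int) int {
	t := 0
	n := len(a)
	for i := 0; i < n; i++ {
		t += a[i]
	}
	return t
}

func main() {
	a := []int{1, 2, 3, 4}
	fmt.Println(sumLen(a), sumCached(a)) // 10 10
}
```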
@ColinFox · 7 months ago
Go uses symbol lifetimes to determine whether to allocate on the stack or on the heap. If the object survives after the function returns, it's on the heap. Pretty straightforward.
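The lifetime rule described above is visible in the compiler's escape-analysis report (`go build -gcflags=-m`); a minimal sketch of my own:

```go
package main

import "fmt"

// local's slice never leaves the function, so the compiler is free
// to place the backing array on the stack.
func local() int {
	p := make([]int, 4)
	p[0] = 42
	return p[0]
}

// escapes returns a pointer that outlives the call, so x must be
// heap allocated; `go build -gcflags=-m` reports "moved to heap: x".
func escapes() *int {
	x := 42
	return &x
}

func main() {
	fmt.Println(local(), *escapes()) // 42 42
}
```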
@susiebaka3388 · 7 months ago
"The more you write standard code, the more likely you're generally doing well." - ThePrimeagen
@samsqu4red · 7 months ago
Found the channel just a few weeks ago. Coming from a background in Webflow, with only a grasp of HTML, CSS, and a bit of JavaScript, I see myself as a puppy. Recently, I've decided to take coding more seriously because it's interesting, and I also have a web app idea that I'm genuinely determined to bring to life. Though 90% of what the man is saying still flies right over my head, I feel like my understanding is expanding. I've never attended school, worked in-house with a team, or studied with guidance. It seems like I found a semblance of a mentor who's bombarding me with jargon I can't possibly comprehend, but who is ultimately helping me see through the fog. Feels like little by little I'm starting to grasp how everything is connected... weird.
@andygarfield6529 · 7 months ago
There's all sorts of things he could have done here. Changing the order of the loops can give you wild improvements because it can change the proportion of cache hits. Also loop unrolling. Also using arrays in Go (which are a real thing) instead of slices.
@CjqNslXUcM · 7 months ago
The loop order is correct, it goes through the innermost slice first.
@SaHaRaSquad · 7 months ago
If you do something like manual loop unrolling, always benchmark it, because most likely it will actually make your code slower by blocking much better automatic optimizations.
@Nenad_bZmaj · 7 months ago
In my experience, an array gives me worse performance than a slice. After I checked that, instead of declaring a := [5]something, I always do: a := make([]something, 5, 5)
@Zullfix · 7 months ago
Agreed. His single-array test could have been much faster if coordinates with locality were stored contiguously.
@Yawhatnever · 7 months ago
Loop unrolling might be one of those things where you end up fighting the compiler (would need to measure), but loop tiling might have a real chance of increasing cache hits
@HairyPixels · 7 months ago
The compiler writes better code than you, but what if you're writing a compiler? BRAIN MELTING
@ninocraft1 · 7 months ago
If you write your own compiler, it's probably gonna be even slower xD
@Patterner · 7 months ago
Just compile the compiler with the compiler. Repeat. ... Profit
@OkAtheistExplainVim · 7 months ago
If the compiler compiles your compiler, is it really your compiler?
@HairyPixels · 7 months ago
@Patterner In Soviet Russia, compiler compiles you.
@joranmulderij · 7 months ago
@OkAtheistExplainVim If you wrote the compiler that wrote the compiler, then yes
@jagc2206 · 7 months ago
There is a big distinction between arrays and slices in Go. This video uses slices, which are pointers to arrays. A static array has a compile-time length and allows the compiler to do a lot more optimization
@Nenad_bZmaj · 7 months ago
My experience with performance is the opposite, so I stick with slices.
@eni4ever · 7 months ago
Blue hair no?
@dejangegic · 7 months ago
Old video
@sparrow243 · 7 months ago
He's getting bald
@greendsnow · 7 months ago
@sparrow243 hahahahahah
@yobson · 7 months ago
no
@XDarkGreyX · 7 months ago
Brain no?
@philipp04 · 7 months ago
A year ago my prof had us write Gaussian elimination, but the algorithm we had to implement was slightly different for each of us. I had to find the inverse of a matrix using Cholesky decomposition, which is designed to be fast for large symmetric matrices. I'll cut to the chase: the code I wrote was slow as hell and I couldn't figure out what was going on, though I could tell that I had tons and tons of cache misses. The professor though could tell what I did wrong immediately: the matrix was stored as a 1d array row by row, but I was traversing it columnwise. I transposed the matrix and swapped all indices in my code, and the total runtime decreased by a factor of 20 (on 8000x8000 matrices). Moral of the story: go through your data sequentially, it makes a huge difference.
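The traversal-order effect in this story is easy to reproduce in Go (my own sketch, not the poster's code): both walks compute the same sum over a row-major matrix, but only the first touches memory sequentially.

```go
package main

import "fmt"

// m is an n×n matrix stored row by row in one slice.
// The row-wise walk reads consecutive addresses (cache friendly);
// the column-wise walk strides by n elements between reads and
// misses the cache far more often on large matrices.
func sumRowWise(m []float64, n int) float64 {
	var t float64
	for r := 0; r < n; r++ {
		for c := 0; c < n; c++ {
			t += m[r*n+c] // consecutive addresses
		}
	}
	return t
}

func sumColWise(m []float64, n int) float64 {
	var t float64
	for c := 0; c < n; c++ {
		for r := 0; r < n; r++ {
			t += m[r*n+c] // stride of n elements per read
		}
	}
	return t
}

func main() {
	n := 3
	m := make([]float64, n*n)
	for i := range m {
		m[i] = float64(i)
	}
	fmt.Println(sumRowWise(m, n), sumColWise(m, n)) // 36 36
}
```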
@samhughes1747 · 7 months ago
EDIT: this is wrong. Only the comprehension-inlining bit made the boat in 3.12. Hey, Python doesn't have a loop problem anymore. There's still the "it's Python" problem, but Python 3.12 introduced loop unrolling.
@ruroruro · 7 months ago
source?
@samhughes1747 · 7 months ago
@ruroruro, oof. Yeah, that didn't get approved for 3.12, and I got that mixed up with the comprehension-function inlining that did get released in 3.12.
@rt1517 · 7 months ago
The video author should have disassembled the resulting binary to see if he would be able to figure out what the hell is going on.
@g.c955 · 7 months ago
He could do a follow-up by profiling the code to see where CPU and memory are spent, then analyze the assembly output of the code to understand why.
@swedishpsychopath8795 · 7 months ago
I'm planning to throw a SIMULA-67 party in 3 years to celebrate its 60 years as the first OOP language. Hate it or love it, the Norwegian SIMULA-67 invented Object Oriented Programming (OOP) with classes, inheritance, encapsulation, overloading, overriding and all the other good stuff. It is crazy to think that this paradigm shift in programming is 60 (SIXTY) years old already!! Press LIKE if you would like to get an early invitation!
@0dsteel · 7 months ago
Slices are just references to an array with set length and capacity. They are reallocated if the values wouldn't fit said capacity. On the other hand, arrays *may* be stack allocated, if the compiler is certain they don't escape the given scope and, well, fit in the stack frame. (this is not legal advice, please ask your friendly neighbourhood debugger)
@shapelessed · 7 months ago
Omg Go... I only had a week with that language when I tried it, but I love the errors-as-values concept. To the point that I started using them all over my JS.
@0dsteel · 7 months ago
Haters gonna hate, but mention their uncaught runtime exceptions and they'll be in shambles :)
@max3446 · 7 months ago
I like errors as values, but Go would be so much nicer if they had monadic operations on errors (or at the very least, some sort of shorthand for `if err != nil { return err }`).
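For concreteness, here is a sketch of the verbose idiom next to a hypothetical `must` helper. `must` is not part of the standard library, and it trades the explicit check for a panic, so it only suits program edges and tests.

```go
package main

import (
	"errors"
	"fmt"
)

// double is a toy fallible function used to show the idiom.
func double(s string) (int, error) {
	if s == "" {
		return 0, errors.New("empty input")
	}
	return 2 * len(s), nil
}

// must is a hypothetical generic helper: it unwraps a (value, error)
// pair, panicking on error instead of returning it.
func must[T any](v T, err error) T {
	if err != nil {
		panic(err)
	}
	return v
}

func main() {
	// The verbose idiom the comment is about:
	n, err := double("hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n, must(double("go"))) // 10 4
}
```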
@miroslavhoudek7085 · 6 months ago
A guy watches a guy who doesn't know what's going on, while also not knowing what's going on, while people in chat suggest what might be going on, but they can't all be right. Good fun overall, would recommend, but don't know why.
@defeqel6537 · 7 months ago
For optimization videos like this, I'd always want to know the size of the data, the theoretical memory bandwidth, etc. to have some actual context. No point trying to optimize things if you are already limited by the hardware (or the optimization would need to be radically different at least).
@debemdeboas · 7 months ago
Reading contiguous memory is generally faster because the CPU may not have to deal with paging and segmentation for every index
@Nenad_bZmaj · 7 months ago
The difference of cca. 250 ms for about one billion elements is to be expected if additional GC sweeps are triggered. He should have isolated the three scenarios and benchmarked them separately, with the SD, not consecutively in one test. It seems that an additional GC sweep was turned on sometime after the second test. Also, package "time", in my experience, may interfere somehow with either GC or the execution. There are better tools for benchmarking, although I do use "time" to give me a sense.
@Satook · 7 months ago
The 2nd form in part 1 is pretty easy for a compiler to optimise with constant propagation. The x and y don't change, so the sub-expressions can be moved up out of the innermost loop.
@AlexnderMisuno · 7 months ago
Once I implemented a Minesweeper game in Go using the "normal" approach to store data about the field, cell state etc. And afterwards I did the same but using bit-wise operations (just for fun). To my surprise, not only was the performance the same, but also the memory consumption
@yugioh8810 · 7 months ago
Using only the index with range is usually faster, especially if the array is long or holds heavy structs, since it eliminates the copy into the loop variable. for i := range x is faster than for _, val := range x
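A sketch of the two range forms being compared, with a deliberately heavy element type (my own example): the value form copies the whole struct on every iteration, the index form does not.

```go
package main

import "fmt"

// heavy is deliberately large so the per-iteration copy matters.
type heavy struct {
	payload [4096]byte
	id      int
}

// sumByIndex ranges over indices only: no element copy.
func sumByIndex(xs []heavy) int {
	t := 0
	for i := range xs {
		t += xs[i].id
	}
	return t
}

// sumByValue copies each 4 KB element into v on every iteration.
func sumByValue(xs []heavy) int {
	t := 0
	for _, v := range xs {
		t += v.id
	}
	return t
}

func main() {
	xs := make([]heavy, 3)
	for i := range xs {
		xs[i].id = i + 1
	}
	fmt.Println(sumByIndex(xs), sumByValue(xs)) // 6 6
}
```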
@Gusto20000 · 7 months ago
I did the same thing a week ago after 1.22 released, benchmarking with testing.B, and measured exactly the same result for each case... So I decided simple range is good enough. Bill Kennedy talks about "mechanical sympathy" regarding this exact problem in "Mastering Go" - it's all about low-level caching and prefetching by the CPU and compiler
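A minimal testing.B harness of the kind mentioned above. Normally these live as `Benchmark*` functions in a `_test.go` file run with `go test -bench=.`; `testing.Benchmark` lets the sketch run standalone (function and variable names are my own).

```go
package main

import (
	"fmt"
	"testing"
)

var data = make([]int, 1<<12)

func sumRange() int {
	t := 0
	for _, v := range data {
		t += v
	}
	return t
}

func sumIndex() int {
	t := 0
	for i := 0; i < len(data); i++ {
		t += data[i]
	}
	return t
}

func main() {
	// Run each variant under the benchmark harness, which picks
	// b.N automatically until timings stabilize.
	for _, f := range []func() int{sumRange, sumIndex} {
		r := testing.Benchmark(func(b *testing.B) {
			for n := 0; n < b.N; n++ {
				_ = f()
			}
		})
		fmt.Println(r) // e.g. "  500000   2100 ns/op" (numbers vary)
	}
}
```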
@miracleinnocent2649 · 7 months ago
Go's allocations all depend on its escape analysis; sometimes even interfaces can be stack allocated, and I think the highest size for a stack allocation is 32kb
@MattBolt35 · 7 months ago
There are ways to pass link options to Go's compiler to show which values escape. Slices are typically heap allocated. You can stack allocate an array if you use a finite length (as he mentions).
@Nenad_bZmaj · 7 months ago
But only small sized arrays.
@freedomextremist7921 · 7 months ago
It's a slice, not a list. And Go does also have arrays, which are contiguous regions in memory.
@Leeway4434 · 7 months ago
Yes, in Go arrays are stack allocated and are created with a fixed length. [5]int is an array of length 5 on the stack, and []int{1,2,3,4,5} is a slice of length 5 on the heap, with a pointer to it on the stack. I believe there is a size limit for stack allocated arrays; at some point the compiler puts them on the heap to avoid stack overflows
@Nenad_bZmaj · 7 months ago
@Leeway4434 All true.
@Leeway4434 · 7 months ago
@Nenad_bZmaj I actually looked into it some more, and it is not guaranteed for slices to be put on the heap. The compiler has heuristics for whether or not something should end up on the heap or the stack. Large values usually end up on the heap. Small values (including slices of sufficiently small constant length) can often be allocated on the stack.
@Nenad_bZmaj · 7 months ago
@Leeway4434 Thanks. I didn't know that.
@rumplstiltztinkerstein · 7 months ago
The notion of "code optimization" is mostly popular because of Javascript. We invented compilers so we don't have to worry about looking at everything the CPU is doing in every clock cycle.
@michaelrobb9542 · 7 months ago
You can't put syscalls in for loops. Calls like time.Now are calls to the OS; this will produce inconsistent results on each run.
@Nenad_bZmaj · 7 months ago
True. There is a bigger for loop in this particular test, encompassing three tests.
@Nenad_bZmaj · 7 months ago
Small arrays can be on the stack, but slices are always allocated on the heap (in Go).
@haraldfielker4635 · 7 months ago
I think the range works better with CPU caches (in that scenario).
@sotiriosmantziaris6097 · 7 months ago
The compiler decides based on escape analysis where to put things.
@georgehelyar · 7 months ago
I'm not a Go expert, but it sounds like the reason 3 is slower than 2, and reverse is faster than forwards, is bounds checking.
@defeqel6537 · 7 months ago
which I think is something Go should probably be able to optimize away in cases like this
@georgehelyar · 7 months ago
@defeqel6537 Is it a jagged array though? If len(a[0]) can be different from len(a[1]) then it could potentially go out of bounds, but again I'm not a Go expert. C#, for example, has multi-dimensional arrays of contiguous memory, indexed with a[x,y,z], where the lengths of each dimension are guaranteed, or you could make a jagged array by making an array of arrays where they aren't, and then you'd have to use a[x][y][z]. Reverse being faster in several languages always seemed like it could be optimised though. Reverse is faster because it always knows that the lower bound is 0 regardless of the array, so if it starts inside the bounds and then never goes below 0, it doesn't have to check bounds for each iteration. And of course dynamic arrays could change size at runtime. Rust probably does bounds checking at compile time
@Nenad_bZmaj · 7 months ago
@defeqel6537 I believe that, too. The very typical difference of cca 200 ms for arrays of this size tells me that probably an additional GC sweep was triggered.
@Jarikraider · 7 months ago
Personally, I thought a 100ms improvement on 1 billion elements was pretty good.
@adambright5416 · 6 months ago
The fact that he used time in one function instead of the built-in benchmarks is just... Wow
@rosehogenson1398 · 7 months ago
Go has escape analysis, which can decide to place slices on the stack if it can guarantee they don't escape the current scope
@Nenad_bZmaj · 7 months ago
True for some types of variables, not sure if this is true for slices. I think they are always on the heap.
@Nenad_bZmaj · 7 months ago
Something I noticed: you'll get slightly worse performance if you write involved expressions instead of simple indexes. Hint: assign such an expression to a variable first, use that as the index, and let the compiler optimize it. Simpler expressions are better put in the index directly, e.g. a[i+3], but write: ind := b[i]; do something with a[ind], rather than: do something with a[b[i]]. Not a big deal, though, since the difference is marginal.
@OnFireByte · 7 months ago
Really interesting that the flattened slice is a bit slower than the actual 3d slice
@CjqNslXUcM · 7 months ago
The slice is a pointer with a type and a len, so every time you move from the innermost slice it jumps to a cache miss. The flat slice is one fat blob of memory. No cache misses.
@OnFireByte · 7 months ago
@CjqNslXUcM That's why I'm surprised the flattened one is slower
@metropolis10 · 7 months ago
I liked that he said the 100ms improvement wasn't worth it. How often do people admit the technically faster thing isn't worth the headaches?
@MethodOverRide · 7 months ago
Write smooth easy code? Sounds like something that genius Tom would do with JDSL.
@blenderpanzi · 7 months ago
I'm at 1:53 - I think the 2nd one could be the fastest, because an intelligent compiler could inline len() and eliminate bounds checks in that case. It can't eliminate bounds checks in the last version. Let's see if my intuition is right.
@Bluesourboy · 7 months ago
These complications from loops are why I believe C/C++ will be around FOR a WHILE longer.
@RenThraysk · 7 months ago
His problem is that the Go compiler doesn't perform loop unrolling; this is a deliberate decision by the Go devs. If he manually unrolled the loop he'd get an improvement.
@Nenad_bZmaj · 7 months ago
Regarding some comments: I did try arrays (instead of slices) of length known at compile time, but performance was always slightly worse. In my experience, Go programmers can forget about arrays, since slices seem to be very well optimized. And I like using slices when I have to do something differently in the last iteration or in distinct subranges. So, instead of for i := 0; i < k; i++ {...}; for i := k; i < len(a); i++ {...}, I can write: b := a[:k]; c := a[k:]; for _, el := range b {...}; for _, el := range c {...}.
@flga_ · 7 months ago
I'm not sure what you did, but when switching to arrays you need to be careful to avoid copies. There is no inherent reason for arrays to perform slower, other than that.
@Nenad_bZmaj · 7 months ago
@flga_ I passed them by reference. Very small arrays, for instance, small hash tables with short slices of bytes as elements, or arrays of ints. But passing pointers to them apparently always triggers a GC sweep. When I turn them into slices, the execution is faster overall by the duration of one GC sweep, which on my computer, for several programs I wrote, apparently (inferred) takes about 200 ms on average.
@flga_ · 7 months ago
@Nenad_bZmaj I don't think you were measuring things correctly then; I highly doubt GC is a factor. Passing a pointer to an array will require a nil check, that's the only difference, and it can be done by you, upfront, instead of by the compiler per iteration. Put in other terms, using arrays is asymptotically equivalent to using a 1d slice. Though for small sizes the distinction is unlikely to be meaningful, as the chance of it all being cached in either scenario is high.
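The subrange pattern Nenad_bZmaj describes a few comments up, written out (my own sketch): both subslices share the original backing array, so no copying is involved.

```go
package main

import "fmt"

// splitSum treats the first k elements differently from the rest,
// using two subslices instead of two index-based loops; head and
// tail alias a's backing array, so slicing costs no copy.
func splitSum(a []int, k int) int {
	head, tail := a[:k], a[k:]
	sum := 0
	for _, el := range head {
		sum += el // plain sum for the first k elements
	}
	for _, el := range tail {
		sum += el * 10 // something different for the remainder
	}
	return sum
}

func main() {
	fmt.Println(splitSum([]int{1, 2, 3, 4, 5}, 3)) // 96
}
```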
@awesomedavid2012 · 7 months ago
My assumption is that if you create a slice with a fixed compile-time size and it doesn't outlive the function, Go will simply implement it as an array on the stack. The compiler knows its size and it knows how long it lives. Notice that even if you call append, it constructs a new slice. So the original could still be an array on the stack in implementation, as long as you don't pass it outside of the function.
@Nenad_bZmaj · 7 months ago
Most arrays and ALL slices are allocated on the heap in Go.
@Nenad_bZmaj · 7 months ago
Also, when you call append, the Go runtime doesn't necessarily create a copied array of twice the capacity. If there is enough capacity in the original array, appending to a slice that points to it doesn't create a new array. (On the side, appending doesn't declare a new slice; you reassign: a = append(a, el).)
@what1heh3ck · 7 months ago
@Nenad_bZmaj I thought append always creates a new slice (still pointing to the original array if it has enough capacity), am I missing something here?
@awesomedavid2012 · 7 months ago
@Nenad_bZmaj I agree about the append thing. Though I still think the compiler could theoretically optimize a slice to be on the stack if you never call append, give it a constant size, and don't return it. When you say it's always on the heap in Go, do you mean that is how the main Go compiler implements it, or is that part of the Go standard?
@Nenad_bZmaj · 7 months ago
@awesomedavid2012 I remember that this was the standard, although not in the language specification, since this doesn't belong there, but this behaviour may change in future without changing the language specs. Somebody just told me that he looked at this more closely and that the current compiler does keep smaller slices that have the properties you describe on the stack.
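The capacity behaviour of append discussed in this thread can be checked directly (my own sketch): appends that fit the existing capacity reuse the backing array, and only overflowing it allocates a new, larger one.

```go
package main

import "fmt"

// appendReuse reports whether appends within capacity reused the
// same backing array, and whether overflowing capacity grew a new one.
func appendReuse() (reused, grew bool) {
	a := make([]int, 0, 4) // length 0, capacity 4
	b := append(a, 1, 2)   // fits: reuses a's backing array
	c := append(b, 3, 4)   // still fits: no new allocation
	d := append(c, 5)      // over capacity: runtime allocates a bigger array
	return &c[0] == &b[0], cap(d) > cap(c)
}

func main() {
	fmt.Println(appendReuse()) // true true
}
```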
@justbrad_v3906 · 7 months ago
Yeah, that sounds like buffer overflow hell lol
@RandomGeometryDashStuff · 7 months ago
08:39 "a.length = 0" doesn't create a new object, so less stuff to GC
@k98killer · 7 months ago
Go will store slices in stack memory if they are small enough (less than 2^16 bytes). Allegedly.
@masterchief1520 · 2 months ago
Thought it all depended on escape analysis?
@k98killer · 2 months ago
@masterchief1520 I think it is a mix, but it has been a while since I last looked into this, so idk for sure.
@RandomGeometryDashStuff · 7 months ago
08:55 "a[a.length]=1" is slower than "a.push(1)" in v8
@FunkyELF · 7 months ago
The Python range() is not like ranges in other languages. Python range just gives back numbers, that's it. I think Go's range is more equivalent to general iteration in Python with the "in" keyword... `for thing in things: ...`
@davidzwitser · 7 months ago
Meanwhile in APL languages you just write 'matrix + number' and it's fully optimised
@bbrodriges · 7 months ago
Slices can be stack allocated. In fact they are by default
@metaltyphoon · 7 months ago
No they're not. The slice header, which is a pointer + length, will be stack allocated, but the data won't be on the stack.
@antoniusupov · 7 months ago
All pointer values in Go will go to the heap. Go check out the runtime package in Go and see what is inside. Only objects of fixed length live on the stack; in Go even strings contain pointers in them and go to the heap.
@Nenad_bZmaj · 7 months ago
The so-called slice header will be on the stack (basically 3 numbers). It always points to the memory allocation on the heap.
@dejangegic · 7 months ago
I guess Go really is K.I.S.S. from the bottom up
@ninocraft1 · 7 months ago
S.S.I.K. ?
@Arwahanoth · 7 months ago
I guess the range statement avoids bounds checking.
@AlexanderNecheff · 7 months ago
I used to believe the compiler knew best, and especially that any bug in the code was my fault, never the compiler's fault: then I started having to use the Intel compilers... And oh boy howdy - some of the worst compilers ever. Poor standards compliance, lots of weird bugs in both the front and back end, horrible diagnostics to boot. Still generally think the compiler usually knows best, but now I have a disclaimer associated with that rule of thumb.
@reactjs1900 · 7 months ago
I think slices in Go are like vectors and stored on the heap
@lukekurlandski7653 · 7 months ago
These are dynamic arrays, right? I feel like you should be able to call a .contiguous() function or something that moves all of the elements of the tensor into a contiguous arrangement. Then again, I'm guessing there is some other data structure in Go that is closer to a traditional array or something...?
@Nenad_bZmaj · 7 months ago
Yes, an array: a := [k]something, as opposed to a slice: a := make([]something, k). But in my experience, slices perform better.
@someman7 · 7 months ago
Whoa, his parentheses are different colors. It only took us what, 30 years to see "such" progress in syntax highlighting?
@arkie87 · 7 months ago
numba allows you to write fast Python loops by compiling using LLVM. Also, am I the only one who pronounced numpy in their head as num-pee for the longest time?
@huge_letters · 7 months ago
Wouldn't it be faster to access the underlying list at each loop layer? Instead of: loop x { loop y { loop z { t += arr[x][y][z] } } } you do: loop x { xa = arr[x]; loop y { ya = xa[y]; loop z { t += ya[z] } } } That way you do the access on x and y only when needed and not on every iteration? If you're saying it's likely memory access that's the issue
@Nenad_bZmaj · 7 months ago
No difference. You still have to access the element of the array via xa. xa and arr[x] are both just pointers. And the array is the same one. The compiler will transform your code into the one in the video anyway, because xa and ya are just transient variables of slice (pointer) type.
@huge_letters · 7 months ago
@Nenad_bZmaj I mean that in option 1 you do the access by x x*y*z times, while in option 2 you do it only x times. Same for y - y*z times vs y times
@Nenad_bZmaj · 7 months ago
@huge_letters Oh, I wish. But simply not possible. No matter how you write it, the array contains x*y*z elements, and you need to access them all.
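The hoisting idea from this thread in real Go (my own sketch); whether it measurably helps depends on whether the compiler already performs the same hoisting itself, as the reply suggests.

```go
package main

import "fmt"

// sumHoisted reads each outer slice header once per x, and each
// middle header once per (x, y), instead of re-indexing
// arr[x][y] inside the innermost loop.
func sumHoisted(arr [][][]int) int {
	t := 0
	for x := range arr {
		xa := arr[x] // outer header read once per x
		for y := range xa {
			ya := xa[y] // middle header read once per (x, y)
			for z := range ya {
				t += ya[z]
			}
		}
	}
	return t
}

func main() {
	arr := [][][]int{{{1, 2}, {3}}, {{4, 5, 6}}}
	fmt.Println(sumHoisted(arr)) // 21
}
```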
@GaborGyebnar · 7 months ago
Case 5 is fastest because the code is buggy: y is never used. It's just reading the same row multiple times; cache hits make it fast, but the result is incorrect.
@Zullfix · 7 months ago
Rather than trying to optimize iterating, it sounds like his array needs to be split into partitions (or chunks, if you play Minecraft) to allow for multithreading his calculations.
@ErazerPT · 7 months ago
Not everything can be parallelized. It all comes down to whether or not your next step depends on the results of the previous step. If no, you can; if yes, you can't. Simple example: a loop that steps through a vector adding the previous element to the current one in place. You can't parallelize it because you need the previous step's result before you execute the next.
@Zullfix · 7 months ago
@ErazerPT You don't parallelize by computing steps out of order, you parallelize by partitioning your space into chunks and computing each chunk on a separate thread, then combining the results. Obviously nothing is that simple, but at a surface level this is *the* technique used by professionals to create multi-threaded physics simulations, and it very likely can be applied to whatever the guy in the video is trying to compute.
@ErazerPT · 7 months ago
@Zullfix Read the example I gave you. It CAN'T be parallelized. You CAN'T partition things if ANY of the subsequent operations depend on the results of their predecessors. As for physics simulations, be it rigid body dynamics or fluid, search around and you'll find a simple "some parts can, some parts can't" answer. Think of it like this: in a perfectly inelastic collision of billiard balls in a perfect line, only the cue ball and the last ball will move. But the last ball can't know it will move before the one-before-last knows it should try to move. And the one before that won't know... etc. Things like "grass simulation" can be done IF you take some "artistic license", i.e., for example, assuming wind interacts with grass but grass blades don't interact with each other. THAT is when you can partition things, because wind is a common entity and each blade of grass does its own thing.
@atijohn8135 · 7 months ago
@ErazerPT For your example, you could still optimize it by adding up parts of the array in parallel, then for each part adding the final element of the previous one. This would be faster for large enough arrays, since you can further parallelize the subsequent block adds no problem. Many smaller dependencies like that can be eliminated by experienced parallel programmers; it's really more common than you seem to think. You can't just dismiss parallelizing an algorithm with "not everything can be parallelized"
@ErazerPT · 7 months ago
@atijohn8135 Jesus Mary, you people really don't give up... YOU CAN'T parallelize that example (and be more efficient). THINK. If one thread is adding a[0] to a[1], another CAN'T add a[1] to a[2] because it DOESN'T know what a[1]'s final value will be before the first thread finishes. But let's say we ignore that, and have a CPU with unlimited (real) threads. So, one does a[0]+a[1], while another does a[1]+a[2], etc. until a[n-1]+a[n]. When they have all finished, you now have to run another batch, look at differentials from the previous ops, and do it all over again for a[1] to a[n]. But... it has a "carry effect", so you have to do it again for a[2] to a[n], then for each element up to a[n-1]+a[n]. Congratulations, you just parallelized something and ended up being less efficient than a single thread, because you still had to run n-1 steps, do more work per step and waste more memory, because you had to keep track of both old values and new to calculate differentials. Great job.
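For the independent-iterations case Zullfix describes (a plain sum, not the dependent prefix-sum example being argued about), the chunk-and-combine scheme looks like this in Go; a sketch under my own names:

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum partitions data into contiguous chunks, sums each
// chunk on its own goroutine, then combines the partial results.
// This only works because the per-element additions are independent.
func parallelSum(data []int, workers int) int {
	partial := make([]int, workers) // one slot per worker: no shared writes
	var wg sync.WaitGroup
	chunk := (len(data) + workers - 1) / workers
	for w := 0; w < workers; w++ {
		lo, hi := w*chunk, (w+1)*chunk
		if lo > len(data) {
			lo = len(data)
		}
		if hi > len(data) {
			hi = len(data)
		}
		wg.Add(1)
		go func(w, lo, hi int) {
			defer wg.Done()
			for _, v := range data[lo:hi] {
				partial[w] += v
			}
		}(w, lo, hi)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p // combine step, done once on one goroutine
	}
	return total
}

func main() {
	data := make([]int, 1000)
	for i := range data {
		data[i] = i
	}
	fmt.Println(parallelSum(data, 4)) // 499500
}
```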
@martinpata2899 · 7 months ago
I like the idea, but he should be using Go benchmarks for these tests
@GermanClaus · 7 months ago
Try out some Kotlin ;)
@yoloswaggins2161 · 7 months ago
Can't you just disassemble the binary and look at what they're actually doing?
@mohamedaityoussef9965 · 7 months ago
Blue hair gone nooooooo
@joyboricua3721 · 7 months ago
@6:22 somewhere a hacker says "Oh yes we could"
@lancemarchetti8673 · 7 months ago
Nice
@gavipk · 6 months ago
solution: c++
@alexpyattaev · 7 months ago
1 billion elements... The problem is likely memory bound. Make use of a smaller data type, it will run faster.
@Nenad_bZmaj
@Nenad_bZmaj 7 месяцев назад
Yeah. Also the ratio of used to total memory influences how frequent GC will run. (kinda; it is more complicated). GC sweep for this size typically takes 200 ms (on my old comp with old single core Athlon CPU).
@alexpyattaev
@alexpyattaev 7 месяцев назад
@@Nenad_bZmajGC does not traverse an array of integers. As far as GC is concerned, it is one "leaf" object in the object graph, so it does not get traversed, no matter how big it is. Now if it contained other objects like strings the GC would have to walk it.
@Nenad_bZmaj
@Nenad_bZmaj 7 месяцев назад
@@alexpyattaev I am not talking about GC traversing an array, but about the GC mark-and-sweep cycle occurring during the execution at some time points, at which times "the world is frozen". The way he structured this test - you don't know when the GC was active, so benchmarking three test-cases consecutively in one outer loop is not going to be informative for comparisons... I mentioned size dependency, because the program runs longer on larger arrays and GC runs more times. I see that in my programs. Although these programs involve quite complex data processing, the mere size of input array dictates how many times GC will run or for how long in total it freezes the world. When my input slices point to array of 50 Mbytes, one GC sweep takes about 50 ms on average, and when the array is 250 Mb - 250 ms. (on my very old comp.) I assume that when each innermost slice is done its array can be marked for sweeping since it is not going to be used anymore, and so on upwards.
@alexpyattaev
@alexpyattaev 7 months ago
@@Nenad_bZmaj Oh I see, with more memory allocated in general it just runs GC more often. That is crazy, so the more RAM you use the slower your code runs (as it does more pauses)? Brrr....
@Nenad_bZmaj
@Nenad_bZmaj 7 months ago
@@alexpyattaev It is highly optimized, though, so it uses the least number of cycles possible for the amount of garbage it needs to collect. Plus, Go gives you options for the GC, so you can optimize it for your specific program during the build. If your program approaches the memory limit, the Go GC will take that into account when calculating how often it must sweep, and adjust its cycles so you don't run out of memory. You can opt out of this behavior. When you have enough memory, the GC doesn't take a large fraction of total time even if you have a lot of garbage. If you have a multithreaded CPU, you will almost not notice the GC.
@dixztube
@dixztube 7 months ago
Man, the new React/Next stuff sucks! I just popped out an htmx and Go project. Silky smooth, fast and sexy. Been bumming around this React nonsense for two months now, after getting decent at the last version of React
@jony1710
@jony1710 7 months ago
ITT nobody knows what the stack is
@bloomingbridges
@bloomingbridges 7 months ago
🐼
@ofekshochat9920
@ofekshochat9920 7 months ago
arr[a][b][c] = arr[a * (second_dim * third_dim) + b * third_dim + c]. this shouldn't have hops...
@leeroyjenkins0
@leeroyjenkins0 3 months ago
They're slices, so you can append to them and thus they can move. They're not contiguous memory. If you want contiguous memory, you use fixed-size arrays.
@BlitzDjingga
@BlitzDjingga 7 months ago
This is a really shocking result, but I do believe the second one is the better approach most of the time, while the third one is so super-optimized that it is tiring to code and maintain. Too bad the video does not show memory performance for this. I believe what is happening is that the second and third ones operating on the HEAP makes their performance slower, while the first one is lucky enough not to encounter memory overflow. The thing is, when you do `range` it makes a copy of the object; albeit for a 3D array, maybe the host machine has RAM big enough to account for all of that iterating. It would be more educational to also show the garbage collection and memory/space performance for each of the methods.
@nevokrien95
@nevokrien95 7 months ago
Pandas is slow, use numpy
@ryankimeu9013
@ryankimeu9013 7 months ago
i am an arch user
@uzbekistanplaystaion4BIOScrek
@uzbekistanplaystaion4BIOScrek 7 months ago
that's so poggers
@RT-.
@RT-. 7 months ago
Here's your trophy
@cyrusturner6111
@cyrusturner6111 13 days ago
,yamv
@pyaehtetaung
@pyaehtetaung 7 months ago
Black?
@deathzombee
@deathzombee 7 months ago
First
@spartan_j117
@spartan_j117 7 months ago
Hahahaha Prime! So you DO reupload your old videos! Now that's lame...
@adjbutler
@adjbutler 7 months ago
But Python is strongly typed and JS is weakly typed... why do you hate on Python so much? It is not as slow as it used to be....
@svetlinzarev3453
@svetlinzarev3453 7 months ago
Golang is such a crappy language
@LengCPP
@LengCPP 7 months ago
admit it, you watch your own videos
@moneypennysloverboy
@moneypennysloverboy 7 months ago
react equals pain
@dickheadrecs
@dickheadrecs 7 months ago
uh excuse me i prefer my memory malloc carte
@mannycalavera121
@mannycalavera121 7 months ago
We Gorupies now
@RandomGeometryDashStuff
@RandomGeometryDashStuff 7 months ago
09:19 "while" is usually the worst enemy