CORRECTIONS: none so far, but please tell me if there are any mistakes. I apologize for the long wait on this one. For most of October, I was working on a different video (hint: it's chemistry related), but I eventually realized it would take way too long to finish that, so I switched to block sort. And then I've been sick for most of November so far. My views are a bit down, so anything you can do to help the channel is very much appreciated. Anyway, thank you for your patience and continued support!
Try making more videos that expand on simple things, I'd say, for more views. Your colors video, first sorting video, and Platonic solids video were hits, and they all follow the formula of "here's a basic topic you already know about; I'm about to explain something complicated you didn't know about it."
I have a question: can we create, say, a joke sorting algorithm that works in prime number time? This means that the number of operations is O(p_n). It is proven that p_n ~ n*(log(n)+log(log(n))-1) for n>=6. Note that here log is the natural log (base e), not base 2.
@@pokemonjourneysfan5925 that's really interesting! I've never seen a sorting algorithm that is O(n*log(log n)), but I've heard it's theoretically possible if the input is only integers.
I think the perfect sort would exploit CPU architecture, and therefore have optimal temporal and spatial locality prioritized over time and space complexity. I also think it would benefit from considering registers and cache as space that doesn't technically count against the "in-place" constraint. An nlogn algorithm that makes better use of locality will easily outperform one with more unpredictable memory access
I don't think local variables count against space complexity, as long as they're not arrays of a size dependent on n. You have to be able to swap stuff, after all. All the pointers and partitions, etc are part of that.
I mean, they technically do. But if it's just "a temporary variable for swapping things", that's O(1) extra space. Contrast with a normal merge sort, which needs a buffer the size of one of the blocks to be merged, which is half the size of the array, or O(n).
What you propose would obviously lead to enormous savings when sorting, for example, a huge number of integers on a specific machine. But I think what you usually have is at most a few thousand elements sorted on a more or less random CPU (little more than the architecture being fixed) with somewhat random cache sizes and the actual comparisons delegated to a function or macro of variable complexity. In that case you don't know what to optimize for, and whatever happens during the comparison calculations may well completely destroy your locality. The usual metric of number of comparisons is exactly right in this case.
No doubt. Knowing more about the data and the machine you're using will result in improvements. Example: if I'm actively receiving data, I can start sorting before I even have all the data. If the transmission is slow and the sort is relatively fast, insertion sort (or a variation) may very well be the best: when playing cards, I don't re-sort my entire hand when I draw a new card.
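The card-drawing idea above can be sketched in a few lines of Python, assuming items arrive one at a time (a minimal illustration, not code from the video):

```python
import bisect

# Sorting while data is still arriving: insert each new item into an
# already-sorted "hand", like sorting playing cards one draw at a time.
hand = []
incoming = [7, 3, 9, 1, 5]        # imagine these arrive over a slow link
for card in incoming:
    bisect.insort(hand, card)     # binary search + O(len(hand)) shift
    # `hand` is sorted at every moment, so results are usable immediately

print(hand)  # [1, 3, 5, 7, 9]
```

If the transmission is slower than each insert, the sort finishes essentially for free the moment the last item arrives.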
This is very hard for me to understand. However, I think this video does an amazing job teaching it. I am just having a hard time because this is my first time ever encountering most of these ideas. I will definitely want to watch this multiple times.
Very, very good explanation of Grailsort. I still plan on posting my own explanations in the future, but it's great that yours is available on the net for now! I'll be sure to give you credit. We also will continue developing Holy Grailsort sometime in the near future!
Oh cool. I think this explanation lacked pseudocode, since the other two parts of the video had it and this one didn't. It's also hard to find any online, so pseudocode would help me understand better.
I found that selection sort can be surprisingly useful as a lazy sort. I don't know how many values I'll need, but I usually don't need to sort the entire list. I usually just need the two smallest values.
That task is usually solved by a quicksort variant: you just don't sort both halves recursively, only the half containing the k-th element, to find the k lowest elements.
@@Milan_Openfeint I haven't benchmarked it, but the worst-case time complexity for that algorithm is O(n²), while mine is O(n). The only benefit I can think of is that it would make the second selection O(1), but that seems overkill for just grabbing two elements. In all honesty, I did think about trying quicksort, but what stopped me was that the one library I found for it used a lot of memory allocations, which were already taking most of my time.
If you just need to find the k-th largest/smallest value (in your case k=2), consider using quickselect. Quickselect is O(n) on average and in-place, though not stable. If you write C++, it is called "std::nth_element". The resulting list's first k elements will be the k smallest elements, and the k-th element in the list will be exactly the k-th smallest element. If you then need the first k elements sorted, you only need to sort the first k-1 elements however you want, or if k
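A rough sketch of quickselect in Python (random pivot, iterative Hoare partitioning; an illustration of the technique rather than a production implementation):

```python
import random

def quickselect(a, k):
    """Partially order list `a` in place so that a[:k] holds the k smallest
    elements (in arbitrary order). O(n) expected time, O(1) extra space."""
    lo, hi = 0, len(a) - 1
    while lo < hi:
        p = a[random.randint(lo, hi)]   # random pivot keeps expected time linear
        i, j = lo, hi
        while i <= j:                   # Hoare-style partition around value p
            while a[i] < p: i += 1
            while a[j] > p: j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i, j = i + 1, j - 1
        # Recurse (iteratively) only into the side containing index k-1.
        if k - 1 <= j:
            hi = j
        elif k - 1 >= i:
            lo = i
        else:
            break                       # k-1 sits in the middle "equal" zone

data = [9, 1, 8, 2, 7, 3, 6, 4, 5]
quickselect(data, 2)
print(sorted(data[:2]))  # the two smallest values: [1, 2]
```

Unlike quicksort, only one side of each partition is processed, which is what brings the expected cost down from O(n log n) to O(n).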
I would say that quick sort is an "in place" sorting algorithm. I've never heard of in place as being defined as specifically O(1) and log n space complexity is... well, if you want your sort to ever actually terminate, your logarithm better be less than, say, 64 lol.
Does binary search insertion sort count as O(nlogn)? It only has O(nlogn) comparisons and O(n) insertions, but each insertion shifts O(n) elements, which I've been told may mean that each insertion takes O(n) time.
So the thing about insertion sort is that even though you can find an element's destination within the sorted section in O(logn) time, to actually get it there, you have to shift everything over. On average, you have to shift half the sorted section, and the sorted section is on average half the list, so inserting one element, even with a binary search, is O(n); therefore in total it's O(n^2), the same time complexity as regular insertion sort. (It does have fewer operations, though, just not a time complexity difference.)
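To make the comparisons-vs-moves distinction concrete, here is a sketch of binary insertion sort in Python that counts both (illustrative only; the counters are not part of the algorithm):

```python
def binary_insertion_sort(a):
    """Insertion sort with binary search: O(n log n) comparisons,
    but still O(n^2) element moves, so the total time stays O(n^2)."""
    comparisons = moves = 0
    for i in range(1, len(a)):
        x = a[i]
        lo, hi = 0, i
        while lo < hi:                 # binary search: O(log i) comparisons
            mid = (lo + hi) // 2
            comparisons += 1
            if a[mid] <= x:            # <= keeps the sort stable
                lo = mid + 1
            else:
                hi = mid
        for j in range(i, lo, -1):     # shifting: O(i) moves in the worst case
            a[j] = a[j - 1]
            moves += 1
        a[lo] = x
    return a, comparisons, moves

a, cmps, moves = binary_insertion_sort(list(range(100, 0, -1)))  # reversed input
print(cmps, moves)  # comparisons grow like n log n; moves are 99*100/2 = 4950
```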
According to Andrei Alexandrescu, binary search insertion sort is also worse experimentally, since the branches are less predictable. In the linear insertion sort, the CPU can learn that most of the time, the inner loop is continued, but in binary search insertion sort, you can take the upper or lower branch seemingly at random. Branch prediction matters!
@@iwersonsch5131 if the branch is mispredicted, the effect is a pipeline flush, which can be as bad as doing nothing for hundreds of cycles. Plus Kuvina already explained that you still need the linear move operation.
@@nullplan01 So linear insertion sort has 1 pipeline flush per entry, while binary insertion sort has about lb(n)/2 pipeline flushes per entry. A middle ground could be to break up the sorted list into linearly searched blocks of, say, sqrt(n) (or for very large n, n^(1/3) and n^(2/3)), resulting in 2 pipeline flushes per entry, or 3 for very large n, while keeping the comparisons for each entry below 1000*log_1000(n)
Note that there is a simple variant of mergesort that is in-place; it's just not stable. Still, it has advantages over quicksort: it requires less memory, and it guarantees O(nlogn), which quicksort does not. Quicksort is only typically O(nlogn); unless you use a special variant of it, there are cases where it clearly is not, whereas mergesort is always O(nlogn), no matter the input. Of course, you could just as well use heapsort instead of that mergesort variant if all you wanted was in-place and O(nlogn). But at some point, it also boils down to the fact that two algorithms aren't equally fast just because they are both O(nlogn); speed is often the most important factor of all, and it's not covered at all by the introduction table (complexity is not speed). Last but not least: in-place is a very stretched term. If an algorithm requires auxiliary structures, e.g. an array to track O(log n) bits, it is not truly in-place, as it just swaps stack memory for a bit array; that's a reduction of extra space, but it is still extra space.
Thanks that's a good point! I should have mentioned just because it checks the boxes doesn't mean it's necessarily the most practical algorithm for every purpose!
@@Kuvina Well, big-O notation is always very deceiving. I usually tell people the story that I needed a lookup table for a network protocol implementation in a system kernel. What did I choose? A hashtable, of course, as it is O(1). Years later, just out of curiosity, I wanted to see what would happen if I replaced the hashtable with a sorted array, where lookups are O(logn), of course (binary search). The result: the sorted array gave significantly better performance. Of course, there is a turning point. The array gets slower the more elements it has, whereas the hashtable stays roughly the same. I was able to calculate the turning point, and later benchmarks confirmed that calculation to be correct. But that turning point was irrelevant, as it was way beyond 256 entries, and the lookup table was bound to at most 256 entries (the entry count was stored in a byte) and would typically hold anything between 4 and 64 entries in real-world scenarios. So just because two algorithms are both O(nlogn) with O(1) space complexity doesn't mean it's totally irrelevant which one you pick. If there were no advantages and disadvantages, there would be no reason for different variants to even exist, right?
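The hashtable-vs-sorted-array setup can be sketched like this in Python (the table contents are made up for illustration; real timings depend entirely on your machine, key type, and table size):

```python
import bisect
import timeit

# A small table, like the 256-entry protocol table described above:
# sorted keys with a parallel value array, vs. a hash map.
keys = list(range(0, 512, 2))          # 256 sorted keys
vals = [k * 10 for k in keys]
table = dict(zip(keys, vals))

def lookup_array(k):
    i = bisect.bisect_left(keys, k)    # O(log n) binary search
    return vals[i] if i < len(keys) and keys[i] == k else None

def lookup_hash(k):
    return table.get(k)                # O(1) expected

# Both return the same answers; which one is faster depends on size,
# hashing cost, and cache behaviour -- benchmark on your own data.
assert all(lookup_array(k) == lookup_hash(k) for k in range(600))

t_arr = timeit.timeit(lambda: lookup_array(100), number=100_000)
t_hash = timeit.timeit(lambda: lookup_hash(100), number=100_000)
print(f"binary search: {t_arr:.3f}s  hash: {t_hash:.3f}s")
```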
@@xcoder1122 When it will have at most a small bounded number of elements, like 256, it's also worth considering just using a bitmap plus the data (if the data has some kind of null state, you can even avoid the bitmap) and getting a lookup in a single pointer offset. It uses a bit more memory, but if this is a persistent structure that won't have multiple copies, that's not much of an issue in most cases. Unless you're using a 16-bit microcontroller or something, anyway.
In wikisort, how does it find the A block with the smallest tag value? Does it kind of selection sort it (check the second value of every A block, then block swap the first block with the one with the smallest tag value) or does it use a better method?
It can be, with extremely large lists, like more than would fit into RAM (external sorting). External sorting is another can of worms anyway. But it would likely be something like sorting blocks of a size a standard algorithm can handle in RAM, then using extra space and, in a streaming fashion, performing a (potentially n-way) merge.
O(n log n) is always higher than O(n), but it doesn't matter, because it's optimal for comparison sorting: O(n) comparison sorting doesn't exist. Also, log n grows so slowly that the difference barely matters anyway. By the time log n is appreciably large, you're hopefully going to be using a parallelized sort instead.
Because you might be sorting complex objects which have multiple keys. If you sort a list of people by age, you don't want the list to change relative order. If it did, it wouldn't be possible to guarantee a sort on two different keys, since sorting by the second key could break the ordering of the first key.
I came up with a proof-of-concept electronic sorting technique, but I can't find anything remotely similar to it in the existing sorting algorithm ecosystem, and I don't know how to appraise it to see if it's worth investing more time into developing it. It's stable. It's not technically in-place, but its impact on hardware is very limited. Its time complexity is very hard to define: it is a multi-step process, but each step is measured in clock cycles.
@@noelletheradcat Huh, never heard of it. The acronym for "Piece of Shit" (or "Point of Sale", but those are basically the same thing lmao) is too ingrained in my head
I kinda want to see how this performs in terms of cache efficiency. It seems like this uses more operations than Quicksort, but if it is way better with cache efficiency then it might be useful outside of embedded cases.
From benchmarks I recall seeing (a while ago, no links, sorry), they're surprisingly competitive given the complexity involved. I suspect that a good quicksort will be faster on random data, but given the drawbacks of quicksort (unstable, hard to pick a good pivot for non-random data), they are probably worth considering. I believe wikisort is now used in some standard libraries.
Yes, a good set of benchmarks would help. Performance will depend on the size of the input and blocks, and which variant of block sort of course. There are some benchmarks for block sorts like quadsort, octosort and wikisort vs glibc qsort and std::stable_sort. Quadsort wasn't quite a fair comparison since it seems to use extra memory -- it is very fast but the author acknowledges "Quicksort has a slight advantage on random data as the array size increases beyond the L1 cache". Octosort seemed comparable to qsort for fully random order but much faster for a few scenarios (esp. already sorted ascending/descending). Wikisort is similar, but may not be as useful in embedded / low-memory environments because it slightly bends the "rules" by using a fixed-size 512-item cache -- this is still technically O(1) space because it's a constant size cache, but may not be what you want.
I'm halfway through this video, and I love the idea that, technically, if I were a customer in a database and my entry were part of an A block, you might be using my data entry to encode the sorting status of other customers' information in that database.
I think it's basically like rotation based merging, but in reverse. So in the algorithms where the left sublist is already sorted, you just start at the beginning and repeatedly find the first piece that is greater than the last one you used, and then rotate it into the next spot in the buffer. As for grail sort, you can use basically binary insertion sort, but if you find that the piece is exactly equal to one of the ones already in the buffer, you simply skip it and don't insert that one.
There are n! possibilities that the array can be in. Any comparison only gives 1 bit of information, but one needs log(n!) bits in total. By Stirling's approximation, log(n!) = Θ(nlog(n)) (intuitively, n! is close to n^n, and log(n^n) = nlog(n)), so nlog(n) is a lower bound on the number of comparisons needed to figure out what the permutation is.
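You can check this lower bound numerically; a small Python sketch (the function name is made up for illustration):

```python
import math

# Information-theoretic lower bound: a comparison sort must distinguish
# all n! input orderings, and each comparison yields at most 1 bit,
# so it needs at least ceil(log2(n!)) comparisons in the worst case.
def min_comparisons(n):
    return math.ceil(math.log2(math.factorial(n)))

for n in (4, 16, 256, 4096):
    print(n, min_comparisons(n), round(n * math.log2(n)))
    # log2(n!) tracks n*log2(n) closely, as Stirling's formula predicts
```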
Now get ready for... *EVOLUTIONARY SORT!!!* > This sort trains an artificial intelligence into learning how to sort stuff like humans do. > The AI will learn its own way to sort. > After the sort is completed the AI will be destroyed. Yes, I made this thing completely up and I have no intention to try and replicate it into a real sorting algorithm.
In the future, we will probably have 1-quadrillion-parameter AIs that are actually useful for this. Maybe we will have them sorting something that can't be sorted easily, like quantum entangled particles.
That's basically the "Universal Search" algorithm. Ironically, it's the "absolute fastest" of all algorithms from the perspective of complexity theory, even though it's horribly slow. That's because the time to build "the AI", or whatever you call it, is CONSTANT, and thus negligible compared to O(n).
At 3:03 I don't quite understand why rotating with blocks of length sqrt(n) has a time complexity of O(sqrt(n)). Like, if you rotate the left sublist into place, wouldn't it take time to shift the other elements to the left?
Each time you rotate, you need to move O(sqrt n) elements right and possibly O(n) elements left. This is done O(sqrt n) times in total, so it seems like it will take O(n sqrt n) time in total. But notice that each element is shifted left only once. This means that the time complexity of shifting elements left is O(n) in total. We shift O(sqrt n) elements right a total of O(sqrt n) times, so the time complexity of that is O(n) in total. Thus, the full merge process is linear.
@@vgtcross But how should this algorithm be implemented? Like, keep shifting the next element after the sublist until it reaches the destination? But if so, wouldn't it get shuffled and need to be sorted again every time it shifts?
So a rotation of 2 sublists can be done in linear time (relative to the combined length of the 2 sublists) if you simply reverse each one, then reverse them together. This moves each piece twice; there are faster ways that move each piece only once, but with the same time complexity.
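The triple-reversal trick looks like this in Python (note: Python slice reversal allocates temporary lists, so this is only conceptually in-place; a C version would swap elements pairwise within the array):

```python
def rotate(a, lo, mid, hi):
    """Rotate a[lo:hi] so that a[mid:hi] comes before a[lo:mid],
    in O(hi-lo) time, using three reversals."""
    a[lo:mid] = a[lo:mid][::-1]   # reverse the left sublist
    a[mid:hi] = a[mid:hi][::-1]   # reverse the right sublist
    a[lo:hi] = a[lo:hi][::-1]     # reverse the whole range
    return a

print(rotate([1, 2, 3, 7, 8], 0, 3, 5))  # [7, 8, 1, 2, 3]
```

Each element is touched exactly twice, which is why the whole rotation is linear in the range length no matter how lopsided the two sublists are.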
I guess it has the same problems as a C++ hashmap: it's O(1), but it uses a linked list in its implementation. How do you work in place? How do you stay compact in memory (no pointers)? I've only watched the video halfway so far, sorry. Go also has hashmaps with O(1) lookup. Most of the time I avoid using them in a public API. Just hand over arrays/slices and search them with a binary search: it's a much simpler API, only using arrays, and it's still faster than a map lookup because of CPU cache locality. I already know that bubble sort can be faster than, let's say, quicksort, for small numbers. It's evident from how the mathematical function curves behave; the extremes are in the big numbers, though.
I didn't follow every detail of the video but, don't all these algorithms still require O(log(n)) space on a call stack, due to their recursive nature? EDIT: Never mind. I think these are not recursive in nature. I'm going to have to implement these ideas to fully understand them. Awesome video!
I thought quicksort (choose a pivot p, move elements smaller than p to the left and bigger ones to the right, split the array at p, and do the same on the two halves recursively) was in-place? Am I confusing it with another algorithm?
The blocking assumes a fixed record size. For uneven record lengths (CSV files, ...), the rotate/swap just doesn't work. [Or you would need a separate layer of "pointers", which would result in terrible locality.]
Alternatively, you can treat the header ties as their own sub-sorting problem: rows with the same header get sorted together, then you can do a second pass to tie-break, grabbing a second header chunk out of each group of ties.
@@Milan_Openfeint Well, today's sorting algorithms are the result of years of optimization; you can't beat that easily. My idea sorts everything in two parts: read from the main array once and produce multiple sorted lists, then combine those lists back into the main array. Part 2 is multithreadable, and the result is stable... with a big memory usage problem.
Personally I wouldn't say that. Radix sort doesn't really have any type of block rearrangement or local merging. It does categorize pieces into groups, but I would say radix sort is a type of bucket sort or vice versa. I would instead classify block sort as a type of merge sort.
// Data Set size: 32768 x 32768 - RTfloat
/////////////////////////////////
Application - MgwTestApp - NOS64_MGW ( 64-bit ) - Release
Tests: Start
Command Line arguments - User Count : 6
Values: 32768 3 1 4 7
* Test0001 Start *
Microsoft Visual Studio version Not Defined
**********************************************
Configuration - NOS64_MGW ( 64-bit ) - Release
CTestSet::InitTestEnv - Passed
* CSortSet Start *
* TSortSet Methods *
* CSortSet Methods *
GetSortOrder - Passed
Data Set of RANDOM Values
Data Set Size : 1073741824 elements
Number of Iterations: 3
Number of Tests : 1
Number of Threads : 4
* CSortSet Algorithms *
Data Set: RTfloat - in RANDOM Order
GetSortOrder - Passed
HeapSort - Sorting...
HeapSort - RTfloat - 425415.00000 ms
HeapSort - Passed
QuickSort - Sorting...
QuickSort - RTfloat - 10495076.00000 ms
QuickSort - Passed
MergeSort - Sorting...
MergeSort - RTfloat - 146345.00000 ms
MergeSort - Passed
Data Set: RTfloat - in ASCENDING Order
GetSortOrder - Passed
GetSortOrder - RTfloat - 0.00000 ms
GetSortOrder - Passed
* CSortSet Algorithms - VALUESET *
Converted to RTdouble type
Converted to RTubyte type
VALUESET:RTubyte - 0.00000 ms
VALUESET:RTubyte - Passed
* CSortSet Generic *
* CSortSet End *
* Test0001 End *
Tests: Completed
You can only sort an arbitrarily-ordered array in linear time if you know the *distribution* of the data beforehand, otherwise the minimum complexity will be O(n log n).
@@Musicombo In almost all practical cases, values are stored in 32-, maybe 64-bit integers, which can always be sorted in linear time (e.g. with radix sort). If you don't make that assumption, then yes, you are right.
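The fixed-width case can be sketched with a minimal LSD radix sort in Python for 32-bit non-negative integers (four 8-bit passes; this is the standard technique, not code from the video, and the function name is made up):

```python
def radix_sort_u32(a):
    """LSD radix sort for 32-bit unsigned integers: O(n) time for
    fixed-width keys (4 stable passes of 8 bits each), O(n) extra space."""
    for shift in (0, 8, 16, 24):
        buckets = [[] for _ in range(256)]
        for x in a:
            buckets[(x >> shift) & 0xFF].append(x)   # distribute by byte
        a = [x for b in buckets for x in b]          # stable concatenation
    return a

print(radix_sort_u32([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

Because each pass is stable and the key width is a constant (32 bits), the total work is a constant number of linear passes, which is how it sidesteps the nlog(n) comparison-sort bound: it never compares two elements at all.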