Thank you, for the video, it is highly informative. I wonder that, what would happen if the initial number of pages were 9 or 10 which is not a power of 2. Because in this case, when merge pass happens, there is a buffer left with less number of pages.
Hello Dr. Jens Dittrich, Thank you for such a detailed explanation of external sorting. The only doubt I have is on Replacement Selection section. The slide which heading says "two main memory pages for run generation with quicksort" so here this is different technique of sorting using quick sort (I mean is it distribution sort?)? and then are doing replacement selection? I couldn't connected quicksort part with replacement selection. Would you please explain me?
No, it is still merge sort, the only change is how we generate the initial runs in Phase 0: Quicksort or Replacement Selection. For all passes >0 we then merge the streams iteratively
how can you put all the data into main memory, if the main memory is insufficient? and also clarify that on what basis the data is divided into x number of pages?
I don't. In pass 0 you only load two pages at a time into main memory, (marked with the green rectangle), sort them and then write them back to disk. In phases >=1 we need at least three pages: one for each of the input runs and one for the output run. I could have used three pages for phase 0, but it would not have created this nicely balanced merge tree.