Тёмный

KD-Tree Nearest Neighbor Data Structure 

Stable Sort
Подписаться 12 тыс.
Просмотров 109 тыс.
50% 1

KD-Tree is a data structure useful when organizing data by several criteria all at once. Consider an example where you have a set of points on a 2 dimensional plane. Now suppose you are asked to find the nearest neighbor of some target point. What data structure would you use to store these points to be able to solve that problem?
en.wikipedia.org/wiki/K-d_tree
A naive approach would be to perhaps store the points in an array, sorted by one of the properties, such as the X coordinate. Then you could easily do a binary search to find the point with the closest X coordinate. But unfortunately you’d still not necessarily be any closer to finding a point where the Y coordinate is also nearby. To convince yourself of it, just think of the case where all of the points share the same X coordinate. Clearly, you’d have to examine every single point, one by one, to find the nearest neighbor of the target point.
KD-Tree data structure is much like the regular binary search tree that you are familiar with. Except each node would typically hold not just 1 value, but a list of values. When traversing a KD-Tree, we cycle through the index to the values list, based on the depth of a particular node.
The “K” in the name “KD-Tree” refers to the number of properties that each data point has. In our example, since it’s a 2 dimensional plane, we have exactly 2 properties - the X coordinate and the Y coordinate. So, the root node starts off arbitrarily picking one of those. Let’s suppose it’s the x coordinate. Then its children, the second level, alternate to using the Y coordinate. The grandchildren nodes cycle back to using the X coordinate, and so on.
Java source code:
bitbucket.org/StableSort/play...
When inserting a new node, just like in the typical binary search tree, we compare the node’s value and go left if it’s smaller, right if it’s bigger. It’s just that, as we go deeper and deeper down the tree, we keep alternating between using the X and the Y values in the comparison.
Note that in our example, since we started with comparing the X values, the Y values were ignored and as a consequence the 2nd layer nodes break the binary search tree rule where all nodes less than the root are on the left side and vice versa. But amazingly, it turns out that this is OK! The magic here is that we can check for this inconsistency on the way back, when unwinding the recursion.
Visually you can think of the root node dividing the XY plane into left and right sides, since we decided to start off using the X coordinate. Then the next point will fall into either the left or the right side and in turn divide that side into top and bottom sections. So as more and more nodes are inserted into the tree, each point splits the parent’s section into 2 sub-sections, alternating between splitting the section into left and right, vs top and bottom.
Once we have gone as far down as we can, we then recurse back, as you would in a regular binary tree search. We do have a candidate closest neighbor, but there is a possibility of a closer point still.
Think of the shortest distance as a line going from the target point to the candidate point. Let’s call this line “r”. This line is probably going to be at some random diagonal angle. But if the line directly perpendicular to the section that we chose not to visit when recursing down, is shorter, then there is a chance that that section may have a closer point.
So this is where the magic happens - as we recurse back, we also calculate the distance to the section that we did not visit. If that distance, let’s call it r-prime, is smaller than the best distance found so far, then we traverse into that section as well.
But note that this does not mean that we visit every node in the tree. If the height of the tree is H, then at most we’ll visit 2H nodes. In other words, this is still a logarithmic search in respect to the number of nodes in the tree.
So far we have been using an example of a 2 dimensional plane. But the methodology works equally well in higher dimensions. Each node would still just have a left and a right branch, but the number of values stored in a node could be arbitrarily large.
Suppose you are designing a database for a hospital that keeps track of each patient’s blood type, white blood cell count and blood pressure. Then a new patient comes in with an unknown illness. Using a KD-Tree you would be able to quickly select one or even a set of patients that share similar properties. These would be the “nearest neighbors”, with a higher likelihood of having a similar illness.

Опубликовано:

 

8 окт 2020

Поделиться:

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист
Посмотреть позже
Комментарии : 127   
@benwu4583
@benwu4583 4 месяца назад
50 min lecture material wrapped in a 7 min video. Honestly amazing!
@yashagarwal8249
@yashagarwal8249 4 месяца назад
In 6 and a half minutes, you managed to teach me the fundamentals of something I thought would take a lot of time. Amazing, thank you!
@wonton1019
@wonton1019 3 года назад
This is the greatest visualization of nearest neighbors I've ever seen. You sir, are a blessing beyond words, I'm not religious but I might as well be after watching your video. Thank you.
@stablesort
@stablesort 3 года назад
Thanks for such a great compliment! I am happy to hear that the video resonated with you so much!
@SauravSahu01
@SauravSahu01 Год назад
Same here. I have never seen such teacher. He is GOD level.
@maridavies3425
@maridavies3425 3 года назад
Awesome tutorial on KDtree data structure! The nearest neighbor search algorithm is easy to follow here. Thanks, keep em coming!
@stablesort
@stablesort 3 года назад
Thanks for the compliment! More tutorials are on the way =)
@1230986666
@1230986666 11 месяцев назад
This video hits all the right notes: short, useful and clear. Keep up the excellent work!
@Aca99100
@Aca99100 2 года назад
This tutorial was very precise, direct and comprehensive and it really helped me understand how to use kd trees. Thank you so much sir!
@stablesort
@stablesort 2 года назад
You are very welcome!
@omarshawky5859
@omarshawky5859 Год назад
I came to you from 2 independent great courses not understanding a specific scenario where they ignore an entire section without clarifying the reason but you explained it extremely better than both; Well done! Thank you :)
@stablesort
@stablesort Год назад
thanks for leaving a few good words!
@familytamelo8140
@familytamelo8140 3 года назад
Concise and yet very informative. Enjoyed. Thanks!
@stablesort
@stablesort 3 года назад
Thanks for the compliment! Glad to know that it was helpful.
@papayasprout
@papayasprout 2 года назад
This cleared up k-d Trees for me. Thanks for putting up the video.
@mukulbhave
@mukulbhave Год назад
simple ,short and precise. Awesome
@user-uu1bx4xv1s
@user-uu1bx4xv1s 3 года назад
Wow !new episode updated , i already cannot wait to learn it!thank you so much .
@stablesort
@stablesort 3 года назад
=) I hope you liked it!
@arnavanuj
@arnavanuj 3 года назад
One of the most useful searching algo in data science
@stablesort
@stablesort 3 года назад
Definitely a good one to know. I have seen it pop up in interviews as well.
@maxinteltech3321
@maxinteltech3321 2 года назад
Great visualization of a complex concept
@ChrisOffner
@ChrisOffner 2 года назад
Fantastic explanation and visualisation, thank you so much! :)
@stablesort
@stablesort 2 года назад
Thanks for leaving a compliment!
@vctorroferz
@vctorroferz Год назад
just an amazing explanation! thanks so much!
@NoName-ip4tt
@NoName-ip4tt 2 года назад
This is an interview question. How to design an algorithm for a flight computer to find closest airport during the flight in an emergency? Well explained along with lucid diagrams... Thanks for sharing your knowledge...
@heitorgomes9496
@heitorgomes9496 3 года назад
Well done, great explanation!
@stablesort
@stablesort 3 года назад
Thanks for the compliment!
@pietertjeboersma6586
@pietertjeboersma6586 3 года назад
Thanks for your clear explanation!
@stablesort
@stablesort 3 года назад
Glad you liked it! Cheers!
@wouttengrootenhuysen5137
@wouttengrootenhuysen5137 2 года назад
Thank you very much for making this video !
@KingGJT
@KingGJT Год назад
Great video! Thank you for making math visible :)
@balu.92
@balu.92 3 года назад
Excellent lesson! Thank you!
@stablesort
@stablesort 3 года назад
Thanks!
@volodymyrhavrylov7993
@volodymyrhavrylov7993 3 года назад
Nice and clear explanation, thanks!
@stablesort
@stablesort 3 года назад
Thanks for the compliment! Cheers!
@FlorinGN
@FlorinGN 12 дней назад
Really nice explanation, thank you!
@user-oh4vt4pt9v
@user-oh4vt4pt9v 3 года назад
Thank you for great explanation !
@stablesort
@stablesort 3 года назад
You are very welcome!
@AbJadzera
@AbJadzera 3 года назад
Great video, thank you for doing it!!
@stablesort
@stablesort 3 года назад
Thanks! I am glad to hear it!
@vijaynegi310
@vijaynegi310 Год назад
Great explanation !! thanks
@maximstepanenko
@maximstepanenko 2 года назад
Beautiful explanation. Thank you. )))
@yeetgod5063
@yeetgod5063 3 года назад
I love this channel please upload more
@stablesort
@stablesort 3 года назад
Thank you
@sensiblerazor9666
@sensiblerazor9666 3 года назад
Nice explanation. Thanks!
@stablesort
@stablesort 3 года назад
Cheers!
@brunasouzadarochasantos9929
@brunasouzadarochasantos9929 2 года назад
Excellent! Thank you so much!!
@dainel1490
@dainel1490 2 года назад
Thank you, I think it solved my problems with making kd-tree and searching it.
@anshumansharma6758
@anshumansharma6758 3 года назад
Amazing explanation!!!
@stablesort
@stablesort 3 года назад
Thanks for the compliment! Glad to hear that you enjoyed it!
@klausdupont6335
@klausdupont6335 3 года назад
Great video! Thanks!
@stablesort
@stablesort 3 года назад
Cheers!
@hanesburger8470
@hanesburger8470 2 года назад
Thank you from UC Berkeley!
@franciscoornelas4488
@franciscoornelas4488 2 года назад
Great content!
@pawankumarpandit1822
@pawankumarpandit1822 10 месяцев назад
thaks dear i cant believe that what a amazing way to explanation it is a kind of unblievable again thank you sir and a lot of love
@anshugangwar7344
@anshugangwar7344 2 года назад
First of all Very nice explanation , Do you have any video or code reference on Designing and implementing a binary space partitioning (BSP) tree with functions for adding nodes, printing the nodes in descending order and searching for the spatially closest node (along the one dimension)?
@br3nto
@br3nto 3 месяца назад
Nice vid and nice animation! Short and digestible😃
@zafirrasyiditaufik3774
@zafirrasyiditaufik3774 3 года назад
Nice explanation, I have a question though. Since this is not a balanced binary search tree, the worse case is still O(n) right? Or is there a way to use balancing techniques (like in a treap) with kd trees?
@stablesort
@stablesort 3 года назад
Excellent question and you are right on the money - KD-Tree has the potential to be heavily unbalanced, which results in O(n) search. The simplest approach to ensure a balanced (statistically speaking) tree that I could think of, in the case where all of the points are specified at the start, is to just shuffle the points before constructing the tree. But there are more sophisticated techniques that pick a good median value on each insertion. Also check out KDB-Trees (en.wikipedia.org/wiki/K-D-B-tree) they are balanced by design. Alternatively, if you have lots of inserts and deletes to a KD Tree, you could keep track of the number of such operations and then just rebuild the whole tree after some threshold. I am curious if there are better approaches - let me know what you find!
@karenaguilar2228
@karenaguilar2228 Год назад
que facilito de explicar. so good
@johnvif
@johnvif 3 года назад
awesome video!
@stablesort
@stablesort 3 года назад
Thanks for the compliment!
@smitdumore1064
@smitdumore1064 11 месяцев назад
Beautiful
@gimmemoreborisbrejcha9794
@gimmemoreborisbrejcha9794 3 года назад
Well, great work.
@stablesort
@stablesort 3 года назад
Cheers!
@omnamahshivaya2054
@omnamahshivaya2054 2 года назад
Your code saved me.
@stablesort
@stablesort 2 года назад
Good to hear it!
@HH-mf8qz
@HH-mf8qz Год назад
Nice And good visualization with the radius r
@avink6074
@avink6074 2 года назад
excellent !
@mohammadyahya78
@mohammadyahya78 Год назад
Amazing tutorial. Why the point (2, 6) splits the space into up and bottom and not left and right as the root node (5, 4) did please?
@joshyatharva
@joshyatharva 3 года назад
Awesome!❤️ Can you please make videos on B and B+ trees?
@stablesort
@stablesort 3 года назад
Thanks for the compliment and a great suggestion. I'll add B and B+ trees to the "to do" list =)
@bshsmdshj1938
@bshsmdshj1938 2 месяца назад
thank you!🤞🤞
@StopHackingMe-mm3bk
@StopHackingMe-mm3bk Год назад
Thanks for this video. Could we adapt the algorithm to not get only the closest point but rather all points under a certain distance ?
@Fanaindel3
@Fanaindel3 Год назад
+++
@jcasteld
@jcasteld Год назад
Hi. Your videos are great. Do you have any other new channel or web page to follow you? Thanks.
@simonsheldon1710
@simonsheldon1710 2 года назад
ty for video
@sensiblerazor9666
@sensiblerazor9666 3 года назад
Can you make a video on R-Trees and Multi-Dimensional Indexes?
@stablesort
@stablesort 3 года назад
That's a great topic and it'd definitely fit with the current discussion of KD-Trees. Thanks for suggesting it - adding it to the list!
@maksymchernetskyy6404
@maksymchernetskyy6404 11 месяцев назад
nextbranch or otherbranch can be null, then temp will be null, then we put null in the function that gives us the closest point, so we put null instead of the point. seems like a problem of this pseudocode to me. What do we do in this situation? Just output as best the non null one or we go to the other branch?
@user-yz2ik3cc9q
@user-yz2ik3cc9q 3 года назад
amazy,thinks~ expect the next
@stablesort
@stablesort 3 года назад
You are very welcome!
@panoramibyneel
@panoramibyneel 2 года назад
Did you notice the corn cob in a plastic pot? Some parsley would have worked better. Great video btw
@stablesort
@stablesort 2 года назад
I'll try parsley in my next video =)
@SingularityLabsAI
@SingularityLabsAI Год назад
0:44 if all coordinate shares same x-coordinate then it becomes easier as we only need to check at most 2 points.
@alexanderhess7742
@alexanderhess7742 3 года назад
Great video. Especially due to explaining r'.
@stablesort
@stablesort 3 года назад
Thanks for the compliment! Yeah, the radius concept is the tricky part.
@oren2234
@oren2234 2 года назад
so if this was a 3D tree we would have to check potentially all 4 quadrants the neighbour might be in ? and if i had a 10D tree I would have to check 2^9 quadrants ?
@stablesort
@stablesort 2 года назад
Good question. The tree nodes still have only two child nodes, even if you have 10 properties per node. Take a look at the source code - the distSquared(point0, point1) uses all properties (3, 10, whatever) but since there are only two child nodes, the plane is always split into just two. I hope this answers your question but let me know if I can clarify further.
@ramdavid845
@ramdavid845 2 года назад
Great video 👍🏻 could you please upload the code for deleting a node from the tree?
@stablesort
@stablesort 2 года назад
Thanks for the compliment. Deleting a node is not trivial, since you would have to either re-index all of the children nodes recursively or come up with some other scheme. Here are a couple from the top of my head: -just mark the node as deleted but not actually remove it. Then rebuild the tree once there are too many of these fake-deleted nodes. -move one of the leaf nodes in places of the deleted node
@br3nto
@br3nto 3 месяца назад
How does this compare to quad-trees? I’ve always wondered why they chose 4 partitions instead of 2. KD-trees seem the closest to this idea, but still a bit different.
@benripka6977
@benripka6977 2 года назад
Awesome
@Nick-qd5my
@Nick-qd5my 3 года назад
Great explaination. Took my prof 1h and not even half as clear
@stablesort
@stablesort 3 года назад
8-) Thanks!
@tomgt428
@tomgt428 3 года назад
So cool
@stablesort
@stablesort 3 года назад
Cheers!
@TranCao-w8r
@TranCao-w8r 10 дней назад
how come the point (13,3) is closer to (9,4) than (8,7)? and because of that we never traverse to the point (10,2), why does this happen?
@madhuvarun2790
@madhuvarun2790 3 года назад
what if the data has more than 2 dimensions, let's say 3? in the first layer if the chosen dimension is x what will be the dimension chosen in the second layer and third?
@stablesort
@stablesort 3 года назад
Thanks for posting a question. KD-Tree sequentially goes through each of the dimensions and then loops back. For example, if there are 3 dimensions, then it'd go in this sequence: 1, 2, 3, 1, 2, 3, 1, 2, 3, ... Typically this is easily implemented using MOD operator and the depth of the tree. So as you traverse deeper and deeper into the tree, you pass the 'depth' counter along. Then for a K dimensional tree, you'd pick dimension depth % K. Take a look at the source code linked in description for a sample implementation. Hope this helps!
@rafaelromeromunhoz9826
@rafaelromeromunhoz9826 3 года назад
thankss
@stablesort
@stablesort 3 года назад
cheers!
@brucewernick6542
@brucewernick6542 Год назад
What about N_Nearest_Neighbors? The logic for Nearest_Neighbor only works for the first.
@shivamverma-mt6kp
@shivamverma-mt6kp 3 года назад
Great video. Please post the code in c++ or complete pseudo code.
@stablesort
@stablesort 3 года назад
I do have the full source code in Java: bitbucket.org/StableSort/play/src/master/src/com/stablesort/kdtree/KDTree.java
@shivamverma-mt6kp
@shivamverma-mt6kp 3 года назад
@@stablesort Data structures in java is another trouble for me. I do data structures in c++ Please give the pseudo code atleast.
@stablesort
@stablesort 3 года назад
​@@shivamverma-mt6kp OK, here is a quick rundown: so you have a tree node, with left and right pointers (I called it KDNode). The "value" in that node could be a vector/array of integers, or encapsulated in some other object that just has that array/vector. I went with the second option, calling it KDPoint. class KDNode { KDNode left KDNode right KDPoint point ... } class KDPoint { List props // this could just be an array of int ... } // this is your main class with the algorithm to find the nearest neighbor: class KDTree { // reference to the root is all we need KDNode root // main method, described in the video: private KDNode nearestNeighbor(KDNode root, KDPoint target, int depth) { if (root == null) return null; KDNode nextBranch = null; KDNode otherBranch = null; // compare the property appropriate for the current depth if (target.get(depth) < root.point.get(depth)) { nextBranch = root.left; otherBranch = root.right; } else { nextBranch = root.right; otherBranch = root.left; } // recurse down the branch that's best according to the current depth KDNode temp = nearestNeighbor(nextBranch, target, depth + 1); KDNode best = closest(temp, root, target); long radiusSquared = distSquared(target, best.point); /* * We may need to check the other side of the tree. If the other side is closer than the radius, * then we must recurse to the other side as well. 'dist' is either a horizontal or a vertical line * that goes to an imaginary line that is splitting the plane by the root point. */ long dist = target.get(depth) - root.point.get(depth); if (radiusSquared >= dist * dist) { temp = nearestNeighbor(otherBranch, target, depth + 1); best = closest(temp, best, target); } return best; } // just picks the better option between n0 and n1, but does not traverse anywhere else KDNode closest(KDNode n0, KDNode n1, KDPoint target) { if (n0 == null) return n1; if (n1 == null) return n0; long d1 = distSquared(n0.point, target); long d2 = distSquared(n1.point, target); if (d1 < d2) return n0; else return n1; } ... } I believe this should be very close to C++ code. I hope this helps!
@nchm2625
@nchm2625 3 года назад
hi, thank you! I just ask myself what i need to do if i want to find not only my nearest neighbour but for example 5 or 6 nearest neighbours. Thanks :)
@stablesort
@stablesort 3 года назад
Sure, that would be a natural extension to this problem. I think the most intuitive solution would be to use an additional max-heap into which you keep adding points. Restrict the size of the heap to K (5 or 6 in your example). Whenever you add a point to the heap and the size becomes greater than K, remove the top element. The top element will be the one with the greatest distance. So ultimately you'll end up amassing K points with the shortest distance.
@nchm2625
@nchm2625 3 года назад
@@stablesort That would be a great addition to this video though :D thank you for that one, now i need to learn about heap maps
@stablesort
@stablesort 3 года назад
@@nchm2625 If you are coding in Java, take a look at PriorityQueue class - it's Java's heap implementation. It can do exactly what you are looking for.
@nchm2625
@nchm2625 3 года назад
@@stablesort Do you know if there is any literature about this specific problem, because i can't find anything? I'm greatful for every idea!
@stablesort
@stablesort 3 года назад
@@nchm2625 Try searching for "k nearest neighbors using kdtree", such as: stackoverflow.com/questions/34688977/how-do-i-traverse-a-kdtree-to-find-k-nearest-neighbors
@frederickhamilton7663
@frederickhamilton7663 Год назад
10/10 :-)
@adityarajora7219
@adityarajora7219 2 года назад
how to search for second nearest neighbor?
@user-hf8os4yp4d
@user-hf8os4yp4d Год назад
Have you found a solution? I need to find the 5 closest points, but I can't figure out how to do it
@adityarajora7219
@adityarajora7219 Год назад
@@user-hf8os4yp4d still not, One stupid method is like delete the first node that you have found from the tree and then go for first node again, it will be your second.
@vasilispolychronakis5812
@vasilispolychronakis5812 Год назад
why do you have a corn in a flower pot???
@ai__76
@ai__76 2 года назад
The shown code is incorrect!!!
@pawtha2238
@pawtha2238 2 года назад
thanks for explanation in video. Please dont add background music, its distracting and annoying.
@twisted_cpp
@twisted_cpp Год назад
Is this Mlepnos? Jokes aside, nice explanation.
@billwang1456
@billwang1456 2 года назад
3:59 good
@letoiiatreides2466
@letoiiatreides2466 3 года назад
we just gonna ignore the corn in the flower pot?
@stablesort
@stablesort 2 года назад
:-p
@hifam9522
@hifam9522 2 года назад
Kevin Durant tree lesss go
Далее
K-d Trees - Computerphile
13:20
Просмотров 233 тыс.
УРА! Я КУПИЛ МЕЧТУ 😃
00:11
Просмотров 865 тыс.
Approximate Nearest Neighbors : Data Science Concepts
15:05
Coding Challenge #98.1: Quadtree - Part 1
38:08
Просмотров 303 тыс.
K-D Tree: build and search for the nearest neighbor
5:07
K-d Tree in Python #1 - NNS Problem and Parsing SVG
11:31
I gave 127 interviews. Top 5 Algorithms they asked me.
8:36