This is such a great video. Answering a student's question in one sentence demonstrates the teacher's complete understanding of the material. The more a teacher has to talk to answer, the less they understand what you are asking, and the more confused you become.
Run around all over the internet none the wiser, then come across this channel and bam! It all fits together so easily. Why do some people overcomplicate such simple things? Thanks Josh!
Thank you so much for such easy, bite-sized content that I can understand fully. It's much better visualized and more informative than the other videos I've seen!!!
@Josh Starmer: at 4:14 there is a tiny mistake in the formula: the difference for gene 3 should be (2.2 - 1)². Instead, the difference for gene 2 was repeated.
Thanks a lot for pointing that out. I've added this to the "Errata" page that I maintain so that one day, when I create new editions of these videos, I can correct all the little mistakes.
Very helpful. Not sure if this has been pointed out yet but at around 4:17 you talk about distance for gene 3 but the numbers aren't accurate for that gene difference.
Hi Josh! I've been here many times and love your channel. I have a question about the axes. I understand that each one accounts for some percentage of the variation in the dataset, with axis one accounting for the highest percentage. However, if I look at samples along PC1, can I assume any biological meaning for the samples far to the right or far to the left?
Wow. PCA and MDS really are very similar, just like the videos describing them (clearly explained and overall awesome ;) It seems to me that PCA is just a particular case of MDS, since with MDS one can adjust the distance metric to get various outputs, including the one given by PCA. If that is the case, why don't people use MDS more? It seems under-utilized. Is it trickier to implement?
Hi Josh, have you ever encountered a clustering model where there were more than 3-4 clusters? I've done it many times, and it looks like 3-4 is the "natural" optimal number of clusters.
Thanks so much for your video, but I still have a question: I really don't understand the difference between PCoA and MDS. It would be a great help if anyone could explain it.
MDS has two versions: "Classical" and "Non-Metric". This video shows how "Classical" MDS works. Classical MDS is the exact same thing as PCoA. There is no difference. However, there is a difference between PCoA and "Non-Metric" MDS. Maybe one day I'll make a video on "Non-Metric" MDS.
Unlike PCA, where we compared the genes' variation in order to weight the value for each cell and then map the cells onto PC1 and PC2, here we are calculating the distance between cells with reference to each gene. What is the calculation for MDS1 and MDS2? I am confused because we are taking 2 cells at a time instead of one. Are we plotting the difference of each gene with respect to cell 1 along the x-axis and cell 2 along the y-axis? Could you please explain what to consider for MDS1 and MDS2? Thanks a ton
If this is in reference to the log-fold change graph, then I too agree that it isn't explained what those two axes distinctly represent. I get how the LFC was calculated before then (between every single pair of data points), but those axes could theoretically be anything at the discretion of the investigator... and what they are here hasn't been made clear.
If MDS and PCA have the same outputs, why would you choose one over the other? What's the importance of correlation vs distance? P.S. I've been trying to understand PCA and MDS for months now and this was so much easier than reading articles and books :D
Hey Josh, I'm confused by the PCA statement "correlations among samples". Isn't it supposed to be correlation among variables, since we are reducing the dimension of the variables (in this case, genes), not the samples?
The goal of the plot is to show correlations among the samples. Each sample has a lot of gene measurements, and correlation among samples means that a lot of those measurements are similar (or the exact opposite of similar), and we want to preserve those relationships. We want things that are highly correlated to appear close to each other in the graph.
Hi Josh, I really like your videos and they are very intuitive. Could you do a StatQuest video on Partial Least Squares if possible? Thanks in Advance :)
The same as PCA? I'm not sure. However, I do know that whatever assumptions there are are often ignored and people just try PCA or MDS and see what happens.
@@statquest Couldn't this lead to false interpretations? I'm using such techniques and LDA to analyze taxonomic data, and I'm worried that my dataset is not independent due to a common phylogenetic origin.
@@manueltiburtini6528 I don't really think that's a big problem for MDS or PCA. These methods are just designed to reduce dimensionality for drawing graphs or to plug into some other analysis (like regression).
Hi, Joshua. I noticed that you mention "the data is not linear" in a reply to the comments. I have been confused about this concept for some time. What does non-linear data mean? (I guess it is not the same kind of concept as a linear model, right? Haha.) A bioinformatician told me that single-cell data is non-linear and we'd be better off using t-SNE rather than PCA. How can we say that bulk RNA-seq data is linear but single-cell RNA-seq data is non-linear? I really hope you can answer this because it has confused me for quite a long time.
Haha, thank you Joshua. The spiral pattern is the so-called "Swiss roll" model, I think. Some say that linear dimension reduction focuses more on the global pattern (like distance), while non-linear dimension reduction methods focus more on local patterns. Why not talk about zero-inflation in single-cell data next time, and the normalization methods used in single-cell data analysis?
Hi Josh, thanks for the video. I'm a bit confused that when you said "PCA starts by calculating the correlation among samples", did you mean the plotting of each sample on multi-dimensions like your previous PCA video? If so, how about PCoA? Do we also "plot" the distances among samples first and then try to get the top 2 PCs as well? If that's true, then how is the number of dimensions determined in the case of PCoA? I watched all of your PCA videos and I can understand how to get a PCA, but somehow I still don't know how a PCoA is done... thank you!
There are two ways to do PCA - an old method that is based on covariances and correlations (described in ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-HMOI_lkzW08.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-_UVHneBUBW0.html ) and a new method that uses Singular Value Decomposition (described in ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-FgakZw6K1QQ.html ). This video on PCoA/MDS references the older method (using covariances and correlations). To calculate the covariances and correlations among the samples, you follow the steps outlined in these videos on covariance statquest.org/2019/10/08/covariance-and-correlation-part-1-covariance/ and correlation statquest.org/2019/10/08/covariance-and-correlation-part-2-pearsons-correlation/ . That gives you a single number for every pair of samples. We then do Eigen Decomposition of those numbers to get the PCs. With PCoA, we calculate distances (using the Euclidean distance or some other metric) between each pair of samples and do Eigen Decomposition of those numbers to get the PCs.
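The distance-then-Eigen-Decomposition steps described above can be sketched in a few lines of numpy. This is my own illustration with made-up numbers, not code from the video; the "double centering" step turns squared distances into something eigen-decomposable:

```python
import numpy as np

# Toy data: 3 samples (rows), each with 4 made-up gene measurements
X = np.array([[10.0, 2.0, 3.0, 5.0],
              [ 9.5, 2.2, 3.1, 4.8],
              [ 1.0, 8.0, 7.5, 0.5]])

# Step 1: Euclidean distance between every pair of samples
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))

# Step 2: double-center the squared distances to get a Gram matrix
n = D.shape[0]
J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
B = -0.5 * J @ (D ** 2) @ J

# Step 3: eigen-decompose and keep the top axes as MDS1 and MDS2
vals, vecs = np.linalg.eigh(B)
order = np.argsort(vals)[::-1]         # largest eigenvalues first
vals, vecs = vals[order], vecs[:, order]
coords = vecs[:, :2] * np.sqrt(np.maximum(vals[:2], 0))  # MDS1, MDS2
```

With Euclidean distances, the pairwise distances among the 2-D `coords` reproduce `D` exactly; with another distance metric you get PCoA with that metric instead.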
At 6:20 in the video I mention that a Biologist might choose to use MDS to show clustering using log-fold changes because, traditionally, gene measurements are analyzed in terms of log-fold changes. Alternatively, it could be you want to cluster locations in a city based on how far they are away via taxi (so blocks and one-way streets are a factor) - MDS can do this.
Ah, I see, so it's that MDS allows you to cluster via any distance metric of interest, whereas PCA limits you to correlation/Euclidean distance. Thanks for taking the time to help me out!
You are correct - MDS lets you cluster stuff using any distance metric. The coolest thing about that, which I forgot to mention, is that, via Random Forests, you can use MDS to cluster any data, regardless of type. Check it out in "Random Forests Part 2:" ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-nyxTdL_4Q-Q.html
Hi, Thank you so so much for such a good explanation. Do you mind if I ask the reference book/paper for the terminologies? I am a little bit confused since I assume the same methods are a little bit different in various reference books. Thank you
To be honest, I can't remember what my original sources are for this video. More recently I've been putting the sources in the description below the video, but this video is too old for that.
Hi Josh, 1. How do we plot the values of MDS on the graph? With distances we only have a single value per pair. Do we plot them on a number line? But you showed a graph with 2 axes.
MDS converts a matrix of distances into different axes in much the same way that we do it for PCA. For details, see: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-_UVHneBUBW0.html
Are you saying we compute eigenvalues and eigenvectors on the distance matrix instead of the covariance matrix? Is that the only difference between PCA and MDS?
To plot the data, do we select the cells with the maximum distances? For example, if cells 1 & 2 and cells 3 & 4 have the maximum distances, do we plot with respect to them?
I like to learn using videos (mainly from your channel) and GPT for the math equations. I checked Wikipedia just to be sure, but it looks like you skipped the step about "double centering and matrix transformation" entirely.
I talk about that in my PCA videos: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-FgakZw6K1QQ.html and ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-oRvgq966yZg.html
@@kaynkayn9870 Those videos specifically talk about centering the data - how and why we need to do it. I don't talk about matrix transformations explicitly because those are just one of several ways to actually perform PCA.
Yes, I got it after seeing the video. But I'm not sure which kind of distance I should use if I want to perform a PCoA with microsatellites in R, and also whether PCoA is better than PCA when you use a specific distance for microsatellites. It's weird because when I used the Adegenet function [dudi.pca()] for my data frame of 5 SSRs with 23 alleles, the function, instead of considering 5 variables (the 5 SSRs), took 23 variables (the 23 alleles), and for this reason the variance explained by PC1 and PC2 is quite low. Hope you can suggest something based on your experience as a geneticist. Thanks a lot.
Hi Josh, thanks so much... Confusion: the new table with distances will have columns d12, d13, d14, ..., d23, d24, ..., so when we plot, why would we still have clusters corresponding to cell1, cell2, ...? Wouldn't the colours correspond to d12, d13, etc.?
The first column in the distance matrix will be cell1, the second will be cell2, etc., and likewise the first row will be cell1, the second row cell2, etc. The distances are then the values in the matrix. The distance between cell1 and itself (in the upper left-hand corner of the matrix) is 0, etc.
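To make the layout concrete, here is a tiny sketch with made-up numbers (my own example, not from the video): rows and columns both index the cells, so the matrix is square, symmetric, and zero on the diagonal.

```python
import numpy as np

# Hypothetical measurements for 3 cells (each row is one cell's 2 gene values)
cells = np.array([[1.0, 2.0],   # cell1
                  [1.5, 2.5],   # cell2
                  [4.0, 0.0]])  # cell3

# Distance matrix: entry [i, j] is the distance between cell i+1 and cell j+1
D = np.sqrt(((cells[:, None, :] - cells[None, :, :]) ** 2).sum(axis=2))

print(np.round(D, 2))
# The diagonal is all zeros (each cell is distance 0 from itself),
# and the matrix is symmetric: D[0, 1] == D[1, 0].
```

So the colours in the final plot correspond to the cells (one point per row/column of this matrix), not to the individual pairwise distances.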
Hi Josh, I am a little confused regarding features and samples. For example, at 6:56 you say that PCA creates plots based on correlations among samples. The only concept of correlation that I know is between features - when 2 features change together, the correlation is large. But I got confused here. I tried to search for "sample correlations", and what I found was correlation on samples as part of a population, but here samples should be like rows/instances/observations. Your computation of the Euclidean distance also confused me, since you have features as rows (gene1, gene2) and samples as columns (cell 1, cell 2). Can you please confirm my understanding: does PCA create the plot based on correlations among FEATURES, like a person's age, weight, etc., where each person is a sample? Thank you :)
Ok, I'll admit it: I don't understand what this video is saying. It says to just replace the dot product with other distance metrics, and that sounds fine. But it doesn't make sense that we are using the same computations mathematically for a distance matrix and a correlation matrix. The correlation matrix (dot-product distance) makes sense because its special properties allow it to have a decomposition with a diagonal component, which we can sort and then reduce in dimension to produce our PCA plot. It is not at all clear to me why an arbitrary distance matrix of the predictors will be diagonalizable in the same way. So the rest of the mathematical interpretation breaks down from there. Basically, the math and the interpretation feel a bit off to me. I'll have to do more research on the topic.
@@statquest Is it the same approach as calculating the PCs when calculating the axes of MDS? Like finding the best-fit line by minimizing the SSR? If it is, what role does calculating the distances between points play?
@@jxaskcijiaxhsic9943 It's a related technique. It's not the same, but related. Based on the distances we can calculate variances and covariances and from those we can find the directions that there is the most variation in the data.
One more comment. The fact that MDS uses a precomputed distance reminds me of hierarchical clustering. Does that mean MDS is a 2D representation of hierarchical clustering?
That would be cool. I am brewing something; I might have an idea. Not sure how good it is at this moment. I need to write it up. I'll keep you posted to see if it is worth anything. Ta ta
I went down in flames :) It turns out I was thinking of re-inventing the wheel :) My inclination was to further dissect the PCA results/"clouds" and see the relationships between the comprising data points. I was deflated to see that this problem was solved many years ago by clustering (either k-means or hierarchical). ;(

On the good side, I found a few useful things. A paper that confirms the relatedness between PCA and k-means, as you were anticipating: ranger.uta.edu/~chqding/papers/KmeansPCA1.pdf . I also found out about the HCPC package in R, which can do hierarchical clustering after factor analysis. It seems kinda cool, as on the graphical side it does pseudo-3D hierarchical clustering: imagine the first 2 PCs as a horizontal plane and the cluster roots coming in from the top... www.r-project.org/conferences/useR-2009/slides/LeRay+Molto+Husson.pdf

In the usual 1D hierarchical clustering, I don't like the fact that some less-related points end up adjacent. This HCPC plotting is not perfect either, as it obscures some data points. I was thinking that 2D density could be used to further "cluster" the PC plot, for example with geom_density_2d()/stat_density_2d() in ggplot2. With the right arguments and aesthetics it might be able to pick up some "clusters", but not the relationships between the points inside a contour. Maybe adding relatedness by connecting the dots somehow on a zoomed-in plot (by adjusting the axes) may help to show further details... What other ways of showing relatedness, besides hierarchical clustering and correlation matrices, do people use?
Great video! Actually, I am elevating myself from Excel data analysis to machine learning. Right now I am at the stage of grabbing everything I can. What advice do you have for Excel users who are machine learning enthusiasts?
Hi Josh, it is me again. Thanks for the great video! I am wondering if you have a video on nMDS, because I see it quite often in biological studies, but it's still quite blurry to me.
@@trinh123456 Unfortunately, I do not have plans to do it anytime soon. My to-do list is huge (it has 100s of items on it) and I can only make a few videos each month. I work as fast as I can, and I work all the time, but it's not enough to keep up with the requests.
Hi Tien, maybe I can provide some help for nMDS based on Josh's triple-BAM video! (As always, amazing job Josh!) To my knowledge, nMDS is a rank-based approach. Like MDS, you start by computing the distance between samples. These distance values then get ranked. After the ranking, you perform the "fancy math" thing to get the coordinates for a graph. Be aware that you lose quantitative information when clustering on ranks. You can check this website for more details: mb3is.megx.net/gustame/dissimilarity-based-methods/nmds
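The distance-then-rank step described above can be sketched with numpy; the numbers are made up, and the actual nMDS optimization (finding coordinates that minimize "stress" while preserving the ranks) is deliberately not shown:

```python
import numpy as np

# Made-up measurements for 4 samples
samples = np.array([[0.0, 1.0],
                    [0.2, 1.1],
                    [3.0, 0.5],
                    [3.1, 0.4]])
D = np.sqrt(((samples[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2))

# Pull out each unique pair's distance and rank them (1 = smallest)
iu = np.triu_indices(len(samples), k=1)
pair_dists = D[iu]
ranks = pair_dists.argsort().argsort() + 1

# nMDS then searches for coordinates whose pairwise distances preserve
# these RANKS, rather than the distance values themselves.
```

Because only the ranks survive, two configurations with very different raw distances but the same ordering are treated identically - that is the quantitative information loss mentioned above.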
LDA is supervised, so you can only use it when you know what groups you want to supervise. MDS is useful when you want to change the distance metric. And if you don't want to change the distance metric, MDS and PCA are the same.
Thank you for this great video, but I want to understand: for an nMDS graph, should I transform my values from % using a square root? I have about 28 species. Your help will be highly appreciated.
@@statquest If MDS is plotted between 2 genes, then the distance itself becomes a single variable. Any pair of cells and their distance could be placed on a number line. So if this is the x-coordinate of the plot, what is the y-coordinate for a point?
Hi Josh! Thanks for the video! I still didn't get how we can do the same thing on the distance matrix as we do in PCA (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-FgakZw6K1QQ.html). I watched that video, and thanks to your wonderful explanation I could imagine that, for several samples with 2 genes, we can draw the dots on a 2-D plot (gene1 vs gene2), find the best-fit line, which is PC1, and then a line perpendicular to it as PC2. But when it comes to the distance matrix, how can we draw the dots? There are no genes, only sample1, sample2, etc. I am really confused. Truly thankful!
There are two methods for doing PCA - the one I present in that video is called "Singular Value Decomposition" and it works the way I presented in that video. Alternatively, we can do something called "Eigen Value Decomposition" and this is based on using the covariance or correlation matrix of the data. It is through this second way that PCA ends up giving us results similar to MDS. Unfortunately, I don't have a good video for explaining how this second way works. :(
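A small sketch of that second, Eigen-Value-Decomposition way (my own illustration with made-up numbers, not from a video): eigen-decompose the covariance matrix of the centered data and project onto the eigenvectors, then confirm the scores match the SVD route up to sign.

```python
import numpy as np

# Toy data: 5 samples, 2 variables (made-up numbers)
X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Center the data, then eigen-decompose the covariance matrix
Xc = X - X.mean(axis=0)
C = np.cov(Xc, rowvar=False)
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]        # PC1 = direction of most variation
vals, vecs = vals[order], vecs[:, order]

# Project the samples onto the PCs
scores = Xc @ vecs

# The same scores come from SVD of the centered data (up to sign),
# which is the method shown in the SVD-based PCA video.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores_svd = U * S
```

It is this covariance/correlation route that lines up with MDS: swap the covariance matrix for a double-centered distance matrix and the same eigen-decomposition machinery gives the MDS/PCoA axes.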