I'd be interested in learning how to plot more than one species accumulation curve in ggplot2, where the Y axis is species richness and the X axis is total area sampled (most of the tutorials that I've seen show the X axis as number of individuals or number of sites).
I really enjoy the content in your videos and think that you have incorporated some really great workflows that show a high level of understanding and are also very generalizable for other people. I see your point about doing your analysis on a real-world problem, incorporating the use of C++ etc., but would comment that it is hard to parse a specific video without watching all of them. The current style is great if you want to see the entire analysis, but it can be difficult if you just want to learn about a specific problem or how to incorporate a specific function into solving a problem. For example, I recently watched your expand.grid video as well as your nested data frame videos to get a better sense of how to incorporate those tools into my own work (medicine, using ICD codes to predict in-hospital outcomes from a large multiyear dataset). As much as I don't like the general datasets (mtcars, gapminder, etc.), I think they are useful for modeling workflows because they are pretty consistent in what the inputs and outputs will be. I think this channel has really good content, but it would be more useful if you centered your videos around either the type of problem that you want to solve or the type of package that you are incorporating. Also, a brief description in the intro of the type of tibble/data frame that you are using (column names, types), to give some context for a more generalizable problem, would make the videos much more accessible. I'm a physician, so I have taken graduate-level biochem and micro courses, but even I found the density of the source material a little off-putting, and I ended up speeding through to the part that was useful for the specific problems I am working with. I can see the density of the microbiology being off-putting to people who are not from a biological sciences background and who are just looking for more of an R tutorial that can be consumed on an as-needed basis.
At the end of the day, everything is about consistent inputs = consistent outputs, and I ultimately came back to your videos because no other videos were really addressing the high-level usage that yours are. Take what you will from this. I have watched a lot of videos from this channel at this point and am glad that someone is finally filling this content gap. I definitely think the content is top notch, so I hope to see this channel grow.
Thanks - I appreciate the input! I feel like there are already a number of tutorials using the built-in datasets. I also want to demonstrate how different tools are integrated to show people how they work together. I’m trying to provide more of a vlog-style approach. As we get more into visualization, the content becomes more focused on a single thing. Thanks for tuning in.
Hi Patrick, thanks for your effort in all these great tutorials. If possible, can you please make an episode about how to test the association between microbiome composition and a continuous outcome, for example with MiRKAT (Microbiome Regression-Based Analysis Tests) or corncob?
For anyone following this tutorial recently and hitting issues with the adonis output: newer versions of vegan use adonis2 instead, so just add the 2 to the function name and that should sort out the issue.
You must be a great professor (and human) who supports and uplifts students and peers. Thank you very much for your YouTube channel. Also, one request: could you do a video on only the most important base R functions that can be widely used? Stay safe and blessed.
Hi IOT - Thanks for watching and for your praise. Great minds think alike! I've been creating a top 10 list of base R functions for a future video. Stay tuned :)
@Riffomonas Excellent ☺️ Looking forward to them. Also, again, thank you very much for your valuable time creating and sharing these resources. Stay safe and blessed.
Your videos carried me through my own data, comparing the treatment effect of vortioxetine on fecal samples of mice with two different types of diet. Thank you sooo much!
I was wondering if you could kindly let me know how we can perform an analysis of the Gram +ve and Gram -ve difference in R. Thanks very much for all of the interesting and informative videos.
You'd have to find a way to aggregate the taxa that are Gram + or -. Then you could do something like group_by and summarize to compare their relative abundances. I'm not sure how easy it's going to be to classify taxa by their Gram staining properties since some of them are a bit variable
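A minimal sketch of the group_by/summarize idea described above. The abundance table, the lookup table, and every column name are hypothetical; in practice the Gram stain assignments would have to be curated by hand, and some taxa may not classify cleanly.

```r
library(dplyr)

# Hypothetical long-format table: one row per sample and taxon
abundances <- tibble(
  sample    = rep(c("s1", "s2"), each = 4),
  taxon     = rep(c("Bacteroides", "Escherichia",
                    "Lactobacillus", "Clostridium"), times = 2),
  rel_abund = c(0.40, 0.10, 0.30, 0.20,
                0.20, 0.10, 0.45, 0.25)
)

# Hypothetical lookup table assigning each taxon a Gram stain label
gram_lookup <- tibble(
  taxon = c("Bacteroides", "Escherichia", "Lactobacillus", "Clostridium"),
  gram  = c("negative", "negative", "positive", "positive")
)

# Attach the labels, then pool relative abundance per sample and label
gram_summary <- abundances %>%
  inner_join(gram_lookup, by = "taxon") %>%
  group_by(sample, gram) %>%
  summarize(rel_abund = sum(rel_abund), .groups = "drop")

gram_summary
```

The pooled values could then be compared between treatment groups in the same way as any other per-sample measurement.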
Hi Pat, thanks so much for making these videos. I'm a newcomer and so glad I found you. Is there any reason why you don't suggest pairwise comparisons with package pairwiseAdonis? Thanks! Mel
Hey Mel - well … I don’t know pairwiseAdonis 😂. Although all the packages are a great treasure, I find it’s hard to remember all the syntax if I rarely use the package/function. It’s easier for me to remember the basics and then apply those in many situations.
Thank you for the sweet ordination videos, you're helping a brother out! Does it matter that you have a low R2 value for your model even though it has a correspondingly low p-value? The model you're building around the 23:00 mark is what I am referring to. Cheers son.
Thanks for watching NWH! Good eye. The p-value is testing whether the R2 is significantly different from zero. So if we have a lot of data and a marginal r2 value, then we can have enough power to find small differences as being significant. Basically, things can be statistically significant, but not biologically significant. I'm not really sure what to make of that tradeoff in this situation.
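The same pattern can be reproduced with an ordinary linear regression, used here only as a stand-in for adonis. The sample size, effect size, and seed are arbitrary assumptions chosen for illustration.

```r
set.seed(1)

# Large sample with a weak true effect:
# x explains roughly 1% of the variance in y
n <- 10000
x <- rnorm(n)
y <- 0.1 * x + rnorm(n)

fit <- summary(lm(y ~ x))

r_squared <- fit$r.squared
p_value   <- fit$coefficients["x", "Pr(>|t|)"]

# A small R2 paired with a very small p-value
r_squared
p_value
```

With this many observations the test has enough power to detect an effect that explains very little of the variation.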
Hi, Pat! I have a microbiome dataset with people belonging to 3 different groups sampled across 3 time points. I would like to see if there is any difference between the groups for each time point and then do the pairwise analysis for all the groups within the significant time points. Should I run adonis for the time points and then run more adonises for the groups within the time points? My main concern is: should I consider all the p-values as one vector for the adjustment or do I adjust separately? Thank you for sharing your knowledge and sorry for this long question!
Actually, I came across the linear mixed modelling approach for dealing with multivariate data with dependencies (like time points), but I didn't understand exactly whether this could be a substitute for PERMANOVA in my case.
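For the adjustment question above, the two options differ only in which vector is handed to base R's `p.adjust()`. The sketch below uses made-up p-values and the Benjamini-Hochberg method as assumptions; it shows the mechanics and does not settle which option suits a given design.

```r
# Hypothetical pairwise p-values from two time points
p_time1 <- c(a_vs_b = 0.001, a_vs_c = 0.020, b_vs_c = 0.300)
p_time2 <- c(a_vs_b = 0.004, a_vs_c = 0.040, b_vs_c = 0.500)

# Option 1: adjust within each time point separately
separate <- c(p.adjust(p_time1, method = "BH"),
              p.adjust(p_time2, method = "BH"))

# Option 2: pool all six p-values into one vector before adjusting
pooled <- p.adjust(c(p_time1, p_time2), method = "BH")

separate
pooled
```

Pooling treats all six comparisons as one family of tests, while the separate version treats each time point as its own family.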
Thanks for a nice video. Short question though. Why would you adjust your p-values when they're derived from a re-sampling method (a permutation test)? I was under the impression that this was not necessary. If you read papers such as doi: 10.1186/1751-0473-3-15, it sounds as though a permutation test deals with the potential for false positives "in its own way" and that you wouldn't have to correct (additionally) for multiple testing. I am curious to know what you think of this :) Best, Simon.
Very interesting session. I was wondering if you have already tested for homogeneity of variance with the betadisper() function? I mean, how can you rely on these p-values? Maybe they are significant not because of differences in the centroids between groups but rather because of differences in the dispersion around the centroids?
I haven't seen a lack of homogeneity of variance increase Type I errors. We published some simulations on this - www.nature.com/articles/ismej20085. Looking at the first column of Table 1, the lack of homogeneity of variance didn't impact the Type I error. We could certainly run that test - I'll be sure to add it to a future video!
@Riffomonas Hi both, I had the same question about the need to test for homogeneity of variance first (excellent demonstration, by the way). I am becoming a bit confused about the differences between the adonis and betadisper tests. From my understanding, if the ANOVA in betadisper is significant, an adonis PERMANOVA would not be suitable, as homogeneity is a condition of adonis. This is a particular point of contention I am running into with my species matrix dataset. Can you provide any more insight on this?
@11mgarrard In adonis() (PERMANOVA) we compute the group means and test whether the estimated differences between groups are consistent with a null hypothesis that all groups have the same mean. This is exactly the same as an ANOVA or t-test. Those methods assume that the groups have the same variance. It has been shown that multivariate methods that rely on dissimilarities (PERMANOVA especially, and to a lesser extent CCA) are also sensitive to different variances in the groups. These variances are called dispersions in the multivariate case, but they're just the spread of the observations about their group mean. betadisper() tests the null hypothesis that the dispersions about the group means are equal. Note the difference: PERMANOVA (adonis()) focuses on the differences in the group means, while PERMDISP (betadisper()) focuses on the differences in the dispersions of samples around these group means. Pat misspoke in the video by suggesting PERMDISP is a synonym for PERMANOVA/adonis. PERMDISP is for dispersions, just like betadisper(). In practice, if betadisper() says your group dispersions are different, then you can't be confident that any differences in means you detect using adonis() are truly due to differences in group means. The result could in part be due to the differences in dispersion. Being uncertain whether the result you have from adonis() is correct isn't a good position to be in, but in practice I would just treat the p-values with a big pinch of salt. I'd read Warton et al. 2011 (doi.org/10.1111/j.2041-210X.2011.00127.x) for more on this, if you haven't already, as well as the paper Pat mentions.
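The two tests contrasted above can be run side by side. This is a minimal sketch using the `dune` data bundled with vegan rather than the data from the video; the dataset, the `Management` grouping, and the seed are assumptions made for illustration.

```r
library(vegan)

# Example data bundled with vegan (not the dataset from the video)
data(dune)
data(dune.env)

set.seed(1)
dune_dist <- vegdist(dune, method = "bray")

# PERMDISP: are the dispersions around the group centroids equal?
dispersion <- betadisper(dune_dist, group = dune.env$Management)
dispersion_test <- permutest(dispersion, permutations = 999)

# PERMANOVA: do the group centroids differ?
centroid_test <- adonis2(dune_dist ~ Management, data = dune.env,
                         permutations = 999)

dispersion_test
centroid_test
```

If the dispersion test is significant, a significant PERMANOVA result may partly reflect differences in spread instead of differences in location.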
Check out this video. Earlier ones in this series generated the ordination. Also geom_jitter and geom_boxplot have similar usage. Alternatives to ordination in R: Visualizing community change relative to a specific point (CC207) ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-jLVKJ_n6Qd0.html
For me, a big big deal is reproducibility and openness. Excel costs $, doesn’t lend itself to easily adding more data, isn’t super transparent about what code is there, and mixes raw data, code, and results, which I prefer to keep separate.
Hi Pat! Thank you so much for this useful tutorial. Could you please show the code for pairwise comparisons between factor levels in the case of a significant interaction effect? For example, between the different groups from both factors "disease_stat" and "gender".
Hi Pat. This is very useful. Thanks for creating these resources. One question- where does the .dist file come from in the Mothur SOP output? I have 5 different .dist files from going through the Mothur SOP (the really big .dist matrix that I know we're not using here, as well as opti_mcc.thetayc.0.03.lt.ave.dist, opti_mcc.thetayc.0.03.lt.std.dist, opti_mcc.jclass.0.03.lt.std.dist, and opti_mcc.jclass.0.03.lt.ave.dist) and none of them look like the square phylip in this video.
What would be the function that performs pairwise comparisons between several groups? In your example there are only three groups (case, diarrhea, nondiarrhea), but let's imagine another example with 20 groups. How can I have pairwise comparisons without manually repeating 20 times the same test?
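One way to avoid repeating the test by hand is to generate every pair of groups with `combn()` and apply the same test to each pair. The sketch below is not a built-in vegan function: `pairwise_p()` is a hypothetical helper, the `dune` data stand in for a real dataset, and BH adjustment is an assumed choice. It subsets one shared distance matrix for each pair instead of recalculating distances.

```r
library(vegan)

# Example data bundled with vegan (Management has 4 groups, so 6 pairs)
data(dune)
data(dune.env)

set.seed(1)
dune_matrix <- as.matrix(vegdist(dune, method = "bray"))
groups <- as.character(dune.env$Management)

# Hypothetical helper: PERMANOVA p-value for one pair of groups,
# using the rows and columns of the shared distance matrix
pairwise_p <- function(pair) {
  keep <- groups %in% pair
  pair_dist <- as.dist(dune_matrix[keep, keep])
  pair_meta <- data.frame(group = groups[keep])
  adonis2(pair_dist ~ group, data = pair_meta,
          permutations = 999)$`Pr(>F)`[1]
}

# Every unique pair of groups, however many groups there are
pairs <- combn(sort(unique(groups)), 2, simplify = FALSE)

p_values <- vapply(pairs, pairwise_p, numeric(1))
names(p_values) <- vapply(pairs, paste, character(1), collapse = "_vs_")

# Adjust for the number of comparisons
p.adjust(p_values, method = "BH")
```

With 20 groups the same code would produce 190 comparisons, so the run time and the severity of the adjustment both grow quickly.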
Is the adonis function suitable for analyzing two continuous variables, by factor? If so, is there a way to transform a table with column headings into a form that adonis accepts?
Hi! An Argentinian paleontologist here! I hope it's not too late to be asking questions here. But first, I want to thank you and congratulate you on the content. It's really good and I'm looking forward to watching the other videos. I ended up here trying to understand the output of adonis from a script written by my advisor. I also have 3 groups to compare. In the script, he first did the test looking for differences using the three groups and then he did the pairwise comparisons. To do the pairwise comparisons, he recalculated the distances between the individuals of the two compared groups, but I think you used the same distance matrix calculated with all individuals to do this. My questions are: did you recalculate the distances and I didn't see it, or did my advisor do it differently? Do you have any bibliographic suggestions where I could read more about this? Thanks again! Gerardo A. Lo Valvo
Hey Gerardo! Thanks for tuning in 🤓 For this video I calculated the distances within mothur and then used R to pull out the distances for the pairwise comparison. I don't think there's a reason to recalculate a distance matrix. I'd suggest looking at the References section of the ?adonis documentation for papers about adonis. I like the MJ Anderson papers in there
@Riffomonas Thank you very much for your answer!! I'll look there.
Hi, thank you for the video. One question: why is it that when I enter the adonis formula with * between two independent variables, I'm not getting the interaction between the two variables? When I put a / instead of a *, I'm getting the interaction and the PERMANOVA for the first group only. The results of the two formulas are the same, which is weird. Could it be related to my data being presence/absence? Thanks.
Hmmm, presence absence data might have different syntax. The data in this episode is from a distance matrix that I calculated outside of R. It doesn't quite sound the same as what you've got. Perhaps the presence absence data is structured so that there isn't enough variation in the data to calculate the interaction term?
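For reference, base R's formula machinery expands the two operators into different sets of terms, which can be checked without any data. The variable names `a` and `b` are placeholders; whether a given adonis call can estimate every term still depends on the data.

```r
# Term labels produced by crossing (*) versus nesting (/)
crossed <- attr(terms(~ a * b), "term.labels")
nested  <- attr(terms(~ a / b), "term.labels")

crossed  # "a" "b" "a:b"
nested   # "a" "a:b"
```

So `*` requests both main effects plus the interaction, while `/` requests the first main effect plus the second variable nested within it.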
Thank you for this tutorial, it is exactly what I need, as I am a beginner. I have a question regarding my data. I have 2 substrates with 4 treatments each and I wanted to see the effect of substrate and treatment on my community composition. So I ran adonis on distance_matrix ~ substrat*treatment. Everything was significant (substrat, treatment, substrat*treatment). However, when I ran the pairwise comparisons I got non-significant p-values. Could you give me a hint as to why I get such a result? Thanks.
My pleasure! Glad the videos are helpful. I did another more recent episode on Adonis you might enjoy too. I wonder if the problem is the significant interaction
@Riffomonas My question is that I have several groups and I can get a significant difference between each pair of groups. However, I don't know which group has the higher beta diversity if I'd like to use a, b, c, d to express their differences.
The next step would be to create a biplot. You can do this type of analysis in mothur with the corr.axes function. I'll see if I can create an episode around this in the future.
I am sorry for another comment. I ran your code, and everything went smoothly until I ran str(all_test). I got the output below, with only 3 obs. of 5 variables (instead of 3 obs. of 6 variables in your output), and then I could not run the next steps for the pairwise comparison as in your tutorial. Do you have any suggestions?
Classes ‘anova.cca’, ‘anova’ and 'data.frame': 3 obs. of 5 variables:
 $ Df      : num 2 335 337
 $ SumOfSqs: num 14.1 103.3 117.4
 $ R2      : num 0.12 0.88 1
 $ F       : num 22.9 NA NA
 $ Pr(>F)  : num 0.000999 NA NA
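The five columns in this output (Df, SumOfSqs, R2, F, Pr(>F)) match the data frame that `adonis2()` returns, whereas the older `adonis()` returned a list with the table stored in `aov.tab`. One possibility, then, is that the later steps need to pull values directly from the data frame. A minimal sketch with vegan's bundled `dune` data (an assumption; this is not the dataset from the video):

```r
library(vegan)

# Example data bundled with vegan
data(dune)
data(dune.env)

set.seed(1)
all_test <- adonis2(dune ~ Management, data = dune.env, permutations = 999)

# The result is itself a data frame, so columns are accessed directly;
# backticks are needed because of the special characters in the name
p_value   <- all_test$`Pr(>F)`[1]
r_squared <- all_test$R2[1]

p_value
r_squared
```

Code written against the old list output would need its `$aov.tab` step dropped to work with this structure.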
Thank you for the great R videos. What do you think about learning the tidyverse after a week in base R? Suppose you are going to teach students who have no programming background; in this scenario, when would you start teaching them the tidyverse? I'm asking because the tidyverse has a lot of functions (and the pipelines, of course), and sometimes people have trouble memorizing them and eventually give up. An example: ---------------- mpg %>% filter(year% select(cty,hwy)%>% summary() ------------- summary(mpg[mpg$year
I fought against teaching tidyverse for the longest time. Now it’s the foundation for what I teach (see the tutorials at Riffomonas.org). Even your “simple” example has a ton of non intuitive syntax and functions - [, :, $, summary,
Thank you for making wonderful tutorials. Could you make some tutorials on interclass correlation between microbial taxa and between sample variables?
Wonderful tutorial as always Pat, thank you so much! I may have missed this in there (admittedly I skipped around a bit) but I'm curious how the adonis function changes when you are using a continuous numeric variable like moisture content or pH instead of categorical like Treatment. Is this a valid use of adonis? And if so how does it establish a "group" in this case?
@Riffomonas @Robert Jones `adonis()` and `adonis2()` work just fine with continuous data. In the same way that a linear model works just fine with continuous and categorical covariates (this is the general linear model which fused ANOVA and linear regression), with sums of squares being computed for categorical and continuous covariates alike, `adonis()` and `adonis2()` do the same thing. If you're worried you could use `dbrda()` instead but these things are really all the same under the hood.
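A minimal sketch of a continuous predictor in `adonis2()`, using the numeric `A1` column (soil horizon thickness) from vegan's bundled `dune.env` data as a stand-in for something like pH or moisture content. The dataset and seed are assumptions for illustration.

```r
library(vegan)

# Example data bundled with vegan; A1 is a numeric column
data(dune)
data(dune.env)

set.seed(1)
fit <- adonis2(dune ~ A1, data = dune.env,
               method = "bray", permutations = 999)

# A continuous predictor uses a single degree of freedom,
# like a slope in a regression, so no groups are formed
fit
```

The R2 column then reports the share of variation in the distances associated with the gradient.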
Thanks for this video! May I ask what version of R and the vegan package you are using? I replicated your code exactly but get the following error: Error in `colnames
Hi Miranda - sorry to hear you're running into problems. I'm using R v4.0.5 and vegan 2.5-7. Here's what I had by the end of the episode leading up to the line with the colnames function. Does this help you spot anything? library(tidyverse) library(readxl) library(vegan) permutation
@Riffomonas Amazing, thank you! After some tinkering it seems like it was a technical problem with my machine, not your code. Thanks so much for an awesome tutorial! :)
Hi Michael - the ordination at the end was created in the preceding episodes. You might check out these videos: (CC080) ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-ipy8qZKqiM4.html, (CC079) ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Y0GI34S-ZMI.html, and (CC078) ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-xijDvx-J1jo.html
Hey Erick - hmmm. How different are our p-values? I would expect a little difference because the algorithm uses a random number generator. Their performance can vary by the seed and sometimes by the operating system, but the differences in outcome shouldn’t be that large
Awesome video, thanks a lot for sharing and for your time. Could you please explain “strata” and when I need to use it? For my datasets, I get different results depending on whether or not I use strata. Secondly, for my significant factor I only got R2 = 0.22, while the R2 of the residual is 0.78. If I understood correctly, my factor explains only 22% of the variation and the residual accounts for 78%. Could you please confirm my understanding? Also, how should I interpret the residual explaining more than my significant factor? Thanks again!
strata is a grouping level that you want to restrict the permutation test within. In permutation designs this would be a blocking factor. Say you have repeated observations from a number of individuals; you wouldn't want to swap samples from individual A with any other individual. Hence you would pass strata a factor indicating which individual each sample came from, and that way when we permute we only shuffle samples within individuals, never between them. It is much better today to use the permutations argument and pass it a control object to indicate how you want the samples permuted. Check out `?permute::how` for details on this, as vegan uses my permute package to do the restricted permutation tests.
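A minimal sketch of the control-object approach described above. The design (4 individuals with 4 samples each), the simulated counts, and all object names are hypothetical and exist only to make the example runnable.

```r
library(vegan)
library(permute)

set.seed(1)

# Hypothetical design: 4 individuals, each sampled twice per treatment
meta <- data.frame(
  individual = factor(rep(paste0("ind", 1:4), each = 4)),
  treatment  = factor(rep(c("control", "control", "treated", "treated"),
                          times = 4))
)

# Simulated counts (16 samples x 10 taxa), for illustration only
counts <- matrix(rpois(16 * 10, lambda = 5), nrow = 16)

# Control object: samples are shuffled only within each individual
perm_design <- how(blocks = meta$individual, nperm = 999)

fit <- adonis2(counts ~ treatment, data = meta, method = "bray",
               permutations = perm_design)
fit
```

Because the blocks restrict which rows can be exchanged, the permutation null distribution differs from the unrestricted one, which is one reason results change when strata is added or removed.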
Hi Sakke - thanks for the question. It appears that adonis really isn't set up to do random effects. However, you might be interested in the answer to this posting for how to approximate a random effect stats.stackexchange.com/questions/350462/can-you-perform-a-permanova-analysis-on-nested-data