Welcome to Biostatsquid! On this channel, you will find clear explanations and hands-on tutorials for analysing and interpreting biological data.
- Learn basic techniques in biostatistics, from heatmaps to pathway enrichment analysis.
- Manage your data and create publication-ready visualisations by following my easy step-by-step tutorials in R and Python.
- Get practical explanations without complex formulas, so you can understand your datasets and interpret your results.
Are you ready to dive in? Don't miss any new videos: hit that subscribe button!
Thank you so much for this video! I have a question regarding the forest plot of the Cox regression: can we add the global p-value (summary) to the forest plot? Is there any way? I would appreciate your help with this!
Hey! Thanks for your comment, I'm glad it was useful :) The global p-value should already be there, at the bottom of the plot. If you'd like it somewhere else, you can easily extract it from the object as a variable (assign it to global_p_val or similar) and then use annotate() as you would to annotate any ggplot object! Hope this helps :)
I am a bit confused by the hazard ratio. It seems like group A is HR times as likely to die as group B. In the smoking example, smoking had a hazard ratio of 7.4. I took non-smokers (coded 0) as group A and smokers (coded 1) as group B. Would this mean that non-smokers were 7.4 times as likely to die as smokers?
Thanks for your question! The HR above 1 for smoking means that the hazard is higher in the smoking group than in the control (non-smoker) group at any given time. Is this what you were asking? As a side note: hazard ratios are a bit different from relative risk. The HR also accounts for the timing of the event (death), whereas the relative risk only checks whether it happened or not. An HR = 1 indicates no change in the hazard (the instantaneous risk of death at a given time, given that you have survived up to that time); if HR > 1 the hazard is increased, and if HR < 1 it is decreased. But this does not translate directly to "7.4 times more likely to die", because it's a ratio, not a probability. To get a probability you can use the equation P = HR/(1 + HR). For example, a hazard ratio of 2 means there's a 67% chance that a smoker dies before a non-smoker, and a hazard ratio of 3 corresponds to a 75% chance of dying first. An HR of 6.7 means there's an 87% chance that a smoker will die before a non-smoker. Does this make sense? This paper is really useful in case you want to read more about it: www.ncbi.nlm.nih.gov/pmc/articles/PMC478551/
@biostatsquid Ahhh, I think I wasn't thinking of it in terms of a group vs. a control, but rather in terms of a first group and a second group, which doesn't make as much sense. Lmao, also, it being called a ratio should have made it obvious to me that it's a ratio and not a probability. I appreciate the clarification, this makes a ton more sense now. Time to finish running this Cox proportional hazards model on my GBM survival data. Fingers crossed this paper gets out by Oct T-T
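As a quick check of the conversion mentioned in the reply above, here is a minimal Python sketch of P = HR/(1 + HR). The function name `hr_to_prob` is hypothetical, and reading P as "the probability that a subject in the exposed group has the event before one in the reference group" assumes the proportional hazards assumption holds.

```python
def hr_to_prob(hr: float) -> float:
    """Convert a hazard ratio into the probability that a subject in the
    exposed group (e.g. smokers) has the event before a subject in the
    reference group (e.g. non-smokers), using P = HR / (1 + HR).
    Assumes proportional hazards."""
    if hr <= 0:
        raise ValueError("A hazard ratio must be positive")
    return hr / (1 + hr)


# Hazard ratios discussed in the thread (HR = 1 added as a sanity check)
for hr in (1, 2, 3, 6.7):
    print(f"HR = {hr:>4}: P = {hr_to_prob(hr):.0%}")
```

Running it prints 50%, 67%, 75% and 87% for hazard ratios of 1, 2, 3 and 6.7, matching the percentages in the reply; an HR of 1 gives a coin flip, as expected when there is no difference in hazard.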
The differential expression data that you loaded at the start of the R script, which has approximately 30,000 genes and four variables: is it pre-processed (e.g. duplicates removed, p-values and log FC adjusted)? Or is it the raw tT table saved from the R script?
Hi. This is a nice video. I am new to data visualisation and I find it very hard to memorise the code or to understand how to use it with different datasets. Could you please share some tips on how you do that?
Hi, thanks so much for your comment! My recommendation is... don't memorise code! You'll end up remembering the most common functions and bits and pieces anyway if you use them a lot - but a lot of bioinformatics is just googling:) As for what to use in which case and with which data... honestly, it comes with practice. Seeing and reading what other people do with similar problems / datasets definitely helps, e.g., from publications, tools, github repos... if you encounter a problem, odds are someone already did too! And probably solved it:) Good luck, you'll see how it gets easier the more you do it! Just have fun with it:)
Your videos are great and very easy to follow. For the background genes, how do you download GSEA GMT files containing only the genes expressed in the specific tissue you are interested in?
Thanks so much for your feedback! Hmm, as far as I know, you cannot do that. But you can download the full .gmt file and then filter it to keep only the genes you detected in your tissue.
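The filtering step suggested in the reply can be sketched in Python with only the standard library. This assumes the standard tab-separated .gmt layout (gene set name, description, then one gene per column) and that your list of detected genes uses the same identifiers as the .gmt file (e.g. gene symbols, not Entrez IDs). The function names, the `min_size` cut-off and the toy gene sets are all hypothetical.

```python
from typing import Iterable, List, Set


def filter_gmt_lines(lines: Iterable[str], background: Set[str],
                     min_size: int = 1) -> List[str]:
    """Keep only background (detected) genes in each gene set.

    Each .gmt line is tab-separated: set name, description, then genes.
    Gene sets left with fewer than `min_size` genes are dropped.
    """
    kept = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, description, *genes = fields
        genes = [g for g in genes if g in background]
        if len(genes) >= min_size:
            kept.append("\t".join([name, description, *genes]))
    return kept


def filter_gmt_file(in_path: str, out_path: str, background: Set[str],
                    min_size: int = 1) -> int:
    """Write a filtered copy of a .gmt file; return the number of sets kept."""
    with open(in_path) as handle:
        kept = filter_gmt_lines(handle, background, min_size)
    with open(out_path, "w") as out:
        out.writelines(line + "\n" for line in kept)
    return len(kept)


# Toy example with made-up gene sets (not real MSigDB content)
example = [
    "SET_A\tna\tTP53\tBRCA1\tEGFR\n",
    "SET_B\tna\tALB\tAPOA1\n",
]
detected = {"TP53", "EGFR", "MYC"}
print(filter_gmt_lines(example, detected))
```

Dropping gene sets that end up empty (or nearly empty) is a design choice; many enrichment tools also apply their own minimum gene set size, so you may not need an aggressive `min_size` here.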
Thank you for a very nice video. I have trouble understanding the fold change for gene 1 in the table example. Wouldn't the fold change (FC) be 3 (9 divided by 3) and log2(FC) 1.585?
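The commenter's arithmetic can be checked directly. This sketch assumes 9 is the value in the condition of interest and 3 the value in the reference condition (as in the question), and uses the usual definition log2FC = log2(condition / reference); the function name is hypothetical.

```python
import math


def log2_fold_change(condition_value: float, reference_value: float) -> float:
    """log2 of the ratio condition / reference."""
    return math.log2(condition_value / reference_value)


fc = 9 / 3  # fold change for gene 1, as computed in the question
print(fc, round(log2_fold_change(9, 3), 3))
```

It prints `3.0 1.585`. Swapping the two groups only flips the sign (log2(3/9) ≈ -1.585), which is one reason log2 fold changes are used on the x-axis of volcano plots: up- and down-regulation of the same magnitude sit symmetrically around 0.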
This was such an informative video! It helped explain so much, as I had never been exposed to volcano plots before. I will definitely be tuning in for more videos! Thank you.
I'm currently watching without logging into my Google account. 😊 However, halfway through, I decided to log in, hit the like button, and subscribe to your channel. 🎉 Thank you for your valuable content; it's truly helpful, and I encourage you to keep up the great work! 👍
Hi, thank you so much for explaining PCA in such a clear way. I've been really stressed about understanding it for my uni stats exam, but now I feel much more confident :)
Thank you a lot! I'm struggling with my data. Is there any option to cluster within groups on the same heatmap? I have many groups of species I want to analyse, but I only want the clustering to happen within each group.
I have zero experience and failed so many times following other YouTubers, but your script works and I can easily catch up, even with different methods. Thank you soooooooo much.
Many thanks for this video. It was extremely helpful! Just a quick question, do you have a link to any papers that use the same method for ranking genes? I've gone for the same approach, but will need to defend it in my viva and I am struggling to find publications using this method. Secondly, I just want to confirm that you use regular p-values rather than adjusted p-values for the ranking calculation?
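For reference, one widely used ranking metric for preranked GSEA is sign(log2FC) × -log10(p-value). The thread does not state which formula the video uses, so treat this Python sketch as an assumption rather than the video's method. It ranks with whichever p-values you pass in (raw or adjusted); the function name and toy genes are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def rank_genes(results: Dict[str, Tuple[float, float]]) -> List[Tuple[str, float]]:
    """Rank genes by sign(log2FC) * -log10(p).

    `results` maps gene -> (log2 fold change, p-value). Strongly
    up-regulated genes with small p-values end up at the top, strongly
    down-regulated genes at the bottom.
    """
    scores = []
    for gene, (log2fc, pval) in results.items():
        pval = max(pval, 1e-300)  # guard against log10(0)
        sign = math.copysign(1.0, log2fc) if log2fc != 0 else 0.0
        scores.append((gene, sign * -math.log10(pval)))
    # Highest score first, the usual order for a preranked gene list
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Made-up example values, for illustration only
toy = {"GENE1": (1.585, 0.001), "GENE2": (-2.0, 0.01), "GENE3": (0.3, 0.5)}
for gene, score in rank_genes(toy):
    print(gene, round(score, 2))
```

Note that BH-adjusted p-values often contain ties (several genes sharing the same adjusted value), which reduces the resolution of the ranking; that is worth keeping in mind when choosing between raw and adjusted p-values for this calculation.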