DESeq2 Basics Explained | Differential Gene Expression Analysis | Bioinformatics 101

Bioinformagician

30K subscribers

81K views

Published: 20 Aug 2024

Comments: 104

@wesleyeliasbheringbarrios8108 2 years ago

Based on the videos I've seen from your channel, everything is really great! Everything we bioinformatics beginners need most: theory explained without complication, in a way that is easy to understand, and guiding us step by step through the process with a list of videos in logical order. I just have to thank you for all your commitment and 100% accurate work on these videos, PLEASE continue! I will credit you in several of my presentations, thank you very much!

@Bioinformagician 2 years ago

I really appreciate your kind words, really encourages me to keep doing this :) Thank you very much!

@coolalexpcs 5 months ago

You explain this in a very clear and logical way! Appreciate it

@amus21455 1 year ago

It is superrrrr helpful!!!!!!! This is the best video about DESeq for someone with zero background like me!

@BarcodeIIIlIIllIlll 1 year ago

I am writing a thesis that is partially reliant on bioinformatics and have no experience with deseq2. This video was immensely helpful in getting me up to speed in general understanding. Thank you very much!

@kitdordkhar4964 2 years ago

You are a great teacher! I enjoy watching the detailed theory and analysis. Your tutorial is very helpful. Cheers!

@grace-426 3 months ago

I am so thankful for your tutorials.. can you please make one video on like, how to manage so many genes and how to come to some conclusion after getting so many genes

@andydavidson3097 6 months ago

Very well done! I watched lots of videos on DESeq2; nobody else explains the underlying math!

@IndigoIndustrial 1 year ago

Very impressive. More scientists should be engaging the way you do.

@abhisheksawalkar1018 1 year ago

Simply excellent. Everything was explained using lucid examples. Very good for beginners.

@user-up7ms2cs7m 3 months ago

Thank you! This was helpful! My study design was complex, as I was looking at 4 different conditions, with one reference level.

@kmrsongh 1 year ago

Really very helpful video tutorial. I appreciate the effort you made in explaining the DESeq2 background statistics. You explain them perfectly and in a very simple manner. very helpful for us. Thanks a lot and keep sharing such informative videos.

@alicekao6305 1 year ago

Thank you so much! This is very clear. I like the series of your video talking about the logic behind each bioinformatic package. I think it's extremely important for me with biology background to know the basics of each package and identify the best tool to use when I get my data.

@chrisspeed8432 1 year ago

This was incredibly helpful. I plan to watch it again and take detailed notes along the way. Thank You!

@QAKS1264 2 years ago

Very helpful, clear and accurate explanation. Thank you.

@humphreygardner6982 3 months ago

Really superb! Thank you!

@priyankabiotech87 2 years ago

U made it so simplified..loved ur explanation..thank you

@amirhosseinshafieian3951 2 years ago

Really love that, after watching lots of videos on YouTube, finally I understood what's going on by ur video, I only could not understand the MLE part, if it is feasible for u please make a video to elaborate it in more detail. Thanks a lot

@Bioinformagician 2 years ago

I will think about making a separate video explaining MLE. Thanks :)

@chibrina 11 months ago

this is amazingly helpful as a beginner, thank you

@preeti97rox 1 year ago

Thank you for being so helpful to everyone!

@VenuraHerathPhotography 2 years ago

Keep up the good work! Would love to see a tutorial on edgeR time-series differential analysis.

@Bioinformagician 2 years ago

Will plan a video covering this. Thanks for the suggestion :)

@MahdiAbdul-Jabbar 3 months ago

Awesome video!

@joseoviedo4529 1 year ago

hello, I truly appreciate your videos and explanations. They are very clear and concise. I do have a request though for a future video. Could you do a how-to on gene set analysis using a GO class annotation and how to filter the desired genes from the completed DE analysis data frame. Thank you for all you do, Keep it up!

@anvieb1293 1 year ago

This is such a valuable and informative video, thanks so much!

@KTROWS 5 months ago

Amazing job explaining.

@riaztabassum8395 1 year ago

very detailed and simplest explanation. 👌

@nikitamaurya4518 1 month ago

I am confused between the normalization method explained in this video and the normalization method explained in another video [Difference between RPKM/FPKM and TPM | RNA-Seq Normalization Methods | Bioinformatics 101]. Which normalization is correct?

@muhammadhafizsulaiman7163 2 years ago

Very smart person. Great explanation

@kevinradja 2 years ago

Really love your video and is inspiring me to also try making my own videos and test my knowledge. At 14:30 there's an error when you are estimating the size factor. The geometric mean is calculated by the mean of the natural log of the counts (ln because that is what DESeq2 uses). Taking the log turns the Pi symbol in the paper into a sigma of logs. Might also be good to mention that it isn't square root if you have more than two conditions. If I'm wrong though, someone please let me know!

@Bioinformagician 2 years ago

Thank you for pointing out that error. You are right, DESeq2 uses natural logs, and in general it would be the 1/n-th power of the product of the terms. I should have mentioned it. However, the values barely differ with the method chosen. Just for the explanation, I chose the multiplying method because it has fewer steps, which makes it easier to understand and gets the point across :)

Geometric mean with log method: log(2) + log(10) = 2.99; 2.99/2 = 1.495; 2.718281828459^1.495, or exp(1.495) (taking the antilog), = 4.459337

Geometric mean with multiply method: sqrt(2*10) = 4.472136
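The two routes to the geometric mean compared in this reply can be checked with a few lines of Python. This is only a minimal sketch, not DESeq2 code, and the function names are made up for illustration. Carried at full precision, the product method and the log method return the same value, which suggests the 4.459 vs 4.472 gap comes from rounding the intermediate log sum to 2.99.

```python
import math

def geometric_mean_product(counts):
    """n-th root of the product of the counts (the 'multiply' method)."""
    return math.prod(counts) ** (1.0 / len(counts))

def geometric_mean_logs(counts):
    """exp of the mean of natural logs (undefined if any count is 0)."""
    return math.exp(sum(math.log(c) for c in counts) / len(counts))

counts = [2, 10]  # gene A, untreated and treated, as quoted in the thread
gm_product = geometric_mean_product(counts)
gm_logs = geometric_mean_logs(counts)
print(round(gm_product, 6), round(gm_logs, 6))
```

With more than two samples the same functions apply unchanged, since the root and the divisor both follow `len(counts)`.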

@kevinradja 2 years ago

That's a great point and shows why we take the log! With large outliers the averages of logs are less affected than regular averages but doesn't change when the values are close. Also do you plan on making a video on the dispersion in DESeq2 in more detail? There's so much more in the paper I didn't understand at all.

@Bioinformagician 2 years ago

@kevinradja I will surely think about making a video on dispersion in more detail :)

@sarahnawaz6925 4 months ago

Amazing💯

@pooriasalehi5402 1 year ago

really really thanks ma'am, it's amazing, I owe you.

@emojiman745 2 years ago

I may have missed it, but what do we do with the replicates? You mentioned the replicates in the study design segment (00:38), but the calculations you display are about one group. Should we take the mean of the samples and make them into one column? One column for the treated (mean of b1, b2 and b3 for t1 and b1, b2 and b3 for t2) and one column for the untreated (mean of B1, B2 and B3 for T1 and B1, B2 and B3 for T2)?

@Bioinformagician 2 years ago

Apologies if I wasn't clear in my video, there are ways to handle technical replicates. Check this section out from DESeq2 vignette: bioconductor.org/packages/devel/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#collapsing-technical-replicates With regards to biological replicates, you should NOT collapse biological replicates.
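The vignette section linked in this reply describes collapsing technical replicates by summing their counts (in DESeq2 this is done with `collapseReplicates`). As a rough illustration of the idea only, here is a Python sketch; the function name, the run and sample labels, and the counts are all hypothetical.

```python
from collections import defaultdict

def collapse_technical_replicates(counts, run_to_sample):
    """Sum per-gene counts over sequencing runs that belong to the same sample."""
    collapsed = defaultdict(lambda: defaultdict(int))
    for gene, per_run in counts.items():
        for run, n in per_run.items():
            collapsed[gene][run_to_sample[run]] += n
    return {gene: dict(per_sample) for gene, per_sample in collapsed.items()}

# Hypothetical data: sample "b1" was sequenced in two runs (technical replicates).
counts = {"geneA": {"run1": 3, "run2": 5, "run3": 10}}
run_to_sample = {"run1": "b1", "run2": "b1", "run3": "b2"}
collapsed = collapse_technical_replicates(counts, run_to_sample)
print(collapsed)
```

Biological replicates stay as separate columns, in line with the reply: only runs of the same library are summed.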

@aldaszarnauskas27 1 year ago

Great video and explanation!!!

@clutch3171 5 months ago

this is secretly genius

@islamalmsarrhad2152 9 months ago

That was epic.. Many thanks

@kobrarahimi9164 2 years ago

it was great 100 out of 100.

@LongboardTrickfreak 2 years ago

I might be mistaken, but are you sure the values for calculating the median in step 3 (est. size factors) are correct? When I calculate them with R I get 0.45, for instance, for the untreated normalization factor. Shouldn't the median be one of the values? Apart from that: great video, helped me a lot!

@Bioinformagician 2 years ago

Thanks for reaching out! I am sure, the median of values 0, 0.45, 0.55, 0.58 is 0.5. I calculated it using R as well.

@user-sf1ys2wl4k 1 year ago

Hey, the calculations in the video are correct. But maybe you were confused because those are medians, not means. In the case of 4 values, you have to take two values in the middle and then the average of them;) So we take 0.45 and 0.55 and get 0.50.
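The even-length median rule described in this reply is easy to confirm. A minimal sketch using Python's standard library, with the four ratios quoted in the thread (the variable names are illustrative):

```python
import statistics

# Ratios quoted in the thread for the untreated sample.
ratios = [0, 0.45, 0.55, 0.58]

# With an even number of values, the median is the average of the two middle ones.
median_ratio = statistics.median(ratios)
middle_average = (0.45 + 0.55) / 2
print(round(median_ratio, 2), round(middle_average, 2))
```

So the median does not have to be one of the input values whenever the number of values is even.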

@devinjones7271 1 year ago

This is SO helpful! Thank you!!!

@abdourahamandjibotassiou4367 11 months ago

very nice

@a.k.nikson3987 1 month ago

Awesome !

@DimitriTrubetskoy 8 days ago

Thanks a lot! Though can you explain how normalised values are calculated? Say for gene A that is 2 and 10 (for untreated and treated) how are these 4.016 and 5.62 received? Thanks again

@reakal7740 2 years ago

Great video! Congrats!

@harshasatuluri4540 2 years ago

Very clear!

@patticat 4 months ago

Is this the video where design factor was explained? I'm coming from another of your videos where you say "if you don't know design factor, look at my previous video" but you never said which one. I think this one was a good candidate, however, I am still very confused as to how to use the design factor.. that was x= 0 or x =1? or what was that when you added two conditions? I'm super lost with the last 2 seconds of explanation there.. if you have another video explaining this, which one is it? Thanks! Everything else is on point!

@you-mingliu3261 2 years ago

Great video, but I'm still confused about the dispersion α. For one gene, is α estimated separately in the control group and the treatment group (so there are 2 α values for one gene)? Or is there only one α for each gene, which means the mean and the variance are calculated across the control and treatment groups?

@Bioinformagician 2 years ago

As far as my understanding goes, it is the latter. The mean and variance are calculated across all groups, so there is only one α for each gene.

@aditimehta4886 2 years ago

Hey Khushbu, really nice explanation.😊

@adrianozaghi9209 1 year ago

Thank you so much, the paper about this algorithm is complex asf

@bobyang8491 2 years ago

very helpful!! Thanks for teaching!

@1993dana15 2 years ago

crispy clear

@jamesrauschendorfer9396 2 years ago

This is super helpful!

@farihachaudhary577 1 year ago

Hi there, I just wanted to ask whether we can use DESeq analysis for unpaired data. I have 11 normal (control) samples and about 160 tumor samples. Or should we go with paired data?

@PriyaDas-zw5hn 1 year ago

Hi Dr. Khushbu, Thankyou for the very informative videos. Learning a lot from these. I had a query, if we have a time series of treated and untreated samples, should the pairs of treated and untreated at each time point be considered separately for estimating size factors?

@ayaqz3144 1 month ago

thank you

@CarlMedriano 1 year ago

Thanks for this info. I am just a bit lost, especially when I try to calculate using gene D, which resulted in a GM of 0 and reference values of 0. Wouldn't the following steps result in 0 (assuming that values divided by 0 are just set to 0)?

@AA-gl1dr 2 years ago

excellent video.

@amrsalaheldinabdallahhammo663 2 years ago

Thanks for that video, You are genius :)

@tushardhyani3931 2 years ago

Thank you for this video !!

@angelamoreira5023 2 years ago

Excellent!!!

@leia2636 2 years ago

wow that was magical

@georgyjogen2859 1 year ago

Hi, I really like your video. Thank you for the channel once again. It's a blessing. I have a small doubt. At 11:27 you said that since gene D is not expressed in the treated condition, the total of 42 from untreated needs to be divided among the 3 expressed genes, causing them to be inflated. How is that? Could you please explain? Thanks in advance

@wansabaiinjapan1586 8 months ago

Very excellent explanation. Thank you! I am too new to the field. I have questions regarding which values we should use to make a heatmap, Venn diagram, etc. At 15:49, once we get the median of ratios and normalize our samples with this value, we obtain norm_values for each gene in each sample. Before I use these values to plot a heatmap, do I need to transform them to log2 again? Or do I need to convert them to z-scores? If yes, how do I get the z-score for each gene in each sample? Sorry for asking so many questions. Thanks in advance!

@adaobiokafor9546 6 months ago

for visualizations, you need to scale (ie. calculate z scores). Just use the scale() function in R.
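As a rough picture of what per-gene z-scoring does, here is a small Python sketch; it is not the R `scale()` function itself, the helper name is invented, and the expression values are hypothetical. Two caveats worth keeping in mind: R's `scale()` works column-wise, so a gene-by-sample matrix is usually transposed before and after, and many workflows z-score log-transformed or variance-stabilized values rather than raw normalized counts.

```python
import statistics

def row_zscores(values):
    """Center one gene's values on their mean and divide by the sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # n - 1 denominator, matching R's sd()
    # A gene with identical values in every sample has sd == 0 and
    # would need to be filtered out before this step.
    return [(v - mean) / sd for v in values]

# Hypothetical (already log-transformed) expression for one gene in four samples.
gene_values = [2.0, 4.0, 6.0, 8.0]
z = row_zscores(gene_values)
print([round(v, 3) for v in z])
```

After this step every gene has mean 0 and standard deviation 1 across samples, which is what makes heatmap rows comparable.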

@georgeanthonywalters-marra9628 2 years ago

Hello, this was an awesome and very informative video! I've been trying to learn more about CRISPR screen analysis (specifically MAGeCK). Are you familiar at all with analysis of CRISPR screens and would you say that the concepts in this video would be transferable? Thank you so much!

@Bioinformagician 2 years ago

Unfortunately, I have not worked with CRISPR screen data before, so I am unable to answer whether these concepts are transferable.

@georgeanthonywalters-marra9628 2 years ago

@Bioinformagician No problem!

@NguyenThiPhuongLan-in5cd 1 year ago

Hi may I ask if we have n=3 biological replicates/2 groups how can we put in 2 groups? Just calculate mean of read counts for each genes in each group?

@khalildabrat4593 2 years ago

Very helpful!

@jatinderchera1613 1 year ago

Hello mam. Your video is very helpful, especially for beginners like me. I have some queries and I would be very grateful if you can help me out. We got RNA-seq done by a company and they have provided us with analyzed data. My queries are:

1. They have provided a PCA plot and mentioned the following: "DESeq2 generates PCA plot based on a matrix of normalized read counts,the result typically depends only on the few most strongly expressed transcripts because of showing largest absolute differences between control and treated samples." The plot they provided showed very high variance among the biological replicates of one treatment group (due to lower read counts in some samples). Is there any way to get around this by considering some other features (apart from read counts) to compute variances?

2. They have also provided RPKM values of various genes that are unique to specific treatment groups. I observed that some of the genes had zero reads in some of the replicates of the same treatment group. Can we consider these genes for our analyses?

3. I also observed completely identical RPKM values for many genes in the list (identical even up to 9 decimal places). What could be the reason for this, and can we proceed with the analyses of such genes?

Any help from your side would be highly appreciated. 😊

@Bioinformagician 1 year ago

1. Do you happen to know how low the read counts are among the biological replicates of that one treatment group? You could perhaps take a look at pre-alignment and post-alignment QC, especially the total number of reads and the total number of uniquely mapped reads for each sample. Another way to identify noisy/problematic samples is to use a distance matrix to get similarities or dissimilarities across samples.

2. You could get total counts for genes across all samples and see if these genes with 0 reads have consistently low read counts across other samples as well. We would ideally want to remove genes with fewer than 10 total read counts across all samples. You could be more stringent and set a higher number.

3. This seems suspicious. I would recommend generating the RPKM/TPM values yourself.
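The low-count filter suggested in point 2 can be sketched in a few lines. This is an illustrative Python version under stated assumptions: the gene names and counts are hypothetical, and `min_total=10` simply mirrors the threshold mentioned in the reply.

```python
def filter_low_count_genes(counts, min_total=10):
    """Keep genes whose summed count across all samples is at least min_total."""
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}

# Hypothetical raw counts: gene -> counts per sample.
counts = {
    "geneA": [2, 10, 5, 7],   # total 24, kept
    "geneB": [0, 1, 0, 2],    # total 3, dropped
    "geneC": [0, 0, 30, 0],   # total 30, kept despite zeros in three samples
}
kept = filter_low_count_genes(counts)
print(sorted(kept))
```

Raising `min_total` gives the more stringent filter the reply alludes to.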

@jatinderchera1613 1 year ago

Thank you very much for your response mam. I am very new to such data types. I am learning everything from scratch so I will try my best to carry out whatever you suggested.

@ghadeeralkurdi174 2 years ago

Could i ask you what are the range of x and y axis you used in mean vs variance plot at 6:27 min

@juanete69 2 months ago

Before getting the counts... do we need to align our reads?

@saranyasweet 2 years ago

Mam, could you please put up videos on how to do DGE for raw 16S rDNA paired-end data in FASTQ format?

@rays_of_hopes 2 years ago

Thank you so much mam

@alexyang274 2 years ago

question regarding the coefficients for the fitting the linear model - from my understanding, based on this explanation, the linear model can accommodate theoretically infinite number of coefficients. in the vignette for deseq2, michael love mentions that while deseq2 can do this, it is perhaps easier to concatenate multiple factors into a single variable and have deseq2 perform its linear modeling this way. can you explain why this is the case? and how this can extend from a 2-factor design to a n-number design and so forth?

@Bioinformagician 2 years ago

Can you point me to the section in the vignette where Michael Love talks about concatenating multiple factors into a single variable?

@alexyang274 2 years ago

@Bioinformagician In the vignette, it is under the subheading "Interactions". Copied and pasted from the vignette, Love writes:

Initial note: Many users begin to add interaction terms to the design formula, when in fact a much simpler approach would give all the results tables that are desired. We will explain this approach first, because it is much simpler to perform. If the comparisons of interest are, for example, the effect of a condition for different sets of samples, a simpler approach than adding interaction terms explicitly to the design formula is to perform the following steps:

1. combine the factors of interest into a single factor with all combinations of the original factors
2. change the design to include just this factor, e.g. ~ group

Using this design is similar to adding an interaction term, in that it models multiple condition effects which can be easily extracted with results.
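The first step quoted from the vignette, combining several factors into one, amounts to pasting the factor levels together for each sample. A minimal Python sketch of that idea follows; the sample annotations, factor names, and the `add_group` helper are all hypothetical, and in DESeq2 itself this would be done on the sample table in R before setting the design to `~ group`.

```python
# Hypothetical sample annotations; factor names are illustrative only.
samples = [
    {"id": "s1", "genotype": "WT", "condition": "ctrl"},
    {"id": "s2", "genotype": "WT", "condition": "trt"},
    {"id": "s3", "genotype": "KO", "condition": "ctrl"},
    {"id": "s4", "genotype": "KO", "condition": "trt"},
]

def add_group(sample, factors=("genotype", "condition")):
    """Return a copy of the sample with a combined 'group' label."""
    return {**sample, "group": "_".join(sample[f] for f in factors)}

grouped = [add_group(s) for s in samples]
groups = sorted({s["group"] for s in grouped})
print(groups)
```

Each distinct label becomes one level of the single `group` factor, so a design with n factors extends the same way by listing more names in `factors`.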

@Bioinformagician 2 years ago

Thank you for pointing me to this. I want to bring in a little context here; without it, this can be misleading. I have tried to explain it here: khushbupatel.notion.site/Interaction-terms-DESeq2-5a4a75b83adc4fe89576e6ee9b00daf0 Hope this clears your confusion and answers your question. Thanks! :)

@poojasavla6240 1 year ago

bro i love you

@relaxstation600 1 year ago

13:40 step1

@benjaminbergey5512 2 years ago

One of the clearest explanations of the DESeq pipeline - thank you. Question about using the GLM: Do you know where I might find an example of a calculation for a given gene? I had a bit of difficulty following through the calculations, and I think a concrete example (just with arbitrary data) might help me grasp it better.

@Bioinformagician 2 years ago

I am glad you found this video helpful! Check out this paper: www.ncbi.nlm.nih.gov/pmc/articles/PMC7873980/ It does a fantastic job explaining single and multi factor linear models with calculations.

@user-zc9jl2to3h 1 year ago

At 22:53, why do you say that "y - B0 = log(y) - log(B0)"? Isn't that incorrect?

@shetalkzz8842 4 months ago

Can I perform DESeq2 in Galaxy for finding differentially expressed miRNAs?

@donklike09 1 year ago

Awesome! but how is 2/0.5 = 4.016...? isn't it just 4? (16:14) and same with the other numbers from the untreated.

@Bioinformagician 1 year ago

You’re right. The discrepancy is due to rounding off. If you don’t round the numbers, you would get 4.016 instead of 4
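The rounding effect described in this reply can be reproduced with simple arithmetic. In the sketch below, the count of 2 for gene A and the displayed size factor of 0.5 come from the thread; the unrounded size factor of 0.498 is an assumption chosen for illustration (it is roughly what 2 / 4.016 implies) and is not a value taken from the video.

```python
count = 2  # gene A, untreated, as quoted in the thread

rounded_size_factor = 0.5      # as displayed in the video
unrounded_size_factor = 0.498  # assumed for illustration, not from the video

normalized_rounded = count / rounded_size_factor
normalized_unrounded = count / unrounded_size_factor
print(normalized_rounded, round(normalized_unrounded, 3))
```

A difference in the third decimal place of the size factor is enough to move the normalized count from 4 to about 4.016.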

@justsoil15 1 year ago

I use docker and command line to run deseq2. How to save plots to png files?

@snekhai 2 years ago

When you normalize counts and have 0/0 (your gene D), why do you assign 0?

@Bioinformagician 2 years ago

In step 1, to calculate the geometric mean, we take the square root of the product of the counts in all samples. For gene D, the product is 30 x 0 = 0. The square root of 0 is 0. Hence 0.

@pgresner 2 years ago

yes, but then, in Step 2, you divide 30/0 (which is infinity) and even 0/0 (which is undefined) - so why you get 0's for untreated/ref and treated/ref? is this some kind of a convention or just a mistake?

@Bioinformagician 2 years ago

@pgresner It's a mistake. They should be Inf instead of 0s. I didn't mention a very important point: non-finite values (i.e., Inf, -Inf and NaN) are filtered out and not used to calculate the median. Thank you for pointing it out, I shall put a note about this in the description.
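The filtering described in this reply can be sketched as follows. This is a hedged illustration in Python rather than DESeq2's own R code, and `size_factors` is a made-up name. Genes A and D use the counts quoted in the thread; genes B and C carry hypothetical counts, chosen so that the untreated column sums to 42 as mentioned elsewhere in the comments. The sketch takes the median on the log scale and exponentiates, which is an assumption about the details and can differ slightly from a plain median of ratios when the number of genes is even.

```python
import math
import statistics

def size_factors(counts):
    """Median-of-ratios size factors; counts maps gene -> per-sample counts."""
    n_samples = len(next(iter(counts.values())))
    # Log geometric mean per gene. Genes with a zero in any sample are skipped:
    # their reference would be 0 and the ratios non-finite (Inf or NaN).
    log_refs = {
        gene: sum(math.log(c) for c in row) / n_samples
        for gene, row in counts.items()
        if all(c > 0 for c in row)
    }
    factors = []
    for j in range(n_samples):
        log_ratios = [math.log(counts[g][j]) - ref for g, ref in log_refs.items()]
        factors.append(math.exp(statistics.median(log_ratios)))
    return factors

# Columns are [untreated, treated]. B and C are hypothetical counts.
counts = {
    "A": [2, 10],
    "B": [4, 16],
    "C": [6, 12],
    "D": [30, 0],  # dropped from the size factor calculation
}
untreated_sf, treated_sf = size_factors(counts)
print(round(untreated_sf, 3), round(treated_sf, 3))
```

Gene D still gets normalized counts afterwards (30 divided by the untreated size factor, and 0 for treated); it is only excluded from estimating the size factors.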