RNAseq analysis | Gene ontology (GO) in R

Sanbomics

Published: 20 Aug 2024

Comments: 126

@natureabioros8686 4 months ago

Best GO video out there lol. 5 minutes is record time.

@sanbomics 4 months ago

I don't mess around xD

@user-td7sh6db6c 1 month ago

Great, accessible video. Making my beginner to RNAseq path easier

@capita007 2 years ago

Thank you for this straight to the point video! Besides explaining in an accessible way, it is very direct and informative!

@sanbomics 2 years ago

No problem! Thank you for letting me know!

@user-nw3zn8nw3t 7 months ago

Thank you for your lesson. Honestly, I spent 3 days to get the result from gene list as a beginner. I did not know that the gene list (value) from my data frame should be converted into a vector form. Then the enrichGO recognized my gene list.

@sanbomics 6 months ago

Glad it helped!
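
The point about passing a vector rather than a data frame can be sketched as follows. This is a minimal illustration, assuming a hypothetical table called `sigs` with Ensembl IDs as row names; the object name, column names, and IDs are placeholders, not taken from the video's script.

```r
# Hypothetical significant-gene table; in practice this would come from DESeq2.
sigs <- data.frame(
  log2FoldChange = c(1.8, -2.1, 0.9),
  padj = c(0.001, 0.02, 0.04),
  row.names = c("ENSG00000141510", "ENSG00000012048", "ENSG00000139618")
)

# Selecting a column with single brackets still returns a data frame.
as_df <- sigs["log2FoldChange"]
is.data.frame(as_df)

# The gene argument of enrichGO() is documented as a vector of gene IDs,
# so pull the IDs out as a plain character vector.
genes_to_test <- rownames(sigs)
is.character(genes_to_test)
```

If the IDs live in a column instead of the row names, `sigs$gene_id` (a hypothetical column name) or `sigs[["gene_id"]]` would return a vector in the same way.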

@rahulramekar1373 1 year ago

Thank you for the video; the part with gene symbols and mouse data really helped me, cheers for your good work

@sanbomics 1 year ago

Glad it helped!

@philipkirianki1033 9 months ago

How can one convert bacterial locus tags into Ensembl IDs?

@ahmedal-mammari9639 2 years ago

plz we need more like this sample video

@sanbomics 2 years ago

Hope to keep releasing at least one a week!

@user-cj1sh8qu5h 2 months ago

I love this, thank you!

@anmolpardeshi3138 5 months ago

Great video! this is helpful. in the comment at 3:45 you meant 366 of 402 ? (instead of 403) because when you printed the to be tested genes there were 402 out of the initial 1192 that crossed the significance threshold

@ganeshmuthugangadhar 2 years ago

Really great video and thanks for sharing one :)

@sanbomics 2 years ago

Thanks! No problem!

@sakibsarkerii514 27 days ago

Thank you so much. Could you please make a video about how to perform KEGG pathway analysis using clusterProfiler?

@erinbiggar3344 1 year ago

Hey! Thank you for the video. Question--> is it possible to remove the grid lines from the plot?

@ahmedal-mammari9639 2 years ago

thank you so much

@sanbomics 2 years ago

You're welcome!

@hozhoz2 2 years ago

Thanks dude you're legit

@yijingwang7308 2 years ago

Hi, thank you for your video. But I have a question about the genes to test: normally the FC cutoff is 2, which means |log2FC| >= 1, right? Besides, not only should |log2FC| be >= 1, but the padj should also be less than 0.05, right?

@sanbomics 2 years ago

The cutoffs you choose are always arbitrary. But abs(log2FC) >= 1 AND padj < 0.05 is pretty typical. Without looking at what I did again, I most likely filtered based on both log2FC and padj, maybe at different lines of code if you didn't see both.

@yijingwang7308 2 years ago

@sanbomics Thank you so much for your reply!
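
The combined filter discussed in this exchange might look like the sketch below. It assumes a DESeq2-style results table with `log2FoldChange` and `padj` columns; the object names `res`, `sigs`, `up`, and `down` and the toy values are illustrative, and the cutoffs are the "typical" ones mentioned above rather than a recommendation.

```r
# Hypothetical DESeq2-style results; column names follow DESeq2's output.
res <- data.frame(
  log2FoldChange = c(1.5, -2.0, 3.0, 0.2, NA),
  padj = c(0.01, 0.001, 0.03, 0.5, NA),
  row.names = paste0("gene", 1:5)
)

# Drop rows with missing statistics before filtering.
res <- res[!is.na(res$padj) & !is.na(res$log2FoldChange), ]

# Keep genes passing both cutoffs, in either direction.
sigs <- res[res$padj < 0.05 & abs(res$log2FoldChange) >= 1, ]

# Split by direction if up- and down-regulated genes are tested separately.
up <- rownames(sigs[sigs$log2FoldChange > 0, ])
down <- rownames(sigs[sigs$log2FoldChange < 0, ])
```

Using `abs()` keeps both directions in one table; splitting afterwards makes it easy to run the enrichment on each direction on its own.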

@fgeuna 1 year ago

Thanks a lot for the inspiring video! What about the GO annotation of a plant genome's gene list (i.e. Triticum turgidum spp. durum) as for the "org" database? Is there a way to select a species-specific resource?

@sanbomics 1 year ago

I've never worked with plants before. This method might not work. You can try checking out something like DAVID to see if they have resources for your species.

@olgastepanova9336 1 year ago

Very easy to follow, thank you! I was previously using ShinyGO and there was an option to supply a background list, is it possible to do it with enrichGO?

@sanbomics 1 year ago

Yup! Check out the docs for the command. I forget the argument off the top of my head but there is one

@atlma2 1 year ago

Hi, is there a location for the database/file that you are using for this script? I'm trying to follow along, but as a beginner it is difficult if I cannot view the file.

@sanbomics 1 year ago

If you want to start from the beginning I have a complete walkthrough through the process of RNAseq leading up to this point. Check out my RNAseq section

@user-xq6cr5ul8p 9 months ago

Thank you very much for this! I am just wondering how did you determine the cutoff for baseMean? In this case it's 50, is this a commonly used threshold value?

@sanbomics 9 months ago

Good question. There isn't a definitive answer. A lot of people use arbitrary thresholds, but you can also base it on the variability of your dataset at lower values or use a filter based on the distribution of gene abundances.

@Canadianwithacat 4 months ago

thanks for the video! have you had any luck using fit

@1smorenoc 3 months ago

use ggplot2 # p is plot p

@GabrielleWidjaja-te5pm 1 year ago

Hi Sanbomics, I did two GO plots and my PI wants the two p adj keys to be the same scale/range. I have been searching how to achieve this, but no beans. Do you have any suggestions? Thank you for the informative video!

@sanbomics 1 year ago

You could convert them to -log10 values which may make it easier to put on the same scale if they are far apart. Alternatively (harder) you can make your own color mapper that spans the whole range of values and color the bars based on that and make a legend bar. There might be an easier way I don't know. (I am much better at plotting in python than R).

@GabrielleWidjaja-te5pm 1 year ago

@sanbomics I appreciate your swift response! That is a smart solution, thank you!

@lst595991 2 years ago

Great video! I would like to know how to do such an analysis in a nonmodel organism

@sanbomics 2 years ago

That is a great question. Basically all you need is a background list of genes, target lists of genes, (e.g., genes that belong to a GO term) and your enriched set of genes. You can do a hypergeometric enrichment analysis. I have a video that goes over this. But, you still need a list of target genes, and if it is not a well-characterized organism you may have to be creative in coming up with these lists.
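
The hypergeometric enrichment test mentioned here can be sketched in base R for a single gene set. All four counts below are made-up toy numbers, and the variable names are illustrative; a real analysis would repeat this for every gene set and then correct for multiple testing.

```r
# Toy numbers, chosen only for illustration.
N <- 10000  # background: all genes detected in the experiment
K <- 200    # background genes annotated to the GO term
n <- 400    # genes in the significant (test) list
k <- 20     # significant genes annotated to the GO term

# P(X >= k) under the hypergeometric distribution.
# phyper(q, m, n, k) takes: m = annotated genes, n = unannotated genes, k = draws.
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Expected overlap by chance, for comparison with the observed k.
expected <- n * K / N
```

The `k - 1` is needed because `lower.tail = FALSE` returns P(X > q), so asking for "k or more" means starting from one below k.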

@MaheshPaintings 1 year ago

Thanks a lot for this descriptive and informative content. Could you also please provide some contents on deconvolution of bulk RNASeq data?

@sanbomics 1 year ago

Good idea, I can keep that in mind for the future!

@user-rn3vh1ff3m 1 year ago

Hi, cheers for the super informative video that saves a bunch of grad students like me. Anyway, I have one question. It seems that enrichGO only works in CC mode, which is not so useful, as you mentioned in the video. It doesn't matter if I change the keyType or any other options. Can you recommend an alternative function? Apparently gseGO and groupGO don't work in the same way.

@sanbomics 1 year ago

Hi, sorry for the late reply. Were you able to figure it out?

@user-rn3vh1ff3m 1 year ago

@sanbomics Nope, unfortunately, not yet 😅 Any ideas?

@barbarainb 1 year ago

Hi, I am using Jaculus jaculus and I cannot find the "org" database for them , but I can retrieve the GOs info from biomart in R, How can I adapt that data in order to make this graph that you so nicely explain here, or is there a jaculus jaculus org database that you know ? thank you so much

@sanbomics 1 year ago

Aww, I had to look up Jaculus jaculus. Cute little ones. You may have to use a different tool for enrichment, maybe something like an EnrichR wrapper. Or you can use the DAVID web tool, save the output table, and make a graph from it.

@barbarainb 1 year ago

@sanbomics Thank you so much for your answer and for your awesome videos, they really make science come to life 🥰. And yeah, jaculus are the cutest 😁 Do you happen to have any tutorial on some of these steps? Thank you so much, very grateful!

@sanbomics 1 year ago

Thank you! 😊 I don't, unfortunately, but it is a common enough question that I may make one in the future.

@sreehariap655 5 months ago

❤

@philipkirianki1033 9 months ago

Thanks for the informative video. My DESeq output Excel file (from a bacterium) has locus tags instead of gene IDs. How can I convert locus tags into Ensembl IDs?

@srivatsanparthasarathy1745 1 year ago

Thanks a lot for this amazing video. I am trying to do the same analysis but with GenBank accession numbers instead of Ensembl IDs as keyType. Even though I only get 271 genes for enrichGO after applying the log2FC and p-adj filters, when I use "ACCNUM" as keyType, R took over 30 minutes, threw an exception error, and crashed. I use the Mus musculus (Mm) database.

@sanbomics 1 year ago

Hmm, I've never tried accession numbers. Maybe try converting them to Entrez IDs first and see if that fixes it?

@danielasturm4836 1 year ago

Thanks for the helpful video! I'm super confused about the data base though since I'm working with the coccolithophore Coccolithus braarudii, for which we don't even have a genome. What should I do?

@sanbomics 1 year ago

I don't have much experience working with non-human or non-model organisms. To do enrichment analysis you need annotated gene sets. I'm not sure if they exist for your organism or not. You could check if DAVID has anything: david.ncifcrf.gov/

@maruthiram5523 11 months ago

What should we do if we have versioned Ensembl gene IDs, e.g. ENSG00000003436.16? How do we change the key type so that we can go on with further analysis?
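
One common way to handle versioned IDs like the one in this question is to strip the version suffix before the lookup, rather than changing the key type. The sketch below uses base R only; the vector name `ids` and the second and third IDs are illustrative. It assumes the annotation database stores unversioned Ensembl gene IDs, which is how the `ENSEMBL` key type in the `org.*.eg.db` packages behaves as far as I know.

```r
# Example IDs: two with a version suffix, one already unversioned.
ids <- c("ENSG00000003436.16", "ENSG00000141510.18", "ENSG00000012048")

# Remove a trailing ".<digits>" if present; unversioned IDs pass through unchanged.
ids_no_version <- sub("\\.[0-9]+$", "", ids)
ids_no_version
```

The cleaned vector can then be passed on with the same Ensembl key type as before.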

@layakalita9018 1 year ago

Thank you so much for this informative video. I am doing exactly the same thing as you did in this video, but I am unable to plot the barplot and dotplot. I have also tried with the enrichplot package, but it still does not show the plots. It gives the error "Error in barplot.default(go_analysis) : 'height' must be a vector or a matrix". Can you please advise me on this issue?

@sanbomics 1 year ago

I'm sorry, but it is very hard to troubleshoot without more information. I hope you were able to figure it out!

@user-ej1lh5wl8f 1 year ago

Thank you for your video. Should I drop NA values in the DESeq2 result so that I can conduct the GO analysis?

@sanbomics 1 year ago

Yeah any NA values should be dropped since in GO enrichment you only include significant genes

@Jungjis 1 year ago

Huge thanks for the informative tutorial. I have a question: I got DEGs from snRNA-seq data with Seurat, and my marker.csv doesn't have a STAT column like your dataset. So what does the stat column mean, and how can I calculate it?

@sanbomics 1 year ago

Are you trying to do GSEA? For simple GO enrichment you don't need it. For GSEA you don't need that STAT column if you can use something else. Does the DE test you use provide any statistic? If not you can use the log fold change to rank them.

@Jungjis 1 year ago

Exactly, I wanted to run GSEA, and my DEG.csv has columns for gene symbol, log2FC, p-value, adj-pval, pct.1 and pct.2. It is just the cluster markers extracted from Seurat. So can I run GSEA with my DEG.csv if I arrange the data by p-val or log2FC in descending order?
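
Ranking by log fold change, as suggested in the reply above, could be set up as in this sketch. It assumes a hypothetical marker table `deg` with `gene` and `log2FC` columns (the gene symbols and values are toy data). The result is a named numeric vector sorted in decreasing order, which is the shape clusterProfiler's GSEA functions such as `gseGO()` document for their `geneList` argument.

```r
# Hypothetical marker table, e.g. read from a DEG.csv exported from Seurat.
deg <- data.frame(
  gene = c("Actb", "Gapdh", "Cd3e", "Ms4a1"),
  log2FC = c(0.4, -1.2, 2.3, -0.1),
  stringsAsFactors = FALSE
)

# Build a named numeric vector: values are the ranking metric, names are gene IDs.
gene_list <- deg$log2FC
names(gene_list) <- deg$gene

# Sort from most up-regulated to most down-regulated.
gene_list <- sort(gene_list, decreasing = TRUE)
gene_list
```

Sorting by p-value alone would discard the direction of change, which is why a signed metric such as log2FC is used here.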

@adria12vc 9 months ago

On my R version on MAC when i type the rownames it prints the row number that that sample corresponds, how can I choose the column that I want it to express???

@johnyijaq536 11 months ago

Thank you for the video. May I know which database I have to select if I want to analyze plant and bacterial genes?

@sanbomics 11 months ago

Sorry, never done it so I don't know the answer off the top of my head. Good luck!

@Nisar_Ahmed_khan 2 years ago

Hi, thanks for these amazing videos. I am unable to create the plot of this go_result. Is there any specific package which needs to be installed or something else?

@sanbomics 2 years ago

Hi, what error are you getting?

@Nisar_Ahmed_khan 2 years ago

@sanbomics Hey, thank you for asking. It worked after installing the "enrichplot" package.

@layakalita9018 1 year ago

Hey, I am also facing the same problem as u did earlier, i.e. I am unable to plot the go_result. I have also tried the package enrichplot, but it is showing some commands. Can you please help me on this?

@jujajuja742 11 months ago

Hello, I am trying to use enrichGO but I am running into an error, Expected input gene ID: ENSMUSG00000020191,ENSMUSG00000063281,ENSMUSG00000030898,ENSMUSG00000028294,ENSMUSG00000038651,ENSMUSG00000021822. My genes have the ensembl id so I do not know why it is giving me this error.

@sanbomics 11 months ago

hard to say without seeing the code. I'm guessing it is a small typo or mistake

@miladsabzevary 4 months ago

Hi. In your command {[sigs$log2FoldChange>0.5,]}, you just find genes that have higher expression in A vs B. How is the command for both downregulated and upregulated in A?

@sanbomics 4 months ago

you can do the absolute value > 0.5

@Viralworldremix 1 year ago

Very informative and helpful. Please suggest how to prepare the database for non-model organisms. Can I use all the gene IDs and the corresponding GO numbers for this purpose? Thanks

@sanbomics 1 year ago

Check out topGO: bioconductor.org/packages/release/bioc/html/topGO.html I haven't done it in a long time, but I remember it being straightforward

@sanjaisrao484 1 year ago

Sir for this analysis can we take all DEGs (up and down regulated) or should take any one?

@sanbomics 1 year ago

That's a good question. Typically, you pick up OR down. But depending on your question there might be some instances where you want to include both. Unless you know for sure, I would pick the former.

@sanjaisrao484 1 year ago

@sanbomics thankss

@fuad1245 1 year ago

Hi, you used the human database here, which is available in Bioconductor. But what if I have, for example, goldfish, whose database is not in Bioconductor? How would I proceed then? Please let me know. Thank you

@sanbomics 1 year ago

Goldfish! That is awesome. I've never heard of someone doing goldfish. Yeah, unfortunately you will need to build your own reference database. This is a common question, and I go into it in a little more depth in other responses if you want to look through them. I may make a video doing this in the future.

@khanmohdsarim 2 years ago

Thanks for this nice video. Please advise on the following:

1. What if the database of a bacterium at .org....eg.db is removed by Bioconductor?

2. How do I input data if I have 8 treatments with 3 replicates? Can I put all treatments in simultaneously or in groups of 2/3?

3. After DESeq2, is it GO/GSEA, or what should be the step to complete the analysis?

Thanks in advance

@sanbomics 2 years ago

1) You can make a custom database, although it will be a bit more involved. I am not familiar with bacterial work so I can't assist much more than that.

2) Do you mean for DE analysis? It depends on the questions you are trying to answer. You can only do pairwise comparisons, but whether one group is one treatment and the other group is a combination of the other 7 treatments is up to you. Usually people would do a pairwise comparison of all treatments, but 8 groups is a lot of comparisons. I would pick the comparisons that make biological sense. For example, you might want to just compare the treatments individually to the control for 8 total DE analyses. It's a hard question to answer without knowing more, though.

3) Again, this is highly dependent on what you are trying to answer. But GO/GSEA are almost always done after DE analysis, and I would highly recommend doing that at the minimum. People usually like to see the top DE genes in something like a volcano plot or heatmap, even though IMO those don't add that much to the analysis that a csv of the output doesn't already tell you. If you are interested in how similar the treatments are, you can do some sort of clustering (PCA/hierarchical/etc). If you have 24 samples you can theoretically do a co-expression analysis. Good luck!

@khanmohdsarim 2 years ago

@sanbomics Thank you for the detailed description.

1. I understand, but in almost every information source people take humans, Rattus, and Arabidopsis as the example, and I am unable to follow their instructions. Could you please elaborate on the custom database if possible?

2. Yes, I have one control with 3 treatments in one condition and another control with 3 treatments in the second condition. (1+3 = condition 1) (1+3 = condition 2)

3. I am looking at how such treatments change the morphology of organisms and which genes are important for it. So I was looking for advice on GSEA or pathway analysis.

@sanbomics 2 years ago

Hi! Sorry for the delay, I don't get notified when people respond after I respond.

1) Several of the R GO packages allow you to input custom gene lists for enrichment. It's likely going to take a little trial and error on your part.

2) I'm sorry, I am still a little confused about the layout.

3) I think both are important. You should try both. GSEA is just a method to test gene set enrichment. Pathway analysis usually means the gene sets are specific to pathways, as opposed to gene ontology, where the gene sets are broader. You can use GSEA on either pathways or GO.

@siddharthadas86 1 year ago

I was wondering should the gene list for enrichment be all the genes tested for or all the genes in the genome?

@sanbomics 1 year ago

You mean the background gene list? This is a good question. It should be all the genes you detected in your analysis - not all the genes in the genome. I wish I had specified that clearly in the video.
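
A sketch of supplying that background to `enrichGO()` is shown below. The `universe` argument is part of clusterProfiler's `enrichGO()`; everything else (the object names `res`, `sigs`, `background`, the toy p-values, and the choice of human Ensembl IDs) is an assumption for illustration. With only four genes the call will not return meaningful terms; the point is where the detected-gene list goes.

```r
# Hypothetical objects: `res` holds every gene detected in the DE analysis,
# `sigs` is the significant subset.
res <- data.frame(
  padj = c(0.01, 0.2, 0.03, 0.8),
  row.names = c("ENSG00000141510", "ENSG00000012048",
                "ENSG00000139618", "ENSG00000157764")
)
sigs <- res[res$padj < 0.05, , drop = FALSE]

genes_to_test <- rownames(sigs)
background <- rownames(res)  # all detected genes, not the whole genome

# Every tested gene should also be in the background.
stopifnot(all(genes_to_test %in% background))

# Run only if the Bioconductor packages are installed.
if (requireNamespace("clusterProfiler", quietly = TRUE) &&
    requireNamespace("org.Hs.eg.db", quietly = TRUE)) {
  go_results <- clusterProfiler::enrichGO(
    gene = genes_to_test,
    universe = background,
    OrgDb = "org.Hs.eg.db",
    keyType = "ENSEMBL",
    ont = "BP"
  )
}
```

The background should use the same ID type as the test list, otherwise the overlap between the two will be empty.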

@elifsukartal2840 1 year ago

Hello, first of all thank you for the video and the effort. However, when I try to implement it, R gives an "object 'sigs' not found" error. I don't know how to resolve the issue and I am fairly new to R.

@sanbomics 1 year ago

sigs is a dataframe I have that only contains the significant DE genes. You will need something like that, but it doesn't have to be called sigs.

@researcher7410 1 year ago

How can we perform GO enrichment analysis on genomic data and how to separate the gene list from a plethora of genomes????

@sanbomics 1 year ago

Hi, I am sorry but I am not sure I understood the question. But, GO enrichment requires a gene list. Theoretically, whatever gives you a list of genes can also be used for GO analysis.

@carolinejuery437 1 year ago

Hi! Thanks for this very clear video. I am working with a non-model organism that does not have an annotation file prepared as for human or mouse. Is there a package to do this? Thanks in advance

@sanbomics 1 year ago

Do you have a list of genes and categories you want to test enrichment for?

@carolinejuery437 1 year ago

@sanbomics Thanks for your answer, yes. I have the GO file for the genome and a set of differentially expressed genes.

@sanbomics 1 year ago

I've never had to do it, but I think you can with topGO: bioconductor.org/packages/release/bioc/html/topGO.html I've had this question multiple times, so I might figure it out and make a video down the line.

@carolinejuery437 1 year ago

@sanbomics Yes, thanks, I am trying to use topGO! All the best

@freezingtolerance7493 1 year ago

If I have "GO ID" as rownames, instead of ensembl ID,, can I do also go term analysis using enrichGO function?

@sanbomics 1 year ago

GO ID for individual genes or pathways?

@freezingtolerance7493 1 year ago

@sanbomics GO IDs for individual genes. Since my data is from a non-model species, I could not use the org.Hs database. So I extracted the GO IDs of each gene_id with InterProScan. Now I have the DESeq data and the GO IDs corresponding to each gene. With only this information, can I perform GO analysis as you do in the video?
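
For a gene-to-GO table like the one described here, one option (besides topGO, which is suggested elsewhere in this thread) is clusterProfiler's generic `enricher()` function, which accepts a user-supplied term-to-gene mapping through its `TERM2GENE` argument. The sketch below is untested against real InterProScan output; the table `gene2go`, its column names, and the gene IDs are hypothetical, and the size and cutoff arguments are relaxed only so the toy data is not filtered away.

```r
# Hypothetical gene-to-GO annotation, e.g. parsed from InterProScan output.
gene2go <- data.frame(
  gene = c("g1", "g1", "g2", "g3", "g4", "g5"),
  go_id = c("GO:0006412", "GO:0008152", "GO:0006412",
            "GO:0006412", "GO:0008152", "GO:0008152"),
  stringsAsFactors = FALSE
)

# enricher() expects TERM2GENE with the term in column 1 and the gene in column 2.
term2gene <- gene2go[, c("go_id", "gene")]

sig_genes <- c("g1", "g2", "g3")     # significant genes from the DE analysis
background <- unique(gene2go$gene)   # all annotated, detected genes

# Run only if clusterProfiler is installed.
if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  custom_results <- clusterProfiler::enricher(
    gene = sig_genes,
    universe = background,
    TERM2GENE = term2gene,
    minGSSize = 1,
    pvalueCutoff = 1,
    qvalueCutoff = 1
  )
}
```

A second table mapping GO IDs to readable names can be passed as `TERM2NAME` so that plots show descriptions instead of raw IDs.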

@Stop-and-listen 1 year ago

I am trying to reproduce your results, but I cannot find the file "count_table.csv" on your website.

@sanbomics 1 year ago

try this: github.com/mousepixels/sanbomics_scripts/blob/main/count_table_for_deseq_example.csv

@kitony 1 year ago

Is it possible to create your own database with proteins/gene modes to use clusterprofiler?

@sanbomics 1 year ago

Yup! Except I use the EnrichR wrapper when I do it. It might work with clusterprofiler too, but I haven't tried

@hyyyui 1 year ago

Could you please upload the script? Thank you!!

@sanbomics 1 year ago

Sure! Here it is: github.com/mousepixels/sanbomics_scripts/blob/main/GO_in_R.Rmd

@hebamohammed2517 1 year ago

And if the genes are from rhesus macaque, which library should we install?

@sanbomics 1 year ago

Try this: bioconductor.org/packages/release/data/annotation/html/org.Mmu.eg.db.html

@claudiaferreira6325 5 months ago

And plant genomes?? Not Hs...or mouse...?

@sanbomics 4 months ago

Nobody cares about plants... JK. I'm not really familiar with working with them, but at the end of the day the algorithm is the same; you just have to find and use the right database.

@veki2630 1 year ago

How to do functional analysis for miRNA?

@sanbomics 1 year ago

You need to find a database that has functional terms associated to miRNA. I am not sure which ones exist because I have not done much with miRNA. You can change the database that you use in the function from the default one

@MM-fj7ym 1 year ago

Hi can you teach Gene Ontology Enrichment Analysis by GOhyperGALL function?

@sanbomics 1 year ago

Hi, I've never used that function. But all GO enrichment is basically the same idea.

@MM-fj7ym 1 year ago

@sanbomics Thanks, and I have a question: I don't understand GO enrichment analysis vs GSEA. Could you explain this?

@zeinabbahari 3 months ago

Hi, thanks for your good video. How can I get in touch with you, Dr.? I need some emergency help with my data analysis. Please help me.

@sanbomics 2 months ago

Hi, you can reach me through sanbomics.com

@excelobiageli9446 2 years ago

Can i use this DEGs to carry out gene co-expression analysis??

@sudeeris7294 2 years ago

as far as i know you need to have enough number of samples to carry out co exp analysis

@sanbomics 2 years ago

Hi. Co-expression analysis is different from just DE/GO. You need a large sample size for that in order to find correlations between gene expression. Usually the minimum is around ~20 samples. But that varies based on which model you are using. If you are using humans you need a lot more than if you are using cell culture from one cell line or genetically identical mice.

@excelobiageli9446 2 years ago

Oh, thank you. But what I really want to know is that do i have to use Differentially Expressed genes for co-expression analysis? Or i can just use whatever dataset and use it for the analysis without finding DEGs

@sanbomics 2 years ago

You don't necessarily need to do DE analysis to do co-expression. Co-expression finds correlations between genes, irrespective of DE testing. However, DE analysis can help you focus in on specific co-expression pathways or genes that are different between conditions.

@excelobiageli9446 2 years ago

@sanbomics Thank you very much. And I love your videos. Please keep doing more 🙏🏼

@zeinabbahari 3 months ago

my name is zeinab bahari . you can find me in research gat... i need help in rna seq data analysis

@sanbomics 2 months ago

If you need help you can check out sanbomics.com

@user-jz4bw1bj9g 1 year ago

Hi, thanks a lot for this video, it is very well explained and looks sooooo simple! However, it is not working for me. First, to visualise the data frame, I cannot just do as.data.frame(GO_results); I have to do as.data.frame(GO_results@result), otherwise it gives me an empty df. Second, I cannot do the plot since I obtain this error message: > fit

@sanbomics 1 year ago

Hmm, the video is starting to age. It is possible things changed a little in the packages. Were you able to figure it out?

@yavorjordanov7416 3 months ago

Regarding the second problem that you have, I generated the plots without creating a dataframe of the enrichGO output, before that I kept getting the same "height" error.
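
The "height" error described in this thread can be reproduced in base R without any enrichment packages, which may help explain the fix above. The sketch assumes the cause is that `barplot()` was given an ordinary data frame (for example the output of `as.data.frame()`), so R falls back to `graphics::barplot.default()`; the object `go_df` and its contents are made up.

```r
# Stand-in for as.data.frame(go_results): an ordinary data frame.
go_df <- data.frame(Description = c("term A", "term B"), Count = c(12, 7))

# With a data frame, barplot() dispatches to graphics::barplot.default(),
# which only accepts a vector or a matrix and stops before drawing anything.
msg <- tryCatch(
  {
    barplot(go_df)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg
```

Passing the original enrichment result object to `barplot()`, with enrichplot installed and loaded as mentioned earlier in the comments, avoids this fallback because the plotting method is then chosen from the object's class.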