Thank you for this very helpful video! I recently moved from a clinical genetics laboratory to a research laboratory where the pipelines are written in R and make extensive use of the dplyr library, so I needed a tutorial to help me understand its basic functioning. This helped. Keep up the good work you are doing through this channel. Cheers!!
This tutorial encouraged me to continue my R learning process by showing me how I can manipulate this kind of data in the simplest way! Thank you, bioinformagician :)
This video was extremely helpful for me. I am currently learning how to use R and GEO2, and this video helped to clarify it. Thank you and keep up the great work!
Thank you for the great tutorial! Just to let you know, I had to install these packages first to run your script:
install.packages("dplyr")
install.packages("tidyverse")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("GEOquery")
Wonderful explanation. Thank you so much for making this tutorial. Just a side note: when both the dplyr and plyr packages are loaded and they define a function with the same name, it is better to specify the package explicitly when calling the function (for example, dplyr::rename()). Otherwise, R uses the version from whichever package was loaded last, which may be the plyr one, and the call can fail with an error. Happy coding!
I can't understand most of the things you do. I need to go to other tutorial videos for understanding every single step. If you want your viewers to understand especially beginners, then please make your explanation more lucid and easy.
Really nice video! I was wondering if you could demonstrate how to convert raw counts to TPM or FPKM values in R, as my GSE dataset provides raw counts. Thanks!
> gse = GEOquery::getGEO(GEO = 'GSE183947', GSEMatrix = TRUE)
Error in open.connection(x, "rb") : Problem with the SSL CA cert (path? access rights?)
Why am I getting this error?
Hi, I have some questions. Please help me resolve or understand them.
1) What if the GEO study only provides a raw file containing either text files or .CEL files? How do I read the data from that?
2) Suppose a GEO study contains many samples from different tissues. How do I make two groups containing only the samples I am interested in? For example, I want to compare expression data from healthy and COVID patients, but the GEO study also contains samples of cell lines treated with a certain chemical alongside the tissues of healthy and COVID patients. How can I make two groups named healthy and COVID and assign the samples to those groups accordingly?
3) If the GEO raw file contains count .txt files for each sample, how can we use them for differential expression analysis?
Your kind reply would be much appreciated.
Hello, how can I plot the expression of a specific gene across cancer subtypes from TCGA? For example, I want to plot MSH2 gene expression in Colon Mucinous versus Colon Adenocarcinoma.
Hi, and thanks for this very nice tutorial. I get this error when I try to reshape the data:
Error in `stop_formula()`:
! Formula shorthand must be wrapped in `where()`.
  # Bad
  data %>% select(~gene)
  # Good
  data %>% select(where(~gene))
Thank you for the tutorial. I have a question about converting a GSE to an ExpressionSet. I used your vignette and tried to do the same for GSE181462. First, I got the GSE by: gse
Sometimes gene expression data is also available as a .txt file on GEO. You can read in a .txt file much as you would read a .csv file in R. Please make sure the .txt file actually contains gene expression data. Usually, the 'data processing' section for each sample provides details on what the .txt file contains and how it was processed.
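To illustrate the point above, here is a minimal sketch of reading such a file with base R. It assumes a tab-delimited file with a header row and gene identifiers in the first column; the file contents and column names are made up for the example, and a real GEO file may use a different layout.

```r
# Hypothetical example: write a tiny tab-delimited expression file to a
# temporary location so the snippet is self-contained. A real file would be
# downloaded from the GEO record instead.
txt_path <- tempfile(fileext = ".txt")
writeLines(c("gene\tsample1\tsample2",
             "BRCA1\t10.5\t12.1",
             "TP53\t8.2\t7.9"),
           txt_path)

# read.delim() is read.table() with tab-separated defaults,
# in the same way that read.csv() is read.table() with comma defaults.
expr <- read.delim(txt_path, header = TRUE, stringsAsFactors = FALSE)

# Quick sanity checks before any analysis
str(expr)
head(expr)
```

If the file turns out to be space- or comma-separated, read.table() with an explicit sep argument may be the better fit.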
@Bioinformagician, I apologize for my question (please), but, as a Biologist, I am now learning Python. I really don't want to spend what little time I have learning another language (R). So, to get these results, is it possible to just use Python instead of R? Thank you very much, my dear.
Hello, your videos are very informative. I am trying to look at the gene expression of my gene of interest. The supplementary data in GEO is in the form of a .fpkm_tracking file. How can I go about solving/looking at the expression using these files? Thank you!
If no raw counts are provided, you can create them yourself. You can fetch the RNA-Seq reads associated with the GEO dataset from SRA. Once you have the reads, you can align and quantify them to get counts.
The "../" is the Linux notation to move up a directory level in the file system hierarchy. For instance, if you're in the directory "/home/user/documents/" and you use "../", you'll move up to the "/home/user/" directory.
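A small sketch of the same idea from within R. The directory names here are hypothetical and are created under a temporary folder so the example can run anywhere.

```r
# Hypothetical directory tree built under a temp folder: <tmp>/user/documents
base_dir <- tempfile("home_")
docs_dir <- file.path(base_dir, "user", "documents")
dir.create(docs_dir, recursive = TRUE)

# Appending ".." to a path refers to the parent directory;
# normalizePath() resolves it to the actual location.
parent_via_dots <- normalizePath(file.path(docs_dir, ".."))
parent_direct   <- normalizePath(file.path(base_dir, "user"))

# Both should point at the same directory
identical(parent_via_dots, parent_direct)
```

So a path such as "../data/file.csv" in a script means: go up one level from the current working directory, then into "data".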
Thank you for this well-explained video. If I want to do survival analysis based on gene expression data with, let's say, GSE183947, how can I get the clinical data from GEO?
Hi, what happens when there are NAs in the gene expression data? The accession number is GSE70947, and it's a breast cancer dataset with 296 total samples and 62976 features (genes). I followed what you did and queried the data directly using GEOquery from Bioconductor. I am stuck now trying to figure out how to deal with the NAs and would appreciate your help. Thank you!
I would quantify the NAs for each gene across all samples and filter out genes that have NAs in more than half of the samples. I usually prefer to replace NAs with 0.
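The approach described above could look roughly like this in base R. This is only a sketch on a made-up toy matrix (genes in rows, samples in columns); the gene and sample names are hypothetical.

```r
# Hypothetical toy expression matrix: rows are genes, columns are samples
expr <- matrix(c(1.2, NA,  3.1, 2.0,
                 NA,  NA,  NA,  0.5,
                 4.4, 5.0, NA,  NA),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("geneA", "geneB", "geneC"),
                               paste0("s", 1:4)))

# Count NAs per gene across all samples
na_per_gene <- rowSums(is.na(expr))

# Keep genes whose NA count is at most half the number of samples,
# i.e. drop genes with NAs in more than half of the samples
keep <- na_per_gene <= ncol(expr) / 2
filtered <- expr[keep, , drop = FALSE]

# Replace the remaining NAs with 0
filtered[is.na(filtered)] <- 0
filtered
```

Whether 0 is a sensible replacement value depends on how the data were processed (for example, on log-transformed intensities 0 may not mean "no expression"), so it is worth checking the dataset's processing notes first.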
@Bioinformagician Thank you very much! Do you also have any recommended methods for feature (gene) selection when building a classification model to predict cancer vs. normal samples?