How to manipulate gene expression data from NCBI GEO in R using dplyr | Bioinformatics for beginners

Bioinformagician

31K subscribers

52K views

Published: 15 Sep 2024

Comments: 96

@danielajbq 2 years ago

You're an ANGEL for making these. I am doing my MS in bioinformatics right now and this is genuinely better than some of my courses. Thank you!!

@MichealIdedia 3 months ago

Hello, are you done with your MSc now?

@sanjaisrao484 2 years ago

Excellent explanation. Thanks for teaching the basics of R. It was extremely helpful; please continue to make more videos.

@mayank9986 1 year ago

I am new to programming. I was looking for help to analyse RNAseq data and your video just came as a blessing. Thank you a ton.

@amitrupani9898 2 years ago

Thank you for this very helpful video! I have recently moved from a clinical genetics laboratory to a research laboratory where pipelines are written in R and rely extensively on the dplyr library. So, I needed a tutorial to help me understand its basic functioning. This helped. Keep up the good work you are doing through this channel. Cheers!!

@Bioinformagician 2 years ago

I am really glad this helped you get a basic understanding of the dplyr package. Thank you for your kind words; they encourage me to do more of this! ☺️

@mahshidpooladvand8502 1 month ago

This was the best tutorial I could possibly find online!!! You are incredibly smart! Thanks!

@eylulozerbil8548 1 year ago

This tutorial encouraged me to continue my R learning process by showing me how I can manipulate this kind of data in the simplest way! Thank you, Bioinformagician :)

@muyyy9000 9 months ago

Thank you so much for making content like this. It's extremely helpful for beginners like me trying to analyze gene expression data in RStudio.

@Radslom 1 year ago

This video was extremely helpful for me. I am currently learning how to use R and GEO2, and this video helped to clarify it. Thank you and keep up the great work!

@zlj8435 2 years ago

Thank you for this wonderful course! I am a year 1 PhD student and it really helps me a lot!

@syedmansoorjan2671 2 years ago

Amazing, don't have words to say for you.. try to share more... I just found this very helpful...

@Grzegorz-f1b 4 months ago

Thank you, my new teacher. I actually work on biogenetics in IT and C++, and this video helps me very much ❤️🙏👌

@aishaa812 2 months ago

Thank you. It's extremely helpful for me since I am a beginner in RStudio and I am trying to apply data analysis in RStudio.

@claudiocesarmontenegrojuni5141 1 year ago

You're an amazing teacher! Thank you so much for this outstanding content.

@Ojaswini-Pathak 1 year ago

Very well made video and your understanding of the subject is tremendous!

@mocabeentrill 1 year ago

Thank you. You're really good at what you do. I did this in base R and oh my word, it looks grotesque!

@Bunga-p5i 1 year ago

Thank you for the great tutorial! Just to let you know, I had to install these packages first to run your script:
install.packages("dplyr")
install.packages("tidyverse")
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("GEOquery")

@mikewafula9470 1 year ago

Thanks so much for this great video. You have made it easy for me to explore gene data analysis with R. Keep sharing such content. Cheers!!

@jammerkd 2 years ago

Excellent videos and you are a fantastic teacher

@rajanirao6011 2 years ago

These videos are so good!!! Good practise to learn R. Thank you!

@Bioinformagician 2 years ago

I am glad you found this helpful! :)

@mohammeddabbour2254 1 year ago

Wonderful explanation. Thank you so much for making this tutorial. Just a side note: when both the dplyr and plyr packages are loaded and you want to use a function that exists in both, it is better to specify the package when calling it (for example, dplyr::rename()). Otherwise, R may use the plyr version instead and return an error. Happy coding!

@Bioinformagician 1 year ago

Correct, thanks for pointing it out. I have taken care of that in the videos following this one :)
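The namespace tip in this exchange can be sketched roughly as follows. This is only a minimal illustration on a toy table: the data frame `dat`, its column names, and its values are made up here and do not come from the video.

```r
library(dplyr)

# Toy expression table; column names and values are hypothetical
dat <- data.frame(
  gene = c("BRCA1", "TP53"),
  GSM0001 = c(2.5, 7.1)
)

# Calling rename() through its namespace always gets the dplyr version,
# even if another attached package (such as plyr) also defines rename()
renamed <- dat %>% dplyr::rename(sample_1 = GSM0001)

print(names(renamed))
```

The `pkg::function()` form costs a few extra characters but makes the script independent of the order in which packages were loaded.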

@hemanthchenga5671 1 year ago

Thanks for explaining the code in detail and please make more videos

@余长 1 year ago

Very helpful and you are very patient. It seems that you know exactly what my questions are.

@cerenuzun5989 2 years ago

It was very helpful and it would be great if you continue these tutorials. Thank you so much!!

@Bioinformagician 2 years ago

I am glad you find my videos helpful! :)

@seungwonkim8359 1 year ago

Really helpful! Thank you very much. I hope you continue this marvelous work for a long time, since I am working on bulk/single-cell RNA-seq these days.

@setarehsohail5422 2 years ago

Amazing!! You are a professional teacher!! Thanks!

@karthibiotech426 2 years ago

Wow, it's very helpful. I am just practicing with another dataset using your same protocol. Thanks a lot!

@BISMILLAH7334 2 years ago

Excellent! Thank you for the tutorial. Looking forward to many more such useful tutorials.

@ayobamiogunsola6139 1 year ago

Thank you for making this video. It has been helpful.

@Saed7630 1 year ago

Clean, clear and informative!

@sayeman9577 10 months ago

Thanks! Very helpful

@o1kun 2 years ago

Your video really helped me!! Really appreciate it😊

@arcturusdig1673 1 year ago

I can't understand most of the things you do. I need to go to other tutorial videos for understanding every single step. If you want your viewers to understand especially beginners, then please make your explanation more lucid and easy.

@xelaldaero9339 1 year ago

Thank you! Your videos are very useful!

@IslamSafwat-- 4 months ago

GREAT! Many thanks :)

@lisahuang850 2 years ago

Really nice video! I was wondering if you could demonstrate how to convert raw counts to TPM or FPKM values in R, as my GSE dataset provides raw counts. Thanks!

@Bioinformagician 2 years ago

Thanks for the suggestion. Will plan a video covering this!

@alaminafendy6071 9 months ago

Thank you so much. Nicely explained.

@MohammadNasirAbdullah 7 months ago

Thank you so much, it really helps me 😊😊😊😊😊😊😊😊

@gustavoantoniobrugesmorale1881 2 years ago

You are excellent. Thank you!!!

@tushardhyani3931 2 years ago

Thank you for this video !!

@jithus89 7 months ago

> gse = GEOquery::getGEO(GEO = 'GSE183947', GSEMatrix = TRUE)
Error in open.connection(x, "rb") : Problem with the SSL CA cert (path? access rights?)
Why this error?

@sanjaisrao484 2 years ago

Thanks

@1980yadalam 2 years ago

very good video, thanks.

@melinaguillon2449 2 months ago

Hi! I can't install GEOquery, I get this error message: Warning in install.packages : package ‘GEOquery’ is not available for this version of R

@QAKS1264 2 years ago

@gaurangagarwal3817 24 days ago

Hey! Could you help me find differential gene expression levels from a Gene Expression Omnibus dataset using the R limma package?

@faizu0076 1 year ago

I didn't find the getGEO protein query in this; there is no package with this name. Please solve the problem.

@juliangrandvallet5359 2 years ago

Amazing!!!! Now how can I plot a heatmap out of this data?

@chinspostdoc 1 year ago

Hi, I have some questions. Please help me resolve or understand them.
1) What if the GEO study only gives us a raw file containing either text files or .CEL files? How do we read the data from that?
2) Suppose a GEO study contains many samples from different tissues. How do I make two groups comprising only the samples I am interested in? For example, I want to compare expression data from healthy and COVID patients, but the GEO study also contains samples of cell lines treated with a certain chemical, along with tissues from healthy and COVID patients. How can I make two groups named healthy and COVID and assign the samples to those groups accordingly?
3) If the GEO raw file contains count.text files for each sample, how can we use them for differential expression analysis?
Your kind reply would be much appreciated.

@kajalpanchal8239 2 years ago

thankya Khushbu!

@moulytasnuva1860 2 years ago

@Bioinformagician Is there any process to find the threshold value from FPKM to compare the early and late stages of cancer?

@aheedan9957 2 years ago

Hi, nice one, but I did not understand the part about the pData and phenoData functions.

@mikewafula9470 1 year ago

Thanks again for the video. I have managed to download the gene expression data (GSE 216497). How do I get its corresponding metadata?

@yahyayozbatiran 1 year ago

Hello, how can I plot a specific gene's expression in cancer subtypes from TCGA? For example, I want to plot MSH2 gene expression in Colon Mucinous versus Colon Adenocarcinoma.

@mohamedalfaki4268 2 years ago

Hi, and thanks for this very nice tutorial. I get this error when I try to reshape the data:
Error in `stop_formula()`:
! Formula shorthand must be wrapped in `where()`.
# Bad
data %>% select(~gene)
# Good
data %>% select(where(~gene))

@Bioinformagician 2 years ago

Can you give me a little context of what you are trying to do? I am having a hard time recreating this error. Thanks!

@SamipSapkota-zg8hy 2 months ago

the value of strain samples and cell.type becomes null

@irodasay3448 2 years ago

Thank you for the tutorial. I have a question about converting GSE to ExpressionSet. I used your vignette and tried to do the same for GSE181462. First, I got the GSE by: gse

@Bioinformagician 2 years ago

Try setting GSEMatrix = FALSE

@imvasco 2 years ago

What about GEO data that's not CSV but TXT?

@Bioinformagician 2 years ago

Sometimes gene expression data is also available as a .txt file on GEO. You can read in a .txt file similar to how you read a .csv file in R. Please make sure the .txt file contains gene expression data. Usually, the 'data processing' section for each sample should provide details on what the .txt file contains and how it was processed.
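As a rough sketch of that reply, reading a .txt file looks much like reading a .csv. The example assumes the file is tab-delimited, which is common on GEO but not guaranteed; the file contents, gene labels, and column names below are invented so the snippet runs on its own.

```r
# Write a small tab-delimited file to a temporary path so the example is
# self-contained; in practice this would be the .txt downloaded from GEO
txt_path <- tempfile(fileext = ".txt")
writeLines(c(
  "gene\tsample_1\tsample_2",
  "GENE_A\t1.5\t2.0",
  "GENE_B\t0.0\t3.2"
), txt_path)

# read.delim() is the tab-delimited counterpart of read.csv()
expr <- read.delim(txt_path, header = TRUE, stringsAsFactors = FALSE)

print(dim(expr))
```

If the file uses another separator, `read.table()` with an explicit `sep` argument is the more general option.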

@aytacoksuzoglu2975 1 year ago

why did we put -> .

@Ijazalijin 2 years ago

How can I activate the GEOquery package?

@Bioinformagician 2 years ago

Run library(GEOquery) at the beginning of the script

@terryadams2652 1 year ago

@Bioinformagician, I apologize for my question (please), but, as a Biologist, I am now learning Python. I really don't want to spend what little time I have learning another language (R). So, to get these results, is it possible to just use Python instead of R? Thank you very much, my dear.

@Bioinformagician 1 year ago

You can perform the R-equivalent operations in Python. I believe it is the pandas package in Python that will allow you to do all your data wrangling.

@sharadjaiswal1705 1 year ago

Ma'am, how do I write the R script that is used in this video?

@zeynepdurkaya883 1 year ago

I can't run the command to call the data; chapter 6.14 isn't clear enough.

@harshjasani8637 1 year ago

Hello, Thank you for amazing video and tutorials. I could not load the GEOquery library, any ideas what could be the reason?

@Bioinformagician 1 year ago

You probably need to install it first before loading?

@muneeramashkoor7919 2 years ago

Hello, your videos are very informative. I am trying to look at the gene expression of my gene of interest. The supplementary data in GEO is in the form of a .fpkm_tracking file. How can I go about solving/looking at the expression using these files? Thank you!

@Bioinformagician 2 years ago

If there are no raw counts provided, you can create them yourself. You can fetch RNA-Seq reads associated with GEO dataset from SRA. Once you get the reads, you can align and quantify them to get counts.

@awa8061 2 years ago

Can you suggest any Python package for gene expression analysis?

@Bioinformagician 2 years ago

Unfortunately, I do not have any recommendations for Python packages. I only use R for gene expression analysis.

@markrenton6981 9 months ago

Can someone please explain what the two ".." are at the start of her file path when reading in the data file?

@Bioinformagician 9 months ago

The "../" is the Linux notation to move up a directory level in the file system hierarchy. For instance, if you're in the directory "/home/user/documents/" and you use "../", you'll move up to the "/home/user/" directory.
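The same idea can be checked from within R. The snippet below is only an illustration: it builds a throwaway "user/documents" folder under R's temporary directory (names chosen to mirror the example in the reply) and resolves the path that ends in "..".

```r
# Build a throwaway directory tree: <base>/documents
base <- file.path(tempdir(), "user")
docs <- file.path(base, "documents")
dir.create(docs, recursive = TRUE, showWarnings = FALSE)

# Appending ".." to a path points at the parent directory
parent <- normalizePath(file.path(docs, ".."))

# Compare the resolved parent with the directory one level above "documents"
print(parent == normalizePath(base))
```

In a script, a path such as "../data/file.csv" is therefore resolved relative to the current working directory, which `getwd()` reports.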

@andyderek3021 2 years ago

Thank you for this well-explained video. Please, if I want to do survival analysis based on gene expression data with, let's say, GSE183947, how can I get the clinical data information from GEO?

@Bioinformagician 1 year ago

If it is not provided with the metadata, you might have to reach out to the authors.

@vahidgorganli8895 1 year ago

🙂👍

@Ojaswini-Pathak 1 year ago

Hi, I tried installing GEOquery package and got error - package GEOquery is not available for this version of R, could you please help.

@naveenyethirajula1279 1 year ago

Please tell me how to install it

@hiraalmas9042 1 year ago

I am facing same issue

@killa14108 2 years ago

Hi what happens when there are NAs in the gene expression data? The accession number is GSE70947 and it's a breast cancer data set with 296 total samples and 62976 features (genes). I followed what you did and queried the data directly using GEOquery from Bioconductor. I am just stuck now and figuring out how to deal with NAs and would appreciate your help. Thank you!

@Bioinformagician 2 years ago

I would quantify the NAs for each gene across all samples and filter out genes that have NAs in more than half of the samples. I usually prefer to replace NAs with 0.
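The approach described in this reply could look roughly like the sketch below. It uses a small made-up matrix rather than GSE70947, the gene and sample labels are hypothetical, and the "more than half" cutoff and the zero replacement simply follow the reply; other imputation choices may suit a given dataset better.

```r
# Toy expression matrix: genes in rows, samples in columns (values are made up)
expr <- matrix(
  c(1.2,  NA, 3.4, 2.2,
     NA,  NA,  NA, 5.0,
    0.5, 0.7, 0.9, 1.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("gene_a", "gene_b", "gene_c"), paste0("s", 1:4))
)

# Count missing values per gene across all samples
na_per_gene <- rowSums(is.na(expr))

# Keep genes with NAs in at most half of the samples
keep <- na_per_gene <= ncol(expr) / 2
filtered <- expr[keep, , drop = FALSE]

# Replace the remaining NAs with 0
filtered[is.na(filtered)] <- 0

print(filtered)
```

Here gene_b is dropped because three of its four values are missing, while the single NA in gene_a is set to 0.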

@killa14108 2 years ago

@Bioinformagician Thank you very much! Do you also have any recommended methods for feature (gene) selection for creating a classification model to predict cancer/normal samples?

@sanjaisrao484 2 years ago

Ma'am, some datasets don't have sample names in the GEOquery metadata. Please help, I am stuck here.

@Bioinformagician 2 years ago

Are you using the same dataset used in the video?