No video :(

Differential expression in Python with pyDESeq2

Sanbomics

Подписаться 13 тыс.

Просмотров 20 тыс.

50% 1

Видео Поделиться Скачать Добавить в

Опубликовано:

20 авг 2024

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист

Посмотреть позже

Комментарии : 96

@sanbomics Год назад

Fix: You should used the normalized counts layer for the PCA step.

@sarap5548 Год назад

Thanks! Could you put the code for it?

@cagankaplan5302 9 месяцев назад

Should we specify sc.tl.pca(dds.layers["normed_counts"] or just adding the log1p layer to dds before the pca is enough?

@user-rb2pt1zk1s 7 месяцев назад

@@sarap5548 did you know please how to fix it in the code ?

@vincenzomazza2880 6 месяцев назад

sc.pp.log1p(dds) sc.tl.pca(dds) sc.pl.pca(dds, color = 'Condition', size = 30)

@louis-philippechaumont8400 Год назад

I watched a bunch of your videos. They are really helpful and well made. I like that you present both python and R. They are helping me get started in bioinformatics!

@sanbomics Год назад

I am glad you have found them helpful!

@vjsanchezarevalo Год назад

Great video! I have been using pyDeseq2 and works great! Very cool the mapper function! Your videos are always very helpful, thanks a lot!

@sanbomics Год назад

Thank you for the kind words :)

@parneetkaur7068 28 дней назад

For anyone watching in 2024, the DeseqDataSet function does not consider 'clinical' an argument. Instead, it takes 'metadata'.

@lly6115 Год назад

I will click your video everyday to pump that viewed number.(lol) Thank you!

@sanbomics Год назад

Doing the lord's work

@lly6115 Год назад

@@sanbomics thank you for your respectul endeavor and time!

@AlyenaMango 2 дня назад

I was wondering why you use the stat column for your rankings in GSEA. From what I've read most people will rank their gene list by Log2FC or multiplying Log2FC*-log10(p-value). Why do you choose this method?

@user-zm6ip8iy9q Год назад

As a beginner in bioinformatics, I am watching the uploaded videos well. There are cases where TCGA data is used for cancer analysis at the single cell level, so if you have time, could you make a related video? Thank you always.

@sanbomics Год назад

What analysis are you trying to do?

@nikelElegance 6 месяцев назад

AMAZING. Thank you

@alexrutherford3105 10 месяцев назад

I'm making a dds and I got a weird error of: TypeError: DeseqDataSet.__init__() got an unexpected keyword argument 'clinical'. has this been updated?

@Rodrigo-yx3sn 10 месяцев назад

I think it changed.I used it like this : dds = DeseqDataSet(counts=counts, metadata=metadata, design_factors="Condition")

@sanbomics 10 месяцев назад

Yeah, sorry for the late reply. They changed clinical to metadata

@lamjennygrace623 10 месяцев назад

Thank you for such a comprehensive tutorial. I'm totally new to Bioinformatics and I found your videos are very helpful. Could you please do a tutorial on how to get the count table and sample file from scratch in Python?? I would very appreaciated

@sanbomics 9 месяцев назад

Hi! You should be able to find guides on how to open csv with pandas. If you can load the counts into pandas you should be able to follow along with the rest

@JAIRAJMATHUR Год назад

I love your videos, and I am deeply grateful for all the content, especially python related that you put out! Hope to see more in the future!! What exactly does baseMean tell us? How would you set the threshold for it, as it could vary from dataset to dataset? Also when you use scanpy for PCA, you didnt normalize your data, or log transform it. Are those not necessary when using sc.tl.pca? Another approach that I have tried and cross verified with R, is to use vst transformation using the dds.vst function, and then use this vst transformed data to make PCA, do you think this would make your PCA plot look better? Sorry, one last question: For GSEA, What would be the difference using stat for ranking vs sign(LFC)*-log10(padj) vs LFC*-log10(padj) Also, pre_res.res2d gives a nice dataframe with everything :)

@sanbomics Год назад

Thank you! Basemean is just the mean of all the samples for that given gene. Setting a threshold is somewhat arbitrary. You could do it based on the distribution but most people just pick some number between 10-100, unfortunately I haven't seen standardized guidelines. Thanks for pointing that out.. it is actually a mistake on my part. Should have specified the normed_counts layer. Stat vs other metrics for GSEA is another one of those things that hasn't really been standardized. I like stat more now because it doesn't require an extra line of code and it doesn't seem to be inflated as badly as p-value can be sometimes. I would expect the actual results to not differ much though. Wow, thanks for pointing out pre_res.res2d xD

@sanbomics Год назад

I'm going to pin this message because it has several important points. Thanks again for commenting!

@burcakotlu7858 Год назад

@@sanbomics I also liked the content. First of all, thanks for it. Regarding PCA, when I apply vst and then PCA in R, I got more meaningful PCA plot than the scanpy PCA without any transformation beforehand for the same bulk RNA-seq data at hand.

@burcakotlu7858 Год назад

@@sanbomics By the way, I got error for GSEA. gp.prerank(rnk=ranking, gene_sets=['GO_Biological_Process_2023', my_gene_set], seed = 6, permutation_num=100) givesTypeError: ufunc 'isinf' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe'' on my side.

@heatherpeng7437 9 месяцев назад

Hello, thank you so much for this great, informative tutorial! I'd like to ask what does the PCA plot mean in your example? Thank you.

@mikezhao5294 Год назад

Your video is super helpful! I am wondering is any pipeline for differential expression analysis for ST data, or we just use SC pipeline? Thanks!!

@AbelDavid-qc5xy 2 месяца назад

Thank you for this very helpful tutorial. is there a function for plotting the euclidean distance map for each sample?

@analeighgui4693 Год назад

Thx a million for the video! I have a general question about Python vs. R. For people new to bioinformatics, R is usually the lang they start with. Many cool packages are written in Python (and more and more). Do you see benefits for bioinformatics newbies to adopt python? You have great videos on both langs, so I want to see your opinion.

@sanbomics Год назад

You'll need to learn at least a little of both. But Python is the future in my opinion. I did a video on this a while back: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-vNH3HejAwSE.html

@analeighgui4693 Год назад

@@sanbomics Thanks for sharing! I rely heavily on R but have come across more and more often Python packages, especially in VDJ analysis. Got to seriously pick up the Python now!

@juanjara2221 5 месяцев назад

where can I download count table? in the GSE accesion we only have normalized counts or raw data

@mocabeentrill Год назад

Been a while man! Hope you've been well.

@sanbomics Год назад

Been busy with werk and haven't been able to do one. But doing well, thanks!

@marwanmohamed3844 5 месяцев назад

Hey! Many thanks for this tutorial! i got a questions regarding the design factors, am interested in DE to check effect of AGE, while i want to control for SEX and also want to test for there interaction , do you know how to implement the interaction term in the design factors? design_factors. = ["Age","Sex","Age x Sex"],? continuous_factors=["AGE"]

@vidalmrc 5 месяцев назад

Hey! Many thanks for this tutorial! I don't know if something changed in the PyDeseq2 API but the line "stat_res.summary()" triggers this error: "IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed" I have spent about 1h now trying to troubleshoot it without any success. I use the "count_table_for_deseq_example.csv" from your github, and I tried going throught the official "Getting Started" to end up with the same error... Any idea what's going on ?

@sanbomics 5 месяцев назад

They have made a lot of changes since I made this video, to the point where I am thinking about doing a new one. I'm not sure I have seen that error at that specific point before, but I have had that same error somewhere else in the workflow. Off the top of my head I can't remember what I did to fix it. It may be that something is off in the dds object. Hard to say without taking a closer look. Let me know if you figure it out

@vidalmrc 5 месяцев назад

@@sanbomics Thanks for your reply, and your videos in general!

@lly6115 Год назад

A quick simple question, do you know if pydeseq2 is capable of processing multiple groups? since I haven't being able to find that in its official docs.

@sanbomics Год назад

What exactly are you trying to do?

@lly6115 Год назад

@@sanbomics Thank you for your reply. Let's say I have RNA counts data for 9 samples, including control(3 sample), treatment1(3 samples), treatment2(3 samples). Is it possible to transform 9 columns of counts data to expression matix data using pydeseq2. Could be a blatant rookie question( so ignore this would be totally understandable). Thank you very much for your time.

@JAIRAJMATHUR Год назад

@@lly6115 @sanbomics I have been able to convert it into a single condition column, by joining the strings. Then I can specify which contrast I want to run and get my results.

@NewPlant_ 4 месяца назад

Hey man, I'm trying to use this video to help me with my own work but during the GSEA part where you are using the gp.prerank function, I keep getting a value error. ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements. I've done everything you have and my dataframe has two columns just like yours, I don't know why it's expecting 3.

@sanbomics 3 месяца назад

It might be related to the service being down. It's an API call to their server and when it's down it returns not what is expected and you get a weird error. At least I know I have had to deal with this in the past

@begomorras 10 месяцев назад

My dataset is just samples and doesn't have any control. I have made the metadata dataframe with all S in the conditions, as this is my design but then the prgram spits an error and says : ValueError: Factors should take at least two values, but Condition takes the single value '['S']'. I know that in R the program can be bypassed by writting " design =~1 " and I would like to get help on how to do so in python Thanks in advance

@sanbomics 10 месяцев назад

You don't need a control, but you do need more than one group to test. You need to specify the groups in the condition column

@Shin-ct1ts 4 месяца назад

Thank you for your great introduction. I have the questions in GSEA part. Though I could get ranking and wanna do all gene analysis in 9:40 min and always get the error like below. ValueError: Length mismatch: Expected axis has 3 elements, new values have 2 elements How can I solve this problem...?

@sanbomics 2 месяца назад

Sometimes the server is down. Were you able to get it working?

@chunliang8290 7 месяцев назад

I am using your sanbomics. Unfortunately, you only has human and mouse as the species. I work with chicken data. What is your suggestion for alternative solution here?

@sanbomics 6 месяцев назад

I would stick to ensembl/biomart for now

@user-fn6fl4jl4x 8 месяцев назад

Hi, can i compare 4 groups(you use 2 groups)? Or DESeq2 only for two? Thanks for answer.

@sanbomics 8 месяцев назад

You can only do pairwise comparisons. But if you want to group them you can. for example, 3 vs 1. Or 2 vs 2. Very dependent on what you are trying to accomplish.

@paulavazquez5135 9 месяцев назад

Thanks for the great tutorial. How can I get a data frame with the normalized counts for each gene in each sample?

@sanbomics 9 месяцев назад

Without checking.. I think it is in one of the layers of the dds object. Something like dds.layers['norm_counts'}

@chunliang8290 7 месяцев назад

Can you install pyDESeq2 into your google drive. Then, later, you just need to re-mount your google drive so that the pyDESeq2 will be available next time you use it without re-installation?

@sanbomics 6 месяцев назад

No idea, I haven't tried this. Sorry!

@ustigergirl 10 месяцев назад

at the 'mapper' stage if your species is mouse do you change it to 'mouse'? What would be the correct format?

@sanbomics 10 месяцев назад

just make sure to specify species = mouse when you initialize the mapper the first time. Then you can use it over and over again without specifying

@TakaTatsumi Год назад

thanks !

@sanbomics Год назад

You're welcome!

@gerolduntergasser4000 Год назад

cool

@mehwishshoaib1591 10 месяцев назад

When define condition colum you jave 8 samples for which yoù put C and RS manually what to do when we have more no of samples for exapmle first 50 are diseased and last 40 are normal for that how to define condition column

@sanbomics 9 месяцев назад

You can do ['disease'] * 50 + ['normal']*40 or you can import the metadata from csv

@imranluqman4622 8 месяцев назад

Hey! I got an error saying 'DeseqStats.__init__() got an unexpected keyword argument 'n_cpus''. Could it be that it's an old argument? Thank you!

@sanbomics 8 месяцев назад

Yeah possibly, it was a brand new package and they have been making a lot of changes. For example, they changed "clinical" to "metadata" since i've made the video too

@irvingfrancisco6822 6 месяцев назад

Hey, do you know what is the new argument for 'n_cups'? I have the same issue

@nicowerschler9596 Месяц назад

stat_res = DeseqStats(dds, contrast = ('Condition','XX','YY'))

@DrRam_Undi 4 месяца назад

Thank you for the video tutorial. I am not able to install pydeseq2 modules using jupyter notebook. Could you provide me any details on how to solve this? I am getting an error like this : ModuleNotFoundError: No module named 'scipy.stats._boost.nct_ufunc'. Thank you

@sanbomics 2 месяца назад

I would try making a new environment with python=3.9 and reinstall your packages. Try to install them all in one command so that pip can do a better job avoiding conflicts: pip install a b c d e f ...

@NadaHashimOsman 7 месяцев назад

what would be the difference in the code it i have "normalized" paired data? whick steps should i neglect , whick steps should be edited to fit my case

@sanbomics 6 месяцев назад

If the data are already normalized you may not be able to use Deseq. I think limma works for that

@jousty6696 7 месяцев назад

At 4:13 Name error: name ‘DeseqDataSet_init_’ is not definied

@serychristianrenaud Год назад

thanks

@sanbomics Год назад

No problem!

@MegaLiaoliao 9 месяцев назад

ValueError: The count matrix should only contain integers at 4:00 min

@sanbomics 9 месяцев назад

Try converting the whole matrix to int and see if you get an error

@AbelDavid-qc5xy 2 месяца назад

Can you cover batch correction in python?

@sasimuo 5 месяцев назад

Thanks for your video . Sometimes I get a value error when I run dds.deseq2() , ValueError: The first guess on the deviance function returned a nan. This could be a boundary problem and should be reported. how do I fix it?

@sanbomics 5 месяцев назад

Hard to say without looking more closely. See if you filter more lowly expressed genes out if that gets rid of the error. For example, get rid of genes that aren't expressed in at least two samples

@sasimuo 5 месяцев назад

@@sanbomics Thank you. when I filter (counts = counts[counts.sum(axis = 1) > 5] ), it helps. Sometimes I play with different counts sum value. However, if I get too high of counts sum value, I get another error which says perfect partition error. is there any parameters I could use in dds.deseq2() function to force to run that overcomes any error? Thanks

@adekunleajiboye1244 Год назад

Thanks for this video. I was not getting the one in R. How do I get your contact?

@sanbomics Год назад

You can send me a message on twitter!

@ustigergirl 10 месяцев назад

can your sanbomic script/package be used for cuffdiff results (cufflink tool)?

@sanbomics 9 месяцев назад

Which part? The volcano plot?

@ustigergirl 8 месяцев назад

@@sanbomics sorry for the late reply, for heatmap and volcano plot. Thanks!

@LeticiaRojasC Год назад

Can pyDESeq2 be use for single-cell RNAseq instead of bulk?

@sanbomics Год назад

Yup! I actually just did a pseudobulk video ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-Ee0PQUwVH8Q.html

@Dr.UgurComlekcioglu 10 месяцев назад

Thank you very much for this great video. How can I get gene symbols of "Cow"?

@sanbomics 10 месяцев назад

You are doing DE analysis from cow samples? Interesting. Check ensembl for the species and get the gtf. The gtf will have ensembl id and gene symbol, you just have to parse the file

@Dr.UgurComlekcioglu 9 месяцев назад

@@sanbomics Thank you very much. I did it, and it worked.

@user-rb2pt1zk1s 5 месяцев назад

Hello , Can you please share the source of the dataset ? which article did you use pleaaaase

@sanbomics 5 месяцев назад

Ooops I forgot to put that in the description. Thanks! The samples include normal human cell control and replicative senescence cells from NCBI accession GSE171663

@alexrutherford3105 10 месяцев назад

do you have a link to your sample count_table.csv ? thanks!

@sanbomics 10 месяцев назад

There is a counts table on my github, but I don't remember if it is the same one I used in this video