2024 updated single-cell guide - Part 1: RNA preprocessing and quality control

This is a comprehensive tutorial on the most up-to-date recommendations for single-cell sequencing analysis. This is part 1 of a multi-part series. Here I download a dataset, remove background RNA, perform quality control, and remove low-quality cells.
Part 2 will cover dimension reduction and cell annotation. We will eventually get to in-depth analysis and scATAC analysis.
Notebook:
github.com/mousepixels/sanbom...
Paper/dataset:
www.cell.com/cancer-cell/full...
Reference:
www.sc-best-practices.org/pre...
0:00 Intro
0:27 Setup
12:08 Cellbender
18:20 QC
28:05 preprocessing
39:42 Conclusions

Science

Published: 26 Jun 2024

Comments: 70

@007ZK 2 months ago

Amazing series idea. I hope they keep coming.

@sanbomics 2 months ago

Hopefully next week!

@yaseminsucu416 2 months ago

You rock! Thank you for doing this, looking forward to following this series!

@jonathanback5731 2 months ago

Your work is fantastic, great content!

@DuqueVJ 2 months ago

Amazing! Thanks very much for the tutorial, I'm learning a lot!

@0996Winglet-mq4on 2 months ago

Really appreciate your videos 🎉❤ Cannot wait to see a spatial omics tutorial in the future 😊

@sanbomics 2 months ago

Right now I am eagerly waiting for some interesting datasets with newer, higher-resolution technology than Visium.

@supakornpongpakdee1544 2 months ago

Thank you very much for creating this tutorial! Looking forward to the next lessons! 😊❤

@lly6115 2 months ago

Good to see you back 😊 and thank you for your update

@sanbomics 2 months ago

Yeah sorry I have been busy! Shouldn't be as long between the next few videos.

@babyfriedrice4878 2 months ago

i love sanbomics so much!!!!!!!!!!!!!!!!!!!

@sanbomics 2 months ago

I love you too!

@piroDYMSUS 2 months ago

Amazing work, hope we will see the second part soon

@sanbomics 2 months ago

Trying to release in the next week or two!

@ykoy1577 2 months ago

I was waiting for your video. Your videos are so helpful for a beginner like me. Thank you so much for sharing your knowledge and experience.

@avp300 2 months ago

This is brilliant! Can't wait for part two!! The ridge plot looks awesome! Thank you Mark! :-)

@sanbomics 2 months ago

Tomorrow hopefully!

@jackmineeechen4380 2 months ago

I started with the video comparing different integration methods. That one really helped me! I eventually chose scanorama for my dataset, which worked out. Looking forward to this series! I appreciate your videos!

@caspase888 2 months ago

I look forward to your videos. Your grasp on the subject and the ability to teach are amazing. Thanks a lot 👍🏻

@sanbomics 2 months ago

Thank you! :)

@dardas15 1 month ago

This is fantastic and really helps people with limited bioinformatics background to independently analyze data. Thanks so much for making these videos, I've been using them with Python ever since you shared them a few years ago!

@MrJordi94 2 months ago

You truly are an inspiration for RNA-seq! Love your videos and your communication skills. Hope to see the rest of the 2024 tutorial soon :D

@sanbomics 2 months ago

Thank you

@laloulymounia9266 2 months ago

Thx for the update!

@jianhuacao7180 2 months ago

welcome back, bro. Your channel is better than before.

@sanbomics 2 months ago

Thanks! I am trying to continually improve the quality and make videos people are actually interested in.

@moonmoun2983 2 months ago

Waiting impatiently for the next part

@sanbomics 2 months ago

Wait no further! :)

@alexeyryzhenkov7579 1 month ago

Thank you for your work!

@sanbomics 1 month ago

Thank you so much!!! Really appreciate it! :)

@brunovinagre427 2 months ago

Grateful, Mark!!

@user-ne7vm7fb3y 2 months ago

You were great.

@taoufikbensellak9274 2 months ago

I just started your sc guide and I really enjoy it. Just for some clarifications about the tools, I use mamba (conda) with python 3.8 and a lower version of pandas (

@sanbomics 27 days ago

I'll be doing DE using a different approach this time which should give people fewer issues. Diffxpy can be a struggle so I don't really use it anymore

@gerolduntergasser4000 2 months ago

cool good job 😁

@MinnnWang-uv8bn 2 months ago

🎉🎉🎉 thanks!

@kristifourie8427 2 months ago

best page ever

@sanbomics 2 months ago

Thank you :)

@islemgammoudi842 2 months ago

Thanks for the videos. Currently, I'm embarking on the journey of analyzing single-cell RNA sequencing (scRNA-seq) data combined with CITE-seq data. However, I'm facing challenges related to duplicate discrimination and assigning sub-samples via hashtags. Given your expertise in this area, I was hoping you could provide some guidance and advice on how to navigate these challenges effectively.

@moonmoun2983 2 months ago

I would like to thank you immensely because yours is one of the few bioinfo channels I can follow along with. I have a question regarding a result I obtained from following the previous full scRNA-seq walkthrough you posted a year ago. I tried applying the code to a before-and-after chemotherapy treatment dataset. Everything worked perfectly until I got to the DEG analysis part with heat maps. With the 25 top upregulated and downregulated genes and the filtering code, it didn't yield more than 12 DEGs, so I had to relax the filtering and kept genes with significant fold change above 0.05. I ended up with more differentially expressed genes; however, in both cases my heat map was devoid of pattern, and both the condition and control looked mostly downregulated. Should I conclude that there are no DEGs or expression signatures in the cancer samples before and after treatment? I ask because the original paper I took my data from didn't do a DEG analysis for the whole dataset but selected 4 patients out of 12 to create a DEG heatmap with fewer than 10 genes. Thank you, I'd highly appreciate your insight on my results.

@sanbomics 2 months ago

It's really hard to say without knowing more and actually getting a feel for the data. You can try a pseudobulk approach and see if you have any DEGs. I have a video on that, but I will also be covering it soon in the new tutorial series.
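
For readers unfamiliar with the term: a pseudobulk approach sums the raw counts of all cells from the same sample (often per cell type as well), so that each sample acts as one replicate in a bulk-style differential expression test. The sketch below only illustrates the aggregation step on toy data; the sample and gene names are hypothetical, and it is not the code from the video.

```python
from collections import defaultdict

def pseudobulk(cells):
    """Sum raw counts over all cells that share a sample ID.

    `cells` is an iterable of (sample_id, {gene: raw_count}) pairs.
    Returns {sample_id: {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample_id, counts in cells:
        for gene, n in counts.items():
            totals[sample_id][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(genes) for sample, genes in totals.items()}

# Hypothetical toy data: one patient, before and after treatment.
cells = [
    ("patient1_pre", {"GENE_A": 3, "GENE_B": 0}),
    ("patient1_pre", {"GENE_A": 1, "GENE_B": 2}),
    ("patient1_post", {"GENE_A": 0, "GENE_B": 5}),
]
bulk = pseudobulk(cells)
```

The resulting per-sample count table is what would then be passed to a bulk DE tool; with only a handful of patients, the number of detectable DEGs may still be small.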

@fsh9134 25 days ago

Thanks for making very useful videos. I was wondering if you would like to make a video related to single-cell analysis using Julius AI, a data analysis AI.

@mehdiraouine2979 1 month ago

Amazing work as always! On a side note, if I were to download FASTQ data from GEO with no specification in the paper of whether the adapters were removed or not, how should I check in Python whether they were removed?

@sanbomics 1 month ago

I wouldn't use Python for this, only because several command-line tools, such as cutadapt, can do the same thing much faster.
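
If a quick look from Python is still wanted before running a dedicated tool, a rough sanity check is to count how many of the first reads contain a known adapter prefix. This is a minimal sketch, not a replacement for cutadapt: it assumes plain four-line FASTQ records, and the default adapter string (the start of the common Illumina TruSeq adapter) is an assumption that may not match the library in question.

```python
import gzip
import itertools

# Assumption: first 13 bases of the common Illumina TruSeq adapter.
ADAPTER = "AGATCGGAAGAGC"

def adapter_fraction(fastq_path, adapter=ADAPTER, max_reads=100_000):
    """Return the fraction of the first `max_reads` reads containing `adapter`."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        # In a four-line FASTQ record the sequence is the second line.
        seqs = itertools.islice(handle, 1, None, 4)
        hits = total = 0
        for seq in itertools.islice(seqs, max_reads):
            total += 1
            hits += adapter in seq
    return hits / total if total else 0.0
```

A fraction near zero suggests the reads were already trimmed (or never contained this adapter); a clearly non-zero fraction suggests trimming is still needed.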

@555gong9 2 months ago

Thank you for such a great video. Which is better for removing doublets, doubletdetection or the previous scVI method?

@sanbomics 2 months ago

I haven't done or seen a comparison between the two. The best would probably be to run both and see how they overlap. All I can say is that doubletdetection is easier and faster.

@555gong9 2 months ago

Thank you for your advice, I will try it next, thank you very much, my superhero.

@CaveCrack 2 months ago

Thanks for the great video and series. I have a question at around 36:40 on how to interpret the graph. If the experiment had loaded, say, 14000 cells, it appears that around 8000 would be recovered, which I assume we would interpret as the number called by cellranger... For 14000 cells loaded the multiplet rate appears to be 6%, 6% of 14000 being 840 expected multiplets. However, all the blue recovery dots are aligned around 4.5%. 4.5% of 8000 would be only 360 expected multiplets. The document from which the graph is extracted says "Generally an increased number of cells per sample will increase the doublet rate". I've not been able to find clarification. Thank you

@CaveCrack 2 months ago

Also, I am wondering if your low number of detected doublets at 1e-16 was due to the previous QC step where you exclude cells with the highest log1p_total_counts and log1p_n_genes_by_counts, as these could filter a lot of doublets.

@sanbomics 2 months ago

I think in this case just ignore the blue line. The more cells you load, the higher the multiplet rate and the more total multiplets you will have.
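
The arithmetic in the question can be reproduced directly. The rates and cell numbers below are the commenter's readings of the graph, not official values, and the function name is only illustrative.

```python
def expected_multiplets(n_cells, multiplet_rate):
    """Expected number of multiplets for a given cell count and rate."""
    return round(n_cells * multiplet_rate)

# Readings from the question: 6% at 14000 loaded, 4.5% at 8000 recovered.
from_loaded = expected_multiplets(14_000, 0.06)
from_recovered = expected_multiplets(8_000, 0.045)
print(from_loaded, from_recovered)  # prints: 840 360
```

The two estimates differ by more than a factor of two, which is why it matters which curve and which cell count the rate is applied to.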

@sanbomics 2 months ago

Exactly, it's hard to say exactly what percent the multiplets are because of the first step. I think I mention it in the video briefly... or at least I thought it
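
To illustrate why an upper-tail QC filter can remove doublets before doublet detection runs: a common rule, described in the sc-best-practices reference, flags cells whose log-transformed counts lie more than a few median absolute deviations (MADs) from the median. The sketch below assumes a 5-MAD threshold and uses made-up counts; the exact thresholds used in the video are not stated on this page.

```python
import math
from statistics import median

def mad_outliers(values, n_mads=5):
    """Flag values more than `n_mads` MADs away from the median."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return [abs(v - med) > n_mads * mad for v in values]

# Hypothetical total counts per cell; the last cell has far more counts
# than the rest, as a doublet or multiplet often would.
total_counts = [900, 1000, 1100, 1200, 1000, 950, 1050, 60000]
log_counts = [math.log1p(c) for c in total_counts]
flags = mad_outliers(log_counts)
```

Only the last cell is flagged here. Cells removed this way never reach the doublet caller, so the doublet count it reports afterwards underestimates the true multiplet fraction.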

@goddyhong 2 months ago

Thx for sharing! If I use a filtered matrix for analysis, do I still need to remove the background RNA? Since I don't have a 4090 🤣

@sanbomics 2 months ago

If you have a filtered matrix you can't remove background RNA. But if it's just a time thing, you can use your CPUs with SoupX. I have another video on that. If you only have filtered counts, you are stuck with what you have!

@mehdiraouine2979 1 month ago

Another question: if you were to choose between the scVI model for detecting doublets and this clf doubletdetection method, which one is more straightforward? I feel like this method needs some tinkering around depending on the specific dataset.

@sanbomics 1 month ago

The best method would be to use multiple methods. They will all give you slightly different results but hopefully have significant overlap. The reason I used doubletdetection here is because it is fast/simple and I already have multiple video tutorials on SOLO (scVI). It's hard to say which is more accurate. Changing parameters in scvi/SOLO will likely change the results a lot too, just like what happened here.
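
Checking how two doublet callers overlap needs nothing more than set operations on the barcodes each one flags. This is a minimal sketch with hypothetical barcodes; it does not call doubletdetection or SOLO, it only compares their outputs once you have them.

```python
def overlap_summary(calls_a, calls_b):
    """Summarise agreement between two collections of doublet barcodes."""
    a, b = set(calls_a), set(calls_b)
    both, union = a & b, a | b
    return {
        "both": len(both),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Jaccard index: shared calls divided by all calls from either method.
        "jaccard": len(both) / len(union) if union else 0.0,
    }

# Hypothetical barcodes flagged by each method.
doubletdetection_calls = {"AAAC-1", "AAAG-1", "AACT-1"}
solo_calls = {"AAAG-1", "AACT-1", "AAGG-1"}
summary = overlap_summary(doubletdetection_calls, solo_calls)
```

Whether to remove the union or only the intersection of the calls is a judgment call that depends on how conservative the analysis needs to be.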

@abellopez8017 2 months ago

Hello! Thanks for the video. I will begin my PhD in Bioinformatics in August, what computer do you have?

@sanbomics 2 months ago

Well.. at home I have a 32 vCPU, 128 GB RAM, RTX 4090. At work I have a 64 CPU, 256 GB RAM, RTX 4090. Sometimes I have to use AWS when I need more than that. Depending on what you plan to do it can vary a lot.

@frutitadelosmares 6 days ago

Hi! Thanks so much for such a great tutorial! I have a naïve question from someone who just started in this world: when raw data is not available, for example, you can only download normalised filtered values, do you skip the pre-processing step? Is it correct to pre-process normalised values, let's say TMM? Again, thanks so much for all the videos!

@sanbomics 9 hours ago

Yeah if there are no raw counts then you will have to skip the ambient removal. Unfortunately, this is the only way sometimes.

@caspase888 15 days ago

Your videos are amazing. Thanks a lot. Could I use a 3050 with 64 GB RAM for this kind of analysis? Thanks a lot.

@sanbomics 9 hours ago

You can do a decent number of cells with 64 GB RAM. I would think you could handle around ~200k in memory at the same time without too many issues. Some steps/algorithms use a lot more memory though, so it is highly dependent on what you do. In my experience 64 GB won't be enough for large datasets/atlases but you can def do small numbers of samples.
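
A back-of-the-envelope estimate shows why memory use depends so much on the step. The sketch below compares a dense float32 matrix with a CSR sparse matrix for 200k cells; the gene count (30,000) and the fraction of non-zero entries (5%) are assumptions chosen for illustration, not figures from the video.

```python
def dense_gb(n_cells, n_genes, bytes_per_value=4):
    """Memory of a dense cells x genes matrix in GB (float32 by default)."""
    return n_cells * n_genes * bytes_per_value / 1e9

def sparse_csr_gb(n_cells, n_genes, density,
                  bytes_per_value=4, bytes_per_index=4):
    """Approximate memory of a CSR matrix in GB.

    CSR stores one value and one column index per non-zero entry,
    plus n_cells + 1 row pointers.
    """
    nnz = n_cells * n_genes * density
    total_bytes = (nnz * (bytes_per_value + bytes_per_index)
                   + (n_cells + 1) * bytes_per_index)
    return total_bytes / 1e9

# Assumed dimensions: 200k cells, 30k genes, 5% non-zero entries.
dense = dense_gb(200_000, 30_000)             # 24.0 GB
sparse = sparse_csr_gb(200_000, 30_000, 0.05) # about 2.4 GB
```

Under these assumptions the sparse counts fit easily in 64 GB, but any step that densifies the full matrix, or keeps several copies of it, can consume a large share of the available memory.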

@AP-vo7gp 2 months ago

Sir, I have a count matrix and want to generate an annotation matrix out of it, then do the batch correction and then DGA. Please help with the process, as I am not getting suitable results.

@sanbomics 27 days ago

Hi, it is hard for me to help without knowing more specifics and what issue you are having.

@AP-vo7gp 26 days ago

@sanbomics thanks a lot sir, I was able to do it :)