So transcription still won't be fully halted by methylation? What about during the differentiation stage? How does a cell avoid expressing another cell type's genes if methylation doesn't fully halt transcription?
The point of Salmon and RSEM is to get transcript counts, not gene counts. If you want gene counts you can simply use featureCounts. Both tools use error models and an estimation algorithm to distribute reads that can align to several transcripts of a gene. This becomes less of an issue with long reads, but it is still there because some transcripts are very similar to, or subsets of, a longer transcript.
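To make the "estimation algorithm" in the comment above concrete, here is a minimal sketch of an expectation-maximization (EM) loop that splits ambiguous reads among transcripts. It is a simplification, not the actual Salmon or RSEM implementation: it ignores error models, fragment-length distributions, and bias correction, and the function name, the input format, and the toy data are all assumptions made for illustration.

```python
def em_transcript_counts(read_compat, eff_lengths, n_iter=200):
    """Estimate per-transcript read counts with a simplified EM.

    read_compat: list of lists; each inner list holds the transcript ids
                 one read is compatible with (hypothetical input format).
    eff_lengths: dict mapping transcript id -> effective length.
    """
    transcripts = sorted(eff_lengths)
    # Start from a uniform guess of the fraction of reads per transcript.
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    counts = {t: 0.0 for t in transcripts}
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        # E-step: split each read among its compatible transcripts in
        # proportion to current abundance per unit of effective length.
        for compat in read_compat:
            weights = {t: theta[t] / eff_lengths[t] for t in compat}
            total = sum(weights.values())
            for t, w in weights.items():
                counts[t] += w / total
        # M-step: update read fractions from the expected counts.
        n_reads = sum(counts.values())
        theta = {t: counts[t] / n_reads for t in transcripts}
    return counts


# Toy data: 30 reads unique to A, 10 unique to B, 20 compatible with both.
reads = [["A"]] * 30 + [["B"]] * 10 + [["A", "B"]] * 20
counts = em_transcript_counts(reads, {"A": 1000.0, "B": 1000.0})
print(counts)
```

With equal effective lengths, the 20 ambiguous reads end up split 3:1, matching the ratio of uniquely assigned reads, so the estimates settle at about 45 reads for A and 15 for B.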
Salmon has two running modes. The most canonical is the pseudo-alignment mode, where you first generate an index of the known isoforms (the transcript sequences from your reference genome and annotation file). You can then generate a count matrix using the raw reads as input. This takes more memory. In the other mode, Salmon takes alignments already determined by an aligner, say STAR, or minimap2 for long reads, and simply adjusts the counts to account for multi-mapping reads, since many reads will overlap several isoforms of a gene. For this mode it is important to have a BAM in which the reads were aligned to the transcriptome, not the genome. STAR will project your genomic alignments onto the transcriptome for you, but if you use minimap2, make sure to use the transcriptome reference as the index.
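As a rough illustration of the two modes described above, the sketch below only assembles the command lines and does not run anything. The file names and helper function names are hypothetical, and the library type is left on automatic detection (`-l A`) as an assumption. For the alignment-based mode, STAR can write transcriptome-projected alignments when run with `--quantMode TranscriptomeSAM`.

```python
import shlex


def salmon_mapping_mode(transcripts_fa, index_dir, reads_1, reads_2, out_dir):
    """Commands for the mode that starts from raw reads:
    build an index of the transcript sequences, then quantify."""
    return [
        ["salmon", "index", "-t", transcripts_fa, "-i", index_dir],
        ["salmon", "quant", "-i", index_dir, "-l", "A",
         "-1", reads_1, "-2", reads_2, "-o", out_dir],
    ]


def salmon_alignment_mode(transcripts_fa, transcriptome_bam, out_dir):
    """Command for the mode that starts from an existing BAM.
    The BAM must hold alignments to the transcriptome, not the genome."""
    return ["salmon", "quant", "-t", transcripts_fa, "-l", "A",
            "-a", transcriptome_bam, "-o", out_dir]


# Hypothetical file names, for illustration only.
for cmd in salmon_mapping_mode("transcripts.fa", "salmon_index",
                               "sample_R1.fastq.gz", "sample_R2.fastq.gz",
                               "quant_from_reads"):
    print(shlex.join(cmd))
print(shlex.join(salmon_alignment_mode(
    "transcripts.fa", "Aligned.toTranscriptome.out.bam", "quant_from_bam")))
```

The printed lines can be pasted into a shell, or the lists passed to `subprocess.run`, once the paths are replaced with real ones.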
Good work. I wish there were free online courses specifically on RStudio and on analyzing a dataset of peripheral blood mononuclear cells (PBMC). Thank you, Prof.
Thank you for making these videos freely available to the public! I have a question and I hope you can see this: when comparing Hi-C experiments, how are the data normalized for read depth? Are there recommended packages to use?
Dear Professor Liu, I am learning about RSEM for RNA-seq analysis. I downloaded the genome (FASTA) and its annotation (GFF) for a non-model organism from the JGI (Joint Genome Institute) database, and want to use them as the reference for RNA-seq analysis with the STAR + RSEM strategy. On my macOS, the command was as below: rsem-prepare-reference --gff3 /ref/Aurli1_GeneCatalog_genes_20120618.gff \ --star \ --star-path /Users/xinjun/miniconda3/envs/rnaseq/bin/star \ /ref/Aurli1_AssemblyScaffolds.fasta \ /Auran_RSEM/auran_rsem However, the error message “Invalid number of arguments!” always occurs. How can I resolve this? Thanks so much.
Thank you for this material! I just wanted to ask, is this material the same as in STAT115 2020, or should I go through both playlists to cover the whole course?
Hi Dr. Liu, thank you for this lecture. I'm wondering whether the increased dispersion is due to genes having different propensities to be "off" (almost no counts) versus being "expressed" (whatever that means in contrast), and whether this means that it would be better to model genes with low expression differently from genes with high expression. In other words, maybe the uni-modal NB distribution is inaccurate, and a bimodal distribution would be more appropriate. I wonder what your thoughts on that are. Thank you!
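The intuition in the comment above can be checked with a small calculation. The sketch below assumes a hypothetical two-state model that is not from the lecture: with probability `p_off` a gene is fully off (count 0), and otherwise its counts follow a negative binomial with mean `mean_on` and variance `mean + dispersion * mean**2`. It then asks what dispersion a single NB fit would appear to need to match the mixture's mean and variance. All names are illustrative.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial in the mean/dispersion form."""
    return mean + dispersion * mean ** 2


def on_off_mixture_moments(p_off, mean_on, dispersion_on):
    """Mean and variance of a mixture: 0 with prob. p_off, else NB."""
    mean = (1 - p_off) * mean_on
    # Law of total variance: within-state plus between-state variance.
    within = (1 - p_off) * nb_variance(mean_on, dispersion_on)
    between = p_off * (1 - p_off) * mean_on ** 2
    return mean, within + between


def apparent_dispersion(mean, variance):
    """Dispersion a single NB would need to reproduce these moments."""
    return (variance - mean) / mean ** 2


mean, var = on_off_mixture_moments(p_off=0.2, mean_on=100.0, dispersion_on=0.1)
print(mean, var, apparent_dispersion(mean, var))
```

Under these assumptions the apparent dispersion works out to `(dispersion_on + p_off) / (1 - p_off)`, so an "off" state by itself inflates the dispersion a unimodal NB fit would report, here from 0.1 to 0.375.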