Nice Dan-o! And good to see you on the Tube! At 15:37 you say that the pairwise distances used in Mothur are not feasible for big data. That is not correct; they are only infeasible in QIIME. In Mothur you dereplicate your data first using the unique.seqs command (among other things) before clustering. This works great with Illumina sequences.
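To make the dereplication point concrete, here is a minimal sketch of the idea only; it is not Mothur's actual implementation. It collapses identical reads into unique sequences with counts, so pairwise distances only need to be computed among the unique sequences. The function name `dereplicate` and the toy reads are hypothetical, made up for illustration.

```python
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (sequence, count) pairs, most abundant first."""
    # Case-insensitive comparison is an assumption made for this sketch.
    counts = Counter(seq.upper() for seq in reads)
    # Sort by descending abundance, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def n_pairs(n):
    """Number of pairwise comparisons among n sequences."""
    return n * (n - 1) // 2


# Toy reads (hypothetical data, not from any real run).
reads = ["ACGT", "ACGT", "ACGA", "acgt", "TTGA", "ACGA"]
unique = dereplicate(reads)

print(unique)
print(n_pairs(len(reads)), "pairs before,", n_pairs(len(unique)), "pairs after")
```

On real amplicon data many reads are exact duplicates, so the number of pairwise comparisons can shrink a lot after this step, which is the point being made about clustering.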
Thanks. Your videos are really helpful. I guess this video was made before unoise was added to the usearch package. I would really like to know what you think of the unoise3 feature of usearch. I much prefer that idea to the 97% cluster_otus.
Hi Dr Knights, thank you for providing this useful lecture. I am just wondering where I can find the figure from the EMP showing the percentages that do not hit the database for different habitats. I searched the EMP website, but the figure seems to be buried deep. Thank you!
vsearch is now recommended by the Brazilian Microbiome Project pipeline (BMP pipeline). It uses full dynamic programming and 64-bit memory allocation. I think it is better for larger datasets.
For ref-based we use either NINJA-OPS (journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004658), because it's consistently faster and more accurate than the other ref-based pickers, or our 100% optimal (best-hit-every-time) ref-based picker that is in the works. For de novo we are still looking for good solutions; nothing out there currently has really nailed it, IMO.