Bianca Brown began the hard work of collating the scripts our lab uses to process fastq data from its diverse Illumina amplicon projects. These strategies, and a draft explanation of why we use different "flavors" of these approaches for different projects, are provided here.
Modules included in the tutorial are "cutadapt," "dada2," and "R," with some references to "Obitools" and Brown University's supercomputing cluster "Oscar."
Many of the steps and principles of these workflows are identical -- we want to thoughtfully prepare our data for analysis and remove errors -- but a few of the nuts and bolts differ. Most often, these differences arise from whether a project used single-end sequence data (formerly common) or paired-end sequence data (now standard in the lab). The approaches also differ depending on whether the amplicons are essentially invariable in length (e.g., 16S-V4 rRNA or COI markers) or vary considerably in length (e.g., trnL-P6 markers).
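As a rough illustration of how these two distinctions combine into a workflow "flavor," here is a minimal Python sketch. The class, the function, and the labels are hypothetical names made up for this example and are not part of the lab's scripts. The read types assigned to the example projects are also assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Project:
    """Hypothetical description of an amplicon project (illustration only)."""
    name: str
    paired_end: bool       # True for paired-end reads, False for single-end
    variable_length: bool  # True if amplicon length varies considerably


def workflow_flavor(project: Project) -> str:
    """Return a label combining the two axes that distinguish the workflows."""
    reads = "paired-end" if project.paired_end else "single-end"
    length = "variable-length" if project.variable_length else "fixed-length"
    return f"{reads}/{length}"


# Example projects; the paired_end values are assumed, not taken from real runs.
examples = [
    Project("16S-V4 rRNA", paired_end=True, variable_length=False),
    Project("trnL-P6", paired_end=True, variable_length=True),
]

for p in examples:
    print(p.name, "->", workflow_flavor(p))
```

The point of the sketch is only that read type and amplicon length variation are independent choices, so a project is described by one value on each axis.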
For members of Brown University seeking to run parts of these modules on Oscar, Bianca has very kindly provided some blank bash scripts that can get you started here.
NB: This compilation of scripts is a work in progress. We are aware of necessary updates and improvements, and we intend to push them soon. We'll add posts describing any substantial updates in the future, and we welcome feedback.
We also wish to express our appreciation to all of the authors of the software that we use and cite in our work.
Computational resources were kindly contributed and explained by members of our community.