Conservation & Molecular Ecology @Brown
  • Home
  • People
  • Publications
  • Research
  • Conservation
  • News
  • Join
  • Contact

Bioinformatics Workshop Archive

Bioinformatic strategies for abundance filtering

1/20/2022


 
As part of Beth's critical review in Molecular Ecology on abundance-filtering strategies in DNA metabarcoding pipelines, we conducted simulations and sensitivity analyses to illustrate how key assumptions in the design of our bioinformatic strategies can introduce biases that undermine ecological interpretations of the data. 

The Dryad repository for the paper contains data and code that will be useful for anyone who would like to replicate or enhance the simulations and/or sensitivity analyses. I consider this a major bioinformatic resource for researchers in the field, and an illustration of thoughtful research strategies that I hope others will build upon in a few key ways.

The simulations we conducted are relatively simple, but extremely relevant. It would be rewarding to explore the relevance of other assumptions, parameters, data structures, and/or downstream ecological metrics. This would not only be of fundamental interest, but the developments and insights would be profoundly useful for all researchers in the field (us included). The Reviewers and Editors of this original manuscript seemed to agree with that sentiment. We briefly considered publishing an R Shiny App or similar to facilitate this type of exploration -- I still think it could be worthwhile, so please let us know if you would like to contribute!

The sensitivity analyses model a strategy that I developed piecemeal over the years to help me check my assumptions about how robust my published conclusions would be and to be more persuasive with reviewers. Similar sensitivity analyses have been described in the supplementary materials of several publications in recent years. This approach requires a bit more work than simply using a plug-and-chug approach to bioinformatics and downstream analyses, but I think it pays off in terms of my own understanding of each study system and the reliability of my papers. I often encourage authors of papers that I review to consider doing something similar when their results are borderline, and I hope this code can serve as a resource to support that type of effort when appropriate.
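To make the idea of a sensitivity analysis concrete, here is a minimal sketch in Python. It is not the code from the Dryad repository. The read counts, sample names, taxon names, and thresholds are all invented for illustration, and the only filter shown is a per-sample minimum relative read abundance (RRA), which is just one of many possible abundance filters.

```python
# Hypothetical read counts (sample -> taxon -> reads); invented for illustration only.
counts = {
    "sample_A": {"taxon_1": 9000, "taxon_2": 900, "taxon_3": 95, "taxon_4": 5},
    "sample_B": {"taxon_1": 500, "taxon_2": 480, "taxon_3": 15, "taxon_4": 5},
}


def richness_after_filter(sample_counts, min_rra):
    """Count taxa that are present and whose within-sample RRA is >= min_rra."""
    total = sum(sample_counts.values())
    if total == 0:
        return 0
    return sum(
        1
        for reads in sample_counts.values()
        if reads > 0 and reads / total >= min_rra
    )


def sensitivity_table(all_counts, thresholds):
    """Map each threshold to the per-sample richness obtained under it."""
    return {
        t: {s: richness_after_filter(c, t) for s, c in all_counts.items()}
        for t in thresholds
    }


# Hypothetical thresholds, chosen only to show how the result can shift.
thresholds = [0.0, 0.001, 0.01, 0.05]
table = sensitivity_table(counts, thresholds)
for t in thresholds:
    print(f"min RRA {t:.3f}: {table[t]}")
```

In this toy example the two samples have equal richness when no filter is applied, sample_B looks richer at intermediate thresholds, and the two are equal again at the strictest threshold. A conclusion that flips across plausible thresholds like this is the kind of borderline result that a sensitivity analysis is meant to flag.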

Mpala Reference Library formatted for "dada2"

6/18/2021


 
Collaborator Nick Harvey has kindly provided a formatted version of our current Mpala Plant DNA Barcode Reference Library that is suitable for taxonomic assignments using the R package dada2. You can download the fasta file of reference library v.2.0 (corresponding to Gill et al. 2019) formatted for dada2 here.

Thank you, Nick, for making this time-saving resource available to share!
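For readers preparing their own library, the sketch below shows one way to write records in the header style that, as far as I understand it, dada2's assignTaxonomy training files use: the ranks joined by semicolons, with a trailing semicolon. The function name, the placeholder lineage, and the placeholder sequence are hypothetical, and this is not the script used to produce the file above, so please check the output against that file before relying on it.

```python
def to_dada2_record(lineage, sequence):
    """Return a FASTA record whose header is the lineage joined by semicolons.

    `lineage` is an ordered list of rank names, highest rank first (hypothetical input).
    """
    cleaned = [rank.strip() for rank in lineage]
    if not cleaned or any(not rank or ";" in rank for rank in cleaned):
        raise ValueError("each rank must be non-empty and must not contain ';'")
    header = ">" + ";".join(cleaned) + ";"
    return f"{header}\n{sequence.strip().upper()}\n"


# Placeholder lineage and sequence, invented for illustration only.
record = to_dada2_record(["FamilyA", "GenusA", "GenusA speciesA"], "acgtacgt")
print(record, end="")
```

Rejecting ranks that contain a semicolon keeps a stray delimiter from silently shifting every rank below it.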

Building a plant DNA barcode library; fieldwork edition

1/16/2021


 
We are often asked to provide advice or assistance building plant DNA reference libraries for use in dietary metabarcoding projects. To begin centralizing info on our methods and sharing some important lessons learned from experience, I have created a section on the lab's wiki for building plant barcode libraries. I will treat the Google Docs that you can link to from there as living documents. All of the details provided are nested within two main goals. The first goal is to collect plant voucher specimens and plant DNA barcode samples that match in ways that can be clearly documented through their respective metadata sheets. This is critical for the long-term value of the data. The second goal is to ensure that the work done by field biologists and molecular biologists is mutually informative -- the best reference libraries are developed through the meaningful engagement of expert botanists who are knowledgeable in a local flora and the researchers who will be analyzing the laboratory data.

We love to archive relevant vouchers in the Brown University Herbarium. Please keep in mind that the herbarium is staffed by expert botanists. Properly collected specimens can be mounted, archived, and digitized by professional staff -- this greatly reduces the cost and complexity of fieldwork. 

Simple phylogenetics workflow

3/26/2019


 
On March 19th, Brian Gill kindly provided a broad overview of a phylogenetics workflow and resources. These are some great go-to documents for building simple phylogenies quickly, especially based on data from some of the common mitochondrial and chloroplast "DNA barcode" markers that we often use.

The workflow infographic that accompanies this post is simply outstanding.

A very valuable list of resources corresponding to each stage highlighted in this infographic follows below the break.

THANK YOU, Brian, for compiling this information.
[Picture: phylogenetics workflow infographic]


New release of trnL-P6 reference data for Mpala Research Centre

2/27/2019


 
Following the recent publication of our plant DNA barcode library from Mpala Research Centre, Kenya, led by Brian Gill, we are happy to provide a set of files to serve as our local trnL-P6 reference library (version 2.0). These files were carefully prepared by Courtney Reed, to whom we are most grateful.


Workflows for denoising Illumina amplicon data using dada2 and Oscar

1/29/2019


 
Bianca Brown began the hard work of collating scripts the lab uses to process fastq data from our lab's diverse Illumina amplicon projects. These strategies, and a draft explanation of why we use different "flavors" of these approaches for different projects, are provided here. 

Modules included in the tutorial include "cutadapt," "dada2," and "R," with some references to "Obitools" and Brown University's supercomputing cluster "Oscar."

Many of the steps and principles of these workflows are identical -- we want to thoughtfully prepare our data for analysis and remove errors -- but a few of the nuts and bolts differ. Most often, these differences arise from whether a project used single-end sequence data (formerly common) or paired-end sequence data (now standard in the lab). There are also differences in approaches depending on whether the amplicons are typically invariable in length (e.g., 16S-V4 rRNA or COI markers), or if there is considerable length variation (e.g., trnL-P6 markers).
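One quick way to see which situation a dataset falls into is to look at the spread of amplicon lengths before choosing filtering settings. The sketch below is a rough illustration rather than part of the lab's scripts. The function name, the `tolerance` cutoff of 2 bases, and the toy sequences are all hypothetical.

```python
def length_profile(sequences, tolerance=2):
    """Summarize sequence lengths and flag whether they vary by more than `tolerance` bases."""
    lengths = [len(seq) for seq in sequences]
    if not lengths:
        raise ValueError("no sequences provided")
    shortest, longest = min(lengths), max(lengths)
    return {
        "min": shortest,
        "max": longest,
        "variable_length": (longest - shortest) > tolerance,
    }


# Toy sequences, invented for illustration only.
fixed_like = ["ACGT" * 5, "TGCA" * 5, "AACC" * 5]
variable_like = ["ACGT" * 2, "ACGT" * 5, "ACGT" * 10]

print(length_profile(fixed_like))
print(length_profile(variable_like))
```

A marker flagged as variable in length is a hint that truncating every read to one fixed length could discard or distort real sequences, so a length window may be a better fit than a single cutoff.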

For members of Brown University seeking to run parts of these modules on Oscar, Bianca has very kindly provided some blank bash scripts that can get you started here.

NB: This compilation of scripts is a work in progress. We are aware of necessary updates and improvements, and we intend to push them soon. We'll add posts describing any substantial updates in the future, and we welcome feedback.

We also wish to express our appreciation to all of the authors of the software that we use and cite in our work.

High-performance computing on Oscar

1/24/2019


 
Prof Rebecca Kartzinel shared a tutorial on Brown University's high-performance computing cluster, called Oscar.
  • Notes and code are available.

    Author

    Computational resources kindly contributed and explained by members of our community.

Copyright 2021 © Tyler Kartzinel
