Conservation & Molecular Ecology @Brown

Bioinformatics Workshop Archive

New release of trnL-P6 reference data for Mpala Research Centre

2/27/2019

Following the recent publication of our plant DNA barcode library from Mpala Research Centre, Kenya, led by Brian Gill, we are happy to provide a set of files to serve as our local trnL-P6 reference library (version 2.0). These files were carefully prepared by Courtney Reed, to whom we are most grateful.
The reference library is provided below. We will post occasional updates as new data are incorporated. (To be sure you are working with the most current version, please check the link to "Data" under Categories in the panel to the right to see if there are more recent updates.)

Version 2.0 corresponds exactly to the trnL dataset and taxonomic identifications included in the Gill et al. 2019 Molecular Ecology Resources paper. This replaces Version 1.0 of the library, from the publication of our first Mpala DNA metabarcoding study (Kartzinel et al. 2015, PNAS), which is archived on Dryad (together with the other datasets presented in that study).

The Mpala plant DNA barcoding project is a collaboration between Mpala Research Centre, The East African Herbarium, National Museums of Kenya, the Smithsonian Institution, Princeton University, and Brown University. Links to the current set of publicly available data associated with this project can be found on the Barcode of Life Data System (BOLD; www.boldsystems.org) under the project name UHURU.

We are grateful to the Government of Kenya for permission to conduct this research. We are especially grateful to Sam Kurukura, Ali Hassan, Peter Lokeny, and Dr. Mutuku Musili for the painstaking efforts required to archive and identify these invaluable research specimens. 

The data files follow this brief description. These files were kindly provided by Courtney Reed in February 2019. The reference library was extracted from the trnL sequence set provided in the Gill et al. (2019) publication and formatted to serve as the trnL-P6 reference database using the program Obitools. If you have a relevant metabarcoding dataset in fasta format, these files should include everything necessary to identify each sequence using the ecotag function in Obitools.
  • Specimen metadata (spreadsheet)
    • This spreadsheet provides relevant metadata from the UHURU project in BOLD at the time Gill et al. (2019) was published.
  • The trnL sequence data (fasta)
    • This fasta file contains the complete set of trnL sequences downloaded from BOLD at the time of the Gill et al. (2019) publication.
  • The trnL-P6 reference library (fasta with Obitools headers)
    • A fasta file containing all unique trnL-P6 sequences, with fasta headers in Obitools format. These sequences were extracted from the complete trnL set using a standard Obitools workflow. This is the input you would provide to the ecotag command using the -R flag.
  • The ecoPCR taxonomy database (ecopcrdb file)
    • This is a file format specific to Obitools that contains the taxonomic information needed to identify query sequences based on this reference library. It is the input file required for the ecotag command using the -d flag.
  • Mpala trnL-P6 taxonomy (spreadsheet)
    • This is a file that Courtney created to provide information about the taxonomy of specimens representing an EXACT match to each reference sequence. This is helpful because most identification algorithms in metabarcoding pipelines assume that a sequence matching multiple species CANNOT be identified to species level, and thus report only the finest-grained identification that applies to all potential matches (i.e., genus, family, order, or higher levels). That's a useful assumption for many reasons, but it is not quite correct. The reality is that we can attribute that sequence read to a SET of possible matches, which we may occasionally want to do. For each sequence in this spreadsheet, the ExactTax column provides a semicolon-delimited list of all known taxonomic matches.
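
As a rough illustration of how these files fit together, here is a minimal Python sketch. It only assembles an ecotag command line from the -d and -R inputs described above, and splits one ExactTax cell into its candidate taxa. All file names and the example ExactTax value are hypothetical placeholders, not the names of the files posted here, and the helper functions are ours rather than part of Obitools. Check the Obitools documentation for the exact form your installation expects for the -d argument.

```python
import shlex


def build_ecotag_command(taxonomy_db, reference_fasta, query_fasta):
    """Assemble an ecotag call: -d gets the taxonomy database, -R the reference library."""
    return ["ecotag", "-d", taxonomy_db, "-R", reference_fasta, query_fasta]


def split_exact_tax(exact_tax):
    """Split a semicolon-delimited ExactTax cell into a list of taxon names."""
    return [name.strip() for name in exact_tax.split(";") if name.strip()]


# Hypothetical file names: substitute the files downloaded from this post
# and your own query sequences.
cmd = build_ecotag_command(
    "mpala_trnL_db",                  # ecoPCR taxonomy database (assumed name)
    "mpala_trnL_P6_reference.fasta",  # trnL-P6 reference library (assumed name)
    "my_reads.fasta",                 # your metabarcoding sequences (assumed name)
)

# ecotag writes annotated sequences to standard output, so redirect to a file.
command_line = shlex.join(cmd) + " > my_reads.ecotag.fasta"
print(command_line)

# Hypothetical ExactTax value for a reference sequence shared by two species.
candidates = split_exact_tax("Genus speciesA; Genus speciesB")
print(candidates)
```

The printed command can be pasted into a shell where Obitools is installed. The list returned by split_exact_tax is the set of possible matches for a sequence, which you can keep alongside the coarser identification that ecotag reports.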

    Author

    Computational resources kindly contributed and explained by members of our community.

    Categories

    All
    Dada2
    Data
    Data Management
    DNA Barcodes
    High-performance Computing
    Lab Protocols
    Metabarcoding
    Microbiome
    Oscar
    Phylogenetics
    Pipelines
    Reference Libraries
    R Tutorials
    Software

Copyright 2021 © Tyler Kartzinel
