Bioinformatic Strategies for Abundance Filtering

Over the years, our lab has contributed a number of reviews about how DNA sequence data can be accurately converted into dietary information. The science is clear: inappropriate assumptions about how to 'clean up' sequence data bioinformatically can do more harm than good, warping our diet profiles and generating misleading conclusions. Nevertheless, we have to make some such assumptions to generate datasets that are useful and informative. How should we strike a balance between these competing imperatives?

Led by Dr. Bethan Littleford-Colquhoun, one of the more important reviews we've produced on this topic was published in Molecular Ecology: The Precautionary Principle. This review, and a follow-up reply describing Evidence-based Strategies to Navigate Complexity, tackle the challenge of identifying appropriate abundance-filtering strategies in DNA metabarcoding pipelines. This post summarizes what we found.

The one-sentence take-home message: our extensive simulations and sensitivity analyses illustrate how the assumptions we make about what abundance-filtering strategies accomplish in our bioinformatic pipelines can introduce biases that undermine all subsequent ecological interpretations of the data.
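To make the filtering step concrete, here is a minimal sketch of one common abundance-filtering strategy, a per-sample relative read abundance (RRA) threshold. The 0.5% default and the example counts below are hypothetical illustrations, not values or code from the papers:

```python
# Minimal sketch of per-sample relative read abundance (RRA) filtering.
# The 0.005 (0.5%) threshold and the example counts are hypothetical,
# chosen only to illustrate the trade-off: a stricter threshold removes
# more likely false positives (e.g., low-level artifacts) but also
# discards more genuine rare diet items (false negatives).

def rra_filter(counts, min_rra=0.005):
    """Zero out taxa whose within-sample relative abundance falls below min_rra.

    counts: dict mapping taxon name -> read count for one sample.
    """
    total = sum(counts.values())
    if total == 0:
        return dict(counts)
    return {taxon: (n if n / total >= min_rra else 0)
            for taxon, n in counts.items()}

sample = {"grass": 9500, "shrub": 480, "rare_taxon": 20}
print(rra_filter(sample))        # the 20-read taxon (0.2% RRA) is zeroed
print(rra_filter(sample, 0.06))  # stricter: the 480-read taxon (4.8%) also goes
```

The same filtered table then feeds every downstream ecological metric, which is exactly why the choice of threshold propagates through all subsequent interpretations.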
The simulations we conducted are relatively simple, but extremely relevant to ongoing discussions about how to balance the risks of false positives and false negatives in our data. The Dryad repository for the paper contains the data and code needed to replicate or extend the simulations and sensitivity analyses. I consider this a major bioinformatic resource for researchers in the field, and an illustration of a thoughtful research strategy that I hope others will build upon in a few key ways.

The sensitivity analyses we present are based on a strategy I developed the hard way: by trying to be critical of my own results, proceeding piecemeal over the years to check my own assumptions, both because I wanted to be sure that any conclusions I published would be robust and to persuade reviewers who thoughtfully pushed me to be more circumspect. Similar sensitivity analyses have consequently appeared in the supplementary materials of several publications from the lab, but these new papers formalize and centralize the key things we have learned. This approach requires more work than a plug-and-chug bioinformatic pipeline and downstream analysis, but I think it pays off in terms of my own understanding of each study system. At the end of the day, I want to publish reliable papers. I often encourage authors of papers that I review to consider doing something similar when their results are surprising or borderline, and I hope this code can serve as a resource to support that type of effort when appropriate.

From here, it would be rewarding to explore the relevance of other assumptions, parameters, data structures, and downstream ecological metrics. This would not only be of fundamental interest; the resulting developments and insights would be profoundly useful for all researchers in the field (us included).
The Reviewers and Editors of the original manuscript seemed to agree with that sentiment. We briefly considered publishing an R Shiny App or similar to facilitate this type of exploration -- I still think it could be worthwhile, so please let us know if you would like to contribute!
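To give a concrete sense of what this type of exploration might look like, here is a toy sensitivity sweep; every threshold, taxon, and count is invented for illustration (the real simulations and data live in the Dryad repository). The idea is simply to re-run the same downstream summary, here plain taxon richness, across a range of filtering thresholds and ask where the answer stops being stable:

```python
# Toy sensitivity analysis: apply a per-sample relative read abundance
# (RRA) filter at several thresholds and recompute a downstream metric
# (taxon richness across samples) each time. All values are invented
# for illustration; this is not the analysis from the papers.

def richness_after_filter(samples, min_rra):
    """Count taxa detected in at least one sample after RRA filtering.

    samples: list of dicts, each mapping taxon name -> read count.
    """
    detected = set()
    for counts in samples:
        total = sum(counts.values())
        for taxon, n in counts.items():
            if total and n / total >= min_rra:
                detected.add(taxon)
    return len(detected)

samples = [
    {"grass": 900, "forb": 80, "moss": 3},
    {"grass": 700, "shrub": 290, "moss": 8},
]
for threshold in (0.0, 0.005, 0.01, 0.05):
    print(f"min RRA {threshold:.3f} -> richness "
          f"{richness_after_filter(samples, threshold)}")
```

If a conclusion (here, how many diet taxa were detected) flips within the range of defensible thresholds, that instability itself is the finding worth reporting; if it holds across the sweep, the conclusion is more trustworthy.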