How Many Samples Do You Need for a Dietary DNA Study?
Designing a dietary DNA metabarcoding study often begins with a deceptively simple question: How many samples do I really need to collect? There is not a universally “correct” number. We all want to have a large enough sample size for a powerful analysis. But it can be extremely challenging to collect fresh scat samples from wild animals, especially when they are rare and widespread, and then we face the cost of analyzing what we get. To answer this question, we need to focus mostly on the ecological inferences we want to make. Are we trying to compare groups? Estimate niche breadth? Detect rare food items? Describe seasonal shifts? The number of samples required to detect differences between sample sets is often very different from the number needed to perfectly catalog everything in a diet. So, I want to share some helpful rules of thumb based on experience across a wide variety of study systems...

What Actually Determines the Sample Size You Need?
Several factors shape how many samples you should target for collection:
Highly generalized feeders with diverse prey taxa typically require more replication than specialists with narrow and relatively constant diets. Systems with strong seasonal shifts may require replication across time. Populations occupying heterogeneous landscapes may show greater variance among individuals that can only be quantified with effort. The most important question is not “How many samples is enough?” but “When does the ecological signal that I need to detect become robust?”

Describing a Diet vs. Comparing Diets
It is much easier to detect differences between diets than to perfectly catalog everything in a diet. If your goal is exhaustive description (identifying every taxon consumed and estimating its relative contribution), replication requirements can be high, especially in species with diverse diets. Instrumental error becomes a significant concern as well. If your goal is comparative or experimental, such as:
Comparative research designs are powerful because they focus on relative differences rather than perfection. In many conservation contexts, that distinction is critical. Management decisions often hinge on contrasts, and prior information is almost always limited.

A Practical Rule of Thumb
As a general guideline, when I'm asked to recommend a target sample size I usually suggest aiming for 20–30 independent samples per “group.” A “group” should be defined relative to the goals of the study, such as:
In many systems that we have studied, including generalized feeders with diverse diets, we find that species accumulation curves approach an asymptote around this level of replication, levels of inter-individual variation can be reliably characterized, and group-level differences stabilize enough for robust comparisons. This is not an absolute target. It is a practical starting point that balances ecological realism with logistical constraints. If you discover diets are less varied, you can scale back to 10–20. We rarely see groups that are still rapidly accumulating dietary taxa beyond 30, though, and that’s the “magic” number we hear about in introductory statistics.

Why You Should Avoid Pooling Samples
One common strategy that people hear about to reduce costs involves combining samples from multiple individuals into a pooled composite sample. I generally recommend against this approach for dietary DNA studies. Pooling masks inter-individual variation, and that variation is statistically powerful, even when total sample sizes are modest. Quantifying differences among individuals is often essential for revealing ecological structure, niche partitioning, or behavioral flexibility that would otherwise remain hidden. Even relatively small numbers of independent samples frequently provide more inferential leverage than a few pooled composites. In dietary DNA research, independent replication is far more valuable than artificial composites.

What If You Only Have One or a Few Dietary DNA Samples?
My take: You should analyze them. Does that appear to be a little cavalier? Perhaps. But do it anyway. Here’s why… A small dataset can still be valuable, particularly when:
When “Groups” Aren’t the Primary Unit of Analysis
It is worth noting that not all dietary DNA studies need to rely on predefined groups such as species or treatment categories. In some cases, analytical approaches, particularly unsupervised or minimally supervised machine learning, allow ecological structure to emerge directly from the data without imposing a priori bins. In our recent work in Yellowstone, for example, we used data-driven methods to identify structure in large herbivore diets without defining groups in advance. In analyses like these, the question shifts from “How many samples per group?” to “How well does our sampling capture the underlying ecological variation?” Replication still matters, but it is used to ensure representation of ecological variation rather than to balance sampling across predefined categories. This distinction is subtle but important: thoughtful sampling strategies remain essential, even when group identity is not the central organizing principle of the analysis, but pre-defining target sample sizes may not always be required.
🔗 Post: Do You Even Need “Groups”? Rethinking Replication in Dietary DNA Studies.
🔗 Software & Data: Our DNA metabarcoding pipelines and code are freely available for use.

Pilot Studies Help Structure Dietary DNA Analyses
If you can, it may be wise to consider a pilot study before scaling up. You can:
Pilot data can help you develop your final target sample size so you don't over- or under-sample. It used to be hard to do this due to the cost and labor involved in putting together a full Illumina run: by the time you had enough samples to justify the cost of the run, you might as well complete the whole project in one go. (Sometimes if you knew the director of a core facility, they might be willing to ‘spike’ a few of your samples into a run that somebody else was paying for… but that wasn't always an option.) Portable sequencers now make it much more cost-effective to run small pilots quickly. Our group, through the Genomic Opportunities Lab, may be able to help you create pilot dietary DNA data if you are an academic and conservation practitioner.

Align Sampling With Ecological Inference
There is no single “correct” number of samples for a dietary DNA metabarcoding study.
Thoughtful replication, matched to the scale of inference, matters much more than maximizing replication. In conservation research especially, study design should prioritize the ability to detect meaningful ecological differences over the pursuit of exhaustive completeness. We need to get that part of the study right or we risk eroding our credibility. So design your sampling strategy around the question you are trying to answer. The rest follows.
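
One way to check whether a set of samples is approaching the asymptote described in the rule of thumb is a sample-based accumulation curve. The sketch below is a minimal illustration in Python, not the pipeline used in the work described here. The function names (`accumulation_curve`, `taxa_gained_over_last`), the toy `pilot_samples` data, and the default of 200 random orderings are all assumptions made for the example.

```python
import random


def accumulation_curve(samples, n_permutations=200, seed=0):
    """Mean number of distinct taxa seen after 1..N samples.

    `samples` is a list of sets, one set of detected taxa per independent
    sample. The curve is averaged over random sample orderings so that it
    does not depend on the order in which samples happened to be collected.
    """
    if not samples:
        raise ValueError("need at least one sample")
    rng = random.Random(seed)
    totals = [0] * len(samples)
    for _ in range(n_permutations):
        order = list(samples)
        rng.shuffle(order)
        seen = set()
        for i, taxa in enumerate(order):
            seen |= taxa             # add this sample's taxa to the running set
            totals[i] += len(seen)   # cumulative richness after i + 1 samples
    return [total / n_permutations for total in totals]


def taxa_gained_over_last(curve, k=5):
    """Mean number of taxa added in total by the last k samples of the curve."""
    if not 0 < k < len(curve):
        raise ValueError("k must be between 1 and len(curve) - 1")
    return curve[-1] - curve[-1 - k]


# Invented presence/absence data for one hypothetical group.
# The taxon labels are placeholders, not real results.
pilot_samples = [
    {"taxon_A", "taxon_B", "taxon_C"},
    {"taxon_B", "taxon_C"},
    {"taxon_A", "taxon_B", "taxon_D"},
    {"taxon_B", "taxon_E"},
    {"taxon_A", "taxon_B", "taxon_C"},
    {"taxon_B", "taxon_D", "taxon_E"},
]

curve = accumulation_curve(pilot_samples)
for n_samples, richness in enumerate(curve, start=1):
    print(f"{n_samples} samples: {richness:.2f} taxa on average")
print(f"Taxa gained over the last 2 samples: {taxa_gained_over_last(curve, k=2):.2f}")
```

If the curve is still climbing steeply over the last few samples, that suggests the group needs more replication. If it has flattened, additional samples are adding few new taxa. What counts as "flat enough" depends on the question being asked, so any cutoff is a judgment call rather than a fixed rule.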