CONSERVATION & MOLECULAR ECOLOGY
  • Home
  • Research
    • DNA metabarcoding
    • Conservation Genetics
    • Molecular Parasitology
    • Savanna Ecology
    • Sloth Ecology & Evolution
    • Fray Jorge
    • Yellowstone
  • Resources
    • Publications
    • News
    • Bioinformatics Workshop
    • Protocols
    • Software & Data
  • Impact
    • Conservation
    • Annual Reports
    • Donate
  • Work with us
    • People
    • Join
    • Contract & Collaborate >
      • DNA metabarcoding contracts
      • DNA barcoding
      • Training
  • Contact

Bioinformatics Workshop

We have curated our most popular Software & Data repositories so you can find them easily

Our Lab's GitHub site also provides useful info and resources related to current projects

How Many Samples for a Dietary DNA Study?

2/28/2026


How Many Samples Do You Need for a Dietary DNA Study?

Designing a dietary DNA metabarcoding study often begins with a deceptively simple question: How many samples do I really need to collect?

There is not a universally “correct” number. We all want to have a large enough sample size for a powerful analysis. But it can be extremely challenging to collect fresh scat samples from wild animals—especially when they are rare and widespread—and then we face the cost of analyzing what we get.

To answer this question, we need to focus mostly on the ecological inferences we want to make. Are we trying to compare groups? Estimate niche breadth? Detect rare food items? Describe seasonal shifts? The number of samples required to detect differences between sample sets is often very different from the number needed to perfectly catalog everything in a diet. So, I want to share some helpful rules of thumb based on experience across a wide variety of study systems...

What Actually Determines the Sample Size You Need?

Several factors shape how many samples you should target for collection:
  • Individual-level diet variation
  • Population-level diet heterogeneity
  • Temporal variability
  • Diet diversity (specialist vs. generalist feeders)
  • Whether your goal is description or comparison

Highly generalized feeders with diverse prey taxa typically require more replication than specialists with narrow and relatively constant diets. Systems with strong seasonal shifts may require replication across time. Populations occupying heterogeneous landscapes may show greater variance among individuals that can only be quantified with effort.

The most important question is not “How many samples is enough?” but “When does the ecological signal that I need to detect become robust?”

Describing A Diet vs. Comparing Diets

It is much easier to detect differences between diets than to perfectly catalog everything in a diet.

If your goal is exhaustive description—identifying every taxon consumed and estimating its relative contribution—replication requirements can be high, especially in species with diverse diets. Instrumental error becomes a significant concern as well.

If your goal is comparative or experimental—such as:
  • Restored vs. degraded habitat
  • Dry vs. wet season
  • Species A vs. Species B
  • Treatment vs. control
—then structured differences often emerge with fewer samples.

Comparative research designs are powerful because they focus on relative differences rather than perfection. In many conservation contexts, that distinction is critical. Management decisions often hinge on contrasts, and prior information is almost always limited.
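To make the comparative case concrete, here is a minimal sketch of a label-permutation test on presence/absence diet data. It is an illustration under stated assumptions, not the analysis used in any of our papers: the taxon names, the wet/dry grouping, and the helper functions (`jaccard_distance`, `between_minus_within`, `permutation_p_value`) are all hypothetical, and a real study would more likely use an established multivariate test.

```python
import random
from itertools import combinations

def jaccard_distance(a, b):
    """Jaccard dissimilarity between two sets of detected taxa."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def between_minus_within(samples, labels):
    """Mean between-group distance minus mean within-group distance."""
    within, between = [], []
    for i, j in combinations(range(len(samples)), 2):
        d = jaccard_distance(samples[i], samples[j])
        (within if labels[i] == labels[j] else between).append(d)
    return sum(between) / len(between) - sum(within) / len(within)

def permutation_p_value(samples, labels, n_perm=999, seed=1):
    """Shuffle group labels to ask how often chance alone matches the observed contrast."""
    rng = random.Random(seed)
    observed = between_minus_within(samples, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_minus_within(samples, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical data: six independent samples per season, each a set of detected taxa.
wet = [{"grassA", "grassB", "forbC"}, {"grassA", "grassB"}, {"grassA", "forbC"},
       {"grassA", "grassB", "forbD"}, {"grassB", "forbC"}, {"grassA", "grassB", "forbC"}]
dry = [{"shrubX", "shrubY"}, {"shrubX", "treeZ"}, {"shrubX", "shrubY", "grassA"},
       {"shrubY", "treeZ"}, {"shrubX", "shrubY", "treeZ"}, {"shrubX", "treeZ"}]

samples = wet + dry
labels = ["wet"] * len(wet) + ["dry"] * len(dry)
observed, p_value = permutation_p_value(samples, labels)
print(f"between minus within = {observed:.3f}, permutation p = {p_value:.3f}")
```

With only six samples per group, a strong contrast like this one is already distinguishable from shuffled labels, even though six samples would be far too few to catalog either diet completely.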

A Practical Rule of Thumb

As a general guideline, when I'm asked to recommend a target sample size I usually suggest aiming for 20–30 independent samples per “group.”

A “group” should be defined relative to the goals of the study, such as:
  • A species
  • A species × season combination
  • A population
  • A treatment
  • A site

Figure: Dietary species accumulation curves for each of seven species of large mammalian herbivores at Mpala Research Centre in Kenya. Figure S2a from the open-access publication by Kartzinel et al. 2015 (PNAS).

In many systems that we have studied—including generalized feeders with diverse diets—we find that species accumulation curves approach an asymptote around this level of replication, levels of inter-individual variation can be reliably characterized, and group-level differences stabilize enough for robust comparisons.

This is not an absolute target. It is a practical starting point that balances ecological realism with logistical constraints.

If you discover that diets are less varied, you can scale back to 10–20 samples per group. We rarely see groups that are still rapidly accumulating dietary taxa beyond 30, though, which also happens to be the “magic” number we hear about in introductory statistics.
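The idea of an accumulation curve approaching an asymptote can be sketched in a few lines. The example below is a rough illustration on simulated data, not a reanalysis of any published dataset: the 40-taxon pool, the detection probabilities, and the function names `accumulation_curve` and `samples_to_reach` are assumptions made up for this sketch.

```python
import random

def accumulation_curve(samples, n_perm=200, seed=1):
    """Mean number of taxa detected after 1..n samples, averaged over random sample orders."""
    rng = random.Random(seed)
    totals = [0.0] * len(samples)
    order = list(samples)
    for _ in range(n_perm):
        rng.shuffle(order)
        seen = set()
        for k, taxa in enumerate(order):
            seen |= taxa
            totals[k] += len(seen)
    return [t / n_perm for t in totals]

def samples_to_reach(curve, fraction=0.9):
    """Smallest number of samples whose mean richness reaches `fraction` of the observed total."""
    target = fraction * curve[-1]
    for k, value in enumerate(curve, start=1):
        if value >= target:
            return k
    return len(curve)

# Simulated group of 30 samples: a few common food taxa and a long tail of rare ones.
rng = random.Random(0)
detect_prob = [0.6 / (rank + 1) ** 0.7 for rank in range(40)]
samples = [{f"taxon_{i}" for i, p in enumerate(detect_prob) if rng.random() < p}
           for _ in range(30)]

curve = accumulation_curve(samples)
n90 = samples_to_reach(curve, fraction=0.9)
print(f"observed taxa: {curve[-1]:.0f}; samples needed to reach 90% of them: {n90}")
```

Note that the curve can only approach the richness observed in the samples you have, so a curve that is still climbing steeply at its right-hand end is the signal that more replication is needed.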

Why You Should Avoid Pooling Samples


One common strategy that people hear about to reduce costs involves combining samples from multiple individuals into a pooled composite sample. I generally recommend against this approach for dietary DNA studies.

Pooling masks inter-individual variation, and that variation is statistically powerful—even when total sample sizes are modest. Quantifying differences among individuals is often essential for revealing ecological structure, niche partitioning, or behavioral flexibility that would otherwise remain hidden.

Even relatively small numbers of independent samples frequently provide more inferential leverage than a few pooled composites.

In dietary DNA research, independent replication is far more valuable than artificial composites.
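A toy example shows what pooling does to a common summary statistic, frequency of occurrence. This is a deliberately idealized sketch: the twelve individuals and their diets are invented, the `pool` and `frequency_of_occurrence` helpers are hypothetical, and it assumes a composite detects every taxon that any contributing individual ate, which real pooled extractions will not guarantee.

```python
def pool(samples, size):
    """Merge consecutive samples into composites of `size` individuals each."""
    return [set().union(*samples[i:i + size]) for i in range(0, len(samples), size)]

def frequency_of_occurrence(samples):
    """Fraction of samples in which each taxon was detected."""
    taxa = set().union(*samples)
    return {t: sum(t in s for s in samples) / len(samples) for t in sorted(taxa)}

# Hypothetical diets of 12 individuals: everyone eats grass, a few also eat forb or shrub.
individuals = [
    {"grass"}, {"grass", "forb"}, {"grass"}, {"grass", "shrub"},
    {"grass"}, {"grass"}, {"grass", "forb"}, {"grass"},
    {"grass", "shrub"}, {"grass"}, {"grass"}, {"grass", "shrub"},
]

composites = pool(individuals, size=4)  # three pooled composites of four individuals
by_individual = frequency_of_occurrence(individuals)
by_composite = frequency_of_occurrence(composites)

for taxon in by_individual:
    print(f"{taxon}: {by_individual[taxon]:.2f} of individuals, "
          f"{by_composite[taxon]:.2f} of composites")
```

In this example the forb is eaten by 2 of 12 individuals but appears in 2 of 3 composites, so the pooled data make an occasional food look routine and leave only three data points with which to estimate any variation at all.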

What If You Only Have One or a Few Dietary DNA Samples?

My take: You should analyze them. Does that appear to be a little cavalier? Perhaps. But do it anyway. Here’s why…

A small dataset can still be valuable, particularly when:
  • Your system is understudied
  • Your work is at an exploratory or hypothesis-generating stage of development
  • Your samples were so difficult to obtain that no one is likely to try again anytime soon
  • You are in a unique position to publish diet profiles that will be new to the literature

The smallest sample size that I can remember publishing for any species that we have studied was two: for steenbok and for the crested porcupine of Kenya. They were part of a broader dataset that included hundreds of samples and dozens of herbivorous species from Kenya in an open-access paper on diet-microbiome linkages that we published in PNAS. Both were exceedingly difficult to sample, despite effort, and the few samples we did manage to collect provided very useful context for comparison with other species that were very well sampled.

The key is being careful about how you interpret the data. Avoid overgeneralizing to entire species or seasons. We must acknowledge uncertainty and treat it appropriately when we have limited replication. But in my opinion, the more dietary DNA results researchers publish, when presented properly, the stronger our collective ability to synthesize patterns across systems will become.

Undersampled groups are most problematic when they are overinterpreted, not when they are carefully contextualized.

Figure: Diets and microbiomes of megafauna. Cover article for PNAS. This open-access article included some species that were extremely well sampled and some that had few samples.

When “Groups” Aren’t the Primary Unit of Analysis

It is worth noting that not all dietary DNA studies need to rely on predefined groups such as species or treatment categories. In some cases, analytical approaches—particularly unsupervised or minimally supervised machine learning—allow ecological structure to emerge directly from the data without imposing a priori bins. In our recent work in Yellowstone, for example, we used data-driven methods to identify structure in large herbivore diets without defining groups in advance. In analyses like these, the question shifts from “How many samples per group?” to “How well does our sampling capture the underlying ecological variation?” Replication still matters, but it is used to ensure representation of ecological variation rather than to balance sampling across predefined categories. This distinction is subtle but important: thoughtful sampling strategies remain essential, even when group identity is not the central organizing principle of the analysis, but pre-defining target sample sizes may not always be required.

🔗 Post: Do You Even Need “Groups”? Rethinking Replication in Dietary DNA Studies.
🔗 Software & Data: Our DNA metabarcoding pipelines and code are freely available for use.

Pilot Studies Help Structure Dietary DNA Analyses

If you can, it may be wise to consider a pilot study before scaling up. You can:
  • Generate rarefaction curves
  • Estimate among-individual variance
  • Evaluate precision in the data
  • Assess detection consistency

Pilot data can help you develop your final target sample size so you don't over- or under-sample.
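One way to turn pilot data into a rough completeness check is a nonparametric richness estimator. The sketch below uses the bias-corrected Chao2 estimator for incidence data, which is a standard tool I am swapping in for illustration rather than a method described in this post; the eight pilot samples and their single-letter taxon names are made up.

```python
from collections import Counter

def chao2(samples):
    """Bias-corrected Chao2 estimate of total taxon richness from incidence data.

    `samples` is a list of sets of taxa detected per sample. Q1 and Q2 are the
    numbers of taxa found in exactly one and exactly two samples.
    """
    m = len(samples)
    incidence = Counter(taxon for sample in samples for taxon in sample)
    s_obs = len(incidence)
    q1 = sum(1 for count in incidence.values() if count == 1)
    q2 = sum(1 for count in incidence.values() if count == 2)
    estimate = s_obs + ((m - 1) / m) * q1 * (q1 - 1) / (2 * (q2 + 1))
    return s_obs, estimate

# Hypothetical pilot: eight independent samples from one group.
pilot = [
    {"a", "b", "c"}, {"a", "b"}, {"a", "d"}, {"a", "b", "e"},
    {"a", "c"}, {"b", "f"}, {"a", "b", "g"}, {"a", "c", "d"},
]

s_obs, s_est = chao2(pilot)
completeness = s_obs / s_est
print(f"observed {s_obs} taxa, estimated {s_est:.2f}, completeness {completeness:.0%}")
```

A pilot with many taxa seen only once suggests the diet is still being undersampled; treat the estimate as a lower bound on true richness and as one input to your target sample size, not as a verdict.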

It used to be hard to do this due to the cost and labor involved in putting together a full Illumina run: by the time you had enough samples to justify the cost of the run, you might as well complete the whole project in one go. (Sometimes if you knew the director of a core facility, they might be willing to ‘spike’ a few of your samples into a run that somebody else was paying for… but that wasn't always an option.)

Portable sequencers now make it much more cost-effective to run small pilots quickly. Our group, through the Genomic Opportunities Lab, may be able to help you create pilot dietary DNA data if you are an academic or conservation practitioner.

Align Sampling With Ecological Inference

There is no single “correct” number of samples for a dietary DNA metabarcoding study.

Thoughtful replication, matched to the scale of inference, matters much more than maximizing replication. In conservation research especially, study design should prioritize the ability to detect meaningful ecological differences over the pursuit of exhaustive completeness. We need to get that part of the study right or we risk eroding our credibility. So design your sampling strategy around the question you are trying to answer. The rest follows.

Explore More Dietary DNA Content

  • Rethinking replication in dietary studies
  • Metabarcoding versus Stable Isotopes
  • Free protocols: field to lab
  • Freely available software and data


Interested in supporting impactful conservation genomics?
​Learn how to partner or contribute.
Dr. Tyler Kartzinel
Department of Ecology, Evolution, and Organismal Biology
Institute at Brown for Environment and Society
Brown University
Address: 85 Waterman Street, Providence, Rhode Island 02912 USA
Office: 246(B)
Lab (pre-PCR): 244
Lab (post-PCR): 230
Phone: 1-401-863-5851
tyler_kartzinel[at]brown.edu
Disclaimer: views expressed on this site are those of the author. They should not be interpreted as opinions or policies held by his employer, collaborators, or lab members. Mention of trade names or commercial products does not constitute endorsement.

Copyright 2017-2026 © Tyler Kartzinel