<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" >

<channel><title><![CDATA[CONSERVATION & MOLECULAR ECOLOGY - Bioinformatics Workshop]]></title><link><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop]]></link><description><![CDATA[Bioinformatics Workshop]]></description><pubDate>Sat, 02 May 2026 19:26:55 -0400</pubDate><generator>Weebly</generator><item><title><![CDATA[HelmBank Release R1]]></title><link><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/helmbank-release-r1]]></link><comments><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/helmbank-release-r1#comments]]></comments><pubDate>Fri, 13 Mar 2026 20:50:33 GMT</pubDate><category><![CDATA[DNA barcoding]]></category><category><![CDATA[HelmBank]]></category><category><![CDATA[Reference Libraries & Data]]></category><guid isPermaLink="false">https://www.kartzinellab.com/bioinformatics-workshop/helmbank-release-r1</guid><description><![CDATA[New DNA barcodes for wildlife helminths now available (HelmBank Release R1)  &#8203;The first public release of data from the HelmBank project is now live with Release R1. This release provides a new set of voucher-linked parasite DNA barcode records designed to improve how helminths are detected and identified from wildlife. This release adds 45 barcodes from 20 newly sequenced specimens with data for the markers COI, 16S, and ITS. Release R1 is part of a larger, growing, and actively curated c [...] ]]></description><content:encoded><![CDATA[<h2 class="wsite-content-title" style="text-align:center;">New DNA barcodes for wildlife helminths now available (HelmBank Release R1)</h2>  <div class="paragraph">&#8203;The first public release of data from the HelmBank project is now live with Release R1. This release provides a new set of voucher-linked parasite DNA barcode records designed to improve how helminths are detected and identified from wildlife. This release adds 45 barcodes from 20 newly sequenced specimens with data for the markers COI, 16S, and ITS. 
Release R1 is part of a larger, growing, and actively curated collection of parasite specimens from wildlife. It currently focuses on the helminth parasites of Neotropical mammals, but coverage is quickly expanding to include a broader array of host taxa.</div>  <div id="440117607890141198"><div><style type="text/css">	#element-530e7163-047d-4aa9-8b4b-79dfa273e910 .callout-box-wrapper {  padding: 20px 0px;  word-wrap: break-word;}#element-530e7163-047d-4aa9-8b4b-79dfa273e910 .callout-box--standard {  border: 1px solid #E0E0E0;  background: #FAFAFA;  padding: 20px 20px;}#element-530e7163-047d-4aa9-8b4b-79dfa273e910 .callout-box--material {  border: 1px solid #E0E0E0;  background: #FAFAFA;  padding: 20px 20px;  box-shadow: 0 0 20px rgba(0,0,0,0.15);}#element-530e7163-047d-4aa9-8b4b-79dfa273e910 .callout-base {  border: 1px solid #E0E0E0;  background: #FAFAFA;  padding: 20px 20px;}#element-530e7163-047d-4aa9-8b4b-79dfa273e910 .material {  box-shadow: 0 0 20px rgba(0,0,0,0.15);}</style><div id="element-530e7163-047d-4aa9-8b4b-79dfa273e910" data-platform-element-id="694046499467037623-1.2.6" class="platform-element-contents">	<div class="callout-box-wrapper">	<div class="callout-box--standard">	    <div class="element-content">	        <div style="width: auto"><div></div><h2 class="wsite-content-title" style="text-align:center;">Quick links</h2><div class="paragraph" style="text-align:center;"><a href="http://www.boldsystems.org/index.php/Public_SearchTerms?query=DS-HELMBR1" target="_blank">Download / access Release R1</a>&nbsp;|&nbsp;<a href="http://www.boldsystems.org/index.php/Public_SearchTerms?query=DS-HELMBR1" target="_blank">&#8203;</a><a href="https://www.kartzinellab.com/molecular-parasitology.html">Project overview</a>&nbsp;|&nbsp;<a href="https://www.kartzinellab.com/protocols.html">Lab protocols</a></div></div>	    </div>	</div></div></div><div style="clear:both;"></div></div></div>  <div>  <!--BLOG_SUMMARY_END--></div>  <h2 class="wsite-content-title" 
style="text-align:center;">Why this matters</h2>  <div class="paragraph">Parasites are a major&mdash;often overlooked&mdash;component of biodiversity, and they are central to wildlife health, conservation decision-making, and disease surveillance strategies (e.g., One Health). Yet accurate molecular identification depends on reference sequences that are expert-verified and relevant to local hosts and regions&mdash;resources that remain sparse for wildlife systems in the tropical Americas.<br /><br />HelmBank&rsquo;s goal is to translate parasite specimens collected from wildlife into high-quality reference data: DNA barcodes tied to morphological identifications, host species, and collection metadata, so that researchers and practitioners can more reliably:<ul><li>assign taxa based on sequences detected using DNA-based monitoring</li><li>compare parasite communities across hosts and regions in molecular ecology</li><li>support veterinary and One Health investigations at wildlife&ndash;livestock&ndash;human interfaces</li></ul></div>  <h2 class="wsite-content-title" style="text-align:center;">The bigger picture: what&rsquo;s currently under production</h2>  <div class="paragraph">&#8203;Release R1 is the first public slice of a larger curated collection. 
The current sample dataset includes 105 parasite specimens linked to host records spanning at least seven mammal orders&mdash;a cross-section of Neotropical wildlife that is frequently the target of conservation and wildlife health programs yet remains badly undersampled for parasite diversity.<br /><br /><strong>Host breadth</strong><br />The working collection includes helminths from diverse mammal hosts, including:<ul><li><strong>Wild felids</strong>: ocelot (<em>Leopardus pardalis</em>), jaguar (<em>Panthera onca</em>), cougar/puma (<em>Puma concolor</em>), jaguarundi (<em>Herpailurus yagouaroundi</em>)</li><li><strong>Sloths &amp; anteaters</strong>: brown-throated three-toed sloth (<em>Bradypus variegatus</em>), Hoffmann&rsquo;s two-toed sloth (<em>Choloepus hoffmanni</em>), southern tamandua (<em>Tamandua tetradactyla</em>), giant anteater (<em>Myrmecophaga tridactyla</em>)</li><li><strong>Large-bodied mammals</strong>: lowland tapir (<em>Tapirus terrestris</em>), white-lipped peccary (<em>Tayassu pecari</em>)</li><li><strong>Wild canids</strong>: crab-eating fox (<em>Cerdocyon thous</em>)</li><li><strong>Opossums</strong>: big-eared opossum (<em>Didelphis aurita</em>), white-eared opossum (<em>Didelphis albiventris</em>), gray four-eyed opossum (<em>Philander quica</em>)</li><li><strong>Armadillos</strong>: nine-banded armadillo (<em>Dasypus novemcinctus</em>), southern three-banded armadillo (<em>Tolypeutes matacus</em>), screaming hairy armadillo (<em>Chaetophractus vellerosus</em>), large hairy armadillo (<em>Chaetophractus villosus</em>)</li><li><span><strong>Livestock</strong>: </span><span>domestic goat from rural Argentina (<em>Capra aegagrus hircus</em>)</span></li></ul><br /><u>Host breadth is intentional</u>: it makes the resulting reference sequences more likely to improve parasite detection and identification in real-world monitoring programs&mdash;where wildlife samples come from a mix of species, landscapes, and levels of contact with people and 
livestock.<br /><br /><strong>Parasite breadth</strong><br />Across these hosts, the working collection currently spans three major helminth phyla:<ul><li><strong>Roundworms</strong> (Nematoda)</li><li><strong>Flatworms</strong> (Platyhelminthes) &mdash; including cestodes and trematodes</li><li><strong>Thorny-headed worms</strong> (Acanthocephala)</li></ul><br />HelmBank is being built with traceability and verification as first principles: Expert morphological identifications are captured alongside molecular data whenever possible. Ongoing curation includes taxonomic updates and corrections, including review against external resources.</div>  <div id="174662545227293573"><div><style type="text/css">	#element-bdc5f9bf-a18d-4001-8828-0c9cb96e8f33 .callout-box-wrapper {  padding: 20px 0px;  word-wrap: break-word;}#element-bdc5f9bf-a18d-4001-8828-0c9cb96e8f33 .callout-box--standard {  border: 1px solid #E0E0E0;  background: #FAFAFA;  padding: 20px 20px;}#element-bdc5f9bf-a18d-4001-8828-0c9cb96e8f33 .callout-box--material {  border: 1px solid #E0E0E0;  background: #FAFAFA;  padding: 20px 20px;  box-shadow: 0 0 20px rgba(0,0,0,0.15);}#element-bdc5f9bf-a18d-4001-8828-0c9cb96e8f33 .callout-base {  border: 1px solid #E0E0E0;  background: #FAFAFA;  padding: 20px 20px;}#element-bdc5f9bf-a18d-4001-8828-0c9cb96e8f33 .material {  box-shadow: 0 0 20px rgba(0,0,0,0.15);}</style><div id="element-bdc5f9bf-a18d-4001-8828-0c9cb96e8f33" data-platform-element-id="694046499467037623-1.2.6" class="platform-element-contents">	<div class="callout-box-wrapper">	<div class="callout-box--material">	    <div class="element-content">	        <div style="width: auto"><div></div><div class="paragraph" style="text-align:center;"><strong><span>The current release of HelmBank includes data from at least one new&mdash;previously undescribed&mdash;lineage of parasites that we will formally describe and update soon.&nbsp;</span></strong></div></div>	    </div>	</div></div></div><div 
style="clear:both;"></div></div></div>  <h2 class="wsite-content-title" style="text-align:center;">What&rsquo;s included in HelmBank Release R1</h2>  <div class="paragraph">Release R1 of HelmBank includes a priority set of initial "field-to-sequence" data that we can publish now, even while the broader collection continues through verification and sequencing. These data originate from 20 helminth specimens and include 45 new barcodes from&nbsp;<span>COI (N = 11), 16S (N = 19), and ITS1 (N = 15).</span><br /><br />Release R1 includes an initial subset of sequenced specimens linked to hosts that span multiple mammalian lineages of the Neotropics:<ul><li>Lowland tapir (<em>Tapirus terrestris</em>)</li><li>Sloths: brown-throated three-toed sloth (<em>Bradypus variegatus</em>), Hoffmann&rsquo;s two-toed sloth (<em>Choloepus hoffmanni</em>)</li><li>Opossums: big-eared opossum (<em>Didelphis aurita</em>), white-eared opossum (<em>Didelphis albiventris</em>), gray four-eyed opossum (<em>Philander quica</em>)</li><li>Armadillos: nine-banded armadillo (<em>Dasypus novemcinctus</em>), screaming hairy armadillo (<em>Chaetophractus vellerosus</em>)</li></ul><br /><span>Currently identified parasitic helminths associated with these hosts</span>&nbsp;span two phyla, Nematoda and Platyhelminthes:<ul><li>Physalopteridae (<em>Physaloptera</em>, <em>Turgida</em>)</li><li>Kathlaniidae (<em>Cruzia</em>)</li><li>Aspidoderidae (<em>Aspidodera</em>)</li><li>Spirocercidae (<em>Paraleiuris</em>, <em>Physocephalus</em>)</li><li>Trichuridae (<em>Trichuris</em>)</li><li>Strongylidae (<em>Neomurshidia</em>)</li><li>Spiruridae (<em>Tejeraia</em>)</li><li>&hellip;and one as-yet unidentified Cestoda (Anoplocephalidae).</li></ul><br />Several specimens in the release are resolved to species level, including:<ul><li><em>Cruzia tentaculata</em></li><li><em>Turgida turgida</em></li><li><em>Physaloptera papilontruncata</em></li><li><em>Aspidodera scoleciformis</em></li></ul><br />Not every 
record is species-level, reflecting real-world constraints on the painstaking work of identifying parasites by microscope when they vary in specimen condition, life stage, and taxonomic group. Future releases will increase resolution over time as identifications are refined and the reference library expands.</div>  <h2 class="wsite-content-title" style="text-align:center;">How to use this release (and who it&rsquo;s for)</h2>  <div class="paragraph"><u>Wildlife biology &amp; conservation</u>: improve detection and interpretation of helminths as part of biodiversity monitoring and host health assessments.<br /><br /><u>Parasite diversity research</u>: compare host-associated parasite communities and flag potential new lineages and host records.<br /><br /><u>One Health &amp; veterinary applications</u>: support parasite surveillance where wildlife, domestic animals, and humans interact.<br /><br /><u>Molecular ecology &amp; bioinformatics</u>: use HelmBank as a reference resource for read assignment, benchmarking, and reproducible pipelines.</div>  <div class="paragraph" style="text-align:center;">If you'd like to use the data, please cite DOI&nbsp;<a href="http://dx.doi.org/10.5883/DS-HELMBR1" target="_blank"><span>dx.doi.org/10.5883/DS-HELMBR1</span><br /></a>Guidance on peer-reviewed citation (*<em>coming soon</em>*)</div>  <h2 class="wsite-content-title" style="text-align:center;">What's next</h2>  <div class="paragraph">Future releases of HelmBank will expand both host and parasite coverage beyond R1, including additional helminth groups already present in the working collection (e.g., acanthocephalans and trematodes) and new host taxa (e.g., wildcats, canids, peccaries, and anteaters) as sequences and metadata packages become available.<br /><br /><strong>To stay updated</strong>:<ul><li><a href="https://www.kartzinellab.com/bioinformatics-workshop/category/helmbank" target="_blank">Review the HelmBank releases index for updates</a></li><li><a 
href="https://www.kartzinellab.com/molecular-parasitology.html">Explore the project overview</a></li><li><a href="https://www.kartzinellab.com/news/category/parasites" target="_blank">Follow our news &amp; updates about parasites</a></li></ul></div>  <h2 class="wsite-content-title" style="text-align:center;">Partners, contributions, and collaborations</h2>  <div class="paragraph">HelmBank exists because <span>veterinarians,&nbsp;</span>field biologists, parasitologists, and molecular ecologists are working together to connect specimens to sequence to metadata. If you are interested in collaborating&mdash;especially around under-sampled host taxa, new regions, or integration with monitoring programs&mdash;<strong>please reach out</strong>.<ul><li><a href="https://www.kartzinellab.com/work-with-us.html">Work with us</a></li><li><a href="https://www.kartzinellab.com/contact.html">Contact</a></li></ul></div>]]></content:encoded></item><item><title><![CDATA[Rethinking Replication in Dietary DNA Studies]]></title><link><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/rethinking-replication-in-dietary-dna-studies]]></link><comments><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/rethinking-replication-in-dietary-dna-studies#comments]]></comments><pubDate>Wed, 04 Mar 2026 17:02:44 GMT</pubDate><category><![CDATA[AI]]></category><category><![CDATA[DNA metabarcoding]]></category><category><![CDATA[Protocols & Methods]]></category><guid isPermaLink="false">https://www.kartzinellab.com/bioinformatics-workshop/rethinking-replication-in-dietary-dna-studies</guid><description><![CDATA[Do You Even Need &ldquo;Groups&rdquo;? Rethinking Replication in Dietary DNA Studies  In many dietary DNA metabarcoding studies, sampling and replication tend to be framed around predefined groups: Species A vs. Species B; Dry season vs. wet season; Treatment vs. control; Population 1 vs. 
Population 2. We are taught to ask ourselves: How many samples do we need to collect per group for a statistically robust sampling design? But what if group identity does not need to be the primary unit of analysis in t [...] ]]></description><content:encoded><![CDATA[<h2 class="wsite-content-title" style="text-align:center;">Do You Even Need &ldquo;Groups&rdquo;? Rethinking Replication in Dietary DNA Studies</h2>  <div class="paragraph">In many dietary DNA metabarcoding studies, sampling and replication tend to be framed around predefined groups:<ul><li>Species A vs. Species B</li><li>Dry season vs. wet season</li><li>Treatment vs. control</li><li>Population 1 vs. Population 2</li></ul><br />We are taught to ask ourselves: <em>How many samples do we need to collect per group for a statistically robust sampling design?</em><br /><br />But what if group identity does not need to be the primary unit of analysis in the first place?<br /><br />Recent analytical approaches &mdash; including the use of unsupervised and minimally supervised machine learning tools &mdash; allow ecological patterns to emerge directly from dietary data without requiring us to impose <em>a priori</em>&nbsp;sampling categories on the "groups" that we have under study. When that happens, the logic of replication changes.<br /><br />Replication still matters.<br />But <em>why</em> it matters is different.</div>  <div>  <!--BLOG_SUMMARY_END--></div>  <h2 class="wsite-content-title" style="text-align:center;">The Traditional Logic: Balanced Groups</h2>  <div class="paragraph">In classical statistics, we target meaningful sample sizes with replication that serves to:<ul><li>Estimate within-group variance</li><li>Compare means or multivariate centroids</li><li>Test differences between predefined categories</li></ul><br />In this framework, sample size per group is central. 
Balance matters and power analyses often assume categorical groups are the target for sampling.<br /><span>&#128279; Post:&nbsp;</span><a href="https://www.kartzinellab.com/bioinformatics-workshop/how-many-samples-for-a-dietary-dna-study">How Many Samples for a Dietary DNA study</a>?<br /><br />This logic remains powerful and appropriate in many ecology and conservation contexts &mdash; especially when management decisions hinge on explicit contrasts (e.g., restored vs. degraded habitat).<ul><li>Experimental design remains one of the core skills that ecologists need to learn and practice</li><li>Experimental manipulations remain the gold standard for identifying causal mechanisms in ecology</li><li>Experimental field studies are among our group's&nbsp;favorite and most productive lines of research; we use them for <a href="https://www.kartzinellab.com/research.html">understanding food webs at study sites</a> around the world</li></ul><br />But it is not the only analytical pathway available.</div>  <h2 class="wsite-content-title" style="text-align:center;">When Structure Emerges Without Predefined Categories</h2>  <div class="paragraph">In <a href="https://www.pnas.org/doi/10.1073/pnas.2502691122" target="_blank">our recent study published</a> in <em>Proceedings of the National Academy of Sciences</em>, we analyzed large-herbivore diets from Yellowstone National Park using relatively simple machine learning tools.<br /><br />Rather than defining groups such as &ldquo;species &times; season&rdquo; in advance, we allowed patterns in the dietary data to organize themselves. 
The <a href="https://www.kartzinellab.com/bioinformatics-workshop/hot-off-the-press-code-from-hoff-et-al-2025-pnas-paper">algorithms we used</a> identified structure based on shared dietary signals &mdash; revealing ecological gradients and overlap that outshined any categorical labels that we could have applied.<br />&#8203;<br />This does not mean species identity was irrelevant, but it does mean that&nbsp;structure can emerge from the data without requiring us to focus on it.<br /><br />This insight really matters for study design.</div>  <h2 class="wsite-content-title" style="text-align:center;">If Groups Aren&rsquo;t Predefined, What Does Replication Accomplish?</h2>  <div class="paragraph">When you remove predefined groupings, replication no longer needs to be a central target that we use to achieve balanced sampling of all categories.<br /><br />Instead, it serves to:<ul><li>Represent the full range of ecological variation present in a system</li><li>Capture dietary heterogeneity across individuals, regardless of <em>a priori</em> grouping(s)</li><li>Cover multidimensional dietary space</li><li>Stabilize pattern detection in high-dimensional data</li></ul><br />In this context, the central question shifts from:<br /><br /><em>&ldquo;How many samples do I need to find per group?&rdquo;</em><br /><br />to:<br /><br /><em>&ldquo;How well does my sampling capture the ecological variation that exists out there in nature?&rdquo;<br /></em><br />Replication is still essential &mdash; but its purpose is representational rather than categorical.</div>  <h2 class="wsite-content-title" style="text-align:center;">Representation vs. 
Balance</h2>  <div class="paragraph">There is a subtle but important distinction in these concepts.<br /><br />In group-based sampling designs, you pay attention to:<ul><li>Equal replication across treatments</li><li>Sufficient power to detect differences</li></ul><br />In any structure-detection framework, you pay more attention to:<ul><li>Coverage of environmental gradients</li><li>Inclusion of rare but informative dietary signals</li><li>Adequate sampling across behavioral diversity</li></ul><br />An imbalanced dataset that you might worry about in a classic experiment can still become extremely informative if it adequately spans ecological space. Conversely, perfectly balanced groups can still miss important gradients if sampling is narrow.<br />&#8203;<br />The emphasis shifts from symmetry to coverage.</div>  <h2 class="wsite-content-title" style="text-align:center;">What This Does <em>Not</em> Mean...</h2>  <div class="paragraph">The distinction between representation and balance does not mean:<ul><li>Sample size no longer matters</li><li>Small datasets are good enough</li><li>Machine learning eliminates the need for thoughtful sampling design</li><li>The field is moving beyond the need for mechanistic experiments to understand nature</li></ul><br />High-dimensional dietary data can be noisy, and sparse sampling can produce unstable clustering or misleading gradients. 
Overinterpretation remains a risk, especially if we find ourselves using data-hungry approaches without enough raw data to feed them.<br /><br />Unsupervised approaches require at least as much planning and careful consideration as traditional comparisons &mdash; the key&nbsp;distinction is all about what we are replicating and how it supports inference.</div>  <h2 class="wsite-content-title" style="text-align:center;">When This Approach Is Especially Powerful</h2>  <div class="paragraph">Structure-first approaches can be particularly useful when:<ul><li>Group boundaries are biologically fuzzy</li><li>Species groups overlap extensively in diet</li><li>Environmental gradients are continuous rather than discrete</li><li>You suspect hidden structure not captured by simple categories</li></ul><br />In <a href="https://www.kartzinellab.com/yellowstone.html">Yellowstone National Park</a>, for example, dietary overlap among large herbivores is shaped by shared landscapes, seasonal dynamics, and plant community structure. <br /><br />&#8203;Allowing patterns to emerge from the data helped reveal how <a href="https://www.kartzinellab.com/news/story-behind-the-science-yellowstone-wildlife-diets">ecological organization did not always align with predefined groupings</a>.</div>  <h2 class="wsite-content-title" style="text-align:center;">So How Many Samples Do You Need?</h2>  <div class="paragraph">Even in structure-detection studies, replication helps lead us toward reliable inferences.<br /><br />More samples:<ul><li>Better represent ecological space</li><li>Improve detection of subtle gradients</li><li>Reduce sensitivity to outliers</li><li>Strengthen generalizability</li></ul><br />In many practical cases, replication levels similar to those recommended for comparative designs should be the target (e.g., ~<a href="https://www.kartzinellab.com/bioinformatics-workshop/how-many-samples-for-a-dietary-dna-study">20&ndash;30 independent samples</a> per ecological unit). 
This is not because groups must be balanced, but because ecological variation must be adequately sampled.<br /><br />The logic evolves, but the need for replication does not disappear.</div>  <h2 class="wsite-content-title" style="text-align:center;">Designing Studies Without Groups in Mind</h2>  <div class="paragraph">The broader takeaway is not that predefined groups are obsolete.<br /><br />It is that dietary DNA metabarcoding now supports multiple analytical frameworks:<ul><li>Hypothesis-driven comparisons</li><li>Gradient-based inference</li><li>Clustering and dimensionality reduction</li><li>Hybrid approaches combining categorical and emergent structure</li></ul><br />Thoughtful study design requires aligning replication with the type of inference you intend to make.<br /><br />If your management question hinges on specific contrasts, group-based replication remains essential.<br /><br />If your goal is to detect latent structure or ecological gradients, prioritize coverage of variation across individuals and environments.<br /><br />In both cases, replication fuels our inference &mdash; it doesn't lead us to the truth about nature on its own.<br></div>  <h2 class="wsite-content-title" style="text-align:center;">The Bigger Picture</h2>  <div class="paragraph">As dietary DNA datasets grow and analytical tools diversify, we may see a shift from strictly categorical thinking toward more flexible representations of ecological structure.<br /><br />That shift does not reduce the importance of sampling.<ul><li>It raises the bar for intentional study design.</li><li>Replication is no longer only about statistical power between bins.</li><li>It is about faithfully representing ecological reality.</li></ul> <br />And that remains the central goal in our efforts to achieve&nbsp;<a href="https://www.kartzinellab.com/impact.html">impact in conservation</a> through molecular ecology.</div>]]></content:encoded></item><item><title><![CDATA[How Many Samples for a Dietary DNA 
Study?]]></title><link><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/how-many-samples-for-a-dietary-dna-study]]></link><comments><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/how-many-samples-for-a-dietary-dna-study#comments]]></comments><pubDate>Sat, 28 Feb 2026 18:49:32 GMT</pubDate><category><![CDATA[DNA metabarcoding]]></category><category><![CDATA[Protocols & Methods]]></category><guid isPermaLink="false">https://www.kartzinellab.com/bioinformatics-workshop/how-many-samples-for-a-dietary-dna-study</guid><description><![CDATA[How Many Samples Do You Need for a Dietary DNA Study?  Designing a dietary DNA metabarcoding study often begins with a deceptively simple question: How many samples do I really need to collect? There is not a universally &ldquo;correct&rdquo; number. We all want to have a large enough sample size for a powerful analysis. But it can be extremely challenging to collect fresh scat samples from wild animals&mdash;especially when they are rare and widespread&mdash;and then we face the cost of analyzi [...] ]]></description><content:encoded><![CDATA[<h2 class="wsite-content-title" style="text-align:center;">How Many Samples Do You Need for a Dietary DNA Study?</h2>  <div class="paragraph">Designing a dietary DNA metabarcoding study often begins with a deceptively simple question: <strong>How many samples do I really need to collect?</strong> <br /><br />There is not a universally &ldquo;correct&rdquo; number. We all want to have a large enough sample size for a powerful analysis. But it can be extremely challenging to collect fresh scat samples from wild animals&mdash;especially when they are rare and widespread&mdash;and then we face the cost of analyzing what we get.<br /><br />To answer this question, we need to focus mostly on the ecological inferences we want to make. Are we trying to compare groups? Estimate niche breadth? Detect rare food items? Describe seasonal shifts? 
The number of samples required to detect differences between sample sets is often very different from the number needed to perfectly catalog everything in a diet. So, I want to share some helpful rules of thumb based on experience across a wide variety of study systems...<br></div>  <div>  <!--BLOG_SUMMARY_END--></div>  <h2 class="wsite-content-title" style="text-align:center;">What Actually Determines the Sample Size You Need?</h2>  <div class="paragraph">Several factors shape how many samples you should target for collection:<ul><li><strong>Individual-level diet variation</strong></li><li><strong>Population-level diet heterogeneity</strong></li><li><strong>Temporal variability</strong></li><li><strong>Diet diversity (specialist vs. generalist feeders)</strong></li><li><strong>Whether your goal is description or comparison</strong></li></ul><br />Highly generalized feeders with diverse prey taxa typically require more replication than specialists with narrow and relatively constant diets. Systems with strong seasonal shifts may require replication across time. Populations occupying heterogeneous landscapes may show greater variance among individuals that can only be quantified with effort.<br /><br />The most important question is not &ldquo;How many samples is enough?&rdquo; but <strong>"When does the ecological signal that I need to detect become robust?"</strong><br></div>  <h2 class="wsite-content-title" style="text-align:center;">Describing A Diet vs. Comparing Diets</h2>  <div class="paragraph">It is <strong>much easier to detect differences</strong> between diets than to perfectly catalog everything in a diet.<br /><br />If your goal is exhaustive description&mdash;identifying every taxon consumed and estimating its relative contribution&mdash;replication requirements can be high, especially in species with diverse diets. 
Instrumental error becomes a significant concern as well.<br /><br />If your goal is comparative or experimental&mdash;such as:<ul><li>Restored vs. degraded habitat</li><li>Dry vs. wet season</li><li>Species A vs. Species B</li><li>Treatment vs. control</li></ul>&mdash;then structured differences often emerge with fewer samples.<br /><br />Comparative research designs are powerful because they focus on relative differences rather than perfection. In many conservation contexts, that distinction is critical. Management decisions often hinge on contrasts, and prior information is almost always limited.<br></div>  <h2 class="wsite-content-title" style="text-align:center;">A Practical Rule of Thumb</h2>  <div class="paragraph">As a general guideline, when I'm asked to recommend a target sample size I usually suggest aiming for 20&ndash;30 independent samples per &ldquo;group.&rdquo;<br /><br />A &ldquo;group&rdquo; should be defined relative to the goals of the study, such as:<ul><li>A species</li><li>A species &times; season combination</li><li>A population</li><li>A treatment</li><li>A site</li></ul></div>  <div id="751111054149425217"><div><style type="text/css">	#element-51721575-30ca-4461-9cd9-00502da91e0a .callout-box-wrapper {  padding: 20px 0px;  word-wrap: break-word;}#element-51721575-30ca-4461-9cd9-00502da91e0a .callout-box--standard {  border: 1px solid #E0E0E0;  background: #ffffff;  padding: 1px 1px;}#element-51721575-30ca-4461-9cd9-00502da91e0a .callout-box--material {  border: 1px solid #E0E0E0;  background: #ffffff;  padding: 1px 1px;  box-shadow: 0 0 20px rgba(0,0,0,0.15);}#element-51721575-30ca-4461-9cd9-00502da91e0a .callout-base {  border: 1px solid #E0E0E0;  background: #ffffff;  padding: 1px 1px;}#element-51721575-30ca-4461-9cd9-00502da91e0a .material {  box-shadow: 0 0 20px rgba(0,0,0,0.15);}</style><div id="element-51721575-30ca-4461-9cd9-00502da91e0a" data-platform-element-id="694046499467037623-1.2.6" class="platform-element-contents">	<div 
class="callout-box-wrapper">	<div class="callout-box--material">	    <div class="element-content">	        <div style="width: auto"><div></div><div><div class="wsite-image wsite-image-border-none " style="padding-top:10px;padding-bottom:10px;margin-left:0px;margin-right:0px;text-align:center"><a href='https://www.pnas.org/doi/10.1073/pnas.1503283112' target='_blank'><img src="https://www.kartzinellab.com/uploads/9/2/7/9/92793766/editor/kartzinel-et-al-2015-figure-s2a.png?1772305566" alt="Figure S2a from Kartzinel et al 2015 (PNAS)" style="width:425;max-width:100%" /></a><div style="display:block;font-size:90%">Dietary species accumulation curves for each of 7 species of large mammalian herbivores at Mpala Research Centre in Kenya. Figure S2a from the open-access publication by Kartzinel et al 2015 (PNAS).</div></div></div></div>	    </div>	</div></div></div><div style="clear:both;"></div></div></div>  <div class="wsite-spacer" style="height:11px;"></div>  <div class="paragraph"><span>In many systems that we have studied&mdash;including generalized feeders with diverse diets&mdash;we find that species accumulation curves approach an asymptote around this level of replication, levels of inter-individual variation can be reliably characterized, and group-level differences stabilize enough for robust comparisons.</span><br /><br /><span>This is not an absolute target. It is a practical starting point that balances ecological realism with logistical constraints.</span><br /><br /><span>&#8203;If you discover diets are less varied, you can scale back to 10&ndash;20. 
We rarely see groups that are still rapidly accumulating dietary taxa beyond 30, though, and that&rsquo;s the &ldquo;magic&rdquo; number we hear about in introductory statistics.</span></div>  <h2 class="wsite-content-title" style="text-align:center;">Why You Should Avoid Pooling Samples</h2>  <div class="paragraph"><br />One common strategy that people hear about to reduce costs involves combining samples from multiple individuals into a pooled composite sample. I generally recommend against this approach for dietary DNA studies.<br /><br />Pooling masks inter-individual variation, and that variation is statistically powerful&mdash;even when total sample sizes are modest. Quantifying differences among individuals is often essential for revealing ecological structure, niche partitioning, or behavioral flexibility that would otherwise remain hidden.<br /><br />Even relatively small numbers of independent samples frequently provide more inferential leverage than a few pooled composites.<br />&#8203;<br />In dietary DNA research, independent replication is far more valuable than artificial composites.<br></div>  <h2 class="wsite-content-title" style="text-align:center;">What If You Only Have One or a Few Dietary DNA Samples?</h2>  <div class="paragraph">My take: You should analyze them. Does that appear to be a little cavalier? Perhaps. But do it anyway. 
Here&rsquo;s why&hellip;<br /><br />A small dataset can still be valuable, particularly when:<ul><li>Your system is understudied</li><li>Your work is at an exploratory or hypothesis-generating stage of development</li><li>Your samples were so difficult to obtain that no one is likely to try again anytime soon</li><li>You are in a unique position to publish diet profiles that will be new to the literature</li></ul></div>  <div><div class="wsite-multicol"><div class="wsite-multicol-table-wrap" style="margin:0 -15px;">	<table class="wsite-multicol-table">		<tbody class="wsite-multicol-tbody">			<tr class="wsite-multicol-tr">				<td class="wsite-multicol-col" style="width:72.107969151671%; padding:0 15px;">											<div class="paragraph"><span>The smallest sample size that I can remember publishing for any species that we have studied was two: for the steenbok and the crested porcupine of Kenya. They were part of a broader dataset that included hundreds of samples and dozens of herbivorous species from Kenya&nbsp;</span><a href="https://www.pnas.org/doi/10.1073/pnas.1905666116" target="_blank">in an open-access paper on diet-microbiome linkages</a><span>&nbsp;that we published in&nbsp;</span><em>PNAS</em><span>. Both were exceedingly difficult to sample, despite effort, and the few samples we did manage to collect provided very useful context for comparison with other species that were very well sampled.</span><br /><br /><span>The key is being careful about how you interpret the data. Avoid overgeneralizing to entire species or seasons. We must acknowledge uncertainty and treat it appropriately when we have limited replication. 
But in my opinion, the more dietary DNA results researchers publish&mdash;when presented properly&mdash;the stronger our collective ability to synthesize patterns across systems will become.&nbsp;<br /></span><br /><span>Undersampled groups are most problematic when they are overinterpreted, not when they are carefully contextualized.</span></div>									</td>				<td class="wsite-multicol-col" style="width:27.892030848329%; padding:0 15px;">											<div id="898355418534519970"><div><style type="text/css">	#element-311030e5-7e0f-42e1-aebd-23ace21843d0 .callout-box-wrapper {  padding: 20px 0px;  word-wrap: break-word;}#element-311030e5-7e0f-42e1-aebd-23ace21843d0 .callout-box--standard {  border: 1px solid #E0E0E0;  background: #FAFAFA;  padding: 1px 1px;}#element-311030e5-7e0f-42e1-aebd-23ace21843d0 .callout-box--material {  border: 1px solid #E0E0E0;  background: #FAFAFA;  padding: 1px 1px;  box-shadow: 0 0 20px rgba(0,0,0,0.15);}#element-311030e5-7e0f-42e1-aebd-23ace21843d0 .callout-base {  border: 1px solid #E0E0E0;  background: #FAFAFA;  padding: 1px 1px;}#element-311030e5-7e0f-42e1-aebd-23ace21843d0 .material {  box-shadow: 0 0 20px rgba(0,0,0,0.15);}</style><div id="element-311030e5-7e0f-42e1-aebd-23ace21843d0" data-platform-element-id="694046499467037623-1.2.6" class="platform-element-contents">	<div class="callout-box-wrapper">	<div class="callout-box--material">	    <div class="element-content">	        <div style="width: auto"><div></div><div><div class="wsite-image wsite-image-border-none " style="padding-top:10px;padding-bottom:10px;margin-left:0px;margin-right:0px;text-align:center"><a href='https://www.pnas.org/doi/10.1073/pnas.1905666116' target='_blank'><img src="https://www.kartzinellab.com/uploads/9/2/7/9/92793766/pnas-116-47-coverthumb_orig.jpg" alt="Diets and microbiomes of megafauna. Cover article for PNAS." style="width:auto;max-width:100%" /></a><div style="display:block;font-size:90%">Diets and microbiomes of megafauna. 
Cover article for PNAS. This open-access article included some species that were extremely well sampled and some that had few.</div></div></div></div>	    </div>	</div></div></div><div style="clear:both;"></div></div></div>									</td>			</tr>		</tbody>	</table></div></div></div>  <h2 class="wsite-content-title" style="text-align:center;">&#8203;When &ldquo;Groups&rdquo; Aren&rsquo;t the Primary Unit of Analysis</h2>  <div class="paragraph">It is worth noting that not all dietary DNA studies need to rely on predefined groups such as species or treatment categories. In some cases, analytical approaches&mdash;particularly unsupervised or minimally supervised machine learning&mdash;allow ecological structure to emerge directly from the data without imposing <em>a priori</em> bins. In our <a href="https://www.pnas.org/doi/10.1073/pnas.2502691122" target="_blank">recent work in Yellowstone</a>, for example, we used data-driven methods to identify structure in large herbivore diets without defining groups in advance. In analyses like these, the question shifts from &ldquo;How many samples per group?&rdquo; to &ldquo;How well does our sampling capture the underlying ecological variation?&rdquo; Replication still matters, but it is used to ensure representation of ecological variation rather than to balance sampling across predefined categories. This distinction is subtle but important: thoughtful sampling strategies remain essential, even when group identity is not the central organizing principle of the analysis, but pre-defining target sample sizes may not always be required.<br /><br /><span>&#128279; Post: </span><a href="https://www.kartzinellab.com/bioinformatics-workshop/rethinking-replication-in-dietary-dna-studies">Do You Even Need &ldquo;Groups&rdquo;? 
Rethinking Replication in Dietary DNA Studies</a><span>.</span><br /><span>&#128279; Software &amp; Data: Our&nbsp;<a href="https://www.kartzinellab.com/software--data.html">DNA metabarcoding pipelines and code</a>&nbsp;are freely available for use.</span></div>  <h2 class="wsite-content-title" style="text-align:center;">Pilot Studies Help Structure Dietary DNA Analyses</h2>  <div class="paragraph">If you can, it may be wise to consider a pilot study before scaling up. You can:<ul><li>Generate rarefaction curves</li><li>Estimate among-individual variance</li><li>Evaluate precision in the data</li><li>Assess detection consistency</li></ul><br />Pilot data can help you develop your final target sample size so you don't over- or under-sample.<br /><br />It used to be hard to do this due to the cost and labor involved in putting together a full Illumina run: by the time you had enough samples to justify the cost of the run, you might as well complete the whole project in one go. (Sometimes if you knew the director of a core facility, they might be willing to &lsquo;spike&rsquo; a few of your samples into a run that somebody else was paying for&hellip; but that wasn't always an option.)<br /><br />Portable sequencers now make it much more cost-effective to run small pilots quickly.&nbsp;Our group, through the Genomic Opportunities Lab, may be able to help you <a href="https://www.kartzinellab.com/contract--collaborate.html">create pilot dietary DNA data</a> if you are an academic and conservation practitioner.</div>  <h2 class="wsite-content-title" style="text-align:center;">Align Sampling With Ecological Inference</h2>  <div class="paragraph">There is no single &ldquo;correct&rdquo; number of samples for a dietary DNA metabarcoding study.<br /><br />Thoughtful replication&mdash;matched to the scale of inference&mdash;matters much more than achieving your maximal replication. 
In conservation research especially, study design should prioritize the ability to detect meaningful ecological differences over the pursuit of exhaustive completeness. We need to get that part of the study right or we risk eroding trust in our credibility. So design your sampling strategy around the question you are trying to answer. The rest follows.<br></div>  <div id="904280735272366017"><div><style type="text/css">	#element-6c606a94-c936-42cd-8cb3-f27892c0907f .callout-box-wrapper {  padding: 20px 0px;  word-wrap: break-word;}#element-6c606a94-c936-42cd-8cb3-f27892c0907f .callout-box--standard {  border: 1px solid #E0E0E0;  background: #FAFAFA;  padding: 20px 20px;}#element-6c606a94-c936-42cd-8cb3-f27892c0907f .callout-box--material {  border: 1px solid #E0E0E0;  background: #FAFAFA;  padding: 20px 20px;  box-shadow: 0 0 20px rgba(0,0,0,0.15);}#element-6c606a94-c936-42cd-8cb3-f27892c0907f .callout-base {  border: 1px solid #E0E0E0;  background: #FAFAFA;  padding: 20px 20px;}#element-6c606a94-c936-42cd-8cb3-f27892c0907f .material {  box-shadow: 0 0 20px rgba(0,0,0,0.15);}</style><div id="element-6c606a94-c936-42cd-8cb3-f27892c0907f" data-platform-element-id="694046499467037623-1.2.6" class="platform-element-contents">	<div class="callout-box-wrapper">	<div class="callout-box--standard">	    <div class="element-content">	        <div style="width: auto"><div></div><h2 class="wsite-content-title">Explore More Dietary DNA Content</h2><div class="paragraph"><ul><li><a href="https://www.kartzinellab.com/bioinformatics-workshop/rethinking-replication-in-dietary-dna-studies">Rethinking replication in dietary studies</a></li><li><a href="https://www.kartzinellab.com/news/metabarcoding-vs-stable-isotopes">Metabarcoding versus Stable Isotopes</a></li><li><a href="https://www.kartzinellab.com/protocols.html">Free protocols: field to lab</a></li><li><a href="https://www.kartzinellab.com/software--data.html">Freely available software and data</a></li></ul></div></div>	    
</div>	</div></div></div><div style="clear:both;"></div></div></div>]]></content:encoded></item><item><title><![CDATA[Using AI in Research]]></title><link><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/using-ai-in-research]]></link><comments><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/using-ai-in-research#comments]]></comments><pubDate>Sun, 04 Jan 2026 01:08:39 GMT</pubDate><category><![CDATA[AI]]></category><category><![CDATA[Bioinformatics Workflows & Pipelines]]></category><category><![CDATA[workflow]]></category><guid isPermaLink="false">https://www.kartzinellab.com/bioinformatics-workshop/using-ai-in-research</guid><description><![CDATA[Guidance on the use of AI in the Kartzinel LabTyler KartzinelLast updated January 2026.Jump to: Rules | Risks | Reasons for Concern | University Links & PoliciesArtificial intelligence is increasingly useful as a tool to improve our research and learning. We use it to troubleshoot code, polish writing, get good ideas about how to visualize data, create document templates that save time on busywork… But at the same time, we must be cognizant of legitimate concerns about the accuracy of informat [...] 
]]></description><content:encoded><![CDATA[<h2 class="wsite-content-title" style="text-align:center;"><strong>Guidance on the use of AI in the Kartzinel Lab</strong></h2><h2 class="blog-author-title"><font size="3">Tyler Kartzinel</font></h2><p>Last updated January 2026.</p><div class="paragraph" style="text-align:center;">Jump to: <a href="https://www.kartzinellab.com/bioinformatics-workshop/category/artificial-intelligence-ai#rules">Rules</a> | <a href="https://www.kartzinellab.com/bioinformatics-workshop/category/artificial-intelligence-ai#risk">Risks</a> | <a href="https://www.kartzinellab.com/bioinformatics-workshop/category/artificial-intelligence-ai#reasons">Reasons for Concern</a> | <a href="https://www.kartzinellab.com/bioinformatics-workshop/category/artificial-intelligence-ai#links">University Links & Policies</a></div><div class="paragraph">Artificial intelligence is increasingly useful as a tool to improve our research and learning. We use it to troubleshoot code, polish writing, get good ideas about how to visualize data, create document templates that save time on busywork&hellip; But at the same time, we must be cognizant of legitimate concerns about the accuracy of information it can provide, its ability to reuse confidential information that we disclosed in chats, and the risk of short-circuiting our own creative uses of the scientific method.<br><br>This post summarizes rules that lab members should follow when using AI in their work. I do not want to regurgitate the type of dry legalese we are provided by our employer<span>&mdash;</span>rather, I will attempt to illustrate the fine line we have to walk to ensure we are using the tool appropriately while minimizing the risk of unintended harm. I will summarize reasons for concern using language familiar to biologists and conservationists broadly. 
Some of the details are specific to researchers at Brown, but I believe the information is readily transferable and I welcome others to use this document as a template for their own policies.<br><br>&#8203;Please read on...&nbsp;</div><div><!--BLOG_SUMMARY_END--></div><div><div id="161259434784551724" align="left" style="width: 100%; overflow-y: hidden;" class="wcustomhtml"><a id="rules"></a></div></div><h2 class="wsite-content-title" style="text-align:center;"><strong>Rules from the lab</strong><br></h2><div class="paragraph">1.&nbsp;&nbsp;&nbsp;&nbsp; Never upload or paste your data, code, or scientific writing into a commercial AI platform. Doing so would risk violating legal requirements concerning confidentiality and the use of original data, while compromising our collective efforts to advance the bounds of knowledge.<br>&nbsp;<br>2.&nbsp;&nbsp;&nbsp;&nbsp; Always use Brown University&rsquo;s internal <a href="https://librechat.ccv.brown.edu/login" target="_blank">LibreChat system</a> if you want to use AI to check your work or save you time. This system provides access to many of the same AI tools that researchers often access online&mdash;ChatGPT, Claude, Canva AI, etc.&mdash;but it does not share our data with external tech companies.<br>&nbsp;<br>3.&nbsp;&nbsp;&nbsp;&nbsp; If you use commercial AI platforms&mdash;including Google Search with AI&mdash;do so with extreme care to ensure you do not violate Rule #1. Only do so if you can be certain that you are entering completely generic prompts that cannot be connected to your work in the lab&mdash;if there is any doubt, use LibreChat instead.<br>&nbsp;<br>4.&nbsp;&nbsp;&nbsp;&nbsp; Be able to independently verify all insights and information you obtain from these platforms when asked. The baseline quality of information these platforms can provide is limited based on the quality of information sources they are using&mdash;they make mistakes and have the potential to mislead with some frequency. 
It is worth remembering that our job is to generate information that no one has ever had before and thus you cannot rely on these tools for information that you are not independently verifying yourself.</div><div><div id="691412935985936279" align="left" style="width: 100%; overflow-y: hidden;" class="wcustomhtml"><a id="risk"></a></div></div><h2 class="wsite-content-title" style="text-align:center;"><strong>Illustrating the risk of making inappropriate disclosures without intending to</strong></h2><div class="paragraph">Read each of the four pairs of examples and consider the differences in how each query is constructed:<br></div><div id="606730914954747593"><div><div id="element-95986cc2-f325-40f4-bbc6-da805e583a68" data-platform-element-id="694046499467037623-1.2.6" class="platform-element-contents"><div class="callout-box-wrapper"><div class="callout-box--material"><div class="element-content"><div style="width: auto"><div></div><div id="595537061222847433"><div><div id="element-581aa1f7-881e-41ac-9956-70392fbc7942" data-platform-element-id="702688850553606843-1.4.3" class="platform-element-contents"><div class="simple-table-wrapper"><table class="simple-table style-basic"><tr><td class="cell"><div class="paragraph"><em>Can you provide code that I can use to align new sequences to a reference genome?</em></div></td><td class="cell"><div class="paragraph">Can you align these new sequences to the reference genome?</div></td></tr></table></div></div><div style="clear:both;"></div></div></div></div></div></div></div></div><div style="clear:both;"></div></div></div><div id="228313424232618839"><div><div id="element-36f825c0-201e-45e9-aefe-69328768c1e8" data-platform-element-id="694046499467037623-1.2.6" class="platform-element-contents"><div class="callout-box-wrapper"><div class="callout-box--material"><div class="element-content"><div style="width: auto"><div></div><div id="230758510241872724"><div><div id="element-6f4ccc24-54ae-4e1e-a79f-8882043ef19f" 
data-platform-element-id="702688850553606843-1.4.3" class="platform-element-contents"><div class="simple-table-wrapper"><table class="simple-table style-basic"><tr><td class="cell"><div class="paragraph"><span>What are appropriate statistical frameworks to account for spatial autocorrelation in sample data?</span><br></div></td><td class="cell"><div class="paragraph"><span>What should I do to account for statistical autocorrelation in these data?</span><br></div></td></tr></table></div></div><div style="clear:both;"></div></div></div></div></div></div></div></div><div style="clear:both;"></div></div></div><div id="855357263551103618"><div><div id="element-0c2f06f0-4a36-4e2d-9897-7dbaf679cdf4" data-platform-element-id="694046499467037623-1.2.6" class="platform-element-contents"><div class="callout-box-wrapper"><div class="callout-box--material"><div class="element-content"><div style="width: auto"><div></div><div id="680922776684546265"><div><div id="element-6016ecf0-e4a3-45b1-8f69-47feb4e6c82d" data-platform-element-id="702688850553606843-1.4.3" class="platform-element-contents"><div class="simple-table-wrapper"><table class="simple-table style-basic"><tr><td class="cell"><div class="paragraph"><span>Please create a GoogleDoc that is properly formatted to use as a template for a manuscript that I will submit to the journal <em>Molecular Ecology</em>.</span><br><span></span></div></td><td class="cell"><div class="paragraph"><span>Take this draft manuscript that I have written and properly format it for submission to the journal <em>Molecular Ecology.</em></span></div></td></tr></table></div></div><div style="clear:both;"></div></div></div></div></div></div></div></div><div style="clear:both;"></div></div></div><div id="554039638732740023"><div><div id="element-bc9b5795-d95f-458c-ba2c-3fea1f5d9ae0" data-platform-element-id="694046499467037623-1.2.6" class="platform-element-contents"><div class="callout-box-wrapper"><div class="callout-box--material"><div 
class="element-content"><div style="width: auto"><div></div><div id="478152262243057058"><div><div id="element-39a57dea-6983-4b35-b0a7-ae6d60481633" data-platform-element-id="702688850553606843-1.4.3" class="platform-element-contents"><div class="simple-table-wrapper"><table class="simple-table style-basic"><tr><td class="cell"><div class="paragraph"><span>Give me an example of how I would summarize a description of PCR in Spanish.</span><br><span></span></div></td><td class="cell"><div class="paragraph"><span>Translate my pasted description of PCR into Spanish.</span></div></td></tr></table></div></div><div style="clear:both;"></div></div></div></div></div></div></div></div><div style="clear:both;"></div></div></div><h2 class="wsite-content-title" style="text-align:center;">How would you characterize the difference in risk?</h2><div class="paragraph">&#8203;In each of the first cases, you see that there are ways to create prompts that enable you to use these kinds of algorithms as time-saving tools without compromising the confidentiality of your work in the lab.&nbsp;<br><br>In each of the second cases, you see that it is all too easy to provide third-party commercial platforms with information that you may not be allowed to knowingly disclose&mdash;<em>even if that is not your intent</em>.<br><br>This is the difference between using AI as a tool to accelerate our research and get past roadblocks versus using AI to do things for us. The second case clearly has the potential to cause frustrating experiences and lead to errors<span>&mdash;reason enough not to outsource&nbsp;<em>your&nbsp;</em>responsibility to a commercial chatbot&mdash;but that is just about the user's short-term experience. The risk of making unintended disclosures can be far more significant and long-lasting.</span><br><br>Below, I will illustrate reasons for concern. 
Awareness will help ensure we are able to use the tools appropriately and are quick to recognize inappropriate types of interactions.</div><div><div id="262353344216899912" align="left" style="width: 100%; overflow-y: hidden;" class="wcustomhtml"><a id="reasons"></a></div></div><h2 class="wsite-content-title" style="text-align:center;"><strong>Reasons for concern</strong></h2><div class="paragraph"><strong>Disclosing confidential, protected information</strong>. We work with protected plants and animals; ethical considerations, and sometimes legal requirements, prevent us from disclosing sensitive information that could jeopardize them, their habitats, or the people who work to protect them. For example, we redact or coarsen information about where we sample protected wildlife populations when we publish results in order to avoid inadvertently aiding poachers. However, AI platforms store information you provide in order to provide information to others&mdash;<em>this is a significant concern</em>. Brown also has categories of data risk that may apply: See the "Data Risk Classifications" link below.<br>&nbsp;<br><strong>Getting scooped, plagiarism, missing out on credit</strong>. We have ethical obligations to one another, and to our funders, to ensure the utmost integrity in our research. We strive to share all scientific information quickly and responsibly&mdash;and especially with respect to taxpayer-funded work we are required to do so&mdash;but we undermine our own best efforts when our work is shared with others before we can fully verify results with our careful quality control procedures. This can happen, even unknowingly, if another user enters a prompt and the response they get is based on unpublished information originating within our group. Other individuals may use that information as if it were original or their own, with no opportunity to verify it using the peer-reviewed literature or to properly credit us for the idea. 
In some cases, AI bots can search and find data or code and package it in ways that make it all too easy for someone else to claim credit. See the related link below about "IP" (Intellectual Property) at Brown.<br><br><strong>Failing to verify results, perpetuating errors</strong>. Each of us bears significant responsibility for ensuring the accuracy of our results&mdash;inclusive of how our data are recorded, how samples are handled, and how data are analyzed, visualized, interpreted and communicated. Failure to do so can result in irreparable harm to our reputation, careers, and field writ large. Mistakes happen in research, but we can only correct them if we are attentive to detail&mdash;and the practice of attending to detail requires time. Failing to verify code, graphics, citations, or translations of original text are just a few of the ways that overreliance on these tools introduces the risk of perpetuating damaging errors.<br>&nbsp;<br><strong>Social and environmental harms</strong>. The explosion of AI data servers is taxing local electric systems, water supplies, and the social fabric of surrounding communities all over the world. Many of these facilities are being constructed in regions with few environmental protections, exacerbating the associated environmental harms. Life is not without impact&mdash;we should use all the tools available to maximize the quality of our science and progress in our careers&mdash;but we should be cognizant of impacts and strive for moderation.<br>&nbsp;<br><strong>Failure to learn, metacognition</strong>. In science and academia, your brain is your greatest asset. It is what provides you with genuine intelligence. Evidence shows that we learn best by recalling information and applying it in new situations. 
What may feel like a time-saving opportunity that advances your work could ultimately undermine your goals if it becomes a tactic to avoid investing sufficient effort in the difficult task of learning and improving with time. Therefore, I recommend learning about the &ldquo;<a href="https://www.youtube.com/watch?v=0NIXM74NwXs" target="_blank">Desirable Difficulty</a>&rdquo; concept&mdash;it will make you a better student and teacher, who is prepared to use AI effectively while managing its associated risks. It is ultimately our job to recognize which of the advice we get from AI is based on useful information, and learning to&nbsp;do this efficiently and effectively is going to become an increasingly important part of academic training.&nbsp;</div><h2 class="wsite-content-title" style="text-align:center;">To illustrate and then solve the problem...</h2><div><div class="wsite-multicol"><div class="wsite-multicol-table-wrap" style="margin:0 -15px;"><table class="wsite-multicol-table"><tbody class="wsite-multicol-tbody"><tr class="wsite-multicol-tr"><td class="wsite-multicol-col" style="width:35.730337078652%; padding:0 15px;"><div><div class="wsite-image wsite-image-border-none" style="padding-top:10px;padding-bottom:10px;margin-left:0px;margin-right:0px;text-align:center"><a><img src="https://www.kartzinellab.com/uploads/9/2/7/9/92793766/editor/sloth-bychatgpt.png?1767632549" alt="Image of a three-toed sloth as rendered by ChatGPT" style="width:auto;max-width:100%"></a><div style="display:block;font-size:90%">A three-toed sloth climbing through the trees, as imagined by ChatGPT</div></div></div></td><td class="wsite-multicol-col" style="width:64.269662921348%; padding:0 15px;"><div id="704456974195283554"><div><div id="element-ab8e5c6b-5849-4cfb-9958-8be534a8f714" data-platform-element-id="694046499467037623-1.2.6" class="platform-element-contents"><div class="callout-box-wrapper"><div class="callout-box--standard"><div class="element-content"><div style="width: 
auto"><div></div><div class="paragraph" style="text-align:center;"><font color="#24678D"><strong><font size="4">Don't want to rely on AI for all your code?</font></strong><br><br>Check out <a href="https://www.kartzinellab.com/software--data.html">these freely available and extensively tested resources</a> compiled by our team!<br><br>More can be found on our <a href="https://github.com/trklab-metabarcoding" target="_blank">GitHub organization</a> site.</font></div></div></div></div></div></div><div style="clear:both;"></div></div></div></td></tr></tbody></table></div></div></div><div><div id="893943043399872526" align="left" style="width: 100%; overflow-y: hidden;" class="wcustomhtml"><a id="links"></a></div></div><div class="wsite-spacer" style="height:50px;"></div><h2 class="wsite-content-title" style="text-align:center;">Links to policies and resources at Brown</h2><div class="paragraph" style="text-align:center;"><span>Lab members should use LibreChat by default. Learn about this and appreciate what it does for you&mdash;every time you use it!<br><br>LibreChat:&nbsp;</span><a href="https://librechat.ccv.brown.edu/login" target="_blank">https://librechat.ccv.brown.edu/login</a></div><div class="paragraph">Additional links relevant to the use of AI at Brown University:<br><ul><li>Intellectual Property: <a href="https://policy.brown.edu/policy/copyright-ownership-and-use-policy" target="_blank">https://policy.brown.edu/policy/copyright-ownership-and-use-policy&nbsp;</a></li><li>Data Risk Classifications: <a href="https://it.brown.edu/policies/data-risk-classifications" target="_blank">https://it.brown.edu/policies/data-risk-classifications</a></li><li>Sheridan Center Teaching with AI in Mind:&nbsp;<a href="https://sheridan.brown.edu/resources/teaching-ai-mind" target="_blank">https://sheridan.brown.edu/resources/teaching-ai-mind</a></li><li>University AI Usage Guidance: <a href="https://provost.brown.edu/communications/potential-impact-ai-our-academic-mission" 
target="_blank">https://provost.brown.edu/communications/potential-impact-ai-our-academic-mission</a></li><li>Documentation and Privacy Statement: <a href="https://docs.ccv.brown.edu/ai-tools" target="_blank">https://docs.ccv.brown.edu/ai-tools</a></li><li>Brown Technology Innovations: <a href="https://bti.brown.edu/" target="_blank">https://bti.brown.edu/</a></li></ul></div>]]></content:encoded></item><item><title><![CDATA[Preparing dietary DNA data files for publication]]></title><link><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/preparing-manuscript-files-for-dietary-dna-data]]></link><comments><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/preparing-manuscript-files-for-dietary-dna-data#comments]]></comments><pubDate>Thu, 09 Oct 2025 13:54:55 GMT</pubDate><category><![CDATA[Bioinformatics Workflows & Pipelines]]></category><category><![CDATA[DNA metabarcoding]]></category><category><![CDATA[workflow]]></category><guid isPermaLink="false">https://www.kartzinellab.com/bioinformatics-workshop/preparing-manuscript-files-for-dietary-dna-data</guid><description><![CDATA[Preparing Dietary DNA Data for Manuscript Files  Working with dietary DNA metabarcoding data? Unsure how to concisely summarize your workflow for publication? Tired of all the effort required to format your data tables for archiving in Dryad, supplementary materials, or other archives? The lab has posted new code to our GitHub repository that will help you solve all of these problems.      The new repository for "Step 5" of our standard dietary DNA metabarcoding pipeline provides code to prepare [...] ]]></description><content:encoded><![CDATA[<h2 class="wsite-content-title" style="text-align:center;">Preparing Dietary DNA Data for Manuscript Files</h2>  <div class="paragraph">Working with dietary DNA metabarcoding data? Unsure how to concisely summarize your workflow for publication? 
Tired of all the effort required to format your data tables for archiving in Dryad, supplementary materials, or other archives? The lab has posted <a href="https://github.com/trklab-metabarcoding/metabarcoding-downstream-analyses" target="_blank">new code </a>to our GitHub repository that will help you solve all of these problems.</div>  <div>  <!--BLOG_SUMMARY_END--></div>  <div class="paragraph"><span>The new repository for "<a href="https://github.com/trklab-metabarcoding/metabarcoding-downstream-analyses" target="_blank">Step 5</a>" of our standard <a href="https://github.com/trklab-metabarcoding/obitools2-preprocessing-pipeline/blob/main/images/bioinformatic_pipeline_overview2.png" target="_blank">dietary DNA metabarcoding pipeline</a> provides code to prepare 'phyloseq' objects for use in the R package of the same name. You can find an overview of our standard workflow and links to all component parts on the <a href="https://www.kartzinellab.com/software--data.html">Software &amp; Data</a> page of the Kartzinel Lab's website.<br /><br />This workflow takes all of the extensive and complex files generated by using our lab's dietary DNA metabarcoding pipeline and formats them for downstream ecological analyses. The code follows relatively standard processing steps that we have found useful in our extensive work involving dietary data from herbivores (e.g., using the trnL-P6 marker). However, most steps are optional or customizable using different parameters based on the unique needs and goals of a variety of projects.<br /><br />The workflow includes code to filter your full sample set to include only specific subsets of data, if desired. 
It also facilitates the removal of samples with low read counts, which occur occasionally even in the most successful sequencing projects.&nbsp;<br /><br />The workflow then rarefies samples to equal sequencing depth and merges them with external taxonomic or ecological data as needed.<br /><br />Throughout this process, the code generates comprehensive data summaries that you can report in your manuscripts to clearly describe and justify the workflow that you follow. It also exports both pre- and post-rarefaction OTU tables as permanent records suitable for manuscript supplementary materials, ultimately producing a clean, analysis-ready phyloseq object alongside documentation of key metrics for reporting in scientific publications and permanent repositories.&nbsp;</span></div>]]></content:encoded></item><item><title><![CDATA[Protocols for DNA barcoding of mammals]]></title><link><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/protocols-for-dna-barcoding-of-mammals]]></link><comments><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/protocols-for-dna-barcoding-of-mammals#comments]]></comments><pubDate>Tue, 02 Sep 2025 20:38:12 GMT</pubDate><category><![CDATA[DNA barcoding]]></category><category><![CDATA[Lab protocols]]></category><category><![CDATA[Protocols & Methods]]></category><guid isPermaLink="false">https://www.kartzinellab.com/bioinformatics-workshop/protocols-for-dna-barcoding-of-mammals</guid><description><![CDATA[Protocols for DNA Barcoding of Mammals  We have posted detailed new protocols describing our methods to sequence key mammalian DNA barcodes.&nbsp;They can be found together with a growing number of field and lab protocols on the Kartzinel Lab's centralized&nbsp;protocol page. You will find protocols for both the D-loop of the mitochondrial control region and the 16S marker. Both markers are useful for identifying a diversity of mammals and can be routinely amplified from degraded material such as fecal DNA.  [...] 
]]></description><content:encoded><![CDATA[<h2 class="wsite-content-title" style="text-align:center;">Protocols for DNA Barcoding of Mammals</h2>  <div class="paragraph">We have posted detailed <a href="https://www.kartzinellab.com/protocols.html">new protocols</a> describing our methods to sequence key mammalian DNA barcodes.&nbsp;<span>They can be found together with a growing number of field and lab protocols on the Kartzinel Lab's centralized&nbsp;</span><a href="https://www.kartzinellab.com/protocols.html">protocol page.</a> <br /><br />You will find protocols for both the D-loop of the mitochondrial control region and the 16S marker. Both markers are useful for identifying a diversity of mammals and can be routinely amplified from degraded material such as fecal DNA. We have frequently used these protocols to confirm the identity of mammals in studies involving dietary DNA metabarcoding and/or host-microbiome interactions. They are also very useful for phylogenetic analyses. We have used various polymerases over the years, so these protocols may depart slightly from previously published versions (e.g., Kartzinel et al. 2019 PNAS). However, they reflect our current state-of-the-art strategy for routine work and should generally be more cost- or time-effective as a result of the changes.&nbsp;</div>]]></content:encoded></item><item><title><![CDATA[Hot off the press: Code from Hoff et al. 
2025 PNAS paper]]></title><link><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/hot-off-the-press-code-from-hoff-et-al-2025-pnas-paper]]></link><comments><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/hot-off-the-press-code-from-hoff-et-al-2025-pnas-paper#comments]]></comments><pubDate>Thu, 17 Jul 2025 19:30:00 GMT</pubDate><category><![CDATA[AI]]></category><category><![CDATA[Bioinformatics Workflows & Pipelines]]></category><category><![CDATA[software & data]]></category><guid isPermaLink="false">https://www.kartzinellab.com/bioinformatics-workshop/hot-off-the-press-code-from-hoff-et-al-2025-pnas-paper</guid><description><![CDATA[Hot Off the Press: Code from Hoff et al. 2025 PNAS Paper  New feature on our Software &amp; Data repository page: Hot off the press! Featuring code from Hannah Hoff's 2025 PNAS paper, The Apportionment of Dietary Diversity in Wildlife.This paper presented a potentially paradigm-shifting strategy to&nbsp;quantify and characterize the number of unique 'diet types' that exist within a population or community. The strategy is based on a simple machine-learning algorithm and described in the&nbsp;Hof [...] ]]></description><content:encoded><![CDATA[<h2 class="wsite-content-title" style="text-align:center;">Hot Off the Press: Code from Hoff et al. 2025 <em>PNAS</em> Paper</h2>  <div class="paragraph">New feature on our <a href="https://www.kartzinellab.com/software--data.html">Software &amp; Data</a> repository page: Hot off the press! Featuring code from Hannah Hoff's 2025 PNAS paper, <em>The Apportionment of Dietary Diversity in Wildlife</em>.<br /><br />This paper presented a potentially paradigm-shifting strategy to<span style="color:rgb(78, 54, 41)">&nbsp;quantify and characterize the number of unique 'diet types' that exist within a population or community. 
The strategy is based on a simple machine-learning algorithm and is described in the&nbsp;</span><a href="https://www.pnas.org/doi/10.1073/pnas.2502691122" target="_blank">Hoff et al. 2025</a><span style="color:rgb(78, 54, 41)">&nbsp;</span><em style="color:rgb(78, 54, 41)">PNAS</em><span style="color:rgb(78, 54, 41)">&nbsp;paper, which used the community of migratory large mammalian herbivores -- such as bison and elk -- as a prime example.</span></div>  <div>  <!--BLOG_SUMMARY_END--></div>  <div class="paragraph">The strategy makes use of dietary DNA metabarcoding data and an extensive local plant DNA reference library to characterize variation in animal diets. It then applies a machine-learning algorithm called "Partitioning Around Medoids" -- or "PAM" -- to help organize samples into an optimal number of clusters to maximize variation between groupings. This allows us to recognize patterns in the data without having to apply a priori assumptions about the groupings (e.g., should we lump all samples from a species together?).&nbsp;<br /><br />After clusters are identified, the code also presents a strategy for performing Indicator Species Analysis to identify plant taxa that contribute strongly to the clustering pattern we observed.<br /><br />The<span>&nbsp;open-source code and data repositories are at:</span><ul><li>Dryad&nbsp;<a href="https://doi.org/10.5061/dryad.mgqnk99b8" target="_blank">https://doi.org/10.5061/dryad.mgqnk99b8</a></li><li>Zenodo&nbsp;<a href="https://doi.org/10.5281/zenodo.15634157" target="_blank">https://doi.org/10.5281/zenodo.15634157</a></li></ul></div>]]></content:encoded></item><item><title><![CDATA[Our standard DNA metabarcoding pipeline updated and posted for 
2025]]></title><link><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/our-standard-dna-metabarcoding-pipeline-updated-and-posted-for-2025]]></link><comments><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/our-standard-dna-metabarcoding-pipeline-updated-and-posted-for-2025#comments]]></comments><pubDate>Thu, 17 Jul 2025 13:00:00 GMT</pubDate><category><![CDATA[Bioinformatics Workflows & Pipelines]]></category><category><![CDATA[DNA metabarcoding]]></category><category><![CDATA[workflow]]></category><guid isPermaLink="false">https://www.kartzinellab.com/bioinformatics-workshop/our-standard-dna-metabarcoding-pipeline-updated-and-posted-for-2025</guid><description><![CDATA[Our Standard DNA Metabarcoding Pipeline   	 		 			 				 					 						  Fans of the lab will be very excited to see this much-anticipated release of our standard dietary DNA metabarcoding pipeline, with a walk-through easily accessible on the centralized "Software &amp; Data" section of our webpage. Until now, people would have to access code repositories associated with each of our publications or contact us directly to model their analysis after our well-established workflow. That led to multipl [...] ]]></description><content:encoded><![CDATA[<h2 class="wsite-content-title" style="text-align:center;">Our Standard DNA Metabarcoding Pipeline</h2>  <div><div class="wsite-multicol"><div class="wsite-multicol-table-wrap" style="margin:0 -15px;"> 	<table class="wsite-multicol-table"> 		<tbody class="wsite-multicol-tbody"> 			<tr class="wsite-multicol-tr"> 				<td class="wsite-multicol-col" style="width:61.805555555556%; padding:0 15px;"> 					 						  <div class="paragraph">Fans of the lab will be very excited to see this much-anticipated release of our standard dietary DNA metabarcoding pipeline, with a walk-through easily accessible on the centralized "<a href="https://www.kartzinellab.com/software--data.html">Software &amp; Data</a>" section of our webpage. 
Until now, people had to access code repositories associated with each of our publications or contact us directly to model their analysis after our well-established workflow. That led to multiple versions of the pipeline in circulation, since we are constantly improving it and published versions quickly ended up out of date. We have tried to solve that problem by...</div>   					 				</td>				<td class="wsite-multicol-col" style="width:38.194444444444%; padding:0 15px;"> 					 						  <div><div class="wsite-image wsite-image-border-none " style="padding-top:10px;padding-bottom:10px;margin-left:0px;margin-right:0px;text-align:center"> <a href='https://www.kartzinellab.com/software--data.html'> <img src="https://www.kartzinellab.com/uploads/9/2/7/9/92793766/kartzinellabdietpipelineoverview_orig.png" alt="Diet metabarcoding pipeline overview based on tutorial and workflow from the Kartzinel Lab and CCV at Brown University" style="width:auto;max-width:100%" /> </a> <div style="display:block;font-size:90%"></div> </div></div>   					 				</td>			</tr> 		</tbody> 	</table> </div></div></div>  <div>  <!--BLOG_SUMMARY_END--></div>  <div class="paragraph">featuring our pipeline as a set of open-source repositories, hosted on GitHub and organized for easy access on our webpage. The pipeline is optimized for use on Oscar, the Brown University computing cluster, but it can be adapted readily to other institutions. 
We also welcome collaborators, who can formally gain access to our ready-to-run infrastructure on Oscar when they <a href="https://www.kartzinellab.com/contract--collaborate.html">train or collaborate</a> with us -- something we have been successfully making easier and more affordable for external collaborators to do!<br /><br />An overview of our bioinformatic pipeline is provided below, current as of January 2026, and maintained on our <a href="https://github.com/trklab-metabarcoding" target="_blank">GitHub</a> site.&nbsp;</div>]]></content:encoded></item><item><title><![CDATA[Lab protocols posted as resources on our website]]></title><link><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/lab-protocols-posted-as-resources-on-our-website]]></link><comments><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/lab-protocols-posted-as-resources-on-our-website#comments]]></comments><pubDate>Wed, 16 Jul 2025 13:00:00 GMT</pubDate><category><![CDATA[Lab protocols]]></category><category><![CDATA[Molecular methods]]></category><category><![CDATA[Protocols & Methods]]></category><guid isPermaLink="false">https://www.kartzinellab.com/bioinformatics-workshop/lab-protocols-posted-as-resources-on-our-website</guid><description><![CDATA[Lab Protocols Posted as Free Resources on Our Website  Since June 2025, we have increasingly made our internal lab methods publicly visible on the "Protocols" section of our webpage. We began with some of the most frequently requested protocols that speak to the unique strengths of our lab's work and experience, featuring field-to-lab protocols for collecting and banking dietary samples, parasite samples, and plant barcode samples. We have expanded to include...      several protocols for plant  [...] 
]]></description><content:encoded><![CDATA[<h2 class="wsite-content-title" style="text-align:center;">Lab Protocols Posted as Free Resources on Our Website</h2>  <div class="paragraph">Since June 2025, we have increasingly made our internal lab methods publicly visible on the "<a href="https://www.kartzinellab.com/protocols.html">Protocols</a>" section of our webpage. We began with some of the most frequently requested protocols that speak to the unique strengths of our lab's work and experience, featuring field-to-lab protocols for collecting and banking dietary samples, parasite samples, and plant barcode samples. We have expanded to include...</div>  <div>  <!--BLOG_SUMMARY_END--></div>  <div class="paragraph">several protocols for plant barcoding and metabarcoding, which are among the most widely used in our lab and among our collaborators.&nbsp;<br /><br />More protocols will be posted on an ongoing basis, including sequencing protocols on various platforms (Sanger, Nanopore, Illumina).<br /><br />Protocols are being posted in both English and Spanish, especially in cases where protocols are being used in primarily Spanish-speaking countries.&nbsp;<br /><br />Please comment or contact the PI if there is a protocol you would especially like to see, or if you have any suggestions for improvements or alternatives. 
A number of our protocols used to be publicly available via our lab wiki, but unfortunately the site that hosted us is no longer active and so we are transitioning.&nbsp;</div>]]></content:encoded></item><item><title><![CDATA[New Featured Software: geographic coverage of DNA barcodes]]></title><link><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/new-featured-software-geographic-coverage-of-dna-barcodes]]></link><comments><![CDATA[https://www.kartzinellab.com/bioinformatics-workshop/new-featured-software-geographic-coverage-of-dna-barcodes#comments]]></comments><pubDate>Tue, 15 Jul 2025 16:21:33 GMT</pubDate><category><![CDATA[DNA barcoding]]></category><category><![CDATA[Reference Libraries & Data]]></category><category><![CDATA[software & data]]></category><guid isPermaLink="false">https://www.kartzinellab.com/bioinformatics-workshop/new-featured-software-geographic-coverage-of-dna-barcodes</guid><description><![CDATA[New Featured Software: Geographic Coverage of DNA Barcodes  Featured Software from the Kartzinel Lab: Geographic Coverage of DNA Barcodes. The inaugural code repository to be highlighted in our Featured Software section of the Software &amp; Data page presents the Quarto Code Book published in association with our Molecular Ecology Review Paper, "Global Availability of Plant DNA Barcodes as Genomic Resources to Support Basic and Policy-Relevant Biodiversity Research" can be easily modified to ev [...] ]]></description><content:encoded><![CDATA[<h2 class="wsite-content-title" style="text-align:center;">New Featured Software: Geographic Coverage of DNA Barcodes</h2>  <div class="paragraph">Featured Software from the Kartzinel Lab: Geographic Coverage of DNA Barcodes. 
The inaugural code repository to be highlighted in our Featured Software section of the <a href="https://www.kartzinellab.com/software--data.html">Software &amp; Data</a> page presents the <a href="https://trklab-metabarcoding.github.io/MolEco-MEC-24-1288/" target="_blank">Quarto Code Book</a> published in association with our <em>Molecular Ecology</em> Review Paper, "Global Availability of Plant DNA Barcodes as Genomic Resources to Support Basic and Policy-Relevant Biodiversity Research." The code can be easily modified to evaluate the geographic coverage of other data sets. Although the featured code emphasizes geographic coverage from our work in Yellowstone National Park...</div>  <div>  <!--BLOG_SUMMARY_END--></div>  <div class="paragraph">the codebook itself is readily customizable for a wide variety of uses. To facilitate re-use, we have divided it into four parts that follow the Methods section of the publication and include eight code notebooks that we used in the analyses for this publication. Source code is available in a GitHub&nbsp;<a href="https://github.com/trklab-metabarcoding/MolEco-MEC-1288">repository</a>.<br /><br />To see how data are formatted and work through the Yellowstone example as a vignette, supplementary data sets (S1-S4) can be downloaded from the supplement available with the published paper. Move those four files into the&nbsp;/data&nbsp;folder in the GitHub repository before running the code notebooks. 
If you run the notebooks from the repository, headers in the retrieved data may differ from headers in the datasets provided in the Supplemental Materials.<br /><br />The sections documenting the workflow are as follows:&nbsp;<br />Building global BOLD data<ol><li><a href="https://trklab-metabarcoding.github.io/MolEco-MEC-24-1288/building_bold.html">Building the BOLD dataset</a></li></ol> Geographic coverage<ol><li><a href="https://trklab-metabarcoding.github.io/MolEco-MEC-24-1288/geocov_figs1.html">Coverage by climatic zones</a></li><li><a href="https://trklab-metabarcoding.github.io/MolEco-MEC-24-1288/geocov_figs2.html">Coverage by country</a></li></ol> Taxonomic coverage<ol><li><a href="https://trklab-metabarcoding.github.io/MolEco-MEC-24-1288/fetch_itis.html">Fetching data from ITIS</a></li><li><a href="https://trklab-metabarcoding.github.io/MolEco-MEC-24-1288/taxonomic_coverage.html">Correlations &ndash; plant species and available barcodes</a></li></ol> Case-study Yellowstone National Park<ol><li><a href="https://trklab-metabarcoding.github.io/MolEco-MEC-24-1288/geocov_ynp_selected.html">Continent-scale barcode coverage of lodgepole pine and big sagebrush</a></li><li><a href="https://trklab-metabarcoding.github.io/MolEco-MEC-24-1288/fetch_gbif.html">Fetching data from GBIF</a></li><li><a href="https://trklab-metabarcoding.github.io/MolEco-MEC-24-1288/build_geocov_map.html">Case-study in geographic coverage of site-based reference data</a></li></ol></div>]]></content:encoded></item></channel></rss>