22 Lab08: Beyond GATK – Benchmarking Variant Calling Pipelines
22.1 Overview
Lab 8 is an in-class discussion and synthesis activity focused on recent benchmarking studies that compare variant calling pipelines across different sequencing platforms, organisms, and data types. Over the past decade, the genomics community has moved beyond asking “which tool is best?” to understanding when and why different callers excel, and how upstream choices (alignment, preprocessing, population ancestry) shape final variant quality. In this lab, you will critically evaluate five recent papers (2021–2026) that benchmark variant callers on diverse datasets, extract practical recommendations, and synthesize guidelines for choosing pipelines in your own research [1–5].
Each student has been assigned one paper based on the first three letters of their last name (see assignments below). You are expected to read your assigned paper carefully before class, guided by the questions provided, and come prepared to contribute to small-group and whole-class discussions. During class, you will work with peers who read the same paper to fill out a structured summary table, present key findings to the class, and collectively build a decision framework for variant calling that addresses common questions facing genomics researchers.
The entire Lab 8 assignment takes place during the scheduled lab period. Your grade is based on active participation in group discussions and submission of the completed class summary tables (photographed and uploaded to Canvas before leaving). If you cannot attend, you must notify the instructor in advance to arrange an alternative assignment.
22.2 Paper Assignment by Last Name
Each student is responsible for reading ONE of the five papers for the Lab 8 discussion. Find your assigned paper based on the first three letters of your last name as listed in Canvas:
| Last Name Range | Assigned Paper | Reference Citation | Number of Pages |
|---|---|---|---|
| A – Da | Tan et al. 2025, Briefings in Bioinformatics | 1 | 11 |
| Db – L | Abdelwahab et al. 2023, BMC Bioinformatics | 2 | 13 |
| M – Pa | Betschart et al. 2022, Scientific Reports | 3 | 11 |
| Pb – Pz | Pinto et al. 2026, PLoS ONE | 4 | 22 |
| Q – Z | Guille et al. 2024, Briefings in Bioinformatics | 5 | 11 |
22.3 Pre-Class Preparation: Guiding Questions
As you read your assigned paper, take notes on the following questions. You will use these notes during the in-class small-group synthesis.
22.3.1 Study Design and Scope
- What is the main goal of the benchmarking study? (e.g., comparing germline vs. somatic callers, evaluating AI-based tools, assessing performance on non-model organisms)
- Which datasets were used? (e.g., Genome in a Bottle reference samples, monozygotic twins, cancer exomes, specific organisms)
- Which sequencing platforms and read lengths were analyzed? (Illumina short reads, PacBio HiFi, Oxford Nanopore, or a mix)
22.3.2 Tools and Pipelines
- Which aligners were tested (if applicable)? (e.g., BWA-MEM, Bowtie2, minimap2)
- Which variant callers were compared? (e.g., GATK HaplotypeCaller, bcftools, FreeBayes, Platypus, DeepVariant, others)
- Did the study evaluate ensemble or voting-based methods? If so, how were results combined?
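To make the ensemble question concrete, here is a minimal Python sketch of voting across callers. It assumes each caller's output has already been reduced to a set of `(chrom, pos, ref, alt)` tuples with a consistent variant representation. The caller names, the `vote` helper, and the variants are hypothetical toy data, not taken from any of the assigned papers.

```python
from collections import Counter

# A variant is keyed by (chrom, pos, ref, alt). All call sets below are toy data.
Variant = tuple[str, int, str, str]


def vote(call_sets: dict[str, set[Variant]], min_callers: int) -> set[Variant]:
    """Keep variants reported by at least `min_callers` of the callers."""
    support = Counter(v for calls in call_sets.values() for v in calls)
    return {v for v, n in support.items() if n >= min_callers}


# Hypothetical outputs from three callers on the same sample.
calls = {
    "caller_a": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 40, "G", "GA")},
    "caller_b": {("chr1", 100, "A", "G"), ("chr2", 40, "G", "GA")},
    "caller_c": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 90, "T", "C")},
}

majority = vote(calls, min_callers=2)               # 2-of-3 voting
intersection = vote(calls, min_callers=len(calls))  # strictest: all callers agree
union = vote(calls, min_callers=1)                  # most permissive: any caller

print(len(union), len(majority), len(intersection))
```

Raising `min_callers` generally trades recall for precision. When you read your paper, note which threshold (or which more elaborate combination method) the authors actually used, since studies differ on this point.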
22.3.3 Performance Metrics and Results
- Which metrics were used to assess performance? (e.g., precision, recall, F1-score, concordance with truth sets, computational time, memory usage)
- Which tools performed best for SNPs? Which for indels?
- Were there differences in performance by genomic context (e.g., exonic vs. intergenic, high-GC regions, repetitive regions)?
- Did computational cost (runtime, memory) play a role in the recommendations?
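As a reminder of how the most common metrics relate to each other, the sketch below computes precision, recall, and F1 from counts of true positives (TP), false positives (FP), and false negatives (FN) against a truth set. The function name `benchmark_metrics` and the counts are illustrative assumptions, not results from any of the assigned papers.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Precision, recall, and F1 from counts against a truth set."""
    # Precision: fraction of called variants that are in the truth set.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall (sensitivity): fraction of truth-set variants that were called.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up counts for one caller, reported separately by variant type.
snp = benchmark_metrics(tp=9500, fp=100, fn=500)
indel = benchmark_metrics(tp=800, fp=200, fn=200)

for name, m in (("SNP", snp), ("indel", indel)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

Because F1 blends the two error types, two tools with similar F1 can behave very differently. Check whether your paper reports precision and recall separately, and whether SNPs and indels are scored separately.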
22.3.4 Key Findings and Recommendations
- What are the paper’s main recommendations? (e.g., “use GATK for high-coverage human genomes,” “ensemble methods reduce false positives in somatic calling”)
- Did the authors identify biases or limitations? (e.g., training on European-ancestry genomes, poor performance in non-model systems, platform-specific errors)
- Are the recommendations generalizable to other organisms, coverages, or research questions? Why or why not?
Preparation tips:
- Read your paper thoroughly before class. Annotate it and bring notes—you cannot participate meaningfully without preparation.
- Be ready to explain, not just summarize. Your classmates will rely on you to interpret your paper’s findings.
- Ask questions during presentations. Benchmarking studies often use different metrics or datasets—clarify what makes results comparable.
- Think critically about generalizability. A tool that works well for human exomes may fail for non-model organisms or low-coverage data.
22.4 In-Class Activity Structure (75 minutes)
22.4.1 Part 1: Small-Group Synthesis (20 minutes)
Meet with others who read the same paper. As a group, discuss your answers to the guiding questions and work together to fill out the Paper Summary Table for your study (template below). One member of your group should cast their screen so the table can be filled in together and photographed afterward. Focus on extracting key details that will help the class compare across papers:
Paper Summary Table Template:
| Category | Details | Your Paper |
|---|---|---|
| Study Citation | Author, Year, Journal | |
| Study Goal | Brief statement of main objective | |
| Datasets Used | Organism(s), reference samples, sample size | |
| Read Length(s) | e.g., 150 bp paired-end, 15 kb HiFi | |
| Sequencing Platform(s) | Illumina, PacBio, ONT, etc. | |
| Tools/Pipelines Tested | Aligner(s) + Caller(s) | |
| Variant Callers Compared | List key callers (GATK, bcftools, FreeBayes, DeepVariant, etc.) | |
| Ensemble Strategy (if any) | Voting, intersection, machine learning meta-classifier | |
| Performance Metrics | Precision, recall, F1, concordance, runtime, etc. | |
| Best Tool(s) for SNPs | Tool name and brief reason | |
| Best Tool(s) for Indels | Tool name and brief reason | |
| Key Recommendation | One-sentence practical takeaway | |
| Limitations/Caveats | What contexts or questions are NOT covered? | |
Your group will present this table to the class in Part 2.
22.4.2 Part 2: Group Presentations (15 minutes)
Each paper group gives a 3-minute presentation summarizing their study. Focus on:
- Study goal and dataset (30 seconds)
- Tools compared and metrics used (1 minute)
- Key findings: which tools performed best for SNPs and indels, and under what conditions (1 minute)
- Main recommendation and limitations (30 seconds)
While other groups present, take notes on their summary tables—you will need this information for Part 3.
22.4.3 Part 3: Synthesis Across Papers (20 minutes)
Now that all five papers have been presented, work as a class to fill out the Cross-Study Synthesis Table on the board. The instructor will lead a discussion to populate this table by extracting common themes and differences across the five benchmarking studies.
Cross-Study Synthesis Table Template:
| Question | Insights from Tan2025 | Insights from Abdelwahab2023 | Insights from Betschart2022 | Insights from Pinto2026 | Insights from Guille2024 |
|---|---|---|---|---|---|
| When should I use GATK HaplotypeCaller? | | | | | |
| When should I use bcftools/samtools? | | | | | |
| When is an ensemble approach worth the effort? | | | | | |
| How do I choose filter thresholds? | | | | | |
| What are pitfalls of applying human-optimized pipelines to non-model systems? | | | | | |
| Do AI-based callers (e.g., DeepVariant) outperform traditional tools? | | | | | |
| Does alignment choice matter as much as caller choice? | | | | | |
This table will be filled in collaboratively during discussion. One or more students should photograph the final version for submission.
22.4.4 Part 4: Case Study Challenge (15 minutes)
Break into mixed groups (each group should have at least one representative from each paper). Your challenge:
Design a variant calling benchmark for a non-model insect species with the following constraints:
- No high-quality reference panel (like Genome in a Bottle) exists.
- You have 30× coverage Illumina short reads from 10 individuals.
- Your goal is to identify population-level SNPs for a GWAS.
Discuss and sketch answers to:
- Which variant caller(s) would you use, and why?
- How would you validate your calls without a truth set?
- What filtering strategy would you adopt?
- Would you use an ensemble approach? Why or why not?
Be prepared to share one key insight from your group during wrap-up.
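If your group wants a starting point for the validation question, the Python sketch below shows two checks that need no truth set: pairwise concordance between two callers (Jaccard index) and the transition/transversion (Ti/Tv) ratio of a SNP set. This is one possible approach under stated assumptions, not a recommended answer. The helper names and the call sets are hypothetical toy data, and the code assumes biallelic SNPs keyed as `(chrom, pos, ref, alt)`.

```python
# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

Snp = tuple[str, int, str, str]  # (chrom, pos, ref, alt), biallelic SNPs only


def jaccard(a: set[Snp], b: set[Snp]) -> float:
    """Shared calls divided by all calls made by either caller."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def titv(snps: set[Snp]) -> float:
    """Ratio of transitions to transversions in a SNP set."""
    ti = sum((ref, alt) in TRANSITIONS for _, _, ref, alt in snps)
    tv = len(snps) - ti
    return ti / tv if tv else float("inf")


# Hypothetical SNP calls from two callers on the same individual.
caller_a = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
            ("chr1", 300, "A", "C"), ("chr2", 40, "G", "A")}
caller_b = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
            ("chr2", 40, "G", "A"), ("chr2", 90, "T", "G")}

print("concordance:", jaccard(caller_a, caller_b))
print("Ti/Tv A:", titv(caller_a), "Ti/Tv B:", titv(caller_b))
```

Neither number proves that calls are correct. High concordance can reflect shared biases, and the expected Ti/Tv ratio is species-dependent. A ratio near 0.5, which is what random substitutions would produce, does suggest a call set dominated by errors. Consider what other evidence your design could add.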
22.4.5 Part 5: Wrap-Up and Reflection (5 minutes)
Class discussion:
- Which study provided the most actionable recommendations?
- What gaps remain in current benchmarking literature? (e.g., underrepresented organisms, low-coverage data, structural variants)
- If you were designing a variant calling pipeline for your own research, what is the one takeaway from today that will influence your decision?
22.5 Deliverable and Grading
Your Lab 8 grade is based on attendance and active participation. To receive credit, you must:
- Attend the full lab session and actively participate in all five parts of the activity.
- Photograph and upload the completed Paper Summary Table (from your small group in Part 1) and the Cross-Study Synthesis Table (from the whole-class discussion in Part 3) to Canvas before leaving lab.
- Case study challenge: briefly write up your group's benchmark design for the case study challenge (Part 4) and upload it to Canvas before leaving lab.
- Take clear, legible photos of both tables (or a single photo if they fit on one board/page).
- Upload to Canvas under “Lab 8 Submission” before the end of the lab period.
- If working digitally, you may submit a shared Google Doc or screenshot instead.
- All group members must submit their own photo/screenshot, even if the content is identical—this confirms individual attendance.
- AI can be used to help with the initial paper summary and parsing of information, but the tables and case study design should be based on what you and your group generated during class without the use of generative AI.