14 Where to Go Next: Synthesis, Emerging Frontiers, and Lifelong Learning in Genomics

14.1 Overview

You began this course by asking a deceptively simple question: what does it mean to read a genome? Fourteen chapters, ten labs, a semester-long research project, and a mock grant panel later, you have done exactly that. This final chapter steps back from the technical details to synthesize the arc of the course, survey the broader landscape of bioinformatics beyond what we were able to cover, and orient you toward a field that is changing faster than any single-semester course can fully capture.

14.2 Learning Objectives

By the end of this chapter, you should be able to:

Describe the unified conceptual arc of the course from raw data to biological interpretation.
Name and briefly characterize major areas of bioinformatics not covered in this course.
Identify several active frontiers in the field and explain how they connect to concepts already introduced.
Evaluate a new bioinformatics tool or method by connecting it to foundational concepts from this course.
Articulate what skills and habits of mind are most important for a career that involves genomics and bioinformatics.

14.3 What We Have Covered: A Synthesis

14.3.1 The arc from raw data to biological insight

This course was organized around a workflow that mirrors how real genomics projects are conducted. We began in Chapter 1 with the observation that sequencing technology has outpaced our ability to analyze, store, and interpret the data it produces—the bioinformatic bottleneck¹. We then moved in Chapter 2 through the landscape of sequencing technologies, from Sanger’s chain-termination method through 454 pyrosequencing to modern short- and long-read platforms, each with distinct read lengths, error profiles, and appropriate use cases. Chapter 3 grounded that discussion in the practical reality of data quality, connecting Phred scores and FASTQ format to the real decisions a bioinformatician must make before a single meaningful analysis can begin. The lesson there is enduring: garbage in, garbage out, and understanding your data’s quality is not optional.
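To make the Phred relationship from Chapter 3 concrete, here is a minimal Python sketch, not taken from any particular QC tool. It converts between Phred scores and error probabilities using Q = -10·log10(P) and decodes a FASTQ quality string. It assumes Phred+33 ASCII encoding, the convention used by modern Illumina output; older files may use a different offset. The function names are illustrative only.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Probability that a base call is wrong, from Q = -10 * log10(P)."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Phred score corresponding to an error probability P."""
    return -10 * math.log10(p)


def decode_quality(qual: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 ASCII encoding."""
    return [ord(ch) - offset for ch in qual]


def expected_errors(qual: str) -> float:
    """Expected number of miscalled bases: the sum of per-base error probabilities."""
    return sum(phred_to_error_prob(q) for q in decode_quality(qual))


read_qual = "IIIIIIIII#"  # nine Q40 bases followed by one Q2 base
scores = decode_quality(read_qual)
print(scores)
print(sum(scores) / len(scores))  # mean Phred score
print(round(expected_errors(read_qual), 4))
```

In this example the read's mean Phred score is 36.2, which looks excellent, yet almost all of its expected errors (about 0.63 of roughly 0.63 total) come from the single Q2 base, while each Q40 base contributes only 0.0001. Averaging quality scores can hide exactly the bases you most need to see.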

From there, Chapter 4 introduced the databases and genome browsers that anchor modern genomics. NCBI, Ensembl, and UCSC are not just archives—they are the infrastructure on which reproducible science depends, which is why we emphasized stable identifiers, versioning, and provenance from the start. Chapter 5 then built the conceptual foundation for sequence comparison, from the mathematics of dynamic programming and the Needleman–Wunsch algorithm to multiple sequence alignment and what it means to say two sequences are homologous rather than merely similar. Chapter 6 scaled this up to the genome level, introduced the distinction between read mapping and de novo assembly, and discussed how reference bias and repetitive sequences constrain what short-read data alone can resolve.
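The dynamic programming idea behind Needleman–Wunsch fits in a few dozen lines. The sketch below fills the score matrix and then traces back one optimal global alignment. The scoring scheme (match +1, mismatch -1, linear gap -1) is an arbitrary choice for illustration; practical aligners use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) for one optimal alignment.
    """
    n, m = len(a), len(b)
    # F[i][j] holds the best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                F[i - 1][j] + gap,            # gap in b
                F[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Three matches and one gap give a score of 2. When several alignments tie, the order of the checks in the traceback (diagonal first) decides which one is reported.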

Chapter 7 was deliberately different from the others. Rather than introducing a new analytical method, it turned attention inward to the practices that make science reproducible and extensible: directory organization, version control, workflow managers, and the craft of writing methods clearly enough that someone else—or your future self—could repeat your analysis without you in the room. This chapter was hidden curriculum in the best sense; the skills it covered rarely appear in textbooks but are among the most valued in practice. Chapter 8 then expanded the view beyond single-nucleotide variants to show that the same foundational steps—design, QC, alignment, feature summarization, inference, visualization—reappear across RNA-seq, ChIP-seq, ATAC-seq, metagenomics, and structural variant detection, even when the specific tools change entirely.

Chapter 9 went inside the black box of gene annotation, building intuition for hidden Markov models and showing how probabilistic models can encode biological knowledge—the statistics of coding sequences, splice sites, and intergenic regions—to make predictions about genome structure. Chapter 10 applied the full alignment-to-variant calling workflow using GATK, and Labs 7, 8, and 9 put that workflow in your hands on real data, culminating in IGV sessions where every variant became a hypothesis to evaluate against the read-level evidence. Chapter 11 took a detour that many students find unexpectedly valuable: it demystified how science is actually funded, how grant panels work, and what it means to communicate computational methods in a proposal that a diverse review panel can evaluate. Chapter 12 zoomed out to the population scale, connecting allele frequencies, nucleotide diversity, Tajima’s D, and F_ST to the evolutionary forces—drift, selection, migration, mutation—that shape genomic landscapes. And Chapter 13 showed that regulatory syntax can itself be discovered from sequence, connecting motif finding to the gene regulatory principles introduced earlier in annotation and epigenomics.
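The population statistics from Chapter 12 can be computed directly from allele counts. The sketch below computes nucleotide diversity (π, the mean number of pairwise differences), Watterson's θ (S/a1), and Tajima's D from derived-allele counts at biallelic sites. It assumes a single population of n sampled sequences, known ancestral states, and no missing data. The constants are the standard ones used to normalize Tajima's D; the function names are our own.

```python
import math


def site_pi(k: int, n: int) -> float:
    """Fraction of the n*(n-1)/2 sequence pairs that differ at a site with k derived alleles."""
    return k * (n - k) / (n * (n - 1) / 2)


def tajimas_d(derived_counts: list[int], n: int) -> float:
    """Tajima's D from derived-allele counts at biallelic sites in a sample of n sequences."""
    seg = [k for k in derived_counts if 0 < k < n]  # keep segregating sites only
    S = len(seg)
    if S == 0:
        raise ValueError("Tajima's D is undefined when there are no segregating sites")

    pi = sum(site_pi(k, n) for k in seg)  # nucleotide diversity summed over the region
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    theta_w = S / a1  # Watterson's estimator

    # Standard constants for the variance of (pi - theta_w).
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    return (pi - theta_w) / math.sqrt(e1 * S + e2 * S * (S - 1))


print(tajimas_d([1] * 5, n=10))  # excess of singletons
print(tajimas_d([5] * 5, n=10))  # excess of intermediate-frequency variants
```

With ten sequences, five singleton sites give a negative D (more rare variants than neutral expectation, as after a sweep or population expansion), whereas five sites at 5/10 frequency give a positive D.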

14.3.2 The unified workflow beneath the variation

Running through all of these chapters is a skeleton that appears in every domain of genomics, regardless of data type or organism:

Experimental design — What question are we asking, and what data are sufficient to answer it?
Quality control — What does the raw data actually look like, and what has gone wrong?
Alignment or assembly — Where do the sequences come from in the genome?
Feature summarization — Variants, counts, peaks, bins: the specific step depends on the question.
Statistical inference — What is signal, and what is noise?
Visualization and interpretation — Genome browsers, R, IGV: making data legible.
Communication — Methods written to reproduce; results written to matter.

Note

Lab Connections: The yeast dataset threaded through Labs 4, 7, and 9, taking you from FASTQ through BAM and VCF to IGV. Your semester-long project asked you to execute a version of this same workflow on a question of your own. Return to your GitHub repository: those files are a record of what you now know how to do.

14.4 What We Did Not Cover: The Breadth of Bioinformatics

14.4.1 What this course covered

This course was always intended as a survey—an introduction to the concepts, file formats, and analytical logic that underlie modern genomics. We chose to focus on what might be called the core resequencing workflow: from raw reads to quality-controlled alignments, to variant calls, to population-scale interpretation. That focus gave you depth and transferable skills, but it also meant that large and vibrant areas of bioinformatics received only brief attention or none at all.

14.4.2 What lies beyond

Phylogenomics, one of the most active frontiers in evolutionary biology, is one such area. Where population genomics asks how variation is distributed within and among closely related populations, phylogenomics asks how species diverged across evolutionary time, using genome-scale data to resolve relationships that single genes could not². The tools connect directly to what you have learned (multiple sequence alignment, BLAST, and gene annotation all supply its inputs), but the statistical frameworks for constructing and evaluating species trees from thousands of loci introduce new challenges, including gene tree–species tree discordance, incomplete lineage sorting, and the computational cost of genome-scale phylogenetic inference.

Pangenomics represents another major frontier that we touched on but could not do justice to in a single semester. The idea is straightforward but transformative: rather than aligning reads to a single reference genome, a pangenome encodes the sequence diversity of a species or population as a graph, so that no individual’s variants are invisible by construction³. The implications for variant calling, population genetics, and clinical genomics are substantial: variants that a single linear reference systematically misses, because reads carrying them align poorly or not at all, can now be detected. The 1000 Chinese Pangenome, published in 2026, illustrates the medical and population genetic power of this approach: by representing the genetic diversity of more than 1,000 individuals of diverse Chinese ancestry in a graph-based reference, the study substantially improved imputation accuracy and variant discovery for populations that were underrepresented in earlier reference panels⁴. A striking local example of what long-read genomics now makes possible is the recent haplotype-resolved, chromosome-scale genome assembly for the southern live oak, Quercus virginiana (the famous Toomer’s Corner oak), published by researchers including several from Auburn University⁵. This assembly, featured on the cover of G3: Genes|Genomes|Genetics, drew on many of the long-read assembly and annotation concepts covered in Chapters 6 and 9 and represents exactly the kind of project that a team coming out of a course like this one is now equipped to contribute to.
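The claim that a graph makes variants visible “by construction” can be shown with a deliberately tiny toy. The sketch below is not how real pangenome tools store or search graphs: it assumes a hypothetical chain of “bubbles” in which each step lists alternative alleles, with the first allele matching the linear reference, and it uses exact substring matching. A read carrying the alternative alleles is found on a graph path but not in the linear reference, which is reference bias in miniature.

```python
from itertools import product

# Hypothetical toy graph: a chain of "bubbles". Each entry lists the alleles
# available at that step; the first allele is the one in the linear reference.
toy_graph = [["ACGT"], ["A", "TTT"], ["GCA"], ["C", "G"], ["TTAG"]]


def reference_path(graph: list[list[str]]) -> str:
    """Spell out the linear reference by taking the first allele at every step."""
    return "".join(alleles[0] for alleles in graph)


def all_paths(graph: list[list[str]]) -> list[str]:
    """Spell out every haplotype the graph can represent (2 x 2 = 4 here)."""
    return ["".join(choice) for choice in product(*graph)]


def read_found(read: str, haplotypes: list[str]) -> bool:
    """Exact-match search; real aligners tolerate mismatches and search the graph directly."""
    return any(read in hap for hap in haplotypes)


read = "GTTTTGCAG"  # carries the TTT and G alternative alleles
print(read_found(read, [reference_path(toy_graph)]))  # False: invisible to the linear reference
print(read_found(read, all_paths(toy_graph)))         # True: present on a graph path
```

Enumerating every path grows exponentially with the number of bubbles, which is why graph-aware aligners search the graph structure directly instead of spelling out all haplotypes.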

Figure 14.1

Students roll Toomer’s Corner Oak (Quercus virginiana) for the last time before the tree was removed, Auburn University, Auburn, Alabama. Rolling Toomer’s Corner with toilet paper after Auburn athletic victories is a campus tradition. Photo credit: Auburn University. This tree is now the subject of a haplotype-resolved, chromosome-scale genome assembly published in G3 in 2026⁵ (Aközbek et al. 2026, https://doi.org/10.1093/g3journal/jkag023), illustrating how genomics connects to local history and the kinds of projects being done right here at Auburn.

Single-cell omics is another area that has grown rapidly since the core workflows covered in this course were first developed. Where bulk RNA-seq measures average gene expression across thousands of cells in a tissue, single-cell RNA sequencing (scRNA-seq) profiles each cell individually, enabling identification of cell types, developmental trajectories, and disease-specific states that are invisible in the aggregate⁶,⁷. The conceptual underpinning is familiar: read alignment, count matrices, QC, and statistical inference. What is new is the scale (millions of cells, sparse count matrices), the preprocessing (cell barcode demultiplexing, doublet filtering), and the downstream analysis (dimensionality reduction, clustering, pseudotime), each of which introduces new computational demands. Targeted single-cell technologies and multi-modal approaches that simultaneously measure gene expression, chromatin accessibility, and other molecular phenotypes in the same cell are now routine enough that practical guides have been published to help researchers choose among them⁷. These methods connect directly to the epigenomics workflows in Chapter 8, now at cellular rather than tissue resolution.

Other areas that fell outside the scope of this semester include proteomics and metabolomics (connecting the genome to the protein and metabolite layers discussed conceptually in Chapter 1), ancient DNA and paleogenomics (requiring specialized damage correction and authentication methods introduced briefly in Chapter 8), GWAS and quantitative genetics (using variant calls like those in Chapter 10 to map complex traits to genomic loci), and long-read-native applications such as methylation calling and haplotype phasing that exploit the single-molecule character of PacBio and Oxford Nanopore reads. Each of these builds directly on concepts you have already encountered; the tools and statistical frameworks are new, but the underlying workflow logic is not.

14.5 The Pace of Change: An Outlook on the Field

14.5.1 New technologies and the competitive landscape

One of the clearest themes of this course has been that the technology changes faster than the concepts do. The history of genomics is a history of instruments becoming obsolete while the underlying questions persist. When the third edition of the Pevsner textbook was published in 2015, Illumina’s HiSeq dominated the market, 454 pyrosequencing was still in use, and PacBio was a promising newcomer¹. By the time this companion was written, 454 had long since disappeared, PacBio HiFi had become the standard for high-quality long-read assembly, and Oxford Nanopore’s MinION had made portable sequencing a reality. The pace continues: Element Biosciences, currently in active litigation with Illumina over patent claims, introduced its VITARI instrument in early 2026, offering a competing short-read platform that the company claims will substantially expand the scale of what high-throughput sequencing makes possible⁸,⁹. Competition in the sequencing market tends to drive down costs and improve data quality for researchers, though it also means that the “standard” platform can change within the span of a graduate degree.

Equally striking is the democratization of genomic data at the consumer level. Gencove has introduced a $49 whole-genome sequencing service that uses low-pass sequencing combined with imputation to deliver population-level variant information at a fraction of the cost of standard whole-genome sequencing¹⁰. This approach—sequencing at very low coverage and then statistically inferring unobserved genotypes using a reference panel—connects directly to the variant calling concepts in Chapter 10 and the population-level imputation methods introduced in Chapter 12. Low-pass plus imputation performs comparably to traditional genotyping arrays for many trait-mapping applications¹¹, and as imputation panels become more diverse and comprehensive, this workflow will likely become an increasingly standard entry point for population-scale genetic studies.
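Why imputation is essential at low coverage follows from simple arithmetic. Under the idealized assumption that read depth at each site is Poisson-distributed with mean equal to the sequencing coverage (a Lander–Waterman-style model that ignores GC bias, mappability, and sequencing errors), the sketch below estimates the fraction of sites touched by at least one read and the fraction of heterozygous sites at which both alleles are actually observed. The coverages are illustrative and do not describe any particular commercial service; the function names are hypothetical.

```python
import math


def frac_sites_covered(depth: float) -> float:
    """P(depth >= 1) when per-site depth ~ Poisson(mean coverage)."""
    return 1 - math.exp(-depth)


def frac_het_both_alleles_seen(depth: float) -> float:
    """P(both alleles observed at a heterozygous site).

    Assumes each read comes from either haplotype with probability 1/2, so each
    haplotype independently receives Poisson(depth / 2) reads, and ignores errors.
    """
    return (1 - math.exp(-depth / 2)) ** 2


for depth in (0.5, 1.0, 30.0):
    print(f"{depth:>4}x  covered: {frac_sites_covered(depth):.3f}  "
          f"het with both alleles seen: {frac_het_both_alleles_seen(depth):.3f}")
```

At 0.5x, only about 39% of sites receive any read and fewer than 5% of heterozygous sites show both alleles; at 30x both fractions are essentially 1. At low coverage the missing genotypes must be inferred from linked, observed sites using a reference panel, which is why the diversity of that panel matters so much.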

The future of human genomics

Return for a moment to the genomics timeline in Chapter 1 (Figure 1.1). The Human Genome Project published its draft sequence in 2001 and declared the project complete in 2003, after more than a decade of work and roughly $3 billion in spending. In 2022, the T2T Consortium published the first truly gapless human genome assembly (covering the 22 autosomes and chromosome X of the CHM13 cell line), resolving centromeres and satellite repeats that had been missing for twenty years¹². Today, sequencing a whole human genome costs on the order of a few hundred dollars, comparable to many clinical diagnostic tests, and population-scale biobanks containing hundreds of thousands of sequenced individuals are actively informing drug development, disease prediction, and precision medicine. The 1000 Chinese Pangenome⁴ and the Human Pangenome Reference Consortium³ represent the next step: moving from a single reference genome to a population-level graph that captures far more of human genetic diversity. The ethical questions this raises, about consent, data sovereignty, equitable access to genetic medicine, and the potential for misuse, are inseparable from the technical work. As you move forward in any career that touches genomics, you will be part of shaping how these tools are used and who benefits from them.

14.5.2 New data types and computational challenges

The rise of single-cell and multi-modal genomics is generating data types and analytical challenges that did not exist a decade ago. A recent paper on the 4D nucleome—the three-dimensional and temporal organization of the human genome across cell types and developmental stages—illustrates how far the field has moved beyond simple read alignment and variant calling¹³. That study integrated Hi-C contact maps, ATAC-seq accessibility data, ChIP-seq histone modification profiles, and gene expression measurements to build a comprehensive model of how genome organization shapes gene regulation across human cell types. The computational infrastructure required to integrate, store, and analyze these multi-modal datasets at scale is itself an active research area. A related study combined multi-omic single-cell profiling with deep learning to dissect regulatory syntax during human development, using the patterns in chromatin accessibility and gene expression to learn the rules by which transcription factors interpret genomic sequence¹⁴. Another demonstrated how single-cell perturbation screens can identify lncRNA regulators of cellular senescence that would be invisible in bulk experiments¹⁵. These studies connect directly to the ATAC-seq and ChIP-seq workflows in Chapter 8 and to the motif and regulatory concepts in Chapter 13, now extended to single-cell resolution and combined with machine learning inference.

The sheer volume of genomic data being generated has made data storage and management a genuine scientific challenge, not just a logistical one¹⁶. The decisions you make about how to store, version, compress, and document your data—discussed in Chapter 7—are not administrative details but choices that determine whether your science is reproducible, whether your datasets can be reused by others, and whether your storage costs remain manageable over the lifetime of a project. The field is actively developing solutions: tiered cloud storage, compression algorithms tailored to genomic data, and federated analysis frameworks that allow computation to come to the data rather than moving sensitive data to the computation. These are areas where computer scientists and biologists need to work together, and where understanding both sides of the problem is an asset.
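One of the simplest compression ideas tailored to genomic data is to store each nucleotide in 2 bits rather than as an 8-bit ASCII character, roughly a fourfold saving before any general-purpose compression is applied. The sketch below packs four bases per byte. It assumes the sequence contains only uppercase A, C, G, and T; real formats such as UCSC’s .2bit record N runs and soft-masked regions separately, and production tools layer further compression on top. The function names are illustrative.

```python
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DECODE = "ACGT"


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string into bytes, four bases per byte (2 bits each)."""
    out = bytearray()
    for start in range(0, len(seq), 4):
        byte = 0
        for offset, base in enumerate(seq[start:start + 4]):
            byte |= ENCODE[base] << (2 * offset)  # KeyError for N, IUPAC codes, lowercase
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, length: int) -> str:
    """Recover the sequence; length is stored separately because the last byte may be padded."""
    return "".join(DECODE[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length))


seq = "ACGTTGCAAC"
packed = pack_2bit(seq)
print(len(seq), "ASCII bytes ->", len(packed), "packed bytes")
assert unpack_2bit(packed, len(seq)) == seq
```

Storing the length alongside the packed bytes matters: zero padding in the final byte would otherwise decode as extra A bases.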

Gene annotation, which you worked through hands-on in Labs 5 and 6 using BLAST and HMMs, is also undergoing rapid change. A 2026 review in Nature Reviews Genetics describes how annotation is being transformed by long-read transcriptomics, single-cell data, and deep learning models that can predict functional elements from sequence with unprecedented accuracy¹⁷. The manual annotation project you completed this semester gives you an appreciation for just how difficult this problem is and why automated methods still require careful human evaluation. The computational population genomics you conducted in Lab 10, scanning for signatures of selection using sliding-window statistics, has now been extended to new systems: a recent study applied these same conceptual tools to gut microbiome metagenomes, identifying pervasive gene-specific selective sweeps across human gut microbial populations—showing that natural selection acts at the level of individual genes within microbial species in ways that mirror what population geneticists have long observed in eukaryotes¹⁸.
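The sliding-window scans from Lab 10 follow a simple pattern: move a fixed-width window along the chromosome, sum a per-site statistic inside it, and normalize by window length. The sketch below does this for nucleotide diversity. It assumes biallelic sites given as (0-based position, derived-allele count) pairs from n sequences and treats every base in a window as callable; real scans should divide by the number of accessible bases instead. The function names and toy data are hypothetical.

```python
def site_pi(k: int, n: int) -> float:
    """Fraction of sequence pairs that differ at a biallelic site with k derived alleles."""
    return k * (n - k) / (n * (n - 1) / 2)


def windowed_pi(sites, n, chrom_len, window, step):
    """Nucleotide diversity per bp in sliding windows.

    sites: iterable of (0-based position, derived-allele count) pairs.
    Returns a list of (start, end, pi_per_bp) for half-open [start, end) windows.
    Assumes every base is callable; real scans divide by accessible bases instead.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    sites = list(sites)
    results = []
    start = 0
    while start < chrom_len:
        end = min(start + window, chrom_len)
        total = sum(site_pi(k, n) for pos, k in sites if start <= pos < end)
        results.append((start, end, total / (end - start)))
        if end == chrom_len:
            break
        start += step
    return results


toy_sites = [(10, 5), (150, 1)]  # hypothetical SNPs in a sample of n = 10
for start, end, pi in windowed_pi(toy_sites, n=10, chrom_len=200, window=100, step=50):
    print(f"{start}-{end}: pi = {pi:.5f} per bp")
```

This version rescans every site for every window, which is fine for a teaching example; for whole genomes you would sort the sites once and advance two pointers as the window moves.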

14.5.3 AI and large language models in genomics

The application of large language models and deep learning to genomics is accelerating. Models trained on protein sequences and structures (AlphaFold, ESMFold) have transformed structural biology; analogous models trained on DNA sequences are being used to predict the functional consequences of non-coding variants, identify regulatory elements, and interpret the output of massively parallel reporter assays¹⁹,²⁰. In the context of this course, these models are most relevant as new tools to evaluate critically: they are powerful, but their internal reasoning is largely opaque, and their performance is tightly linked to the quality and diversity of the training data. The same reproducibility principles from Chapter 7 apply. What was the model trained on? Can you access the training data? Has it been benchmarked on held-out examples that resemble your use case? Using AI tools responsibly in genomics requires the same judgment that using any other tool requires, and that is the judgment you have been building all semester.

A cautionary note is worth including here. A 2023 study documented major data analysis errors in a high-profile series of papers claiming to identify tumor-specific microbiome signatures in cancer patients; the errors arose from inadequate handling of contamination and reference database choices in metagenomic classification²¹. The scientific community eventually identified and published corrections, but the episode illustrates that high-profile genomics results are not immune to the kinds of systematic errors that careful QC, version control, and reproducible workflows are designed to prevent. As genomics becomes more prominent in medicine, agriculture, and policy, the cost of analytical errors rises—and the responsibility of people trained in bioinformatics to catch them rises with it. The ethical obligations of genomic data analysis extend beyond consent and privacy to the basic scientific responsibility of getting the analysis right.

14.6 What It Means to Be a Bioinformatician

14.6.1 A field defined by questions, not tools

Recall from Chapter 1 the discussion of what a bioinformatician actually is. The honest answer is that there is no single answer. Some bioinformaticians are biologists who have learned to code well enough to conduct large-scale analyses on their own data; others are software engineers or statisticians who specialize in developing the algorithms and pipelines that biologists use. Some work in academic labs or core facilities, others in biotechnology companies, hospitals, or government agencies²². What unites them is not a particular programming language or pipeline but a commitment to using computation to address biological questions rigorously and transparently.

This course has been designed for a diverse audience: pre-med students and future genetic counselors who need to understand and critically evaluate genomic data without necessarily running pipelines themselves; graduate students who are building toward active computational research; and everyone in between. The skills emphasized throughout—understanding file formats, reading and evaluating methods sections, interpreting QC outputs, thinking critically about experimental design, writing clearly about what you did and why—are relevant across all of these careers. The ability to communicate across the biologist-statistician-computer scientist divide is genuinely rare and genuinely valuable.

14.6.2 Skills that transfer and skills to build

The specific tools you learned this semester will not all still be in active use in ten years. Some will have been superseded by faster, more accurate alternatives; some will be unmaintained; some will have been absorbed into broader platforms. What will still matter is the ability to read documentation, reason about how a new tool fits into the workflow skeleton introduced in Section 14.3, evaluate whether its assumptions match your data, and decide whether to adopt it. That evaluative capacity is what you have been building. When you read and discussed benchmarking papers in Lab 8—comparing variant callers across different datasets, sequencing depths, and filtering strategies—you were practicing exactly this skill: not accepting a method because it is popular, but asking what evidence supports its performance and under what conditions it fails.

Programming skills matter, and the languages that matter are shifting. R remains the standard for statistical analysis and visualization in genomics, and Bioconductor provides a mature ecosystem for everything from QC to single-cell analysis to population genomics²³. Python dominates machine learning and workflow management (Snakemake is built on Python). Meanwhile, a 2020 Nature article on the Rust programming language pointed to its growing adoption in bioinformatics tools where performance and memory safety are critical: tools like minimap2 (long-read alignment) and several assemblers now have Rust components or Rust-based alternatives, and ripgrep, a Rust replacement for grep, offers fast text search over large files²⁴. For students heading toward software-intensive careers in bioinformatics, understanding what Rust offers and why it is increasingly attractive for performance-critical code is worthwhile, even if you never write a Rust program yourself.

Adaptability, then, is the meta-skill. Knowing how to learn a new tool (reading the documentation, running the tutorial dataset, checking the project’s GitHub repository for maintenance status and open issues, looking for benchmark comparisons in the literature) is more durable than any specific proficiency. This course has given you practice with that process across many different tools and data types. The research project asked you to find and apply tools that were not taught in class. The grant panel asked you to evaluate whether an analysis plan was sensible before seeing the results. These experiences are preparation for a career in which the specific tools will keep changing and your ability to evaluate and adapt will determine how well you keep pace.

14.6.3 Where to keep learning

The resources that have supported this course will continue to serve you beyond it. The Akalin Computational Genomics with R textbook²³ is freely available online and covers R-based workflows in genomic intervals, functional genomics, single-cell analysis, and multi-omics integration in depth. Graham Coop’s population genetics notes²⁵ provide the rigorous mathematical treatment of the population genomic statistics introduced in Chapter 12. The Galaxy Training Network (training.galaxyproject.org) and EMBL-EBI training portal (ebi.ac.uk/training) offer regularly updated tutorials across virtually every workflow in this book and many beyond it. The Cold Spring Harbor Laboratory courses in Advanced Sequencing Technologies, Statistical Methods in Functional Genomics, and related areas represent some of the most intensive professional development opportunities available in the field. The Summer Institute in Statistical Genetics, held annually at Georgia Tech in Atlanta (sisg.biosciences.gatech.edu), offers modular short courses in quantitative and statistical genomics that are particularly well suited to students from this course who want to go deeper in population genetics, GWAS, or statistical modeling. For students at Auburn, the Graduate Certificate in Computational Biology (auburn.edu/cosam/departments/biology/certificate) provides a structured pathway to formalizing the skills introduced here within a degree program.

14.7 A Final Note

Bioinformatics feels like magic until you see the patterns: every pipeline starts with QC, every analysis needs reproducibility, every grant needs a clear workflow. By this point in the semester, you have not only run these analyses but built the confidence to explain them to collaborators, reviewers, and future employers. You have sat with messy real data and made defensible decisions about what to keep and what to discard. You have written methods sections, evaluated benchmarking papers, participated in a grant review panel, and contributed to a research project that belongs entirely to you and your group.

The bioinformatics field needs people who bring all of that—biological intuition, computational skill, statistical literacy, and scientific integrity—to the same table. The tools will keep changing. The questions will not. Go read some more genomes.

Dr. Laurie Stevison Spring 2026

14.8 Practice Problems

These problems are intentionally open-ended. There is no answer key.

A new topic, connected back. Choose a bioinformatics topic you have not encountered in this course—possibilities include phylogenomics, Hi-C and chromosome conformation capture, long-read methylation calling, pangenome graphs, single-cell ATAC-seq, ancient eDNA, or spatial transcriptomics. Find one primary research paper and one tutorial or review. Then answer: What step in the unified workflow does this method most directly affect? What concepts from this course apply directly, and what is genuinely new?
Tool evaluation. Pick any bioinformatics tool released in the last two years. Apply the five-question framework: What biological problem does it solve? What are the inputs and outputs? How is it benchmarked? Is it maintained? Can you reproduce a simple example? Write two paragraphs assessing whether you would trust it for a real project.
The course as a whole. Return to the genome analysis paper you described in Lab 1. Re-read the methods section. How many of the steps described there can you now name, explain, and critically evaluate? What would you ask the authors if you were a reviewer?

14.9 Reflection Questions

Obsolescence and adaptation. As I have now mentioned several times, I used 454 sequencing in my PhD and it has since become obsolete. Has anything you learned in a previous course already become outdated? How does knowing that this will keep happening change how you approach learning new tools?
The hidden curriculum, revisited. Chapters 7 and 11 covered project management, reproducibility, and grant funding. Which of these felt most unfamiliar at the start of the semester? Which will you use first after this course ends? What are other hidden curriculum topics that were not covered that you wish had been?
Your project as evidence. Your GitHub repository is a tangible artifact of what you know how to do. If a graduate program, employer, or collaborator asked you to describe your computational skills, how would you use it? What would you add or improve?
Breadth and depth. After reading Section 14.4, which area of bioinformatics not covered in this course interests you most, and why? What is the first concrete step you would take to learn more about it?
Ethics and responsibility. The cautionary example of the cancer microbiome study²¹ shows that analytical errors in genomics can have real consequences. What practices from this course—specific habits, documentation steps, or QC checks—would have been most likely to catch or prevent such errors? And what ethical responsibilities do you see for yourself as someone who can now evaluate and run genomics analyses?

14.10 References

1. Pevsner, J. Bioinformatics and Functional Genomics. (Wiley, 2015).

2. Carter, J. K. et al. Estimating phylogenies from genomes: A beginners review of commonly used genomic data in vertebrate phylogenomics. Journal of Heredity 114, 1–13 (2023).

3. Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).

4. Wang, Y. et al. The 1000 Chinese Pangenome empowers medical and population genetics. Nature 1–10 (2026) doi:10.1038/s41586-026-10315-y.

5. Aközbek, L. et al. A haplotype-resolved, chromosome-scale genome assembly for the southern live oak, Quercus virginiana. G3: Genes|Genomes|Genetics 16, jkag023 (2026).

6. Woo, H. & Eyun, S. Applications and techniques of single-cell RNA sequencing across diverse species. Briefings in Bioinformatics 26, bbaf354 (2025).

7. Moro, G., Brunner, E. & Basler, K. A practical guide to targeted single-cell RNA sequencing technologies. Communications Biology 9, 250 (2026).

8. Element Biosciences Introduces VITARI™, Redefining What High-Throughput Sequencing Makes Possible.

9. Philippidis, A. Illumina Sues Element Biosciences, Alleging Infringement of Flow Cell, Imaging Patents. GEN - Genetic Engineering and Biotechnology News (2025).

10. Dickinson, A. Gencove Launches $49 Whole Genome Sequence for Consumer Health. LinkedIn.

11. Wasik, K. et al. Comparing low-pass sequencing and genotyping for trait mapping in pharmacogenetics. BMC Genomics 22, 197 (2021).

12. Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).

13. Dekker, J. et al. An integrated view of the structure and function of the human 4D nucleome. Nature 649, 759–776 (2026).

14. Liu, B. B. et al. Multiomics and deep learning dissect regulatory syntax in human development. Nature 1–14 (2026) doi:10.1038/s41586-026-10326-9.

15. Zhu, S. et al. Multiomic single-cell perturbation screens reveal critical lncRNA regulators of senescence. Nature Aging 1–22 (2026) doi:10.1038/s43587-026-01100-7.

16. Wild, S. Drowning in data sets? Here’s how to cut them down to size. Nature 651, 1121–1122 (2026).

17. Ji, H. J., Pertea, M. & Salzberg, S. L. Annotating genomes at increased scale and resolution. Nature Reviews Genetics 1–13 (2026) doi:10.1038/s41576-026-00937-3.

18. Wolff, R. & Garud, N. R. Gene-specific selective sweeps are pervasive across human gut microbiomes. Nature 650, 710–717 (2026).

19. Olawade, D. B. et al. Bioinformatics and artificial intelligence in genomic data analysis: Current advances and future directions. Molecular Genetics and Genomics 300, 111 (2025).

20. Ruan, W. et al. Large language models for bioinformatics. Quantitative Biology 14, e70014 (2026).

21. Gihawi, A. et al. Major data analysis errors invalidate cancer microbiome findings. mBio 14, e01607-23 (2023).

22. Gocke, M. What Can You Do With a Bioinformatics Degree? Northeastern University Graduate Programs (2024).

23. Akalin, A. Computational Genomics with R. (CRC Press, 2020).

24. Perkel, J. M. Why scientists are turning to Rust. Nature 588, 185–186 (2020).

25. Coop, G. Population and quantitative genetics, 3rd ed. (2020).