Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Porubsky, D. et al. Gaps and complex structurally variant loci in phased genome ***emblies. Genome Res. 33, 496–510 (2023).
Ebler, J. et al. Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant cl***es. Nat. Genet. 54, 518–525 (2022).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Garg, S. et al. Chromosome-scale, haplotype-resolved ***embly of human genomes. Nat. Biotechnol. (2020).
Porubsky, D. et al. Fully phased human genome ***embly without parental data using single-cell strand sequencing and long reads. Nat. Biotechnol. (2020).
Koren, S. et al. De novo ***embly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. 36, 1174–1182 (2018).
Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated blockysis of structural variation. Science 372, eabf7117 (2021).
Rautiainen, M. et al. Telomere-to-telomere ***embly of diploid chromosomes with Verkko. Nat. Biotechnol. 41, 1474–1482 (2023).
Cheng, H., Asri, M., Lucas, J., Koren, S. & Li, H. Scalable telomere-to-telomere ***embly for diploid and polyploid genomes with double graph. Nat. Methods 21, 967–970 (2024).
1000 Genomes Project Consortiumet al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Henglin, M. et al. Graphasing: phasing diploid genome ***embly graphs with single-cell strand sequencing. Genome Biol. 25, 265 (2024).
Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1784 (2019).
Aganezov, S. et al. A complete reference genome improves blockysis of human genetic variation. Science 376, eabl3533 (2022).
Kazazian, H. H. Jr et al. Haemophilia A resulting from de novo insertion of L1 sequences represents a novel mechanism for mutation in man. Nature 332, 164–166 (1988).
Porubsky, D. et al. Recurrent inversion polymorphisms in humans ***ociate with genetic instability and genomic disorders. Cell 185, 1986–2005.e26 (2022).
Cooper, G. M. et al. A copy number variation morbidity map of developmental delay. Nat. Genet. 43, 838–846 (2011).
Sudmant, P. H. et al. Diversity of human copy number variation and multicopy genes. Science 330, 641–646 (2010).
Sudmant, P. H. et al. Global diversity, population stratification, and selection of human copy-number variation. Science 349, aab3761 (2015).
Vollger, M. R. et al. Segmental duplications and their variation in a complete human genome. Science 376, eabj6965 (2022).
Jeong, H. et al. Structural polymorphism and diversity of human segmental duplications. Nat. Genet. 57, 390–401 (2025).
Hallast, P., Agdzhoyan, A., Balanovsky, O., Xue, Y. & Tyler-Smith, C. A Southeast Asian origin for present-day non-African human Y chromosomes. Hum. Genet.140, 299–307 (2021).
Hallast, P. et al. Assembly of 43 human Y chromosomes reveals extensive complexity and variation. Nature 621, 355–364 (2023).
Rhie, A. et al. The complete sequence of a human Y chromosome. Nature 621, 344–354 (2023).
Porubsky, D. et al. Human de novo mutation rates from a four-generation pedigree reference. Nature 643, 427–436 (2025).
Ruderfer, D. M. et al. Patterns of genic intolerance of rare copy number variation in 59,898 human exomes. Nat. Genet. 48, 1107–1111 (2016).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
Byrska-Bishop, M. et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426–3440.e19 (2022).
Prodanov, T. et al. Locityper: targeted genotyping of complex polymorphic genes. Preprint at bioRxiv (2024).
Wagner, J. et al. Curated variation benchmarks for challenging medically relevant autosomal genes. Nat. Biotechnol. 40, 672–680 (2022).
Horton, R. et al. Gene map of the extended human MHC. Nat. Rev. Genet. 5, 889–899 (2004).
Norman, P. J. et al. Sequences of 95 human MHC haplotypes reveal extreme coding variation in genes other than highly polymorphic HLA cl*** I and II. Genome Res. 27, 813–823 (2017).
Trowsdale, J. & Knight, J. C. Major histocompatibility complex genomics and human disease. Annu. Rev. Genomics Hum. Genet. 14, 301–323 (2013).
Abi-Rached, L. et al. Immune diversity sheds light on missing variation in worldwide genetic diversity panels. PLoS ONE 13, e0206512 (2018).
Barker, D. J. et al. The IPD-IMGT/HLA Database. Nucleic Acids Res. 51, D1053–D1060 (2023).
Mentzer, A. J. et al. High-resolution African HLA resource uncovers HLA-DRB1 expression effects underlying vaccine response. Nat. Med. 30, 1384–1394 (2024).
Liu, B., Shao, Y. & Fu, R. Current research status of HLA in immune-related diseases. Immun. Inflamm. Dis. 9, 340–350 (2021).
Horton, R. et al. Variation blockysis and gene annotation of eight MHC haplotypes: the MHC Haplotype Project. Immunogenetics 60, 1–18 (2008).
Houwaart, T. et al. Complete sequences of six major histocompatibility complex haplotypes, including all the major MHC cl*** II structures. Hladnikia 102, 28–43 (2023).
Gorski, J. The HLA-DRw8 lineage was generated by a deletion in the DR B region followed by first domain diversification. J. Immunol. 142, 4041–4045 (1989).
Gongora, R. Presence of solitary exon 1 sequences in the HLA-DR region. Hereditas 127, 47–49 (1997).
Chung, E. K. et al. Genetic sophistication of human complement components C4A and C4B and RP-C4-CYP21-TNX (RCCX) modules in the major histocompatibility complex. Am. J. Hum. Genet. 71, 823–837 (2002).
Bánlaki, Z. et al. Intraspecific evolution of human RCCX copy number variation traced by haplotypes of the CYP21A2 gene. Genome Biol. Evol. 5, 98–112 (2013).
Chin, C.-S. et al. Multiscale blockysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes. Nat. Methods 20, 1213–1221 (2023).
Gu, S. et al. Alu-mediated diverse and complex pathogenic copy-number variants within human chromosome 17 at p13.3. Hum. Mol. Genet. 24, 4061–4077 (2015).
Balachandran, P. et al. Transposable element-mediated rearrangements are prevalent in human genomes. Nat. Commun. 13, 7115 (2022).
Beck, C. R. et al. Megabase length hypermutation accompanies human structural variation at 17p11.2. Cell 176, 1310–1324.e10 (2019).
Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).
Audano, P. A., Paisie, C., The Human Genome Structural Variation Consortium & Beck, C. R. Large complex structural rearrangements in human genomes harbor cryptic structures. Preprint at bioRxiv (2024).
Collins, R. L. et al. Defining the diverse spectrum of inversions, complex structural variation, and chromothripsis in the morbid human genome. Genome Biol. 18, 36 (2017).
Marques-Bonet, T. & Eichler, E. E. The evolution of human segmental duplications and the core duplicon hypothesis. Cold Spring Harb. Symp. Quant. Biol. 74, 355–362 (2009).
Winkelsas, A. M. et al. Targeting the 5′ untranslated region of SMN2 as a therapeutic strategy for spinal muscular atrophy. Mol. Ther. Nucleic Acids 23, 731–742 (2021).
Sivanesan, S., Howell, M. D., Didonato, C. J. & Singh, R. N. Antisense oligonucleotide mediated therapy of spinal muscular atrophy. Transl. Neurosci. (2013).
Bolognini, D. et al. Recurrent evolution and selection shape structural diversity at the amylase locus. Nature (2024).
Yilmaz, F. et al. Reconstruction of the human amylase locus reveals ancient duplications seeding modern-day variation. Science 386, eadn0609 (2024).
Usher, C. L. et al. Structural forms of the human amylase locus and their relationships to SNPs, haplotypes and obesity. Nat. Genet. 47, 921–925 (2015).
Altemose, N. et al. Complete genomic and epigenetic maps of human centromeres. Science 376, eabl4178 (2022).
Logsdon, G. A. et al. The variation and evolution of complete human centromeres. Nature 629, 136–145 (2024).
Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022).
Shepelev, V. A., Alexandrov, A. A., Yurov, Y. B. & Alexandrov, I. A. The evolutionary origin of man can be traced in the layers of defunct ancestral alpha satellites flanking the active centromeres of human chromosomes. PLOS Genet. 5, e1000641 (2009).
O’Neill, R. J., O’Neill, M. J. & Graves, J. A. Undermethylation ***ociated with retroelement activation and chromosome remodelling in an interspecific mammalian hybrid. Nature 393, 68–72 (1998).
Chaisson, M. J. P. et al. Resolving the complexity of the human genome using single-molecule sequencing. Nature 517, 608–611 (2015).
Gao, Y. et al. A pangenome reference of 36 Chinese populations. Nature 619, 112–121 (2023).
Wang, T. et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 604, 437–446 (2022).
Schloissnig, S. et al. Structural variation in 1,019 diverse humans based on long-read sequencing Nature (2024).
International HapMap Consortium. The International HapMap Project. Nature 426, 789–796 (2003).
Zook, J. M. et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016).
Sanders, A. D., Falconer, E., Hills, M., Spierings, D. C. J. & Lansdorp, P. M. Single-cell template strand sequencing by Strand-seq enables the characterization of individual homologs. Nat. Protoc. 12, 1151–1176 (2017).
Falconer, E. et al. DNA template strand sequencing of single-cells maps genomic rearrangements at high resolution. Nat. Methods 9, 1107–1112 (2012).
Astashyn, A. et al. Rapid and sensitive detection of genome contamination at scale with FCS-GX. Genome Biol. 25, 60 (2024).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing ***essment for genome ***emblies. Genome Biol. 21, 245 (2020).
Vollger, M. R. et al. Long-read sequence and ***embly of segmental duplications. Nat. Methods 16, 88–94 (2019).
Chen, Y., Zhang, Y., Wang, A. Y., Gao, M. & Chong, Z. Accurate long-read de novo ***embly evaluation with Inspector. Genome Biol. 22, 312 (2021).
Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).
Huang, N. & Li, H. compleasm: A faster and more accurate reimplementation of BUSCO. Bioinformatics 39, btad595 (2023).
Kriventseva, E. V. et al. OrthoDB v10: sampling the diversity of animal, plant, fungal, protist, bacterial and viral genomes for evolutionary and functional annotations of orthologs. Nucleic Acids Res. 47, D807–D811 (2019).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Li, H. New strategies to improve minimap2 alignment accuracy. Bioinformatics 37, 4572–4574 (2021).
Jain, C., Koren, S., Dilthey, A., Phillippy, A. M. & Aluru, S. A fast adaptive algorithm for computing whole-genome homology maps. Bioinformatics 34, i748–i756 (2018).
Ren, J. & Chaisson, M. J. P. lra: a long read aligner for sequences and contigs. PLoS Comput Biol. 17, e1009078 (2021).
Li, H. et al. A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat. Methods 15, 595–597 (2018).
Heller, D. & Vingron, M. SVIM-asm: structural variant detection from haploid and diploid genome ***emblies. Bioinformatics 36, 5519–5521 (2021).
Smolka, M. et al. Detection of mosaic and population-level structural variants with Sniffles2. Nat. Biotechnol. 42, 1571–1580 (2024).
Rausch, T. et al. DELLY: structural variant discovery by integrated paired-end and split-read blockysis. Bioinformatics 28, i333–i339 (2012).
Jiang, T. et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21, 189 (2020).
Chen, Y. et al. Deciphering the exact breakpoints of structural variations using long sequencing reads with DeBreak. Nat. Commun. 14, 283 (2023).
Heller, D. & Vingron, M. SVIM: structural variant identification using mapped long reads. Bioinformatics 35, 2907–2915 (2019).
Zheng, Z. et al. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nat. Comput. Sci. 2, 797–803 (2022).
Shafin, K. et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18, 1322–1332 (2021).
Smit, A. F. A., Hubley, R. & Green, P. RepeatMasker Open-4.0 (Institute for Systems Biology, 2013).
Storer, J., Hubley, R., Rosen, J., Wheeler, T. J. & Smit, A. F. The Dfam community resource of transposable element families, sequence models, and genome annotations. Mob. DNA 12, 2 (2021).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008 (2021).
Li, H. et al. The sequence alignment/map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Gros, C., Sanders, A. D., Korbel, J. O., Marschall, T. & Ebert, P. ASHLEYS: automated quality control for single-cell Strand-seq data. Bioinformatics 37, 3356–3357 (2021).
Höps, W. et al. Impact and characterization of serial structural variations across humans and great apes. Nat. Commun. 15, 8007 (2024).
Porubsky, D. et al. Inversion polymorphism in a complete human genome ***embly. Genome Biol. 24, 100 (2023).
Numanagic, I. et al. Fast characterization of segmental duplications in genome ***emblies. Bioinformatics 34, i706–i714 (2018).
Benson, G. Tandem repeats finder: a program to blockyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics (2009).
Morgulis, A., Gertz, E. M., Schäffer, A. A. & Agarwala, R. WindowMasker: window-based masker for sequenced genomes. Bioinformatics 22, 134–141 (2006).
Pendleton, A. L. et al. Comparison of village dog and wolf genomes highlights the role of the neural crest in dog domestication. BMC Biol. 16, 64 (2018).
Mistry, J., Finn, R. D., Eddy, S. R., Bateman, A. & Punta, M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res. 41, e121 (2013).
Krumsiek, J., Arnold, R. & Rattei, T. Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics 23, 1026–1028 (2007).
Frankish, A. et al. GENCODE: reference annotation for the human and mouse genomes in 2023. Nucleic Acids Res. 51, D942–D949 (2023).
McLaren, W. et al. The Ensembl Variant Effect Predictor. Genome Biol. 17, 122 (2016).
Martin, F. J. et al. Ensembl 2023. Nucleic Acids Res. 51, D933–D941 (2023).
Lee, B. T. et al. The UCSC Genome Browser database: 2022 update. Nucleic Acids Res. 50, D1115–D1122 (2022).
Lindeboom, R. G. H., Supek, F. & Lehner, B. The rules and impact of nonsense-mediated mRNA decay in human cancers. Nat. Genet. 48, 1112–1118 (2016).
Pardo-Palacios, F. J. et al. SQANTI3: curation of long-read transcriptomes for accurate identification of known and novel isoforms. Nat. Methods 21, 793–797 (2024).
Robinson, J. T. et al. Integrative Genomics Viewer. Nat. Biotechnol. 29, 24–26 (2011).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Pertea, M. et al. CHESS: a new human gene catalog curated from thousands of large-scale RNA sequencing experiments reveals extensive transcriptional noise. Genome Biol. 19, 208 (2018).
Larsson, A. AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics 30, 3276–3278 (2014).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
ENCODE Project Consortiumet al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).
Sollis, E. et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).
Purcell, S. et al. PLINK: a tool set for whole-genome ***ociation and population-based linkage blockyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Hickey, G. et al. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat. Biotechnol. 42, 663–673 (2024).
Hofmeister, R. J., Ribeiro, D. M., Rubinacci, S. & Delaneau, O. Accurate rare variant phasing of whole-genome and whole-exome sequencing data in the UK Biobank. Nat. Genet. 55, 1243–1249 (2023).
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
Camacho, C. et al. BLAST+: architecture and applications. BMC Bioinformatics 10, 421 (2009).
Audano, P., Christine, B. & Human Genome Structural Variation Consortium. A method for calling complex SVs. Zenodo (2024).
Bellman, R. On a routing problem. Quart. Appl. Math. 16, 87–90 (1958).
Yoo, D. et al. Complete sequencing of ape genomes. Nature 641, 401–418 (2025).
Prodanov, T. & Bansal, V. Robust and accurate estimation of paralog-specific copy number for duplicated genes using whole-genome sequencing. Nat. Commun. 13, 3221 (2022).
Chen, X. et al. Spinal muscular atrophy diagnosis and carrier screening from genome sequencing data. Genet. Med. 22, 945–953 (2020).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Li, H. Identifying centromeric satellites with dna-brnn. Bioinformatics 35, 4408–4410 (2019).
R Core Team. R: A Language and Environment for Statistical Computing. (R Foundation for Statistical Computing, 2020).
Wickham, H. Ggplot2: Elegant Graphics for Data Analysis (Springer, 2009).
McNulty, S. M. & Sullivan, B. A. Alpha satellite DNA biology: finding function in the recesses of the genome. Chromosome Res. 26, 115–138 (2018).
Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGl***: interactive visualization of m***ive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).
Mastrorosa, F. K. et al. Identification and annotation of centromeric hypomethylated regions with CDR-Finder. Bioinformatics 40, btae733 (2024).
Ebert, P. hgsvc/phase3-main-pub: v1.1 HGSVC phase 3 revision stage/ZENODO (v1.1). Zenodo (2024).