The Vertebrate Genomes Project: a new era of genome sequencing
16 new high-quality reference genomes from vertebrates are published, advancing comparative biology, conservation, and health research
The international Vertebrate Genomes Project (VGP) today publishes its flagship study on genome assembly quality and standardization for the field of genomics in a special issue of Nature, along with 20 associated publications. The study presents 16 high-quality, near error-free, and near-complete diploid vertebrate reference genome assemblies produced during the five-year pilot phase of the VGP. Understanding the DNA sequences of all vertebrates will enable the study of how genes have contributed to the evolution and survival of these species, and it will also help answer questions in health research. Genome data were primarily generated at three sequencing hubs that have invested in the mission of the VGP: the Rockefeller University Vertebrate Genome Lab in New York, USA (partly supported by the Howard Hughes Medical Institute), the Wellcome Sanger Institute in the UK, and the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden, Germany.
Growing out of the decade-old mission of the Genome 10K Community of Scientists (G10K) to sequence the genomes of 10,000 vertebrate species, as well as other comparative genomics efforts, the VGP aims to generate near error-free reference genome assemblies of all 72,000 extant vertebrate species. Reference genome assemblies provide a map of a species’ DNA sequence and its spatial context, that is, where along the chromosomes a specific piece of DNA sequence is located. With this ambitious mission, the VGP seeks to address fundamental questions in biology, conservation, and disease, including identifying the species most genetically at risk of extinction and preserving their genetic information for future generations. The high-quality VGP genomes will become the main references for their species and will be stored in GenomeArk, a digital open-access library of genomes.
In the past, generating reference assemblies was so expensive and labor-intensive that they were produced only for humans and the most important model organisms, and even these still contained gaps and errors. However, a complete understanding of evolutionary processes and other fundamental questions in biology requires high-quality reference genome assemblies of all species. Adam Phillippy, chair of the VGP genome assembly and informatics working group, which has more than 100 members, and head of the Genome Informatics Section of the National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland, USA, says: “Completing the first vertebrate reference genome, human, took over 10 years and $3 billion. Thanks to continued research and investment in DNA sequencing technology over the past 20 years, we can now repeat this amazing feat multiple times per day for just a few thousand dollars per genome.”
Contribution of Max Planck Institutes to the VGP
One of the sequencing hubs is the Max Planck Institute of Molecular Cell Biology and Genetics in Dresden. Gene Myers, lead of the VGP sequencing hub at the Max Planck Institute and the Center for Systems Biology Dresden, says: “The VGP project is at the vanguard of the creation of a genomic catalog in analogy with Linnaeus' classification of life. I and my colleagues in Dresden are excited to be contributing such superb genome reconstructions with the financial support of the Max Planck Society of Germany.” The Dresden scientists are part of the DRESDEN-concept Genome Center and have special expertise in the use of various “long-read” sequencing technologies. Long reads are important because they can span complex and repetitive regions of the genome, allowing these regions to be assembled and placed unambiguously. The Dresden hub contributed to three of the 16 released genomes: the greater horseshoe bat (Rhinolophus ferrumequinum), the flier cichlid fish (Archocentrus centrarchus), and the pale spear-nosed bat (Phyllostomus discolor). The first two genomes were entirely the work of the Dresden hub; for P. discolor, the sequencing data were produced at the Rockefeller hub, while the assembly and transcriptomic data were produced in Dresden.
The bat tissue samples were provided by members of the Bat1K consortium, led by Sonja Vernes (Max Planck Institute for Psycholinguistics, Netherlands, and University of St Andrews, UK) and Emma Teeling (University College Dublin, Ireland). The Bat1K consortium aims to sequence all living bat species and has been instrumental in sequencing and analyzing bat genomes in collaboration with, and as a partner of, the overall VGP. The flier cichlid tissue samples came from Axel Meyer, University of Konstanz, Germany.
Robert Kraus from the Max Planck Institute of Animal Behavior contributed ideas and vision to the VGP's mission early on and pushed for a focus on fewer genomes of higher quality. He also sequenced several bird species to study the basis of immune responses to avian influenza. These genomes have not yet been released and will be part of upcoming VGP releases.
The excellent quality of these genome assemblies enables novel discoveries at unprecedented scale, with implications for characterizing the biodiversity of all life, species conservation, and human health and disease. The first high-quality reference genomes of six bat species, generated by the Bat1K consortium, were published in July 2020 in Nature. They revealed selection on, and loss of, immunity-related genes that may underlie bats’ unique tolerance to viral infection. These findings open new avenues of research into increasing survivability, which are particularly relevant for emerging infectious diseases such as the current COVID-19 pandemic.
Sonja Vernes, a founding director of the Bat1K consortium and UKRI Future Leaders Fellow, said: “These new genomes are a huge step towards answering key questions about biology and evolution across vertebrates. We can already see exciting new features of chromosomal evolution, including changes found in the six bat species provided by the Bat1K consortium that may contribute to enhanced immune systems and tolerance to pathogens. In the future these genomes will also help us understand complex behaviors like the evolution of animal communication and how human speech and language evolved.”
For conservation, analyses of the VGP genomes of the kākāpō, a flightless parrot endemic to New Zealand, and the vaquita, a small porpoise endemic to the Gulf of California and the most endangered marine mammal, suggest evolutionary and demographic histories of purging harmful mutations in the wild and of long-term small population size at genetic equilibrium.
A new era in genome science through collaboration
This massive comparative genomics project marks a new era of innovation in genome science, developing and applying new pipelines for consistent, state-of-the-art sequencing, assembly, and annotation, with implications for addressing fundamental questions in comparative biology, genetics, biodiversity, conservation, and health. Through its extensive infrastructure, collaboration, and leadership, the VGP also serves as a model of scientific cooperation for other large-scale genomic projects: since the VGP was initiated in 2016, hundreds of scientists from more than 50 institutions in 12 countries have worked together on it.
As a next step, the VGP will continue to work collaboratively across the globe and with other consortia to complete Phase 1 of the project, whose goal is to sequence one representative species from each of the 260 vertebrate orders. Technological advances, improved computational methods, and the ever-decreasing cost of sequencing have enabled the VGP to pursue the ambitious goal of producing a reference genome assembly for every extant vertebrate species on Earth. In Phase 1, the VGP has focused on testing and improving genome sequencing and assembly approaches and on assembling a first set of 260 high-quality genomes of species representing all vertebrate orders. Phase 2 will focus on representative species from each vertebrate family and is currently in the process of sample identification and fundraising. The VGP has an open-door policy and welcomes others to join its efforts, whether through fundraising, sample collection, or generating genome assemblies, or by contributing their own genome assemblies that meet the VGP metrics to the overall mission.
All sequence data and assemblies are being made freely available as they are produced and can be downloaded or browsed at GenomeArk https://vgp.github.io/genomeark/, GenBank https://www.ncbi.nlm.nih.gov/bioproject/489243, Ensembl https://projects.ensembl.org/vgp/, and UCSC https://hgdownload.soe.ucsc.edu/hubs/VGP/.
Full list of the 16 genomes
- pale spear-nosed bat (Phyllostomus discolor)
- greater horseshoe bat (Rhinolophus ferrumequinum)
- Canada lynx (Lynx canadensis)
- platypus (Ornithorhynchus anatinus)
- zebra finch (Taeniopygia guttata)
- kākāpō (Strigops habroptilus)
- Anna's hummingbird (Calypte anna)
- Goode's thornscrub tortoise (Gopherus evgoodei)
- two-lined caecilian (Rhinatrema bivittatum)
- zig-zag eel (Mastacembelus armatus)
- climbing perch (Anabas testudineus)
- flier cichlid (Archocentrus centrarchus)
- eastern happy cichlid (Astatotilapia calliptera)
- channel bull blenny (Cottoperca gobio)
- blunt-snouted clingfish (Gouania willdenowi)
- thorny skate (Amblyraja radiata)