By sequencing 523 ancient humans, we show that the primary source of ancestry in modern South Asians is a prehistoric genetic gradient between people related to early hunter-gatherers of Iran and Southeast Asia. After the Indus Valley Civilization's decline, its people mixed with individuals in the southeast to form one of the two main ancestral populations of South Asia, whose direct descendants live in southern India. Simultaneously, they mixed with descendants of Steppe pastoralists who, starting around 4000 years ago, spread via Central Asia to form the other main ancestral population. The Steppe ancestry in South Asia has the same profile as that in Bronze Age Eastern Europe, tracking a movement of people that affected both regions and that likely spread the distinctive features shared between Indo-Iranian and Balto-Slavic languages.
We report an ancient genome from the Indus Valley Civilization (IVC). The individual we sequenced fits as a mixture of people related to ancient Iranians (the largest component) and Southeast Asian hunter-gatherers, a unique profile that matches ancient DNA from 11 genetic outliers from sites in Iran and Turkmenistan in cultural communication with the IVC. These individuals had little if any Steppe pastoralist-derived ancestry, showing that it was not ubiquitous in northwest South Asia during the IVC as it is today. The Iranian-related ancestry in the IVC derives from a lineage leading to early Iranian farmers, herders, and hunter-gatherers before their ancestors separated, contradicting the hypothesis that the shared ancestry between early Iranians and South Asians reflects a large-scale spread of western Iranian farmers east. Instead, sampled ancient genomes from the Iranian plateau and IVC descend from different groups of hunter-gatherers who began farming without being connected by substantial movement of people.
The genetic formation of Central and South Asian populations has been unclear because of an absence of ancient DNA. To address this gap, we generated genome-wide data from 362 ancient individuals, including the first from eastern Iran, Turan (Uzbekistan, Turkmenistan, and Tajikistan), Bronze Age Kazakhstan, and South Asia. Our data reveal a complex set of genetic sources that ultimately combined to form the ancestry of South Asians today. We document a southward spread of genetic ancestry from the Eurasian Steppe, correlating with the archaeologically known expansion of pastoralist sites from the Steppe to Turan in the Middle Bronze Age (2300-1500 BCE). These Steppe communities mixed genetically with peoples of the Bactria Margiana Archaeological Complex (BMAC) whom they encountered in Turan (primarily descendants of earlier agriculturalists of Iran), but there is no evidence that the main BMAC population contributed genetically to later South Asians. Instead, Steppe communities integrated farther south throughout the 2nd millennium BCE, and we show that they mixed with a more southern population that we document at multiple sites as outlier individuals exhibiting a distinctive mixture of ancestry related to Iranian agriculturalists and South Asian hunter-gatherers. We call this group Indus Periphery because they were found at sites in cultural contact with the Indus Valley Civilization (IVC) and along its northern fringe, and also because they were genetically similar to post-IVC groups in the Swat Valley of Pakistan.
By co-analyzing ancient DNA and genomic data from diverse present-day South Asians, we show that Indus Periphery-related people are the single most important source of ancestry in South Asia, consistent with the idea that the Indus Periphery individuals are providing us with the first direct look at the ancestry of peoples of the IVC. We also develop a model for the formation of present-day South Asians in terms of the temporally and geographically proximate sources of Indus Periphery-related, Steppe, and local South Asian hunter-gatherer-related ancestry. Our results show how ancestry from the Steppe genetically linked Europe and South Asia in the Bronze Age, and identify the populations that almost certainly were responsible for spreading Indo-European languages across much of Eurasia.
Although it has previously been shown that Neanderthals contributed DNA to modern humans, not much is known about the genetic diversity of Neanderthals or the relationship between late Neanderthal populations at the time at which their last interactions with early modern humans occurred and before they eventually disappeared. Our ability to retrieve DNA from a larger number of Neanderthal individuals has been limited by poor preservation of endogenous DNA and contamination of Neanderthal skeletal remains by large amounts of microbial and present-day human DNA. Here we use hypochlorite treatment of as little as 9 mg of bone or tooth powder to generate between 1- and 2.7-fold genomic coverage of five Neanderthals who lived around 39,000 to 47,000 years ago (that is, late Neanderthals), thereby doubling the number of Neanderthals for which genome sequences are available. Genetic similarity among late Neanderthals is well predicted by their geographical location, and comparison to the genome of an older Neanderthal from the Caucasus indicates that a population turnover is likely to have occurred, either in the Caucasus or throughout Europe, towards the end of Neanderthal history. We find that the bulk of Neanderthal gene flow into early modern humans originated from one or more source populations that diverged from the Neanderthals that were studied here at least 70,000 years ago, but after they split from a previously sequenced Neanderthal from Siberia around 150,000 years ago. Although four of the Neanderthals studied here post-date the putative arrival of early modern humans into Europe, we do not detect any recent gene flow from early modern humans in their ancestry.
The genetic features of isolated populations can boost power in complex-trait association studies, and an in-depth understanding of how their genetic variation has been shaped by their demographic history can help leverage these advantageous characteristics. Here, we perform a comprehensive investigation using 3,059 newly generated low-depth whole-genome sequences from eight European isolates and two matched general populations, together with published data from the 1000 Genomes Project and UK10K. Sequencing data give deeper and richer insights into population demography and genetic characteristics than genotype-chip data, distinguishing related populations more effectively and allowing their functional variants to be studied more fully. We demonstrate relaxation of purifying selection in the isolates, leading to enrichment of rare and low-frequency functional variants, using novel statistics, DVxy and SVxy. We also develop an isolation-index (Isx) that predicts the overall level of such key genetic characteristics and can thus help guide population choice in future complex-trait association studies.
Heterozygous mutations within homozygous sequences descended from a recent common ancestor offer a way to ascertain de novo mutations across multiple generations. Using exome sequences from 3222 British-Pakistani individuals with high parental relatedness, we estimate a mutation rate of (1.45 ± 0.05) × 10^-8 per base pair per generation in autosomal coding sequence, with a corresponding non-crossover gene conversion rate of (8.75 ± 0.05) × 10^-6 per base pair per generation. This is at the lower end of exome mutation rates previously estimated in parent-offspring trios, suggesting that post-zygotic mutations contribute little to the human germ-line mutation rate. We find frequent recurrence of mutations at polymorphic CpG sites, and an increase in C to T mutations in a 5' CCG 3' to 5' CTG 3' context in the Pakistani population compared to Europeans, suggesting that mutational processes have evolved rapidly between human populations. Estimates of human mutation rates differ substantially based on the approach. Here, the authors present a multi-generational estimate from the autozygous segment in a non-European population that gives insight into the contribution of post-zygotic mutations and population-specific mutational processes.
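The counting logic behind such an estimate can be illustrated with a minimal sketch. It assumes a simplified estimator that is not spelled out in the abstract: the number of heterozygous de novo mutations found in autozygous segments is divided by the total number of base-pair-by-meiosis opportunities. The function name, the input layout, and all numbers are hypothetical; the study's actual method also has to handle details such as genotype error and gene conversion.

```python
def estimate_mutation_rate(n_het_mutations, segments):
    """Estimate a per-base-pair, per-generation mutation rate.

    n_het_mutations: count of heterozygous sites attributed to de novo
        mutation inside autozygous segments (hypothetical input).
    segments: iterable of (length_bp, n_meioses) pairs, where n_meioses is
        the assumed total number of meioses separating the two haplotypes
        of that segment from their common ancestor.
    """
    # Each base pair in a segment had n_meioses opportunities to mutate.
    opportunities = sum(length_bp * n_meioses for length_bp, n_meioses in segments)
    if opportunities == 0:
        raise ValueError("no autozygous sequence supplied")
    return n_het_mutations / opportunities


# Hypothetical numbers chosen for easy arithmetic, not taken from the study.
example_segments = [(100_000_000, 6), (50_000_000, 8)]
example_rate = estimate_mutation_rate(15, example_segments)
print(f"{example_rate:.2e} mutations per bp per generation")
```

Pooling opportunities across segments before dividing weights long segments and deep pedigree loops more heavily, which is one simple design choice among several possible ones.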
Runs of homozygosity (RoHs) are genomic stretches of a diploid genome that show identical alleles on both chromosomes. Longer RoHs are unlikely to have arisen by chance but are likely to denote autozygosity, whereby both copies of the genome descend from the same recent ancestor. Early tools to detect RoH used genotype array data, but substantially more information is available from sequencing data. Here, we present and evaluate BCFtools/RoH, an extension to the BCFtools software package, that detects regions of autozygosity in sequencing data, in particular exome data, using a hidden Markov model. By applying it to simulated data and real data from the 1000 Genomes Project we estimate its accuracy and show that it has higher sensitivity and specificity than existing methods under a range of sequencing error rates and levels of autozygosity.
AVAILABILITY AND IMPLEMENTATION: BCFtools/RoH and its associated binary/source files are freely available from https://github.com/samtools/BCFtools
CONTACT: email@example.com or firstname.lastname@example.org
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
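To make the hidden Markov model idea concrete, here is a toy two-state sketch (autozygous versus non-autozygous) decoded with the Viterbi algorithm. It is not the BCFtools/RoH implementation: the fixed heterozygosity rate, error rate, and switch probability are illustrative assumptions, and a real caller would typically use per-site allele frequencies, genotype likelihoods, and genetic distances between sites.

```python
import math


def viterbi_roh(genotypes, het_rate=0.3, error_rate=0.01, switch_prob=0.05):
    """Label each variant site as autozygous (True) or not (False).

    genotypes: sequence of 0 (homozygous call) or 1 (heterozygous call).
    All three probabilities are hypothetical defaults for illustration.
    """
    if not genotypes:
        return []
    states = ("AZ", "HW")  # autozygous, non-autozygous (Hardy-Weinberg-like)
    # Probability of observing a heterozygous call in each state.
    p_het = {"AZ": error_rate, "HW": het_rate}

    def log_emit(state, genotype):
        p = p_het[state]
        return math.log(p if genotype == 1 else 1.0 - p)

    log_stay = math.log(1.0 - switch_prob)
    log_switch = math.log(switch_prob)

    # Uniform prior over the two states at the first site.
    score = {s: math.log(0.5) + log_emit(s, genotypes[0]) for s in states}
    backpointers = []
    for genotype in genotypes[1:]:
        new_score, pointers = {}, {}
        for s in states:
            other = "HW" if s == "AZ" else "AZ"
            stay = score[s] + log_stay
            switch = score[other] + log_switch
            best_prev, best = (s, stay) if stay >= switch else (other, switch)
            new_score[s] = best + log_emit(s, genotype)
            pointers[s] = best_prev
        backpointers.append(pointers)
        score = new_score

    # Trace back the most probable state path.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return [s == "AZ" for s in path]


# Hypothetical input: a long homozygous stretch flanked by heterozygous sites.
calls = viterbi_roh([1, 0, 1, 1, 0, 1] + [0] * 30 + [1, 1, 0, 1, 1])
print(sum(calls), "sites called autozygous")
```

Working in log space avoids numerical underflow on long chromosomes, and the switch probability controls how many consecutive homozygous sites are needed before a run is called.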
RATIONALE: Vascular smooth muscle cell (VSMC) accumulation is a hallmark of atherosclerosis and vascular injury. However, fundamental aspects of proliferation and the phenotypic changes within individual VSMCs, which underlie vascular disease, remain unresolved. In particular, it is not known whether all VSMCs proliferate and display plasticity or whether individual cells can switch to multiple phenotypes.
OBJECTIVE: To assess whether proliferation and plasticity in disease is a general characteristic of VSMCs or a feature of a subset of cells.
METHODS AND RESULTS: Using multicolor lineage labeling, we demonstrate that VSMCs in injury-induced neointimal lesions and in atherosclerotic plaques are oligoclonal, derived from few expanding cells. Lineage tracing also revealed that the progeny of individual VSMCs contributes to both alpha smooth muscle actin (aSma)-positive fibrous cap and Mac3-expressing macrophage-like plaque core cells. Costaining for phenotypic markers further identified a double-positive aSma+ Mac3+ cell population, which is specific to VSMC-derived plaque cells. In contrast, VSMC-derived cells generating the neointima after vascular injury generally retained the expression of VSMC markers and the upregulation of Mac3 was less pronounced. Monochromatic regions in atherosclerotic plaques and injury-induced neointima did not contain VSMC-derived cells expressing a different fluorescent reporter protein, suggesting that proliferation-independent VSMC migration does not make a major contribution to VSMC accumulation in vascular disease.
CONCLUSIONS: We demonstrate that extensive proliferation of a low proportion of highly plastic VSMCs results in the observed VSMC accumulation after injury and in atherosclerotic plaques. Therapeutic targeting of these hyperproliferating VSMCs might effectively reduce vascular disease without affecting vascular integrity.
Examining complete gene knockouts within a viable organism can inform on gene function. We sequenced the exomes of 3222 British adults of Pakistani heritage with high parental relatedness, discovering 1111 rare-variant homozygous genotypes with predicted loss of function (knockouts) in 781 genes. We observed 13.7% fewer homozygous knockout genotypes than we expected, implying an average load of 1.6 recessive-lethal-equivalent loss-of-function (LOF) variants per adult. When genetic data were linked to the individuals' lifelong health records, we observed no significant relationship between gene knockouts and clinical consultation or prescription rate. In this data set, we identified a healthy PRDM9-knockout mother and performed phased genome sequencing on her, her child, and control individuals. Our results show that meiotic recombination sites are localized away from PRDM9-dependent hotspots. Thus, natural LOF variants inform on essential genetic loci and demonstrate PRDM9 redundancy in humans.
Whole-genome and whole-exome sequence data from large numbers of individuals reveal that we all carry many variants predicted to inactivate genes (knockouts). This discovery raises questions about the phenotypic consequences of these knockouts and potentially allows us to study human gene function through the investigation of homozygous loss-of-function carriers. Here, we discuss strategies, recent results, and future prospects for large-scale human knockout studies. We examine their relevance to studying gene function, population genetics, and importantly, the implications for accurate clinical interpretations.
Mountain gorillas are an endangered great ape subspecies and a prominent focus for conservation, yet we know little about their genomic diversity and evolutionary past. We sequenced whole genomes from multiple wild individuals and compared the genomes of all four Gorilla subspecies. We found that the two eastern subspecies have experienced a prolonged population decline over the past 100,000 years, resulting in very low genetic diversity and an increased overall burden of deleterious variation. A further recent decline in the mountain gorilla population has led to extensive inbreeding, such that individuals are typically homozygous at 34% of their sequence, leading to the purging of severely deleterious recessive mutations from the population. We discuss the causes of their decline and the consequences for their future survival.
BACKGROUND: During intra-erythrocytic development, late asexually replicating Plasmodium falciparum parasites sequester from peripheral circulation. This facilitates chronic infection and is linked to severe disease and organ-specific pathology including cerebral and placental malaria. Immature gametocytes - sexual stage precursor cells - likewise disappear from circulation. Recent work has demonstrated that these sexual stage parasites are located in the hematopoietic system of the bone marrow before mature gametocytes are released into the bloodstream to facilitate mosquito transmission. However, as sequestration occurs only in vivo and not during in vitro culture, the mechanisms by which it is regulated and enacted (particularly by the gametocyte stage) remain poorly understood.
RESULTS: We generated the most comprehensive P. falciparum functional gene network to date by integrating global transcriptional data from a large set of asexual and sexual in vitro samples, patient-derived in vivo samples, and a new set of in vitro samples profiling sexual commitment. We defined more than 250 functional modules (clusters) of genes that are co-expressed primarily during the intra-erythrocytic parasite cycle, including 35 during sexual commitment and gametocyte development. Comparing the in vivo and in vitro datasets allowed us, for the first time, to map the time point of asexual parasite sequestration in patients to 22 hours post-invasion, confirming previous in vitro observations on the dynamics of host cell modification and cytoadherence. Moreover, we were able to define the properties of gametocyte sequestration, demonstrating the presence of two circulating gametocyte populations: gametocyte rings between 0 and approximately 30 hours post-invasion and mature gametocytes after around 7 days post-invasion.
CONCLUSIONS: This study provides a bioinformatics resource for the functional elucidation of parasite life cycle dynamics and specifically demonstrates the presence of the gametocyte ring stages in circulation, adding significantly to our understanding of the dynamics of gametocyte sequestration in vivo.
In the current era of malaria eradication, reducing transmission is critical. Assessment of transmissibility requires tools that can accurately identify the various developmental stages of the malaria parasite, particularly those required for transmission (sexual stages). Here, we present a method for estimating relative amounts of Plasmodium falciparum asexual and sexual stages from gene expression measurements. These are modeled using constrained linear regression to characterize stage-specific expression profiles within mixed-stage populations. The resulting profiles were analyzed functionally by gene set enrichment analysis (GSEA), confirming differentially active pathways such as increased mitochondrial activity and lipid metabolism during sexual development. We validated model predictions both from microarrays and from quantitative RT-PCR (qRT-PCR) measurements, based on the expression of a small set of key transcriptional markers. This sufficient marker set was identified by backward selection from the whole genome as available from expression arrays, targeting one sentinel marker per stage. The model as learned can be applied to any new microarray or qRT-PCR transcriptional measurement. We illustrate its use in vitro in inferring changes in stage distribution following stress and drug treatment and in vivo in identifying immature and mature sexual stage carriers within patient cohorts. We believe this approach will be a valuable resource for staging lab and field samples alike and will have wide applicability in epidemiological studies of malaria transmission.
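As a rough illustration of how constrained linear regression can recover stage proportions from a mixed sample, the sketch below fits non-negative weights by projected gradient descent and then normalizes them to fractions. It is written under stated assumptions and is not the authors' model: the stage names, marker profiles, and sample values are hypothetical, and the only constraint assumed here is non-negativity.

```python
def estimate_stage_fractions(profiles, sample, steps=2000):
    """Estimate relative stage fractions for one mixed-stage sample.

    profiles: dict mapping stage name -> list of marker expression values
        for a pure population of that stage (hypothetical reference data).
    sample: list of observed expression values for the same markers.
    Minimizes 0.5 * ||profiles @ w - sample||^2 subject to w >= 0.
    """
    stages = list(profiles)
    n_markers = len(sample)
    weights = [1.0 / len(stages)] * len(stages)
    # The sum of squared entries bounds the largest eigenvalue of the
    # normal matrix, so its inverse is a safe gradient step size.
    step = 1.0 / sum(v * v for s in stages for v in profiles[s])
    for _ in range(steps):
        residual = [
            sum(weights[k] * profiles[s][g] for k, s in enumerate(stages)) - sample[g]
            for g in range(n_markers)
        ]
        for k, s in enumerate(stages):
            gradient = sum(residual[g] * profiles[s][g] for g in range(n_markers))
            # Projection onto the constraint set: clip negative weights to zero.
            weights[k] = max(0.0, weights[k] - step * gradient)
    total = sum(weights)
    if total == 0:
        return {s: 0.0 for s in stages}
    return {s: w / total for s, w in zip(stages, weights)}


# Hypothetical profiles with one dominant sentinel marker per stage.
reference = {
    "asexual": [10.0, 1.0, 1.0],
    "immature_gametocyte": [1.0, 8.0, 2.0],
    "mature_gametocyte": [0.0, 1.0, 9.0],
}
mixed_sample = [7.2, 2.4, 2.0]  # built as 0.7, 0.2, 0.1 of the profiles above
print(estimate_stage_fractions(reference, mixed_sample))
```

Using one strongly stage-specific marker per stage keeps the reference matrix well conditioned, which is why a small sentinel set can be sufficient for this kind of deconvolution.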
Metagenomic shotgun sequencing data can identify microbes populating a microbial community and their proportions, but existing taxonomic profiling methods are inefficient for increasingly large data sets. We present an approach that uses clade-specific marker genes to unambiguously assign reads to microbial clades more accurately and >50× faster than current approaches. We validated our metagenomic phylogenetic analysis tool, MetaPhlAn, on terabases of short reads and provide the largest metagenomic profiling to date of the human gut. It can be accessed at http://huttenhower.sph.harvard.edu/metaphlan/.
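The marker-gene idea can be sketched in a few lines: if each marker belongs to exactly one clade, a read that hits a marker is assigned to that clade without ambiguity, and length-normalized hit counts give relative abundances. This is a simplified illustration and not MetaPhlAn's actual code; the marker table, clade names, and read hits are hypothetical, and read alignment is assumed to have been done already.

```python
from collections import Counter


def clade_abundances(marker_info, read_hits):
    """Estimate relative clade abundances from reads mapped to markers.

    marker_info: dict mapping marker id -> (clade name, marker length in bp).
    read_hits: iterable of marker ids, one per read that matched a marker.
        Hits to ids that are not in marker_info are ignored.
    """
    hits = Counter(read_hits)
    total_length = Counter()
    total_hits = Counter()
    for marker, (clade, length) in marker_info.items():
        total_length[clade] += length
        total_hits[clade] += hits.get(marker, 0)
    # Reads per marker base pair, so clades with more or longer markers
    # are not over-counted.
    coverage = {c: total_hits[c] / total_length[c] for c in total_length}
    total = sum(coverage.values())
    if total == 0:
        return {c: 0.0 for c in coverage}
    return {c: cov / total for c, cov in coverage.items()}


# Hypothetical marker table and read assignments.
markers = {
    "m1": ("Bacteroides", 1000),
    "m2": ("Bacteroides", 1000),
    "m3": ("Escherichia", 500),
}
reads = ["m1"] * 30 + ["m2"] * 30 + ["m3"] * 5 + ["not_a_marker"]
print(clade_abundances(markers, reads))
```

Because only reads that match a marker are considered, the bulk of the sequencing data never needs full alignment against whole genomes, which is the intuition behind the speed advantage described above.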