A much-debated question in human evolution is the relationship between modern humans and Neandertals. Modern humans appear in the African fossil record about 200,000 years ago. Neandertals appear in the European fossil record about 230,000 years ago  and disappear about 30,000 year ago. They lived in Europe and western Asia with a range that extended as far east as Siberia  and as far south as the middle East. The overlap of Neandertals and modern humans in space and time suggests the possibility of interbreeding. Evidence, both for  and against interbreeding , have been put forth based on the analysis of modern human DNA. Although mitochondrial DNA from multiple Neandertals has shown that Neandertals fall outside the range of modern human variation , , , , , , low-levels of gene flow cannot be excluded , , .
Analysis of the draft sequence of the Neandertal genome revealed that the Neandertal genome shares more alleles with non-African than with sub-Saharan African genomes . One hypothesis that could explain this observation is a history of gene flow from Neandertals into modern humans, presumably when they encountered each other in Europe and the Middle East  (Figure 1). An alternative hypothesis is that the findings are explained by ancient population structure in Africa , , , , whereby the population ancestral to Neandertal and modern human ancestors was subdivided. If this substructure persisted until modern humans carrying Upper Paleolithic technologies expanded out of Africa so that the modern human population that migrated was genetically closer to Neandertals, people outside Africa today would share more genetic variants with Neandertals that people in sub-Saharan Africa , ,  (Figure 1). Ancient substructure in Africa is a plausible alternative to the hypothesis of recent gene flow. Today, sub-Saharan Africans harbor deep lineages that are consistent with a highly-structured ancestral population , , , , , , , , , , . Evidence for ancient structure in Africa has also been offered based on the substantial diversity in neurocranial geometry amongst early modern humans . Thus, it is important to test formally whether substructure could explain the genetic evidence for Neandertals being more closely related to non-Africans than to Africans.
(A) In the case of recent gene flow from Neandertals (NEA) into the ancestors of non-Africans (CEU) but not into the ancestors of Africans (YRI), we expect long range LD at sites where Neandertal has the derived allele, and this expectation of admixture generated LD is verified by computer simulation as shown in the right of the panel along with a fitted exponential decay curve. (B) In the case of ancient structure, we expect short range LD, reflecting the time since Neandertals and non-Africans derived from a shared ancestral population, and this expectation is also verified by simulation.
A direct way to distinguish the hypothesis of recent gene flow from the hypothesis of ancient substructure is to infer the date for when the ancestors of Neandertals and a modern non-African population last exchanged genes. In the recent gene flow scenario, the date is not expected to be much older than 100,000 years ago, corresponding to the time of the earliest documented modern humans outside of Africa . In the ancient substructure scenario, the date of last common ancestry is expected to be at least 230,000 years ago, since Neandertals must have separated from modern humans by that time based on the Neandertal fossil record of Europe .
The strategy of using LD to estimate dates of gene flow events has been previously been explored by several groups , , , , . Our methodology is conceptually similar to the methodology developed by Moorjani et al., but is dealing with a more challenging technical problem since the methodology developed by Moorjani et al. is adapted for relatively recent admixtures. In recently admixed populations that have not experienced recent bottlenecks, admixture LD extends over size scales at which non-admixture LD makes a negligible contribution. Thus, one can infer the time of gene flow based on inter-marker spacings that are larger than the scale of non-admixture LD. For older admixtures however (such as may have occurred in the case of Neandertals), non-admixture LD occurs almost at the same size scale as admixture LD. To account for this, we study pairs of markers that are very close to each other, but ascertain them in a way that greatly minimizes the signals of non-admixture LD while enhancing the signals of admixture LD. Thus, unlike in the case of recent admixtures, non-admixture LD could bias an admixture date obtained using our methods; however, we show using simulations of a very wide set of demographic scenarios that our marker ascertainment procedure makes the bias so small that our inferences are qualitatively unaffected.
Our methodology is based on the idea that if two alleles, a genetic distance x (expected number of crossover recombination events per meiosis) apart, arose on the Neandertal lineage and introgressed into modern humans at time tGF, the probability that these alleles have not been broken up by recombination since gene flow is proportional to . We show that the LD across introgressed pairs of alleles is expected to decay exponentially with genetic distance. The rate of decay is informative of the time of gene flow and is robust to demographic events (Appendix A, Text S1). In practice, we need to ascertain SNPs that, assuming recent gene flow occurred, are likely to have arisen on the Neandertal lineage and introgressed into modern humans. We choose a particular ascertainment scheme and show, using simulations of a number of demographic models, that the exponential decay of LD across pairs of ascertained SNPs provides accurate estimates of the time of gene flow. A second potential source of bias in estimating ancient dates arises from uncertainties in the genetic map. We develop a correction for this bias and show that this correction yields accurate dates in the presence of uncertainties in the genetic map. Combining these various strategies, we are able to obtain accurate estimates of the date of last exchange of genes between Neandertals and modern humans (also see Discussion). This date shows that recent gene flow between Neandertals and modern humans occurred but does not exclude that ancient substructure in Africa also contributes to the LD observed.
To study how LD decays with the distance in the genome, we computed the average value, , of the measure of linkage disequilibrium D (the excess rate of occurrence of derived alleles at two SNPs compared with the expectation if they were independent ) between pairs of SNPs binned by genetic distance x (see Methods). Immediately after the time of last gene flow between Neandertal (or their relatives) and human ancestors, long range LD is generated, and it is then expected to decay at a constant rate per generation as recombination breaks down the segments shared with Neandertals. Thus, in the absence of new LD-generating events (discussed further below), the statistic across pairs of introgressed alleles is expected to have an exponential decay with genetic distance, and the genetic extent of the decay can thus be interpreted in terms of the time of last shared ancestry between Neandertals (or their relatives) and modern humans (Section S1 and Appendix A in Text S1).
To assess how useful this statistic is for measuring admixture LD, we performed coalescent simulations of 100 regions of a million base pairs each, for a range of demographic histories chosen to be plausible for Neandertals, West Africans and non-Africans (these histories were constrained by the observed population differentiation between west Africans and Europeans as measured by their FST and the quantitative extent to which Neandertals share more derived alleles with Europeans than with Africans). The simulation results, which we discuss at length in Section S2 of Text S1, and summarize in Table 1, show that we obtain accurate and relatively unbiased estimates of the number of generations since admixture (never more than 15% from the true value) for (1) constant-sized population scenarios, (2) demographic models that include population bottlenecks as well as more recent admixture after the gene flow, (3) hybrid models of ancient structure and recent gene flow, and (4) mutation rates that differ by a factor of 5 from what we use in our main simulations ( see Figure 2). Two other SNP ascertainment schemes yield qualitatively consistent findings but the ascertainment we used provides the most accurate estimates under the range of demographic models considered (Section S5 of Text S1 and Table 1). The simulations also show that in the absence of gene flow (including in the scenario of ancient subdivision), the dates obtained are always at least 5,000 generations for scenarios of demographic history that match the constraints of real human data. Thus, an empirical estimate of a date much less than 5,000 generations likely reflects real gene flow.
(A, B) Real data for European Americans and East Asians shows longer range LD when the Neandertal genome has the derived allele (left) than when it has the ancestral allele (right). This is as expected due to gene flow from Neandertal, but is not expected in the absence of gene flow. In other words, the fact that LD conditional on Neandertals having the derived allele is longer than LD when Neandertal does not strongly suggests that the pattern we are observing among ascertained SNPs is reflecting the complex historical relationship between non-African modern humans and Neandertals, the signal we care about here, and not demographic events that solely involve the ancestors of non-Africans. The scale of the LD decay (1/e drop of the fitted exponential curve) is shown in the top right of each panel based on the deCODE genetic distance. (In Figure S8 of Text S1, we show that this signal persists when stratified into narrow allele frequency bins.) (C) In West Africans the pattern is qualitatively different such that when Neandertal is derived at both SNPs, LD decays more quickly than when Neandertal is ancestral at both SNPs, as expected in the absence of gene flow (without gene flow, the derived allele is always expected to be older so LD is expected to have had more time to break down). 2b1af7f3a8