
Hybridization among diverging lineages is common in nature. Genomic data provide a special opportunity to characterize the history of hybridization and the genetic basis of speciation. We review existing methods and empirical studies to identify recent advances in the genomics of hybridization, as well as issues that need to be addressed. Notable progress has been made in the development of methods for detecting hybridization and inferring individual ancestries. However, few approaches reconstruct the magnitude and timing of gene flow, estimate the fitness of hybrids or incorporate knowledge of recombination rate. Empirical studies indicate that the genomic consequences of hybridization are complex, including a highly heterogeneous landscape of differentiation. Inferred characteristics of hybridization differ substantially among species groups. Loci showing unusual patterns – which may contribute to reproductive barriers – are usually scattered throughout the genome, with potential enrichment in sex chromosomes and regions of reduced recombination. We caution against the growing trend of interpreting genomic variation in summary statistics across genomes as evidence of differential gene flow. We argue that converting genomic patterns into useful inferences about hybridization will ultimately require models and methods that directly incorporate key ingredients of speciation, including the dynamic nature of gene flow, selection acting in hybrid populations and recombination rate variation.

Keywords: evolutionary genomics, gene flow, hybridization, introgression, reproductive barriers, speciation

Introduction

The study of hybridization plays a special role in understanding how new species are born. The capacity of lineages to hybridize is often used as a basis for species identification. More generally, the dynamics of gene flow between diverging populations describe key elements of the speciation process.

Hybridization leaves detectable footprints in genomes, raising the prospect that it can be characterized from genomic patterns of variation. In particular, genomic data provide avenues to answer two basic questions about speciation: (i) what is the history of gene flow between nascent species and (ii) what genetic barriers maintain their integrity? Although these questions were addressed before the genomic era, the ability to rapidly and affordably survey genomic variation among individuals and populations has ushered in new hope of answering them. Because of recombination and independent assortment during meiosis, diversity patterns at the very large number of loci sampled by genomic data sets can be viewed (ideally) as replicated outcomes of hybridization history, enabling increasingly accurate reconstruction of that history. At the same time, inspection of the genomic distribution of variation can pinpoint specific regions with unusual patterns that might contribute to reproductive isolation or adaptive introgression.

Genomic studies of hybridization generally follow a recipe with two steps. First, genomewide patterns of variation are catalogued. If each of the sampled populations is suspected or determined to belong to one diverging lineage (e.g. species) or another, differentiation between these populations is quantified. If the sampled populations likely reflect recent hybridization (e.g. in hybrid zones), other measures of admixture – such as geographic clines in allele frequency across transects – might be used. The second step is to compare observed genomic distributions to those expected under one or more models to draw inferences about hybridization and speciation.
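As a minimal sketch of the second step for hybrid-zone data, the code below fits a sigmoid (tanh) geographic cline to allele frequencies sampled along a transect. The function names, the least-squares grid search and the example grids are illustrative assumptions, not a procedure taken from any of the methods reviewed here; the cline width is defined as the inverse of the maximum slope.

```python
import math

def cline_frequency(x, center, width):
    """Expected allele frequency at transect position x under a sigmoid cline."""
    return 0.5 * (1.0 + math.tanh(2.0 * (x - center) / width))

def fit_cline(positions, frequencies, centers, widths):
    """Least-squares grid search for the cline centre and width (hypothetical helper)."""
    best = None
    for c in centers:
        for w in widths:
            sse = sum((cline_frequency(x, c, w) - f) ** 2
                      for x, f in zip(positions, frequencies))
            if best is None or sse < best[0]:
                best = (sse, c, w)
    return best[1], best[2]

# Example: frequencies generated from a cline centred at 40 km with width 20 km.
positions = list(range(0, 101, 10))
observed = [cline_frequency(x, 40, 20) for x in positions]
estimate = fit_cline(positions, observed, range(0, 101, 10), [5, 10, 20, 40])
```

Comparing fitted centres and widths across loci is one way to ask whether some loci introgress less than others, although published cline methods use likelihood models of sampling error rather than this simple least-squares fit.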

Genomewide portraits of differentiation and admixture are shaped by a suite of evolutionary processes. Parameters of particular interest from the speciation perspective include the form, frequency, magnitude and timing of hybridization, as well as the fitness of hybrids. But several other factors complicate interpretations. Natural selection acting before or after hybridization affects differentiation at targeted mutations and at nearby neutral variants, thereby inflating heterogeneity in differentiation among loci, even in the absence of gene flow. Recombination rate, a parameter that varies both across the genome and among species, determines the genomic scale over which patterns of differentiation and admixture vary. Shared variation among populations may reflect unsorted ancestral polymorphism (incomplete lineage sorting) rather than hybridization.

In this paper, we synthesize emerging trends in the genomics of hybridization between diverging lineages. After reviewing analytical methods that can be used to characterize hybridization, we summarize findings from empirical studies conducted on the genomic scale. Based on this survey, we prioritize future research to maximize insights into the process of speciation.

Methods for characterizing hybridization in the context of speciation

Translating genomic data sets into useful inferences about hybridization and speciation is a formidable task. Recognizing the increasing practicality of collecting high-quality genomic data, we focus here on analytical prospects and challenges. We refer the reader to recent reviews for guidance on important issues related to sequencing, genotyping and variant calling.

We surveyed the literature for analytical methods that have either been applied in genomic studies of hybridization between diverging lineages or have the potential to be applied in such studies. Both methods developed specifically to understand speciation and methods developed for other purposes were included.

A wide range of methods is available (Table 1). The collection of approaches samples major descriptors of genetic variation: levels of diversity and divergence, site frequency spectra, haplotype variation and phylogenetic relationships. Methods vary in their characterization of hybridization. Defining gene flow as the movement of alleles between populations, different methods provide access to contrasting levels of gene flow. For example, coalescent approaches usually assume low rates, whereas analyses of clines consider rates high enough to interfere with selection. Some strategies aim to simply detect gene flow, whereas others strive to reconstruct its magnitude and/or timing. Depending on the approach, estimated rates of gene flow are genomewide averages or locus-specific values. Hybridization is modelled as an instantaneous event or as a continuous phenomenon. For several parametric approaches, there exists a trade-off in computational cost between increasing the number of individuals and increasing the number of loci sampled. For example, methods that fit the isolation with migration model using Markov chain Monte Carlo handle many individuals but are slow with a large number of loci, whereas analytical likelihood methods applied to this model can use entire genomes but are restricted to a few individuals. Principal components analysis (PCA), a nonparametric approach, can be efficiently applied to whole-genome data sets from large numbers of individuals, but it provides no direct insights into the causes of resulting patterns.

Table 1

Genomic methods for detecting and characterizing hybridization

(Columns 2–7: characterization of hybridization.)

Method | Rate of gene flow | Timing of gene flow | Variable gene flow across genome | Variable gene flow across time | Individual ancestry proportions (genomewide) | Individual, locus-specific ancestries | Focal pattern of variation
Geographic clines | Yes | No | Yes | No | No | No | Geographic gradient of allele frequencies across populations (hybrid zone)
Genomic clines | No | No | Yes | No | No | No | Individual genotypes (hybrid zone)
Structure/Structurama/Frappe/Admixture/FastStructure | No | No | No | No | Yes | No | Individual genotypes
Principal components analysis (PCA) | No | No | No | No | No | No | Individual genotypes
HapMix/Recombination via ancestry switch probabilities (RASPberry) | No | No | Yes | No | Yes | Yes | Individual haplotypes
Ancestry tract lengths/shared haplotype lengths | Yes | Yes | No | Yes | No | No | Individual haplotypes
Markov chain Monte Carlo (fit to isolation with migration model) | Yes | No | No | No | No | No | Numbers of unique, shared and divergent polymorphisms
Coalescent hidden Markov model (CoalHMM) (fit to isolation with migration model) | Yes | Yes | No | No | No | No | Individual haplotypes
Blockwise likelihood (fit to isolation with migration model) | Yes | Yes | No | No | No | No | Numbers of unique, shared and divergent polymorphisms
Diffusion approximations for demographic inference (dadi) (fit to isolation with migration model) | Yes | No | No | No | No | No | Joint site frequency spectrum
Sequentially Markov conditional sampling distribution (fit to isolation with migration model) | Yes | Yes | No | Yes | No | No | Individual haplotypes
fineSTRUCTURE | No | No | No | No | Yes | No | Individual haplotypes
TreeMix | No | No | No | No | No | No | Joint site frequency spectrum
Divergence time heterogeneity | No | No | Yes | No | No | No | Numbers of unique, shared and divergent polymorphisms
Approximate Bayesian computation | Yes | Yes | Yes | Yes | No | No | Array of summary statistics
Genomic outliers (summary statistics, e.g. FST) | No | No | Yes | No | No | No | Summaries of population differentiation
ABBA-BABA/D-statistics | No | No | No | No | No | No | Pattern of shared-derived changes
Phylogenetic discordance | No | No | No | No | No | No | Pattern of shared-derived changes
Phylogenetic networks | No | No | No | No | No | No | Pattern of shared-derived changes


Methods differ in the frequency of usage by researchers studying the genomics of speciation (Table 2). The most popular strategy interprets heterogeneity in summary statistics of between-population differentiation (usually FST) as evidence of differential gene flow across the genome. Another commonly used class of approaches searches for signs of gene flow in genomic patterns of shared-derived variants between populations or species. In genomic studies of hybridization, the ‘ABBA-BABA test’ is a widespread application of such phylogenetic thinking. Interest in this test and its derivatives, referred to as ‘D-statistics’, was stimulated by its original application, which revealed that ancestors of modern humans hybridized with Neanderthals. Several approaches commonly used to infer demographic history within species (especially humans), including diffusion-based analysis of the site frequency spectrum and a host of methods based on individual haplotypes, have seen limited application in the genomic characterization of gene flow between diverging lineages.
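To make the logic of the ABBA-BABA test concrete, the sketch below counts the two discordant site patterns for one haplotype from each of four taxa and computes D = (ABBA − BABA)/(ABBA + BABA). It assumes the topology (((P1, P2), P3), outgroup), treats the outgroup allele as ancestral, and uses a hypothetical function name and toy sequences; real analyses work with allele frequencies and assess significance with a block jackknife, which is not shown.

```python
from collections import Counter

def patterson_d(p1, p2, p3, outgroup):
    """Return (D, n_abba, n_baba) from four aligned haplotype strings.

    Assumed topology: (((P1, P2), P3), O). The outgroup allele is treated
    as ancestral ("A"); the other allele at a biallelic site is derived ("B").
    """
    counts = Counter()
    for a1, a2, a3, ao in zip(p1, p2, p3, outgroup):
        # Keep only biallelic sites where P3 carries the derived allele.
        if a3 == ao or len({a1, a2, a3, ao}) != 2:
            continue
        if a1 == ao and a2 == a3:
            counts["ABBA"] += 1   # P2 shares the derived allele with P3
        elif a1 == a3 and a2 == ao:
            counts["BABA"] += 1   # P1 shares the derived allele with P3
    total = counts["ABBA"] + counts["BABA"]
    d = (counts["ABBA"] - counts["BABA"]) / total if total else float("nan")
    return d, counts["ABBA"], counts["BABA"]

# Toy alignment: two ABBA sites, one BABA site.
d, n_abba, n_baba = patterson_d("AACAGT", "ATCTGA", "ATGTCT", "AAGAGA")
```

Under incomplete lineage sorting alone the two patterns are expected to be equally frequent, so D near zero is the null expectation; a consistent excess of one pattern suggests gene flow between P3 and either P1 or P2.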

Table 2

Inferences about hybridization and speciation from genomic data

Species group | Sampling | Genomic data | Genomic patterns of differentiation/introgression | Analytical method for detecting hybridization | Estimate rate or timing of gene flow? | Inferences about hybridization and speciation from authors
Butterflies | 81 individuals across Heliconius melpomene, H. timareta, H. elevatus | Whole-genome sequencing; RAD sequencing | Higher gene flow (2–5% of genome) between sympatric than between allopatric H. melpomene vs. H. timareta; elevated gene flow in B/D colour region | D-statistics | No | Exchange of protective colour pattern genes between species; H. elevatus formed by hybridization
Butterflies | 31 individuals (total) from 5 H. melpomene populations, 1 Heliconius cydno population, 1 H. timareta population | Whole-genome sequencing | Pervasive phylogenetic discordance, structured by geography; fraction of genome introgressed higher in older comparisons; reduced differentiation between sympatric pairs; higher ratio of sympatric/allopatric FST on Z chromosome | Phylogenetic discordance; D-statistics; FST outliers | No | Large-scale gene flow between H. melpomene and H. cydno/timareta lineage throughout much of their divergence history
Butterflies | 51 individuals (total) from H. melpomene, H. cydno, H. timareta, Heliconius heurippa | RAD sequencing | Admixture up to 13% in some individuals; FST highly correlated across loci among sympatric pairs but not parapatric pairs; some differentiated regions associated with geography rather than taxon | Structure; FST outliers | No | Gene flow between sympatric H. melpomene and H. cydno/timareta; possible hybrid origin for H. heurippa
Butterflies | 116 Lycaeides idas, 76 Lycaeides melissa, 186 putative hybrids | RAD sequencing | Among hybrids, outlier FST loci enriched for L. idas ancestry; differentiation correlated with genomic cline parameters | FST outliers; Genomic clines | No | Adaptive divergence and genetic drift in allopatry contribute to reproductive isolation; complex genetic basis to reproductive isolation
Chipmunks | 4 Tamias ruficaudus, 4 Tamias amoenus | Exome sequencing | Gene genealogies show species monophyly | Phylogenetic discordance; D-statistics | No | Little to no role for hybridization between these sympatric species, despite earlier introgression of mtDNA
Cichlids | 10 individuals from each of Pundamilia nyererei, P. pundamilia, P. sp. ‘pink and fin’, Mbipia mbipi, M. lutea | RAD sequencing | When SNPs are partitioned by levels of differentiation, species relationships change (e.g. high-differentiation SNPs group by colour) | FST outliers; Structure | No | At least two intergeneric hybridization events; hybridization generates new species and contributes to adaptive radiation; selection plays an important role in speciation
Cichlids | 24 species in four sympatric crater lake radiations and 29 outgroups | RAD sequencing | Weak support for monophyly of sympatric radiations; species within a crater vary in shared ancestry with outgroups | Structure; PCA; TreeMix | No | Complex colonization histories within crater lakes; gene flow with neighbouring rivers after initial colonization of lakes; questions status as compelling model for sympatric speciation
Cut-throat trout | 94 individuals from 5 populations of Oncorhynchus clarkii lewisi, westslope cut-throat trout | RAD sequencing | 1–10% population admixture; several superinvasive alleles (at much higher frequencies than genomic background) | Frequency of introgressed alleles at diagnostic loci | No | Candidate invasiveness genes
Drosophila | 3 Drosophila pseudoobscura (1 each for 3 subspecies), 1 Drosophila persimilis, 1 D. miranda | Partial genome sequencing | Elevated sequence divergence near inversions; diversity correlated with divergence between D. pseudoobscura and D. persimilis but not with divergence between D. pseudoobscura and D. miranda | Dxy | No | Inversions crucial for persistence of species in the face of recent gene flow; secondary contact between D. persimilis and D. pseudoobscura homogenized collinear regions; evidence for unidirectional gene flow less than 200 kya, but not on the X chromosome
Drosophila | 1 Drosophila simulans, 1 D. mauritiana, 1 D. sechellia | Whole-genome sequencing | Evidence of gene flow between D. simulans and island endemics at up to 4.6% (autosomes) and 2.2% (X chromosome) of genome | Divergence time heterogeneity | No | ‘Islands of introgression, in genomes otherwise consistent with isolation’; introgression reduced on the X chromosome
Drosophila | 4 Drosophila mojavensis (2 from each of 2 populations); 1 D. arizonae | Whole-genome sequencing | Overall divergence similar in sympatric and allopatric comparisons, with elevated divergence in both in inverted regions | Blockwise likelihood | Yes | Postdivergence gene flow, ceasing at 270 kya; same timing in collinear and inverted regions, but reduced gene flow in inversions; inversions originated close to time of initial split
Flycatchers | 10 Ficedula albicollis (collared), 10 F. hypoleuca (pied) | Whole-genome sequencing | 50 clusters of elevated FST (overrepresented at chromosome ends); reduced nucleotide diversity (within species), and elevated LD, but reduced Dxy; higher overall divergence on Z | FST outliers | No | Species divergence driven by directional selection at many loci
Flycatchers | 10 Ficedula albicollis (collared), 10 F. hypoleuca (pied) | Whole-genome sequencing | Substantial fraction of shared polymorphisms (22%) with fewer fixed differences (3%) | Approximate Bayesian computation | Yes | Recent origin of flycatcher species, suggesting rapid evolution of postzygotic isolation; allopatric speciation followed by secondary, recent gene flow; 0.16–0.36 migrants per generation F. hypoleuca into F. albicollis, no gene flow in the other direction
Flycatchers | 79 F. albicollis, 79 F. hypoleuca, 20 F. speculigera, 20 F. semitorquata | Whole-genome sequencing | High heterogeneity in differentiation across the genome; concordant genomic regions of differentiation across species pairs | FST outliers; Dxy outliers; D-statistics | No | Linked selection within species rather than variation in gene flow among species explains islands of differentiation
Flycatchers | 2 Zimmerius viridiflavus, 3 Z. chrysops, 5 individuals from mosaic population, 1 Z. acer, 1 Z. gracilipes | RAD sequencing | 1.1% of the alleles in the mosaic population represent introgression from Z. chrysops; introgressed regions enriched in functions involving cell projection and plasma membranes | D-statistics | No | Introgression may account for variation in plumage coloration
Frogs | 45 Craugastor augusti, 1 C. tarahumaraensis, 1 C. uno | RAD sequencing | High lineage differentiation between evolutionarily significant units: numerous private alleles and high FST | D-statistics | No | Two episodes of introgression occurred in the same direction between lineages but at different times; higher variation in introgressed lineages than in independently evolving lineages
Hominins | Homo neanderthalensis, 5 H. sapiens | Whole-genome sequencing | East Asians share more SNPs than sub-Saharan Africans with H. neanderthalensis | D-statistics | No | Gene flow from H. neanderthalensis into non-African H. sapiens occurred before split of Eurasian groups
Hominins | Homo neanderthalensis, 1000 H. sapiens | Whole-genome sequencing | Rate of exponential decay of linkage disequilibrium among SNPs in H. sapiens genomic regions introgressed from H. neanderthalensis fits a model of recent gene flow | Haplotypes | Yes | Last gene flow from H. neanderthalensis into H. sapiens occurred 37–86 kya
Hominins | Homo neanderthalensis, ‘Denisovans’, 42 H. sapiens | Whole-genome sequencing | Higher levels of H. neanderthalensis ancestry in East Asians than in Europeans | D-statistics | No | At least two separate episodes of hybridization between H. neanderthalensis and H. sapiens, including one after split between East Asians and Europeans
Hominins | Homo neanderthalensis, ‘Denisovans’, H. sapiens | Whole-genome sequencing | 1.5–2.1% of non-African H. sapiens genomes derived from H. neanderthalensis | D-statistics | No | 3–5 cases of interbreeding among extinct hominin populations, including an unknown species; complex history of hybridization leaving signatures in small fractions of the genome
Hominins | Homo neanderthalensis, 1004 H. sapiens | Whole-genome sequencing | H. sapiens genomic regions of reduced H. neanderthalensis ancestry include X chromosome and genic regions | Conditional random field | No | H. neanderthalensis X chromosome contributed to hybrid sterility with H. sapiens
Hominins | Homo neanderthalensis, 3 H. sapiens | Whole-genome sequencing | 3.4–7.9% H. neanderthalensis admixture in Eurasia | Blockwise likelihood | Yes | H. neanderthalensis–H. sapiens admixture higher than other studies
Horses | 6 individuals from each of Equus quagga quagga, E. quagga boehmi, E. grevyi, E. zebra hartmannae, E. kiang, E. hemionus onager, E. africanus somaliensis, E. asinus asinus, E. caballus | Whole-genome sequencing | Four episodes of gene flow detected: one during the early divergence of the Equus in North America, and three between contemporary Equus lineages in the Old World; D-statistics associated with chromosomal changes | D-statistics; CoalHMM | No | Gene flow occurred despite extensive variation in chromosome number, although rearrangements inhibited gene flow; speciation in the face of gene flow
House mice | 1301 mice from two transects of hybrid zone between Mus musculus domesticus and M. m. musculus | SNP array | Asymmetric patterns of linkage disequilibrium across transects | Geographic clines | No | Movement of hybrid zone over time
House mice | 679 mice from two transects of hybrid zone between Mus musculus domesticus and M. m. musculus | SNP array | Signatures of epistatic selection at many loci; disproportionate number of selected loci on the X chromosome; some selected loci overlap in two transects, although most do not | Genomic clines | No | Central and distal portions of X chromosome contribute to reproductive isolation; complex genetic basis for reproductive isolation, arising in allopatry
House mice | Mice from three transects of hybrid zone between Mus musculus domesticus and M. m. musculus | SNP array | Introgression positively correlated with recombination rate and negatively correlated with differentiation | Genomic clines | No | Some reproductive barriers shared across hybrid zone; genome properties (including recombination rate) influence speciation
House mice | 8 wild-derived inbred lines of Mus musculus castaneus, 7 M. m. domesticus, 8 M. m. musculus | Transcriptome sequencing | X more highly differentiated than autosomes; higher differentiation in low-recombination regions in comparison between M. m. castaneus and M. m. musculus; some high-differentiation regions overlap with previously identified loci involved in hybrid male sterility | FST outliers; Dxy outliers; Numbers of unique, shared, and divergent polymorphisms | No | Differences in demographic history among subspecies may account for differences in divergence patterns among subspecies pairs; X chromosome contributes disproportionately to reproductive isolation
House mice | 2 Mus spretus, 20 M. musculus domesticus | SNP array | Phylogenetic signals of introgression from M. spretus on a few chromosomes of each M. m. domesticus individual | Phylogenetic networks | Yes | At least three hybridization events, including one 50 years ago connected to warfarin resistance
Lousewort | 1 Pedicularis przewalskii, 1 P. cyathophylla, 1 P. superba, 2 P. cyathophylloides, 2 P. thamnophila, 5 P. rex | RAD sequencing | 8.7–27% of P. thamnophila genome estimated to be derived from P. rex | D-statistics | No | Introgression among nearly all taxa in the P. rex–thamnophila clade
Manakins | 48 Manacus candei, 52 M. vitellinus, 104 putative hybrids | Genotype by sequencing | Gene flow highly variable across genome; rapid decay of genomic autocorrelations; FST correlated with gene flow estimated in hybrids, but genomic outliers not strongly colocalized | FST outliers; PCA; Genomic clines | No | Barriers to gene flow caused by many alleles with small effects; some evidence that selection drives reproductive isolation, but isolation is not simply a consequence of adaptive change
Mosquitoes | Anopheles gambiae, A. arabiensis, A. merus | RAD sequencing | Extensive shared polymorphism; higher divergence in X chromosome inversions | Structure; PCA; Dxy outliers; FST outliers | No | X chromosome inversions involved in reproductive isolation
Mosquitoes | Anopheles coluzzii, A. gambiae, A. arabiensis, A. quadriannulatus, A. melas, A. merus | Whole-genome sequencing | Strong phylogenetic discordance between X chromosome and autosomes | Phylogenetic concordance | No | Phylogenetic relationships on the X chromosome represent the species history, revealing pervasive gene flow on the autosomes
Oaks | 4 Quercus virginiana, 4 Q. geminata, 4 Q. minima, 3 Q. sagraeana, 5 Q. oleoides, 3 Q. brandegei, 4 Q. fusiformis, 7 outgroup samples | RAD sequencing | Substantial heterogeneity in the presence of admixture between species; six of seven species show evidence of admixture with one or more congeners; significant admixture limited to samples in close proximity | Structure; TreeMix; D-statistics; dadi | Yes | Oak species boundaries and ranges have been evolutionarily stable
Pigs | S. scrofa, S. verrucosus, S. cebifrons | Whole-genome sequencing | 23% of genome admixed | Blockwise likelihood | Yes | Asymmetric hybridization occurring long before arrival of humans
Rabbits | 6 Oryctolagus cuniculus cuniculus, 6 Oryctolagus cuniculus algirus, 1 Lepus timidus | Targeted capture of intron sequence; transcriptome sequencing | Substantial fraction of shared polymorphisms (31.2%), with fewer fixed differences (0.3%); higher differentiation on the X chromosome and near centromeres; islands of differentiation highly variable in size, but usually contain small numbers of genes | FST outliers | No | Many fixed differences probably result from positive selection; complex genetic basis to reproductive isolation
Poplars | 7 genotypes each from Populus alba, P. tremula | RAD sequencing | Divergence estimates strongly autocorrelated along chromosomes; many autocorrelations involve low divergence blocks, including incipient sex chromosome | FST outliers; Genomic autocorrelation | No | Populus species have porous genomes and exhibit high levels of introgression across incipient sex chromosome
Poplars | 498 Populus trichocarpa, 10 P. balsamifera | SNP array | Admixture limited to drainages where the species' ranges overlap | Admixture; PCA | No | Hybridization contributes to clinal variation in allele frequencies observed in P. trichocarpa
Sunflowers | 40 H. annuus, 25 H. petiolaris, 28 H. argophyllus, 14 H. debilis | Transcriptome sequencing | Higher differentiation and more fixed differences in nonhybridizing species pairs than in hybridizing species pairs, despite closer phylogenetic relationship of the former; islands of genetic differentiation small and found in areas of low recombination | FST outliers; Genomic autocorrelation | No | Interspecific gene flow has less impact than local genome architecture on the clustering of divergent loci; complex genetic basis to reproductive isolation
Sunflowers | 40 H. annuus, 28 H. argophyllus | Transcriptome sequencing | Elevated FST and D within rearrangements and adjacent regions (within 5 cM of chromosomal break points) | FST outliers | No | Chromosomal translocations and inversions reduce interspecific gene flow following secondary contact
Sunflowers | 100 genotypes representing 20 subpopulations of Helianthus petiolaris (strong intrinsic and extrinsic isolation between dune and nondune ecotypes); 20 individuals from other species or subspecies | RAD sequencing | Split between dune and nondune ecotypes estimated between 440–10 274 ya; ongoing asymmetric introgression; Nem (Nondune to Dune) = 2.06; Nem (Dune to Nondune) = 5.53 | FST outliers; Structure; MCMC fit to isolation with migration | Yes | Possible genomic signature of reinforcement: two FST outlier regions show greater divergence in comparisons between adjacent dune edge and nondune populations than between more distant core dune and nondune populations
Swordtail fish | 60 Xiphophorus birchmanni, 60 X. malinche; 313 hybrids from two hybrid zones | Genotype by sequencing | Many unlinked pairs of loci show linkage disequilibrium in hybrid populations | Linkage disequilibrium among unlinked loci | Yes | A large number of hybrid incompatibilities contribute to reproductive isolation
Warblers (ring species) | 95 individuals from 22 populations (total) from Phylloscopus trochiloides viridanus, P. t. nitidus, P. t. ludlowi, P. t. trochiloides, P. t. obscuratus, P. t. plumbeitarsus | Genotype by sequencing | Two northern forms are distinct genetic clusters connected by ring of genetically intergraded forms | PCA; Structure | No | Allopatric divergence played a major role in the formation of ring species; recent hybridization at each area of secondary contact


Overall, available methods provide useful tools for reconstructing hybridization. The diversity of focal genomic patterns (Table 1) suggests that collectively, methods can access gene flow occurring over a range of timescales. For example, phylogenetic patterns among species can reveal older hybridization events, whereas the distribution of shared haplotype lengths is shaped by recent gene flow. Although no methods incorporate all aspects of genomic variation, many approaches adopt powerful likelihood or Bayesian frameworks that avoid drastic summaries of the data. For example, several strategies are available that analyse the complete (unsummarized) site frequency spectrum across multiple populations.
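The link between haplotype lengths and the timing of recent gene flow can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a single admixture pulse t generations ago, a minor-parent contribution m, and minor-parent tracts that are exponentially distributed with rate approximately (1 − m)t per Morgan; it ignores drift, selection and chromosome ends. The function name and input values are hypothetical.

```python
def generations_since_admixture(tract_lengths_morgans, minor_fraction):
    """Moment estimate of the time since a single admixture pulse.

    Assumes tracts inherited from the minor parent have mean length
    1 / ((1 - m) * t) Morgans, so t is roughly 1 / ((1 - m) * mean length).
    """
    if not tract_lengths_morgans:
        raise ValueError("need at least one tract")
    mean_len = sum(tract_lengths_morgans) / len(tract_lengths_morgans)
    return 1.0 / ((1.0 - minor_fraction) * mean_len)

# Tracts averaging 2 cM with a 50% minor contribution imply ~100 generations.
t_hat = generations_since_admixture([0.01, 0.02, 0.03], minor_fraction=0.5)
```

Long tracts therefore point to recent gene flow and short tracts to older gene flow, which is why this class of method has little power for ancient hybridization.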

The ancestries of individuals provide special insights into hybridization history. Importantly, diverse strategies now exist for probabilistically assigning individual ancestry, assuming that appropriate reference populations can be surveyed. Since the Bayesian clustering approach Structure revolutionized the inference of ancestry proportions, methods have been developed that use the same likelihood (genotype data conditional on ancestry) but different algorithms to achieve substantial increases in computational speed. With other methods, including HapMix and RASPberry, the locus-specific histories of individuals can be reconstructed, enabling changes in ancestry to be detected over short chromosomal distances.
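The likelihood of genotype data conditional on ancestry can be sketched for the simplest case: one diploid individual, two reference populations with known allele frequencies, and unlinked loci. The code below is an illustrative grid search for the ancestry proportion (hybrid index), not the algorithm used by Structure or its successors; the function name and grid resolution are assumptions.

```python
import math

def hybrid_index(genotypes, freq_pop1, freq_pop2, grid=1000):
    """Grid-search maximum-likelihood estimate of the proportion of
    ancestry from population 2 for one diploid individual.

    genotypes: copies (0, 1 or 2) of a reference allele at each locus.
    freq_pop1, freq_pop2: reference-allele frequencies in the two
    reference populations (treated as known without error).
    """
    best_h, best_ll = 0.0, -math.inf
    for step in range(grid + 1):
        h = step / grid
        ll = 0.0
        for g, f1, f2 in zip(genotypes, freq_pop1, freq_pop2):
            q = (1 - h) * f1 + h * f2          # chance one copy is the reference allele
            q = min(max(q, 1e-9), 1 - 1e-9)    # avoid log(0) at fixed loci
            ll += g * math.log(q) + (2 - g) * math.log(1 - q)
        if ll > best_ll:
            best_h, best_ll = h, ll
    return best_h

# Ten diagnostic loci, all heterozygous: consistent with an F1-like individual.
h_f1 = hybrid_index([1] * 10, [0.0] * 10, [1.0] * 10)
```

The estimate depends directly on the reference frequencies, which is one reason the choice of reference populations matters so much in practice.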

Computational and statistical advances have significantly expanded the hybridization scenarios that may be considered. With approximate Bayesian computation, genomic data can be fit to any model that can be rapidly simulated, allowing (in principle) arbitrarily complex histories of gene flow to be compared. Existing methods also enable testing of specific hypotheses about reproductive barriers. For example, the genomic clines framework statistically evaluates evidence for selection against gene flow at specific genomic locations through comparison to the genomewide hybrid index.
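The core loop of approximate Bayesian computation is short enough to show in full. The sketch below uses rejection sampling with a deliberately trivial simulator (each locus is introgressed independently with probability m) and a single summary statistic; the simulator, prior, tolerance and function names are all toy assumptions. In a real study the simulator would be a coalescent or forward model of gene flow and several summary statistics would be compared.

```python
import random

def simulate_introgressed_fraction(m, n_loci, rng):
    """Toy simulator: each locus carries introgressed ancestry with probability m."""
    return sum(rng.random() < m for _ in range(n_loci)) / n_loci

def abc_rejection(observed, n_loci, n_draws=20000, tolerance=0.01, seed=1):
    """Return accepted prior draws of m (an approximate posterior sample)."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        m = rng.random()                       # prior: m ~ Uniform(0, 1)
        sim = simulate_introgressed_fraction(m, n_loci, rng)
        if abs(sim - observed) <= tolerance:   # keep draws whose summary matches
            accepted.append(m)
    return accepted

# Observed: 5% of 200 loci show introgressed ancestry.
posterior = abc_rejection(observed=0.05, n_loci=200, n_draws=5000)
posterior_mean = sum(posterior) / len(posterior)
```

Because only the simulator changes between models, the same loop can compare histories of gene flow that have no tractable likelihood, at the cost of relying on how informative the chosen summaries are.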

Despite their potential, available methods for characterizing hybridization suffer from several challenges when applied to genomic studies of speciation. First, options for measuring gene flow often exclude scenarios of interest. Among the subset of methods that estimate gene flow, most assume it happens continuously at an invariant rate. Few approaches reliably reconstruct the timing of hybridization. For example, popular methods based on the isolation with migration framework may produce wide confidence intervals on the timing of gene flow and falsely infer the presence of gene flow under some conditions. These restrictions do not match the reality of the speciation process, in which the opportunity for hybridization varies over time due to accumulating reproductive isolation, fluctuating geographic range sizes and other factors. Indeed, temporal changes in gene flow are key ingredients for distinguishing major models of speciation. For example, sympatric and parapatric speciation feature gene flow during the earliest stages of divergence, whereas speciation by reinforcement is thought to be triggered by secondary contact and gene flow after a period of divergence in allopatry.

A second challenge with existing methods is the limited treatment of natural selection. Approaches that estimate the rate of gene flow usually assume that population history and neutral mutation account for genomic patterns. Therefore, selection has the potential to bias gene flow estimates. This problem could be severe for lineages with a history of reproductive isolation, especially if many loci contribute to isolation. A more fundamental methodological issue in the context of speciation is the assumption that one rate of gene flow characterizes the entire genome. Because selection acting on hybrids targets specific loci, rates of introgression could vary significantly across the genome; indeed, documenting this variation is a major goal of genomic studies of speciation. Alternatively, genome scan approaches search for locus-specific distortions in summary statistics, but usually leave rates of gene flow and associated selection coefficients unestimated. These shortcomings are not surprising as most methods were designed to analyse population structure within species rather than gene flow among diverging lineages. Although strategies specifically developed to analyse the genomic consequences of mixing in hybrid zones (e.g. geographic and genomic clines) enable the detection of selection and differential introgression in hybrids, they downplay the effects of demographic history and do not estimate locus-specific rates of gene flow.
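For readers unfamiliar with genome scans, the following sketch shows what such a scan typically computes: a windowed FST along a chromosome. It uses Hudson's estimator combined as a ratio of averages, which is one common choice rather than the estimator used by any particular study reviewed here; function names and the example sites are hypothetical. Note that the output is a summary of differentiation only. It contains no estimate of gene flow or selection, which is exactly the limitation described above.

```python
def hudson_fst(p1, n1, p2, n2):
    """Per-site numerator and denominator of Hudson's FST estimator.

    p1, p2: sample allele frequencies; n1, n2: number of sampled chromosomes.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den

def windowed_fst(sites, window_size):
    """Ratio-of-averages FST in non-overlapping physical windows.

    sites: iterable of (position, p1, n1, p2, n2).
    Returns {window_start: fst} for windows with a positive denominator.
    """
    sums = {}
    for pos, p1, n1, p2, n2 in sites:
        start = (pos // window_size) * window_size
        num, den = hudson_fst(p1, n1, p2, n2)
        acc = sums.setdefault(start, [0.0, 0.0])
        acc[0] += num
        acc[1] += den
    return {s: n / d for s, (n, d) in sorted(sums.items()) if d > 0}

# One fixed difference in the first window, one shared polymorphism in the second.
fst = windowed_fst([(100, 1.0, 20, 0.0, 20), (15000, 0.5, 20, 0.5, 20)], 10000)
```

Small negative values, as in the second window here, are expected from the sampling correction when the two populations have similar allele frequencies.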

The rate of meiotic recombination is another key parameter to consider in genomic studies of hybridization. Differential gene flow across a chromosome requires recombination. Recombination rate determines the chromosomal scale over which selection targeting reproductive isolation mutations reduces gene flow at neighbouring loci, the focal signature of genomic scan methods. Furthermore, several models of speciation propose that loci involved in reproductive isolation will preferentially accumulate in rearranged or collinear chromosomal regions with little recombination, allowing divergence to continue in the face of hybridization. Unfortunately, genomic methods for studying speciation usually ignore the reality that recombination rates vary across chromosomes on broad and fine scales. In addition, some methods assume that recombination is absent over short distances (i.e. within loci). As hybridization studies transition to using whole-genome sequences, another practical challenge emerges. How should the genome be partitioned into loci before gene flow analyses are conducted? For example, the typical genomic scan strategy of comparing windows of the same physical size (e.g. 10 kb) implicitly assumes that the rate of recombination is invariant. Most methods do not yet account for the expectation that switches in ancestry associated with hybridization will occur over finer scales in genomic regions with higher recombination rates.
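One simple alternative to fixed physical windows is to define windows of equal genetic length. The sketch below assumes a recombination map is available as paired lists of physical positions (bp) and cumulative genetic positions (cM), interpolates linearly between map points, and groups SNPs by genetic window. The function names and the toy map are hypothetical, and this is only one possible partitioning scheme.

```python
from bisect import bisect_right

def genetic_position(pos_bp, map_bp, map_cm):
    """Linearly interpolate a genetic position (cM) from a recombination map."""
    if pos_bp <= map_bp[0]:
        return map_cm[0]
    if pos_bp >= map_bp[-1]:
        return map_cm[-1]
    i = bisect_right(map_bp, pos_bp) - 1
    frac = (pos_bp - map_bp[i]) / (map_bp[i + 1] - map_bp[i])
    return map_cm[i] + frac * (map_cm[i + 1] - map_cm[i])

def windows_by_genetic_length(snp_positions, map_bp, map_cm, window_cm):
    """Group SNP positions into consecutive windows of equal genetic length."""
    windows = {}
    for pos in sorted(snp_positions):
        cm = genetic_position(pos, map_bp, map_cm)
        windows.setdefault(int(cm // window_cm), []).append(pos)
    return [windows[k] for k in sorted(windows)]

# Toy map: a cold region (0.1 cM over 100 kb) followed by a hot one (2 cM over 100 kb).
map_bp = [0, 100000, 200000]
map_cm = [0.0, 0.1, 2.1]
snps = [10000, 50000, 90000, 110000, 150000, 190000]
groups = windows_by_genetic_length(snps, map_bp, map_cm, window_cm=0.5)
```

In this example the low-recombination region is pooled into a single window while the high-recombination region is split more finely, matching the expectation that ancestry switches occur over shorter physical distances where recombination is frequent.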

Several approaches assume free recombination among polymorphisms, even when they are closely linked. A more general challenge arises from the treatment of unlinked loci as independent, a feature of all available methods. Loci on different chromosomes can behave nonindependently in scenarios of interest. For example, recent hybridization events involving only a few individuals can generate introgression at many loci ().

The role of geography in speciation has long been, and continues to be, a source of controversy (; ; ). Knowledge of the geographic locations of past hybridization events is vital for understanding the connection between gene flow and speciation. With few exceptions (), genomic methods for analysing hybridization neither consider geographic information nor infer the geographic context of gene flow.

Outcomes of hybridization may vary depending on whether the hybridizing populations diverged in the presence of gene flow or came into contact secondarily after a period of allopatry. But reliable methods for distinguishing between these scenarios are not readily available. Theoretical expectations for patterns of genome divergence following secondary contact or divergence with gene flow appear to be similar under equilibrium conditions (e.g. ; ; ; ). However, given the extent of temporal variation in geographic ranges due to major climate oscillations, nonequilibrium conditions are likely to be common and could impact inferences about hybridization, as well as its consequences. For example, hybrid zones in the Arctic and some temperate regions are thought to have formed repeatedly following interglacial range expansions of hybridizing lineages, in some instances leading to introgression or hybrid speciation ().

In summary, researchers may choose from a variety of analytical approaches to draw inferences about hybridization from genomic data. Nevertheless, existing strategies are missing some components that are important in the context of speciation, suggesting that investment in further method development is warranted.

Empirical studies of hybridization and speciation on the genomic scale

To identify emerging empirical patterns in the genomics of hybridization and speciation, we surveyed the literature. We focused on studies that: (i) articulated a specific interest in characterizing hybridization between diverging lineages (rather than simply scanning genomes for evidence of natural selection); (ii) measured variation across the genome at thousands to millions of loci; and (iii) examined species, subspecies or populations with independent evidence for reproductive isolation (rather than focusing on ecotypic differentiation). Both studies that sampled geographically separate populations and those that examined currently hybridizing populations were considered.

Inspection of the studies (listed in Table 2) reveals several themes. Genomic data confirm what is now conventional wisdom: hybridization between diverging lineages is common (; ; ). Many instances of hybridization were previously unreported and unanticipated (e.g. genetic exchange between humans and Neanderthals), an indication of the potentially high power of genomic data to detect gene flow, even when it involves extinct populations. The history of hybridization among diverging lineages is sometimes complex, including multiple periods of gene flow as well as asymmetries in its direction. In some cases, genomic data raise the possibility that hybridization facilitated speciation (e.g. butterflies, cichlids). Nevertheless, the inference that most of the hybridizing lineages in Table 2 – including a ring species () – evolved in isolation for some period of time suggests that geographic separation contributes to reproductive isolation in many organismal lineages (; ).

A consistent observation across surveyed taxa is rampant heterogeneity in patterns of differentiation and admixture within the genome, sometimes over short chromosomal distances. As discussed above, this patchwork reflects a combination of evolutionary factors, including incomplete lineage sorting, effects of selection before and after hybridization, and recombination rate variation, as well as gene flow and selection targeting hybrids. That heterogeneity occurs over a fine genomic scale suggests that some aspects of hybridization can only be captured by examining the entire genome. In those studies that sample hybrid zones, this heterogeneity raises the prospect that boundaries between nascent species are semipermeable, with a subset of loci maintaining reproductive barriers (, ; ). In those studies that sample geographically separate populations, this conclusion is more difficult to reach: the null model of a low rate of gene flow that is constant across the genome could still produce strong heterogeneity if only a few regions introgress. From a phylogenetic perspective, the genomes of recently diverged species are complex mosaics of alternating histories, highlighting the difficulty of reconstructing species relationships, even with genomic data. Indeed, extensive hybridization challenges basic assumptions of phylogenetic methods, suggesting that some species histories are better represented as phylogenetic networks ().

Genomic regions that display high differentiation (in allopatric populations) or narrow clines (in hybrid zones) could contain genetic changes that confer reproductive isolation. Under this assumption, the studies in Table 2 collectively raise a few genomic themes that could characterize reproductive barriers between species. First, the number of loci that appear to maintain differentiation in the face of gene flow is usually high, suggesting a complex genetic basis to reproductive isolation. An alternative explanation for this pattern is that many loci are falsely labelled as ‘genomic outliers’ because significance thresholds are based on null models that ignore or mischaracterize demographic history.

A second observation is that although outlying loci are scattered throughout the genome, there is evidence for a bias towards regions of low recombination. In several species, this pattern takes the form of higher differentiation near centromeres, which are known or suspected to recombine less. In others, the relationship between outlier location and recombination rate is more obvious. Cline width and recombination rate are positively correlated across the genome in a hybrid zone between two subspecies of house mice (). Selection to remove long genomic blocks from one species in hybrids was postulated to explain a negative correlation between absolute divergence and recombination rate in monkey flowers (). Rearranged chromosomal regions exhibit higher differentiation than collinear regions in several species pairs. Perhaps these results indicate a role for suppressed recombination in the evolution of reproductive isolation. Alternatively, they could simply reflect increased power to detect isolation loci using linked markers in low-recombination regions. In some cases, the pattern appears to be caused by selection at linked sites within lineages rather than barriers to gene flow (; ). Regardless, these results reveal the importance of considering variation in local recombination rate when interpreting genomic patterns of differentiation and admixture.

A third trend among surveyed species is the over-representation of highly differentiated loci on the sex chromosomes relative to the autosomes, seen in both X-Y and Z-W systems. This pattern agrees with results from controlled crosses in the laboratory, which often associate isolation phenotypes – especially hybrid sterility and hybrid inviability – with the sex chromosomes (). Whether this genomic tendency indicates that the sex chromosomes harbour a higher density of loci involved in reproductive isolation (), loci with stronger phenotypic effects on isolation, or both, remains to be seen. Interpretations of this pattern must also recognize the complications inherent in setting neutral expectations for the sex chromosomes, which can experience different rates of genetic drift and migration than the autosomes.

The collection of empirical studies (Table 2) reveals considerable interest in using genomic data to understand the role of natural selection in speciation. The degree to which selection can increase divergence in the face of gene flow remains controversial and difficult to determine (; ; ; , ; ; ; ; ). In contrast, the idea that reproductive isolation is a by-product of local adaptation in allopatry is widely accepted, but genetic tests of this hypothesis remain rare. Those genomic studies that jointly consider currently allopatric populations and currently hybridizing populations provide a special opportunity to evaluate this influential prediction. If the genetic changes that block gene flow are initially driven to fixation by positive selection, genomic regions with restricted introgression in hybrid zones could display higher differentiation between nascent species. Genomic cline width and differentiation are negatively correlated in lycaenid butterflies and in manakins, suggesting that adaptive evolution contributes to reproductive isolation (; ). Nevertheless, the weakness of these correlations leaves open the possibility that genetic drift plays an important role in the establishment of reproductive barriers.

Despite the trends described above, the genomic landscape of divergence appears to differ substantially among species pairs. Estimated hybridization rates vary widely. Whereas some species show evidence for extensive gene flow with a few genomic regions maintaining divergence (e.g. rabbits, mosquitoes), others show widespread differentiation with limited pockets of introgression (e.g. Drosophila simulans vs. island endemics). The genomic scale of heterogeneity also varies. Although these comparisons are complicated by contrasting study designs, some observed disparities probably reflect biological differences among species pairs in the timing and mode of speciation.

This empirical survey also reveals challenges that face genomic studies of hybridization. From the perspective of speciation, a primary motivation for analysing genomic data is its potential for characterizing the magnitude and timing of gene flow among diverging lineages. Only a handful of studies (Table 2) reported estimates of these important quantities. The most widely used measure of locus-specific differentiation was FST, a summary statistic that compares variation between populations to variation within populations (). Although FST enables comparisons of loci that differ in within-population diversity, it suffers from interpretive challenges when used to measure gene flow across the genome. Reduced gene flow, which maintains between-population divergence, is confounded with selection at linked sites, which decreases within-population diversity (; ; ; ). Furthermore, most studies identified unusually differentiated loci as those with values that simply fell in the tail of the genomic distribution, a strategy complicated by the fact that completely neutral genomic distributions also have tails. Other studies used neutral simulations to construct distributions of summary statistics expected in the absence of selection. This approach ignores the likely possibility that selection against gene flow (and other forms of selection) occurs at many genomic locations in hybrids.
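The confounding of reduced gene flow with reduced within-population diversity can be seen in a small numerical sketch. It assumes one common formulation of FST, namely the proportional excess of between-population divergence (`d_xy`) over mean within-population diversity (`pi_within`); the surveyed studies may have used other estimators, and the diversity values below are invented for illustration only.

```python
def fst(pi_within, d_xy):
    """Relative differentiation computed from per-site diversity values.

    pi_within: mean nucleotide diversity within the two populations
    d_xy:      mean divergence between the two populations
    Assumed formulation (not taken from the source): (d_xy - pi_within) / d_xy.
    """
    if d_xy <= 0:
        raise ValueError("d_xy must be positive")
    return (d_xy - pi_within) / d_xy

# Two hypothetical genomic regions with identical between-population divergence.
d_xy = 0.010
typical_region = fst(pi_within=0.008, d_xy=d_xy)
# Same divergence, but within-population diversity reduced
# (for example by selection at linked sites within each lineage).
low_diversity_region = fst(pi_within=0.002, d_xy=d_xy)

print(round(typical_region, 3), round(low_diversity_region, 3))
```

Under these assumed values, FST rises from about 0.2 to about 0.8 although between-population divergence is unchanged. A high FST value on its own therefore cannot separate a barrier to gene flow from a local loss of diversity.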

Furthermore, Table 2 reveals biases in sampling strategy. The majority of studies examined populations that are currently allopatric, with no a priori evidence of hybridization. Compared to hybrid zones, allopatric populations are usually easier to find and investigators may be more likely to sample them for purposes other than studying gene flow. Nevertheless, connecting genomic patterns with hybridization is much simpler in populations that are currently hybridizing in visible ways (e.g. hybrid zones). Other biases are taxonomic. Animal species – especially vertebrates – were the most popular subjects. It is too early to identify differences between plants and animals from genomic studies of hybridization. But the presence of heteromorphic sex chromosomes seems to alter the genetic architecture of reproductive barriers and to increase the speed with which they accumulate, and most plants lack heteromorphic sex chromosomes (; ; ; ; ). Likewise, it has long been known that the frequency of hybridization tends to be higher in species with external fertilization, such as plants and fishes, than in taxa with internal fertilization (). These and other related observations should motivate genomic studies of hybridization in organismal groups that differ in the genetics of sex determination and/or other key features of reproductive biology.

Recommendations and future directions

Our overview of methods and empirical studies suggests fruitful directions for research on the genomics of hybridization and speciation. First, model-based estimation of the magnitude and timing of hybridization should be prioritized. Reconstructing the history of gene flow between diverging lineages is a necessary step towards understanding the causes and consequences of speciation, and it may provide key insights. For example, the alternative scenarios of speciation with gene flow and allopatric speciation followed by hybridization during secondary contact could be distinguished by quantifying the dynamics of gene flow. Accurate estimates of the magnitude and timing of hybridization are also needed to interpret the results from scans that seek to identify genomic regions with restricted gene flow, which are growing in popularity. Because the expected genomic distribution of differentiation depends on the level and timing of hybridization, the determination that certain loci experienced reduced gene flow ultimately requires a demographic model including these parameters. At present, some authors seem to assume that evidence of hybridization elsewhere in the genome is sufficient to demonstrate that a differentiated genomic region is resistant to gene flow. But a wide spectrum of differentiation among loci is expected, even in the absence of reproductive barriers.
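The expectation of a wide spectrum of differentiation without reproductive barriers can be explored with a minimal simulation. The sketch below assumes a deliberately simple model that is not drawn from any surveyed study: two populations of constant size, unlinked biallelic loci, symmetric migration at a single genome-wide rate `m`, and no selection. All parameter values and the function name `simulate_fst` are hypothetical.

```python
import random

def simulate_fst(n_loci=200, n_diploid=50, generations=50, m=0.01, seed=1):
    """Per-locus FST between two populations under drift and uniform gene flow.

    No locus is under selection and every locus shares the same migration
    rate m, so any spread in FST among loci is due to chance alone.
    """
    rng = random.Random(seed)
    n_chrom = 2 * n_diploid
    fsts = []
    for _ in range(n_loci):
        p1 = p2 = 0.5  # both populations start at the same allele frequency
        for _ in range(generations):
            # Symmetric migration, followed by binomial sampling (genetic drift).
            q1 = (1 - m) * p1 + m * p2
            q2 = (1 - m) * p2 + m * p1
            p1 = sum(rng.random() < q1 for _ in range(n_chrom)) / n_chrom
            p2 = sum(rng.random() < q2 for _ in range(n_chrom)) / n_chrom
        d_xy = p1 * (1 - p2) + p2 * (1 - p1)  # between-population divergence
        if d_xy > 0:  # skip loci fixed for the same allele in both populations
            # Equivalent to (d_xy - mean within-population heterozygosity) / d_xy.
            fsts.append((p1 - p2) ** 2 / d_xy)
    return fsts

fsts = sorted(simulate_fst())
print(f"loci: {len(fsts)}  min: {fsts[0]:.3f}  "
      f"median: {fsts[len(fsts) // 2]:.3f}  max: {fsts[-1]:.3f}")
```

Because every locus in this sketch experiences the same rate of gene flow, the loci in the upper tail of the resulting distribution are not resistant to introgression. Judging whether observed outliers exceed this kind of chance variation requires a demographic model with realistic values for the level and timing of hybridization.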

Second, natural selection should be directly incorporated into methods for characterizing hybridization between diverging lineages. Even partial reproductive isolation can generate strong selection against hybrids that shapes variation throughout the genome (; ). Accounting for the effects of selection is therefore a necessary step towards accurately reconstructing the history of hybridization. Furthermore, model-based characterization of selection could provide insights into genetic mechanisms of speciation. By estimating the contributions of individual loci to hybrid fitness, the genetic architecture of reproductive barriers in nature, as well as the identity of genomic regions and candidate genes that underlie adaptive introgression or heterosis, could be revealed. It may be possible to compare the fit of observed genomic patterns to models featuring different architectures. For example, selection against heterozygotes at single loci is predicted to impede gene flow at linked variants more strongly than epistatic selection against two-locus, hetero-specific genotypes (‘hybrid incompatibilities’) (; ; ). The genomic clines approach (, ) provides a good foundation for incorporating selection into genomic studies of hybridization in the context of hybrid zones. For currently allopatric populations that hybridized in the past, model-based assignment of loci to one of a few classes with different rates of gene flow (due to selection) () may offer a promising start towards the construction of more realistic methods.

Building geographic information into genomic methods for analysing hybridization is another worthwhile goal. This strategy would recognize that the prior probabilities associated with past hybridization are functions of current geographic ranges. It would also enable tests of speciation models that depend on geography. For example, reinforcement is expected to generate ‘inverse clines’ in which greater divergence for the trait/loci under reinforcing selection is expected between populations at the centre than at the ends of the cline (; ). Finally, incorporating geographic information could help identify the selective forces responsible for initial divergence and the formation of reproductive isolation, an important goal (). In this regard, clinal models of adaptation with gene flow (; ) could provide useful guidelines.

Even without these methodological advances, authors of empirical studies should be explicit about the models they are considering and the quantitative patterns they expect to find. In general, it is important to bear in mind that hybrid genomes are outcomes of a complex mixture of processes. As a result, intuition about expected patterns will sometimes mislead. Studies that embrace the complications inherent in the speciation process – rather than automatically attributing observed genomic patterns to differential gene flow generated by hybridization – should be prioritized.

Our survey suggests other guidelines for genomic studies of hybridization and speciation. Reproductive isolation should be directly measured whenever possible. Genomic comparisons alone are unlikely to identify genes responsible for reproductive barriers. Understanding how genetic variants restrict gene flow will ultimately require characterizing their functional effects on specific isolation phenotypes. More broadly, the interpretation of genomic patterns will be enriched by comparison to knowledge of reproductive barriers. For example, the over-representation of sex chromosome loci in genetic dissections of isolation traits provides a potential explanation for higher differentiation of sex-linked loci in genomic studies of hybridization. Likewise, a recent critique of the evidence for homoploid hybrid speciation showed that while admixture was well-documented in most putative hybrid species, only a handful of studies connected hybridization to reproductive barrier formation – a key criterion for hybrid speciation ().

Genomic studies of hybridization should stimulate the development of new organismal models for speciation. Most information about mechanisms of speciation still comes from a small subset of species, but Table 2 suggests that the dynamics and outcomes of hybridization could differ significantly across taxa. Examining those systems featuring multiple species pairs that collectively sample a range of divergence times will enable the effects of hybridization to be evaluated at different stages of speciation. Furthermore, species that currently hybridize offer the most direct insights into the genomic consequences of hybridization, indicating that these species should be prioritized in speciation studies.

Finally, the limits to inferring hybridization from genomic data should be examined and embraced. The space of potential hybridization scenarios is expansive, and the subset of historical information that is recorded in genomes is highly stochastic. Determining what we can realistically hope to learn about speciation from genomic data – rather than treating it as a panacea – is an important next step.

Acknowledgments

We thank Nick Barton, Jeff Good and Richard Abbott for organizing this special issue of Molecular Ecology and for inviting us to contribute. We thank Nick Barton and three anonymous reviewers for insightful comments that improved the manuscript. BAP's work on hybridization is supported by NSF grant DEB 1353737. LHR's work on hybridization is supported by an NSERC Discovery grant.

Footnotes

BAP developed the outline of the paper, conducted most of the literature survey, and completed the bulk of the writing. LHR critiqued the outline, contributed to the writing, and helped with the literature survey, especially with respect to plant hybridization.
