Link Search Menu Expand Document

Tutorial 3: Testing species delimitation hypotheses

If the question of interest pertains to the delimitation of putative species that are specified in the dataset, then for a given hypothesized tree, you will want to compare the fit of a “full tree” model for which all coalescence times are estimated (i.e., are non-zero) to “collapsed” models for which one or more coalescence times are set to be zero. To do this, rather than adding models to a migrationArray, these collapsed models should be specified when running a GridSearch. Thus, for a given set of models in a migrationArray that specify fully resolved trees, the collapsing of specific nodes can be specified using GridSearchs setCollapseZero argument. This argument inputs a vector that defines which coalescence time parameters in a model one would like to be set to zero. So, for our toy dataset, setting setCollapseZero = 1 when fitting an ((A,B),C) isolation-only model will effectively collapse populations A and B into a single population; setting setCollapseZero = 1:2 will simulate a single panmictic population.

Below is an example for how to test the existance of 1, 2, or 3 species using the first three models run above using the toy dataset (see Tutorial 1). We have increased the number of nTrees to 10000 such there are enough simulated trees to calculate the log-likelihood for all the models. These runs will take a few minutes.


  1. Prepare data
  2. GridSearch
    1. Three species models
    2. Two species models
    3. Single species models
  3. Post-processing
    1. Concatenate results:
    2. Model averages
  4. Calculate genealogical divergence index (gdi)

Prepare data

First, load input data and set relevant parameters:

setwd("~/Desktop")
library(phrapl)
library(ape)
  
trees<-read.tree(paste(path.package("phrapl"),"/extdata/trees.tre",sep=""))
assignFile<-read.table(paste(path.package("phrapl"),"/extdata/cladeAssignments.txt",sep=""),
     header=TRUE,stringsAsFactors=FALSE)
     
popAssignments<-list(c(3,3,3))
assignmentsGlobal<-assignFile  
observedTrees<-trees  
popAssignments<-list(c(3,3,3))  
subsamplesPerGene<-10  
outgroup=TRUE  
outgroupPrune=TRUE  
  
observedTrees<-PrepSubsampling(assignmentsGlobal=assignmentsGlobal,observedTrees=observedTrees,
      popAssignments=popAssignments,subsamplesPerGene=subsamplesPerGene,outgroup=outgroup,
      outgroupPrune=outgroupPrune)
subsampleWeights.df<-GetPermutationWeightsAcrossSubsamples(popAssignments=popAssignments,
      observedTrees=observedTrees)
      
popVector<-popAssignments[[1]]  
maxK<-3  
maxMigrationK=1  
maxN0K=1  
maxGrowthK=0  
forceTree=TRUE  
forceSymmetricalMigration=TRUE  
migrationArray<-GenerateMigrationIndividuals(popVector=popVector,maxK=maxK,  
      maxMigrationK=maxMigrationK,maxN0K=maxN0K,maxGrowthK=maxGrowthK,
      forceTree=forceTree,forceSymmetricalMigration=forceSymmetricalMigration)

modelRange<-1:3  
nTrees<-10000  

GridSearch

Three species models

Then run and save analyses for three species models:

result<-GridSearch(migrationArray=migrationArray,modelRange=modelRange,
    popAssignments=popAssignments,observedTrees=observedTrees,
    subsampleWeights.df=subsampleWeights.df,subsamplesPerGene=subsamplesPerGene,
    nTrees=nTrees) 
save(list="result",file="phraplOutput_models1-3_3species.rda")  

Two species models

Run and save analyses for two species models:

result<-GridSearch(migrationArray=migrationArray,modelRange=modelRange,
    popAssignments=popAssignments,observedTrees=observedTrees,
    subsampleWeights.df=subsampleWeights.df,subsamplesPerGene=subsamplesPerGene,
    nTrees=nTrees,setCollapseZero=1) 
save(list="result",file="phraplOutput_models1-3_2species.rda") 

Single species models

And then run and save analyses for single species models:

result<-GridSearch(migrationArray=migrationArray,modelRange=modelRange,
    popAssignments=popAssignments,observedTrees=observedTrees,
    subsampleWeights.df=subsampleWeights.df,subsamplesPerGene=subsamplesPerGene,
    nTrees=nTrees,setCollapseZero=1:2) 
save(list="result",file="phraplOutput_models1-3_1species.rda") 

If for any reason the steps above are taking to long or you find an error, download the output files and continue with the tutorial.

download.file("https://raw.githubusercontent.com/ariadnamorales/phrapl-manual/master/data/tutorial3/phraplOutput_models1-3_1species.rda", destfile="phraplOutput_models1-3_1species.rda")
download.file("https://raw.githubusercontent.com/ariadnamorales/phrapl-manual/master/data/tutorial3/phraplOutput_models1-3_2species.rda", destfile="phraplOutput_models1-3_2species.rda")
download.file("https://raw.githubusercontent.com/ariadnamorales/phrapl-manual/master/data/tutorial3/phraplOutput_models1-3_3species.rda", destfile="phraplOutput_models1-3_3species.rda")

Post-processing

Concatenate results:

totalResults<-ConcatenateResults(migrationArray=migrationArray)  
(real order)modelsAICparams.KrankdAICwAICparams.vectort1_1.2t2_1-2.3m1_1.2m1_1.3m1_2.1m1_3.1
11102.409023727210.0007.80744156813e-01collapse_1 collapse_20.8859539004194.28733673377NANANANA
42105.627484186323.2181.56217225852e-01collapse_1 collapse_2 migration_11.4299510125575.542726428690.137939540376NA0.137939540376NA
73107.441537070335.0336.30386173283e-02collapse_1 collapse_2 migration_10.6567752701857.64571997328NA0.109671500725NA0.109671500725
83155.2020415102452.7932.68320868533e-12collapse_2 migration_10.0000000000005.01247343036NA1.539211074576NA1.539211074576
21155.3178686531552.9092.53200973488e-12collapse_20.0000000000006.94471465031NANANANA
52155.8614516352653.4521.92998715801e-12collapse_2 migration_10.0000000000005.745809652941.544042397162NA1.544042397162NA
93265.45999152617163.0513.06502455529e-36migration_10.0000000000000.00000000000NA2.149987786716NA2.149987786716
62292.40080363218189.9924.32782946843e-42migration_10.0000000000000.000000000000.460000005209NA0.460000005209 NA 
31340.40080363209237.9921.63381385280e-52 0.0000000000000.00000000000NANANANA

Model averages

And model average parameters

modelAverages<-CalculateModelAverages(totalResults, parmStartCol = 8, keep.na = TRUE)  
t1_1.2t2_1-2.3m1_1.2m1_1.3m1_2.1m1_3.1
0.9564885161714.695158065160.02154853233580.006913539770140.02154853233580.00691353977014

Calculate genealogical divergence index (gdi)

Finally, you can calculate the genealogical divergence index (gdi) between populations A and B using the model averaged parameter values of migration rate and divergence time. This index is a composite metric that estimates overall divergence (between 0 and 1) from the combined effects of genetic drift and gene flow (0 = panmixia; 1 = strong divergence).

gdi<-CalculateGdi(modelAverages$t1_1.2,modelAverages$m1_1.2)
methodxnmeanlowerupper
exact8827100000.8241379310340.814440131880.833500169879

See this paper for a reference on how to interpret this metric, and pay special attention to Fig 6.
“Thus, as a rule of thumb, gdi values less than 0.2 suggest that a single species exists; gdi values above 0.7 suggest there are two species. Values in between indicate ambiguous delimitation (but of course, with values near 0.2 and 0.7 providing stronger or weaker evidence for a single species, respectively), which reflects the reality that there exists a speciation gray zone, where a definitive answer cannot easily be found.”