Tutorial 3: Testing species delimitation hypotheses
Finally, if the question of interest pertains to the delimitation of putative species that are specified in the dataset, then for a given hypothesized tree, you will want to compare the fit of a “full tree” model for which all coalescence times are estimated (i.e., are non-zero) to “collapsed” models for which one or more coalescence times are set to be zero. To do this, rather than adding additional models to a migrationArray
, these collapsed models should be specified when running a GridSearch
. Thus, for a given set of models in a migrationArray
that specify fully resolved trees, the collapsing of specific nodes can be specified using GridSearch
s setCollapseZero
argument. This argument inputs a vector that defines which coalescence time parameters in a model one would like to be set to zero. So, for our toy dataset, setting setCollapseZero = 1
when fitting an ((A,B),C) isolation-only model will effectively collapse populations A and B into a single population; setting setCollapseZero = 1:2
will simulate a single panmictic population.
Below is an example for how to test the existance of 1, 2, or 3 species using the first three models run above using the toy dataset. I have increased the number of nTrees to 10000 such there are enough simulated trees to calculate the log-likelihood for all the models. These runs will take a few minutes.
This tutorial assumes you have read the paper that describes this method and you are familiar with how to use PHRAPL functions, especially with how to design and edit models.
- Jackson N, Carstens BC, Morales AE, O’Meara BC (2017) Species delimitation with gene flow. Systematic Biology. 66:799-812.
Prepare data
First, set relevant parameters:
modelRange<-1:3
nTrees<-10000
GridSearch
Three species models
Then run and save analyses for three species models:
result<-GridSearch(migrationArray=migrationArray,modelRange=modelRange,
popAssignments=popAssignments,observedTrees=observedTrees,
subsampleWeights.df=subsampleWeights.df,subsamplesPerGene=subsamplesPerGene,
nTrees=nTrees)
save(list="result",file="phraplOutput_models1-3_3species.rda")
Two species models
Run and save analyses for two species models:
result<-GridSearch(migrationArray=migrationArray,modelRange=modelRange,
popAssignments=popAssignments,observedTrees=observedTrees,
subsampleWeights.df=subsampleWeights.df,subsamplesPerGene=subsamplesPerGene,
nTrees=nTrees,setCollapseZero=1)
save(list="result",file="phraplOutput_models1-3_2species.rda")
Single species models
And then run and save analyses for single species models:
result<-GridSearch(migrationArray=migrationArray,modelRange=modelRange,
popAssignments=popAssignments,observedTrees=observedTrees,
subsampleWeights.df=subsampleWeights.df,subsamplesPerGene=subsamplesPerGene,
nTrees=nTrees,setCollapseZero=1:2)
save(list="result",file="phraplOutput_models1-3_1species.rda")
Post-processing
Concatenate results:
totalResults<-ConcatenateResults(migrationArray=migrationArray)
Model averages
And model average parameters
modelAverages<-CalculateModelAverages(totalResults)
Calculate genealogical divergence index (gdi)
Finally, you can calculate the genealogical divergence index (gdi) between populations A and B using the model averaged parameter values of migration rate and divergence time. This index is a composite metric that estimates overall divergence (between 0 and 1) from the combined effects of genetic drift and gene flow (0 = panmixia; 1 = strong divergence).
gdi<-CalculateGdi(modelAverages$t1_1.2,modelAverages$m1_1.2)