Tutorial 3: Testing species delimitation hypotheses

Finally, if the question of interest pertains to the delimitation of putative species that are specified in the dataset, then for a given hypothesized tree, you will want to compare the fit of a “full tree” model for which all coalescence times are estimated (i.e., are non-zero) to “collapsed” models for which one or more coalescence times are set to be zero. To do this, rather than adding additional models to a migrationArray, these collapsed models should be specified when running a GridSearch. Thus, for a given set of models in a migrationArray that specify fully resolved trees, the collapsing of specific nodes can be specified using GridSearchs setCollapseZero argument. This argument inputs a vector that defines which coalescence time parameters in a model one would like to be set to zero. So, for our toy dataset, setting setCollapseZero = 1 when fitting an ((A,B),C) isolation-only model will effectively collapse populations A and B into a single population; setting setCollapseZero = 1:2 will simulate a single panmictic population.

Below is an example for how to test the existance of 1, 2, or 3 species using the first three models run above using the toy dataset. I have increased the number of nTrees to 10000 such there are enough simulated trees to calculate the log-likelihood for all the models. These runs will take a few minutes.

This tutorial assumes you have read the paper that describes this method and you are familiar with how to use PHRAPL functions, especially with how to design and edit models.

Jackson N, Carstens BC, Morales AE, O’Meara BC (2017) Species delimitation with gene flow. Systematic Biology. 66:799-812.

Prepare data
GridSearch
Post-processing
1. Concatenate results:
2. Model averages
Calculate genealogical divergence index (gdi)

Prepare data

First, set relevant parameters:

modelRange<-1:3  
nTrees<-10000  

GridSearch

Three species models

Then run and save analyses for three species models:

result<-GridSearch(migrationArray=migrationArray,modelRange=modelRange,
    popAssignments=popAssignments,observedTrees=observedTrees,
    subsampleWeights.df=subsampleWeights.df,subsamplesPerGene=subsamplesPerGene,
    nTrees=nTrees) 
save(list="result",file="phraplOutput_models1-3_3species.rda")  

Two species models

Run and save analyses for two species models:

result<-GridSearch(migrationArray=migrationArray,modelRange=modelRange,
    popAssignments=popAssignments,observedTrees=observedTrees,
    subsampleWeights.df=subsampleWeights.df,subsamplesPerGene=subsamplesPerGene,
    nTrees=nTrees,setCollapseZero=1) 
save(list="result",file="phraplOutput_models1-3_2species.rda") 

Single species models

And then run and save analyses for single species models:

result<-GridSearch(migrationArray=migrationArray,modelRange=modelRange,
    popAssignments=popAssignments,observedTrees=observedTrees,
    subsampleWeights.df=subsampleWeights.df,subsamplesPerGene=subsamplesPerGene,
    nTrees=nTrees,setCollapseZero=1:2) 
save(list="result",file="phraplOutput_models1-3_1species.rda") 

Post-processing

Concatenate results:

totalResults<-ConcatenateResults(migrationArray=migrationArray)

Model averages

And model average parameters

modelAverages<-CalculateModelAverages(totalResults)

Calculate genealogical divergence index (gdi)

Finally, you can calculate the genealogical divergence index (gdi) between populations A and B using the model averaged parameter values of migration rate and divergence time. This index is a composite metric that estimates overall divergence (between 0 and 1) from the combined effects of genetic drift and gene flow (0 = panmixia; 1 = strong divergence).

gdi<-CalculateGdi(modelAverages$t1_1.2,modelAverages$m1_1.2)