SUBSAMPLING AND CREATING AN INPUT FOR PHRAPL
Analysing multiple samples improves the accuracy of the results. However, the computation time increases considerably as more samples are added. PHRAPL implements a subsampling strategy to reduce computation time, and it has been shown that subsampled datasets recover model probabilities similar to those obtained when entire datasets are analysed.
The idea is to generate multiple smaller datasets, each including only a fraction of the available samples. Because the subsampling is performed at random, every sample within a population has the same chance of being included in a given dataset. The figure below represents subsampling for two loci and four populations with different sample sizes: A = 16, B = 13, C = 8 and D = 3. In this example, 3 individuals per population were sampled; no replicates are shown.
Fig. Subsampling.
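The random draw described above can be sketched in base R. This is only a toy illustration of the idea, not PHRAPL's own code: the individual labels (A1, A2, ...) and the number of replicates are made up for the example, and the population sizes are the ones from the figure.

```r
# Toy illustration of random subsampling (base R only, not PHRAPL code)
set.seed(1)  # only to make this toy example repeatable

# Hypothetical individual labels for the four populations in the figure
popSizes <- c(A = 16, B = 13, C = 8, D = 3)
individuals <- lapply(names(popSizes),
                      function(p) paste0(p, seq_len(popSizes[[p]])))
names(individuals) <- names(popSizes)

# Draw 3 individuals per population, without replacement
nPerPop <- 3
oneSubsample <- lapply(individuals, function(ids) sample(ids, size = nPerPop))

# Repeat the draw to obtain several replicate subsamples (5 here, arbitrary)
replicates <- replicate(5,
                        lapply(individuals, sample, size = nPerPop),
                        simplify = FALSE)
```

Population D has only 3 individuals, so every replicate contains all of them, whereas the individuals drawn from A, B and C change from one replicate to the next.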
Subsample your data and create an input file for PHRAPL
Follow the steps below to subsample your data and create an input file for PHRAPL.
Load input files
setwd("/your_working_directory/")
library(phrapl)
#############################
##### Load input files ###
# Assignment file (file paths must be quoted strings)
currentAssign<-read.table("/your_working_directory/CladeAssignments.txt")
# Trees file
currentTrees<-read.tree("/your_working_directory/trees.tre")
Define arguments
#############################
##### Define arguments ###
# Number of populations and individuals per population that will be subsampled
### Simulations showed that 2 or 3 individuals per population are enough.
popAssignments<-list(c(2,2,2)) # 3 populations and 2 individuals per population
subsamplesPerGene<-200 # number of replicate subsamples per locus; 100 might be enough
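Before subsampling, it can help to check that every population has at least as many individuals as requested in popAssignments. The sketch below uses base R only and a made-up assignment table (toyAssign, with hypothetical column names); it assumes that the first column holds the individuals, the second column holds the population labels, and that the entries of popAssignments follow the alphabetical order of those labels. With real data, currentAssign would take the place of toyAssign.

```r
# Hypothetical assignment table: column 1 = individual, column 2 = population
toyAssign <- data.frame(
  indiv = paste0("ind", 1:12),
  popLabel = rep(c("A", "B", "C"), times = c(6, 4, 2)),
  stringsAsFactors = FALSE
)
popAssignments <- list(c(2, 2, 2))

# Count the individuals available in each population
available <- table(toyAssign[, 2])

# The number of populations must match in both objects
stopifnot(length(available) == length(popAssignments[[1]]))

# Are there enough individuals per population for the requested subsample size?
enough <- as.vector(available) >= popAssignments[[1]]
names(enough) <- names(available)
print(enough)
```

Any population flagged FALSE would need either a smaller subsample size or more sampled individuals.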
Do subsampling
Read more about how to subsample trees in the PHRAPL vignette, section III.
#############################
#### Do subsampling #######
observedTrees<-PrepSubsampling(
assignmentsGlobal=currentAssign, # the population assignments table
observedTrees=currentTrees, # the original trees
popAssignments=popAssignments, # the number of tips subsampled per population
subsamplesPerGene=subsamplesPerGene, # the number of replicate subsamples to take per locus
outgroup=FALSE, # whether an outgroup is present in the dataset (TRUE or FALSE)
outgroupPrune=FALSE) # whether an outgroup, if present, should be excluded from the subsampled trees
Get subsample weights
Read more about how to calculate degeneracy weights for subsampled trees in the PHRAPL vignette, section IV.
#############################
### Get subsample weights ###
subsampleWeights.df<-GetPermutationWeightsAcrossSubsamples(popAssignments=popAssignments,
observedTrees=observedTrees)
Save PHRAPL input
You only need to subsample and calculate the subsample weights once, even if you are running different sets of models (migrationArray).
save(list=c("observedTrees","subsampleWeights.df"),file="phraplInput.rda")
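Because the input is saved once and reused, a later session only needs to load the file. The sketch below shows the save and load round trip with base R; the two objects are placeholders standing in for the real PHRAPL input, and the temporary file path is an assumption made so the example does not overwrite anything.

```r
# Placeholder objects standing in for the real PHRAPL input
observedTrees <- list("placeholder for subsampled trees")
subsampleWeights.df <- data.frame(weight = c(0.5, 0.5))

# Temporary path used only for this sketch
inputFile <- file.path(tempdir(), "phraplInput.rda")
save(list = c("observedTrees", "subsampleWeights.df"), file = inputFile)

# Simulate a fresh session by removing the objects, then restore them
rm(observedTrees, subsampleWeights.df)
restored <- load(inputFile)  # load() returns the names of the restored objects
print(restored)
```

After load(), both objects are available again under their original names, so the same input can be passed to analyses that use different migrationArray model sets.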