Link Search Menu Expand Document

SUBSAMPLING AND CREATING AN INPUT FOR PHRAPL

Multiple samples improve results accuracy. However, the computational time required to analyse many samples increases considerably as more samples are added. PHRAPL implements a subsampling strategy to reduce computational time and it has been shown that similar model probabilities are recovered when subsampling strategies are implemented and when entire datasets are analyzed.

The idea is to generate multiple smaller datasets that include only a fraction of the current samples. Because this subsampling is performed at random, all samples have the same chance of being part of the dataset. The figure below is a representation of subsampling for two loci and four populations with different sample sizes: A= 16, B=13, C=8 and D=3. In this example 3 individuals per population were sampled, no replicates are shown.

Fig. Subsampling.


Subsample your data and create an input file for PHRAPL

Follow the next steps to subsample your data and create an input file for PHRAPL

Load input files

setwd("/your_working_directory/")
library(phrapl)

#############################
#####   Load input files  ###

# Assignement file
currentAssign<-read.table(your_working_directory/CladeAssignments.txt)

# Trees file
currentTrees<-read.tree(your_working_directory/trees.tre)

Define arguments

#############################
#####   Define arguments  ###

# Number of population and individuals per population that will be subsampled
### Simulations showed that 2 or 3 individuals per population are enough.
popAssignments<-list(c(2,2,2))    # 3 populations and 2 individuals per population

subsamplesPerGene<-200       #100 might be enough.

Do subsampling

Read more about how to subsample trees in the PHRAPL vignette, section III.

#############################
####  Do subsampling  #######

observedTrees<-PrepSubsampling(
assignmentsGlobal=currentAssign,      # the population assignments table
observedTrees=currentTrees,           # the original trees
popAssignments=popAssignments,        # the number of tips subsampled per population
subsamplesPerGene=subsamplesPerGene,  # the number of replicate subsamples to take per locus
outgroup=FALSE,                       # whether an outgroup is present in the dataset (TRUE or FALSE)
outgroupPrune=FALSE)                  # whether an outgroup, if present, should be excluded from the subsampled trees

Get subsample weights

Read more about how to calcule degeneracy weights for subsampled trees in the PHRAPL vignette, section IV.

#############################
### Get subsample weights ###
subsampleWeights.df<-GetPermutationWeightsAcrossSubsamples(popAssignments=popAssignments,
observedTrees=observedTrees)

Save PHRAPL input

You only need to subsample and calculate subsample weights one time, even if you are running different sets of models (migrationArray)

save(list=c("observedTrees","subsampleWeights.df"),file=phraplInput.rda)