
INPUT FILES

PHRAPL requires two input files: an assignment file (e.g., cladeAssignments.txt) and a file of phylogenetic trees in Newick format (e.g., trees.tre). Taxon names (tip labels) must be the same in both files and consistent across all loci.

You can have missing data. PHRAPL does not require gene trees to include all individuals/alleles per population/species/group.
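The naming requirement can be checked before running PHRAPL. The following Python sketch is not part of PHRAPL; the function names are hypothetical, and it assumes unquoted tip labels with no spaces. It extracts tip labels from a Newick string and reports any tip that is absent from a given set of assigned individuals. A tree that contains only a subset of the assigned individuals passes, consistent with missing data being allowed.

```python
import re

# A tip label directly follows "(" or "," and ends at ":", ",", ")" or ";".
# Internal node labels follow ")" and are therefore not matched.
# Assumption: labels are unquoted and contain no whitespace.
TIP_PATTERN = re.compile(r"[(,]\s*([^\s(),:;]+)")


def tip_labels(newick):
    """Return the set of tip labels found in one Newick tree string."""
    return set(TIP_PATTERN.findall(newick))


def unassigned_tips(newick, assigned):
    """Return tip labels in the tree that are missing from `assigned`."""
    return tip_labels(newick) - set(assigned)


if __name__ == "__main__":
    tree = "((ind1:0.1,ind2:0.2):0.3,ind4:0.5);"
    assigned = ["ind1", "ind2", "ind3", "ind4"]
    # ind3 is absent from this tree, which is fine (missing data).
    print(sorted(unassigned_tips(tree, assigned)))
```

An empty result means every tip in the tree has an entry in the assignment table.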

You can include both haploid and diploid data in the same analysis by using the popScaling argument of the GridSearch function.


Assignment file

The assignment file (cladeAssignments.txt) specifies which individuals belong to which populations/species/groups. This file is a table that must consist of two columns:

  • The first column lists the individuals in the dataset, whose names must match those at the tips of the trees. Not every individual listed in the table needs to appear on every tree (i.e., missing data and a different set of tip names on each tree are fine).
  • The second column should provide the population or species name to which each individual is assigned (e.g., “A”, “B”, “C”, “D”).
## Example of an input assignment file. 
# The first column lists individuals 
# The second column lists populations.
# This example does not include an outgroup.
Indv	PopLabel
ind1	A
ind2	A
ind3	A
ind4	B
ind5	B
ind6	B
ind7	C
ind8	C
ind9	C
ind10	D
ind11	D
ind12	D
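As a rough illustration of the two-column layout, the following Python sketch (not part of PHRAPL; `read_assignments` is a hypothetical helper) parses an assignment table into a mapping from individual to population. It assumes whitespace-separated columns and a single header row. It also skips lines beginning with "#" so that the example above can be pasted in directly; whether PHRAPL itself accepts such comment lines is not stated here.

```python
def read_assignments(text):
    """Parse a two-column assignment table into {individual: population}.

    Assumptions: columns are separated by whitespace, the first
    non-comment line is a header, and lines starting with "#" are skipped.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"expected two columns, got: {line!r}")
        rows.append(fields)
    if not rows:
        raise ValueError("assignment table is empty")

    assignments = {}
    for individual, population in rows[1:]:  # rows[0] is the header row
        if individual in assignments:
            raise ValueError(f"individual listed twice: {individual}")
        assignments[individual] = population
    return assignments


if __name__ == "__main__":
    table = "Indv\tPopLabel\nind1\tA\nind2\tA\nind4\tB\n"
    print(read_assignments(table))
```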

If there is an outgroup taxon, it MUST be listed as the last population in the table, and the first letter of its population name must also sort last alphanumerically among the population names (e.g., "Z.outgroup").

## Example of an input assignment file with an outgroup (listed on the last line).
# The first column lists individuals 
# The second column lists populations.
Indv	PopLabel
ind1	A
ind2	A
ind3	A
ind4	B
ind5	B
ind6	B
ind7	C
ind8	C
ind9	C
ind10	D
ind11	D
ind12	D
ind13	Z.outgroup
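Both outgroup conditions can be checked mechanically. The Python sketch below is not part of PHRAPL, and `outgroup_is_valid` is a hypothetical helper. It takes the population column in file order and tests that the outgroup rows come last and that the outgroup name sorts last. It compares strings by character code, which is an assumption; sorting in R can depend on the locale, particularly for mixed-case names.

```python
def outgroup_is_valid(pop_labels, outgroup):
    """Check the two outgroup rules for a population column in file order.

    pop_labels: population label of each row, top to bottom.
    outgroup:   the population name used for the outgroup.
    Assumption: "sorts last" is tested with plain string comparison.
    """
    if outgroup not in pop_labels:
        return False
    first = pop_labels.index(outgroup)
    # Rule 1: once the outgroup starts, no other population follows it.
    listed_last = all(label == outgroup for label in pop_labels[first:])
    # Rule 2: the outgroup name sorts after every other population name.
    sorts_last = max(pop_labels) == outgroup
    return listed_last and sorts_last


if __name__ == "__main__":
    column = ["A", "A", "B", "B", "C", "D", "Z.outgroup"]
    print(outgroup_is_valid(column, "Z.outgroup"))
```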

Note:

PHRAPL assigns population indexes (i.e., 1, 2, 3, etc.) to populations alphanumerically, such that population 1 corresponds to the population name in your assignment table that comes first in the alphabet, and so forth. This is important to remember when interpreting parameter nomenclature in the output.

A header row of some sort must also be included in the assignment file.
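To see which index each population is likely to receive, the names can be sorted and numbered from 1. The Python sketch below is not part of PHRAPL, and `population_indexes` is a hypothetical helper. It only approximates the rule described above: it sorts by character code, which may differ from R's locale-dependent ordering when names mix upper and lower case.

```python
def population_indexes(pop_labels):
    """Map each distinct population name to a 1-based index in sorted order.

    Assumption: plain string sorting approximates the alphanumeric
    ordering described in the text.
    """
    names = sorted(set(pop_labels))
    return {name: index for index, name in enumerate(names, start=1)}


if __name__ == "__main__":
    column = ["C", "A", "B", "A", "Z.outgroup"]
    for name, index in population_indexes(column).items():
        print(index, name)
```

With an outgroup named so that it sorts last, the outgroup receives the highest index.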