Tree
¶

class
Tree
[source]¶ A phylogenetic tree.
Some instance variables
fName
, if the Tree was read from a filename
, the name of the tree.root
, the root nodenodes
, a list of nodes.preOrder
andpostOrder
, lists of node numbersrecipWeight
, the weight, if it exists, is usually 1/something, so the reciprocal looks nicer …nexusSets
, if it exists, a NexusSets object.
Properties
taxNames
, a list of names. Usually the order is important!data
, aData.Data
objectmodel
, aModel.Model
objectnTax
, the number of taxanInternalNodes
, the number of nonleaf nodes
The node method
You often will want to refer to a node in a tree. This can be done via its name, or its nodeNum, or as an object, via the method
node()
. For example, from a Treet
you can get node number 3 via:n = t.node(3)
and you can get the node that is the parent of the Mastodon via:
n = t.node('Mastodon').parent
For many methods that require specifying a node, the method argument is nodeSpecifier, eg:
t.reRoot(23)
reRoots
’s the tree to node number 23.Describe, draw, and get information about the tree
Print rubbish about self.
Draw the tree to the screen.
Tree Viewer.
Big Tree Viewer.
Returns True if the tree is fully bifurcating.
Asks whether a nexus taxset is a split in the tree.
Returns a list of the leaf names of all children
Returns a list of nodeNums of children of the specified node.
Return the sum of all br.len’s.
Gets a list of nodeNums, in postOrder, above nodeSpecifier.
Returns 2 lists of node numbers, preOrder and postOrder.
Sets self.preOrder and self.postOrder.
Gets a list of seqNums above nodeSpecifier.
Is theNode and everything above it (or below it) bifurcating?
Summarize nNodes for all model components if isHet
For MCMC debugging.
Write
This writes out the Newick tree description to sys.stdout.
Write the tree in Newick, aka Phylip, format.
Write the tree out in Nexus format, in a trees block.
Write the tree in Phylip or Newick format.
Pickle self to a file with a ‘p4_tPickle’ suffix.
See also Trees methods
p4.trees.Trees.writeNexus()
andp4.trees.Trees.writeNewick()
for doing trees by the bunch.Iteration over the nodes
Sometimes you don’t want to just iterate over the self.nodes list, because after some manipulations a node might be in self.nodes but not actually in the tree; using these ‘iter’ methods takes care of that, skipping such nodes.
Internal node generator.
Internal node generator, skipping the root.
Internal post order node generator, skipping the root.
Internal post order node generator, skipping the root.
Internal post order node generator.
Leaf node generator, skipping the root.
Node generator, in preOrder.
Node generator, skipping the root.
Node generator.
Node generator.
Get next node cycling around a hub node.
See also
Node
methods that do similar things starting from a given node.Copy
Duplicates self, but with no cpointers.
Makes and returns a new Tree object, duping part of self.
In combination with Data and Model
Calculate the likelihood of the tree, without optimization.
Calculate the likelihood of the tree, with optimization.
Simulate into the attached data.
Likelihoods, not log likes.
Make a draw from the inferred root character state distribution
Calculate the X^2_m stat
Calculate a statistic from observed and model character frequencies.
Compositional homogeneity test using a null distribution from simulations.
Do model fit tests on the data.
Check that the tree, data, and model specs are good to go.
Do simulations for model fit tests.
Calculate the c_E stat between self.data and model expected composition.
Setting a model
Make, attach, and return a new Comp object.
Make, attach, and return a new RMatrix instance.
Place model components (semi)randomly on the tree.
Set nNodes for all model components
Summarize nNodes for all model components if isHet
Tree manipulation
Add a leaf to a tree.
Add a node between 2 exisiting nodes, which should be parentchild.
Add a leaf to a tree as a sibling, by specifying its parent.
Add a subtree to a tree.
Returns a Trees object containing all possible birootings of self.
Collapse the specified node to make a polytomy, and remove it from the tree.
Collapse all nodes within a clade, but not the clade itself
Rotate nodes for a staircase effect.
Make the leaves line up, as in a cladogram.
Simple nearestneighbor interchange.
Simple nearestneighbor interchange, with specified nodes to interchange.
Remove and return a node, together with everything above it.
Remove and return a subtree, with its parent node and branch
Do a random spr move.
Reroot the tree to the node described by the specifier.
Attach subtree stNode to the rest of the tree at newParent.
Attach subtree stNode to the branch on attachNode
Like it says.
Remove a node, together with everything above it.
Remove everything above an internal node, making it a leaf, and so needing a new name.
Removes the root if self.root is mono or bifurcating.
Rename with phylipfriendly short boring names.
Restore previously removed duplicate taxa from a dict file.
Given the dictionary file, restore proper names.
Rotate a clade around a node.
Subtree pruning and reconnection.
Sets all node.br.len’s to 0.1, the default in p4.
Misc
Check that all taxNames are in the tree, and vice versa.
Check that all self.nodes are actually part of the tree.
Return the topology distance between input trees and pruned self.
Make long integervalued split keys.
Assigns support to the tree, from the PAUP bipartitions table.
Set self.nexusSets from var.nexusSets.
Compares the topology of self with tree2.
Graphically show topology differences.
Matrix of distances along tree path.

addLeaf
(attachmentNode, taxName)¶ Add a leaf to a tree.
The leaf is added to the branch leading from the specified node. A new node is made on that branch, so actually 2 nodes are added to the tree. The new leaf node is returned.

addNodeBetweenNodes
(specifier1, specifier2)¶ Add a node between 2 exisiting nodes, which should be parentchild.
The specifier can be a nodeNum, name, or node object.
Returns the new node object.

addSibLeaf
(attachmentNode, taxName)¶ Add a leaf to a tree as a sibling, by specifying its parent.
The leaf is added so that its parent is the specified node (ie attachmentNode), adding the node as a rightmost child to that parent. The attachmentNode should not be a leaf – it must have children nodes, to which the new leaf can be added as a sibling.
The new node is returned.

addSubTree
(selfNode, theSubTree, subTreeTaxNames=None)¶ Add a subtree to a tree.
The nodes from theSubTree are added to self.nodes, and theSubTree is deleted.
If subTreeTaxNames is provided, fine, but if not this method can find them. Providing them saves a bit of time, I assume.

allBiRootedTrees
()¶ Returns a Trees object containing all possible birootings of self.
Self should have a root node of degree > 2, but need not be fully resolved.
Self needs a taxNames.

ancestralStateDraw
()¶ Make a draw from the inferred root character state distribution
This method works on a tree with an attached model and data.
Conditional on the tree, branch lengths, model, and data, this method infers the ancestral character states of the root node. However, that inference is probabilistic, a distribution, and this method takes a single draw. It returns a string.

bigXSquaredSubM
(verbose=False)¶ Calculate the X^2_m stat
This can handle gaps and ambiguities.
Column zeros in the observed is not a problem with this stat, as we are dividing by the expected composition, and that comes from the model, which does not allow compositions with values of zero.

btv
()[source]¶ Big Tree Viewer. Show the tree in a gui window.
This is for looking at big trees. The viewer has 2 panels – one for an overall view of the whole tree, and one for a zoomed view, controlled by a selection rectangle on the whole tree view.
Needs Tkinter.
If you have nexus taxsets defined, you can show them.

calcLogLike
(verbose=1, resetEmpiricalComps=True)¶ Calculate the likelihood of the tree, without optimization.

checkThatAllSelfNodesAreInTheTree
(verbose=False, andRemoveThem=False)¶ Check that all self.nodes are actually part of the tree.
Arg andRemoveThem will remove those nodes, renumber the nodes, and reset pre and postOrder, and return None
If andRemoveThem is not set (the default is not set) then this method returns the list of nodes that are in self.nodes but not in the tree.

collapseClade
(specifier)¶ Collapse all nodes within a clade, but not the clade itself
Arg specifier, as usual, can be a node, node number, or node name. It calls Tree.collapseNode repeatedly until the clade is a comb. Should be useful for making constraint trees.

collapseNode
(specifier)¶ Collapse the specified node to make a polytomy, and remove it from the tree.
Arg specifier, as usual, can be a node, node number, or node name.
The specified node remains in self.nodes, and is returned.

compStatFromCharFreqs
(verbose=False)¶ Calculate a statistic from observed and model character frequencies.
Call it c_m, little c sub m.
It is calculated from observed character frequencies and character frequencies expected from the (possibly treeheterogeneous) model.
It would be the sum of abs(obsexp)/exp

compoTestUsingSimulations
(nSims=100, doIndividualSequences=0, doChiSquare=0, verbose=1)¶ Compositional homogeneity test using a null distribution from simulations.
This does a compositional homogeneity test on each data partition. The statistic used here is X^2, obtained via Data.compoChiSquaredTest().
The null distribution of the stat is made using simulations, so of course you need to provide a tree with a model, with optimized branch lengths and model parameters. This is a comp homogeneity test, so the model should be treehomogeneous.
The analysis usually tests all sequences in the data partition together (like paup), but you can also ‘doIndividualSequences’ (like puzzle). Beware that the latter is a multiple simultaneous stats test, and so the power may be compromized.
For purposes of comparison, this test can also do compo tests in the style of PAUP and puzzle, using chisquare to assess significance. Do this by turning ‘doChiSquare’ on. The compo test in PAUP tests all sequences together, while the compo test in puzzle tests all sequences separately. There are advantages and disadvantages to the latter– doing all sequences separately allows you to identify the worst offenders, but suffers due to the problems of multiple simultaneous stats tests. There are slight differences between the computation of the Chisquare in PAUP and puzzle and the p4 version. The compo test in PAUP (basefreq) does the chisquared test, but if sequences are blank it still counts them in the degrees of freedom; p4 does not count blank sequences in the degrees of freedom. Puzzle simply uses the row sums, ie the contributions of each sequence to the total Xsquared, and assesses significance with chisquared using the number of symbols minus 1 as the degrees of freedom. Ie for DNA dof=3, for protein dof=19. Puzzle correctly gets the composition from sequences with gaps, but does not do the right thing for sequences with ambiguities like r, y, and so on. P4 does calculate the composition correctly when there are such ambiguities. So p4 will give you the same numbers as paup and puzzle for the chisquared part as long as you don’t have blank sequences or ambiguities like r and y.
This uses the Data.compoChiSquaredTest() method to get the stats. See the doc string for that method, where it describes how zero column sums (ie some character is absent) can be dealt with. Here, when that method is invoked, ‘skipColumnZeros’ is turned on, so that the analysis is robust against data with zero or low values for some characters.
For example:
# First, do a homog opt, and pickle the optimized tree. # Here I use a bionj tree, but you could use whatever. read('d.nex') a = var.alignments[0] dm = a.pDistances() t = dm.bionj() d = Data() t.data = d t.newComp(free=1, spec='empirical') t.newRMatrix(free=1, spec='ones') t.setNGammaCat(nGammaCat=4) t.newGdasrv(free=1, val=0.5) t.setPInvar(free=0, val=0.0) t.optLogLike() t.name = 'homogOpt' t.tPickle() # Then, do the test ... read('homogOpt.p4_tPickle') t = var.trees[0] read('d.nex') d = Data() t.data = d t.compoTestUsingSimulations() # Output would be something like ... # Composition homogeneity test using simulations. # Pvalues are shown. # Part Num 0 # Part Name all #   # All Sequences 0.0000 # Or using more sims for more precision, and also doing the # Chisquare test for contrast ... t.compoTestUsingSimulations(nSims=1000, doChiSquare=True) # Output might be something like ... # Composition homogeneity test using simulations. # Pvalues are shown. # (Pvalues from ChiSquare are shown in parens.) # Part Num 0 # Part Name all #   # All Sequences 0.0140 # (ChiSquared Prob) (0.9933) # It returns the Pvalue for the sims for the first data # partition.
It is often the case, as above, that this test will show significance while the Chisquare test does not.

property
data
¶ (property) The data object

deleteCStuff
()¶ Deletes cpointers from nodes, self, and model, but not the data.

draw
(showInternalNodeNames=1, addToBrLen=0.2, width=None, showNodeNums=1, partNum=0, model=None)¶ Draw the tree to the screen.
This method makes a text drawing of the tree and writes it to sys.stdout.
Arg addToBrLen adds, by default 0.2, to each branch length, to make the tree more legible. If you want the branch lengths more realistic, you can set it to zero, or better, use vector graphics for drawing the trees.
Setting arg model aids in drawing trees with treehetero models. If the model characteristic (usually composition or rMatrix) differs over the tree, this method can draw it for you.
See also
Tree.Tree.textDrawList()
, which returns the drawing as a list of strings.See the method
Tree.Tree.setTextDrawSymbol()
, which facilitates drawing different branches with different symbols.

drawTopologyCompare
(treeB, showNodeNums=False)[source]¶ Graphically show topology differences between two trees.
The two trees (self and treeB) are drawn as text, with differences highlighted with a thick branch.
The taxNames need to be set, and need to be the same for both trees.

dump
(tree=0, node=0, model=0, all=0)[source]¶ Print rubbish about self.
 tree
is the default, showing basic info about the tree.
 node
shows info about all the nodes.
 model
shows which model component number goes on which node. (which you can also get by drawing the tree)
(If you want the info about the model itself, do a aTree.model.dump() instead.)

dupe
()[source]¶ Duplicates self, but with no cpointers. And no data object.
If there is a model, it is duped.
Returns a copy of self.

dupeSubTree
(dupeNodeSpecifier, up, doBrLens=True, doSupport=True)¶ Makes and returns a new Tree object, duping part of self.
The dupeNodeSpecifier can be a node name, node number, or node object.
Arg ‘up’ should be True or False.
The returned subtree has a rootonastick.
BrLens are not duped – brLens are default in the new subtree.
So if the tree is like this:
+2:A +1  +3:B  04:C   +6:D +5:dupeNode  +8:E +7 +9:F
Then the subtree from node 5 up is:
+2:D subTreeRoot:01:dupeNode  +4:E +3 +5:F
and the subtree from node 5 down is:
+2:A +1 dupeNode:05 +3:B  +4:C

eps
(fName=None, width=500, putInternalNodeNamesOnBranches=0)¶ Make a basic eps drawing of self.
The ‘width’ is in postscript points.
By default, internal node names label the node, where the node name goes on the right of the node. You can make the node name label the branch by setting ‘putInternalNodeNamesOnBranches’.

getChildrenNums
(specifier)[source]¶ Returns a list of nodeNums of children of the specified node.
See also Node.getNChildren()

getEuclideanDistanceFromSelfDataToExpectedComposition
()¶ Calculate the c_E stat between self.data and model expected composition.
The expected composition comes from the current tree (self) and model. There is an expected composition of each sequence in each part, and is obtained via pf.p4_expectedComposition(cTree). In nonstationary evolution, the expected composition of sequences approach the model composition asymptotically as the branch increases.
I am calling the Euclidean distance from the actual sequence composition to the expected composition c_E.
 Returns
A list of lists — the c_E for each sequence, for each part. Order of the sequences is as in the Data.

getNodeNumsAbove
(nodeSpecifier, leavesOnly=0)[source]¶ Gets a list of nodeNums, in postOrder, above nodeSpecifier.
The node specified is not included.

getNodeOnReferenceTreeCorrespondingToSelfRoot
(refTree, verbose=True)[source]¶ Find, on a ref tree, a node corresponding to where the self root is.
This works for both bifurcating and nonbifurcating roots (of self).
The refTree should not be birooted.
To facilitate doing a lot of (self) trees, a counter in the corresponding node in the refTree is incremented. The refTree is rooted on the first taxon. For birooted trees, the node.br.biRootCount is incremented, and for nonbirooted trees the node.rootCount is incremented. The root counts should therefore be immune to reRoot()’ing.

getPreAndPostOrderAbove
(nodeSpecifier)[source]¶ Returns 2 lists of node numbers, preOrder and postOrder.
This uses a stack, not recursion, so it should work for large trees without bumping into the recursion limit. The 2 lists are relative to the node specified, and include the node specified. PreOrder starts from theNode and goes to the tips; postOrder starts from the tips and goes to theNode.

getPreAndPostOrderAboveRoot
()[source]¶ Sets self.preOrder and self.postOrder.
This uses a stack, not recursion, so it should work for large trees without bumping into the recursion limit. PreOrder starts from the root and goes to the tips; postOrder starts from the tips and goes to the root.

getSiteLikes
()¶ Likelihoods, not log likes. Placed in self.siteLikes, a list.

inputTreesToSuperTreeDistances
(inputTrees, doSd=True, doScqdist=True)¶ Return the topology distance between input trees and pruned self.
Either or both of two distances, RF (‘sd’) and quartet (‘scqdist’) distances, are returned.
See also the Trees method of the same name that does a bunch of them.

isFullyBifurcating
(verbose=False, biRoot=False)[source]¶ Returns True if the tree is fully bifurcating. Else False.
If arg biRoot is True, then it is required that the tree be biRoot’ed (ie have a bifurcating root)
If arg biRoot is False, then biRoot’ed trees are not allowed, but trees rooted on a leaf are allowed.

iterInternalsNoRootPostOrder
()[source]¶ Internal post order node generator, skipping the root. Assumes preAndPostOrderAreValid.

iterInternalsNoRootPreOrder
()[source]¶ Internal post order node generator, skipping the root. Assumes preAndPostOrderAreValid.

iterInternalsPostOrder
()[source]¶ Internal post order node generator. Assumes preAndPostOrderAreValid.

ladderize
(biggerGroupsOnBottom=True)¶ Rotate nodes for a staircase effect.
This method, in its default biggerGroupsOnBottom way, will take a tree like this:
+4:A +3 +2 +5:B    +6:C +1   +8:D  +7  +9:E 0 10:F   +12:G +11 +13:H
and rearranges it so that it is like …
+10:F   +12:G 11  +13:H 0  +8:D  +7   +9:E +1  +6:C +2  +4:A +3 +5:B
Note that for each node, the more populated group is on the bottom, the secondmost populated second, and so on.
To get it with the bigger groups on top, set biggerGroupsOnBottom=False. I made the default with the bigger groups on the bottom so that it often makes room for a scale bar.
The setting biggerGroupsOnBottom, the default here, would equivalent to set torder=right in paup; torder=left puts the bigger groups on the top.

lineUpLeaves
(rootToLeaf=1.0, overWriteBrLens=True)¶ Make the leaves line up, as in a cladogram.
This makes the rootToLeaf distance the same for all leaves.
If overWriteBrLens is set, then the newly calculated br.lens replace the original br.lens. If it is not set, then the new br.lens are placed in br.lenL, and does not overwrite the original br.lens.

makeSplitKeys
(makeNodeForSplitKeyDict=True)[source]¶ Make long integervalued split keys.
This needs to have self.taxNames set.
We make 2 kinds of split keys– rawSplitKeys and splitKeys. Both are attributes of node.br, so we have eg node.br.splitKey.
Raw split keys for terminal nodes are 2**n, where n is the index of the taxon name. Eg for the first taxon, the rawSplitKey will be 1, for the 3rd taxon the rawSplitKey will be 4.
RawSplitKeys for internal nodes are the rawSplitKey’s for the children, bitwise OR’ed together.
SplitKeys, cf rawSplitKeys, are in ‘standard form’, where the numbers are even, ie do not contain the 1bit. Having it in standard form means that you can compare splits among trees. If the rawSplitKey is even, then the splitKey is simply that, unchanged. If, however, the rawSplitKey is odd, then the splitKey is the rawSplitKey bitflipped. For example, if there are 5 taxa, and one of the rawSplitKeys is 9 (odd), we can calculate the splitKey by bitflipping, as:
01001 = 9 rawSplitKey 10110 = 22 splitKey
(Bitflipping is done by exclusiveor’ing (xor) with 11111.)
The splitKey is readily converted to a splitString for display, as 22 becomes ‘.**.*’ (note the ‘1’ bit is now on the left). It is conventional that the first taxon, on the left, is always a dot. (I don’t know where the convention comes from.)
The root has no rawSplitKey or splitKey.
For example, the tree:
+2:B (rawSplitKey = 2) +1  +3:C (rawSplitKey = 4)  04:E (rawSplitKey = 16)   +6:A (rawSplitKey = 1) +5 +7:D (rawSplitKey = 8)
has 2 internal splits, on nodes 1 and 5.
Node n.br.rawSplitKey n.br.splitKey 1 6 6 5 9 22
There should be no duplicated rawSplitKeys, but if the tree has a bifurcating root then there will be a duped splitKey.
This method will fail for trees with internal nodes that have only one child, because that will make duplicated splits.
If arg makeNodeForSplitKeyDict is set, then it will make a dictionary
nodeForSplitKeyDict
where the keys are the splitKeys and the values are the corresponding nodes.

property
model
¶ (property) The model object

modelFitTests
(fName='model_fit_tests_out', writeRawStats=0)¶ Do model fit tests on the data.
The two tests are the GoldmanCox test, and the tree and model based composition fit test. Both require simulations with optimizations in order to get a null distribution, and those simulations need to be done before this method. The simulations should be done with the simsForModelFitTests() method.
Self should have a data and a model attached, and be optimized.
The GoldmanCox test (Goldman 1993. Statistical tests of models of DNA substitution. J Mol Evol 36: 182198.) is a test for overall fit of the model to the data. It does not work if the data have gaps or ambiguities.
The tree and modelbased composition test asks the question: ‘Does the composition implied by the model fit the data?’ If the model is homogeneous and empirical comp is used, then this is the same as the chisquare test except that the null distribution comes from simulations, not from the chisquare distribution. In that case only the question is, additionally, ‘Are the data homogeneous in composition?’, ie the same question asked by the chisquare test. However, the data might be heterogeneous, and the model might be heterogeneous over the tree; the tree and modelbased composition fit test can ask whether the heterogeneous model fits the heterogeneous data. The composition is tested in each data partition, separately. The test is done both overall, ie for all the sequences together, and for individual sequences.
If you just want a compo homogeneity test with empirical homogeneous comp, try the compoTestUsingSimulations() method– its way faster, because there are not optimizations in the sims part.
Output is verbose, to a file.

modelSanityCheck
(resetEmpiricalComps=True)¶ Check that the tree, data, and model specs are good to go.
Complain and exit if there is anything wrong that might prevent a likelihood evaluation from being done. We are assuming that a data object exists and is attached, and that model stuff has been set.
Check that each part has at least 1 each from comps, rMatrices, and gdasrvs (if nGammaCat is > 1).
Check that each node has a comp, rMatrix, and gdasr. Check that all comps, rMatrices, gdasrvs are used on a node somewhere.
Here relRate, ie the relative rate of each data partition, is adjusted based on the size of the data partitions.
newRelRate_p = oldRelRate_p * (Sum_p[oldRelRate_i * partLen_i] / Sum[partLen_i])
That ensures that Sum(relRate_i * partLen_i) = totalDataLength, ie that the weighted mean of the rates is 1.0.
This method also tallies up the number of free prams in the whole model, and sets self.model.nFreePrams.

property
nInternalNodes
¶ (property) nInternalNodes

property
nTax
¶ (property) nTax

newComp
(partNum=0, free=0, spec='empirical', val=None, symbol=None)¶ Make, attach, and return a new Comp object.
The arg spec should be a string, one of:
'equal' no val 'empirical' no val 'specified' val=[aList] 'wag', etc no val (ie one of the empirical protein models, including cpREV, d78, jtt, mtREV24, mtmam, wag, etc)
If spec=’specified’, then you specify dim or dim1 values in a list as the ‘val’ arg.
This method returns a Comp object, which you can ignore if it is a treehomogeneous model. However, if it is a treehetero model then you may want to get that Comp object so that you can place it on the tree explicitly with setModelComponentOnNode(), like this:
c0 = newComp(partNum=0, free=1, spec='empirical') c1 = newComp(partNum=0, free=1, spec='empirical') myTree.setModelComponentOnNode(c0, node=myTree.root, clade=1) myTree.setModelComponentOnNode(c1, node=5, clade=1) myTree.setModelComponentOnNode(c1, node=18, clade=0)
Alternatively, you can simply let p4 place them randomly:
newComp(partNum=0, free=1, spec='empirical') newComp(partNum=0, free=1, spec='empirical') myTree.setModelComponentsOnNodesRandomly()
Calculation of probability matrices for likelihood calcs etc are wrong when there are any comp values that are zero, so that is not allowed. Any zeros are converted to var.PIVEC_MIN, which is 1e13 this week. Hopefully close enough to zero for you.

newGdasrv
(partNum=0, free=0, val=None, symbol=None)¶

newRMatrix
(partNum=0, free=0, spec='ones', val=None, symbol=None)¶ Make, attach, and return a new RMatrix instance.
spec should be one of:
‘ones’  for JC, poisson, F81
‘2p’  for k2p and hky
‘specified’
‘cpREV’
‘d78’
‘jtt’
‘mtREV24’
‘mtmam’
‘wag’
‘rtRev’
‘tmjtt94’
‘tmlg99’
‘lg’
‘blosum62’
‘hivb’
‘mtart’
‘mtzoa’
‘gcpREV’
‘stmtREV’
‘vt’
‘pmb’
See var.rMatrixProteinSpecs
You do not set the ‘val’ arg unless the spec is ‘specified’ or ‘2p’. If spec=’2p’, then you set val to kappa.
If the spec is ‘specified’, you specify all the numerical values in a list given as the ‘val’ arg. The length of that list will be (((dim * dim)  dim) / 2), so for DNA, where dim=4, you would specify a list containing 6 numbers.

nextNode
(spokeSpecifier, hubSpecifier)[source]¶ Get next node cycling around a hub node.
A bit of a hack to make a p4 Node behave sorta like a Felsenstein node. Imagine cycling around the branches emanating from a node like spokes on a hub, starting from anywhere, with no end.
The hub node would usually be the parent of the spoke, or the spoke would be the hub itself. Usually self.nextNode(spoke,hub) delivers spoke.sibling. What happens when the siblings run out is that self.nextNode(rightmostSibling, hub) delivers hub itself, and of course its branch (spoke) points toward the hub.parent. (Unless hub is the root, of course, in which case self.nextNode(rightmostSibling, hub) delivers hub.leftChild.) In the usual case of the hub not being the root, the next node to be delivered by nextNode(spokeIsHub, hub) is usually the leftChild of the hub. Round and round, clockwise.

nni
(upperNodeSpec=None)¶ Simple nearestneighbor interchange.
You specify an ‘upper’ node, via an upperNodeSpec, which as usual can be a node name, node number, or node object. If you don’t specify something, a random node will be chosen for you. (This latter option might be a little slow if you are doing many of them, as it uses iterInternalsNoRoot(), but mostly it should be fast enough).
The upper node has a parent – the ‘lower’ node. One subtree from the upper node and one subtree from the lower node are exchanged. Both subtrees are chosen randomly.
This works on biRooted trees also, preserving the biRoot.

nni2
(upperSubTreeNode, lowerSubTreeNode)¶ Simple nearestneighbor interchange, with specified nodes to interchange.
You specify an ‘upper’ node, via upperSubTreeNode, which as usual can be a node name, node number, or node object. You also specify a ‘lower’ node, via lowerSubTreeNode.
The grandparent of the upperSubTreeNode must exist and must be the parent of the lowerSubTreeNode.
This works on biRooted trees also, preserving the biRoot.

node
(specifier)¶ Get a node based on a specifier.
The specifier can be a nodeNum, name, or node object.

optLogLike
(verbose=1, method='newtAndBrentPowell')¶ Calculate the likelihood of the tree, with optimization.
There are different optimization methods– choose one. I’ve made ‘newtAndBrentPowell’ the default, as it is fast and seems to be working. The ‘allBrentPowell’ optimizer used to be the default, as it seems to be the most robust, although it is slow. It would be good for checking important calculations.
There are two new optimization methods from the nlopt library — ‘newtAndBOBYQA’ and ‘BOBYQA’. They are fast and seem to work well.
For difficult optimizations it may help to repeat the call to optLogLike(), perhaps with a different method.

optTest
()¶

parseNewick
(flob, translationHash, doModelComments=0)[source]¶ Parse Newick tree descriptions.
This is stackbased, and does not use recursion.

parseNexus
(flob, translationHash=None, doModelComments=0)[source]¶ Start parsing nexus format newick tree description.
From just after the command word ‘tree’, to the first paren of the Newick part of the tree.
 Parameters
flob – an open file or filelike object
translationHash (dict) – associates short names or numbers with long
taxon names (proper) –
doModelComments (bool) – whether to parse p4specific model command
in the tree description (comments) –
 Returns
None

patristicDistanceMatrix
()¶ Matrix of distances along tree path.
This method sums the branch lengths between each pair of taxa, and puts the result in a DistanceMatrix object, which is returned.
Self.taxNames is required.

pruneSubTreeWithParent
(specifier)¶ Remove and return a subtree, with its parent node and branch
 Parameters
specifier – Arg
specifier
can be a nodeNum, node name, or node object. Returns
The node that is the parent of the specified node is returned.
The nodes are left in self (ie they are not removed from the self.nodes list; the idea being that the subtree will be added back to the tree again (via
reconnectSubTreeWithParent()
).Let’s say that we want to detach the subtree composed of taxa E, F, and G, below:
# +1:A #  # 2:B #  # 0 +4:C #   #   +6:D #    #  5 +8:E # +3   #  +7 +10:F #  +9 #  +11:G #  # +12:H
This really means that we want to remove nodes 5 (yes!) and 7 as well, and their branches. To specify that subtree, we can use node 7 as the specifier when we call this method. The node that is returned in this case is node 5
stNode = t.pruneSubTreeWithParent(7) print(f"The returned object is a {stNode}, nodeNum {stNode.nodeNum}") # The returned object is a <p4.node.Node object at 0x107748eb8>, nodeNum 5
At this point we will have the tree as shown here:
# +1:A #  # 2:B # 0 #  +4:C #   # +36:D #  # +10:G
Now we would use the companion method
reconnectSubTreeWithParent()
to reconnect the subtree. For example we can reconnect it in the same place that it was, by specifying node 6 as the arg attachNodet.reconnectSubTreeWithParent(stNode, 6)
That restores the original tree, as shown above.
The subtree can be a single leaf if you wish. In that case the subtree is specified by the leaf, and is composed of the leaf and its parent, including both their branches.
At the moment it does not work for nodes where the parent is not bifurcating.

pruneSubTreeWithoutParent
(specifier, allowSingleChildNode=False)¶ Remove and return a node, together with everything above it.
Arg specifier can be a nodeNum, name, or node object.
By default, the arg allowSingleChildNode is turned off, and is for those cases where the parent of the node has more than 2 children. So when the subTree is removed, the parent node that is left behind has more than one child.
The stuff that is removed is returned. The nodes are left in self; the idea being that the subTree will be added back to the tree again (via reconnectSubTreeWithoutParent()).

randomSpr
()¶ Do a random spr move.

randomizeTopology
(randomBrLens=True)¶

reRoot
(specifier, moveInternalName=True, stompRootName=True, checkBiRoot=True, fixRawSplitKeys=False)¶ Reroot the tree to the node described by the specifier.
The specifier can be a node.nodeNum, node.name, or node object.
Here is a potential problem. Lets say you start with this tree, with split support as shown:
(((A, B)99, C)70, D, E); +3:A +2:99 +1:70 +4:B    +5:C 0 6:D  +7:E
Now we want to reRoot it to node 2. So we do that. Often, node.name’s are really there to label the branch, not the node; you may be using node.name’s to label your branches (splits), eg with support values. If that is the case, then you want to keep the node name with the branch (not the node) as you reRoot(). Ie you want node labels to behave like branch labels. If that is the case, we set moveInternalName=True; that is the default. When that is done in the example above, we get:
(A, B, (C, (D, E)70)99); +3:A  4:B 2  +5:C +1:99  +6:D +0:70 +7:E
Another possibility is that the node names actually are there to name the node, not the branch, and you want to keep the node name with the node during the rerooting process. That can be done by setting moveInternalName=False, and the tree below is what happens when you do that. Each internal node.name stays with its node in the rerooting process:
(A, B, (C, (D, E))70)99; +3:A  4:B 99:2  +5:C +1:70  +6:D +0 +7:E
Now if you had the default moveInternalName=True and you had a node name on the root, that would not work (draw it out to convince yourself …). So in that case you probably want stompRootName=True as well — Both are default. If you set stompRootName=True, it gives you a little warning as it does it. If you set stompRootName=2, it will do it silently. If moveInternalName is not set, then stompRootName is not used.
You probably do not want to
reRoot()
if the current root is bifurcating. If you do that, you will get a node in the tree with a single child, which is strictly ok, but useless and confusing. So by default I checkBiRoot=True, and throw a P4Error if there is one. If you want to draw such a pathological tree with a node with a single child, set checkBiRoot=False, and it will allow it.

readBipartitionsFromPaupLogFile
(thePaupLogFileName)¶ Assigns support to the tree, from the PAUP bipartitions table.
This needs to have self.taxNames set.
This is useful if you want to make a consensus tree using PAUP, and get the support values. When you make a cons tree with PAUP, the support values, usually bootstrap values, are unfortunately not saved with the tree. That information is in the Bipartitions table, which can be saved to a PAUP log file so that p4 can get it. This method will read thePaupLogFileName and extract the split (tree bipartition) supports, and assign those supports to self as node.br.support’s (as a float, not a string).
It also returns a hash with the split strings as keys and the split support as values, if you need it.

reconnectSubTreeWithParent
(stNode, attachNode)¶ Attach subtree stNode to the branch on attachNode
This is a companion method to Tree.pruneSubTreeWithParent()
The subtree is attached on the branch on attachNode.

reconnectSubTreeWithoutParent
(stNode, newParent, beforeNode=None)¶ Attach subtree stNode to the rest of the tree at newParent.
The beforeNode is by default None, and then the subtree is reconnected as the rightmost child of the new parent. However, if you want it somewhere else, for example as the leftmost child, or between two existing child nodes, specify a beforeNode (specified as usual as a node, nodeNumber, or node name) and the subtree will be inserted there.

removeAboveNode
(specifier, newName)¶ Remove everything above an internal node, making it a leaf, and so needing a new name.

removeEverythingExceptCladeAtNode
(specifier)¶ Like it says. Leaves a tree with a rootonastick.

removeNode
(specifier, alsoRemoveSingleChildParentNode=True, alsoRemoveBiRoot=True, alsoRemoveSingleChildRoot=True, deleteCStuff=True)¶ Remove a node, together with everything above it.
Arg specifier can be a nodeNum, name, or node object.
So lets say that we have a tree like this:
+1:A 0  +3:B +2 +4:C
and we remove node 4. When it is removed, node 2 ends up having only one child. Generally you would want to remove it as well (so that the parent of node 3 is node 0), so the option alsoRemoveSingleChildParentNode is turned on by default. If alsoRemoveSingleChildParentNode is turned off, nodes like node 2 get left in the tree.
Removal of a node might cause the creation of a bifurcating root. I assume that is not desired, so alsoRemoveBiRoot is turned on by default.
In the example above, if I were to remove node 4, by default node 2 would also disappear, but by default node 0 would also disappear because it would then be a tree with a bifurcating root node. So starting with a 5node tree, by removing 1 node you would end up with a 2node tree, with 1 branch.
In the example here, if I were to simply remove node 1:
+1:A  0 +3:B +2  +5:C +4 +6:D
Then node 0 would remain, as:
+2:B 01  +4:C +3 +5:D
Presumably that is not wanted, so the arg alsoRemoveSingleChildRoot is turned on by default. When the root is removed, we are left with a biroot, which (if the arg alsoRemoveBiRoot is set) would also be removed. The resulting tree would be:
+0:B  12:C  +3:D
The deleted nodes are really deleted, and do not remain in self.nodes.

removeRoot
()¶ Removes the root if self.root is mono or bifurcating.
This removes the root node if the tree is rooted on a terminal node, or if the tree is rooted on a bifurcating node. Otherwise, it refuses to do anything.
In the usual case of removing a bifurcating root, the branch length of one fork of the bifurcation is added to the other fork, so the tree length is preserved.
In the unusual case of removing a monofurcating root (a root that is a terminal node, a treeonastick) then its branch length disappears.

renameForPhylip
(dictFName='p4_renameForPhylip_dict.py')¶ Rename with phylipfriendly short boring names.
It saves the old names (together with the new) in a python dictionary, in a file by default named p4_renameForPhylip_dict.py
If self does not have taxNames set, it does not write originalNames to that file– which may cause problems restoring names. If you want to avoid that, be sure to set self.taxNames before you do this method.
This method does not deal with internal node names, at all. They are silently ignored. If they are too long for phylip, they are still silently ignored, which might cause problems.

restoreDupeTaxa
(dictFileName='p4DupeSeqRenameDict.py', asMultiNames=True)¶ Restore previously removed duplicate taxa from a dict file.
The usual story would be like this: You read in your alignment and p4 tells you that you have duplicate sequences. So you use the Alignment method checkForDuplicateSequences() to remove them, which makes a dictionary file, by default ‘p4DupeSeqRenameDict.py’, to facilitate restoration of the names. You do your analysis on the reduced alignment, and get a tree. Then you use the dictionary file with this method to restore all the taxon names.
If asMultiNames is turned on, the default, then the leaf nodes are not replicated, and the name is changed to be a long multiname.
If asMultiNames is turned off, then the restored taxa are made to be siblings, and the branch lengths are set to zero.

restoreNamesFromRenameForPhylip
(dictFName='p4_renameForPhylip_dict.py')¶ Given the dictionary file, restore proper names.
The renaming is done by the Alignment method renameForPhylip(), which makes the dictionary file. The dictionary file is by default named p4_renameForPhylip_dict.py

rotateAround
(specifier)¶ Rotate a clade around a node.
The specifier can be a nodeNum, name, or node object.

setCStuff
()¶ Transfer info about self to clanguage stuff.
Transfer relationships among nodes, the root position, branch lengths, model usage info (ie what model attributes apply to what nodes), and pre and postorder.

setEmpiricalComps
()¶ Set any empirical model comps to the comp of the data.
This is done by self.modelSanityCheck(), but sometimes you may want to do it at other times. For example, do this after exchanging Data objects, or after simulating. In those cases there does not seem to be a reasonable way to do it automatically.

setModelComponentOnNode
(theModelComponent, node=None, clade=1)¶

setModelComponentsNNodes
()¶ Set nNodes for all model components

setModelComponentsOnNodesRandomly
(forceRepresentation=2)¶ Place model components (semi)randomly on the tree.
For example, if there are 2 compositions in model part partNum, this method will decorate each node of the tree with zeros and ones, randomly. The actual component set is node.parts[partNum].compNum. If the model is homogeneous, it will just put zeros in all the nodes.
We want to have each model component on the tree somewhere, and so it is not really randomly set. If the model component numbers were assigned randomly on the tree, it may occur that some model component numbers by chance would not be represented. This is not allowed, and you can set forceRepresentation to some positive integer, 1 or more. That number will be the lower limit allowed on the number of nodes that get assigned the model component number. For example, if forceRepresentation is set to 2, then each model component must get assigned to at least 2 nodes.

setModelThingsRandomly
(forceRepresentation=2)¶ This method has been renamed to setModelComponentsOnNodesRandomly

setNGammaCat
(partNum=0, nGammaCat=1)¶

setNexusSets
()[source]¶ Set self.nexusSets from var.nexusSets.
A deepcopy is made of var.nexusSets, only if it exists. If var.nexusSets does not yet exist, a new blank one is not made (cf this method in Alignment class, where it would be made).
Important! This method depends on a correct taxNames.

setPInvar
(partNum=0, free=0, val=0.0)¶

setPreAndPostOrder
()[source]¶ Sets or resets self.preOrder and self.postOrder lists of node numbers.
PreOrder starts from the root and goes to the tips; postOrder starts from the tips and goes to the root.

setRelRate
(partNum=0, val=0.0)¶

setTextDrawSymbol
(theSymbol='', node=None, clade=1)¶

simsForModelFitTests
(reps=10, seed=None)¶ Do simulations for model fit tests.
The model fit tests are the GoldmanCox test, and the tree and modelbased composition fit test. Both of those tests require simulations, optimization of the tree and model parameters on the simulated data, and extraction of statistics for use in the null distribution. So might as well do them together. The GoldmanCox test is not possible if there are any gaps or ambiguities, and in that case GoldmanCox simulation stats are not collected.
Doing the simulations is therefore the timeconsuming part, and so this method facilitates doing that job in sections. If you do that, set the random number seed to different numbers. If the seed is not set, the process id is used. (So obviously you should explicitly set the seed if you are doing several runs in the same process.) Perhaps you may want to do the simulations on different machines in a cluster. The stats are saved to files. The output files have the seed number attached to the end, so that different runs of this method will have different output file names. Hopefully.
When your model uses empirical comps, simulation uses the empirical comp of the original data for simulation (good), then the optimization part uses the empirical comp of the newlysimulated data (also good, I think). In that case, if it is treehomogeneous, the X^2_m statistic would be identical to the X^2 statistic.
You would follow this method with the modelFitTests() method, which uses all the stats files to make null distributions to assess significance of the same stats from self.

simulate
(calculatePatterns=True, resetSequences=True, resetNexusSetsConstantMask=True, refTree=None)¶ Simulate into the attached data.
The tree self needs to have a data and model attached.
Generation of random numbers uses the GSL random number generator. The state is held in var.gsl_rng, which is None by default. If you do a simulation using this method, it will use
var.gsl_rng
if it exists, or make it if it does not exist yet. When it makes it, it seeds the state based on the current time. That should give you lots of variation in the simulations.If on the other hand you want to make simulations that are the same you can reseed the randomizer with the same seed whenever you do it, like this:
if not var.gsl_rng: var.gsl_rng = pf.gsl_rng_get() # unusually, set the seed with each simulation mySeed = 23 # your chosen int seed pf.gsl_rng_set(var.gsl_rng, mySeed)
The usual way to simulate does not use reference data. An unusual way to simulate comes from (inspired by?) PhyloBayes, where the simulation is conditional on the original data. It uses conditional likelihoods of that reference data at the root. To turn that on, set refTree to the tree+model+data that you would like to use. Calculate a likelihood with that refTree before using it, so that conditional likelihoods are set. The tree and model for refTree should be identical to the tree and model for self.
 Parameters
calculatePatterns (bool) – True by default. Whether to “compress” the newly simulated data to facilitate a faster likelihood calculation.
resetSequences (bool) – True by default. whether to bring the simulated sequences in C back into Python
resetNexusSetsConstantMask (bool) – True by default. When simulations are made, the constant mask in any associated nexus sets will get out of sync. Setting this to True makes a new mask and sets it.
refTree (Tree) – None by default. If supplied, a tree+model+data which has had its likelihood calculated, where the tree+model is identical to self.

spr
(pruneNode=None, above=True, graftNode=None)¶ Subtree pruning and reconnection.
See also the Tree.randomSpr() method. It uses this method to do a random spr move.
This only works on fully bifurcating trees. Doing spr moves would tend to break up polytomies anyway; pruning subtrees from a polytomy would require creation of new nodes.
The subtree to be pruned might be pointing up or pointing down from a specified node. If the subtree is pointing up, the subtree to be pruned is specified by the appropriate child of the root of the subtree; the subtree would have a rootonastick (Is monofurcating a proper word?) with the subtree root’s single child being the specified node. If the subtree is pointing down, then the tree is rerooted to the specified node to allow pruning of the subtree, now above the specified node, with the specified node as the root, including the subtree with the prererooting parent of the specified node.
I’ll draw that out. Lets say we want to prune the subtree below node 2 in this tree. That would include nodes 0, 1, 2, and 7.
+1:A   +3:B 2 0  +5:C  +4  +6:D  +7:E
The way it is done in this method is to reroot at node 2, which is the specified node. Then the subtree including the prererooting parent of the specified node, ie node 0, is pruned.
+3:B   +5:C 24  +6:D   +1:A +0 +7:E

stripBrLens
()[source]¶ Sets all node.br.len’s to 0.1, the default in p4.
Then, if you were to write it out in Nexus or Newick format, no branch lengths would be printed.

subTreeIsFullyBifurcating
(theNode, up=True)¶ Is theNode and everything above it (or below it) bifurcating?
Arg up says whether its above or below.

summarizeModelComponentsNNodes
()¶ Summarize nNodes for all model components if isHet

tPickle
(fName=None)¶ Pickle self to a file with a ‘p4_tPickle’ suffix.
If there is an attached Data object, it is not pickled. If there is an attached model object, it is pickled. Pointers to cstructs are not pickled.
If fName is supplied, the file name becomes fName.p4_tPickle, unless fName already ends with .p4_tPickle. If fName is not supplied, self.name is used in fName’s place. If neither is supplied, the pid is used as fName.
If a file with the chosen name already exists, it is silently overwritten!
p4 can read a p4_tPickle file from the command line or using the read() function, as usual.
(This would not be a good option for longterm storage, because if you upgrade p4 and the p4 Classes change a lot then it may become impossible to unpickle it. If that happens, you can use the old version of p4 to unpickle.)

property
taxNames
¶ (property) taxNames

taxSetIsASplit
(taxSetName)[source]¶ Asks whether a nexus taxset is a split in the tree.
 Parameters
taxSetName (str) – The name of the taxset. Case does not matter.
 Returns
the node in self if the taxset is a split, or else None
 Return type

textDrawList
(showInternalNodeNames=1, addToBrLen=0.2, width=None, autoIncreaseWidth=True, showNodeNums=1, partNum=0, model=False)¶

topologyDistance
(tree2, metric='sd', resetSplitKeySet=False)[source]¶ Compares the topology of self with tree2.
The complete list of metrics is given in var.topologyDistanceMetrics
For most metrics using this method, taxNames needs to be set, to the same in the two trees. If the taxa differ, this method simply returns 1
The ‘metric’ can be one of ‘sd’ (symmetric difference), ‘wrf’ (weighted RobinsonFoulds), ‘bld’ (Felsenstein’s branch length distance), or ‘diffs’. The unweighted RobinsonFoulds metric would be the same as the symmetric difference.
There is also an experimental scqdist, but that needs the scqdist.so module, in the QDist directory.
See Felsenstein 2004 Inferring Phylogenies, Pg 529.
The default metric is the very simple ‘sd’, symmetric difference. Using this metric, if the 2 trees share the same set of splits, they are deemed to be the same topology; branch lengths are not compared. This method returns the number of splits that are in self that are not in tree2 plus the number of splits that are in tree2 that are not in self. So it would return 0 for trees that are the same.
The ‘wrf’ and ‘bld’ metrics take branch lengths into account. Bifurcating roots complicate things, so they are not allowed for weighted distance calculations.
In the unweighted case (ie metric=’sd’), whether the trees compared have bifurcating roots or not is ignored. So the trees (A,B,(C,D)) and ((A,B),(C,D)) will be deemed to have the same topology, since they have the same splits.
The measurement ‘diffs’, which returns a tuple of 2 numbers – both are set differences. The first is the number of splits in self that are not in tree2, and the second is the number of splits in tree2 that are not in self. (Consider it as the the symmetric difference split into its 2 parts.)
If you calculate a distance and then make a topology change, a subsequent sd topologyDistance calculation will be wrong, as it uses previous splits. So then you need to ‘resetSplitKeySet’.
The ‘scqdist’ metric calculates quartet distances. The code was written by Anders Kabell Kristensen for his Masters degree at Aarhus University, 2010. http://www.cs.au.dk/~dalko/thesis/ It has two versions – a pure Python version (that needs scipy) that I do not include here, and a fast C++ version, that I wrapped in python. Its speedy – the ‘sc’ in ‘scqdist’ is for ‘subcubic’, ie better than O(n^3).
I have also incorporated the tqDist v1.0.2 code, from 2014, also for quartet distance calculations, from Christian N. S. Pedersen and his group at BiRC in Aarhus. See https://userscs.au.dk/cstorm/software/tqdist/ 2014. It is available here via the ‘tqdist’ metric.

tv
()[source]¶ Tree Viewer. Show the tree in a gui window.
Needs Tkinter.
If you have nexus taxsets defined, you can show them.

tvTopologyCompare
(treeB)[source]¶ Graphically show topology differences.
The taxNames need to be set, and need to be the same for both trees.
(If the red lines don’t show up right away, try adjusting the size of the windows slightly.)

verifyIdentityWith
(otherTree, doSplitKeys=False, doMore=False)[source]¶ For MCMC debugging. Verifies that two trees are identical.

write
()¶ This writes out the Newick tree description to sys.stdout.

writeNewick
(fName=None, withTranslation=0, translationHash=None, doMcmcCommandComments=0, toString=False, append=False, spaceAfterComma=True)¶ Write the tree in Newick, aka Phylip, format.
This is done in a Nexusoriented way. If taxNames have spaces or odd characters, they are singlequoted. There is no restriction on the length of the taxon names. A translationHash can be used.
fName may also be an open file object.
If ‘toString’ is turned on, then ‘fName’ should be None, and a Newick representation of the tree is returned as a string.
The method ‘writePhylip()’ is the same as this, with fewer arguments.

writeNexus
(fName=None, append=0, writeTaxaBlockIfTaxNamesIsSet=1, message=None)¶ Write the tree out in Nexus format, in a trees block.
If fName is None, the default, it is written to sys.stdout.
#NEXUS is written unless we are appending– set append=1.
If you want to write with a translation, use a Trees object.

writePhylip
(fName=None, withTranslation=0, translationHash=None, doMcmcCommandComments=0)¶ Write the tree in Phylip or Newick format.
(This method is just a dupe of writeNewick(). Without the ‘toString’ or ‘append’ args.)
fName may also be an open file object.