Mcmc

class Mcmc(aTree, nChains=4, runNum=0, sampleInterval=100, checkPointInterval=10000, simulate=None, writePrams=True, constraints=None, verbose=True, simTempNTemps=None, simTempMax=10.0)[source]

An MCMC for molecular sequences.

aTree

The tree should have a model and data attached.

nChains

The number of chains in the MCMCMC, default 4

runNum

You may want to do more than one ‘run’ in the same directory, to facilitate convergence testing. The first runNum would be 0, and samples, likelihoods, and checkPoints are written to files with that number.

sampleInterval

Interval at which the cold chain is sampled, including writing a tree, the logLike, and perhaps doing a simulation.

checkPointInterval

Intervals at which the MCMC is checkpointed, meaning that the whole thing is written to a pickle file. You can re-start from a checkpoint, eg in the event of a crash, or if you just want to make the MCMC run longer. You can turn off checkPointing by setting it to zero or None.

simulate

a ‘binary’ number from 1-31 that says what posterior predictive simulation test quantities to collect. This week, we have

  1. multinomial log likelihood

  2. bigXSquared

  1. meanNCharsPerSite

  1. c_m, the compStatFromCharFreqs (2 numbers per part–sim and curTree)

  1. constant sites count (not proportion)

So if you say simulate=1, you get multinomial log like. If you say simulate=2 you get bigXSquared. If you say simulate=6 you get bigXSquared and meanNCharsPerSite. If you say simulate=31 you get all 5.

writePrams

write changeable model parameters to a file.

constraints

If you want to constrain the topology, say so with a Constraints object here. A constraints object may enforce multiple and nested constraints.

To prepare for a run, instantiate an Mcmc object:

m = Mcmc(t, sampleInterval=1000, checkPointInterval=200000)

To start it running, do this:

m.run(1000000) # Tell it the number of generations to do

You will want to make the last generation land on a checkPointInterval. So in this case 1000000 is ok, but 900000 is not.

As it runs, it saves trees and likelihoods at sampleInterval intervals (actually whenever the current generation number is evenly divisible by the sampleInterval).

Whenever the current generation number is evenly divisible by the checkPointInterval it will write a checkPoint file. A checkPoint file is the whole MCMC without the data, and without C-structs. Using a checkPoint, you can re-start an Mcmc from the point you left off. Or, in the event of a crash, you can restart from the latest checkPoint. You can query checkPoints to get information about how the chain has been running, and about convergence diagnostics.

In order to restart the MCMC from the end of a previous run, you need the data:

read('yourData.nex')
d = Data()
# read the last checkPoint file
m = func.unPickleMcmc(0, d)  # runNum 0
m.run(20000)

Its that easy if your previous run finished properly. However, if your previous run has crashed and you want to restart it from a checkPoint, then you will need to repair the sample output files to remove samples that were taken after the last checkPoint, but before the crash. Fix the trees, likelihoods, prams, and sims. (You probably do not need to beware of confusing gen (eg 9999) and gen+1 (eg 10000) issues.) When you remove trees from the tree files to restart, each tree file should have the word end; at the end after the last tree, with a single return after that. Other files (prams and so on) would end more simply with a single return.

The checkPoints can help with convergence testing. To help with that, you can use the McmcCheckPointReader class. It will print out a table of average standard deviations of split supports between 2 runs, or between 2 checkPoints from the same run. It will print out tables of proposal acceptances to show whether they change over the course of the MCMC.

Another way to monitor convergence of split supports is to use the Trees method trackSplitSupports(). That method requires a reference tree to tell it the splits that it should track; an obvious reference tree to use is the consensus tree, but it need not be; and the reference tree need not be fully resolved. Here is how it might be used:

# read in all 1000 trees in the file
read('mcmc_trees_0.nex')
tt = Trees()
# Make a cons tree only from the last 500 trees
tt2 = Trees(trees=var.trees[500:])
tp = TreePartitions(tt2)
t = tp.consensus()
# Track the splits in the cons tree
tt.trackSplitsFromTree(t)

This makes a verbose output, and also writes the numbers to a file in case you want to use them later.

To make a consensus from the trees from more than one run, you can add trees to an existing TreePartitions object, like this:

tp = TreePartitions('mcmc_trees_0.nex', skip=500)
tp.read('mcmc_trees_1.nex', skip=500)   # add trees from a second run
t = tp.consensus()
t.reRoot(t.node('OutgroupTaxon').parent)
for n in t.iterInternalsNoRoot():
    n.name = '%.0f' % (100. * n.br.support)
t.writeNexus('cons.nex')
_doTimeCheck(nGensToDo, firstGen, genInterval)[source]

Time check

firstGen is the first generation of this call to Mcmc.run() else timing fails on restart

_makeChainsAndProposals()[source]

Make chains and proposals.

_makeProposals()[source]

Make proposals for the mcmc.

_refreshProposalProbs()[source]

Adjust proposals after a restart.

_setLogger()[source]

Make a logger.

_setOutputTreeFile()[source]

Setup the (output) tree file for the mcmc.

checkPoint()[source]
lpml(verbose=True)[source]

Calculate log pseudo marginal likelihood from CPO

run(nGensToDo, verbose=True, equiProbableProposals=False, writeSamples=True)[source]

Start the Mcmc running.

writeProposalAcceptances()[source]

Pretty-print the proposal acceptances.

writeProposalProbs(makeDict=False)[source]

(Another) Pretty-print the proposal probabilities.

See also Mcmc.writeProposalAcceptances().

writeSwapMatrix()[source]
writeSwapVector()[source]