Mcmc

class Mcmc(aTree, nChains=4, runNum=0, sampleInterval=100, checkPointInterval=10000, simulate=None, writePrams=True, constraints=None, verbose=True, tuningsFileName=None)

An MCMC for molecular sequences.

aTree
The tree should have a model and data attached.
nChains
The number of chains in the MCMCMC (Metropolis-coupled MCMC), default 4
runNum
You may want to do more than one ‘run’ in the same directory, to facilitate convergence testing (another idea stolen from MrBayes, so thanks to the authors). The first runNum would be 0, and samples, likelihoods, and checkPoints are written to files with that number.
sampleInterval
Interval at which the cold chain is sampled, including writing a tree, the logLike, and perhaps doing a simulation.
checkPointInterval
Interval at which the MCMC is checkPointed, meaning that the whole thing is written to a pickle file. You can re-start from a checkPoint, eg in the event of a crash, or if you just want to make the MCMC run longer. You can turn off checkPointing by setting this to zero or None.
simulate

a ‘binary’ number from 1-31 that says what posterior predictive simulation test quantities to collect. This week, we have

  1. multinomial log likelihood
  2. bigXSquared
  4. meanNCharsPerSite
  8. c_m, the compStatFromCharFreqs (2 numbers per part: sim and curTree)
  16. constant sites count (not proportion)

So if you say simulate=1, you get the multinomial log likelihood. If you say simulate=2, you get bigXSquared. If you say simulate=6, you get bigXSquared and meanNCharsPerSite (2 + 4 = 6). If you say simulate=31, you get all 5. See the sketch after this parameter list.

writePrams
Write changeable model parameters to a file.
constraints
If you want to constrain the topology, say so with a Constraints object here. A constraints object may enforce multiple and nested constraints.
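
As mentioned under simulate, the test quantities combine as a bitmask: add the values of the quantities you want. A minimal sketch, assuming t is a tree with a model and data attached:

# bigXSquared (2) plus meanNCharsPerSite (4) gives simulate=6
m = Mcmc(t, simulate=6)
# all five test quantities
m = Mcmc(t, simulate=31)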

To prepare for a run, instantiate an Mcmc object:

m = Mcmc(t, sampleInterval=1000, checkPointInterval=200000)

To start it running, do this:

m.run(1000000) # Tell it the number of generations to do

You will want the last generation to land on a checkPointInterval, that is, the number of generations should be evenly divisible by the checkPointInterval. So in this case 1000000 is ok (5 x 200000), but 900000 is not.
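
A quick check before a long run; this sketch assumes the interval is stored in a checkPointInterval attribute, so confirm that in your version:

nGens = 1000000
# the last gen should land on a checkPoint
assert nGens % m.checkPointInterval == 0
m.run(nGens)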

As it runs, it saves trees and likelihoods at sampleInterval intervals (actually whenever the current generation number is evenly divisible by the sampleInterval).

Whenever the current generation number is evenly divisible by the checkPointInterval it will write a checkPoint file. A checkPoint file is the whole MCMC without the data, and without C-structs. Using a checkPoint, you can re-start an Mcmc from the point you left off. Or, in the event of a crash, you can restart from the latest checkPoint. You can query checkPoints to get information about how the chain has been running, and about convergence diagnostics.

In order to restart the MCMC from the end of a previous run, you need the data:

read('yourData.nex')
d = Data()
# read the last checkPoint file
m = func.unPickleMcmc(0, d)  # runNum 0
m.run(20000)

It's that easy if your previous run finished properly. However, if your previous run crashed and you want to restart it from a checkPoint, then you will need to repair the sample output files, removing samples that were taken after the last checkPoint but before the crash. Fix the trees, likelihoods, prams, and sims. (You probably do not need to worry about confusing gen (eg 9999) with gen+1 (eg 10000).) When you remove trees from the tree files to restart, each tree file should have the word 'end;' after the last tree, with a single return after that. Other files (prams and so on) would end more simply, with a single return.
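Here is a minimal repair sketch for a tree file. It assumes sampled tree lines look like 'tree t_<gen> = ...;'; check that against your own output before using it, and treat the likelihoods, prams, and sims files similarly (they just lose their post-checkPoint lines, ending with a single return).

lastGoodGen = 200000   # the gen of the last checkPoint written before the crash
lines = open('mcmc_trees_0.nex').readlines()
kept = []
for line in lines:
    if line.startswith('tree t_'):   # assumed format of a sampled tree line
        gen = int(line.split()[1].split('_')[1])
        if gen > lastGoodGen:
            continue   # drop samples taken after the last checkPoint
    if line.strip() == 'end;':
        continue       # re-added below, after the last remaining tree
    kept.append(line)
f = open('mcmc_trees_0.nex', 'w')
f.writelines(kept)
f.write('end;\n')      # 'end;' with a single return after it
f.close()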

The checkPoints can help with convergence testing. To help with that, you can use the McmcCheckPointReader class. It will print out a table of average standard deviations of split supports between 2 runs, or between 2 checkPoints from the same run. It will print out tables of proposal acceptances to show whether they change over the course of the MCMC.
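
For example, something like this. The method names here are assumptions based on the description above, so check dir(cpr) in your version:

cpr = McmcCheckPointReader()    # reads the checkPoint files in this directory
cpr.compareSplitsAll()          # average std devs of split supports, all pairs of runs
cpr.writeProposalAcceptances()  # acceptance tables, checkPoint by checkPoint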

Another way to monitor convergence of split supports is to use the Trees method trackSplitsFromTree(). That method requires a reference tree to tell it which splits to track. An obvious reference tree to use is the consensus tree, but it need not be, and the reference tree need not be fully resolved. Here is how it might be used:

# read in all 1000 trees in the file
read('mcmc_trees_0.nex')
tt = Trees()
# Make a cons tree only from the last 500 trees
tt2 = Trees(trees=var.trees[500:])
tp = TreePartitions(tt2)
t = tp.consensus()
# Track the splits in the cons tree
tt.trackSplitsFromTree(t)

This makes a verbose output, and also writes the numbers to a file in case you want to use them later.

To make a consensus from the trees from more than one run, you can add trees to an existing TreePartitions object, like this:

tp = TreePartitions('mcmc_trees_0.nex', skip=500)
tp.read('mcmc_trees_1.nex', skip=500)   # add trees from a second run
t = tp.consensus()
t.reRoot(t.node('OutgroupTaxon').parent)
for n in t.iterInternalsNoRoot():
    n.name = '%.0f' % (100. * n.br.support)
t.writeNexus('cons.nex')

autoTune(gensPerProposal=500, verbose=True, giveUpAfter=10, writeTunings=True)

Attempt to tune the Mcmc automatically. A bit of a hack.

Here we let the Mcmc run for a while, and then examine the proposal acceptances to see if they are ok, and if they are not, make adjustments and do it again.

We let the chain run in cycles, each lasting the number of proposals times gensPerProposal (default 500) gens. The proposals are made randomly but with expected equal frequency, to counter the effects that common and rare proposals might otherwise have. At the end of a cycle, the proposals that are affected by tunings are examined to see if the acceptances are within range. This week, I am aiming for acceptances from 10-70% (advice from the authors of MrBayes; thanks again), but to be safe I make adjustments if the acceptances are outside 15-60%. (Although I accept acceptances as low as 5% for local.) If adjustments are made, then another cycle is done. When all the acceptances are ok, the operation stops. If 'giveUpAfter' cycles (by default 10) complete without getting it right, it gives up.

It is complicated a little because for tree-hetero models, some proposals use the same tuning. For example, if there is more than one rMatrix in a given partition, all the rMatrix proposals in that partition will use the same tuning. Sometimes you might see one rMatrix with too low an acceptance rate and its neighbor with too high an acceptance rate; in that case the tuning is left alone.

The chainTemp is also tested. I test the acceptance based in part on the acceptances between adjacent-temperature chains, that is, between the cold chain and the first heated chain, the first heated chain and the second heated chain, and so on. I call that the 'diagonal acceptances', a vector nChains - 1 long, and I use its average. This week, I am saying that if the average diagonal acceptance is low (less than 50%) and the exchanges between the cold chain and the hottest chain are accepted less than 1% of the time, then the temperature is too high. However, if the average diagonal acceptance is high (more than 70%), then I judge the temperature to be too low.
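
To illustrate the arithmetic with made-up numbers (this is not p4 code, just the rule described above):

# nChains=4 gives 3 adjacent-temperature pairs
diagonalAcceptances = [0.62, 0.48, 0.55]
avgDiag = sum(diagonalAcceptances) / len(diagonalAcceptances)
coldToHottest = 0.005   # acceptance between the cold and hottest chains
tempTooHigh = avgDiag < 0.50 and coldToHottest < 0.01
tempTooLow = avgDiag > 0.70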

(NOTE that chain temp tuning is turned off for the moment, due to problems encountered with very low temps.)

It is a bit of a hack, so you might see a tuning adjusted on one cycle, and then that adjustment is reversed on another cycle.

If you follow this method directly with Mcmc.run(), it uses the chains in the state they are left in by this method, which might save some burn-in time.
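
For example, a minimal sketch (the intervals here are arbitrary):

m = Mcmc(t, nChains=4, runNum=0, sampleInterval=1000, checkPointInterval=250000)
m.autoTune()     # tune first ...
m.run(1000000)   # ... then run, continuing from the tuned chains' state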

News: you can pickle the tunings from this method by turning on the arg writeTunings, which writes a pickle file. The name of the pickle file incorporates the runNum, eg mcmc_tunings_0.pickle for runNum 0. You can then apply the autoTune tuning values to another Mcmc, like this:

# Start an Mcmc with runNum=1, having previously done one with runNum=0
m = Mcmc(t, nChains=1, runNum=1, sampleInterval=10, checkPointInterval=2000)

# autoTune() was done before on runNum 0 and pickled, so no autoTune this time
#m.autoTune()
m.tunings.dump()  # see the default tunings for comparison

# apply the saved runNum 0 tunings to this run
import cPickle
f = open('mcmc_tunings_0.pickle', 'rb')   # binary mode for reading a pickle
theTunings = cPickle.load(f)
f.close()
m.tunings = theTunings
m.tunings.dump()  # see the autoTune()'d tunings

run(nGensToDo, verbose=True)

Start the Mcmc running.

writeProposalAcceptances()

Pretty-print the proposal acceptances.

writeProposalIntendedProbs()

Tabulate the intended proposal probabilities.

writeProposalProbs()

(Another) Pretty-print the proposal probabilities.

See also Mcmc.writeProposalAcceptances().

writeSwapMatrix()

Pretty-print the matrix of proposed and accepted state swaps between the MCMCMC chains at different temperatures.