SequenceList

class SequenceList(flob=None)

A container for a list of Sequence objects.

The usual input would be a fasta file:

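```python
from p4 import *

read('sequences.fas')
# If the sequences could not be made into an Alignment (for example,
# they are unaligned and differ in length), p4 puts them here
sl = var.sequenceLists.pop()

# see what you have
sl.dump()

# look at the sequences
for s in sl.sequences:
    print(s.name, s.dataType)

# Get at sequences by name from a dictionary
sl.makeSequenceForNameDict()
s = sl.sequenceForNameDict['mammoth']

# align them using muscle (the muscle program must be installed)
a = sl.muscle()
```
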
alignment()

Make self into an alignment, and return it.

If all the sequences are the same length and type, then self, a SequenceList, could be an Alignment. This method generates an Alignment instance, runs the Alignment method checkLengthsAndTypes(), and returns the Alignment.

If you feed p4 a fasta file, it makes a SequenceList object and runs this method on it. If that works, p4 puts the resulting Alignment object in var.alignments; if not, it puts the SequenceList object in var.sequenceLists.
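
The decision described above comes down to a simple check on lengths and data types. The sketch below is not p4's code: `Seq` and `could_be_alignment` are hypothetical names, and in p4 the real check is done by the Alignment method checkLengthsAndTypes().

```python
from dataclasses import dataclass


@dataclass
class Seq:
    # Hypothetical stand-in for a p4 Sequence: a name, a data type and the residues
    name: str
    dataType: str
    sequence: str


def could_be_alignment(seqs):
    """Return True if every sequence has the same length and the same data type."""
    if not seqs:
        return False
    lengths = {len(s.sequence) for s in seqs}
    types = {s.dataType for s in seqs}
    return len(lengths) == 1 and len(types) == 1


aligned = [Seq('a', 'dna', 'ACGT-A'), Seq('b', 'dna', 'AC-TTA')]
unaligned = [Seq('a', 'dna', 'ACGTA'), Seq('b', 'dna', 'ACTTTAG')]
print(could_be_alignment(aligned))    # True
print(could_be_alignment(unaligned))  # False
```

In p4 terms, the first list would end up in var.alignments and the second in var.sequenceLists.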

p4 might decide that some short sequences are DNA when they are really protein. In that case making an alignment will fail, because the sequences fail the types check. You can set the data type yourself and try again, something like this:

```python
sl = var.sequenceLists[0]
# Override p4's guess: mark every sequence as protein
for s in sl.sequences:
    s.dataType = 'protein'
a = sl.alignment()
```

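
To see why short sequences can fool a type check, consider a deliberately naive guesser. This is a hypothetical illustration only: `naive_guess_data_type` is not part of p4, and p4's own rule for guessing data types is not described here.

```python
def naive_guess_data_type(residues):
    """Hypothetical heuristic (not p4's): call it DNA if every character
    is a nucleotide symbol or a gap, otherwise call it protein."""
    nucleotides = set('ACGTN-')
    if set(residues.upper()) <= nucleotides:
        return 'dna'
    return 'protein'


# The peptide Cys-Ala-Thr-Thr-Ala-Gly uses only letters that are also
# nucleotide symbols, so the naive guess wrongly calls it DNA.
print(naive_guess_data_type('CATTAG'))  # dna
print(naive_guess_data_type('MKVLAT'))  # protein
```

A longer protein sequence is very likely to contain a letter such as E, F or L that rules out DNA, which is why the problem mostly affects short sequences.
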
checkNamesForDupes()

clustalo()

Do an alignment with clustalo.

It's all done in memory; no files are written.

An Alignment object is returned.

The order of the sequences in the new alignment is made to be the same as the order in self.
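
An aligner may emit the sequences in an order of its own, so restoring the input order is a separate step. The sketch below shows one way to do it on plain (name, residues) pairs; `reorder_like_input` is a hypothetical helper, not p4's implementation.

```python
def reorder_like_input(input_names, aligned):
    """Reorder (name, aligned_residues) pairs so that they follow input_names."""
    by_name = dict(aligned)
    # Refuse to guess if names are missing, extra, or duplicated
    if sorted(by_name) != sorted(input_names) or len(by_name) != len(aligned):
        raise ValueError("aligned names do not match the input names")
    return [(name, by_name[name]) for name in input_names]


original = ['mammoth', 'elephant', 'hyrax']
from_aligner = [('hyrax', 'AC-GT'), ('mammoth', 'ACTGT'), ('elephant', 'ACTG-')]
print(reorder_like_input(original, from_aligner))
# [('mammoth', 'ACTGT'), ('elephant', 'ACTG-'), ('hyrax', 'AC-GT')]
```

Keying by name relies on the names being unique, which is the kind of problem a duplicate-name check guards against.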

dump()
fName

The name of the file the sequences were read from, if they came from a file with a name.

makeSequenceForNameDict()

muscle()

Do an alignment with muscle.

It's all done in memory; no files are written.

An Alignment object is returned.

The order of the sequences in the new alignment is made to be the same as the order in self.
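
One way to run an external aligner without writing files is to pipe fasta text into the program's standard input and read the aligned result from its standard output. This is a sketch, not p4's implementation: `align_in_memory` is a hypothetical helper, and because the flags that make muscle or clustalo read stdin and write stdout depend on the program and version, a stand-in command that simply echoes its input is used so the example runs anywhere.

```python
import subprocess
import sys


def align_in_memory(fasta_text, command):
    """Pipe fasta text to an external program's stdin and return its stdout.

    Nothing is written to disk. `command` is the full argument list for
    the aligner; choosing the right flags is left to the caller.
    """
    result = subprocess.run(
        command,
        input=fasta_text.encode('ascii'),
        capture_output=True,
        check=True,  # raise CalledProcessError if the program fails
    )
    return result.stdout.decode('ascii')


# Stand-in "aligner" that copies stdin to stdout unchanged
echo = [sys.executable, '-c',
        'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']
out = align_in_memory('>a\nACGT\n>b\nACGA\n', echo)
print(out)
```

With a real aligner, the output would then be parsed and reordered to match the input, as described above.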

renameForPhylip(dictFName='p4_renameForPhylip_dict.py')

Rename the sequences with short, boring names that are safe for strict Phylip format.

It saves the old names, together with the new ones, in a Python dictionary in a file, by default named p4_renameForPhylip_dict.py.
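
The idea can be sketched as: give each sequence a short generated name (strict Phylip allows at most 10 characters) and save the new-to-old mapping as a Python dictionary literal. This is not p4's code; `rename_for_phylip`, the `s0`, `s1`, ... naming scheme and the `renameDict` variable in the file are all hypothetical.

```python
import os
import tempfile


def rename_for_phylip(names, dict_fname):
    """Hypothetical sketch: replace names with s0, s1, ... and save the
    new->old mapping as a Python dict literal in dict_fname."""
    mapping = {}
    new_names = []
    for i, old in enumerate(names):
        new = 's%i' % i
        mapping[new] = old
        new_names.append(new)
    with open(dict_fname, 'w') as f:
        f.write(f'renameDict = {mapping!r}\n')
    return new_names


fname = os.path.join(tempfile.mkdtemp(), 'p4_renameForPhylip_dict.py')
new = rename_for_phylip(['Homo sapiens', 'Mammuthus primigenius'], fname)
print(new)  # ['s0', 's1']
```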

restoreNamesFromRenameForPhylip(dictFName='p4_renameForPhylip_dict.py')

Given the dictionary file, restore the original names.

The dictionary file is by default named p4_renameForPhylip_dict.py.
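
Restoring is the reverse lookup. The sketch assumes a hypothetical file layout, a single line `renameDict = {...}` mapping short names to original names, which may differ from what p4 actually writes; `restore_names` is likewise a hypothetical helper. It parses the dictionary with `ast.literal_eval` rather than executing the file.

```python
import ast
import os
import tempfile


def restore_names(short_names, dict_fname):
    """Map short names back to the originals using a saved dict file."""
    with open(dict_fname) as f:
        text = f.read()
    # Take the dict literal to the right of the first '=' and parse it safely
    mapping = ast.literal_eval(text.split('=', 1)[1].strip())
    return [mapping[n] for n in short_names]


# Write a small dictionary file in the assumed layout, then restore from it
fname = os.path.join(tempfile.mkdtemp(), 'p4_renameForPhylip_dict.py')
with open(fname, 'w') as f:
    f.write("renameDict = {'s0': 'Homo sapiens', 's1': 'Mammuthus primigenius'}\n")
print(restore_names(['s1', 's0'], fname))
# ['Mammuthus primigenius', 'Homo sapiens']
```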

sequenceForNameDict

A dictionary that lets you find Sequence objects by their Sequence.name. It is made by makeSequenceForNameDict().
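
Such a lookup amounts to a dictionary keyed by name. The sketch below is hypothetical (`Seq` and `make_sequence_for_name_dict` are illustrative names, not p4's); it raises on duplicate names because a dict holds only one entry per key, a design choice p4 itself may or may not make.

```python
from dataclasses import dataclass


@dataclass
class Seq:
    # Hypothetical stand-in for a p4 Sequence object
    name: str
    sequence: str


def make_sequence_for_name_dict(seqs):
    """Map each sequence name to its sequence object, refusing duplicates."""
    d = {}
    for s in seqs:
        if s.name in d:
            # A plain dict would otherwise silently keep only the last one
            raise ValueError('duplicate sequence name: %r' % s.name)
        d[s.name] = s
    return d


seqs = [Seq('mammoth', 'ACGT'), Seq('elephant', 'ACGA')]
d = make_sequence_for_name_dict(seqs)
print(d['mammoth'].sequence)  # ACGT
```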

sequences

A list of Sequence objects.

writeFasta(fName=None, comment=1, width=60, append=0, seqNum=None, writeExtraNewline=True)

Write out the sequences in Fasta format.

This writes to stdout by default, or to a file name, or to an open file-like object, e.g. a StringIO object.

The sequences may have comments, which are written by default. If you don't want comments, say comment=None.

By default, sequences are wrapped when they are too long; width sets the line length at which they are wrapped. Set width=0 if you want each sequence on one (long) line.

If seqNum=None, the default, all the sequences are written. You can also write just one sequence, given by its number. Use append to add to an existing file, for example to write a bunch of individual sequences to the same file.

By default, a blank line will be written after each sequence. If you prefer your fasta without these extra lines, say writeExtraNewline=False.
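
The effect of the comment, width and extra-newline options can be sketched with a small writer. This is not p4's implementation: `Seq` and `write_fasta` are hypothetical, the parameter names are snake_case stand-ins, and the sketch handles only file-like objects (or stdout), not file names, append or seqNum.

```python
import io
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class Seq:
    # Hypothetical stand-in for a p4 Sequence
    name: str
    sequence: str
    comment: Optional[str] = None


def write_fasta(seqs, out=None, comment=True, width=60, write_extra_newline=True):
    """Write seqs in fasta format to the file-like object `out`, or to stdout."""
    if out is None:
        out = sys.stdout
    for s in seqs:
        header = '>' + s.name
        if comment and s.comment:
            header += ' ' + s.comment
        out.write(header + '\n')
        if width > 0:
            # Wrap the residues into lines of at most `width` characters
            for i in range(0, len(s.sequence), width):
                out.write(s.sequence[i:i + width] + '\n')
        else:
            out.write(s.sequence + '\n')
        if write_extra_newline:
            out.write('\n')


buf = io.StringIO()
write_fasta([Seq('mammoth', 'ACGTACGTAC', 'woolly')], out=buf, width=4,
            write_extra_newline=False)
print(buf.getvalue())
```

With p4 itself, the documented options are passed directly, for example sl.writeFasta('out.fas', width=0, writeExtraNewline=False).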

writeFastaToBytesFlob(flob)

For subprocesses, e.g. muscle and clustalo.

Writes no comments and no extra newline, with width 60.
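
Pipes to a subprocess carry bytes, which is presumably why this method writes to a bytes flob. The sketch below reproduces the stated format (no comments, no extra newline, width 60) into an `io.BytesIO`; `write_fasta_to_bytes` is a hypothetical helper working on plain (name, residues) pairs, not p4's method.

```python
import io


def write_fasta_to_bytes(records, flob, width=60):
    """Write (name, residues) pairs as fasta into a binary file-like object:
    no comments, no blank line between records, wrapped at `width`."""
    for name, residues in records:
        flob.write(b'>' + name.encode('utf-8') + b'\n')
        for i in range(0, len(residues), width):
            flob.write(residues[i:i + width].encode('utf-8') + b'\n')


flob = io.BytesIO()
write_fasta_to_bytes([('a', 'A' * 70), ('b', 'ACGT')], flob)
data = flob.getvalue()
```

The resulting bytes can be handed straight to a subprocess, for example as the input argument of subprocess.run.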