Sequence

class Sequence

A container for a single molecular sequence.

Attributes:

  • sequence: a string, the molecular sequence
  • name: a string, the name of the sequence
  • dataType: either ‘dna’, ‘protein’, or None, meaning ‘standard’

dump()

Print some information about self.

dupe()

Return a duplicate of self.

nChar
reverseComplement()

Convert self.sequence, a DNA sequence, to its reverse complement.

Ambiguity codes should be handled correctly.
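As an illustration of what reverse complementing with ambiguity codes involves, here is a minimal standalone sketch. It is not the code used by reverseComplement(); the function name reverse_complement and the lookup table are assumptions made for this example. The pairings follow the IUPAC nucleotide codes, where for instance r (a or g) complements to y (c or t).

```python
# Standalone sketch; NOT the implementation behind reverseComplement().
# IUPAC nucleotide codes and their complements.
_COMPLEMENT = {
    "a": "t", "c": "g", "g": "c", "t": "a",
    "r": "y", "y": "r",   # purine <-> pyrimidine
    "k": "m", "m": "k",   # keto <-> amino
    "b": "v", "v": "b",   # not-a <-> not-t
    "d": "h", "h": "d",   # not-c <-> not-g
    "s": "s", "w": "w",   # strong and weak are self-complementary
    "n": "n", "-": "-", "?": "?",
}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string, preserving case."""
    out = []
    for ch in reversed(dna):
        comp = _COMPLEMENT[ch.lower()]  # KeyError for characters not in the table
        out.append(comp.upper() if ch.isupper() else comp)
    return "".join(out)
```

For example, reverse_complement("aacr") walks the string backwards (r, c, a, a) and complements each character in turn, so the ambiguity code ends up at the start of the result.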

translate(transl_table=1, checkStarts=False, nnn_is_gap=False)

Returns a protein Sequence from self, a DNA sequence.

Self is translated using GeneticCode.GeneticCode.translate(), so ambiguities are handled. At present, translation is only possible in frame 123, i.e. the first sequence position must be the first position of a codon. The default transl_table is 1, the standard (or so-called universal) genetic code, but other tables can be chosen.
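The following standalone sketch shows one way that frame-123 translation with ambiguity handling can work. It is an assumption about the general approach, not the GeneticCode.GeneticCode.translate() code itself: an ambiguous codon is expanded to every codon it could stand for, and an amino acid is reported only if all expansions agree. The names translate_codon and translate_frame1, the use of 'X' for unresolved codons, and the error on lengths that are not a multiple of 3 are all choices made for this example.

```python
from itertools import product

BASES = "tcag"
# NCBI translation table 1 (the standard code), codons in tcag order.
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AAS[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

# IUPAC codes expanded to the bases they stand for.
AMBIG = {"a": "a", "c": "c", "g": "g", "t": "t",
         "r": "ag", "y": "ct", "s": "cg", "w": "at", "k": "gt", "m": "ac",
         "b": "cgt", "d": "agt", "h": "act", "v": "acg", "n": "acgt"}


def translate_codon(codon):
    """Translate one codon; return 'X' unless every expansion agrees."""
    codon = codon.lower()
    if codon == "---":
        return "-"
    try:
        choices = [AMBIG[ch] for ch in codon]
    except KeyError:
        return "X"  # unknown character; 'X' is this sketch's convention
    aas = {CODE["".join(p)] for p in product(*choices)}
    return aas.pop() if len(aas) == 1 else "X"


def translate_frame1(dna):
    """Translate a DNA string whose first position starts a codon."""
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(translate_codon(dna[i:i + 3])
                   for i in range(0, len(dna), 3))
```

Here 'ggn' resolves to G because all four ggX codons encode glycine, while 'atn' stays unresolved because ata, atc, and att encode I but atg encodes M.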

Available translation tables, at present:

if transl_table == 1: # standard
elif transl_table == 2: # vertebrate mito
elif transl_table == 4: # Mold, Protozoan,
                        # and Coelenterate Mitochondrial Code
                        # and the Mycoplasma/Spiroplasma Code
elif transl_table == 5: # invertebrate mito
elif transl_table == 9: # echinoderm mito

Tables 6, 10, 11, 12, 13, 14, and 21 are now available as well.

(These are found in GeneticCode.GeneticCode)
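To see why the choice of transl_table matters, the sketch below compares two of the tables listed above. It does not use GeneticCode.GeneticCode; it relies only on the compact amino-acid strings published with the NCBI genetic code tables for tables 1 and 2, and the helper name table_differences is an assumption made for this example.

```python
BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order, from the NCBI genetic code tables.
NCBI_AAS = {
    1: "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",  # standard
    2: "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG",  # vertebrate mito
}
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]


def table_differences(table_a, table_b):
    """Map each codon that differs to its (table_a, table_b) amino acids."""
    pairs = zip(CODONS, NCBI_AAS[table_a], NCBI_AAS[table_b])
    return {codon: (x, y) for codon, x, y in pairs if x != y}


diffs = table_differences(1, 2)
# TGA is a stop in table 1 but W in table 2; AGA and AGG are R in table 1
# but stops in table 2; ATA is I in table 1 but M in table 2.
```

A vertebrate mitochondrial gene translated with the default table would therefore show spurious stops at TGA codons and miss the real stops at AGA or AGG.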

See also Alignment.Alignment.checkTranslation().

If the arg checkStarts is turned on (it is off by default), this method checks whether the first codon is a start codon and, if it is, uses it.

Arg nnn_is_gap is for odd sequences with long stretches of ‘nnn’ codons, which probably should be gaps. It is probably best to correct those sequences by other means.
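One possible way to correct such sequences before translating is sketched below. The helper nnn_codons_to_gaps is hypothetical and not part of this class; it assumes frame 123 and only touches codons that are entirely ‘nnn’, leaving partially ambiguous codons alone.

```python
def nnn_codons_to_gaps(dna):
    """Replace every in-frame 'nnn' codon with '---' (frame 123 assumed)."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    # Only whole-codon runs of n are converted; a trailing partial codon
    # is passed through unchanged.
    return "".join("---" if c.lower() == "nnn" else c for c in codons)
```

The result could then be stored back in the sequence attribute before calling translate() with nnn_is_gap left at its default.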

write()
writeFasta(fName=None, width=60, doComment=True, writeExtraNewline=True)
writeFastaToOpenFile(flob, width=60, doComment=True, writeExtraNewline=True)