
Wednesday, 26 October 2011

RNA to Protein

Today I would like to describe a script that, given an RNA sequence, returns the corresponding protein sequence.


First of all, you have to download this file, which contains the dictionary for the genetic code (to see how I created the file, follow this link).
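
In case the file is not at hand, here is a minimal sketch of how such a dictionary could be built. It assumes that the dictionary maps each RNA codon to a one-letter amino acid code and each stop codon to the string 'STOP', which is what the scripts in this post appear to expect. The function name build_genetic_code is my own and is only illustrative.

```python
import pickle
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G base order.
# '*' marks the three stop codons.
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

def build_genetic_code():
    """Return a dict mapping each of the 64 RNA codons to an amino acid or 'STOP'."""
    codons = [''.join(bases) for bases in product('UCAG', repeat=3)]
    return dict((codon, 'STOP' if aa == '*' else aa)
                for codon, aa in zip(codons, AMINO_ACIDS))

D = build_genetic_code()
print(D['AUG'] + ' ' + D['UGG'] + ' ' + D['UAA'])   # M W STOP

# The dictionary survives a pickle round trip (done in memory here).
assert pickle.loads(pickle.dumps(D)) == D
```

To produce a file like the one the scripts load, the dictionary could be written with pickle.dump(D, f), where f is 'genetic_code.dic' opened in 'wb' mode.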


Now copy the script into a file named script.py, in the same directory where you saved the dictionary, then run it.


The RNA sequence is passed as a command-line argument. Its characters may be lower-case or upper-case, because the upper() string method converts the whole string to upper-case characters.




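#!/usr/bin/python

import pickle
import sys

# Load the codon -> amino acid dictionary (binary mode is required on Python 3)
with open('genetic_code.dic', 'rb') as dictionary:
    D = pickle.load(dictionary)

# The RNA sequence is the first command-line argument
RNA = sys.argv[1].upper()
protein = ''
# Read the sequence codon by codon, ignoring an incomplete codon at the end
for i in range(0, len(RNA), 3):
    if i + 3 <= len(RNA):
        codon = RNA[i:i+3]
        protein = protein + D[codon]
print(protein)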




This script is very stupid! It is built from a purely computer-science point of view: starting from the first nucleotide, it reads the sequence three bases at a time and converts each codon into the corresponding amino acid.
A biologist, however, knows that translation starts at a specific codon in the RNA sequence (AUG, which codes for methionine) and ends at a stop codon (UAA, UAG or UGA).
So we can add a few lines to the script in order to take these rules of molecular biology into account.
We will simply find the position of the first AUG and the position of the first stop codon that follows it in the same reading frame, and treat them as the beginning and the end of the sequence to translate.
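
The search for these two positions can be sketched on its own, without the dictionary. This is only an illustration under the assumptions stated in the text (first AUG, then the first stop codon in the same frame); the function name find_orf and the example sequence are made up.

```python
STOP_CODONS = ('UAA', 'UAG', 'UGA')

def find_orf(rna):
    """Return (start, stop): the index of the first AUG and of the first
    in-frame stop codon after it. start is None when there is no AUG;
    stop is None when no in-frame stop codon follows the AUG."""
    rna = rna.upper()
    start = rna.find('AUG')
    if start == -1:
        return None, None
    # Step through complete codons only, in the frame set by the AUG
    for i in range(start, len(rna) - 2, 3):
        if rna[i:i+3] in STOP_CODONS:
            return start, i
    return start, None

print(find_orf('ggauggccuuuuaagg'))   # (2, 11)
```

In the example the sequence splits as GG AUG GCC UUU UAA GG, so the start codon begins at index 2 and the stop codon at index 11.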




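#!/usr/bin/python

import pickle
import sys

# Load the codon -> amino acid dictionary (binary mode is required on Python 3)
with open('genetic_code.dic', 'rb') as dictionary:
    D = pickle.load(dictionary)

RNA = sys.argv[1].upper()
if 'AUG' in RNA:
    s = RNA.index('AUG')      # position of the first start codon
    protein = '-' * s         # alignment track, padded up to the start codon
    PROTEIN = ''              # plain protein sequence
else:
    print(RNA)
    print('-' * len(RNA))
    print('AUG codon was not found')
    sys.exit()

# Translate codon by codon from the start codon to the first in-frame stop codon
for i in range(s, len(RNA), 3):
    if i + 3 <= len(RNA):
        codon = RNA[i:i+3]
        if D[codon] == 'STOP':
            protein = protein + '-*-'
            break
        else:
            protein = protein + '-' + D[codon] + '-'
            PROTEIN = PROTEIN + D[codon]
# Pad the track so that it is as long as the RNA sequence
end = '-' * (len(RNA) - len(protein))
print(RNA)
print(protein + end)
print('Protein sequence ---> ' + PROTEIN)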
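
To show what the output looks like without needing the dictionary file, here is a sketch of the same logic wrapped in a function. The small TABLE is a stand-in for the pickled dictionary and only covers the codons in the example (an assumption: one-letter codes, with 'STOP' for stop codons); the name align_translation is illustrative. Unlike the script, it returns an empty protein instead of exiting when no AUG is found.

```python
# Tiny stand-in for the pickled dictionary (illustrative, not the full genetic code)
TABLE = {'AUG': 'M', 'GCC': 'A', 'UUU': 'F',
         'UAA': 'STOP', 'UAG': 'STOP', 'UGA': 'STOP'}

def align_translation(rna, table=TABLE):
    """Return (track, protein): a track as long as the RNA, with each amino
    acid under the middle base of its codon, and the plain protein sequence."""
    rna = rna.upper()
    start = rna.find('AUG')
    if start == -1:
        return '-' * len(rna), ''
    track = '-' * start
    protein = ''
    for i in range(start, len(rna) - 2, 3):
        residue = table[rna[i:i+3]]
        if residue == 'STOP':
            track += '-*-'
            break
        track += '-' + residue + '-'
        protein += residue
    # Pad the track to the length of the RNA sequence
    track += '-' * (len(rna) - len(track))
    return track, protein

rna = 'ggauggccuuuuaagg'
track, protein = align_translation(rna)
print(rna.upper())                          # GGAUGGCCUUUUAAGG
print(track)                                # ---M--A--F--*---
print('Protein sequence ---> ' + protein)   # Protein sequence ---> MAF
```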
