Pagine

giovedì 17 novembre 2011

Python implementation of DOT PLOT

Hello! This is a very simple script to compute the dot plot in order to compare two biological sequences.




#!/usr/bin/python


import sys


X=open(sys.argv[1],'r').readlines()
Y=open(sys.argv[2],'r').readlines()
seq1=''
seq2=''
for f in X[1:]:
        seq1=seq1+f.strip()
for f in Y[1:]:
        seq2=seq2+f.strip()
i=0
g=open('DP.dat','w')
for A in seq1:
        dot=''
        i=i+1
        e=0
        for B in seq2:
                e=e+1
                if A==B:
                        dot=dot+'/'
                        g.write(str(i)+' '+str(e)+'\n')
                else:
                        dot=dot+' '


The script takes as arguments two fasta sequences and generates a file named DP.dat that can be visualized with different tools.(Instruction to execution)
For example you can use gnuplot typing:


gnuplot


and than:


gnuplot> plot "DP.dat" with dots


You will obtain a plot like the following, where the numbers in the axis correspond to amino acid positions in the protein sequences.


Dop Plot obtained with GNUPLOT




Nessun commento:

Posta un commento