StrucAlign
Short project - M2 Bioinformatics - Université de Paris
Subject: Design of a threading program based on double dynamic programming
Objective: Implement a program that reproduces the method described in reference 3, which is based on double dynamic programming; see reference 4 for more information. Threading (see references 1, 2 and 3) is a strategy for finding sequences that are compatible with a given structure. Only the α carbons of the protein are considered. The program uses the DOPE statistical potentials (data/dope.par).
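The idea of double dynamic programming can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in src/structure_align.py: the names `nw_score`, `double_dp` and `toy_energy` are hypothetical, `toy_energy` stands in for a DOPE lookup, scores are negated energies so that higher is better, and the high-level matrix stores the best constrained low-level score instead of accumulating scores along the optimal path as the published method does.

```python
import math


def nw_score(n, m, score, gap):
    """Best global alignment score of n residues against m positions.

    score(k, l) is the match score of residue k at position l; gap is a
    linear gap penalty (assumed to be <= 0).
    """
    prev = [l * gap for l in range(m + 1)]
    for k in range(1, n + 1):
        cur = [k * gap] + [0.0] * m
        for l in range(1, m + 1):
            cur[l] = max(prev[l - 1] + score(k - 1, l - 1),  # match
                         prev[l] + gap,                       # gap in template
                         cur[l - 1] + gap)                    # gap in sequence
        prev = cur
    return prev[m]


def double_dp(seq, coords, energy, gap=-1.0):
    """Return (high-level matrix, final score) for seq threaded onto coords."""
    n, m = len(seq), len(coords)
    high = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Low level: residue i is fixed at position j; every other
            # residue k is scored at position l relative to that pair.
            def low(k, l, i=i, j=j):
                return -energy(seq[i], seq[k], math.dist(coords[j], coords[l]))

            # Best path forced through (i, j): prefix block plus suffix block.
            before = nw_score(i, j, low, gap)
            after = nw_score(n - i - 1, m - j - 1,
                             lambda k, l: low(k + i + 1, l + j + 1), gap)
            high[i][j] = before + after
    # High level: ordinary global alignment on the accumulated scores.
    return high, nw_score(n, m, lambda k, l: high[k][l], gap)


def toy_energy(aa1, aa2, dist):
    """Hypothetical pair energy: identical residues closer than 8 Å are favourable."""
    return -1.0 if aa1 == aa2 and dist < 8.0 else 0.0


# Four Cα positions on a line, 3.8 Å apart, and a three-residue sequence.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
high, total = double_dp("ACA", coords, toy_energy, gap=-1.0)
print(total)
```

The sketch runs one low-level alignment per (residue, position) pair, so its cost grows as O(n²m²); this is presumably why the real program offers parallel execution.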
References:
1. Jones, D.T., Taylor, W.R. & Thornton, J.M. (1992) A new approach to protein fold recognition. Nature. 358, 86-89.
2. Jones, D.T., Miller, R.T. & Thornton, J.M. (1995) Successful protein fold recognition by optimal sequence threading validated by rigorous blind testing. Proteins. 23, 387-397.
3. Jones, D.T. (1998) THREADER: Protein Sequence Threading by Double Dynamic Programming. In: Computational Methods in Molecular Biology. Steven Salzberg, David Searls, and Simon Kasif, Eds. Elsevier Science. Chapter 13.
4. Protein Structure Comparison Using SAP. Springer.
Data set
- 1A6K: aquomet-myoglobin, atomic resolution. DOI: 10.2210/pdb1A6K/pdb
Installation
Requirements
StrucAlign environment
Clone StrucAlign repository.
% git clone https://git.clamifa.net/Naha/strucalign.git
Move into your local repository and create the conda environment from the struc_align_cgn.yml
file.
% conda env create -f struc_align_cgn.yml
Then activate it.
% conda activate struc_align_cgn
Usage
StrucAlign needs three types of input files:
- par: energy scores between pairs of amino acids as a function of distance
- pdb: Protein Data Bank file with all atoms and their three-dimensional positions
- fasta: amino acid sequence
By default, these files are expected in the data/
directory, but you can also provide files located anywhere on the system.
This project ships with a data set and some already parsed files in the result/ directory.
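To make the pdb and fasta inputs concrete, here is a minimal sketch of how they could be read. It is not the code in script/parsepdb.py: the function names and the `ATOM_FMT` helper are hypothetical, and the parser only relies on the fixed-column layout of PDB ATOM records (atom name in columns 13-16, residue name in 18-20, residue number in 23-26, coordinates in 31-54). Alternate locations and multiple chains are ignored.

```python
def read_fasta(text):
    """Return the first sequence found in a FASTA-formatted string."""
    seq, seen_header = [], False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seen_header:      # a second header starts another record
                break
            seen_header = True
        elif line:
            seq.append(line)
    return "".join(seq)


def read_ca_atoms(text):
    """Return (residue name, residue number, (x, y, z)) for every Cα atom."""
    atoms = []
    for line in text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[17:20], int(line[22:26]), xyz))
    return atoms


# Helper used only to build correctly aligned demo records.
ATOM_FMT = "ATOM  {:>5} {:<4} {:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}"

demo_pdb = "\n".join([
    ATOM_FMT.format(1, " N", "VAL", "A", 1, -2.9, 17.6, 15.5),
    ATOM_FMT.format(2, " CA", "VAL", "A", 1, -1.5, 17.0, 15.0),
    ATOM_FMT.format(3, " CA", "LEU", "A", 2, 1.2, 15.1, 13.4),
])
print(read_ca_atoms(demo_pdb))
print(read_fasta(">demo\nVL\n"))
```

Keeping only the Cα atoms matches the project statement that only the α carbons of the protein are considered.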
Parsing dope and pdb files
The script/ directory contains two Python scripts:
parsedope.py
parsepdb.py
Use these scripts to parse your input DOPE or PDB files.
Run each script with two arguments: -i (input file) and -o (output directory).
Examples:
% python script/parsedope.py -i data/dope.par -o result/
% python script/parsepdb.py -i data/1a6k.pdb -o result/
ℹ️ Info
The output file of the DOPE parsing script is named parsedope.par.
The output file of the PDB parsing script is named according to the temp_[pdb].pdb format, where [pdb] is the name of the input protein (pdb) file used to create the template.
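The -i/-o interface and the temp_[pdb].pdb naming rule can be sketched as follows. This is an illustration only: `build_parser` and `template_output_path` are hypothetical names, and the sketch assumes that [pdb] is simply the input file name without its extension.

```python
import argparse
from pathlib import Path


def build_parser():
    """Command line with the two arguments described above."""
    parser = argparse.ArgumentParser(description="Parse a pdb file (sketch)")
    parser.add_argument("-i", dest="input", required=True, help="input file")
    parser.add_argument("-o", dest="output", required=True,
                        help="output directory")
    return parser


def template_output_path(pdb_file, out_dir):
    """Build <out_dir>/temp_<pdb name>.pdb from the input file path."""
    return Path(out_dir) / f"temp_{Path(pdb_file).stem}.pdb"


args = build_parser().parse_args(["-i", "data/1a6k.pdb", "-o", "result/"])
print(template_output_path(args.input, args.output))
```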
Structure alignment
To compute the structure alignment for a given amino acid sequence and template structure, run the structure_align.py
Python script.
⚠️ Warning!
Run the script only from the project's root directory:
% python src/structure_align.py
This script accepts 7 optional arguments. If none are specified, it uses the default values listed below.
Options | Description | Default value |
---|---|---|
-t | Input template pdb file | result/temp_1ard_1.pdb |
-d | Input parsed par file (dope) | result/parsedope.par |
-s | Input fasta file (sequence) | data/1znf_1.fasta |
-g | Gap penalty (must be negative) | 0 |
-p | Enable parallel processing | True |
-c | Number of CPU cores to use | all cores |
-o | Output path of the generated alignment file | result/ |
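As an illustration of the options table, the following sketch declares the same seven options with `argparse`. It is an assumption about how such a command line could be written, not the parser in src/structure_align.py: the `dest` names and the `str_to_bool` helper are hypothetical, and "all cores" is approximated with `os.cpu_count()`.

```python
import argparse
import os


def str_to_bool(value):
    """Convert 'True'/'False' style strings, as used with -p, to a bool."""
    lowered = value.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {value!r}")


def build_parser():
    """Seven optional arguments with the defaults from the table."""
    p = argparse.ArgumentParser(description="Structure alignment (sketch)")
    p.add_argument("-t", dest="template", default="result/temp_1ard_1.pdb")
    p.add_argument("-d", dest="dope", default="result/parsedope.par")
    p.add_argument("-s", dest="sequence", default="data/1znf_1.fasta")
    p.add_argument("-g", dest="gap", type=float, default=0)
    p.add_argument("-p", dest="parallel", type=str_to_bool, default=True)
    p.add_argument("-c", dest="cores", type=int, default=os.cpu_count())
    p.add_argument("-o", dest="output", default="result/")
    return p


args = build_parser().parse_args(["-p", "False", "-c", "1"])
print(args.parallel, args.cores, args.template)
```

Passing `type=str_to_bool` is needed because a plain `type=bool` would turn the string "False" into `True`.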
ℹ️ Info
- For option -c (used when -p is True):
  - if 0: error
  - if greater than the number of CPU cores on the machine: the maximum is used
- By default, output files are named according to the align_[fasta]_[template].txt format, where [fasta] is the name of the input sequence (fasta) file and [template] is the name of the input template (pdb) file.
- If this script is run several times with the same structure and sequence, the file is overwritten.
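The rules above for -c and for the output file name can be written down as a short sketch. The function names are hypothetical, rejecting negative values is an assumption (the rule above only mentions 0), and [fasta] and [template] are assumed to be the input file names without their extensions.

```python
import os
from pathlib import Path


def resolve_cores(requested, available=None):
    """Apply the -c rules: 0 is an error, values above the maximum are capped."""
    available = available or os.cpu_count() or 1
    if requested is None:          # option not given: use all cores
        return available
    if requested < 1:              # 0 (and, by assumption, negatives) is invalid
        raise ValueError("the number of CPU cores must be at least 1")
    return min(requested, available)


def alignment_output_path(fasta_file, template_file, out_dir="result/"):
    """Build <out_dir>/align_<fasta>_<template>.txt from the input paths."""
    fasta = Path(fasta_file).stem
    template = Path(template_file).stem
    return Path(out_dir) / f"align_{fasta}_{template}.txt"


print(resolve_cores(64, available=4))
print(alignment_output_path("data/1znf_1.fasta", "result/temp_1ard_1.pdb"))
```

Because the name depends only on the two input files, a second run with the same inputs maps to the same path, which is consistent with the file being overwritten.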
Examples:
% # Run without arguments
% python src/structure_align.py
% # Run with only 1 CPU core and parallelization disabled
% python src/structure_align.py -p False -c 1
Tests
To use the test features, read
user-guide.ipynb
with Jupyter Lab, or open user-guide.html.
ℹ️ Info
To start Jupyter Lab, first activate the environment, then run:
% jupyter lab
Versions
- conda version : 4.10.3
- python version : 3.9.5.final.0
- [GCC 7.5.0] :: Anaconda, Inc. on linux
- pandas : 1.3.2
- scipy : 1.6.2
- jupyter lab : 3.1.7
- joblib : 1.0.1
Copyright
StrucAlign is licensed under the MIT License. You can find the complete text in LICENSE.md.