This page contains supplemental data, software, and information for:
"Analysis of protein intermolecular interactions with MAFFT-DASH"; Rozewicki, et al.; Methods in Molecular Biology - Multiple Sequence Alignment; 2020
The Python 3 scripts on this page may behave slightly differently from how they are described in the book chapter; any such differences are noted in each script's documentation. The online tools referenced in the chapter may also behave differently now than they did when the chapter was written, so the specific outputs from those tools that were used in the chapter are available for download.
Please contact us with any questions or bug reports you may have.
Scripts on this page are Copyright © 2020, Department of Genome Informatics, Research Institute for Microbial Diseases, Osaka University.
They are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
We have tried to make the scripts on this page compatible with as many platforms as possible. They have been tested on Windows, macOS, and Linux.
We recommend Conda for managing software environments. A Conda dependencies file is available for users who wish to clone the specific software environment these scripts were developed under.
The protein sequence used for the N4BP1 NYN domain is available below. A FASTA version is also available.
>NYN_N4BP1_615_775
TDLKHIVIDGSNVAITHGLKKFFSCRGIAIAVEYFWKLGNRNITVFVPQWRTRRDPNVTEQHFLTQLQELGILSLTPAR
MVFGERIASHDDRFLLHLADKTGGIIVTNDNFREFVNESVSWREIITKRLLQYTFVGDIFMVPDDPLGRSGPRLEEFLQ
KEV
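If you copy the sequence above by hand, a quick sanity check can catch stray spaces or dropped residues. The following is a minimal sketch using only the Python standard library: it parses the record as FASTA and compares the sequence length with the residue range in the header. Reading "_615_775" as the first and last residue numbers (inclusive) is an assumption based on the header name.

```python
# Minimal FASTA reader (standard library only) for sanity-checking the NYN
# domain sequence. ASSUMPTION: the trailing "_615_775" in the header gives the
# first and last residue numbers of the N4BP1 fragment (inclusive).
FASTA_TEXT = """>NYN_N4BP1_615_775
TDLKHIVIDGSNVAITHGLKKFFSCRGIAIAVEYFWKLGNRNITVFVPQWRTRRDPNVTEQHFLTQLQELGILSLTPAR
MVFGERIASHDDRFLLHLADKTGGIIVTNDNFREFVNESVSWREIITKRLLQYTFVGDIFMVPDDPLGRSGPRLEEFLQ
KEV
"""


def read_fasta(text):
    """Return a dict mapping record IDs to their concatenated sequences."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            current = line[1:].split()[0]  # record ID = first word of the header
            records[current] = []
        elif current is not None:
            records[current].append(line)  # a sequence may span several lines
    return {rid: "".join(parts) for rid, parts in records.items()}


def expected_length(record_id):
    """Length implied by a header ending in _<start>_<end> (inclusive range)."""
    start, end = (int(x) for x in record_id.rsplit("_", 2)[1:])
    return end - start + 1


if __name__ == "__main__":
    for rid, seq in read_fasta(FASTA_TEXT).items():
        print(rid, len(seq), expected_length(rid))
```

Run as a script, it prints the record ID followed by the actual and expected lengths; for this sequence both are 161.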
A model was prepared from the above sequence using Spanner. You may download the specific model we used for this chapter here.
get_interactions.py is a script for overlaying DASH alignments onto specific DASH domains (or Search by Structure queries). This can be useful for investigating the "environment" (other proteins, nucleotides, HETATM groups, water, etc.) present in structurally homologous entries from the PDB.
This script requires only Python 3 and Biopython to be installed.
python3 get_interactions.py --help
python3 get_interactions.py -qd 3V33_A_01
python3 get_interactions.py -qd dash503929320
python3 get_interactions.py -qd dash503929320 -filter-score 20 -nuc -o out-nuc.cif.gz
python3 get_interactions.py -qd dash503929320 -filter-score 20 -nuc -prot -o out-nuc-prot.cif.gz
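To repeat these runs from Python, for example with several score cut-offs, a small wrapper can assemble the command lines. This is a hypothetical helper, not part of the distributed scripts. It only uses the options that appear in the examples above (-qd, -filter-score, -nuc, -prot, -o); python3 get_interactions.py --help remains the authoritative list of options.

```python
# Hypothetical wrapper around get_interactions.py. It only emits the options
# used in the documented examples; check "--help" for the full option list.
import subprocess
import sys


def build_get_interactions_cmd(query, filter_score=None, nuc=False,
                               prot=False, out=None,
                               script="get_interactions.py"):
    """Return the argument list for one get_interactions.py run."""
    cmd = [sys.executable, script, "-qd", query]  # same interpreter as this process
    if filter_score is not None:
        cmd += ["-filter-score", str(filter_score)]
    if nuc:
        cmd.append("-nuc")
    if prot:
        cmd.append("-prot")
    if out is not None:
        cmd += ["-o", out]
    return cmd


def run_get_interactions(query, **options):
    """Run the script; raises subprocess.CalledProcessError on a non-zero exit."""
    cmd = build_get_interactions_cmd(query, **options)
    return subprocess.run(cmd, check=True)
```

For example, looping over a few cut-offs and calling run_get_interactions("dash503929320", filter_score=score, nuc=True, prot=True, out=f"out-nuc-prot-{score}.cif.gz") would repeat the last command above once per cut-off, writing each result to its own file. The cut-off values are up to you.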
calculate_conservation.py is a script for visualizing protein residue conservation. Its output can be loaded alongside output from get_interactions.py in PyMOL to make the visualization more informative.
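This page does not describe how calculate_conservation.py scores conservation. As a rough illustration of what a per-residue conservation score can look like, the sketch below computes a common textbook measure: one minus the Shannon entropy of each alignment column, normalized by log2(20). Gaps are ignored. This is NOT necessarily the method the script uses.

```python
# Generic per-column conservation score for a multiple sequence alignment:
# 1 - H / log2(20), where H is the Shannon entropy (bits) of the residues in
# the column, gaps ignored. Illustrative only; not necessarily the scoring
# used by calculate_conservation.py.
import math
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Return one score per column (1.0 = fully conserved)."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)  # uniform distribution over 20 amino acids
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != gap]
        if not residues:
            scores.append(0.0)  # all-gap column: treat as unconserved
            continue
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(residues).values())
        scores.append(1.0 - entropy / max_entropy)
    return scores
```

For example, column_conservation(["ACD", "ACE", "A-F"]) scores the first two columns as fully conserved (1.0) and the third, which contains three different residues, much lower.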
This script requires Python 3, Biopython, and MAFFT to be installed.
When using this script on Windows, you must pass the -mafft option with the full path to mafft.bat on your system.
The -xml option lets you supply XML-formatted BLAST+ results manually. This is useful on systems without internet access, or when the default BLAST+ search performed by the script is insufficient.
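Before passing a file to -xml, it can help to check which hits it contains. BLAST+ writes this XML format when run with -outfmt 5. Which database and search parameters the script uses by default is not described here, so choose them to suit your own search. The sketch below uses only the standard library and reads the standard NCBI BLAST XML elements (Hit, Hit_accession, Hit_def, Hsp, Hsp_evalue).

```python
# Summarize an XML-formatted BLAST+ result (blastp ... -outfmt 5) with the
# standard library, e.g. to inspect a file before using it with -xml.
import xml.etree.ElementTree as ET


def summarize_blast_root(root):
    """Return (accession, description, best e-value) for every Hit element."""
    hits = []
    for hit in root.iter("Hit"):
        evalues = [float(hsp.findtext("Hsp_evalue")) for hsp in hit.iter("Hsp")]
        hits.append((
            hit.findtext("Hit_accession"),
            hit.findtext("Hit_def"),
            min(evalues) if evalues else None,  # best (lowest) e-value per hit
        ))
    return hits


def summarize_blast_xml(path):
    """Parse a BLAST XML file from disk and summarize its hits."""
    return summarize_blast_root(ET.parse(path).getroot())
```

For example, summarize_blast_xml("n4bp1-blast.xml")[:10] lists the first ten hits in the file used in the example command below.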
python3 calculate_conservation.py --help
python3 calculate_conservation.py -i n4bp1-spanner-model.pdb -o n4bp1_conc.cif.gz
python3 calculate_conservation.py -i n4bp1-spanner-model.pdb -xml n4bp1-blast.xml -o n4bp1_conc.cif.gz
python3 calculate_conservation.py -mafft C:\Users\user\Documents\mafft-7.450-win64-signed\mafft-win\mafft.bat -i n4bp1-spanner-model.pdb -o n4bp1_conc.cif.gz
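Because Windows requires the -mafft path, a script that must run on several platforms may want to locate MAFFT itself. The hypothetical helper below looks for mafft.bat (Windows) or mafft (elsewhere) on PATH with shutil.which. It then assembles a command from the -mafft, -i, -xml, and -o options shown in the examples above. Outside Windows, -mafft is added only when you pass a path explicitly. Whether the script accepts -mafft on other platforms is not documented here, so treat that as an assumption.

```python
# Hypothetical helper: locate MAFFT and build a calculate_conservation.py
# command from the options shown in the documented examples.
import os
import shutil
import sys


def find_mafft():
    """Return the full path to MAFFT if it is on PATH, else None."""
    name = "mafft.bat" if os.name == "nt" else "mafft"
    return shutil.which(name)


def build_conservation_cmd(model, out, xml=None, mafft=None,
                           script="calculate_conservation.py"):
    """Return the argument list for one calculate_conservation.py run."""
    cmd = [sys.executable, script]
    if mafft is None and os.name == "nt":
        # On Windows the script needs the full path to mafft.bat.
        mafft = find_mafft()
        if mafft is None:
            raise FileNotFoundError("mafft.bat not found on PATH; pass mafft=<full path>")
    if mafft is not None:
        cmd += ["-mafft", mafft]
    cmd += ["-i", model]
    if xml is not None:
        cmd += ["-xml", xml]  # use precomputed BLAST+ XML instead of searching online
    cmd += ["-o", out]
    return cmd
```

The resulting list can be passed to subprocess.run(cmd, check=True), which raises an error if the script exits with a non-zero status.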
Because DASH, Spanner, and the PDB are constantly being updated and improved, we cannot guarantee that users who run these same queries in the future will get results identical to ours. For this reason, we have made the results used in this chapter available for download.
Alignment score information from the Search by Structure job that was used in this chapter is available as a TSV. The alignments themselves are available here.
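The score TSV can be filtered with the Python standard library, for example to list the highest-scoring entries. In the sketch below, the column names "entry" and "score" are HYPOTHETICAL placeholders, because the file's header is not described on this page. Check the first row of the downloaded file and pass the real names. Treating this score as the one that -filter-score applies to is likewise an assumption.

```python
# Filter the Search by Structure score TSV. The default column names
# ("entry", "score") are HYPOTHETICAL; replace them with the file's real headers.
import csv


def entries_above(tsv_lines, min_score, id_col="entry", score_col="score"):
    """Return (id, score) pairs with score >= min_score, highest first."""
    reader = csv.DictReader(tsv_lines, delimiter="\t")
    kept = []
    for row in reader:
        score = float(row[score_col])
        if score >= min_score:
            kept.append((row[id_col], score))
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

To use it on the downloaded file, open the file with open(path, newline="") and pass the file object as tsv_lines together with a cut-off such as 20.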
We are always looking to improve DASH based on user feedback. Please contact us with any bug reports or feedback you may have.