Overview

This page contains supplemental data, software, and information for:

"Analysis of protein intermolecular interactions with MAFFT-DASH"; Rozewicki, et al.; Methods in Molecular Biology - Multiple Sequence Alignment; 2020

Python 3 scripts on this page may work slightly differently than described in the book chapter. Any such differences are noted in the individual documentation for each script. Online tools referenced in the chapter may also behave slightly differently now than at the time of writing, so the specific outputs from those tools that were used in the chapter have been made available for download.

Please contact us with any questions or bug reports you may have.

Software License

Scripts on this page are ©(Copyright) 2020, Department of Genome Informatics; Research Institute for Microbial Diseases; Osaka University.

They are licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.

System Requirements

An effort has been made to make scripts on this page compatible with as many platforms as possible. They have been tested on Windows, macOS, and Linux.

Required Software:
Python 3 (3.6.8+)
BioPython (1.76+)
MAFFT (7.427+)

We recommend Conda for managing software environments. A Conda dependencies file is available for users who wish to clone the specific software environment these scripts were developed under.
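For users who are not cloning the Conda environment, the following is a minimal sketch of a check against the version requirements listed above. It is not part of the distributed scripts; the function names are hypothetical. It compares the Python and BioPython versions against the stated minimums and only checks that a `mafft` executable is on the PATH (it does not try to read the MAFFT version).

```python
import re
import shutil
import sys
from importlib import import_module


def version_tuple(text):
    """Turn a version string such as '3.6.8' into a tuple of ints."""
    parts = []
    for piece in text.strip().split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at non-numeric suffixes such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    return version_tuple(found) >= version_tuple(minimum)


def check_environment():
    """Map each requirement to (what was found, whether it looks sufficient)."""
    report = {}
    python_version = ".".join(str(n) for n in sys.version_info[:3])
    report["Python"] = (python_version, meets_minimum(python_version, "3.6.8"))
    try:
        bio = import_module("Bio")
        report["BioPython"] = (bio.__version__, meets_minimum(bio.__version__, "1.76"))
    except ImportError:
        report["BioPython"] = (None, False)
    # Only presence on PATH is checked here; confirm the MAFFT version (7.427+) separately.
    mafft_path = shutil.which("mafft")
    report["MAFFT"] = (mafft_path, mafft_path is not None)
    return report


if __name__ == "__main__":
    for name, (found, ok) in check_environment().items():
        print(f"{name}: found={found!r} ok={ok}")
```

On Windows, MAFFT is typically started through mafft.bat, which may not be on the PATH, so a negative MAFFT result there does not necessarily mean MAFFT is missing.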

Input Data

The protein sequence used for the N4BP1 NYN domain is available below. A FASTA version is also available.

>NYN_N4BP1_615_775
TDLKHIVIDGSNVAITHGLKKFFSCRGIAIAVEYFWKLGNRNITVFVPQWRTRRDPNVTEQHFLTQLQELGILSLTPAR
MVFGERIASHDDRFLLHLADKTGGIIVTNDNFREFVNESVSWREIITKRLLQYTFVGDIFMVPDDPLGRSGPRLEEFLQ
KEV

A model was prepared from the above sequence using Spanner. You may download the specific model we used for this chapter here.
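As a quick sanity check on a local copy of the NYN domain sequence above, the sketch below parses the FASTA text with the standard library only and compares the sequence length with the range in the header. Reading the trailing `615_775` in the header as inclusive start and end residue positions is an assumption made for this example; `read_fasta` is a hypothetical helper, not part of the scripts on this page.

```python
FASTA_TEXT = """>NYN_N4BP1_615_775
TDLKHIVIDGSNVAITHGLKKFFSCRGIAIAVEYFWKLGNRNITVFVPQWRTRRDPNVTEQHFLTQLQELGILSLTPAR
MVFGERIASHDDRFLLHLADKTGGIIVTNDNFREFVNESVSWREIITKRLLQYTFVGDIFMVPDDPLGRSGPRLEEFLQ
KEV
"""

STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_fasta(text):
    """Parse FASTA text into a dict of {header: sequence}."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


records = read_fasta(FASTA_TEXT)
name, sequence = next(iter(records.items()))

# Assumption: the last two header fields are inclusive residue positions.
start, end = (int(field) for field in name.split("_")[-2:])
expected_length = end - start + 1

print(name, "length:", len(sequence), "expected:", expected_length)
print("only standard residues:", set(sequence) <= STANDARD_AMINO_ACIDS)
```

For the sequence as listed, both the observed and the expected length are 161 residues.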

Python 3 Script - get_interactions.py

get_interactions.py is a script for overlaying DASH alignments on specific DASH domains (or Search By Structure queries). This can be useful for investigating the "environment" of proteins, nucleotides, HETATM, water, etc. that exist in structurally homologous entries from the PDB.

This script requires only Python 3 and BioPython to be installed.

Show arguments and their descriptions:
python3 get_interactions.py --help
Query by DASH Domain ID:
python3 get_interactions.py -qd 3V33_A_01
Query by DASH Job ID:
python3 get_interactions.py -qd dash503929320
Query nucleotides:
python3 get_interactions.py -qd dash503929320 -filter-score 20 -nuc -o out-nuc.cif.gz
Query proteins that also have nucleotides:
python3 get_interactions.py -qd dash503929320 -filter-score 20 -nuc -prot -o out-nuc-prot.cif.gz
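When running several of the queries above in a row, it can be convenient to assemble the command lines from Python. The following is a minimal sketch, not part of get_interactions.py itself: `build_get_interactions_command` is a hypothetical helper, and it uses only the options shown in the examples above (`-qd`, `-filter-score`, `-nuc`, `-prot`, `-o`). It assumes the script is in the current working directory.

```python
import shlex
import sys


def build_get_interactions_command(query, filter_score=None, nucleotides=False,
                                   proteins=False, output=None,
                                   script="get_interactions.py"):
    """Assemble an argument list for get_interactions.py (hypothetical helper)."""
    command = [sys.executable, script, "-qd", query]
    if filter_score is not None:
        command += ["-filter-score", str(filter_score)]
    if nucleotides:
        command.append("-nuc")
    if proteins:
        command.append("-prot")
    if output is not None:
        command += ["-o", output]
    return command


# The two filtered queries from the examples above.
jobs = [
    build_get_interactions_command("dash503929320", filter_score=20,
                                   nucleotides=True, output="out-nuc.cif.gz"),
    build_get_interactions_command("dash503929320", filter_score=20,
                                   nucleotides=True, proteins=True,
                                   output="out-nuc-prot.cif.gz"),
]

for command in jobs:
    # Print a copy-and-paste friendly version of each command.
    print(" ".join(shlex.quote(part) for part in command))
```

Each list can be passed to `subprocess.run(command, check=True)` to actually run the query. Passing a list rather than a single string avoids shell quoting problems with unusual file names.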

Python 3 Script - calculate_conservation.py

calculate_conservation.py is a script for visualizing protein residue conservation. Output from this script can be loaded alongside output from get_interactions.py in PyMOL to improve the usefulness of the visualization.

This script requires Python 3, BioPython, and MAFFT to be installed.

When using this script on Windows it is necessary to pass the -mafft option with the full path to where mafft.bat is located on your system.

An -xml option for manually specifying XML-formatted BLAST+ results is available for systems without internet access and for cases where the default BLAST+ search done by the script is insufficient.

Show arguments and their descriptions:
python3 calculate_conservation.py --help
Compute residue conservation for PDB file:
python3 calculate_conservation.py -i n4bp1-spanner-model.pdb -o n4bp1_conc.cif.gz
Compute residue conservation for PDB file (from BLAST+ XML):
python3 calculate_conservation.py -i n4bp1-spanner-model.pdb -xml n4bp1-blast.xml -o n4bp1_conc.cif.gz
Compute residue conservation for PDB file (Windows):
python3 calculate_conservation.py -mafft C:\Users\user\Documents\mafft-7.450-win64-signed\mafft-win\mafft.bat -i n4bp1-spanner-model.pdb -o n4bp1_conc.cif.gz
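The platform difference described above (the `-mafft` option being required on Windows) can be handled in a small wrapper. This is a sketch under assumptions, not the script's own logic: `build_conservation_command` is a hypothetical helper, it uses only the options shown above (`-mafft`, `-i`, `-xml`, `-o`), and it assumes that omitting `-mafft` on other platforms lets the script locate MAFFT itself.

```python
import platform
import sys


def build_conservation_command(pdb_file, output, mafft_bat=None, blast_xml=None,
                               system=None, script="calculate_conservation.py"):
    """Assemble an argument list for calculate_conservation.py (hypothetical helper)."""
    system = system or platform.system()
    if system == "Windows" and mafft_bat is None:
        # The documentation above says -mafft is necessary on Windows.
        raise ValueError("On Windows, pass the full path to mafft.bat via mafft_bat")
    command = [sys.executable, script]
    if mafft_bat is not None:
        command += ["-mafft", str(mafft_bat)]
    command += ["-i", str(pdb_file)]
    if blast_xml is not None:
        # Use saved BLAST+ XML results instead of the default search.
        command += ["-xml", str(blast_xml)]
    command += ["-o", str(output)]
    return command


command = build_conservation_command(
    "n4bp1-spanner-model.pdb",
    "n4bp1_conc.cif.gz",
    blast_xml="n4bp1-blast.xml",
    system="Linux",  # fixed here so the example behaves the same everywhere
)
print(command[1:])
```

Leaving `system` unset makes the helper detect the current platform with `platform.system()`. The resulting list can be passed to `subprocess.run(command, check=True)`.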

Search by Structure Outputs

As DASH, Spanner, and the PDB are constantly being updated and improved, we cannot guarantee that users will get identical results to ours when running these same queries in the future. To address this problem we have made the results that were used in this chapter available for download.

Alignment score information from the Search by Structure job that was used in this chapter is available as a TSV. The alignments themselves are available here.



Bug Reporting & Feedback

We are always looking to improve DASH based on user feedback. Please contact us with any bug reports or feedback you may have.
