Research in our lab includes both collaborative projects with experimental biologists and bioinformatics software development. Collaborative research helps us understand the workings of the immune system in greater detail and also guides the development of novel bioinformatics tools.

Two areas in which we are actively pursuing both collaborations and new method development are B/T cell repertoires and protein-nucleotide interactions. Both projects rely heavily on multiple sequence alignment (MSA), structural modeling, and machine learning. We are actively developing the MAFFT multiple alignment software, which has numerous features for supporting specific use-case scenarios (Katoh, K. and Standley, DM. Mol. Biol. Evol. (2013)).

B/T Cell Repertoires

We are currently working intensively on high-throughput analysis of B cell receptors (BCRs) and T cell receptors (TCRs). Both BCRs and TCRs belong to the immunoglobulin (Ig)-like protein family, the most populous protein domain family in the human genome.

The genes that code for BCRs and TCRs are combinatorially rearranged in B and T cells, respectively, leading to a vast number of possible receptor sequences. These BCR and TCR “repertoires” can be sequenced from a routine blood sample and provide a highly sensitive biomarker for any perturbation to the immune system. The number of actual B or T cells in the body is many orders of magnitude smaller than the number of possible receptor gene combinations. Therefore, it is unlikely that any two donors will exhibit the same receptor sequences purely by chance.
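The chance-overlap argument above can be made concrete with a rough back-of-the-envelope calculation. The sketch below is only illustrative: it assumes each donor draws receptors independently and uniformly from the space of possible sequences, which is a deliberate simplification (real rearrangement probabilities are far from uniform). The function names and the numerical values are hypothetical placeholders chosen to show the orders-of-magnitude reasoning, not measured quantities.

```python
import math


def expected_shared_receptors(cells_per_donor: float, possible_receptors: float) -> float:
    """Expected number of receptors shared by two donors purely by chance.

    Assumption (simplification): each donor samples `cells_per_donor` receptors
    independently and uniformly from `possible_receptors` possible sequences.
    Each of donor B's receptors then matches one of donor A's with probability
    roughly n / N, giving about n * n / N expected matches.
    """
    return cells_per_donor ** 2 / possible_receptors


def prob_any_shared(cells_per_donor: float, possible_receptors: float) -> float:
    """Poisson approximation to the probability of at least one shared receptor."""
    mean = expected_shared_receptors(cells_per_donor, possible_receptors)
    # 1 - exp(-mean), computed with expm1 for accuracy when mean is tiny
    return -math.expm1(-mean)


if __name__ == "__main__":
    # Hypothetical placeholder values, for illustration only.
    n_cells = 1e9      # receptors sampled per donor (assumed)
    n_possible = 1e24  # size of the possible receptor space (assumed)
    print(expected_shared_receptors(n_cells, n_possible))
    print(prob_any_shared(n_cells, n_possible))
```

Under these assumptions, the expected overlap shrinks as the receptor space grows relative to the square of the number of cells, which is why sequences shared between donors are unlikely to arise from chance alone.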

Following the well-established paradigm in protein evolution that “structure is more conserved than sequence,” our lab is developing methods to quantify the degree of structural similarity between receptors and is using these structural similarity metrics to infer functional similarity (e.g., the antigen targeted by a given receptor).

As a necessary first step, we have developed a high-throughput BCR and TCR modeling platform, Repertoire Builder, that is both fast and accurate when compared with other modeling tools, including our own earlier tools (Shirai, H. et al. Proteins (2014); Yamashita, K. et al. Bioinformatics (2014)). We are applying our modeling tools to several human diseases, including lupus (Sakakibara, S. et al. Sci Rep (2017)) and influenza (paper in preparation).

Multiple Sequence Alignment

Multiple sequence alignment (MSA) is an important step in many comparative analyses of biological sequences, and MAFFT is one of the most popular programs for building MSAs. Since the first release in 2002, we have been actively developing the standalone and online versions of MAFFT to improve their accuracy, speed and utility in practical situations, and have provided different options for newly emerging types of data and analysis.

Recently added features include the use of secondary structure information for non-coding RNAs and proteins, interactive selection of sequences for phylogenetic tree inference, and parallel processing, which makes the more accurate options applicable to larger numbers of sequences and meets the demands of large-scale analysis (Katoh, K. and Standley, DM. Molecular Biology and Evolution 2013; Katoh, K. et al. Briefings in Bioinformatics 2017; Nakamura T. et al. Bioinformatics 2018).

Protein-Nucleotide Interactions

The interaction between proteins and nucleotides (DNA or RNA) is critical for proper regulation of immune responses as well as for direct detection and elimination of viral infections. Protein-nucleotide interactions can also be hijacked by pathogens or tumors to thwart detection by the immune system.

Because of the importance of these interactions in various aspects of immunology, we have developed a tool called aaRNA to identify RNA binding sites on RNA-binding proteins (Li, S. et al. Nucleic Acids Res (2014)). We have extended aaRNA to the prediction of DNA-binding sites (aaDNA) and incorporated the binding propensities in flexible docking simulations.

As demonstrated by several studies, the predicted nucleotide binding sites agree well with experiment and provide valuable insight into the molecular mechanisms of protein-nucleotide interactions (Hanieh, H. et al. Eur. J. Immunol.; Nyati, K. K. et al. Nucleic Acids Res (2017); Yokogawa et al. Sci Rep (2016); Masuda, K. et al. J Exp Med (2016); Mino, T. et al. Cell (2015)).