Fast protein structure searching using structure graph embeddings

Published in Machine Learning for Structural Biology Workshop, NeurIPS 2022

Authors: Joe Greener and Kiarash Jamali

Comparing and searching protein structures independent of primary sequence has proved useful for remote homology detection, function annotation and protein classification. With the recent leap in accuracy of protein structure prediction methods and the increased availability of protein models, attention is turning to how best to make use of this data. Fast and accurate methods to search databases of millions of structures will be essential to this endeavour, in the same way that fast protein sequence searching underpins much of bioinformatics. We train a simple graph neural network to learn a low-dimensional embedding of protein structure, and show that the embedding can be used to query structures against large structural databases with accuracy comparable to current methods. The speed of the method and its ability to scale to millions of structures make it suitable for this structure-rich era.
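
To illustrate the search step described in the abstract, here is a minimal sketch of how a query embedding might be compared against a database of precomputed embeddings. It is not the paper's implementation: the graph neural network is not reproduced, random vectors stand in for real structure embeddings, and the embedding size (128), the use of cosine similarity, and the function names `normalize` and `search` are all assumptions made for the example.

```python
import numpy as np


def normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def search(query: np.ndarray, database: np.ndarray, top_k: int = 5):
    """Return indices and cosine similarities of the top_k rows closest to query.

    Hypothetical helper: `query` has shape (d,), `database` has shape (n, d).
    """
    db = normalize(database)
    q = normalize(query[None, :])[0]
    scores = db @ q  # one matrix-vector product scores every database entry
    top_k = min(top_k, len(scores))
    # argpartition selects the top_k candidates without sorting all n scores
    idx = np.argpartition(-scores, top_k - 1)[:top_k]
    # sort only the selected candidates, best match first
    idx = idx[np.argsort(-scores[idx])]
    return idx, scores[idx]


# Placeholder data: random vectors stand in for learned structure embeddings.
rng = np.random.default_rng(0)
database = rng.normal(size=(10_000, 128))  # assumed: 10,000 structures, 128 dims
# A slightly perturbed copy of entry 42 plays the role of a close structural match.
query = database[42] + 0.01 * rng.normal(size=128)

indices, similarities = search(query, database, top_k=3)
print(indices, similarities)
```

Because each structure is reduced to a fixed-length vector, a query against the whole database costs a single matrix-vector product, which is what makes this style of search plausible at the scale of millions of structures. In practice the database embeddings would be normalized once and stored, rather than renormalized on every query as in this sketch.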

Find paper here