AI predicts protein structures a million times faster

There is an escalating race to predict the 3D structures of proteins from their amino-acid sequences, arguably one of the biggest challenges in biology. Here again, new artificial intelligence (AI) has come to the rescue.

Late last year, Google's AI firm DeepMind introduced an algorithm called AlphaFold, which merged two techniques that were emerging in the field and beat established contestants in a protein-structure-prediction competition by an unexpected margin. And this April, a US researcher unveiled an algorithm that takes an entirely different approach. He claims his AI is up to one million times faster at computing structures than DeepMind's, although it is probably not as accurate in every situation.

This has left many biologists wondering how else deep learning, the AI technique used by both methods, might be applied to predicting protein structures, which ultimately dictate a protein's function. These approaches are cheaper and faster than established lab techniques such as X-ray crystallography, and the knowledge could help researchers to better understand diseases and design drugs. "There's a lot of anticipation about where things might go now," says John Moult, a biologist at the University of Maryland in College Park and founder of the biennial competition, the Critical Assessment of protein Structure Prediction (CASP), in which teams compete to design computer programs that predict protein structures from sequences.

Proteins carry out nearly all the basic biological processes essential for life. They produce and sustain the shapes of cells and tissues; constitute the enzymes that drive life-sustaining chemical reactions; function as molecular factories, motors and transporters; act as both signal and receiver in cellular communication; and much more.

Made up of long chains of amino acids, proteins perform these countless tasks by folding into specific 3D structures that govern how they interact with other molecules. Because a protein's shape defines both its function and the extent of its dysfunction in disease, efforts to determine protein structures are central to all of molecular biology, and in particular to therapeutic science and the development of lifesaving and life-altering medicines.

State-of-the-art approach

In recent years, computational methods have made significant strides in predicting how proteins fold from knowledge of their amino-acid sequence alone. If this problem were completely solved, it could transform nearly every aspect of biomedical research. However, current approaches are limited in the scale and scope of the proteins they can determine.

Mohammed AlQuraishi, a systems biologist at Harvard Medical School (HMS) in Boston, Massachusetts, used the AI method known as deep learning to predict the 3D structure of a protein from its amino-acid sequence. Reporting online in Cell Systems, AlQuraishi describes a new approach for computationally determining protein structure, achieving accuracy comparable to current state-of-the-art methods but at speeds upward of a million times faster.

"Protein folding has been one of the most important problems for biochemists over the last half-century, and this approach represents a fundamentally new way of tackling that challenge," said AlQuraishi, an instructor in systems biology in the Blavatnik Institute at HMS and a member of the Laboratory of Systems Pharmacology. "We now have a whole new vista from which to explore protein folding, and I think we've just begun to scratch the surface."

Though he hasn't yet directly compared the accuracy of his method with that of AlphaFold, he suspects that AlphaFold would beat his technique on accuracy when proteins with sequences similar to the one being analysed are available for reference. But because his algorithm uses a mathematical function to compute structures in a single step, rather than in two steps like AlphaFold, which uses known structures as a foundation in its first step, it can compute structures in milliseconds instead of hours or days.

According to Jinbo Xu, a computer scientist at the Toyota Technological Institute at Chicago, Illinois, who competed at CASP13, AlQuraishi's approach is promising because it builds on developments in deep learning along with new tricks AlQuraishi has invented. He adds that AlQuraishi's idea could be combined with others to drive further progress in the field.

At the heart of AlQuraishi's system is a neural network, a type of algorithm inspired by the brain's wiring that learns from examples. It is fed known data on how amino-acid sequences map to protein structures and then learns to produce new structures from unfamiliar sequences.

The novel part of his network lies in its ability to generate such mappings end-to-end; other systems use a neural network to predict certain features of a structure, then a second type of algorithm to laboriously search for a plausible structure that incorporates those features. AlQuraishi's network takes months to train, but once trained, it can convert a sequence to a structure almost instantly.

His approach, which he calls a recurrent geometric network, predicts the structure of one segment of a protein partly on the basis of what comes before and after it. This is similar to how a person's interpretation of a word in a sentence can be shaped by the surrounding words; those interpretations are in turn influenced by the central word.
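The idea of combining context from both directions can be sketched in a few lines. This is a toy illustration, not AlQuraishi's actual model: each residue's predicted backbone angle depends on a decaying memory of the residues before it and after it, the way a bidirectional recurrent network merges a left-to-right pass with a right-to-left pass. The encodings and update rule here are made up for illustration.

```python
import math

def left_context(seq):
    # Accumulate a decaying memory of everything that came before.
    ctx, acc = [], 0.0
    for x in seq:
        acc = 0.5 * acc + x
        ctx.append(acc)
    return ctx

def right_context(seq):
    # The same pass, run over the sequence in reverse.
    return left_context(seq[::-1])[::-1]

def predict_angles(seq):
    # Combine both contexts into one torsion-like angle per residue.
    return [math.tanh(l + r) * math.pi
            for l, r in zip(left_context(seq), right_context(seq))]

seq = [0.2, -0.5, 0.9, 0.1]   # hypothetical numeric residue encodings
angles = predict_angles(seq)  # one angle per residue, in (-pi, pi)
```

A real recurrent geometric network learns its update rules from data and converts the predicted angles into 3D coordinates; this sketch only shows why every residue's prediction depends on both its neighbours.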

His algorithm did not perform well at CASP13, owing to technical difficulties. He published details of the AI in Cell Systems in April and made his code openly available on GitHub, hoping that others will build on the work. He is still unable to compare his method directly with AlphaFold, because the structures of many of the proteins tested at CASP13 have not yet been publicly released.

Neural networks

AlphaFold competed well at CASP13 and generated excitement when it outperformed other algorithms on difficult targets by nearly 15%, according to one measure.

AlphaFold works in two steps. Like other approaches used in the competition, it starts with multiple sequence alignments: it compares a protein's sequence with similar sequences in a database to find pairs of amino acids that do not sit next to each other in the chain, but that tend to mutate in tandem. This suggests that the two amino acids lie close to each other in the folded protein. DeepMind trained a neural network to take such pairings and predict the distance between the two paired amino acids in the folded protein.

By comparing its predictions with precisely measured distances in known proteins, the network learned to make better guesses about how proteins would fold up. A parallel neural network predicted the angles of the joints between successive amino acids in the folded protein chain.

However, these steps alone cannot predict a structure, because the exact set of predicted distances and angles might not be physically possible. So in a second step, AlphaFold generated a physically plausible, but nearly random, folding arrangement for a sequence. Instead of another neural network, it used an optimization method called gradient descent to iteratively refine the structure so that it came close to the predictions from the first step.
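The refinement step can be sketched as a tiny optimization problem. This is a toy in one dimension, not AlphaFold itself: the target distances, starting coordinates, learning rate and the use of a numerical gradient are all illustrative assumptions. The point is the mechanism: start from an arbitrary arrangement and repeatedly nudge the coordinates downhill until the distances between points match the predicted targets.

```python
# Hypothetical predicted distances between residue pairs (from "step one").
target = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 2.0}
x = [0.0, 0.3, 0.6]   # nearly random starting coordinates (1D for simplicity)

def loss(x):
    # Squared mismatch between current and predicted pairwise distances.
    return sum((abs(x[i] - x[j]) - d) ** 2 for (i, j), d in target.items())

lr, eps = 0.05, 1e-6
for _ in range(2000):
    # Numerical gradient of the loss with respect to each coordinate.
    grad = []
    for k in range(len(x)):
        bumped = x[:]
        bumped[k] += eps
        grad.append((loss(bumped) - loss(x)) / eps)
    # Gradient-descent update: step each coordinate downhill.
    x = [xi - lr * g for xi, g in zip(x, grad)]
```

After the loop, the three points sit roughly one unit apart, satisfying all three predicted distances. AlphaFold works in 3D with thousands of predicted distances and angles, and computes exact gradients rather than numerical ones, but the descent-to-a-consistent-structure logic is the same.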

No other team combined both of these approaches, although some used one of them. In the first step, most teams predicted only whether pairs of amino acids were in contact, not the distance between them. In the second, most used laborious optimization rules instead of gradient descent, which runs almost automatically.

The future

DeepMind has yet to release full details of AlphaFold, but other groups have since begun adopting strategies demonstrated by DeepMind and other leading teams at CASP13. Jianlin Cheng, a computer scientist at the University of Missouri in Columbia, says he will modify his deep neural networks to incorporate some of AlphaFold's features, for example by adding more layers to the network in the distance-predicting step. More layers make a deeper network, which often allows networks to process information more effectively, hence the term "deep learning".
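What "adding more layers" means can be made concrete with a minimal sketch, assuming made-up weights: each layer is a weighted sum followed by a nonlinearity, and stacking layers lets each one build on the features computed by the layer before it. Adding an entry to the weight list literally deepens the network.

```python
import math

def layer(inputs, weights):
    # One layer: weighted sum of the inputs, squashed by tanh.
    return [math.tanh(sum(w * x for w, x in zip(row, inputs)))
            for row in weights]

def deep_net(inputs, all_weights):
    # One entry per layer; a longer list means a deeper network.
    for weights in all_weights:
        inputs = layer(inputs, weights)
    return inputs

w = [
    [[0.5, -0.2], [0.1, 0.9]],   # layer 1 (illustrative weights)
    [[0.3, 0.3], [-0.4, 0.8]],   # layer 2; append more lists to go deeper
]
out = deep_net([1.0, -1.0], w)
```

In practice the weights are learned from data rather than written by hand, and deeper networks trade extra expressive power for longer training times.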

"We look forward to seeing similar systems put to use," says Andrew Senior, the computer scientist at DeepMind who led the AlphaFold team.

Moult said there was much discussion at CASP13 about other ways deep learning might be applied to protein folding. It might help to refine approximate structure predictions, report how confident the algorithm is in a prediction, or model how proteins interact with one another.

Computational predictions still need to become much more accurate before they can be used in drug design. In the meantime, their growing accuracy enables other applications, such as understanding how a mutated protein contributes to disease, or working out which part of a protein to target with a vaccine or immunotherapy. "These models are starting to be useful," Moult says.