Algorithm: AI predicts the structure of all known proteins and opens up a new world for science

AlphaFold’s prediction of the structure of vitellogenin, an essential protein for all egg-laying animals. DeepMind

DeepMind’s artificial intelligence (AI) software has predicted the structure of almost every known protein, about 200 million molecules. Knowing the structure of these molecules will help scientists understand the biology of every living thing on the planet, as well as the development of devastating diseases like malaria, Alzheimer’s disease and cancer.

“We are at the start of a new era of computational biology,” said Demis Hassabis, the AI and neuroscience expert who is the lead developer of AlphaFold, the neural network system that has almost completely solved one of the greatest challenges in the field of biology.

A child chess prodigy and expert video gamer, Hassabis is a British citizen who in 2010 founded DeepMind, a company that creates artificial intelligence systems that can learn like humans. In 2013, DeepMind developed a system that surpassed human performance on Atari video games. The following year, Google announced that it had purchased the company for $500 million. In 2017, DeepMind’s AlphaGo system beat all the top players in Go, the highly complex Asian board game similar to chess. Hassabis then focused his company on a much bigger challenge: predicting the 3D shapes of proteins by reading their linear gene sequences written in DNA letters.

Knowing the 3D structure of these molecules is essential to understand how they work, but it is an extremely difficult problem to solve. Some have compared it to trying to put together a jigsaw puzzle with tens of thousands of blank pieces.

Without advanced technology, understanding the structure or shape of a single protein made up of 100 base units (amino acids) could take up to 13.7 billion years, the age of the universe. Some scientists using electron microscopy or huge particle accelerators like the one at the European Synchrotron Radiation Facility in Grenoble (France) have reduced problem-solving time to years. But Google’s AlphaFold system can determine the structure of a protein in just seconds.

“This protein universe is…a gift to humanity,” Hassabis said at a July 26 joint press conference with the European Molecular Biology Laboratory (EMBL), an intergovernmental organization dedicated to molecular biology research that collaborated in the development of AlphaFold.

Prior to AlphaFold, it took 60 years and thousands of scientists to determine the structures of approximately 200,000 proteins. This research was used as learning material for AlphaFold, which searched for valid models that predict protein shape. By 2021, it had successfully predicted the structures of one million proteins, including all human proteins. The latest release of AlphaFold’s results expands the number to 200 million proteins – virtually every known protein in all living things on the planet.

DeepMind provides free and open access to the AlphaFold code and protein database, both of which can be downloaded. A search in this “Google of life” database will show the linear sequence of a protein and a 3D model with a corresponding level of reliability, which has a margin of error comparable to or lower than that of conventional prediction methods.

It is important to note that AlphaFold does not determine reality – it predicts reality. AlphaFold reads the genetic sequence and estimates the most likely configuration of its amino acids. The prediction has a high level of reliability, which saves a lot of time and money for scientists doing theoretical work, as they do not need to use expensive equipment to determine the actual structure of a protein unless it is absolutely necessary.

The applications for this new tool are virtually endless, as microscopic proteins are involved in every conceivable biological process, such as bee colony collapse and crop resistance to heat. A team led by Matt Higgins from the University of Oxford (UK) used AlphaFold to help develop an antibody (a type of protein) capable of neutralizing one of the proteins that the malaria pathogen needs in order to reproduce. This could accelerate research to develop the first highly effective vaccine against the disease, thus preventing transmission of the parasite by mosquitoes.

More successes

Another AlphaFold-related success is the development of the most detailed nuclear pore structure available. The nuclear pore is a donut-shaped protein complex that serves as the gateway to the nucleus of human cells and has been linked to a host of diseases, including cancer and cardiovascular disease. Jan Kosinski, a researcher at EMBL and co-lead of the nuclear pore modeling effort, told EL PAÍS that AlphaFold gives scientists unprecedented access to understand how the recipe for life (written in the genome) works when translated into proteins.

Hassabis and his colleagues at DeepMind and EMBL say they have analyzed the risks associated with making the AlphaFold system and data openly available. “The benefits clearly outweigh the risks,” Hassabis said, adding that it is up to the international community to decide whether to restrict the use of the technology as it develops.

One of AlphaFold’s most practical applications is designing custom molecules that can block harmful proteins or, better yet, modulate their activity, a much more desirable effect when developing new drugs, said Carlos Fernández, a scientist at the Spanish National Research Council (CSIC) and head of the Structural Biology Group of the Spanish Society of Biochemistry and Molecular Biology (SEBBM). His team used AlphaFold to predict part of the structure of a protein complex necessary for the spread of the trypanosome found in sub-Saharan Africa and responsible for sleeping sickness.

Years of work now await us to confirm the accuracy of AlphaFold’s predictions, says biologist José Márquez, an expert in protein structure at the European Synchrotron Radiation Facility in Grenoble. “The next frontier for AlphaFold will be its use in the design of drugs that block or activate proteins, a problem they are already tackling,” Márquez said. And there’s another conundrum to solve: AlphaFold can’t tell why a protein has the shape it does, which could be a vital part of research into diseases like Alzheimer’s or Parkinson’s, both of which are related to misfolded proteins.

Alfonso Valencia, director of life sciences at the Barcelona Supercomputing Center (Spain), points out some of the shortcomings of the system. “AlphaFold cannot solve everything because it can only predict what is in the realm of known things. For example, it cannot accurately predict the structure of proteins that protect against freezing because they are rare and databases do not contain many samples. It also cannot predict the consequences of mutations, a matter of great concern in medicine,” Valencia said.

Valencia recognizes the benefits of providing free and open access to AlphaFold, which allows other scientists to improve or modify the system as needed. “It’s clear that the folks at DeepMind are looking to win the Nobel Prize by acting transparently,” Valencia said. “It’s great for their image and gives them a competitive advantage over other companies like Facebook. On the other hand, they hinted that they might reserve specific health data for private use and drug development.”

Ida M. Morgan