A research team from Harvard University and Northwestern University in the United States has developed a new machine learning method that can make sense of disordered proteins and design intrinsically disordered proteins (IDPs) with specific properties, overcoming a limitation of current artificial intelligence (AI) tools, which struggle to analyze the structure of about 30% of human proteins. The work was published in the latest issue of Nature Computational Science.
These proteins have long been difficult to model because they do not fold into a fixed three-dimensional structure. Although advanced AI systems such as AlphaFold perform well at structure prediction, they cannot effectively handle such highly dynamic molecules. Yet IDPs play a central role in key biological processes such as cell signaling, molecular sensing, and cross-linking, and their malfunction is closely linked to diseases including cancer and neurodegenerative disorders. Alpha-synuclein, for example, is closely associated with Parkinson's disease.
To address this challenge, the team proposed an approach that combines physical models with machine learning techniques. The method is built on automatic differentiation, an algorithm commonly used in deep learning to compute derivatives, which tracks how small changes in input variables affect the output. The researchers use this mechanism to optimize amino acid sequences directly within a molecular dynamics simulation framework, so that the resulting proteins have predetermined physical or functional properties.
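As an illustration of what automatic differentiation does, the sketch below implements a minimal forward-mode version using dual numbers in plain Python. It is not the team's code: the `Dual` class, the `derivative` helper and the function `toy_output` are hypothetical names chosen for this example, and `toy_output` is only a stand-in for a real simulation output.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to one chosen input."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, whose derivative is zero."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def exp(x):
    """Exponential with the chain rule applied: (e^u)' = e^u * u'."""
    e = math.exp(x.val)
    return Dual(e, e * x.dot)


def toy_output(x):
    # Hypothetical stand-in for a simulation output: f(x) = x * e^x + 3x
    return x * exp(x) + 3 * x


def derivative(f, x0):
    """Seed the input with derivative 1 and read the derivative off the output."""
    return f(Dual(x0, 1.0)).dot


if __name__ == "__main__":
    print(derivative(toy_output, 1.0))  # analytically (1 + x) * e^x + 3 at x = 1
```

Each arithmetic step carries its own derivative along with its value, so the sensitivity of the output to the input is available without deriving a formula by hand or using finite differences.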
Unlike typical AI models that rely on large amounts of training data, this method relies on existing, sufficiently accurate physical simulations and uses gradient-based optimization to search efficiently for protein sequences that meet specific functional requirements, such as the ability to form flexible linkers or to respond to environmental changes. The team emphasizes that the goal is not to replace physical understanding with data-driven models but to embed real molecular behavior into the design process, so that the generated sequences are functional and the design process itself is rooted in the real dynamical principles of nature. The resulting protein design is "differentiable", meaning that each optimization step rests on continuous, precise adjustment of the system's physical state rather than on black-box predictions. (New Society)
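The idea of searching for sequences by gradient optimization can be sketched with a toy example. Everything here is an assumption made for illustration: the three-letter alphabet, the per-residue charges, and the closed-form "property" (mean charge) stand in for a real molecular dynamics simulation, and the gradient is written out by hand where the described method would obtain it by automatic differentiation through the simulation.

```python
import math
import random

# Hypothetical three-letter alphabet with made-up per-residue charges.
ALPHABET = ["K", "E", "G"]
CHARGE = [1.0, -1.0, 0.0]


def softmax(logits):
    """Turn one row of logits into residue probabilities."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_charge(logits):
    """Toy property: expected charge per residue of the relaxed sequence."""
    probs = [softmax(row) for row in logits]
    per_site = [sum(p * c for p, c in zip(row, CHARGE)) for row in probs]
    return sum(per_site) / len(logits), probs, per_site


def optimize(target, length=8, lr=5.0, steps=500, seed=0):
    """Gradient descent on (mean_charge - target)^2 over continuous logits."""
    rng = random.Random(seed)
    logits = [[rng.uniform(-0.5, 0.5) for _ in ALPHABET] for _ in range(length)]
    for _ in range(steps):
        q, probs, per_site = mean_charge(logits)
        scale = 2.0 * (q - target) / length
        for i in range(length):
            for a in range(len(ALPHABET)):
                # Hand-derived gradient of the loss through the softmax.
                grad = scale * probs[i][a] * (CHARGE[a] - per_site[i])
                logits[i][a] -= lr * grad
    return logits


def decode(logits):
    """Pick the most probable residue at each position."""
    return "".join(ALPHABET[row.index(max(row))] for row in logits)


if __name__ == "__main__":
    logits = optimize(target=0.5)
    print(mean_charge(logits)[0], decode(logits))
```

The sequence is relaxed into continuous probabilities so that the property changes smoothly and a gradient exists. The decoded discrete sequence need not reproduce the target exactly, which is one reason a sketch like this is far simpler than a design pipeline built on a physical simulation.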
Editor: Wang Shu Ying  Responsible editor: Li Jie
Source: Science and Technology Daily