Scientists from the Swiss Federal Institute of Technology Zurich report in the latest issue of Nature that they have developed a DNA search engine called MetaGraph, which can quickly and efficiently retrieve information from massive public biological databases, giving life-science researchers a powerful specialist tool.
MetaGraph grew out of a practical dilemma facing the scientific community: the ever-growing volume of gene sequencing data is hard to search and hard to use. Over the past few decades, biological databases have exploded in size. Raw sequencing data, however, is often fragmented, noisy, and enormous, which makes it difficult for scientists to extract useful information from it directly.
MetaGraph's core breakthrough is its use of mathematical "graph structures" to connect overlapping DNA fragments. The principle resembles a book index that links sentences containing the same keywords into a knowledge network. The research team integrated seven publicly funded databases to build a comprehensive index covering the full spectrum of life: viruses, bacteria, fungi, plants, animals, and humans. The index covers a total of 18.8 million unique DNA and RNA sequence sets, as well as 210 billion amino acid sequence sets.
On top of this index, the team built a search engine that can query raw data archives directly through text prompts. The team described this as a new way of interacting with biological data: the data is highly compressed, yet it can be accessed at any time. MetaGraph lets researchers put biological questions directly to repositories such as the Sequence Read Archive (SRA), which by itself contains over 100 million billion DNA letters. To test its practicality, the team used MetaGraph to scan more than 240,000 human gut microbiome samples for genetic markers of antibiotic resistance. Using a single high-performance computer, they obtained the results in about an hour. According to Rayan Chikhi, a bioinformatics expert at the Pasteur Institute in France, this is a "major breakthrough" that sets a new standard for analyzing raw biological data such as DNA, RNA, and protein sequences. (New Society)
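The idea of linking overlapping DNA fragments in a graph and then searching that graph can be illustrated with a toy sketch. It is not MetaGraph's actual implementation, which the article does not describe in detail. The sketch assumes a simple approach in which every sequence is cut into short overlapping words of length k (k-mers) and each word is tagged with the samples it came from. The function names, sample labels, and sequences are all hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(samples, k):
    """Map each k-mer to the set of sample labels that contain it."""
    index = defaultdict(set)
    for label, seq in samples.items():
        for km in kmers(seq, k):
            index[km].add(label)
    return index


def successors(index, kmer):
    """Indexed k-mers that overlap kmer by k-1 letters (the graph's edges)."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in index]


def search(index, query, k):
    """For each sample, return the fraction of the query's k-mers it contains."""
    query_kmers = list(kmers(query, k))
    hits = defaultdict(int)
    for km in query_kmers:
        for label in index.get(km, ()):
            hits[label] += 1
    return {label: n / len(query_kmers) for label, n in hits.items()}


# Hypothetical toy data: two short "samples" and one query fragment.
samples = {"sample_A": "ACGTACGGA", "sample_B": "TTGACGTAC"}
index = build_index(samples, k=4)
result = search(index, "CGTACGG", k=4)
print(result)
print(successors(index, "GTAC"))
```

Here the query fragment is found in full in sample_A and only in part in sample_B. A real index at the scale described above would need heavily compressed data structures rather than plain Python dictionaries.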
Editor: Wang Shu Ying
Responsible editor: Li Jie
Source: Science and Technology Daily