Sci-Tech

Scientists confirm that large models can 'understand' things the way humans do

2025-06-11   

Reporters learned on the 10th from the Institute of Automation of the Chinese Academy of Sciences that researchers from the Institute and partner institutions have confirmed for the first time that multimodal large language models learn to "understand" things on their own during training, and that this way of understanding closely resembles that of humans. The discovery opens a new path for exploring how artificial intelligence "thinks" and lays a foundation for building AI systems that "understand" the world the way humans do. The results were published online in the journal Nature Machine Intelligence.

The core of human intelligence is the ability to truly "understand" things. When we see a "dog" or an "apple", we not only recognize what it looks like, such as its size, color, and shape, but also understand what it is used for, what feelings it evokes, and what cultural significance it carries. This comprehensive understanding is the foundation of how we make sense of the world. With the rapid development of large models such as ChatGPT, scientists have begun to wonder: can such models learn to "understand" things the way humans do from massive amounts of text and images?

Traditional artificial intelligence research has focused on the accuracy of object recognition, but has rarely asked whether models truly "understand" what objects mean. "AI can currently tell cats from dogs, but the essential difference between this kind of 'recognition' and the human 'understanding' of cats and dogs remains to be revealed," said He Huiguang, corresponding author of the paper and a researcher at the Institute of Automation of the Chinese Academy of Sciences.

In this study, the researchers drew inspiration from principles of human cognition and designed a clever experiment: a game of "spot the odd one out" played by both large models and humans. In each trial, three concepts drawn from 1,854 common objects were presented, and the models and human participants were asked to pick the one that fit least well with the other two (see the sketch below). By analyzing 4.7 million such judgments, the researchers drew, for the first time, a "mind map" of a large model - a "concept map".

He Huiguang said the team distilled from this large body of experimental data 66 key dimensions that capture how the AI "understands" things, and gave each of them a name. The study found that these dimensions are highly interpretable and closely match the patterns of neural activity in the brain regions responsible for object processing. More importantly, multimodal models, which can process both text and images, think and choose in a way that is closer to humans than other models do.

Another interesting finding is that when humans make such judgments, they consider not only what things look like, such as shape and color, but also what they mean and what they are for, whereas large models rely more heavily on the "text labels" attached to objects and the abstract concepts they have learned. "This shows that a way of understanding the world that is somewhat similar to the human one has indeed emerged inside large models," He Huiguang said. (Xinhua News Agency)
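For readers curious about the mechanics, the sketch below illustrates, under assumptions of our own rather than the authors' published method, how an odd-one-out judgment of the kind described above can be read off from a model's concept embeddings: the concept whose combined cosine similarity to the other two is lowest is taken as the odd one out. The odd_one_out helper and the toy vectors are purely hypothetical.

    # Illustrative sketch only; the paper's actual pipeline is not reproduced here.
    # Hypothetical vectors stand in for a model's internal concept representations.
    import numpy as np

    def odd_one_out(embeddings: np.ndarray) -> int:
        """Given three concept vectors (shape 3 x d), return the index of the
        concept least similar to the other two, using cosine similarity."""
        # Normalize rows so dot products equal cosine similarities.
        normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
        sim = normed @ normed.T                  # 3 x 3 pairwise similarities
        # For each item, sum its similarity to the other two (exclude self-similarity).
        support = sim.sum(axis=1) - np.diag(sim)
        return int(np.argmin(support))           # weakest support = odd one out

    # Toy example with made-up 4-dimensional "concept" vectors.
    triplet = np.array([
        [0.9, 0.1, 0.2, 0.0],   # "dog"
        [0.8, 0.2, 0.3, 0.1],   # "cat"
        [0.1, 0.9, 0.0, 0.7],   # "apple"
    ])
    print(odd_one_out(triplet))  # -> 2, i.e. "apple" is the least compatible

Collected over millions of such triplet choices, the pattern of judgments can then be compared between models and human participants, which is the spirit of the comparison the article describes.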

Editor: He Chuanning    Responsible editor: Su Suiyue

Source: Sci-Tech Daily

