
DeepSeek-R1 model training method released

2025-09-18   

On September 17, Liang Wenfeng and colleagues on the DeepSeek AI team published in the journal Nature the large-scale reasoning-model training method behind the open-source artificial intelligence (AI) model DeepSeek-R1. The research shows that the reasoning ability of large language models (LLMs) can be improved through pure reinforcement learning, reducing the human input needed to enhance performance. The trained model outperforms traditional LLMs on tasks such as mathematics, programming competitions, and graduate-level problems in STEM fields.

DeepSeek-R1 also includes a training phase under human supervision to optimize the reasoning process. Liang Wenfeng's team reported that the model developed its reasoning steps through reinforcement learning rather than from human-provided examples, reducing training cost and complexity. After being shown high-quality problem-solving examples, DeepSeek-R1 is given a template for generating a reasoning process; the model is then rewarded for solving problems correctly, which reinforces what it has learned. The team concluded that future research could focus on optimizing the reward process to ensure more reliable reasoning and task outcomes.

On a mathematical benchmark used to evaluate AI performance, DeepSeek-R1-Zero and DeepSeek-R1 scored 77.9% and 79.8%, respectively, and performed comparably well on programming-competition problems and graduate-level biology, physics, and chemistry questions. (New Society)
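The core idea the article describes, learning to reason from a verifiable reward rather than from human-written examples, can be illustrated with a toy sketch. This is not DeepSeek's actual training code; it is a minimal, hypothetical reward-driven loop in which a policy picks answers to arithmetic problems, earns a reward only when the answer checks out, and gradually comes to prefer correct answers.

```python
import random

def make_problem():
    """Return an arithmetic problem and its automatically verifiable answer."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    return (a, b), a + b

class TabularPolicy:
    """Toy policy: a preference table over (problem, answer) pairs, updated by reward."""
    def __init__(self, lr=0.5):
        self.pref = {}  # (problem, answer) -> preference score
        self.lr = lr

    def sample(self, problem, candidates, explore=0.2):
        # Occasionally explore a random candidate; otherwise pick the
        # candidate with the highest learned preference.
        if random.random() < explore:
            return random.choice(candidates)
        return max(candidates, key=lambda c: self.pref.get((problem, c), 0.0))

    def update(self, problem, answer, reward):
        # Reinforce (or discourage) the sampled answer in proportion to reward.
        key = (problem, answer)
        self.pref[key] = self.pref.get(key, 0.0) + self.lr * reward

random.seed(0)
policy = TabularPolicy()
for _ in range(2000):
    problem, truth = make_problem()
    candidates = [truth - 1, truth, truth + 1]
    guess = policy.sample(problem, candidates)
    # The reward comes from checking the answer, not from a human demonstration.
    reward = 1.0 if guess == truth else -0.1
    policy.update(problem, guess, reward)

# Evaluate greedily (no exploration) on fresh problems.
trials = [make_problem() for _ in range(200)]
accuracy = sum(
    policy.sample(p, [t - 1, t, t + 1], explore=0.0) == t for p, t in trials
) / len(trials)
print(f"greedy accuracy after reward-only training: {accuracy:.2f}")
```

The point of the sketch is the feedback structure: because correctness is checkable, the reward signal alone is enough to shape behavior, with no labeled reasoning traces required, which is the cost reduction the article attributes to pure reinforcement learning.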

Editor: Momo   Responsible editor: Chen Zhaozhao

Source: Science and Technology Daily

