Shanghai Artificial Intelligence Laboratory releases new authoritative evaluation platform for Chinese medical models
2025-12-03
The Shanghai Artificial Intelligence Laboratory recently released MedBench 4.0, an authoritative evaluation platform for Chinese medical large models. It is the first medical large-model evaluation and validation system in China that focuses on vertical models, specialized models, and application scenarios. According to industry insiders, MedBench 4.0 provides a scientific standard for measuring the performance and reliability of medical AI products, which helps raise product quality and supports healthy competition in the industry.
MedBench was first launched in mid-2023 and is now in its fourth edition. Xu Jie, head of the Medical and Evaluation Center at the Shanghai Artificial Intelligence Laboratory, said the upgrade covers three major technical paradigms: large language models, multimodal large models, and intelligent agents. It aligns closely with the national "Reference Guidelines for Artificial Intelligence Application Scenarios in the Health Industry" and comprises 60 fully independently constructed evaluation sets with more than 700,000 professional evaluation questions in total.
The reporter learned from the Shanghai Artificial Intelligence Laboratory that, for large language models, the platform has built evaluation sets around dimensions such as medical knowledge Q&A, language understanding, generation, complex reasoning, and safety and ethics, and has introduced a scientific indicator system to reduce the impact of data leakage and model hallucination on evaluation results. For multimodal large models, the platform targets core clinical scenarios such as medical imaging and test reports and covers 10 subtasks, including object detection, image classification, multimodal report quality control, sequential image understanding, and dynamic disease tracking, filling a technical gap in Chinese medical multimodal evaluation. For intelligent agents, the platform focuses on the problem of discontinuities in agent execution, promoting the evolution of medical agents from "conversational" to "executable and collaborative".
On the significance of evaluating medical large models, Xu Jie told reporters that evaluation first verifies a model's compliance, then its professional competence, for example whether it produces misdiagnoses, missed diagnoses, or medication errors, and ultimately helps the market train higher-quality medical large models.
The reporter noted that many general-purpose models, such as DeepSeek and Qwen, can already analyze physical examination reports and test reports. So what is the value of a medical large model? Xu Jie said that mainstream general-purpose models can handle everyday health consultations, and that the evaluation shows the medical capabilities of China's top general-purpose models have surpassed those of comparable foreign products. But for medical scenarios that require professional judgment, such as which tests to order, how to interpret reports, and which medications or traditional Chinese medicine formulas to prescribe, a large model also needs to integrate large amounts of medical data, expert case records, and clinical experience corpora. A medical large model can efficiently process massive amounts of information, provide evidence-based references, help doctors improve their diagnosis and treatment, compensate for weaknesses in primary healthcare, and ultimately make healthcare more widely accessible.
In addition to launching the evaluation platform, the Shanghai Artificial Intelligence Laboratory has worked with professional medical institutions and related enterprises to launch medical AI applications, such as an intelligent screening and precise intervention system for children's eye diseases and a multimodal assisted diagnosis and treatment model for gastrointestinal diseases. (New Society)
Editor: Momo  Responsible editor: Chen zhaozhao
Source: Economic Information Daily