High-quality datasets resonate with AI and become the 'hard currency' of data circulation

2025-09-03   

As the wave of artificial intelligence sweeps across the globe, data, the "fuel" behind it, is becoming a fiercely contested strategic resource. However, not all data can accelerate the development of AI. A shift from "massive data" to "high-quality datasets" is under way.

What is a high-quality dataset? In December 2024, the National Development and Reform Commission, the National Data Administration, and other departments issued the "Guiding Opinions on Promoting the High-Quality Development of the Data Industry", which for the first time explicitly proposed the concept of "high-quality datasets". The document supports enterprises in innovating in artificial intelligence applications, developing high-quality datasets, and vigorously developing new business models such as "data-as-a-service", "knowledge-as-a-service", and "model-as-a-service".

The recently released "Guidelines for High-Quality Dataset Construction" points out that, with the exponential growth in the parameter scale of large models and the expansion of multimodal capabilities, the demand for data has shifted from "quantity accumulation" to "equal emphasis on quantity and quality".

According to official data, as of June 2025, China had built over 35,000 high-quality datasets with a total volume of over 400 PB. Data trading institutions have listed 3,364 high-quality datasets, which are key commodities in trading circulation, with a cumulative transaction volume of nearly 4 billion yuan and a scale of 246 PB.

At a recent forum, Yu Xiaohui, President of the China Academy of Information and Communications Technology, stated that, worldwide, a large amount of data sits in private domains, and that releasing this data across scenarios, industries, and governments is a very important direction for forming high-quality datasets.

High-quality datasets and AI development complement each other because training large AI models requires massive amounts of data. For this reason, there has long been a view in the market that usable data will eventually run out, or that large amounts of synthetic data will have to be used. In this situation, high-quality datasets undoubtedly become the "hard currency" of data circulation.

Zhang Xiaojin, Dean and Professor of the Institute of Digital Government and Governance at Tsinghua University, stated that wherever artificial intelligence models go, high-quality datasets go; conversely, wherever high-quality datasets go, artificial intelligence goes. The two complement each other in a dual-wheel-driven pattern.

Wu Shizhong, an academician of the Chinese Academy of Engineering, pointed out that the quality and security of dataset construction are the lifeline of large model development. It is necessary to improve the tiered and classified data security system, strengthen technical protection across the whole process, and build the underlying technical capability to prevent tampering. Dataset construction should also actively incorporate excellent traditional Chinese culture, to keep models from becoming tools of self-interest.

At present, the construction of high-quality datasets is in full swing. Zhou Jianming, Secretary of the Party Group and Director of the Shenzhen Municipal Government Service and Data Management Bureau, shared on the official website of the National Data Administration that Shenzhen has explored the integration of high-quality public data and enterprise data through the authorized operation of public data resources and the construction of a trusted data space. Pilot projects have been carried out in areas such as credit reporting and finance, meteorology, and commercial factoring and compensation, and have achieved good results. (New Society)

Editor: Momo  Responsible editor: Chen Zhaozhao

Source: China News Service
