Biotechnology Bulletin ›› 2025, Vol. 41 ›› Issue (10): 143-155. doi: 10.13560/j.cnki.biotech.bull.1985.2025-0470

• Reviews and Monographs •

Advances in Protein Mining and Design Based on Artificial Intelligence

HE Yuan1,2, MOU Qiang1,2, HE Yu-bing2,3, ZHAO Xiao-yan2, WANG Jian1,2, ZHOU Guo-min4,5,6, ZHANG Jian-hua1,2

  1. Agricultural Information Institute, Chinese Academy of Agricultural Sciences, Beijing 100081
  2. National Nanfan Research Institute, Chinese Academy of Agricultural Sciences, Sanya 572024
  3. Institute of Crop Sciences, Chinese Academy of Agricultural Sciences, Beijing 100081
  4. Nanjing Institute of Agricultural Mechanization, Ministry of Agriculture and Rural Affairs, Nanjing 210014
  5. National Agricultural Science Data Center, Beijing 100081
  6. Institute of Western Agriculture, Chinese Academy of Agricultural Sciences, Changji 831100
  • Received: 2025-05-08 Published: 2025-10-26 Online: 2025-10-28
  • Corresponding authors: ZHANG Jian-hua, male, Ph.D., researcher, research interests: computer vision and data mining; E-mail: zhangjianhua@caas.cn
    HE Yu-bing, male, Ph.D., associate researcher, research interest: gene editing technology; E-mail: heyubing@caas.cn
  • About the author: HE Yuan, male, master's student, research interest: data mining; E-mail: 821012450699@caas.cn
  • Funding:
    National Key Research and Development Program of China (2022YFF0711805, 2022YFF0711801); Hainan Provincial Natural Science Foundation (325MS155); Sanya Yazhou Bay Science and Technology City Special Science and Technology Project (SCKJ-JYRC-2023-45); Nanfan Special Project of the National Nanfan Research Institute, Chinese Academy of Agricultural Sciences (YBXM2409, YBXM2410, YBXM2508, YBXM2509); Fundamental Research Funds for Central Public-interest Scientific Research Institutes (JBYW-AII-2024-05, JBYW-AII-2025-05, Y2025YC90); Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-2024-AII)

Abstract:

Proteins are fundamental components of life, and their structural and functional diversity underpins complex biological processes such as cellular metabolism, signal transduction, and environmental response. Proteins are a core research subject in the life sciences and synthetic biology, and their functional mining and rational design have long shown substantial application potential in drug development, industrial enzyme optimization, and agricultural bioengineering. As high-throughput omics data accumulate and computational biology advances, traditional approaches that rely on sequence alignment, structure determination, and experimental screening increasingly show bottlenecks in efficiency and scalability. In recent years, artificial intelligence (AI) technologies have been progressively integrated into protein science, driving a shift in its research paradigm toward data-driven approaches. This review summarizes and analyzes representative advances in AI-driven protein functional mining and rational design, focusing on the two mainstream design frameworks, "sequence-to-structure" and "structure-to-sequence". It also examines diverse mining strategies based on sequence and structural similarity, and surveys the practical applications and contributions of key AI methods, namely language models, mechanisms for integrating evolutionary information, and generative models, in improving design efficiency and accuracy.

Key words: artificial intelligence, protein design, language models, protein mining