Biotechnology Bulletin ›› 2025, Vol. 41 ›› Issue (12): 50-65. doi: 10.13560/j.cnki.biotech.bull.1985.2025-0627
GUO Fa-xu1,2, FENG Quan1, ZHANG Jian-hua2,3, ZHOU Huan-bin2,4, YANG Sen1, WANG Jian2,3, ZHOU Guo-min5,6,7
Received:2025-06-16
Online:2025-12-26
Published:2026-01-06
Contact:
FENG Quan, ZHANG Jian-hua, ZHOU Huan-bin
E-mail:guofax@gsau.edu.cn;fquan@gsau.edu.cn;zhangjianhua@caas.cn;zhouhuanbin@cass.cn
GUO Fa-xu, FENG Quan, ZHANG Jian-hua, ZHOU Huan-bin, YANG Sen, WANG Jian, ZHOU Guo-min. Research Advances in AI-driven Enzyme Modifying and Design[J]. Biotechnology Bulletin, 2025, 41(12): 50-65.
Fig. 1 Components and framework of AI-driven enzyme engineering and design. A: Data, acquiring data from databases. B: Encoding, extracting features from sequence, structure, and network representations, including sequence features, structural features, and embedding features. C: Modeling, applying deep learning models and protein language models to analyze and learn from the extracted features. D: Testing, experimental validation of designed enzymes using high-throughput screening platforms, or further optimization via directed evolution.
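As a rough illustration of the encoding step (panel B), the simplest sequence feature is a one-hot vector per residue. The sketch below is a minimal example and is not taken from the article: the names `AMINO_ACIDS`, `AA_INDEX`, and `one_hot_encode` are hypothetical, and the learned embeddings produced by protein language models are far richer than this representation.

```python
# Hypothetical illustration of step B (encoding): one-hot sequence features.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence):
    """Return one 20-dimensional 0/1 vector per residue of the sequence."""
    encoded = []
    for residue in sequence.upper():
        vector = [0] * len(AMINO_ACIDS)
        index = AA_INDEX.get(residue)
        if index is not None:  # non-standard residues stay all-zero
            vector[index] = 1
        encoded.append(vector)
    return encoded


features = one_hot_encode("MKV")  # three residues -> three vectors
```

Each row of `features` can be fed to a downstream model as is, or concatenated with structural descriptors when those are available.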
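| Name | Type | Number/Size | Features | Reference |
|---|---|---|---|---|
| UniProtKB | Protein sequences and functional annotation | >200 million protein sequences | High-quality annotation; Swiss-Prot and TrEMBL; extensive cross-references | [ |
| PDB / RCSB | Protein 3D structures | >210 000 structure entries | 3D structures; supports multiple structure-determination methods; structure views and functional annotation | [ |
| ProThermDB | Thermodynamic stability (experimental data) | 25 000 mutation data points | Includes stability parameters such as ΔΔG and Tm; suitable for modeling and protein design | [ |
| FireProtDB | Protein mutation stability (experimental + predicted) | >10 000 mutations | Includes ΔΔG predictions; suitable for developing stability-prediction tools | [ |
| SoluProtMutDB | Experimental database of solubility mutations | >10 000 mutations | Focuses on the solubility of expressed protein products; supports solubility optimization | [ |
| ProtaBank | Protein engineering experimental data | >100 000 data entries | Covers binding affinity, catalytic activity, stability, etc.; supports data upload and machine-learning training | [ |
| AlphaFold DB | Protein structure prediction | >200 million predicted structures | Deep-learning based; provides confidence scores; broadly fills gaps in experimental structures | [ |
| GotEnzymes | Enzyme catalytic parameter prediction (AI-generated) | >1 billion enzyme-substrate pairs | AI-predicted kcat; suitable for synthetic biology and metabolic network modeling | [ |
| InterPro | Protein domain and family classification | >47 000 domain types | Integrates Pfam and other databases; supports functional-site annotation and family evolution studies | [ |
| BRENDA | Enzyme function and biochemical parameters | >100 000 enzymes, >1 million parameters | Covers Km, kcat, pH, temperature, inhibitors, etc.; organized by EC number | [ |
| BKMS-react | Integrated biochemical reaction database (metabolic modeling) | >81 200 enzyme-catalyzed reactions | Integrates KEGG, BRENDA, etc.; supports reaction modeling and substrate/product analysis | [ |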
Table 1 Summary of commonly used databases
Fig. 2 AI-driven strategies for enzyme engineering. Starting from known enzyme sequences or structures, this strategy involves mutation, prediction, and screening steps to obtain novel enzyme variants with improved or altered properties.
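The mutation, prediction, and screening loop in Fig. 2 can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' method: `single_mutants`, `toy_score`, and `screen` are hypothetical names, and `toy_score` is a deliberately crude placeholder (fraction of hydrophobic residues) standing in for whatever trained predictor a real workflow would use.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def single_mutants(sequence):
    """Mutation step: yield (label, variant) for every single substitution."""
    for pos, wild_type in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa != wild_type:
                label = f"{wild_type}{pos + 1}{aa}"  # e.g. K2A, 1-based position
                yield label, sequence[:pos] + aa + sequence[pos + 1:]


def toy_score(variant):
    """Prediction step (placeholder): fraction of hydrophobic residues."""
    hydrophobic = set("AILMFVW")
    return sum(residue in hydrophobic for residue in variant) / len(variant)


def screen(sequence, predictor, top_k=5):
    """Screening step: rank all single mutants and keep the top_k."""
    scored = [(predictor(v), label, v) for label, v in single_mutants(sequence)]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


candidates = screen("MKTG", toy_score, top_k=3)
```

Passing the predictor in as an argument keeps the loop independent of the model, so the placeholder can be swapped for a learned stability or activity predictor without touching the mutation or screening code.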
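| Application type | Model name | Model type | Technical features | Publication time | Reference |
|---|---|---|---|---|---|
| Optimization of the thermostability and catalytic activity of PET hydrolases | MutCompute | Deep learning/machine learning | Combines machine learning with structural data to improve the catalytic performance of PET hydrolases | 2022 | [ |
| Performance optimization of haloalkane dehalogenases and fluorinases | MicroPEX-KinMAP | Deep learning/machine learning | Combines sequence and structure bioinformatics with microfluidics to discover efficient dehalogenases | 2022 | [ |
| Directed evolution of proteins using robotic automation and machine learning | BO-EVO | Bayesian optimization algorithm | Focuses on optimizing protein fitness and function while reducing the experimental burden | 2022 | [ |
| Prediction of SHP2 inhibitors for cancer therapy | XGBoost, KNN, neural networks, etc. | Deep learning/machine learning | Tests multiple machine-learning models by ten-fold cross-validation | 2023 | [ |
| Optimization of protein-ligand binding interactions | AlphaSpace | Deep learning/machine learning | Target prediction and functional optimization based on AlphaSpace | 2023 | [ |
| Enhancing the thermostability of pectin lyase through multi-site combinatorial mutagenesis | RoseTTAFold | Protein structure prediction model | Improves enzyme thermostability through iterative design-test-learn cycles | 2024 | [ |
| Optimizing enzyme thermostability by combining multiple mutations with a protein language model | Pro-PRIME | Protein language model | Captures complex epistatic interactions among high-order combinatorial mutations | 2024 | [ |
| Reshaping and optimizing the performance of PET-degrading hydrolases | TurboPETase | Protein language model | The TurboPETase redesign achieves nearly complete PET depolymerization at a high solids loading of 200 g/kg | 2024 | [ |
| Local redesign and active-site optimization of serine hydrolases | RFdiffusion | Diffusion model (generative model) | Uses RFdiffusion to design protein active sites with high structural accuracy | 2025 | [ |
| Design of composite molecular systems with multifunctional enzyme activity through precise enzyme structure analysis and optimization | iMARS | Protein language model/generative model | The iMARS framework is applicable to biomanufacturing and PET plastic degradation and can be extended to other areas of synthetic biology and green chemistry | 2025 | [ |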
Table 2 Representative applications of artificial intelligence in enzyme engineering
Fig. 3 AI-driven strategies for enzyme design. Sequence-based strategy: utilizes deep generative models to learn co-evolutionary patterns from protein datasets and generates novel sequences with potential functional properties in a data-driven manner. Structure-based strategy: employs physical energy functions and spatial constraint algorithms to derive stable protein conformations based on three-dimensional structural constraints.
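| Application type | Model name | Model type | Technical features | Publication time | References |
|---|---|---|---|---|---|
| Generating functional protein variants with variational autoencoders, applied to enzyme design | MSA-VAE, AR-VAE | Variational autoencoder (VAE) | Uses multiple sequence alignments (MSAs) and raw sequences as input to generate new functional protein variants | 2021 | [ |
| Expanding functional protein sequence space with generative adversarial networks to generate new enzymes | ProteinGAN | Generative adversarial network (GAN) | Learns the evolutionary relationships among natural protein sequences through self-attention | 2021 | [ |
| Protein sequence-function prediction and design; generative protein design | ProT-VAE | Variational autoencoder (VAE) | Combines a VAE with a Transformer to learn sequence-function relationships | 2023 | [ |
| Designing protein structure and function with generative models | RFdiffusion | Diffusion model | Generates functional proteins for specific design goals, such as binder design, enzyme active-site scaffolding, and symmetric oligomer design | 2023 | [ |
| Design of proteins and peptides, especially α-helical structures | HelixGAN | Generative adversarial network (GAN) | Optimizes the generated helices by gradient search so that they can bind specific targets or activate cell receptors | 2023 | [ |
| Generation of proteins and protein complexes for protein design | Chroma | Diffusion model | Supports multiple conditioning constraints during generation (e.g. symmetry, shape, semantics) | 2023 | [ |
| Data-driven protein design; generation of new protein sequences | ProtWave-VAE | Variational autoencoder (VAE) | Combines VAE and autoregressive (AR) models; trained and used for prediction on unaligned sequence data | 2023 | [ |
| Design of high-affinity binders for bioactive helical peptides | RFdiffusion | Diffusion model | Can generate picomolar-affinity binders for bioactive peptides | 2024 | [ |
| De novo design of enzymes that catalyze new reactions | RFdiffusion2 | Large language model | Designs enzymes from atom-level active-site descriptions, without inverse rotamer generation or pre-specified sequence positions | 2025 | [ |
| A new framework combining large language models (LLMs) and genetic algorithms (GAs) to accelerate enzyme design with multi-objective co-optimization | LLM-GA | Large language model, genetic algorithm (GA) | Highly modular framework that can integrate multiple performance metrics for comprehensive optimization of enzyme performance | 2025 | [ |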
Table 3 Representative applications of artificial intelligence in enzyme design
| [1] | van Beilen JB, Li Z. Enzyme technology: an overview [J]. Curr Opin Biotechnol, 2002, 13(4): 338-344. |
| [2] | Yang J, Li FZ, Arnold FH. Opportunities and challenges for machine learning-assisted enzyme engineering [J]. ACS Cent Sci, 2024, 10(2): 226-241. |
| [3] | Robinson PK. Enzymes: principles and biotechnological applications [J]. Essays Biochem, 2015, 59: 1-41. |
| [4] | Wiltschi B, Cernava T, Dennig A, et al. Enzymes revolutionize the bioproduction of value-added compounds: From enzyme discovery to special applications [J]. Biotechnol Adv, 2020, 40: 107520. |
| [5] | Victorino da Silva Amatto I, Gonsales da Rosa-Garzon N, Antônio de Oliveira Simões F, et al. Enzyme engineering and its industrial applications [J]. Biotechnol Appl Biochem, 2022, 69(2): 389-409. |
| [6] | Zhou L, Tao CM, Shen XL, et al. Unlocking the potential of enzyme engineering via rational computational design strategies [J]. Biotechnol Adv, 2024, 73: 108376. |
| [7] | Xiong W, Liu B, Shen YJ, et al. Protein engineering design from directed evolution to de novo synthesis [J]. Biochem Eng J, 2021, 174: 108096. |
| [8] | Wang YJ, Xue P, Cao MF, et al. Directed evolution: methodologies and applications [J]. Chem Rev, 2021, 121(20): 12384-12444. |
| [9] | Singh N, Malik S, Gupta A, et al. Revolutionizing enzyme engineering through artificial intelligence and machine learning [J]. Emerg Top Life Sci, 2021, 5(1): 113-125. |
| [10] | Mao SC, Jiang JW, Xiong K, et al. Enzyme engineering: performance optimization, novel sources, and applications in the food industry [J]. Foods, 2024, 13(23): 3846. |
| [11] | Xu KJ, Fu HR, Chen QM, et al. Engineering thermostability of industrial enzymes for enhanced application performance [J]. Int J Biol Macromol, 2025, 291: 139067. |
| [12] | Srivastava N, Khare SK. Advances in microbial alkaline proteases: addressing industrial bottlenecks through genetic and enzyme engineering [J]. Appl Biochem Biotechnol, 2025, 197(8): 4861-4896. |
| [13] | Sikander R, Wang YP, Ghulam A, et al. Identification of enzymes-specific protein domain based on DDE, and convolutional neural network [J]. Front Genet, 2021, 12: 759384. |
| [14] | Jang WD, Kim GB, Kim Y, et al. Applications of artificial intelligence to enzyme and pathway design for metabolic engineering [J]. Curr Opin Biotechnol, 2022, 73: 101-107. |
| [15] | Wang YH, Han SX, Wang Y, et al. Artificial intelligence technology assists enzyme prediction and rational design [J]. J Agric Food Chem, 2025, 73(12): 7065-7073. |
| [16] | Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold [J]. Nature, 2021, 596(7873): 583-589. |
| [17] | Watson JL, Juergens D, Bennett NR, et al. De novo design of protein structure and function with RFdiffusion [J]. Nature, 2023, 620(7976): 1089-1100. |
| [18] | Sun MGF, Seo MH, Nim S, et al. Protein engineering by highly parallel screening of computationally designed variants [J]. Sci Adv, 2016, 2(7): e1600692. |
| [19] | Siedhoff NE, Schwaneberg U, Davari MD. Machine learning-assisted enzyme engineering [J]. Meth Enzymol, 2020, 643: 281-315. |
| [20] | Strokach A, Kim PM. Deep generative modeling for protein design [J]. Curr Opin Struct Biol, 2022, 72: 226-236. |
| [21] | Corso G, Stark H, Jegelka S, et al. Graph neural networks [J]. Nat Rev Meth Primers, 2024, 4: 17. |
| [22] | Hsu C, Fannjiang C, Listgarten J. Generative models for protein structures and sequences [J]. Nat Biotechnol, 2024, 42(2): 196-199. |
| [23] | Fram B, Su Y, Truebridge I, et al. Simultaneous enhancement of multiple functional properties using evolution-informed protein design [J]. Nat Commun, 2024, 15: 5141. |
| [24] | Chen Z, Liu YG, Wang YG, et al. Validation of an LLM-based multi-agent framework for protein engineering in dry lab and wet lab [C]//2024 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). December 3-6, 2024, Lisbon, Portugal. Piscataway, NJ: IEEE, 2024: 5364-5370. |
| [25] | Wildey MJ, Haunso A, Tudor M, et al. High-throughput screening [M]//Platform Technologies in Drug Discovery and Validation. Amsterdam: Elsevier, 2017: 149-195. |
| [26] | Fowler DM, Fields S. Deep mutational scanning: a new style of protein science [J]. Nat Meth, 2014, 11(8): 801-807. |
| [27] | Lee SO, Fried SD. An error prone PCR method for small amplicons [J]. Anal Biochem, 2021, 628: 114266. |
| [28] | Giessel A, Dousis A, Ravichandran K, et al. Therapeutic enzyme engineering using a generative neural network [J]. Sci Rep, 2022, 12: 1536. |
| [29] | Li FR, Yuan L, Lu HZ, et al. Deep learning-based kcat prediction enables improved enzyme-constrained model reconstruction [J]. Nat Catal, 2022, 5(8): 662-672. |
| [30] | Varadi M, Anyango S, Deshpande M, et al. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models [J]. Nucleic Acids Res, 2022, 50(D1): D439-D444. |
| [31] | Boutet E, Lieberherr D, Tognolli M, et al. UniProtKB/Swiss-prot, the manually annotated section of the UniProt KnowledgeBase: how to use the entry view [M]//Plant Bioinformatics. New York, NY: Springer New York, 2016: 23-54. |
| [32] | Bittrich S, Bhikadiya C, Bi CX, et al. RCSB protein data bank: efficient searching and simultaneous access to one million computed structure models alongside the PDB structures enabled by architectural advances [J]. J Mol Biol, 2023, 435(14): 167994. |
| [33] | Nikam R, Kulandaisamy A, Harini K, et al. ProThermDB: thermodynamic database for proteins and mutants revisited after 15 years [J]. Nucleic Acids Res, 2021, 49(D1): D420-D424. |
| [34] | Stourac J, Dubrava J, Musil M, et al. FireProtDB: database of manually curated protein stability data [J]. Nucleic Acids Res, 2021, 49(D1): D319-D324. |
| [35] | Velecký J, Hamsikova M, Stourac J, et al. SoluProtMutDB: a manually curated database of protein solubility changes upon mutations [J]. Comput Struct Biotechnol J, 2022, 20: 6339-6347. |
| [36] | Wang CY, Chang PM, Ary ML, et al. ProtaBank: a repository for protein design and engineering data [J]. Protein Sci, 2018, 27(6): 1113-1124. |
| [37] | Hunter S, Apweiler R, Attwood TK, et al. InterPro: the integrative protein signature database [J]. Nucleic Acids Res, 2009, 37(Database issue): D211-D215. |
| [38] | Schomburg I, Chang A, Hofmann O, et al. BRENDA: a resource for enzyme data and metabolic information [J]. Trends Biochem Sci, 2002, 27(1): 54-56. |
| [39] | Sankaranarayanan K, Jensen KF. Computer-assisted multistep chemoenzymatic retrosynthesis using a chemical synthesis planner [J]. Chem Sci, 2023, 14(23): 6467-6475. |
| [40] | Zhou ZY, Zhang L, Yu YX, et al. Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning [J]. Nat Commun, 2024, 15: 5566. |
| [41] | Zhang L, Luo K, Zhou ZY, et al. A deep retrieval-enhanced meta-learning framework for enzyme optimum pH prediction [J]. J Chem Inf Model, 2025, 65(7): 3761-3770. |
| [42] | Patsch D, Buller R. Improving enzyme fitness with machine learning [J]. Chimia, 2023, 77(3): 116. |
| [43] | Wei SZ, Chen ZY, Arumugasamy SK, et al. Data augmentation and machine learning techniques for control strategy development in bio-polymerization process [J]. Environ Sci Ecotechnol, 2022, 11: 100172. |
| [44] | Xie XZ, Valiente PA, Kim PM. HelixGAN: a deep-learning methodology for conditional de novo design of α-helix structures [J]. Bioinformatics, 2023, 39(1): btad036. |
| [45] | Xu P, Wang W, Ning H, et al. Progress in the application of artificial intelligence-assisted molecular modification of enzymes [J]. Chinese Journal of Biotechnology, 2024, 40(6): 1728-1741. (in Chinese) |
| [46] | Biswas S, Khimulya G, Alley EC, et al. Low-N protein engineering with data-efficient deep learning [J]. Nat Meth, 2021, 18(4): 389-396. |
| [47] | Shroff R, Cole AW, Diaz DJ, et al. Discovery of novel gain-of-function mutations guided by structure-based deep learning [J]. ACS Synth Biol, 2020, 9(11): 2927-2935. |
| [48] | Brandes N, Ofer D, Peleg Y, et al. ProteinBERT: a universal deep-learning model of protein sequence and function [J]. Bioinformatics, 2022, 38(8): 2102-2110. |
| [49] | Rives A, Meier J, Sercu T, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences [J]. Proc Natl Acad Sci U S A, 2021, 118(15): 1-12. |
| [50] | Rao RM, Liu J, Verkuil R, et al. MSA transformer [C]//Cold Spring Harbor: Cold Spring Harbor Laboratory Press, 2021. |
| [51] | Meier J, Rao R, Verkuil R, et al. Language models enable zero-shot prediction of the effects of mutations on protein function [C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. New York: ACM, 2021: 29287-29303. |
| [52] | Lin ZM, Akin H, Rao R, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model [J]. Science, 2023, 379(6637): 1123-1130. |
| [53] | Lu HY, Diaz DJ, Czarnecki NJ, et al. Machine learning-aided engineering of hydrolases for PET depolymerization [J]. Nature, 2022, 604(7907): 662-667. |
| [54] | Vasina M, Vanacek P, Hon J, et al. Advanced database mining of efficient haloalkane dehalogenases by sequence and structure bioinformatics and microfluidics [J]. Chem Catal, 2022, 2(10): 2704-2725. |
| [55] | Hu RY, Fu LH, Chen YC, et al. Protein engineering via Bayesian optimization-guided evolutionary algorithm and robotic experiments [J]. Brief Bioinform, 2023, 24(1): bbac570. |
| [56] | Adhikari N, Ayyannan SR. Development and validation of machine learning models for the prediction of SH-2 containing protein tyrosine phosphatase 2 inhibitors [J]. Mol Divers, 2024, 28(4): 1889-1905. |
| [57] | Xia S, Chen E, Zhang YK. Integrated molecular modeling and machine learning for drug design [J]. J Chem Theory Comput, 2023, 19(21): 7478-7495. |
| [58] | Zhang ZH, Li ZX, Yang ML, et al. Machine learning-guided multi-site combinatorial mutagenesis enhances the thermostability of pectin lyase [J]. Int J Biol Macromol, 2024, 277: 134530. |
| [59] | Bian JH, Tan P, Nie T, et al. Optimizing enzyme thermostability by combining multiple mutations using protein language model [J]. mLife, 2024, 3(4): 492-504. |
| [60] | Cui YL, Chen YC, Sun JY, et al. Computational redesign of a hydrolase for nearly complete PET depolymerization at industrially relevant high-solids loading [J]. Nat Commun, 2024, 15: 1417. |
| [61] | Lauko A, Pellock SJ, Sumida KH, et al. Computational design of serine hydrolases [J]. Science, 2025, 388(6744): eadu2454. |
| [62] | Wang JW, Ouyang XY, Meng SY, et al. Rational multienzyme architecture design with iMARS [J]. Cell, 2025, 188(5): 1349-1362.e17. |
| [63] | Lutz ID, Wang SZ, Norn C, et al. Top-down design of protein architectures with reinforcement learning [J]. Science, 2023, 380(6642): 266-273. |
| [64] | Janner M, Li Q, Levine S. Offline reinforcement learning as one big sequence modeling problem [C]//Proceedings of the 35th International Conference on Neural Information Processing Systems. New York: ACM, 2021: 1273-1286. |
| [65] | Wang CR, Chen Y, Zhang Y, et al. A reinforcement learning approach for protein-ligand binding pose prediction [J]. BMC Bioinform, 2022, 23(1): 368. |
| [66] | Jha K, Saha S, Singh H. Prediction of protein-protein interaction using graph neural networks [J]. Sci Rep, 2022, 12: 8360. |
| [67] | Khan S, Noor S, Awan HH, et al. Deep-ProBind: binding protein prediction with transformer-based deep learning model [J]. BMC Bioinform, 2025, 26(1): 88. |
| [68] | Wang T, Xiang GM, He SW, et al. DeepEnzyme: a robust deep learning model for improved enzyme turnover number prediction by utilizing features of protein 3D-structures [J]. Brief Bioinform, 2024, 25(5): bbae409. |
| [69] | Li G, Zhang N, Dai XW, et al. EnzyACT: a novel deep learning method to predict the impacts of single and multiple mutations on enzyme activity [J]. J Chem Inf Model, 2024, 64(15): 5912-5921. |
| [70] | Jiang Y, Ran XC, Yang ZJ. Data-driven enzyme engineering to identify function-enhancing enzymes [J]. Protein Eng Des Sel, 2023, 36: gzac009. |
| [71] | Wang XR, Yin XD, Jiang DJ, et al. Multi-modal deep learning enables efficient and accurate annotation of enzymatic active sites [J]. Nat Commun, 2024, 15: 7348. |
| [72] | Abdine H, Chatzianastasis M, Bouyioukos C, et al. Prot2Text: multimodal protein’s function generation with GNNs and transformers [J]. Proc AAAI Conf Artif Intell, 2024, 38(10): 10757-10765. |
| [73] | Schlichtkrull M, Kipf TN, Bloem P, et al. Modeling relational data with graph convolutional networks [C]//The Semantic Web. New York: ACM, 2018: 593-607. |
| [74] | Ahern W, Yim J, Tischer D, et al. Atom level enzyme active site scaffolding using RFdiffusion2 [J]. bioRxiv, 2025: 2025.04.09.648075. |
| [75] | Kim D, Noh MH, Park M, et al. Enzyme activity engineering based on sequence co-evolution analysis [J]. Metab Eng, 2022, 74: 49-60. |
| [76] | Hawkins-Hooker A, Depardieu F, Baur S, et al. Generating functional protein variants with variational autoencoders [J]. PLoS Comput Biol, 2021, 17(2): e1008736. |
| [77] | Repecka D, Jauniskis V, Karpus L, et al. Expanding functional protein sequence spaces using generative adversarial networks [J]. Nat Mach Intell, 2021, 3(4): 324-333. |
| [78] | Sevgen E, Moller J, Lange A, et al. ProT-VAE: protein transformer variational autoencoder for functional protein design [J]. bioRxiv, 2023. DOI:10.1101/2023.01.23.525232 |
| [79] | Ingraham JB, Baranov M, Costello Z, et al. Illuminating protein space with a programmable generative model [J]. Nature, 2023, 623(7989): 1070-1078. |
| [80] | Praljak N, Lian XR, Ranganathan R, et al. ProtWave-VAE: integrating autoregressive sampling with latent-based inference for data-driven protein design [J]. ACS Synth Biol, 2023, 12(12): 3544-3561. |
| [81] | Vázquez Torres S, Leung PJY, Venkatesh P, et al. De novo design of high-affinity binders of bioactive helical peptides [J]. Nature, 2024, 626(7998): 435-442. |
| [82] | Nana Teukam YG, Zipoli F, Laino T, et al. Integrating genetic algorithms and language models for enhanced enzyme design [J]. Brief Bioinform, 2024, 26(1): bbae675. |
| [83] | Boob AG, Tan SI, Zaidi A, et al. Design of diverse, functional mitochondrial targeting sequences across eukaryotic organisms using variational autoencoder [J]. Nat Commun, 2025, 16: 4151. |
| [84] | Ingraham J, Garg VK, Barzilay R, et al. Generative models for graph-based protein design [C]//Advances in Neural Information Processing Systems 32 (NeurIPS 2019). 2019: 15820-15831. |
| [85] | Dauparas J, Anishchenko I, Bennett N, et al. Robust deep learning-based protein sequence design using ProteinMPNN [J]. Science, 2022, 378(6615): 49-56. |
| [86] | Yang JY, Anishchenko I, Park H, et al. Improved protein structure prediction using predicted interresidue orientations [J]. Proc Natl Acad Sci U S A, 2020, 117(3): 1496-1503. |
| [87] | Hansen AL, Theisen FF, Crehuet R, et al. Carving out a glycoside hydrolase active site for incorporation into a new protein scaffold using deep network hallucination [J]. ACS Synth Biol, 2024, 13(3): 862-875. |
| [88] | Baek M, DiMaio F, Anishchenko I, et al. Accurate prediction of protein structures and interactions using a three-track neural network [J]. Science, 2021, 373(6557): 871-876. |
| [89] | Joho Y, Royan S, Caputo AT, et al. Enhancing PET degrading enzymes: a combinatory approach [J]. ChemBioChem, 2024, 25(10): e202400084. |
| [90] | Xi Y, Ye LD, Yu HW. Enhanced thermal and alkaline stability of L-lysine decarboxylase CadA by combining directed evolution and computation-guided virtual screening [J]. Bioresour Bioprocess, 2022, 9(1): 24. |
| [91] | Scherer M, Fleishman SJ, Jones PR, et al. Computational enzyme engineering pipelines for optimized production of renewable chemicals [J]. Front Bioeng Biotechnol, 2021, 9: 673005. |
| [92] | Srinivas N, Krause A, Kakade S M, et al. Gaussian process optimization in the bandit setting: No regret and experimental design [J]. Cornell University Library, 2010. DOI: 10.48550/arxiv.0912.3995 |
| [93] | Fenoy E, Edera AA, Stegmayer G. Transfer learning in proteins: evaluating novel protein learned representations for bioinformatics tasks [J]. Brief Bioinform, 2022, 23(4): bbac232. |
| [94] | Pachter R, Wang ZQ. Adaptive simulated annealing and its application to protein folding [M]//Encyclopedia of Optimization. Cham: Springer Nature Switzerland, 2024: 1-6. |
| [95] | Narayanan H, Dingfelder F, Butté A, et al. Machine learning for biologics: opportunities for protein engineering, developability, and formulation [J]. Trends Pharmacol Sci, 2021, 42(3): 151-165. |
| [96] | Ge FG, Gao YH, Jiang YJ, et al. Design and performance analysis of multi-enzyme activity-doped nanozymes assisted by machine learning [J]. Colloids Surf B Biointerfaces, 2025, 248: 114468. |
| [97] | Ding K, Chin M, Zhao YL, et al. Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering [J]. Nat Commun, 2024, 15: 6392. |
| [98] | Zimmerman L, Alon N, Levin I, et al. Context-dependent design of induced-fit enzymes using deep learning generates well-expressed, thermally stable and active enzymes [J]. Proc Natl Acad Sci U S A, 2024, 121(11): e2313809121. |
| [99] | Xu YY, Zhao XJ, Song XZ, et al. Boosting protein language models with negative sample mining [M]//Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. Cham: Springer Nature Switzerland, 2024: 199-214. |
| [100] | Chatterjee A, Ravandi B, Haddadi P, et al. Topology-driven negative sampling enhances generalizability in protein-protein interaction prediction [J]. Bioinformatics, 2025, 41(5): btaf148. |
| [101] | Niazi SK. Protein catalysis through structural dynamics: a comprehensive analysis of energy conversion in enzymatic systems and its computational limitations [J]. Pharmaceuticals, 2025, 18(7): 951. |
| [102] | Qin YM, Chen ZH, Peng Y, et al. Deep learning methods for protein structure prediction [J]. MedComm, 2024, 3(3): e96. |
| [103] | Xie WJ, Warshel A. Harnessing generative AI to decode enzyme catalysis and evolution for enhanced engineering [J]. bioRxiv, 2023: 2023.10.10.561808. |