Biotechnology Bulletin, 2023, Vol. 39, Issue (4): 38-48. doi: 10.13560/j.cnki.biotech.bull.1985.2022-0724
WANG Mu-qiang1, CHEN Qi1, MA Wei1, LI Chun-xiu1, OUYANG Peng-fei2, XU Jian-he1
Received: 2022-06-16
Published: 2023-04-26
Online: 2023-05-16
Corresponding author: CHEN Qi, female, Ph.D., associate professor; research interests: computational biology and biocatalysis; E-mail: chenq@ecust.edu.cn
About the author: WANG Mu-qiang, male, master's student; research interest: biochemical engineering; E-mail: y30210500@mail.ecust.edu.cn
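Abstract:
Directed evolution mimics natural evolution and accelerates it, and has become a key technology for engineering enzymes. It plays an important role in biocatalysis and drug design, but because mutations are random, it produces an enormous number of variants, which severely strains experimental screening capacity. In recent years, emerging technologies such as artificial intelligence and big-data processing have also become important research tools in biocatalysis. Among them, machine learning is a statistical learning approach that learns, in a data-driven manner, a mapping from sequence or structure to enzyme function, and thereby helps improve the efficiency of enzyme engineering. This article reviews the data processing, descriptors, and algorithms involved in machine learning models, with emphasis on progress in the research and application of machine learning methods in enzyme engineering. As machine learning algorithms and application techniques advance, more accurate and effective models can be expected, supporting the screening of new enzymes and the precise design and engineering of biocatalysts.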
Cite this article: WANG Mu-qiang, CHEN Qi, MA Wei, LI Chun-xiu, OUYANG Peng-fei, XU Jian-he. Advances in the Application of Machine Learning Methods for Directed Evolution of Enzymes[J]. Biotechnology Bulletin, 2023, 39(4): 38-48.
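Type | Encoding method/Information | Descriptor | Feature information |
---|---|---|---|
Sequence-based descriptors | One-hot encoding | Identity | Residue position |
Sequence-based descriptors | Homology information | Position-specific scoring matrix (PSSM) | Homology information of the sequence |
Sequence-based descriptors | Physicochemical properties | One-dimensional deep convolutional neural network (DeepSF) | Sequence descriptors, sequence position, secondary structure, and solvent accessibility |
Sequence-based descriptors | Physicochemical properties | Geometric descriptors | Histograms of torsion-angle density and residue-distance density |
Sequence-based descriptors | Physicochemical properties | Conjoint triad descriptors | k-spaced residue pairs and conjoint triads |
Sequence-based descriptors | Physicochemical properties | Spatial and chemical features | Three-dimensional grid of protein-ligand structural information |
Sequence-based descriptors | Physicochemical properties | Amino acid composition descriptors in the protr package | Amino acid content |
Sequence-based descriptors | Physicochemical properties | Spectrum kernel | Homology between distant residues |
Sequence-based descriptors | Physicochemical properties | zScales | Physicochemical properties of amino acids |
Sequence-based descriptors | Physicochemical properties | Principal-component vectors of hydrophobic, steric, and electronic properties (VHSE) | Information obtained by PCA dimensionality reduction of hydrophobic, steric, and electronic properties |
Sequence-based descriptors | Physicochemical properties | sScales | Physicochemical features based on AAindex |
Sequence-based descriptors | Physicochemical properties | ProFET | AAindex physicochemical features selected by literature search |
Sequence-based descriptors | Physicochemical properties | Topological scale (T-scale) | Topological structure features |
Sequence-based descriptors | Physicochemical properties | Structural topological scale (ST-scale) | Topological structure features |
Sequence-based descriptors | Physicochemical properties | Protein fingerprint (ProtFP) | Physicochemical properties of amino acids |
Sequence-based descriptors | Physicochemical properties | AAindex | Protein properties |
Sequence-based descriptors | Hidden information | UniRep | Sequence features extracted automatically by a neural network model |
Structure-based descriptors | Physicochemical properties | sPairs | Two-dimensional descriptors based on amino acid pair contact maps and AAindex |
Structure-based descriptors | One-hot encoding | Residue-residue contact map | Structural distance between two proteins of the same family |
Embedded descriptors | Hidden information | ProtVec | Mutation information derived from amino acid triplets |
Mutation descriptors | Mutation pattern | MutInd | Uses 0 or 1 to indicate whether a given mutation occurs in the variant sequence |
Table 1 Descriptors commonly used in machine learning-guided enzyme design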
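Two of the simplest encodings in Table 1, one-hot (Identity) and MutInd, can be illustrated with a minimal sketch. The function names `one_hot` and `mut_ind`, the residue ordering in `AMINO_ACIDS`, and the mutation labels are hypothetical choices made for this illustration and are not taken from the reviewed papers.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 standard residues


def one_hot(sequence):
    """Return a flat 0/1 vector with 20 entries per residue position."""
    vector = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        # str.index raises ValueError for a non-standard residue
        row[AMINO_ACIDS.index(residue)] = 1
        vector.extend(row)
    return vector


def mut_ind(variant_mutations, library_mutations):
    """MutInd-style indicator: 1 if a library mutation is present in the variant, else 0."""
    present = set(variant_mutations)
    return [1 if mutation in present else 0 for mutation in library_mutations]


# Hypothetical mutation labels, used only to show the shape of the encoding.
library = ["A12G", "L45F", "S77T"]
print(one_hot("AC"))
print(mut_ind(["L45F"], library))
```

One-hot vectors grow with sequence length (20 entries per position), whereas a MutInd vector has one entry per mutation in the library, which is why the latter is often convenient for small combinatorial libraries.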
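Type | Algorithm/Method | Feature |
---|---|---|
Classical machine learning | Bayesian algorithm | Models multiple kinds of relationships among variables |
Classical machine learning | Gaussian process (GP) | Computes a probability distribution over all possible model outputs from the data and predicts according to that distribution; covariance defines the relationship between data points |
Classical machine learning | K-nearest neighbors (KNN) | Based on data-label relationships; compares feature distances between new and existing data and retrieves the data with the most similar features |
Classical machine learning | Support vector machine (SVM) | Kernel-based algorithm; mapping to a higher-dimensional space can turn a linearly inseparable relationship into a linearly separable one |
Classical machine learning | Decision trees (DTs) | Classifies or predicts from input data |
Classical machine learning | Random forest (RF) | A bagging method; each classifier draws its data samples and feature subsets randomly and in equal numbers |
Classical machine learning | AdaBoost | A boosting method; combines weak classifiers into a strong classifier |
Classical machine learning | Stacking | Trains several classifiers in a first round, feeds their outputs into a second round as input, and uses a chosen classifier to produce the final output |
Classical machine learning | Principal component analysis (PCA) | Unsupervised mapping of the original data into a new feature space to extract features |
Deep learning | Deep neural network (DNN) | An ANN with multiple hidden layers |
Deep learning | Feedforward neural network (FNN) | Network architecture contains no recurrent connections |
Deep learning | Recurrent neural network (RNN) | Recognizes and models contextual relationships within a sequence |
Deep learning | Convolutional neural network (CNN) | Input data are images or image-like |
Deep learning | Generative adversarial network (GAN) | Trains two independent, competing networks simultaneously |
Table 2 Common machine learning algorithms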
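As an illustration of the K-nearest neighbors entry in Table 2, the following minimal sketch predicts a property of a query sequence by averaging the labels of its closest training sequences. It assumes Hamming distance on equal-length sequences as the feature distance; the function names and the toy data are hypothetical and serve only to show the mechanics.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def knn_predict(query, training, k=3):
    """Average the labels of the k training sequences closest to the query."""
    ranked = sorted(training, key=lambda pair: hamming(query, pair[0]))
    nearest = ranked[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Toy (sequence, measured value) pairs; the values are invented for illustration.
training = [("ACDE", 1.0), ("ACDF", 2.0), ("WCDE", 3.0), ("WWWW", 10.0)]
print(knn_predict("ACDE", training, k=2))
```

In practice the distance would usually be computed on one of the descriptor vectors from Table 1 rather than on raw sequences, and `k` would be chosen by cross-validation.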
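Model | Task | Machine learning algorithm | Input type | Application |
---|---|---|---|---|
- | Enzyme function | RF | Molecule | Distinguishing acetylcholinesterase inhibitors from non-inhibitors |
CWLy-SVM | Enzyme classification | SVM | Sequence | Identifying cell wall lytic enzymes |
SVR | Enzyme function | SVM | Sequence | Improving enzyme activity and solubility |
GPR/GNB | Enzyme function | GP | Structure | Improving the activity of fatty acyl reductase |
- | Enzyme classification | HMM/RF/LR/KNN/SVM/RF | Sequence | Classifying CBH and EG in glycoside hydrolase family 7 |
Innov'SAR | Enzyme function | PLSR | Sequence | Finding the best combination of mutations to increase activity |
- | Enzyme function | LR | Structure | Determining the reactivity of substrate-enzyme pairs |
SoluProt | Enzyme function | RF | Sequence | Predicting enzyme solubility in the Escherichia coli expression system |
ProSAR | Enzyme function | PLSR | Sequence | Improving the activity of halohydrin dehalogenase |
TOME | Enzyme function | RF | Sequence | Predicting optimum temperature |
PREvaIL | Catalytic residues | RF | Sequence and structure | Method for predicting catalytic residues of enzymes |
- | Enzyme function | GP | Sequence | Engineering the fluorescence of green fluorescent protein |
- | Enzyme function | GP | Structure | Engineering the stability of cytochrome P450 |
- | Catalytic residues | CNN | Structure | Framework for predicting catalytic residues of enzymes |
SolventNet | Enzyme function | CNN | Structure | Effects of acid, catalyst, and solvent on hydrolysis rate |
DeepSol | Enzyme solubility | ANN | Sequence | Predicting protein solubility |
- | Enzyme function | CNN | Structure | Optimizing the catalytic capacity and tolerance of PET hydrolase |
Table 3 Applications of machine learning in enzyme engineering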