Biotechnology Bulletin, 2023, Vol. 39, Issue (4): 38-48. doi: 10.13560/j.cnki.biotech.bull.1985.2022-0724
WANG Mu-qiang1, CHEN Qi1, MA Wei1, LI Chun-xiu1, OUYANG Peng-fei2, XU Jian-he1
Received: 2022-06-16
Online: 2023-04-26
Published: 2023-05-16
WANG Mu-qiang, CHEN Qi, MA Wei, LI Chun-xiu, OUYANG Peng-fei, XU Jian-he. Advances in the Application of Machine Learning Methods for Directed Evolution of Enzymes[J]. Biotechnology Bulletin, 2023, 39(4): 38-48.
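Type | Encoding method/Information | Descriptor | Feature information | Reference |
---|---|---|---|---|
Sequence-based descriptors | One-hot encoding | Identity | Represents residue positions | [ |
Sequence-based descriptors | Homology information | Position-specific scoring matrix (PSSM) | Homology information of the sequence | [ |
Sequence-based descriptors | Physicochemical properties | DeepSF (1D deep convolutional neural network) | Sequence descriptors, sequence position, secondary structure and solvent accessibility | [ |
Sequence-based descriptors | Physicochemical properties | Geometric descriptors | Torsion angle density and residue distance density histograms | [ |
Sequence-based descriptors | Physicochemical properties | Conjoint triad descriptors | k-spaced residue pairs and conjoint triads | [ |
Sequence-based descriptors | Physicochemical properties | Spatial and chemical features | 3D network (grid) representation of protein-ligand structure | [ |
Sequence-based descriptors | Physicochemical properties | Amino acid composition descriptors in the protr package | Amino acid content | [ |
Sequence-based descriptors | Physicochemical properties | Spectrum kernel | Homology between distal residues | [ |
Sequence-based descriptors | Physicochemical properties | zScales | Physicochemical properties of amino acids | [ |
Sequence-based descriptors | Physicochemical properties | VHSE (principal component vectors of hydrophobic, steric and electronic properties) | Information obtained by PCA dimensionality reduction of hydrophobic, steric and electronic properties | [ |
Sequence-based descriptors | Physicochemical properties | sScales | Physicochemical features based on AAindex | [ |
Sequence-based descriptors | Physicochemical properties | ProFET | AAindex physicochemical features selected by literature search | [ |
Sequence-based descriptors | Physicochemical properties | T-scale (topological scale) | Topological structure features | [ |
Sequence-based descriptors | Physicochemical properties | ST-scale (structural topology scale) | Topological structure features | [ |
Sequence-based descriptors | Physicochemical properties | ProtFP (protein fingerprint) | Physicochemical properties of amino acids | [ |
Sequence-based descriptors | Physicochemical properties | AAindex | Protein properties | [ |
Sequence-based descriptors | Hidden information | UniRep | Sequence features extracted automatically by a neural network model | [ |
Structure-based descriptors | Physicochemical properties | sPairs | 2D descriptors based on amino acid pair contact maps and AAindex | [ |
Structure-based descriptors | One-hot encoding | Residue-residue contact map | Structural distance between two proteins of the same family | [ |
Embedding descriptors | Hidden information | ProtVec | Mutation information derived from amino acid triplets | [ |
Mutation descriptors | Indicates the mutation pattern | MutInd | Uses 0 or 1 to indicate whether the corresponding mutation occurs in the variant sequence | [ |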
Table 1 Descriptors used in machine learning-guided enzyme design
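Two of the simplest descriptors in Table 1, one-hot (Identity) encoding and the 0/1 mutation indicator (MutInd), can be illustrated with a minimal Python sketch. The residue ordering, the function names `one_hot_encode` and `mutation_indicator`, and the toy sequence and mutation labels are all assumptions made for illustration; none of them come from the article or from a published implementation.

```python
# Assumed fixed ordering of the 20 canonical amino acids (any fixed order works).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot_encode(sequence):
    """Return one 20-dimensional 0/1 vector per residue (Identity descriptor)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = []
    for residue in sequence:
        row = [0] * len(AMINO_ACIDS)
        row[index[residue]] = 1  # KeyError here means a non-canonical residue
        encoded.append(row)
    return encoded


def mutation_indicator(variant_mutations, known_mutations):
    """Return a 0/1 vector marking which known mutations a variant carries."""
    carried = set(variant_mutations)
    return [1 if mutation in carried else 0 for mutation in known_mutations]


# Hypothetical toy data, not taken from the article.
wild_type = "MKV"
library_mutations = ["M1L", "K2R", "V3A"]

encoded_wt = one_hot_encode(wild_type)
variant_vector = mutation_indicator(["K2R", "V3A"], library_mutations)
```

One-hot encoding grows with sequence length (20 values per position), whereas the mutation indicator only needs one value per mutation observed in the library, which is why it suits small combinatorial libraries.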
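Type | Algorithm/Method | Feature | Reference |
---|---|---|---|
Classical machine learning | Bayesian algorithm | Models multiple kinds of relationships among variables | [ |
Classical machine learning | Gaussian process (GP) | Estimates from the data the probability of all possible model outputs and predicts according to that probability distribution; covariance defines the relationship between data points | [ |
Classical machine learning | K-nearest neighbors (KNN) | Based on data-label relationships; compares feature distances between new and existing data and retrieves the data with the most similar features | [ |
Classical machine learning | Support vector machine (SVM) | Kernel-based algorithm; mapping to a higher dimension can turn a linearly inseparable relationship into a linearly separable one | [ |
Classical machine learning | Decision trees (DTs) | Classify or predict input data | [ |
Classical machine learning | Random forest (RF) | A bagging method; every classifier uses the same amount of randomly sampled data and randomly selected features | [ |
Classical machine learning | AdaBoost | A boosting method that combines weak classifiers into a strong classifier | [ |
Classical machine learning | Stacking | Several classifiers are trained in a first round and their outputs are used as inputs to a second round, in which one designated classifier is trained to produce the final output | [ |
Classical machine learning | Principal component analysis (PCA) | Unsupervised learning that maps the original data to a new feature space to extract features | [ |
Deep learning | Deep neural network (DNN) | An ANN with multiple hidden layers | [ |
Deep learning | Feedforward neural network (FNN) | The network structure contains no cycles | [ |
Deep learning | Recurrent neural network (RNN) | Recognizes and models contextual relationships within a sequence | [ |
Deep learning | Convolutional neural network (CNN) | Input data are images or image-like | [ |
Deep learning | Generative adversarial network (GAN) | Trains two independent competing networks simultaneously | [ |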
Table 2 Algorithms of machine learning
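The KNN entry in Table 2 (compare feature distances between new and existing data, then use the most similar data) can be sketched in a few lines of plain Python. This is only an illustrative sketch: the choice of Hamming distance on 0/1 mutation vectors, the averaging of neighbor labels, the function names, and the toy activity values are assumptions, not something specified in the article.

```python
def hamming(a, b):
    """Count the positions at which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(query, train_x, train_y, k=3):
    """Predict a value for `query` as the mean label of its k nearest neighbors."""
    # sorted() is stable, so ties keep their original training order.
    ranked = sorted(zip(train_x, train_y), key=lambda pair: hamming(query, pair[0]))
    nearest = ranked[:k]
    return sum(label for _, label in nearest) / len(nearest)


# Hypothetical training set: 0/1 mutation vectors with made-up relative activities.
train_x = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
train_y = [1.0, 1.5, 0.8, 1.4, 2.0]

# Query variant carrying the first and third mutations.
prediction = knn_predict([1, 0, 1], train_x, train_y, k=2)
```

With `k=2` the two nearest neighbors are the single mutants `[1, 0, 0]` and `[0, 0, 1]`, so the prediction is the mean of their labels. A purely additive picture like this ignores epistasis, which is one reason richer models in the table are used in practice.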
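Model | Task | Machine learning algorithm | Input type | Application | Reference |
---|---|---|---|---|---|
- | Enzyme function | RF | Molecule | Discriminating acetylcholinesterase inhibitors from non-inhibitors | [ |
CWLy-SVM | Enzyme classification | SVM | Sequence | Identifying cell wall lytic enzymes | [ |
SVR | Enzyme function | SVM | Sequence | Improving enzyme activity and solubility | [ |
GPR/GNB | Enzyme function | GP | Structure | Improving the activity of fatty acyl reductase | [ |
- | Enzyme classification | HMM/RF/LR/KNN/SVM/RF | Sequence | Classifying CBH and EG in glycoside hydrolase family 7 | [ |
Innov'SAR | Enzyme function | PLSR | Sequence | Finding the best combination of mutations for improved activity | [ |
- | Enzyme function | LR | Structure | Determining the reactivity of substrate-enzyme pairs | [ |
SoluProt | Enzyme function | RF | Sequence | Predicting enzyme solubility in the Escherichia coli expression system | [ |
ProSAR | Enzyme function | PLSR | Sequence | Improving the activity of halohydrin dehalogenase | [ |
TOME | Enzyme function | RF | Sequence | Predicting optimal temperature | [ |
PREvaIL | Catalytic residues of enzymes | RF | Sequence and structure | Method for predicting the catalytic residues of enzymes | [ |
- | Enzyme function | GP | Sequence | Engineering the fluorescence of green fluorescent protein | [ |
- | Enzyme function | GP | Structure | Engineering the stability of cytochrome P450 | [ |
- | Catalytic residues of enzymes | CNN | Structure | Framework for predicting the catalytic residues of enzymes | [ |
SolventNet | Enzyme function | CNN | Structure | Effects of acid, catalyst and solvent on hydrolysis rate | [ |
DeepSol | Enzyme solubility | ANN | Sequence | Predicting protein solubility | [ |
- | Enzyme function | CNN | Structure | Optimizing the catalytic ability and tolerance of PET hydrolase | [ |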
Table 3 Application of machine learning in enzyme engineering
[1] Chen K, Arnold FH. Tuning the activity of an enzyme for unusual environments: sequential random mutagenesis of subtilisin E for catalysis in dimethylformamide[J]. Proc Natl Acad Sci USA, 1993, 90(12): 5618-5622. pmid: 8516309
[2] Tang CD, Zhang ZH, Shi HL, et al. Directed evolution of formate dehydrogenase and its application in the biosynthesis of L-phenylglycine from phenylglyoxylic acid[J]. Mol Catal, 2021, 513: 111666.
[3] Fox RJ, Davis SC, Mundorff EC, et al. Improving catalytic function by ProSAR-driven enzyme evolution[J]. Nat Biotechnol, 2007, 25(3): 338-344. pmid: 17322872
[4] Reetz MT. The importance of additive and non-additive mutational effects in protein engineering[J]. Angew Chem Int Ed Engl, 2013, 52(10): 2658-2666. doi: 10.1002/anie.201207842
[5] Greenhalgh JC, Fahlberg SA, Pfleger BF, et al. Machine learning-guided acyl-ACP reductase engineering for improved in vivo fatty alcohol production[J]. Nat Commun, 2021, 12(1): 5825. doi: 10.1038/s41467-021-25831-w; pmid: 34611172
[6] Miton CM, Tokuriki N. How mutational epistasis impairs predictability in protein evolution and design[J]. Protein Sci, 2016, 25(7): 1260-1272. doi: 10.1002/pro.2876; pmid: 26757214
[7] Romero PA, Arnold FH. Exploring protein fitness landscapes by directed evolution[J]. Nat Rev Mol Cell Biol, 2009, 10(12): 866-876. doi: 10.1038/nrm2805
[8] Ma EJ, Siirola E, Moore C, et al. Machine-directed evolution of an imine reductase for activity and stereoselectivity[J]. ACS Catal, 2021, 11(20): 12433-12445. doi: 10.1021/acscatal.1c02786
[9] Gado JE, Harrison BE, Sandgren M, et al. Machine learning reveals sequence-function relationships in family 7 glycoside hydrolases[J]. J Biol Chem, 2021, 297(2): 100931. doi: 10.1016/j.jbc.2021.100931
[10] Ostafe R, Fontaine N, Frank D, et al. One-shot optimization of multiple enzyme parameters: Tailoring glucose oxidase for pH and electron mediators[J]. Biotechnol Bioeng, 2020, 117(1): 17-29. doi: 10.1002/bit.27169; pmid: 31520472
[11] Peng M, de Vries RP. Machine learning prediction of novel pectinolytic enzymes in Aspergillus niger through integrating heterogeneous (post-)genomics data[J]. Microb Genom, 2021, 7(12): 000674.
[12] Wu Z, Kan SBJ, Lewis RD, et al. Machine learning-assisted directed protein evolution with combinatorial libraries[J]. Proc Natl Acad Sci USA, 2019, 116(18): 8852-8858. doi: 10.1073/pnas.1901979116; pmid: 30979809
[13] Li GY, Dong YJ, Reetz MT. Can machine learning revolutionize directed evolution of selective enzymes?[J]. Adv Synth Catal, 2019, 361(11): 2377-2386.
[14] Baştanlar Y, Özuysal M. Introduction to machine learning[M]// miRNomics: microRNA biology and computational analysis. Totowa, NJ: Humana Press, 2013: 105-128.
[15] 蒋迎迎, 曲戈, 孙周通. 机器学习助力酶定向进化[J]. 生物学杂志, 2020, 37(4): 1-11.
Jiang YY, Qu G, Sun ZT. Machine learning-assisted enzyme directed evolution[J]. J Biol, 2020, 37(4): 1-11.
[16] Sikander R, Wang YP, Ghulam A, et al. Identification of enzymes-specific protein domain based on DDE, and convolutional neural network[J]. Front Genet, 2021, 12: 759384. doi: 10.3389/fgene.2021.759384
[17] Jing XY, Li FM. Predicting cell wall lytic enzymes using combined features[J]. Front Bioeng Biotechnol, 2021, 8: 627335. doi: 10.3389/fbioe.2020.627335
[18] Wan ZY, Wang QD, Liu DC, et al. Accelerating the optimization of enzyme-catalyzed synthesis conditions via machine learning and reactivity descriptors[J]. Org Biomol Chem, 2021, 19(28): 6267-6273. doi: 10.1039/D1OB01066B
[19] Kirk PDW, Stumpf MPH. Gaussian process regression bootstrapping: exploring the effects of uncertainty in time course data[J]. Bioinformatics, 2009, 25(10): 1300-1306. doi: 10.1093/bioinformatics/btp139; pmid: 19289448
[20] Romero PA, Krause A, Arnold FH. Navigating the protein fitness landscape with Gaussian processes[J]. Proc Natl Acad Sci USA, 2013, 110(3): E193-E201.
[21] Rasmussen CE, Williams CKI. Gaussian processes for machine learning[M]. Cambridge: The MIT Press, 2005.
[22] Zhang ZH, Schott JA, Liu MM, et al. Prediction of carbon dioxide adsorption via deep learning[J]. Angew Chem Int Ed Engl, 2019, 58(1): 259-263. doi: 10.1002/anie.201812363
[23] Luo HZ, Gao L, Liu Z, et al. Prediction of phenolic compounds and glucose content from dilute inorganic acid pretreatment of lignocellulosic biomass using artificial neural network modeling[J]. Bioresour Bioprocess, 2021, 8: 134. doi: 10.1186/s40643-021-00488-x
[24] Saito Y, Oikawa M, Sato T, et al. Machine-learning-guided library design cycle for directed evolution of enzymes: the effects of training data composition on sequence space exploration[J]. ACS Catal, 2021, 11(23): 14615-14624. doi: 10.1021/acscatal.1c03753
[25] del Rio-Chanona EA, Fiorelli F, Zhang DD, et al. An efficient model construction strategy to simulate microalgal lutein photo-production dynamic process[J]. Biotechnol Bioeng, 2017, 114(11): 2518-2527. doi: 10.1002/bit.26373; pmid: 28671262
[26] 卞佳豪, 杨广宇. 人工智能辅助的蛋白质工程[J]. 合成生物学, 2022, 3(3): 429-444. doi: 10.12211/2096-8280.2021-032
Bian JH, Yang GY. Artificial intelligence-assisted protein engineering[J]. Synth Biol J, 2022, 3(3): 429-444.
[27] Xu YT, Verma D, Sheridan RP, et al. Deep dive into machine learning models for protein engineering[J]. J Chem Inf Model, 2020, 60(6): 2773-2790. doi: 10.1021/acs.jcim.0c00073; pmid: 32250622
[28] Yang KK, Wu Z, Arnold FH. Machine-learning-guided directed evolution for protein engineering[J]. Nat Methods, 2019, 16(8): 687-694. doi: 10.1038/s41592-019-0496-6; pmid: 31308553
[29] Yang KK, Wu Z, Bedbrook CN, et al. Learned protein embeddings for machine learning[J]. Bioinformatics, 2018, 34(15): 2642-2648. doi: 10.1093/bioinformatics/bty178; pmid: 29584811
[30] Roy S, Martinez D, Platero H, et al. Exploiting amino acid composition for predicting protein-protein interactions[J]. PLoS One, 2009, 4(11): e7813. doi: 10.1371/journal.pone.0007813
[31] Wolpert DH. The lack of a priori distinctions between learning algorithms[J]. Neural Comput, 1996, 8(7): 1341-1390. doi: 10.1162/neco.1996.8.7.1341
[32] van Westen GJ, Swier RF, Cortes-Ciriano I, et al. Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets[J]. J Cheminform, 2013, 5(1): 42. doi: 10.1186/1758-2946-5-42
[33] Hou J, Adhikari B, Cheng JL. DeepSF: deep convolutional neural network for mapping protein sequences to folds[J]. Bioinformatics, 2018, 34(8): 1295-1303. doi: 10.1093/bioinformatics/btx780; pmid: 29228193
[34] Zacharaki EI. Prediction of protein function using a deep convolutional neural network ensemble[J]. PeerJ Comput Sci, 2017, 3: e124. doi: 10.7717/peerj-cs.124
[35] White C, Ismail HD, Saigo H, et al. CNN-BLPred: a convolutional neural network based predictor for β-lactamases (BL) and their classes[J]. BMC Bioinformatics, 2017, 18(Suppl 16): 577. doi: 10.1186/s12859-017-1972-6
[36] Ragoza M, Hochuli J, Idrobo E, et al. Protein-ligand scoring with convolutional neural networks[J]. J Chem Inf Model, 2017, 57(4): 942-957. doi: 10.1021/acs.jcim.6b00740; pmid: 28368587
[37] Xiao N, Cao DS, Zhu MF, et al. Protr/ProtrWeb: R package and web server for generating various numerical representation schemes of protein sequences[J]. Bioinformatics, 2015, 31(11): 1857-1859. doi: 10.1093/bioinformatics/btv042; pmid: 25619996
[38] Ismail HD, Saigo H, Kc DB. RF-NR: random forest based approach for improved classification of nuclear receptors[J]. IEEE/ACM Trans Comput Biol Bioinform, 2018, 15(6): 1844-1852. doi: 10.1109/TCBB.2017.2773063; pmid: 29990125
[39] Leslie C, Eskin E, Noble WS. The spectrum kernel: a string kernel for SVM protein classification[J]. Pac Symp Biocomput, 2002: 564-575.
[40] Sandberg M, Eriksson L, Jonsson J, et al. New chemical descriptors relevant for the design of biologically active peptides. A multivariate characterization of 87 amino acids[J]. J Med Chem, 1998, 41(14): 2481-2491. pmid: 9651153
[41] Mei H, Liao ZH, Zhou Y, et al. A new set of amino acid descriptors and its application in peptide QSARs[J]. Biopolymers, 2005, 80(6): 775-786. pmid: 15895431
[42] Biou V, Gibrat JF, Levin JM, et al. Secondary structure prediction: combination of three different methods[J]. Protein Eng, 1988, 2(3): 185-191. pmid: 3237683
[43] Ofer D, Linial M. ProFET: Feature engineering captures high-level protein functions[J]. Bioinformatics, 2015, 31(21): 3429-3436. doi: 10.1093/bioinformatics/btv345; pmid: 26130574
[44] Tian FF, Zhou P, Li ZL. T-scale as a novel vector of topological descriptors for amino acids and its application in QSARs of peptides[J]. J Mol Struct, 2007, 830(1/2/3): 106-115. doi: 10.1016/j.molstruc.2006.07.004
[45] Yang L, Shu M, Ma KW, et al. ST-scale as a novel amino acid descriptor and its application in QSAM of peptides and analogues[J]. Amino Acids, 2010, 38(3): 805-816. doi: 10.1007/s00726-009-0287-y; pmid: 19373543
[46] van Westen GJ, Swier RF, Wegner JK, et al. Benchmarking of protein descriptor sets in proteochemometric modeling (part 1): comparative study of 13 amino acid descriptor sets[J]. J Cheminform, 2013, 5(1): 41. doi: 10.1186/1758-2946-5-41
[47] Kawashima S, Pokarowski P, Pokarowska M, et al. AAindex: amino acid index database, progress report 2008[J]. Nucleic Acids Res, 2008, 36(Database issue): D202-D205. doi: 10.1093/nar/gkm998; pmid: 17998252
[48] Alley EC, Khimulya G, Biswas S, et al. Unified rational protein engineering with sequence-based deep representation learning[J]. Nat Methods, 2019, 16(12): 1315-1322. doi: 10.1038/s41592-019-0598-1; pmid: 31636460
[49] Tanaka S, Scheraga HA. Medium- and long-range interaction parameters between amino acids for predicting three-dimensional structures of proteins[J]. Macromolecules, 1976, 9(6): 945-950. pmid: 1004017
[50] Asgari E, Mofrad MRK. Continuous distributed representation of biological sequences for deep proteomics and genomics[J]. PLoS One, 2015, 10(11): e0141287. doi: 10.1371/journal.pone.0141287
[51] Jensen FV. An introduction to Bayesian networks[M]. London: UCL Press, 1996.
[52] Lim S, Lu Y, Cho CY, et al. A review on compound-protein interaction prediction methods: Data, format, representation and model[J]. Comput Struct Biotechnol J, 2021, 19: 1541-1556.
[53] del Rio-Chanona EA, Cong XY, Bradford E, et al. Review of advanced physical and data-driven models for dynamic bioprocess simulation: case study of algae-bacteria consortium wastewater treatment[J]. Biotechnol Bioeng, 2019, 116(2): 342-353. doi: 10.1002/bit.26881; pmid: 30475404
[54] Natarajan P, Moghadam R, Jagannathan S. Online deep neural network-based feedback control of a Lutein bioprocess[J]. J Process Control, 2021, 98: 41-51. doi: 10.1016/j.jprocont.2020.11.011
[55] Kim GB, Kim WJ, Kim HU, et al. Machine learning applications in systems metabolic engineering[J]. Curr Opin Biotechnol, 2020, 64: 1-9. doi: 10.1016/j.copbio.2019.08.010
[56] Wettschereck D, Aha DW, Mohri T. A review and empirical evaluation of feature weighting methods for a class of lazy learning algorithms[M]// Lazy learning. Dordrecht: Springer Netherlands, 1997: 273-314.
[57] Drucker H, Burges CJC, Kaufman L, et al. Support vector regression machines[J]. Adv Neural Inf Process Syst, 1997: 155-161.
[58] Quinlan JR. Induction of decision trees[J]. Mach Learn, 1986, 1(1): 81-106.
[59] Li Y, Song K, Zhang J, et al. A computational method to predict effects of residue mutations on the catalytic efficiency of hydrolases[J]. Catalysts, 2021, 11(2): 286. doi: 10.3390/catal11020286
[60] Schapire RE. Explaining AdaBoost[M]// Schölkopf B, Luo ZY, Vovk V. Empirical inference. Berlin, Heidelberg: Springer, 2013: 37-52.
[61] Wolpert DH. Stacked generalization[J]. Neural Netw, 1992, 5(2): 241-259. doi: 10.1016/S0893-6080(05)80023-1
[62] Abdi H, Williams LJ. Principal component analysis[J]. WIREs Comp Stat, 2010, 2(4): 433-459. doi: 10.1002/wics.101
[63] LeCun Y, Bengio Y, Hinton G. Deep learning[J]. Nature, 2015, 521(7553): 436-444. doi: 10.1038/nature14539
[64] Lohmann R, Schneider G, Behrens D, et al. A neural network model for the prediction of membrane-spanning amino acid sequences[J]. Protein Sci, 1994, 3(9): 1597-1601. pmid: 7833818
[65] Rawat W, Wang ZH. Deep convolutional neural networks for image classification: a comprehensive review[J]. Neural Comput, 2017, 29(9): 2352-2449. doi: 10.1162/NECO_a_00990; pmid: 28599112
[66] Creswell A, White T, Dumoulin V, et al. Generative adversarial networks: an overview[J]. IEEE Signal Process Mag, 2018, 35(1): 53-65. doi: 10.1109/MSP.2017.2765202
[67] Auer P. Using confidence bounds for exploitation-exploration trade-offs[J]. J Machine Learning Res, 2002, 3(Nov): 397-422.
[68] International Conference on Machine Learning. Proceedings of the Twenty-Ninth International Conference on Machine Learning[C]. Madison, Wis: International Machine Learning Society, 2012.
[69] Endelman JB, Silberg JJ, Wang ZG, et al. Site-directed protein recombination as a shortest-path problem[J]. Protein Eng Des Sel, 2004, 17(7): 589-594. pmid: 15331774
[70] Sandhu H, Kumar RN, Garg P. Machine learning-based modeling to predict inhibitors of acetylcholinesterase[J]. Mol Divers, 2022, 26(1): 331-340. doi: 10.1007/s11030-021-10223-5
[71] Meng CL, Guo F, Zou Q. CWLy-SVM: a support vector machine-based tool for identifying cell wall lytic enzymes[J]. Comput Biol Chem, 2020, 87: 107304. doi: 10.1016/j.compbiolchem.2020.107304
[72] Han X, Ning WB, Ma XQ, et al. Improve protein solubility and activity based on machine learning models[J]. bioRxiv, 2019. doi: 10.1101/817890
[73] Cadet F, Fontaine N, Li GY, et al. A machine learning approach for reliable prediction of amino acid interactions and its application in the directed evolution of enantioselective enzymes[J]. Sci Rep, 2018, 8(1): 16757. doi: 10.1038/s41598-018-35033-y; pmid: 30425279
[74] Cadet F, Fontaine N, Vetrivel I, et al. Application of Fourier transform and proteochemometrics principles to protein engineering[J]. BMC Bioinformatics, 2018, 19(1): 382. doi: 10.1186/s12859-018-2407-8; pmid: 30326841
[75] Bonk BM, Weis JW, Tidor B. Machine learning identifies chemical characteristics that promote enzyme catalysis[J]. J Am Chem Soc, 2019, 141(9): 4108-4118. doi: 10.1021/jacs.8b13879; pmid: 30761897
[76] Hon J, Borko S, Stourac J, et al. EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities[J]. Nucleic Acids Res, 2020, 48(W1): W104-W109. doi: 10.1093/nar/gkaa372
[77] Li G, Rabe KS, Nielsen J, et al. Machine learning applied to predicting microorganism growth temperatures and enzyme catalytic optima[J]. ACS Synth Biol, 2019, 8(6): 1411-1420. doi: 10.1021/acssynbio.9b00099; pmid: 31117361
[78] Song JN, Li FY, Takemoto K, et al. PREvaIL, an integrative approach for inferring catalytic residues using sequence, structural, and network features in a machine-learning framework[J]. J Theor Biol, 2018, 443: 125-137. doi: S0022-5193(18)30039-0; pmid: 29408627
[79] Saito Y, Oikawa M, Nakazawa H, et al. Machine-learning-guided mutagenesis for directed evolution of fluorescent proteins[J]. ACS Synth Biol, 2018, 7(9): 2014-2022. doi: 10.1021/acssynbio.8b00155; pmid: 30103599
[80] Torng W, Altman RB. High precision protein functional site detection using 3D convolutional neural networks[J]. Bioinformatics, 2019, 35(9): 1503-1512. doi: 10.1093/bioinformatics/bty813; pmid: 31051039
[81] Chew A, Jiang SL, Zhang WQ, et al. Fast predictions of liquid-phase acid-catalyzed reaction rates using molecular dynamics simulations and convolutional neural networks[J]. Chem Sci, 2020, 11: 12464-12476. doi: 10.1039/D0SC03261A
[82] Khurana S, Rawi R, Kunji K, et al. DeepSol: a deep learning framework for sequence-based protein solubility prediction[J]. Bioinformatics, 2018, 34(15): 2605-2613. doi: 10.1093/bioinformatics/bty166; pmid: 29554211
[83] Lu HY, Diaz DJ, Czarnecki NJ, et al. Machine learning-aided engineering of hydrolases for PET depolymerization[J]. Nature, 2022, 604(7907): 662-667. doi: 10.1038/s41586-022-04599-z
[84] Dubey A, Realff MJ, Lee JH, et al. Support vector machines for learning to identify the critical positions of a protein[J]. J Theor Biol, 2005, 234(3): 351-361. pmid: 15784270
[85] Cai YC, Yang HB, Li WH, et al. Computational prediction of site of metabolism for UGT-catalyzed reactions[J]. J Chem Inf Model, 2019, 59(3): 1085-1095. doi: 10.1021/acs.jcim.8b00851; pmid: 30586295
[86] Silberg JJ, Endelman JB, Arnold FH. SCHEMA-guided protein recombination[J]. Methods Enzymol, 2004, 388: 35-42.
[87] Srinivas N, Krause A, Kakade SM, et al. Gaussian process optimization in the bandit setting: no regret and experimental design[EB/OL]. 2009: arXiv: 0912.3995[cs.LG]. https://arxiv.org/abs/0912.3995.
[88] Voigt CA, Martinez C, Wang ZG, et al. Protein building blocks preserved by recombination[J]. Nat Struct Biol, 2002, 9(7): 553-558. pmid: 12042875
[89] Shroff R, Cole AW, Diaz DJ, et al. Discovery of novel gain-of-function mutations guided by structure-based deep learning[J]. ACS Synth Biol, 2020, 9(11): 2927-2935. doi: 10.1021/acssynbio.0c00345; pmid: 33064458
[90] Paik I, Ngo PHT, Shroff R, et al. Improved Bst DNA polymerase variants derived via a machine learning approach[J]. Biochemistry, 2021. https://doi.org/10.1021/acs.biochem.1c00451.
[91] Kulikova AV, Diaz DJ, Loy JM, et al. Learning the local landscape of protein structures with convolutional neural networks[J]. J Biol Phys, 2021, 47(4): 435-454. doi: 10.1007/s10867-021-09593-6; pmid: 34751854
[92] Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold[J]. Nature, 2021, 596(7873): 583-589. doi: 10.1038/s41586-021-03819-2
[93] Baek M, DiMaio F, Anishchenko I, et al. Accurate prediction of protein structures and interactions using a three-track neural network[J]. Science, 2021, 373(6557): 871-876. doi: 10.1126/science.abj8754; pmid: 34282049
[94] Riesselman AJ, Ingraham JB, Marks DS. Deep generative models of genetic variation capture the effects of mutations[J]. Nat Methods, 2018, 15(10): 816-822. doi: 10.1038/s41592-018-0138-4; pmid: 30250057