Biotechnology Bulletin ›› 2024, Vol. 40 ›› Issue (10): 76-85.doi: 10.13560/j.cnki.biotech.bull.1985.2024-0523
Previous Articles Next Articles
JI Hong-chao1(
), LI Zheng-yan1,2,3
Received:2024-05-31
Online:2024-10-26
Published:2024-11-20
Contact:
JI Hong-chao
E-mail:jihongchao@caas.cn
JI Hong-chao, LI Zheng-yan. Research Progress and Prospects in the Structural Annotation of Unknown Secondary Metabolites Based on Mass Spectrometry[J]. Biotechnology Bulletin, 2024, 40(10): 76-85.
| 策略类型 Strategy type | 方法名称 Method name | 输入数据 Input data | 预测依据 Prediction basis | 是否有开源代码 Is there open source code available | 是否有可视化软件Is there any visualization software available | 官方网站 Official website |
|---|---|---|---|---|---|---|
| 从结构到谱图 | CFM-ID | 单张/多张谱图 | 模拟谱图 | 是 | 是 | https://cfmid.wishartlab.com/ |
| 3DmolMS | 单张/多张谱图 | 模拟谱图 | 是 | 否 | https://github.com/JosieHong/3DMolMS | |
| MassFormer | 单张/多张谱图 | 模拟谱图 | 是 | 否 | https://github.com/Roestlab/massformer | |
| MetFrag | 单张/多张谱图 | 碎片离子超集 | 是 | 是 | https://ipb-halle.github.io/MetFrag/ | |
| MS-Finder | 单张/多张谱图 | 碎片离子超集 | 是 | 是 | http://prime.psc.riken.jp/compms/msfinder/main.html | |
| MAGMa | 单张/多张谱图 | 碎片离子超集 | 是 | 是 | https://github.com/NLeSC/MAGMa | |
| 从谱图到结构 | SIRIUS | 单张/多张谱图 | 分子指纹 | 是 | 是 | https://bio.informatik.uni-jena.de/software/sirius |
| Mass2SMILES | 单张/多张谱图 | 分子结构 | 是 | 否 | https://github.com/volvox292/mass2smiles | |
| MS2Mol | 单张/多张谱图 | 分子结构 | 否 | 否 | 暂无 | |
| MSNovelist | 单张/多张谱图 | 分子结构 | 是 | 是 | https://github.com/meowcat/MSNovelist | |
| 从已知到未知 | MPEA | 非靶标组学数据 | 代谢反应 | 否 | 否 | 暂无 |
| MetDNA | 非靶标组学数据 | 代谢反应 | 是 | 是 | http://metdna.zhulab.cn/ | |
| iMet | 单张/多张谱图 | 代谢反应 | 是 | 是 | http://imet.seeslab.net/ | |
| SGMNS | 非靶标组学数据 | 代谢反应 | 否 | 否 | 暂无 | |
| DeepMASS | 单张/多张谱图 | 化学空间定位 | 是 | 是 | https://github.com/hcji/DeepMASS2_GUI |
Table 1 Representative methods for searching molecular structure databases based on MS/MS
| 策略类型 Strategy type | 方法名称 Method name | 输入数据 Input data | 预测依据 Prediction basis | 是否有开源代码 Is there open source code available | 是否有可视化软件Is there any visualization software available | 官方网站 Official website |
|---|---|---|---|---|---|---|
| 从结构到谱图 | CFM-ID | 单张/多张谱图 | 模拟谱图 | 是 | 是 | https://cfmid.wishartlab.com/ |
| 3DmolMS | 单张/多张谱图 | 模拟谱图 | 是 | 否 | https://github.com/JosieHong/3DMolMS | |
| MassFormer | 单张/多张谱图 | 模拟谱图 | 是 | 否 | https://github.com/Roestlab/massformer | |
| MetFrag | 单张/多张谱图 | 碎片离子超集 | 是 | 是 | https://ipb-halle.github.io/MetFrag/ | |
| MS-Finder | 单张/多张谱图 | 碎片离子超集 | 是 | 是 | http://prime.psc.riken.jp/compms/msfinder/main.html | |
| MAGMa | 单张/多张谱图 | 碎片离子超集 | 是 | 是 | https://github.com/NLeSC/MAGMa | |
| 从谱图到结构 | SIRIUS | 单张/多张谱图 | 分子指纹 | 是 | 是 | https://bio.informatik.uni-jena.de/software/sirius |
| Mass2SMILES | 单张/多张谱图 | 分子结构 | 是 | 否 | https://github.com/volvox292/mass2smiles | |
| MS2Mol | 单张/多张谱图 | 分子结构 | 否 | 否 | 暂无 | |
| MSNovelist | 单张/多张谱图 | 分子结构 | 是 | 是 | https://github.com/meowcat/MSNovelist | |
| 从已知到未知 | MPEA | 非靶标组学数据 | 代谢反应 | 否 | 否 | 暂无 |
| MetDNA | 非靶标组学数据 | 代谢反应 | 是 | 是 | http://metdna.zhulab.cn/ | |
| iMet | 单张/多张谱图 | 代谢反应 | 是 | 是 | http://imet.seeslab.net/ | |
| SGMNS | 非靶标组学数据 | 代谢反应 | 否 | 否 | 暂无 | |
| DeepMASS | 单张/多张谱图 | 化学空间定位 | 是 | 是 | https://github.com/hcji/DeepMASS2_GUI |
| [1] | Shen SQ, Zhan CS, Yang CK, et al. Metabolomics-centered mining of plant metabolic diversity and function: past decade and future perspectives[J]. Mol Plant, 2023, 16(1): 43-63. |
| [2] | Liu XY, Zhou LN, Shi XZ, et al. New advances in analytical methods for mass spectrometry-based large-scale metabolomics study[J]. Trac Trends Anal Chem, 2019, 121: 115665. |
| [3] |
Medema MH. The year 2020 in natural product bioinformatics: an overview of the latest tools and databases[J]. Nat Prod Rep, 2021, 38(2): 301-306.
doi: 10.1039/d0np00090f pmid: 33533785 |
| [4] | Bazsó FL, Ozohanics O, Schlosser G, et al. Quantitative comparison of tandem mass spectra obtained on various instruments[J]. J Am Soc Mass Spectrom, 2016, 27(8): 1357-1365. |
| [5] |
Hoang C, Uritboonthai W, Hoang L, et al. Tandem mass spectrometry across platforms[J]. Anal Chem, 2024, 96(14): 5478-5488.
doi: 10.1021/acs.analchem.3c05576 pmid: 38529642 |
| [6] | Wei JN, Belanger D, Adams RP, et al. Rapid prediction of electron-ionization mass spectrometry using neural networks[J]. ACS Cent Sci, 2019, 5(4): 700-708. |
| [7] |
Wang MX, Jarmusch AK, Vargas F, et al. Mass spectrometry searches using MASST[J]. Nat Biotechnol, 2020, 38(1): 23-26.
doi: 10.1038/s41587-019-0375-9 pmid: 31894142 |
| [8] |
Li YY, Kind T, Folz J, et al. Spectral entropy outperforms MS/MS dot product similarity for small-molecule compound identification[J]. Nat Methods, 2021, 18(12): 1524-1531.
doi: 10.1038/s41592-021-01331-z pmid: 34857935 |
| [9] | da Silva RR, Dorrestein PC, Quinn RA. Illuminating the dark matter in metabolomics[J]. Proc Natl Acad Sci U S A, 2015, 112(41): 12549-12550. |
| [10] |
Shen XT, Wang RH, Xiong X, et al. Metabolic reaction network-based recursive metabolite annotation for untargeted metabolomics[J]. Nat Commun, 2019, 10(1): 1516.
doi: 10.1038/s41467-019-09550-x pmid: 30944337 |
| [11] | Tian ZT, Hu X, Xu YY, et al. PMhub 1.0: a comprehensive plant metabolome database[J]. Nucleic Acids Res, 2024, 52(D1): D1579-D1587. |
| [12] |
Wang SC, Alseekh S, Fernie AR, et al. The structure and function of major plant metabolite modifications[J]. Mol Plant, 2019, 12(7): 899-919.
doi: S1674-2052(19)30201-1 pmid: 31200079 |
| [13] |
Böcker S. Searching molecular structure databases using tandem MS data: are we there yet?[J]. Curr Opin Chem Biol, 2017, 36:1-6.
doi: S1367-5931(16)30192-2 pmid: 28025165 |
| [14] |
Ásgeirsson V, Bauer CA, Grimme S. Quantum chemical calculation of electron ionization mass spectra for general organic and inorganic molecules[J]. Chem Sci, 2017, 8(7): 4879-4895.
doi: 10.1039/c7sc00601b pmid: 28959412 |
| [15] | Jumper J, Evans R, Pritzel A, et al. Highly accurate protein structure prediction with AlphaFold[J]. Nature, 2021, 596(7873): 583-589. |
| [16] | Cai YP, Zhou ZW, Zhu ZJ. Advanced analytical and informatic strategies for metabolite annotation in untargeted metabolomics[J]. Trac Trends Anal Chem, 2023, 158: 116903. |
| [17] |
Hu GL, Qiu MH. Machine learning-assisted structure annotation of natural products based on MS and NMR data[J]. Nat Prod Rep, 2023, 40(11): 1735-1753.
doi: 10.1039/d3np00025g pmid: 37519196 |
| [18] | Allen F, Greiner R, Wishart D. Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification[J]. Metabolomics, 2015, 11(1): 98-110. |
| [19] | Allen F, Pon A, Wilson M, et al. CFM-ID: a web server for annotation, spectrum prediction and metabolite identification from tandem mass spectra[J]. Nucleic Acids Res, 2014, 42(W1): W94-W99. |
| [20] | Djoumbou-Feunang Y, Pon A, Karu N, et al. CFM-ID 3.0: significantly improved ESI-MS/MS prediction and compound identification[J]. Metabolites, 2019, 9(4): 72. |
| [21] |
Wang F, Liigand J, Tian SY, et al. CFM-ID 4.0: more accurate ESI-MS/MS spectral prediction and compound identification[J]. Anal Chem, 2021, 93(34): 11692-11700.
doi: 10.1021/acs.analchem.1c01465 pmid: 34403256 |
| [22] |
Wang F, Allen D, Tian SY, et al. CFM-ID 4.0-a web server for accurate MS-based metabolite identification[J]. Nucleic Acids Res, 2022, 50(W1): W165-W174.
doi: 10.1093/nar/gkac383 pmid: 35610037 |
| [23] | Hong YH, Li SJ, Welch CJ, et al. 3DMolMS: prediction of tandem mass spectra from 3D molecular conformations[J]. Bioinformatics, 2023, 39(6): btad354. |
| [24] | Young A, Röst H, Wang B. Tandem mass spectrum prediction for small molecules using graph transformers[J]. Nat Mach Intell, 2024, 6: 404-416. |
| [25] | Ruttkies C, Schymanski EL, Wolf S, et al. MetFrag relaunched: incorporating strategies beyond in silico fragmentation[J]. J Cheminform, 2016, 8: 3. |
| [26] |
Ruttkies C, Neumann S, Posch S. Improving MetFrag with statistical learning of fragment annotations[J]. BMC Bioinformatics, 2019, 20(1): 376.
doi: 10.1186/s12859-019-2954-7 pmid: 31277571 |
| [27] | Verdegem D, Lambrechts D, Carmeliet P, et al. Improved metabolite identification with MIDAS and MAGMa through MS/MS spectral dataset-driven parameter optimization[J]. Metabolomics, 2016, 12(6): 98. |
| [28] |
Tsugawa H, Kind T, Nakabayashi R, et al. Hydrogen rearrangement rules: computational MS/MS fragmentation and structure elucidation using MS-FINDER software[J]. Anal Chem, 2016, 88(16): 7946-7958.
doi: 10.1021/acs.analchem.6b00770 pmid: 27419259 |
| [29] |
Heinonen M, Shen HB, Zamboni N, et al. Metabolite identification and molecular fingerprint prediction through machine learning[J]. Bioinformatics, 2012, 28(18): 2333-2341.
doi: 10.1093/bioinformatics/bts437 pmid: 22815355 |
| [30] |
Rasche F, Svatos A, Maddula RK, et al. Computing fragmentation trees from tandem mass spectrometry data[J]. Anal Chem, 2011, 83(4): 1243-1251.
doi: 10.1021/ac101825k pmid: 21182243 |
| [31] | Shen HB, Dührkop K, Böcker S, et al. Metabolite identification through multiple kernel learning on fragmentation trees[J]. Bioinformatics, 2014, 30(12): i157-i164. |
| [32] | Dührkop K, Shen HB, Meusel M, et al. Searching molecular structure databases with tandem mass spectra using CSI: FingerID[J]. Proc Natl Acad Sci U S A, 2015, 112(41): 12580-12585. |
| [33] | Ludwig M, Dührkop K, Böcker S. Bayesian networks for mass spectrometric metabolite identification via molecular fingerprints[J]. Bioinformatics, 2018, 34(13): i333-i340. |
| [34] | Brouard C, Shen HB, Dührkop K, et al. Fast metabolite identification with Input Output Kernel Regression[J]. Bioinformatics, 2016, 32(12): i28-i36. |
| [35] |
Dührkop K, Fleischauer M, Ludwig M, et al. SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information[J]. Nat Methods, 2019, 16(4): 299-302.
doi: 10.1038/s41592-019-0344-8 pmid: 30886413 |
| [36] |
Fan ZL, Alley A, Ghaffari K, et al. MetFID: artificial neural network-based compound fingerprint prediction for metabolite annotation[J]. Metabolomics, 2020, 16(10): 104.
doi: 10.1007/s11306-020-01726-7 pmid: 32997169 |
| [37] | Gao S, Chau HYK, Wang KJ, et al. Convolutional neural network-based compound fingerprint prediction for metabolite annotation[J]. Metabolites, 2022, 12(7): 605. |
| [38] | Dührkop K. Deep kernel learning improves molecular fingerprint prediction from tandem mass spectra[J]. Bioinformatics, 2022, 38(Suppl 1): i342-i349. |
| [39] | Goldman S, Wohlwend J, Stražar M, et al. Annotating metabolite mass spectra with domain-inspired chemical formula transformers[J]. Nat Mach Intell, 2023, 5: 965-979. |
| [40] | Mokaya M, Imrie F, van Hoorn WP, et al. Testing the limits of SMILES-based de novo molecular generation with curriculum and deep reinforcement learning[J]. Nat Mach Intell, 2023, 5: 386-394. |
| [41] | Qian H, Lin C, Zhao DW, et al. AlphaDrug: protein target specific de novo molecular generation[J]. PNAS Nexus, 2022, 1(4): pgac227. |
| [42] | Elser D, Huber F, Gaquerel E. Mass2SMILES: deep learning based fast prediction of structures and functional groups directly from high-resolution MS/MS spectra[J]. bioRxiv, 2023. DOI: 10.1101/2023.07.06.547963. |
| [43] | Butler T, Frandsen A, Lightheart R, et al. MS2Mol: a transformer model for illuminating dark chemical space from mass spectra[J]. ChemRxiv, 2023. https://chemrxiv.org/engage/chemrxiv/article-details/64f76a0279853bbd7829bf27. |
| [44] | Stravs MA, Dührkop K, Böcker S, et al. MSNovelist: de novo structure generation from mass spectra[J]. Nat Methods, 2022, 19(7): 865-870. |
| [45] |
Wang L, Ye H, Sun D, et al. Metabolic pathway extension approach for metabolomic biomarker identification[J]. Anal Chem, 2017, 89(2): 1229-1237.
doi: 10.1021/acs.analchem.6b03757 pmid: 27983783 |
| [46] |
Zhou ZW, Luo MD, Zhang HS, et al. Metabolite annotation from knowns to unknowns through knowledge-guided multi-layer metabolic networking[J]. Nat Commun, 2022, 13(1): 6656.
doi: 10.1038/s41467-022-34537-6 pmid: 36333358 |
| [47] | Wang XX, Li C, Li ZF, et al. A structure-guided molecular network strategy for global untargeted metabolomics data annotation[J]. Anal Chem, 2023, 95(31): 11603-11612. |
| [48] |
Aguilar-Mogas A, Sales-Pardo M, Navarro M, et al. iMet: a network-based computational tool to assist in the annotation of metabolites from tandem mass spectra[J]. Anal Chem, 2017, 89(6): 3474-3482.
doi: 10.1021/acs.analchem.6b04512 pmid: 28221024 |
| [49] |
Ji HC, Xu YM, Lu HM, et al. Deep MS/MS-aided structural-similarity scoring for unknown metabolite identification[J]. Anal Chem, 2019, 91(9): 5629-5637.
doi: 10.1021/acs.analchem.8b05405 pmid: 30990670 |
| [50] | Huber F, Ridder L, Verhoeven S, et al. Spec2Vec: improved mass spectral similarity scoring through learning of structural relationships[J]. PLoS Comput Biol, 2021, 17(2): e1008724. |
| [51] | Huber F, van der Burg S, van der Hooft JJJ, et al. MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra[J]. J Cheminform, 2021, 13(1): 84. |
| [52] |
Guo H, Xue KB, Sun HM, et al. Contrastive learning-based embedder for the representation of tandem mass spectra[J]. Anal Chem, 2023, 95(20): 7888-7896.
doi: 10.1021/acs.analchem.3c00260 pmid: 37172113 |
| [53] | Blaženović I, Kind T, Torbašinović H, et al. Comprehensive comparison of in silico MS/MS fragmentation tools of the CASMI contest: database boosting is needed to achieve 93% accuracy[J]. J Cheminform, 2017, 9(1): 32. |
| [54] | Hoffmann MA, Kretschmer F, Ludwig M, et al. MAD HATTER correctly annotates 98% of small molecule tandem mass spectra searching in PubChem[J]. Metabolites, 2023, 13(3): 314. |
| [55] |
Domingo-Almenara X, Guijas C, Billings E, et al. The METLIN small molecule dataset for machine learning-based retention time prediction[J]. Nat Commun, 2019, 10(1): 5811.
doi: 10.1038/s41467-019-13680-7 pmid: 31862874 |
| [56] |
Kretschmer F, Harrieder EM, Hoffmann MA, et al. RepoRT: a comprehensive repository for small molecule retention times[J]. Nat Methods, 2024, 21(2): 153-155.
doi: 10.1038/s41592-023-02143-z pmid: 38191934 |
| [57] |
Zhang HS, Luo MD, Wang HM, et al. AllCCS2: curation of ion mobility collision cross-section atlas for small molecules using comprehensive molecular representations[J]. Anal Chem, 2023, 95(37): 13913-13921.
doi: 10.1021/acs.analchem.3c02267 pmid: 37664900 |
| [1] | WU Hui-qin, WANG Yan-hong, LIU Han, SI Zheng, LIU Xue-qing, WANG Jing, YANG Yi, CHENG Yan. Identification and Expression Analysis of UGT Gene Family in Pepper [J]. Biotechnology Bulletin, 2024, 40(9): 198-211. |
| [2] | TAN Bo-wen, ZHANG Yi, ZHANG Peng, WANG Zhen-yu, MA Qiu-xiang. Identification and Bioinformatics Analysis of Gene in the Magnesium Transporter Family in Cassava [J]. Biotechnology Bulletin, 2024, 40(9): 20-32. |
| [3] | MAN Quan-cai, MENG Zi-nuo, LI Wei, CAI Xin-ru, SU Run-dong, FU Chang-qing, GAO Shun-juan, CUI Jiang-hui. Identification and Expression Analysis of AQP Gene Family in Potato [J]. Biotechnology Bulletin, 2024, 40(9): 51-63. |
| [4] | WU Juan, WU Xiao-juan, WANG Pei-jie, XIE Rui, NIE Hu-shuai, LI Nan, MA Yan-hong. Screening and Expression Analysis of ERF Gene Related to Anthocyanin Synthesis in Colored Potato [J]. Biotechnology Bulletin, 2024, 40(9): 82-91. |
| [5] | SONG Bing-fang, LIU Ning, CHENG Xin-yan, XU Xiao-bin, TIAN Wen-mao, GAO Yue, BI Yang, WANG Yi. Identification of Potato G6PDH Gene Family and Its Expression Analysis in Damaged Tubers [J]. Biotechnology Bulletin, 2024, 40(9): 104-112. |
| [6] | WU Shuai, XIN Yan-ni, MAI Chun-hai, MU Xiao-ya, WANG Min, YUE Ai-qin, ZHAO Jin-zhong, WU Shen-jie, DU Wei-jun, WANG Li-xiang. Genome-wide Identification and Stress Response Analysis of Soybean GS Gene Family [J]. Biotechnology Bulletin, 2024, 40(8): 63-73. |
| [7] | YANG Wei, ZHAO Li-fen, TANG Bing, ZHOU Lin-bi, YANG Juan, MO Chuan-yuan, ZHANG Bao-hui, LI Fei, RUAN Song-lin, DENG Ying. Genome-wide Identification and Expression Analysis of the SRO Gene Family in Brassica juncea L. [J]. Biotechnology Bulletin, 2024, 40(8): 129-141. |
| [8] | ZHOU Lin, HUANG Shun-man, SU Wen-kun, YAO Xiang, QU Yan. Identification of the bHLH Gene Family and Selection of Genes Related to Color Formation in Camellia reticulata [J]. Biotechnology Bulletin, 2024, 40(8): 142-151. |
| [9] | ZHANG Ming-ya, PANG Sheng-qun, LIU Yu-dong, SU Yong-feng, NIU Bo-wen, HAN Qiong-qiong. Identification and Expression Analysis of FAD Gene Family in Solanum lycopersicum [J]. Biotechnology Bulletin, 2024, 40(7): 150-162. |
| [10] | ZANG Wen-rui, MA Ming, CHE Gen, HASI Agula. Genome-wide Identification and Expression Pattern Analysis of BZR Transcription Factor Gene Family of Melon [J]. Biotechnology Bulletin, 2024, 40(7): 163-171. |
| [11] | HU Yong-bo, LEI Yu-tian, YANG Yong-sen, CHEN Xin, LIN Huang-fang, LIN Bi-ying, LIU Shuang, BI Ge, SHEN Bao-ying. Genome-wide Identification and Expression Pattern Analysis of the Bcl-2-related Anti-apoptotic Family in Cucumis sativus L. and Cucurbita moschata Duch. [J]. Biotechnology Bulletin, 2024, 40(6): 219-237. |
| [12] | CHANG Xue-rui, WANG Tian-tian, WANG Jing. Identification and Analysis of E2 Gene Family in Pepper(Capsicum annuum L.) [J]. Biotechnology Bulletin, 2024, 40(6): 238-250. |
| [13] | LIU Rong, TIAN Min-yu, LI Guang-ze, TAN Cheng-fang, RUAN Ying, LIU Chun-lin. Identification and Induced-expression Analysis of REVEILLE Family in Brassica napus L. [J]. Biotechnology Bulletin, 2024, 40(6): 161-171. |
| [14] | WANG Jian, YANG Sha, SUN Qing-wen, CHEN Hong-yu, YANG Tao, HUANG Yuan. Genome-wide Identification and Expression Analysis of bHLH Transcription Factor Family in Dendrobium nobile [J]. Biotechnology Bulletin, 2024, 40(6): 203-218. |
| [15] | LI Meng-ran, YE Wei, LI Sai-ni, ZHANG Wei-yang, LI Jian-jun, ZHANG Wei-min. Expression of Lithocarols Biosynthesis Gene litI and Functional Analysis of Its Promoter [J]. Biotechnology Bulletin, 2024, 40(6): 310-318. |
| Viewed | ||||||
|
Full text |
|
|||||
|
Abstract |
|
|||||