Biotechnology Bulletin, 2024, Vol. 40, Issue (10): 76-85. doi: 10.13560/j.cnki.biotech.bull.1985.2024-0523

• Reviews and Monographs •

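  • Corresponding author: JI Hong-chao
  • About the author: JI Hong-chao, male, Ph.D., research professor; research interest: bioinformatics; E-mail: jihongchao@caas.cn
  • Funding:
    National Key Research and Development Program of China (2023YFA0915800)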

Research Progress and Prospects in the Structural Annotation of Unknown Secondary Metabolites Based on Mass Spectrometry

JI Hong-chao1, LI Zheng-yan1,2,3

  1. Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518000
    2. School of Life Sciences, Henan University, Kaifeng 475004
    3. Shenzhen Research Institute of Henan University, Shenzhen 518000
  • Received: 2024-05-31; Published: 2024-10-26; Online: 2024-11-20

Abstract:

Research on secondary metabolites is of great significance to plant growth and development, environmental adaptation, human health, and drug development. Liquid chromatography-mass spectrometry (LC-MS) has become the platform of choice for secondary metabolism research. However, structural annotation of metabolites is still limited by the insufficient coverage of standard spectral libraries. Because metabolite structure databases cover far more compounds than standard spectral libraries, an effective way to address this problem is to use artificial intelligence methods to link metabolite structures with mass spectra, so that structure databases can be searched directly with MS data. This paper reviews three strategies that use deep learning and bioinformatics methods to establish this link: structure-to-spectrum (predicting spectra from structures), spectrum-to-structure (predicting structures from spectra), and known-to-unknown (inferring unknown metabolites from known ones), and introduces the rationale and representative methods of each. For each strategy, the paper discusses the strengths and limitations of its algorithms, as well as the challenges likely to arise in practical applications. It also examines the factors that should be considered when developing new algorithms and conducting benchmark tests, and how these factors affect algorithm evaluation. Finally, it identifies the integration of more orthogonal information as a future direction for achieving more accurate metabolite annotation.

Key words: secondary metabolism, bioinformatics, mass spectrometry analysis, artificial intelligence
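The structure-to-spectrum database search summarized in the abstract can be illustrated with a minimal sketch. It assumes that each candidate structure from a structure database already has a predicted MS/MS spectrum supplied by some structure-to-spectrum predictor (not implemented here). It also assumes that candidates are first filtered by a precursor mass tolerance in ppm and then ranked by cosine similarity between binned spectra. The function names, the 0.01 m/z bin width, the 10 ppm default, and the toy data are all hypothetical choices for illustration, not the scoring used by any method reviewed in the paper.

```python
import math
from collections import defaultdict

# A spectrum is a list of (m/z, intensity) peaks.
Spectrum = list[tuple[float, float]]


def bin_spectrum(peaks: Spectrum, bin_width: float = 0.01) -> dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins (bin index = rounded m/z / width)."""
    binned: dict[int, float] = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_similarity(a: Spectrum, b: Spectrum, bin_width: float = 0.01) -> float:
    """Cosine similarity between two binned spectra; 0.0 if either is empty."""
    va, vb = bin_spectrum(a, bin_width), bin_spectrum(b, bin_width)
    dot = sum(v * vb.get(k, 0.0) for k, v in va.items())
    norm_a = math.sqrt(sum(v * v for v in va.values()))
    norm_b = math.sqrt(sum(v * v for v in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(
    query: Spectrum,
    query_mass: float,
    candidates: dict[str, tuple[float, Spectrum]],
    ppm: float = 10.0,
) -> list[tuple[str, float]]:
    """Keep candidates within `ppm` of the query mass, then rank by spectral similarity.

    `candidates` maps a hypothetical candidate ID to (monoisotopic mass, predicted spectrum).
    """
    scored = []
    for cid, (mass, predicted) in candidates.items():
        if abs(mass - query_mass) / query_mass * 1e6 <= ppm:
            scored.append((cid, cosine_similarity(query, predicted)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy, made-up data: an experimental query spectrum and three database candidates.
query = [(91.05, 100.0), (119.05, 40.0)]
candidates = {
    "cand_A": (150.068, [(91.05, 90.0), (119.05, 50.0)]),
    "cand_B": (150.068, [(77.04, 100.0)]),
    "cand_C": (300.100, [(91.05, 100.0), (119.05, 40.0)]),
}
print(rank_candidates(query, 150.068, candidates))
```

In this toy run, cand_C is excluded by the mass filter despite having an identical spectrum, cand_A ranks first, and cand_B scores 0.0 because it shares no peaks with the query. The design mirrors the logic in the abstract: a structure database provides the candidates, and predicted spectra make them comparable with the experimental spectrum.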