Biotechnology Bulletin (生物技术通报), 2022, Vol. 38, Issue 9: 180-190. doi: 10.13560/j.cnki.biotech.bull.1985.2022-0344

• Research Report •

Genome Assembly of Pepino ("Ginseng Fruit") Based on PacBio Third-Generation Sequencing

SI Cheng1,2, ZHONG Qi-wen1, YANG Shi-peng1

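  1. Academy of Agriculture and Forestry Sciences, Qinghai University / Qinghai Key Laboratory of Vegetable Genetics and Physiology, Xining 810016
  2. College of Agriculture and Animal Husbandry, Qinghai University, Xining 810016
  • Received: 2022-03-22 Published: 2022-09-26 Online: 2022-10-11
  • About the author: SI Cheng, female, master's student; research interests: crop genetics, growth and development; E-mail: qhdxsc@163.com
  • Funding:
    Key Laboratory Project of the Qinghai Provincial Department of Science and Technology (2020-ZJ-Y02)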

Assembly of Pepino Genome Based on PacBio's Third-generation Sequencing Technology

SI Cheng1,2, ZHONG Qi-wen1, YANG Shi-peng1

  1. Qinghai Academy of Agricultural and Forestry Sciences/Qinghai Key Laboratory of Vegetable Genetics and Physiology, Xining 810016
  2. College of Agriculture and Animal Husbandry, Qinghai University, Xining 810016
  • Received: 2022-03-22 Published: 2022-09-26 Online: 2022-10-11

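Abstract (translated from the Chinese original):

Pepino, also known as "ginseng fruit", has a range of biological activities, including antioxidant, antitumor and antidiabetic effects. This study aimed to enrich the genomic information available for Solanaceae crops and the understanding of their evolutionary history, to obtain the whole-genome sequence of pepino, and to lay a foundation for molecular research on the species. Pepino plant tissue was used as the experimental material. A small-fragment library was sequenced on the Illumina HiSeq platform to assess genome characteristics, and PacBio third-generation sequencing together with Hi-C was used to construct and assemble the pepino whole-genome dataset. Bioinformatics methods were then applied for assembly, functional annotation and evolutionary analysis of the genome sequence. The results were as follows. A total of 54.11 Gb of Illumina HiSeq data and 55.08 Gb of PacBio data (average read length 14 179 bp) were obtained, along with about 143 Gb of Hi-C data. The assembled contigs had a total length of 1.16 Gb, and the contig N50 after Hi-C error correction was 22.63 Mb. With Hi-C anchoring, 1.12 Gb of sequence was placed on 12 chromosomes, accounting for 97.16%; of this, order and orientation could be determined for 1.08 Gb, or 96.11% of the total length of the chromosome-anchored sequence. The resulting genome size was 1.25 Gb. Repeat sequences were predicted to make up 64.22% of the genome, and 41 571 genes were predicted, 99.06% of which could be annotated in NR, GO, KEGG and other databases. In addition, 4 360 tRNAs, 5 677 rRNAs and 154 miRNAs were predicted, and 449 pseudogenes were identified. The divergence between pepino and potato was dated to approximately 12.82 MYA.

Key words: pepino, genome, PacBio third-generation sequencing technology, gene annotation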

Abstract:

Pepino (Solanum muricatum) has a variety of biological activities, including antioxidant, antitumor and antidiabetic activity. To enrich the genomic information on Solanaceae crops and the understanding of their evolutionary history, we obtained the whole-genome sequence of pepino, laying a foundation for pepino-related molecular studies. Using pepino plant tissue as the experimental material, a small-fragment library was constructed and sequenced on the Illumina HiSeq platform to evaluate genome characteristics. PacBio third-generation sequencing and Hi-C technology were then used to construct and assemble the whole-genome dataset of pepino. Bioinformatics methods were applied for assembly, functional annotation and evolutionary analysis of the genome sequence. In total, 54.11 Gb of Illumina HiSeq data, 55.08 Gb of PacBio data (average read length 14 179 bp) and about 143 Gb of chromosome conformation capture (Hi-C) data were obtained. The assembled contigs had a total length of 1.16 Gb, with a contig N50 of 22.63 Mb after Hi-C error correction. Hi-C anchoring placed 1.12 Gb of sequence on 12 chromosomes, accounting for 97.16% of the total genome sequence. Order and orientation could be determined for 1.08 Gb of this sequence, or 96.11% of the total length of the chromosome-anchored sequence. The estimated genome size was 1.25 Gb. Repeat sequences were predicted to make up 64.22% of the genome, and 41 571 genes were predicted, 99.06% of which could be annotated in NR, GO, KEGG and other databases. Noncoding RNAs included 4 360 tRNAs, 5 677 rRNAs and 154 miRNAs, and a total of 449 pseudogenes were identified. Pepino and potato were estimated to have diverged approximately 12.82 MYA.

Key words: pepino, genome, PacBio third-generation sequencing technology, gene annotation
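
The contig N50 reported in the abstract is a standard assembly contiguity statistic: the length L such that contigs of length L or longer together cover at least half of the total assembly length. As a minimal illustration of how such a value is computed, the Python sketch below uses a made-up list of contig lengths; the function name `n50` and the toy data are assumptions for illustration and are not taken from the study or its software.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total length.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths (arbitrary units), not data from this study.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))  # total = 300, half = 150, reached at the 70-unit contig
```

On a real assembly the lengths would be read from the contig FASTA file; a higher N50 for the same total length indicates a less fragmented assembly.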