Biotechnology Bulletin, 2015, Vol. 31, Issue (5): 84-92. doi: 10.13560/j.cnki.biotech.bull.1985.2015.05.014

• Techniques and Methods •

Transcriptome Analysis of Sarcopyramis nepalensis via RNA-seq Technology

Jin Hong1, Jiao Genlin1, Chen Gang2

  (1. Fairy Lake Botanical Garden, Shenzhen and Chinese Academy of Sciences, Shenzhen 518004; 2. College of Life Science, Zhaoqing University, Zhaoqing 526061)
  • Received: 2014-09-11 Published: 2015-05-18 Online: 2015-05-18
  • About the author: Jin Hong, PhD, senior engineer; research interests: plant diversity conservation and utilization; E-mail: jinhong@szum.gov.cn
  • Funding:
    Shenzhen Urban Management Bureau project (201318)

Abstract: The leaf transcriptome of Sarcopyramis nepalensis was sequenced using Illumina RNA-seq, a high-throughput sequencing technology. After filtering and de novo assembly of the raw reads, 51 305 unigenes were obtained, with an average length of 921 nt and an N50 of 1 490 nt. The unigenes were annotated using BLAST and BLAST2GO against the NCBI non-redundant protein (Nr), non-redundant nucleotide (Nt), Swiss-Prot, Gene Ontology (GO), Clusters of Orthologous Groups (COG) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases; in total, 40 532 unigenes were annotated. The proportions of unigenes annotated in Nr, Nt, Swiss-Prot, KEGG, COG and GO were 77.53%, 56.18%, 53.14%, 46.58%, 29.69% and 60.72%, respectively. BLAST searches against the protein databases identified 39 302 coding sequences (CDSs), and 2 065 CDSs were predicted with ESTscan. KEGG pathway analysis showed that 2 323 unigenes (9.72%) were involved in the biosynthesis of secondary metabolites (KO01110), and 78 unigenes encoding cytochrome P450 family proteins were identified. This annotation provides a theoretical reference for identifying key genes involved in the biosynthesis of secondary metabolites in medicinal plants.

Key words: Sarcopyramis nepalensis, transcriptome, RNA-seq
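
For readers unfamiliar with the assembly statistics quoted in the abstract, the following is a minimal Python sketch of how an N50 and a mean length are commonly computed from a list of unigene lengths, followed by a quick arithmetic check on the reported percentages. The function names and the toy lengths are hypothetical and are not taken from the study; the authors' own pipeline is not described on this page. The last lines suggest that the reported 9.72% matches 2 323 divided by the number of KEGG-annotated unigenes (46.58% of 51 305) rather than by all unigenes; this is an inference from the published figures, not a statement made by the authors.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


def mean_length(lengths):
    """Return the arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Hypothetical toy lengths in nt (NOT data from the study).
toy_lengths = [2000, 1500, 1000, 500, 300, 200]
toy_n50 = n50(toy_lengths)
toy_mean = mean_length(toy_lengths)

# Figures quoted in the abstract.
TOTAL_UNIGENES = 51305
KEGG_PERCENT = 46.58
SECONDARY_METABOLITE_UNIGENES = 2323

# Approximate number of KEGG-annotated unigenes implied by the percentage.
kegg_annotated = round(TOTAL_UNIGENES * KEGG_PERCENT / 100)

# Share of secondary-metabolite unigenes under two candidate denominators.
share_of_kegg = 100 * SECONDARY_METABOLITE_UNIGENES / kegg_annotated
share_of_all = 100 * SECONDARY_METABOLITE_UNIGENES / TOTAL_UNIGENES

print(f"toy N50 = {toy_n50} nt, toy mean = {toy_mean:.1f} nt")
print(f"share of KEGG-annotated unigenes: {share_of_kegg:.2f}%")
print(f"share of all unigenes: {share_of_all:.2f}%")
```

An N50 (1 490 nt) well above the mean length (921 nt), as reported here, is typical of assemblies with many short unigenes and a smaller number of long ones.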