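High-throughput transcriptome sequencing and analysis of wax gourd
YE Xinru1, ZHANG Qianrong1, CHEN Mindong, et al.1
Crops Research Institute, Fujian Academy of Agricultural Sciences
Abstract: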
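【Objective】To obtain transcriptome sequences, genetic variation and related information for wax gourd (Benincasa hispida), and to mine gene data and SSR molecular markers from them, providing data support for subsequent wax gourd research.【Method】Young wax gourd leaves were used as material and sequenced with Illumina HiSeq™ 2000 technology; a transcriptome database was built, from which clean reads were obtained. After de novo assembly, the resulting unigenes were aligned against six public databases: the non-redundant protein database (Nr), the Swiss-Prot protein sequence database (Swiss-Prot), Gene Ontology (GO), the eukaryotic orthologous groups database (KOG), the Kyoto Encyclopedia of Genes and Genomes (KEGG), and the protein families database (Pfam), yielding annotation information for the wax gourd unigenes. MISA software was used to search the transcriptome unigenes for SSR loci.【Result】A total of 62 021 032 high-quality reads were obtained from young wax gourd leaves and assembled into 40 611 unigenes with an average length of 955 bp. Alignment against the Nr and Swiss-Prot databases matched 27 474 and 19 573 unigenes, respectively. In the GO database, the 10 659 annotated unigenes were assigned to 47 functional groups across the three ontologies: biological process, molecular function, and cellular component. Annotation against the KOG database divided the annotated unigenes into 25 functional classes. KEGG alignment annotated 10 799 wax gourd unigenes, which fell into 5 major categories, 19 subcategories, and 125 metabolic pathways. Alignment against the Pfam database matched 17 990 unigenes belonging to 369 groups. The SSR search found 5 086 unigenes containing SSR sequences, with 5 474 SSR loci in total.【Conclusion】High-throughput sequencing yielded a large amount of wax gourd transcriptome information, which will support in-depth research on wax gourd at the molecular level.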
Key words:  wax gourd  high-throughput sequencing  transcriptome  gene annotation
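Funding: Basic Scientific Research Special Project for Public Welfare Research Institutes of Fujian Province (2018R1026-2); Central Government Guiding Local Science and Technology Development Special Project (2018L3005); "Hundred Young Scientific and Technological Talents Program" of the Fujian Academy of Agricultural Sciences (YC2017-5)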
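The counts reported in the abstract can be put on a common scale by expressing each as a share of the 40 611 assembled unigenes. The sketch below is only illustrative arithmetic on the published figures. The variable and function names are hypothetical, and KOG is left out because the abstract gives no unigene count for it.

```python
# Unigene counts exactly as reported in the abstract.
TOTAL_UNIGENES = 40_611
ANNOTATED = {
    "Nr": 27_474,
    "Swiss-Prot": 19_573,
    "GO": 10_659,
    "KEGG": 10_799,
    "Pfam": 17_990,
}

# SSR figures exactly as reported in the abstract.
SSR_LOCI = 5_474
SSR_UNIGENES = 5_086


def annotation_rates(annotated, total):
    """Return the percentage of all unigenes annotated in each database."""
    return {db: 100.0 * n / total for db, n in annotated.items()}


rates = annotation_rates(ANNOTATED, TOTAL_UNIGENES)
for db, pct in rates.items():
    print(f"{db:<10} {pct:5.1f}% of unigenes annotated")

# Share of unigenes carrying at least one SSR, and mean loci per such unigene.
ssr_unigene_pct = 100.0 * SSR_UNIGENES / TOTAL_UNIGENES
ssr_per_unigene = SSR_LOCI / SSR_UNIGENES
print(f"SSR-containing unigenes: {ssr_unigene_pct:.1f}%")
print(f"SSR loci per SSR-containing unigene: {ssr_per_unigene:.2f}")
```

On these figures, roughly two thirds of the unigenes have an Nr match, and about one in eight contains an SSR locus.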
High-throughput sequencing and analysis of the transcriptome of Benincasa hispida Cogn.
YE Xinru, ZHANG Qianrong, CHEN Mindong, et al.
Abstract:
【Objective】To obtain transcriptome sequences, genetic variation and related information for Benincasa hispida, and to mine gene data and SSR markers from them in support of further research on the species.【Method】Illumina HiSeq™ 2000 technology was used for transcriptome sequencing of tender Benincasa hispida leaves, and a transcriptome database was built to obtain clean reads. After de novo assembly, the unigenes were annotated by comparison with six public databases (Nr, Swiss-Prot, GO, KOG, KEGG, and Pfam). MISA software was used to search the transcriptome unigenes for SSR loci.【Result】Sequencing generated a total of 62 021 032 high-quality reads, which were assembled into 40 611 unigenes with an average length of 955 bp. A total of 27 474 and 19 573 unigenes were annotated against the Nr and Swiss-Prot databases, respectively. Gene Ontology analysis grouped the 10 659 annotated unigenes into 47 functional categories under biological process, molecular function and cellular component. Comparison with the KOG database divided the annotated unigenes into 25 functional classes. Moreover, 10 799 unigenes were annotated to 125 KEGG pathways, which fall into 5 categories and 19 subcategories, and 17 990 unigenes were annotated against the Pfam database, forming 369 groups. A total of 5 474 SSRs were identified in the transcriptome sequences, distributed across 5 086 unigenes.【Conclusion】High-throughput sequencing yielded a large amount of transcriptome information, which will facilitate further study of Benincasa hispida at the molecular level.
Key words:  Benincasa hispida Cogn.  high-throughput sequencing  transcriptome  gene annotation
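The SSR search described in the abstract was done with MISA. The abstract does not state the settings used, so the following is only a minimal sketch of how perfect SSRs can be located in a unigene sequence with regular expressions. It is not MISA, it does not detect compound SSRs, and the minimum repeat counts in `MIN_REPEATS` and the example sequence are assumptions chosen for illustration.

```python
import re

# ASSUMPTION: minimum repeat counts per motif length (1-6 nt).
# The abstract does not report the thresholds actually used.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Positions are 0-based. Only uninterrupted tandem repeats are reported.
    """
    thresholds = MIN_REPEATS if min_repeats is None else min_repeats
    seq = seq.upper()
    hits = []
    for k, n in sorted(thresholds.items()):
        # A lookahead lets every start position be tested, so a run of a
        # non-primitive motif cannot hide a real SSR that begins inside it.
        pattern = re.compile(r"(?=(([ACGT]{%d})\2{%d,}))" % (k, n - 1))
        last_end = 0
        for m in pattern.finditer(seq):
            start, motif = m.start(), m.group(2)
            if start < last_end or not is_primitive(motif):
                continue  # inside an SSR already reported, or a shorter motif in disguise
            run = m.group(1)
            last_end = start + len(run)
            hits.append((start, motif, len(run) // k))
    return sorted(hits)


# Hypothetical test sequence, not taken from the wax gourd data.
example = "GGC" + "AT" * 7 + "CCG" + "A" * 12 + "TTC" + "CAG" * 5 + "T"
for start, motif, count in find_ssrs(example):
    print(f"pos {start}: ({motif}){count}")
```

Applying `find_ssrs` to every unigene and counting the sequences with at least one hit would give figures comparable in kind to the 5 474 loci in 5 086 unigenes reported above, although the exact numbers depend on the thresholds chosen.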