Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes
Date
2019-04
Authors
Johnson, Lisa K.
Alexander, Harriet
Brown, C. Titus
DOI
10.1093/gigascience/giy158
Keywords
marine microbial eukaryote
transcriptome assembly
automated pipeline
re-analysis
Abstract
Background: De novo transcriptome assemblies are required prior to analyzing RNA sequencing data from a species without
an existing reference genome or transcriptome. Despite the prevalence of transcriptomic studies, the effects of using
different workflows, or “pipelines,” on the resulting assemblies are poorly understood. Here, a pipeline was
programmatically automated and used to assemble and annotate raw transcriptomic short-read data collected as part of
the Marine Microbial Eukaryotic Transcriptome Sequencing Project. The resulting transcriptome assemblies were evaluated
and compared against assemblies that were previously generated with a different pipeline developed by the National
Center for Genome Resources. Results: New transcriptome assemblies contained the majority of previous contigs as well as
new content. On average, 7.8% of the annotated contigs in the new assemblies carried novel gene names not found in the
previous assemblies. Taxonomic trends were observed in the assembly metrics. Assemblies from the Dinoflagellata showed
a higher number of contigs and unique k-mers than transcriptomes from other phyla, while assemblies from Ciliophora
had a lower percentage of open reading frames compared to other phyla. Conclusions: Given current bioinformatics
approaches, there is no single “best” reference transcriptome for a particular set of raw data. As the optimum
transcriptome is a moving target, improving (or not) with new tools and approaches, automated and programmable
pipelines are invaluable for managing the computationally intensive tasks required for re-processing large sets of samples
with revised pipelines and ensuring a common evaluation workflow is applied to all samples. Thus, re-assembling existing
data with new tools using automated and programmable pipelines may yield more accurate identification of taxon-specific
trends across samples in addition to novel and useful products for the community.
Description
© The Author(s), 2019. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Johnson, L. K., Alexander, H., & Brown, C. T. Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes. GigaScience, 8(4), (2019): giy158, doi: 10.1093/gigascience/giy158.
Citation
Johnson, L. K., Alexander, H., & Brown, C. T. (2019). Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes. GigaScience, 8(4), giy158.