Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes
Background: De novo transcriptome assemblies are required prior to analyzing RNA sequencing data from a species without an existing reference genome or transcriptome. Despite the prevalence of transcriptomic studies, the effects of using different workflows, or “pipelines,” on the resulting assemblies are poorly understood. Here, a pipeline was programmatically automated and used to assemble and annotate raw transcriptomic short-read data collected as part of the Marine Microbial Eukaryote Transcriptome Sequencing Project. The resulting transcriptome assemblies were evaluated and compared against assemblies that were previously generated with a different pipeline developed by the National Center for Genome Resources.

Results: New transcriptome assemblies contained the majority of previous contigs as well as new content. On average, 7.8% of the annotated contigs in the new assemblies carried gene names not found in the previous assemblies. Taxonomic trends were observed in the assembly metrics. Assemblies from the Dinoflagellata showed a higher number of contigs and unique k-mers than transcriptomes from other phyla, while assemblies from Ciliophora had a lower percentage of open reading frames compared to other phyla.

Conclusions: Given current bioinformatics approaches, there is no single “best” reference transcriptome for a particular set of raw data. As the optimum transcriptome is a moving target, improving (or not) with new tools and approaches, automated and programmable pipelines are invaluable for managing the computationally intensive tasks required for re-processing large sets of samples with revised pipelines and ensuring a common evaluation workflow is applied to all samples. Thus, re-assembling existing data with new tools using automated and programmable pipelines may yield more accurate identification of taxon-specific trends across samples in addition to novel and useful products for the community.
© The Author(s), 2019. This article is distributed under the terms of the Creative Commons Attribution License. The definitive version was published in Johnson, L. K., Alexander, H., & Brown, C. T. Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes. GigaScience, 8(4), (2019): giy158, doi: 10.1093/gigascience/giy158.
Suggested Citation: Johnson, L. K., Alexander, H., & Brown, C. T. (2019). Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes. GigaScience, 8(4), giy158.
Showing items related by title, author, creator and subject.
The Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) : illuminating the functional diversity of eukaryotic life in the oceans through transcriptome sequencing Keeling, Patrick J.; Burki, Fabien; Wilcox, Heather M.; Allam, Bassem; Allen, Eric E.; Amaral-Zettler, Linda A.; Armbrust, E. Virginia; Archibald, John M.; Bharti, Arvind K.; Bell, Callum J.; Beszteri, Bank; Bidle, Kay D.; Cameron, Connor T.; Campbell, Lisa; Caron, David A.; Cattolico, Rose Ann; Collier, Jackie L.; Coyne, Kathryn J.; Davy, Simon K.; Deschamps, Phillipe; Dyhrman, Sonya T.; Edvardsen, Bente; Gates, Ruth D.; Gobler, Christopher J.; Greenwood, Spencer J.; Guida, Stephanie M.; Jacobi, Jennifer L.; Jakobsen, Kjetill S.; James, Erick R.; Jenkins, Bethany D.; John, Uwe; Johnson, Matthew D.; Juhl, Andrew R.; Kamp, Anja; Katz, Laura A.; Kiene, Ronald P.; Kudryavtsev, Alexander N.; Leander, Brian S.; Lin, Senjie; Lovejoy, Connie; Lynn, Denis; Marchetti, Adrian; McManus, George; Nedelcu, Aurora M.; Menden-Deuer, Susanne; Miceli, Cristina; Mock, Thomas; Montresor, Marina; Moran, Mary Ann; Murray, Shauna A.; Nadathur, Govind; Nagai, Satoshi; Ngam, Peter B.; Palenik, Brian; Pawlowski, Jan; Petroni, Giulio; Piganeau, Gwenael; Posewitz, Matthew C.; Rengefors, Karin; Romano, Giovanna; Rumpho, Mary E.; Rynearson, Tatiana A.; Schilling, Kelly B.; Schroeder, Declan C.; Simpson, Alastair G. B.; Slamovits, Claudio H.; Smith, David R.; Smith, G. Jason; Smith, Sarah R.; Sosik, Heidi M.; Stief, Peter; Theriot, Edward; Twary, Scott N.; Umale, Pooja E.; Vaulot, Daniel; Wawrik, Boris; Wheeler, Glen L.; Wilson, William H.; Xu, Yan; Zingone, Adriana; Worden, Alexandra Z. (Public Library of Science, 2014-06-24) Microbial ecology is plagued by problems of an abstract nature. Cell sizes are so small and population sizes so large that both are virtually incomprehensible. Niches are so far from our everyday experience as to make their ...
Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) location information on samples obtained on Gould (LMG1411) in the Western Antarctica Peninsula during 2014 (Polar Transcriptomes project) Marchetti, Adrian (Biological and Chemical Oceanography Data Management Office (BCO-DMO). Contact: firstname.lastname@example.org, 2019-04-17) Marine Microbial Eukaryote Transcriptome Sequencing Project (MMETSP) location information on samples obtained on Gould (LMG1411) in the Western Antarctica Peninsula during 2014 (Polar Transcriptomes project) For a complete ...
Identifying contamination with advanced visualization and analysis practices : metagenomic approaches for eukaryotic genome assemblies Delmont, Tom O.; Eren, A. Murat (PeerJ, 2016-03-29) High-throughput sequencing provides a fast and cost-effective means to recover genomes of organisms from all domains of life. However, adequate curation of the assembly results against potential contamination of non-target ...