Abstract:
Advances in NGS technologies have enabled the assembly of an increasing number of high-quality reference genomes. However, any reference genome needs to be complemented with a high-quality annotation. Genome annotation is a key feature of any genomic resource intended to be shared with the scientific community to facilitate the study of organisms, as it describes the genomic features of an organism and their relationships. Most published reference genomes come with a gene annotation, but annotation at the gene level offers only a partial view of the regulatory processes in an organism, many of which rely on epigenetic mechanisms. To address this issue, here we provide a more detailed annotation of the new Thlaspi arvense reference assembly. This annotation focuses on two features most relevant for epigenomic research: transposable elements and intergenic small RNA loci. For transposable element (TE) annotation, we first used RepeatModeler2 and the Extensive de-novo TE Annotator (EDTA) to identify a set of putative transposable elements, which we then verified and curated manually. With this approach, we annotated 423,249 individual elements, which together make up roughly 61% of the T. arvense genome. We complemented this TE annotation with a custom small RNA (sRNA) locus annotation pipeline. The pipeline uses BLAST to filter out several known confounding sources of non-coding sRNAs and ShortStack to annotate bona fide sRNA biogenesis loci. Applying this pipeline to four major tissues (leaf, root, inflorescence and pollen), we identified 19,288 distinct sRNA loci in T. arvense. Altogether, this annotation will provide an important resource for any research group studying this non-model organism and for anyone interested in comparative genomics.
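A genome fraction such as the roughly 61% covered by TEs cannot be obtained by simply summing element lengths, because TE annotations frequently overlap or nest inside one another. The following is a minimal sketch of the general calculation, not the authors' pipeline. It assumes the annotations have already been parsed (e.g. from a BED or GFF file) into hypothetical `(sequence, start, end)` half-open intervals, and that sequence lengths are known. Overlapping intervals are merged before the covered bases are divided by the assembly size. The helper names `merge_intervals` and `genome_fraction` are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical interval format: (sequence name, 0-based start, end exclusive).
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals on the same sequence."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or abuts) the previous interval: extend it.
            last_chrom, last_start, last_end = merged[-1]
            merged[-1] = (last_chrom, last_start, max(last_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def genome_fraction(intervals: List[Interval], chrom_sizes: Dict[str, int]) -> float:
    """Fraction of the assembly covered by at least one interval."""
    covered = sum(end - start for _, start, end in merge_intervals(intervals))
    return covered / sum(chrom_sizes.values())


# Toy example with invented numbers: two overlapping TEs on chr1, one on chr2.
tes = [("chr1", 0, 30), ("chr1", 20, 50), ("chr2", 10, 20)]
sizes = {"chr1": 100, "chr2": 50}
print(genome_fraction(tes, sizes))  # 60 covered bp out of 150 bp → 0.4
```

Summing the raw lengths in this toy example would give 70 bp (about 47%), which overstates coverage. This is why the overlapping chr1 elements are merged into a single 50 bp block first.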