Abstract:
Reference genomes are often generated with a comprehensive annotation of protein-coding genes and the mRNAs they produce, but this offers only a partial view of genome function, much of which involves epigenetic mechanisms. In this work, we investigate epigenetic components of the Thlaspi arvense genome by providing a detailed annotation of transposable elements (TEs) and small RNA (sRNA) loci. We identified and annotated 423,249 individual TEs, which together constitute 61% of the T. arvense genome. Among these TEs, retroelements of the GYPSY superfamily are the most abundant, with a single family accounting for 6% of the total genome size. In contrast, some scarcer CACTA and HELITRON families are the most active, as observed in transcriptomic profiles from several plant tissues. To understand how TE activity is regulated, we complemented our TE annotation with sRNA data. Applying a custom pipeline to data from leaf, root, inflorescence and pollen, we identified 19,288 distinct sRNA loci, of which 72 were microRNAs. Most of the sRNA loci were located at the transitions between gene-rich and TE-dense regions, with sRNAs being highly expressed in gene-rich regions. Using this annotation, we will survey a diverse set of wild T. arvense populations for transposable element insertion polymorphisms to gain insight into their evolutionary history, and correlate these polymorphisms with sRNA expression profiles in a subset of populations. Our results should provide an important resource for comparative genomics and transposable element evolution research in the Brassicaceae.