Abstract:
Background: New gene emergence has so far been assumed to be mostly driven by duplication and divergence of
existing genes. The possibility that entirely new genes could emerge out of the non-coding genomic background
was long thought to be almost negligible. With the increasing availability of fully sequenced genomes across broad
scales of phylogeny, it has become possible to systematically study the origin of new genes over time and thus
revisit this question.
Results: We have used phylostratigraphy to assess trends of gene evolution across successive phylogenetic phases,
using mostly the well-annotated mouse genome as a reference. We find several significant general trends and
confirm them for three other vertebrate genomes (human, zebrafish and stickleback). Younger genes are shorter,
with respect to both gene length and open reading frame length. They also contain fewer exons and
have fewer recognizable domains. Average exon length, on the other hand, does not change much over time. Only
the most recently evolved genes have longer exons and they are often associated with active promoter regions, i.e.
they are part of bidirectional promoters. We have also revisited the possibility that de novo evolution of genes could
occur even within existing genes, by making use of an alternative reading frame (overprinting). We find several
cases among the annotated Ensembl ORFs, where the new reading frame has emerged at a higher
phylostratigraphic level than the original one. We discuss some of these overprinted genes, which also include the
Hoxa9 gene where an alternative reading frame covering the homeobox has emerged within the lineage leading
to rodents and primates (Euarchontoglires).
Conclusions: We suggest that the overall trends of gene emergence are more compatible with a de novo
evolution model for orphan genes than a general duplication-divergence model. Hence de novo evolution of genes
appears to have occurred continuously throughout evolutionary time and should therefore be considered as a
general mechanism for the emergence of new gene functions.