Free keywords:
-
Abstract:
Novel proteins can emerge not only from ancestral proteins but also from formerly non-coding DNA; such proteins are said to have emerged de novo. De novo proteins are assumed to be more similar to random sequences than to established proteins. So far, no experimental structure of a de novo protein has been solved. Accordingly, it is unclear whether certain de novo proteins fold into structures similar to those of established proteins. The development of structure prediction programs based on co-evolutionary patterns and language models has allowed us to gain insights into possible de novo protein structures. Nevertheless, the reliability of structure predictions for de novo proteins remains uncertain because of their limited homology and high levels of disorder. To address this uncertainty, we applied multiple different but overlapping prediction tools for structural, biophysical, and chemical properties to de novo proteins of Drosophila. We applied the same tools to random and established proteins to allow comparison among the three groups. Our comparisons indicate that de novo proteins are, in general, more similar to random sequences in terms of disorder, and that younger de novo proteins are more disordered than older ones. Additionally, de novo proteins are more prone to undergo liquid-liquid phase separation, which may provide an initial activity that enables their retention in the cellular environment. AlphaFold2's pLDDT metric also correlates differently for secondary structure elements in the de novo, random, and established groups. Furthermore, some de novo proteins show structural homology with established proteins even though they exhibit no sequence homology to other proteins. Our findings suggest that de novo proteins are, on average, different from both random and established proteins, while certain de novo proteins are predicted to be structurally and functionally similar to established proteins.
Our goal is to experimentally solve the structure of these de novo proteins and reveal potential chemical activities.