


Poster

The prevalence of conservative evolution in the protein sequence universe

MPS-Authors

Weidmann,  L       
Department Protein Evolution, Max Planck Institute for Developmental Biology, Max Planck Society;


Dijkstra,  TMH       
Department Protein Evolution, Max Planck Institute for Developmental Biology, Max Planck Society;


Kohlbacher,  O       
IMPRS From Molecules to Organisms, Max Planck Institute for Developmental Biology, Max Planck Society;


Lupas,  AN       
Department Protein Evolution, Max Planck Institute for Developmental Biology, Max Planck Society;

Citation

Weidmann, L., Dijkstra, T., Kohlbacher, O., & Lupas, A. (2018). The prevalence of conservative evolution in the protein sequence universe. Poster presented at CAS Conference 2018: Molecular Origins of LIFE, München, Germany.


Cite as: https://hdl.handle.net/21.11116/0000-000B-70E4-5
Abstract
The genesis of a structured and functional protein by random processes is exceedingly unlikely. However, once a functioning protein emerges, it can easily gain acceptance [1]. The evolution of natural proteins therefore often proceeds through the amplification of already established protein sequences. Copies of the same sequence evolve over time, leading to the co-existence of similar sequences that may also have diversified in function [2]. We investigate the prevalence of such conservative evolution by analyzing reuse in the protein sequence universe. 1300 non-redundant bacterial genomes of distinct genera, with exemplars from most bacterial classes, are chosen as a representative sample for this study. We use statistical modeling to distinguish sequence similarities arising through reuse from those arising by mere chance. For this purpose, we derive the distribution of point mutation distances between randomly drawn k-mers. For long point mutation distances, the distribution can be described by a binomial distribution based on the amino acid composition of the underlying data. The frequency of shorter distances is significantly increased relative to the binomial distribution and can be explained by reuse. In the example of 100-mers, we find that most sequence fragments (>90%) are reused at least once (p-value of 10^-5). More than 10% of all sequence fragments are extensively reused and recur more than a thousand times. Pairwise genome comparison reveals an overlap of around 19% common sequences on average. This demonstrates that the pressure to conserve sequences is strong enough to cause such significant sequence overlap, even after billions of years have passed.
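The binomial null model mentioned in the abstract can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not spell out: the point mutation distance is taken to be the Hamming distance between two k-mers of equal length, and residues are treated as independent draws from the overall amino acid composition, so that two random positions mismatch with probability q = 1 - sum(f_a^2). The function names and the toy sequences are hypothetical and are not taken from the poster.

```python
from collections import Counter
from math import comb


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length k-mers differ."""
    if len(a) != len(b):
        raise ValueError("k-mers must have the same length")
    return sum(x != y for x, y in zip(a, b))


def mismatch_probability(sequences: list[str]) -> float:
    """Chance that two residues drawn independently from the pooled
    amino acid composition differ (assumed i.i.d. background model)."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())


def binomial_pmf(k: int, d: int, q: float) -> float:
    """Probability of exactly d mismatches among k positions."""
    return comb(k, d) * q**d * (1.0 - q) ** (k - d)


def chance_probability(k: int, d_max: int, q: float) -> float:
    """Probability that two random k-mers lie within distance d_max
    of each other purely by chance under the binomial model."""
    return sum(binomial_pmf(k, d, q) for d in range(d_max + 1))


if __name__ == "__main__":
    # Hypothetical toy data; a real analysis would use whole proteomes.
    toy = ["MKTAYIAKQRQISFVKSHFSRQ", "MKTAYLAKQRQISFVKAHFSRQ"]
    q = mismatch_probability(toy)
    print("observed distance:", hamming(toy[0], toy[1]))
    print("chance of distance <= 2:", chance_probability(len(toy[0]), 2, q))
```

Under this model, an observed excess of k-mer pairs at short distances, relative to `chance_probability`, is what would be attributed to reuse rather than chance.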