The prevalence of conservative evolution in the protein sequence universe

Weidmann, L; Dijkstra, TMH; Kohlbacher, O; Lupas, AN

Released

Poster

MPG authors

/persons/resource/persons271259

Weidmann, L
Department Protein Evolution, Max Planck Institute for Developmental Biology, Max Planck Society;

/persons/resource/persons271422

Dijkstra, TMH
Department Protein Evolution, Max Planck Institute for Developmental Biology, Max Planck Society;

/persons/resource/persons44815

Kohlbacher, O
IMPRS From Molecules to Organisms, Max Planck Institute for Developmental Biology, Max Planck Society;

/persons/resource/persons78342

Lupas, AN
Department Protein Evolution, Max Planck Institute for Developmental Biology, Max Planck Society;

External resources

https://www.emergence-of-life.de/past-events/181011-12_mom_abstractbook.pdf
(Abstract)

Citation

Weidmann, L., Dijkstra, T., Kohlbacher, O., & Lupas, A. (2018). The prevalence of conservative evolution in the protein sequence universe. Poster presented at CAS Conference 2018: Molecular Origins of LIFE, München, Germany.

Citation link: https://hdl.handle.net/21.11116/0000-000B-70E4-5

Abstract

The genesis of a structured and functional protein by random processes is exceedingly unlikely. However, once a functioning protein emerges, it can easily gain acceptance [1]. The evolution of natural proteins therefore often proceeds through the amplification of already established protein sequences. Copies of the same sequence evolve over time, leading to the co-existence of similar sequences that might also have diversified in function [2]. We investigate the prevalence of such conservative evolution by analyzing reuse in the protein sequence universe. As a representative sample for this study, we chose 1300 non-redundant bacterial genomes of distinct genera, with exemplars from most bacterial classes. We use statistical modeling to distinguish sequence similarities arising through reuse from those arising by mere chance. For this purpose, we derive the distribution of point mutation distances between randomly drawn k-mers. For long point mutation distances, the distribution can be described by a binomial distribution based on the amino acid composition of the underlying data. The frequency of shorter distances is significantly increased relative to the binomial distribution and can be explained by reuse. In the example of 100-mers, we find that most sequence fragments (>90%) are reused at least once (p-value of 10⁻⁵). More than 10% of all sequence fragments are extensively reused and reoccur more than a thousand times. Pairwise genome comparison reveals an overlap of around 19% common sequences on average. This demonstrates that the pressure to conserve sequences is strong enough to cause such significant sequence overlap, even after billions of years have passed.
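
The chance model described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the point mutation distance is the number of mismatching positions between two k-mers of equal length, and that positions are independent, so that the distance between two randomly drawn k-mers follows a binomial distribution whose mismatch probability comes from the amino acid composition alone. All function names are hypothetical.

```python
from collections import Counter
from math import comb


def match_probability(sequences):
    """Chance that two independently drawn residues are identical,
    estimated from the amino acid composition of the data (assumption:
    residues are drawn independently from that composition)."""
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())


def point_mutation_distance(a, b):
    """Number of mismatching positions between two equal-length k-mers."""
    if len(a) != len(b):
        raise ValueError("k-mers must have the same length")
    return sum(x != y for x, y in zip(a, b))


def null_distance_pmf(k, p_match):
    """Binomial chance model: probability of each distance d = 0..k
    between two unrelated k-mers."""
    q = 1.0 - p_match  # per-position mismatch probability
    return [comb(k, d) * q**d * p_match ** (k - d) for d in range(k + 1)]


def chance_tail(k, p_match, d_max):
    """Probability that two unrelated k-mers lie within distance d_max."""
    return sum(null_distance_pmf(k, p_match)[: d_max + 1])
```

Under these assumptions, a pair of k-mers whose distance falls where `chance_tail` is very small is unlikely to be a chance similarity, which corresponds to the excess of short distances that the abstract attributes to reuse. The threshold and any correction for the number of comparisons are left open here, since the abstract does not specify them.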