The Power of Reuse in the Evolution of natural Proteins

Weidmann, L; Dijkstra, T; Kohlbacher, O; Lupas, A

Meeting Abstract

MPS-Authors

Weidmann, L
Department Protein Evolution, Max Planck Institute for Developmental Biology, Max Planck Society

Lupas, A
Department Protein Evolution, Max Planck Institute for Developmental Biology, Max Planck Society

External Resource

http://www.ngp-net.gmc.vu.lt/wp-content/uploads/2018/10/NGP-Net4_abstract_book.pdf
(Abstract)

Citation

Weidmann, L., Dijkstra, T., Kohlbacher, O., & Lupas, A. (2018). The Power of Reuse in the Evolution of natural Proteins. In 4th Symposium on Non-Globular Proteins (NGP-NET 2018) (pp. 31).

Cite as: https://hdl.handle.net/21.11116/0000-000E-0EDF-9

Abstract

Background: The probability that a foldable and functional protein sequence emerges de novo is extremely small. The evolution of natural proteins therefore often proceeds through the amplification of already existing sequences or their integration into different genomic contexts. Copies of the same protein sequence diversify over time, leading to the co-existence of similar sequences in nature.

Questions addressed: This scenario is not the only explanation for the presence of similar sequences in present-day proteomes: sequence similarities can arise through common descent, convergence, and random chance. We address the individual contributions of these phenomena by analyzing a non-redundant set of bacterial genomes from a statistical point of view.

Methods: In an all-against-all approach, we separate randomly expected from unexpected similarities between natural sequence fragments of the same length. To this end, we use an unbiased definition of sequence similarity: the position-specific sequence identity, without allowing for insertions or deletions and without a substitution matrix. Sampling the pairwise similarities between fragments of equal length reveals the frequency of each similarity level. These frequencies are then compared to expected values derived from a null model that is biased by the underlying natural amino acid composition. We model the expected values using a binomial distribution that estimates the frequency of fragment pairs with a given similarity in random sequence data. The ratio between natural and randomly expected frequencies gives the over-representation of natural recurrence, which reflects the possible amount of naturally driven reuse.

Results and discussion: The majority of natural similarities can be described by the amino-acid-biased null model. This reflects the fact that fragments are, in general, similar to a small subset of other fragments but not to the majority of natural fragments. Hence, most natural fragments are scattered randomly across sequence space, with additional local agglomerations in regions where sequences are reused in nature. We capture the magnitude of these local agglomerations with a trend line of similarity frequencies. It indicates the extent to which highly similar fragments are over-represented in nature and thereby accounts for homology. This confirms that reuse of existing protein sequences is a major mechanism in protein evolution. When these presumably homologous similarities are subtracted from the overall natural similarities, a significant difference from the null model remains. Especially in the 20-40% sequence identity region, an increase of natural distances can be observed. We assume that this increase is caused by convergence and are currently investigating this hypothesis.
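
The comparison described under Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. All function names are hypothetical, and several details are assumptions the abstract does not state: fragments are taken as every overlapping window of a fixed length, and the per-position match probability of the null model is taken as the sum of squared amino acid frequencies of the input.

```python
from collections import Counter
from math import comb


def identity(a: str, b: str) -> int:
    """Count positions at which two equal-length fragments carry the same residue."""
    if len(a) != len(b):
        raise ValueError("fragments must have equal length")
    return sum(x == y for x, y in zip(a, b))


def match_probability(sequences) -> float:
    """Chance that two residues drawn from the observed composition are identical.

    Assumption: the composition-biased null model reduces to one per-position
    match probability, the sum of squared amino acid frequencies.
    """
    counts = Counter("".join(sequences))
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


def expected_fraction(k: int, length: int, p: float) -> float:
    """Binomial probability of exactly k identical positions out of `length`."""
    return comb(length, k) * p**k * (1 - p) ** (length - k)


def fragments(sequences, length: int):
    """Yield every overlapping window of the given length (an assumed sampling scheme)."""
    for seq in sequences:
        for start in range(len(seq) - length + 1):
            yield seq[start:start + length]


def over_representation(sequences, length: int) -> dict:
    """Ratio of observed to binomially expected pair fractions per identity level."""
    frags = list(fragments(sequences, length))
    # All-against-all comparison, each unordered pair counted once.
    observed = Counter(
        identity(a, b) for i, a in enumerate(frags) for b in frags[i + 1:]
    )
    n_pairs = sum(observed.values())
    p = match_probability(sequences)
    return {
        k: (observed[k] / n_pairs) / expected_fraction(k, length, p)
        for k in sorted(observed)
    }


if __name__ == "__main__":
    # Toy input, invented purely for illustration.
    toy = ["MKVLAAGIVKVLAAGI", "MKVLSAGIQRSTPLNE"]
    for k, ratio in over_representation(toy, 8).items():
        print(k, round(ratio, 3))
```

A ratio above 1 at a given identity level means that fragment pairs with that similarity occur more often than the composition-only model predicts. Overlapping windows from the same sequence are not independent, so a real analysis would need a more careful sampling scheme than this sketch uses.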