Released

Preprint

Minor deviations from randomness have huge repercussions on the functional structuring of sequence space

MPS-Authors
Weidmann,  L
Department Protein Evolution, Max Planck Institute for Developmental Biology, Max Planck Society;

Dijkstra,  T
Research Group Biomolecular Interactions, Max Planck Institute for Developmental Biology, Max Planck Society;

Kohlbacher,  O
Research Group Biomolecular Interactions, Max Planck Institute for Developmental Biology, Max Planck Society;

Lupas,  AN
Department Protein Evolution, Max Planck Institute for Developmental Biology, Max Planck Society;

Citation

Weidmann, L., Dijkstra, T., Kohlbacher, O., & Lupas, A. (submitted). Minor deviations from randomness have huge repercussions on the functional structuring of sequence space.


Cite as: https://hdl.handle.net/21.11116/0000-000A-62BB-5
Abstract
Approaches based on molecular evolution have organized natural proteins into a hierarchy of families, superfamilies, and folds, which are often pictured as islands in a great sea of unrealized and generally non-functional polypeptides. In contrast, approaches based on information theory have substantiated a mostly random scatter of natural proteins in global sequence space. We evaluate these opposing views by analyzing fragments of a given length derived from either a natural dataset or different random models. For this, we compile distances in sequence space between fragments within each dataset and compare the resulting distance distributions between sets. Even for 100-mers, more than 95% of distances can be accounted for by a random sequence model that incorporates the natural amino acid frequency of proteins. When further accounting for the specific residue composition of the respective fragments, which would include biophysical constraints of protein folding, more than 99% of all distances can be modeled. Thus, while the local space surrounding a protein is almost entirely shaped by common descent, the global distribution of proteins in sequence space is close to random, only constrained by divergent evolution through the requirement that all intermediates connecting two forms in evolution must be functional.
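
The comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not specify the distance measure or how "accounted for" is quantified, so the Hamming distance, the histogram-overlap score, and all function names below are assumptions chosen for illustration. The input sequences are randomly generated toy placeholders, not natural proteins, so the printed numbers carry no biological meaning.

```python
import random
from collections import Counter
from itertools import combinations

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def fragments(sequences, k):
    """Return all overlapping fragments of length k from each sequence."""
    return [s[i:i + k] for s in sequences for i in range(len(s) - k + 1)]

def hamming(a, b):
    """Number of mismatched positions (assumed distance measure)."""
    return sum(x != y for x, y in zip(a, b))

def distance_distribution(frags, k):
    """Normalized histogram of pairwise distances; index = distance."""
    counts = [0] * (k + 1)
    for a, b in combinations(frags, 2):
        counts[hamming(a, b)] += 1
    total = sum(counts)
    return [c / total for c in counts]

def composition_model(frags, n, k, rng):
    """Random fragments drawn from the pooled residue frequencies of frags."""
    freq = Counter("".join(frags))
    letters = list(freq)
    weights = [freq[a] for a in letters]
    return ["".join(rng.choices(letters, weights=weights, k=k)) for _ in range(n)]

def shuffled_model(frags, rng):
    """Shuffle each fragment, keeping its own residue composition."""
    return ["".join(rng.sample(list(f), len(f))) for f in frags]

def overlap(p, q):
    """Shared probability mass of two distributions (assumed score)."""
    return sum(min(a, b) for a, b in zip(p, q))

# Toy placeholder data standing in for a natural dataset.
rng = random.Random(0)
toy_sequences = ["".join(rng.choices(AMINO_ACIDS, k=60)) for _ in range(5)]

k = 10
natural = fragments(toy_sequences, k)
by_frequency = composition_model(natural, len(natural), k, rng)
by_shuffling = shuffled_model(natural, rng)

p_nat = distance_distribution(natural, k)
p_freq = distance_distribution(by_frequency, k)
p_shuf = distance_distribution(by_shuffling, k)

print("overlap with frequency model:", overlap(p_nat, p_freq))
print("overlap with shuffled model: ", overlap(p_nat, p_shuf))
```

The two random models mirror the two steps in the abstract: `composition_model` uses only the overall amino acid frequencies, while `shuffled_model` additionally preserves the residue composition of each individual fragment. Counting all pairs scales quadratically with the number of fragments, so a real dataset would need sampling or a more efficient scheme.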

Significance Statement
When generating new proteins by evolution or design, can the entire sequence space be used, or do viable sequences occur mainly in some areas of this space? As a result of divergent evolution, natural proteins mostly form families that occupy local areas of sequence space, suggesting the latter. Theoretical work, however, indicates that these local areas are highly diffuse and do not dramatically affect the statistics of sequence distribution, such that natural proteins can be considered to cover global space effectively at random, though extremely sparsely. By comparing the distance distribution of natural sequences to that of various random models, we find that natural sequences are indeed distributed largely randomly, provided that the amino acid composition of natural proteins is respected.