Abstract
Coiled coils are a widespread protein structure motif, consisting of multiple α-helices that wind around a central axis to bury their hydrophobic core. At the sequence level, they are underpinned by short repeats, the most common of which is the 7-residue heptad. By varying the number, composition, and length of their repeats, coiled-coil proteins have access to a remarkable range of structural diversity. Despite this diversity, their highly regular nature has facilitated the study of the relationship between their sequence and structure, making them a model system for this paradigm of protein science. In this thesis, we address issues with two aspects of coiled-coil research: discovery and modeling. In an effort to discover new coiled-coil families, we bioinformatically investigated the distribution of a hitherto understudied coiled-coil repeat, the 11-residue hendecad, across the proteomes of living organisms. To this end, we performed a broad survey for proteins that showed features compatible with hendecad coiled-coil structure, and carried out interactive analyses on the resulting dataset. The protein families that we found show that hendecads are more diverse than previously thought, and that this motif expands the topological space accessible to coiled coils. Further, we address some of the limitations of coiled-coil modeling tools. For this, we evaluated the applicability of AlphaFold, a state-of-the-art protein structure prediction tool, to modeling coiled-coil structures. We benchmarked its performance through two approaches: measuring its accuracy in terms of local geometry, and testing its potential for topological prediction. Our results demonstrate that, even as a general-purpose protein structure prediction tool, AlphaFold performs better than coiled-coil-specific software. We also show that it can be leveraged in a coiled-coil framework to improve topological prediction as well as to probe local coiled-coil folding potentials.