Normal Modes and Essential Dynamics
- Steven Hayward, School of Computing Sciences and School of Biological Sciences, University of East Anglia
- Bert L. de Groot, Max Planck Institute for Biophysical Chemistry
Normal mode analysis and essential dynamics analysis are powerful methods used for the analysis of collective motions in biomolecules. Their application has led to an appreciation of the importance of protein dynamics in function and the relationship between structure and dynamical behavior. In this chapter, the methods and their implementation are introduced and recent developments such as elastic networks and advanced sampling techniques are described.
Keywords: Collective protein dynamics; Conformational flooding; Conformational sampling; Elastic network; Principal component analysis
1.1 Standard Normal Mode Analysis
Normal mode analysis (NMA) is one of the major simulation techniques used to probe the large-scale, shape-changing motions in biological molecules [1–3]. Although it has connections to the experimental techniques of infrared and Raman spectroscopy, its recent application has been to predict functional motions in proteins and other biological molecules. Functional motions are those that relate to function and are often the consequence of binding other molecules. In NMA studies, it is always assumed that the normal modes with the largest fluctuation (the lowest-frequency modes) are the ones that are functionally relevant because, like function itself, they exist by evolutionary design rather than by chance. The ultimate justification for this assumption must come from comparison with experimental data, and indeed studies that compare the predictions of an NMA with transitions derived from multiple x-ray conformers do suggest that the low-frequency normal modes are often functionally relevant.
NMA is a harmonic analysis. In its purest form, it uses exactly the same force fields as are used in molecular dynamics simulations; in that sense, it is accurate. However, the underlying assumption, that the conformational energy surface at an energy minimum can be approximated by a parabola over the range of thermal fluctuations, is known not to be correct at physiological temperatures. There is abundant evidence, both experimental and computational, that the harmonic approximation breaks down spectacularly for proteins at physiological temperatures, where, far from performing harmonic motion in a single energy minimum, the state point visits multiple minima, crossing energy barriers of various heights. Thus, when performing NMA, one has to be aware of this assumption and its limitations at functioning temperatures.
A standard NMA requires a set of coordinates, a force field describing the interactions between the constituent atoms, and software to perform the required calculations. Performing an NMA in Cartesian coordinate space involves three main calculation steps: 1) minimization of the conformational potential energy as a function of the atomic Cartesian coordinates; 2) calculation of the so-called “Hessian” matrix, the matrix of second derivatives of the potential energy with respect to the mass-weighted atomic coordinates; and 3) diagonalization of the Hessian matrix, which yields the eigenvalues and eigenvectors (the “normal modes”). Each of these steps can be computationally demanding, depending on the size of the molecule, but the first and final steps are usually the bottlenecks: energy minimization is demanding of CPU time, and diagonalization is demanding of both CPU time and memory because it involves a 3N × 3N matrix, where N is the number of atoms in the molecule. We have called this NMA “standard” NMA to distinguish it from the elastic network model NMA.
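Steps 2 and 3 can be illustrated on a toy system. The sketch below (Python/NumPy; the two-atom "molecule", spring constant k, and mass m are illustrative assumptions, not values from the text) builds the analytic mass-weighted Hessian for two atoms joined by a single harmonic bond in one dimension and diagonalizes it:

```python
import numpy as np

# Toy "molecule": two atoms of mass m joined by a harmonic bond in 1D,
# V = 0.5 * k * (x2 - x1 - b)**2; at the minimum the Hessian is analytic.
m, k = 12.0, 500.0  # arbitrary illustrative values

# Step 2: Hessian of V with respect to the Cartesian coordinates ...
H = k * np.array([[1.0, -1.0],
                  [-1.0, 1.0]])
# ... then mass-weighted: F_ij = H_ij / sqrt(m_i * m_j)
masses = np.array([m, m])
F = H / np.sqrt(np.outer(masses, masses))

# Step 3: diagonalization; eigenvalues are squared angular frequencies
eigvals, eigvecs = np.linalg.eigh(F)  # returned in ascending order

print(eigvals)  # ~[0, 2k/m]
```

The zero eigenvalue corresponds to overall translation of the pair; the single vibrational mode appears at the textbook value ω² = 2k/m.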
1.2 Elastic Network Models
Because of the computational difficulties of standard NMA, the current popularity of elastic network models is not surprising. An elastic network model is still an NMA, but the protein model is dramatically simplified; Tirion first introduced it into protein research. As the name suggests, the atoms are connected by a network of elastic connections. The method has two main advantages over standard NMA. The first is that there is no need for energy minimization, because the lengths of all of the elastic connections are taken to be at their minimum-energy values. Second, the diagonalization task is greatly reduced compared with standard NMA because, if one uses only the Cα atoms, as is common practice, the number of sites is reduced from the total number of atoms to the number of residues, an approximately tenfold reduction. Unlike standard NMA, elastic network models have two parameters to be set: one is the force or spring constant, normally denoted γ or C, and the other is a cut-off distance, denoted Rc.
A pertinent question is whether the method is any less accurate than standard NMA. Tirion showed that there is a respectable degree of correspondence between the two methods. Given the drastic assumptions that are inherent in standard NMA, the small difference between the results from these two methods is probably unimportant relative to the differences between standard NMA and reality. Comparisons between movements in the low-frequency modes derived from elastic network models of 20 proteins with movements derived from pairs of x-ray structures suggest the same level of moderate correspondence seen in similar studies using standard NMA. This, together with the relatively low computational cost of elastic network models, explains their current popularity in comparison with standard NMA.
1.3 Essential Dynamics and Principal Components Analysis
Because of the complexity of biomolecular systems, molecular dynamics simulations can be notoriously hard to analyze, rendering it difficult to grasp the motions of interest, or to uncover functional mechanisms. A principal components analysis (PCA) [8–10] often alleviates this problem. Similar to NMA, PCA rests on the assumption that the major collective modes of fluctuation dominate the functional dynamics. Interestingly, it has been found that the vast majority of protein dynamics can be described by a surprisingly low number of collective degrees of freedom. For the analysis of protein molecular dynamics simulations, this approach has the advantage that the dynamics along the individual modes can be inspected and visualized separately, thereby allowing one to filter the main modes of collective motion from more local fluctuations. Because these principal modes of motion could, in many cases, be linked to protein function, the dynamics in the low-dimensional subspace spanned by these modes was termed “essential dynamics”, to reflect the notion that these are the modes essential for function. The subspace spanned by the major modes of collective fluctuations is accordingly often referred to as the “essential subspace.” The fact that only a small subset of the total number of degrees of freedom dominates the molecular dynamics of biomolecules not only aids the analysis and interpretation of molecular dynamics trajectories, but also opens the way to enhanced sampling algorithms that search the essential subspace in either a systematic or exploratory fashion [11–14].
In contrast to NMA, PCA of a molecular dynamics simulation trajectory does not rest on the assumption of a harmonic potential. In fact, PCA can be used to study the degree of anharmonicity in the molecular dynamics of a simulated system. For proteins, it was shown that, at physiological temperatures, the major modes of collective fluctuation in particular are dominated by anharmonic fluctuations [9, 15]. Overall, protein dynamics at physiological temperatures has been described as diffusion among multiple minima [16–18]: on short timescales, the dynamics are dominated by fluctuations within a local minimum (which can be approximated well by the system's local normal modes), whereas, on longer timescales, the large fluctuations are dominated by a largely anharmonic diffusion between multiple wells.
In NMA, the modes of greatest fluctuation are those with the lowest frequencies. In PCA, by contrast, no assumptions are made regarding the harmonicity of the motion, and modes are sorted according to variance rather than frequency. Nevertheless, the largest-amplitude modes of a PCA usually also represent the slowest dynamical transitions.
2.1 Standard NMA
NMA is usually performed in a vacuum, where the potential energy of a biomolecule is a complex function of its 3N coordinates, N being the number of atoms. This function is normally written in terms of its bonded and nonbonded energy terms. It is usual to use Cartesian coordinates, although dihedral angles have also been used [1, 19, 20]. The basic idea is that, at a minimum, the potential energy function V can be expanded in a Taylor series in terms of the mass-weighted coordinates q_i = √m_i Δx_i, where Δx_i is the displacement of the ith coordinate from the energy minimum and m_i is the mass of the corresponding atom. If the expansion is terminated at the quadratic level, then, because the linear term is zero at an energy minimum:

V = (1/2) Σ_i,j (∂²V/∂q_i ∂q_j) q_i q_j.    (Eq. 1)
Thus, the energy surface is approximated by a parabola characterized by the second derivatives evaluated at the energy minimum. The basic, but false, assumption of NMA of biomolecules at physiological temperatures is that fluctuations still occur within this parabolic energy surface. It is known, however, that, at these temperatures, the state point moves on a complex energy surface with multiple minima, crossing energy barriers of various heights. The second derivatives in Eq. 1 can be written in a matrix, which is often called the “Hessian,” F. Determination of its eigenvalues and eigenvectors (equivalent to diagonalization) implies:

F w_j = ω_j² w_j,    (Eq. 2)

where w_j is the jth eigenvector and ω_j² is the jth eigenvalue. There are 3N such eigenvector equations. Each eigenvector specifies a normal mode coordinate through:

σ_j = Σ_i w_ij q_i.    (Eq. 3)
The sum is over the elements of w_j. Note that |w_j| = 1. It can be shown that these normal mode coordinates oscillate harmonically and independently of each other, each with angular frequency ω_j:

σ_j(t) = A_j cos(ω_j t + δ_j).    (Eq. 4)
It can be shown that the lower the frequency, the larger the thermal fluctuation of the corresponding normal mode coordinate:

⟨σ_j²⟩ = k_B T / ω_j²,    (Eq. 5)

where k_B is Boltzmann's constant and T is the temperature. It is common to compare the lowest-frequency modes with functional modes derived from, e.g., a pair of x-ray structures, one bound to a functional ligand and the other unbound. The overlap with the jth mode can be defined as:

O_j = |Δx^exp · w_j| / (|Δx^exp| |w_j|),    (Eq. 6)

where Δx^exp is the vector of displacements between the two experimental structures.
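The overlap of Eq. 6 is straightforward to compute once the modes are available. A minimal sketch (Python/NumPy), using an arbitrary orthonormal basis in place of real normal modes and a random vector as a hypothetical experimental displacement:

```python
import numpy as np

rng = np.random.default_rng(0)

# Columns of eigvecs stand in for the 3N normal-mode eigenvectors w_j;
# any orthonormal basis serves to illustrate the overlap formula.
A = rng.standard_normal((9, 9))
eigvals, eigvecs = np.linalg.eigh(A + A.T)

# Hypothetical "experimental" displacement between two x-ray structures
dx_exp = rng.standard_normal(9)

# Overlap of Eq. 6: |dx . w_j| / (|dx| |w_j|), with |w_j| = 1 here
overlaps = np.abs(eigvecs.T @ dx_exp) / np.linalg.norm(dx_exp)

# Because the modes form a complete orthonormal basis, the squared
# overlaps summed over all modes equal one.
print(np.sum(overlaps**2))
```

A single large overlap for a low-frequency mode is the signature that the mode resembles the functional transition.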
2.2 Elastic Network Models
There is, in fact, no essential difference between the elastic network NMA and the standard NMA other than the force field. In the case of the elastic network, the Hessian is derived from the following potential energy function:

E = (γ/2) Σ_{r⁰_ij < Rc} (r_ij − r⁰_ij)²,    (Eq. 7)

where r_ij is the distance between atoms i and j and r⁰_ij is the distance between the atoms in the reference structure, e.g., the crystallographic structure. The summation is performed only over atom pairs closer than a cut-off distance Rc, and γ is the spring, or force, constant of the elastic bond between the atoms and is the same for all atom pairs (see Fig. 1a). The energy function in Eq. 7 seems to be the most popular, although other types of function can be used. The network corresponding to the energy function of Eq. 7 is sometimes referred to as the anisotropic network model. A Gaussian network model has a different energy function, which results in modes without any directional information [24, 25], and will not be considered here. Once the energy function of Eq. 7 has been constructed, the procedure is exactly the same as for the standard NMA: the Hessian is calculated and its eigenvalues and eigenvectors are determined. Whereas the standard NMA must be performed on all atoms, as required by the force field, the elastic network model can be applied to a subset of atoms; for a protein, this is often the Cα atoms. Compared with the standard NMA, this results in a Hessian approximately tenfold lower in order, yielding considerable computational savings in the calculation of the eigenvalues and eigenvectors, because these routines normally require of the order of N³ operations, where N is the order of the Hessian matrix.
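An elastic-network Hessian of the Eq. 7 type can be sketched in a few lines (Python/NumPy; the cube of pseudo-Cα coordinates, the cut-off, and γ are illustrative assumptions, not values from the text):

```python
import numpy as np

def anm_hessian(coords, Rc=10.0, gamma=1.0):
    """Elastic-network (anisotropic network model) Hessian from Eq. 7,
    evaluated at the reference structure `coords` (n x 3 array)."""
    n = len(coords)
    H = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = coords[j] - coords[i]
            r2 = d @ d
            if r2 > Rc * Rc:
                continue  # pair beyond the cut-off: no spring
            # Off-diagonal 3x3 super-element: -gamma * outer(d, d) / r^2
            block = -gamma * np.outer(d, d) / r2
            H[3*i:3*i+3, 3*j:3*j+3] = block
            H[3*j:3*j+3, 3*i:3*i+3] = block
            # Diagonal super-elements accumulate minus the off-diagonals
            H[3*i:3*i+3, 3*i:3*i+3] -= block
            H[3*j:3*j+3, 3*j:3*j+3] -= block
    return H

# Eight pseudo-C-alpha atoms on a cube, spacing well under the cut-off,
# so the network is fully connected
coords = np.array([[x, y, z]
                   for x in (0.0, 3.8)
                   for y in (0.0, 3.8)
                   for z in (0.0, 3.8)])
H = anm_hessian(coords, Rc=10.0, gamma=1.0)
eigvals = np.linalg.eigvalsh(H)

# A connected 3D network has exactly six ~zero modes (rigid-body motions)
print(np.sum(np.abs(eigvals) < 1e-6))  # 6
```

Counting the near-zero eigenvalues is a quick sanity check that the network is connected; a loosely attached fragment would add spurious low-frequency modes (compare Note 6).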
2.3 Essential Dynamics and PCA
After superposition onto a common reference structure, a variance-covariance matrix of positional fluctuations is constructed:

C = ⟨(x(t) − ⟨x⟩)(x(t) − ⟨x⟩)ᵀ⟩,

where ⟨⟩ denotes an ensemble average. The coordinates x are written as a function of time for clarity, but may be provided in any order and can be, for example, a molecular dynamics trajectory or a set of experimental structures. C is a symmetric matrix that can be diagonalized by an orthogonal coordinate transformation T:

C = T Λ Tᵀ,

with Λ the diagonal (eigenvalue) matrix and T containing, as columns, the eigenvectors of C. The eigenvalues λ_i correspond to the mean square fluctuation along eigenvector i and, therefore, contain the contribution of each principal component to the total fluctuation. The eigenvectors are usually sorted such that their eigenvalues are in decreasing order. For a system of N atoms, C is a 3N × 3N matrix. If at least 3N configurations are used to construct C, then 3N − 6 eigenvectors with nonzero eigenvalues will be obtained. Six eigenvalues should be exactly zero, and their corresponding eigenvectors describe the overall rotation and translation (which is eliminated by the superposition). If only M configurations are available (with M < 3N), then at most M − 1 nonzero eigenvalues with corresponding eigenvectors will result. If μ_i is the ith eigenvector of C (the ith column of T), then the original configurations can be projected onto each of the principal components to yield the principal coordinates p_i(t):

p_i(t) = μ_i · (x(t) − ⟨x⟩).
Note that the variance ⟨p_i²⟩ equals the eigenvalue λ_i. These projections can easily be transformed back to Cartesian coordinates for visualization purposes:

x′(t) = ⟨x⟩ + Σ_i p_i(t) μ_i,

where the sum runs over the selected principal components.
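The construction, diagonalization, and projection steps can be sketched as follows (Python/NumPy; the synthetic "trajectory", with one dominant collective direction plus isotropic noise, stands in for a superposed molecular dynamics ensemble):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "trajectory": M frames of dim coordinates with one dominant
# collective direction plus small isotropic noise (superposition assumed done)
M, dim = 500, 12
direction = rng.standard_normal(dim)
direction /= np.linalg.norm(direction)
frames = (rng.standard_normal((M, 1)) * 3.0) * direction \
         + 0.05 * rng.standard_normal((M, dim))

mean = frames.mean(axis=0)
dev = frames - mean
C = dev.T @ dev / M                       # variance-covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)      # ascending order
eigvals = eigvals[::-1]                   # re-sort in decreasing order
eigvecs = eigvecs[:, ::-1]

# Project the trajectory onto the first principal component
p1 = dev @ eigvecs[:, 0]

# The variance along a principal coordinate equals its eigenvalue
print(np.isclose(p1.var(), eigvals[0]))
```

Here one eigenvalue dwarfs the rest, mirroring the observation that a few collective modes carry most of the fluctuation.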
Two sets of eigenvectors μ and ν can be compared with each other by taking inner products μ_i · ν_j.
Subspace overlaps are often calculated as summed squared inner products:

O = (1/n) Σ_{i=1}^{n} Σ_{j=1}^{m} (μ_i · ν_j)²,

expressing how much of the n-dimensional subspace of set μ is contained within the m-dimensional subspace of set ν. Note that m must be at least as large as n to achieve full overlap (O = 1).
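The subspace-overlap measure can be sketched as below (Python/NumPy; the sets μ and ν are arbitrary orthonormal vectors chosen so that the two limiting cases are visible):

```python
import numpy as np

def subspace_overlap(mu, nu):
    """Summed squared inner products between two sets of eigenvectors.
    mu: (dim, n) and nu: (dim, m) arrays with eigenvectors as columns;
    returns O in [0, 1]."""
    n = mu.shape[1]
    inner = mu.T @ nu            # n x m matrix of inner products
    return np.sum(inner ** 2) / n

rng = np.random.default_rng(2)
# Build an orthonormal basis and split it into two disjoint sets
Q, _ = np.linalg.qr(rng.standard_normal((10, 10)))
set_a, set_b = Q[:, :3], Q[:, 3:7]

print(subspace_overlap(set_a, set_a))  # identical subspaces: O ~ 1
print(subspace_overlap(set_a, set_b))  # orthogonal subspaces: O ~ 0
```

Intermediate values quantify partial containment; as noted later in the text, values of 0.3 to 0.4 can already be significant given the high dimensionality of configuration space.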
3.1 Standard NMA
For standard NMA, one needs a set of coordinates, a force field, and software to perform the calculations. NMA is often performed using molecular mechanics software packages that are also able to perform molecular dynamics simulations. For a protein, the structural information is normally held in a PDB file, which the software interprets to set up the correct energy function for the selected force field. Any missing heavy atoms should be added, as should missing hydrogen atoms; most software packages have routines to do this. It is usual, although not a requirement of the methodology, to remove water and ligands. Once the system is prepared, the first major calculation is energy minimization.
3.1.1 Energy Minimization
The two main energy minimization routines are steepest descent and conjugate gradient. The former can be used in the initial stages (for the first 100 steps, for example), followed by the latter. Sometimes, when approaching the energy minimum, the actual minimum cannot be found because of overstepping. This can present a problem for NMA, where a very precise location of the minimum is required; however, many minimizers are able to adjust the step size to avoid overstepping. Normally, minimization can be stopped when the root mean square force is approximately 10⁻⁴ to 10⁻¹² kcal·mol⁻¹·Å⁻¹.
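The overstepping problem and the RMS-force stopping criterion can be illustrated with a hand-written steepest-descent loop on a toy one-dimensional bond potential (a sketch only; a production NMA would use the minimizers of a molecular mechanics package):

```python
import numpy as np

# Toy potential: two 1D "atoms" joined by a harmonic bond of length b
def energy(x, k=100.0, b=1.5):
    return 0.5 * k * (x[1] - x[0] - b) ** 2

def gradient(x, k=100.0, b=1.5):
    g = k * (x[1] - x[0] - b)
    return np.array([-g, g])

x = np.array([0.0, 0.0])      # start away from the minimum
step = 1e-3
for _ in range(10000):
    g = gradient(x)
    rms_force = np.sqrt(np.mean(g ** 2))
    if rms_force < 1e-8:      # convergence criterion on the RMS force
        break
    trial = x - step * g      # steepest-descent move
    if energy(trial) < energy(x):
        x = trial
        step *= 1.2           # accept, and cautiously grow the step
    else:
        step *= 0.5           # overstepped: shrink the step and retry

print(x[1] - x[0])            # ~b, the equilibrium bond length
```

Halving the step on an energy increase is the same idea as the adaptive step-size control mentioned above: it prevents the minimizer from oscillating across the minimum without ever locating it precisely.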
3.1.2 Hessian Calculation
This step creates the Hessian matrix, which is the matrix of second derivatives of the potential energy function with respect to the mass-weighted Cartesian coordinates. Because the Hessian is symmetric, only one triangle of the matrix needs to be stored.
3.1.3 Diagonalization of Hessian Matrix
This stage determines the eigenvalues and eigenvectors. Because of the large size of this 3N × 3N matrix, where N is the number of atoms in the molecule, this stage often presents memory problems for large molecules (see Note 1). The process results in a set of 3N eigenvalues and a set of 3N eigenvectors, each with 3N components. The eigenvalues are sorted in ascending order and the eigenvectors are sorted accordingly. The first six eigenvalues should have values close to zero because they correspond to the three translational and three rotational degrees of freedom of the whole molecule (see Note 2). The seventh eigenvector is the lowest-frequency mode, and it is often predicted to be a functionally relevant mode.
3.1.4 Comparison with Experimental Results
Eq. 6 shows how to measure the overlap with a functional mode derived from, e.g., two x-ray structures. To perform this calculation, one needs the experimental displacements, Δx_i^exp. These displacements must be calculated from the experimental structures oriented in the same way as the minimized structure used for the NMA; to do this, one can use a least-squares best-fit routine to superpose the two experimental structures onto the minimized structure.
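A least-squares superposition can be sketched with the standard SVD-based (Kabsch) procedure (Python/NumPy; the structures here are synthetic, and dx_exp is the resulting displacement vector that would enter an overlap calculation such as Eq. 6):

```python
import numpy as np

def superpose(mobile, reference):
    """Least-squares (Kabsch) fit of `mobile` onto `reference` (both N x 3)."""
    mc, rc = mobile.mean(axis=0), reference.mean(axis=0)
    P, Q = mobile - mc, reference - rc
    U, S, Vt = np.linalg.svd(P.T @ Q)
    d = np.sign(np.linalg.det(U @ Vt))     # guard against improper rotation
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    return P @ R + rc                      # fitted mobile coordinates

rng = np.random.default_rng(3)
ref = rng.standard_normal((20, 3))         # synthetic reference structure

# Rotate and translate a copy; superposition should recover it exactly
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
moved = ref @ Rz.T + np.array([5.0, -2.0, 1.0])

fitted = superpose(moved, ref)
rmsd = np.sqrt(np.mean(np.sum((fitted - ref) ** 2, axis=1)))
print(rmsd)  # ~0

# Displacement vector between the superposed structures
dx_exp = (fitted - ref).ravel()
```

In a real application, both experimental structures would be superposed onto the minimized NMA structure before taking their coordinate difference.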
3.2 Elastic Network Models
• Prepare the structure, e.g., remove ligands and water molecules; note that no energy minimization is required (see Note 3).
• Decide which atoms will build the network, e.g., just the Cα atoms.
• Choose a cut-off length, Rc, which is typically 7 to 10 Å when using just Cα atoms (see Note 4).
• Choose the spring constant, γ (see Note 5).
• Calculate the Hessian and determine its eigenvalues and eigenvectors (see Note 6).
From this point onward, there is no essential difference from the standard NMA. However, if the calculations are performed on the Cα atoms only, then, naturally, one can compare only with the movements of the Cα atoms in the experimentally determined functional mode, i.e., movements of side chains cannot be compared. Tama and Sanejouand have made a comparison between the results from an elastic network model and modes derived from pairs of x-ray structures for 20 proteins.
3.3 Essential Dynamics and PCA
3.3.1 PCA of Structural Ensembles
A principal component or essential dynamics analysis may be carried out on a molecular dynamics trajectory or any other structural ensemble. It typically consists of three steps. First, the configurations from the ensemble must be superposed, to enable the filtering of internal motions from overall rotation and translation. This is usually accomplished by a least-squares fit of each of the configurations onto a reference structure (see Note 7). Second, this “fitted” trajectory is used to construct a variance-covariance matrix that is subsequently diagonalized. The variance-covariance matrix is a symmetric matrix containing, as elements, the covariances of the atomic displacements relative to their respective averages for each pair of atoms (the off-diagonal elements) and the variances of each atom's displacements (the diagonal elements). Atoms that move concertedly give rise to positive covariances, whereas anticorrelated motions give rise to negative entries. Noncorrelated displacements result in near-zero covariances (see also Note 8). Diagonalization of this covariance matrix yields a set of eigenvectors and eigenvalues, which are usually sorted such that the eigenvalues are in decreasing order. The eigenvalues represent the variance along each of the corresponding collective modes (eigenvectors), and usually a small number of modes suffices to describe the majority of the total fluctuation. As a third step, the original trajectory may be analyzed in terms of the principal components. To this end, the trajectory is projected onto each of the principal modes to yield the time behavior and distribution of each of the principal coordinates (see also Note 9). Often, two- or three-dimensional projections along the major principal components are used to represent the sampled distribution in configuration space or to compare multiple ensembles along the principal modes of collective fluctuation.
These projections onto single or multiple principal coordinates can also be readily translated back into Cartesian space to yield an ensemble or animation of the motion along a selection of principal coordinates.
3.3.2 Convergence of PCA Results Derived from Molecular Dynamics Simulations
Principal components derived from different simulations, or from different parts of one simulation, allow us to compare the major directions of configurational space and the sampled regions, and to judge similarity and convergence. It has been observed that sub-nanosecond protein molecular dynamics simulations suffer from a significant sampling problem, resulting in an apparently poor overlap between the principal components extracted from multiple parts of these trajectories [32, 33]. Nevertheless, despite the fact that individual principal components may differ, the subspaces spanned by the major principal components converge remarkably rapidly and show favorable agreement not only between different simulations, but also between simulation and experiment [28, 30, 34, 35]; see also Fig. 2.
3.3.3 Comparison of PCA Results from Different Sources
It is often useful to compare structural ensembles from different simulations or from experiment with each other in terms of their major principal coordinates. It is instructive to discuss three possibilities that are often used to carry out such a comparison. First, separate principal component analyses may be carried out over each individual ensemble. Subsequently, the resulting eigenvectors are compared with each other, either individually or as, e.g., a subset of major directions. Such a comparison usually involves inner products between sets of eigenvectors as a measure for similarity. For sets of eigenvectors, the summed (or cumulative) squared inner product is a useful measure of similarity that is zero for orthogonal, non-overlapping subspaces and one for identical subspaces. Values from 0.3 to 0.4 already indicate significant overlap, because of the usually large dimensionality of the configuration space as compared with the analyzed subspace. Alternatively, full inner product matrices can also be used [26, 34]. This method focuses on the directions of the principal modes rather than the sampled region along the modes. Therefore, a second, complementary, method can be used to include this ensemble information. In this case, the structures from one ensemble are projected onto the eigenvectors extracted from another ensemble (usually together with the structures from that ensemble), allowing a direct comparison of the sampled regions in each of the projected ensembles. This approach has proven particularly useful for cases in which one set of eigenvectors can be regarded as a reference set, for example, those that were extracted from a set of experimental structures [28, 29]; see also Fig. 2. For cases in which there is not one natural reference set of directions, a third approach may be used. In such cases, multiple sub-ensembles may be concatenated into one meta-ensemble on which the PCA is carried out. 
The individual sub-ensembles can then be separately projected onto this combined set of modes, allowing a direct comparison of the sub-ensembles. This method has the advantage that differences between the sub-ensembles are frequently visible along one of the combined principal modes, even for subtle effects such as the difference between an apo and a holo ensemble, or the effect of a point mutation.
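The meta-ensemble approach can be sketched as follows (Python/NumPy; the "apo" and "holo" ensembles are synthetic, differing by a shift along one hypothetical collective direction):

```python
import numpy as np

rng = np.random.default_rng(4)
dim = 9

# Two synthetic sub-ensembles (standing in for, e.g., apo and holo
# simulations) that differ by a shift along one collective direction
shift = np.zeros(dim)
shift[0] = 2.0
apo = 0.3 * rng.standard_normal((300, dim))
holo = 0.3 * rng.standard_normal((300, dim)) + shift

# PCA on the concatenated meta-ensemble
meta = np.vstack([apo, holo])
mean = meta.mean(axis=0)
C = (meta - mean).T @ (meta - mean) / len(meta)
eigvals, eigvecs = np.linalg.eigh(C)
pc1 = eigvecs[:, -1]               # mode with the largest eigenvalue

# Project each sub-ensemble separately onto the combined PC1
p_apo = (apo - mean) @ pc1
p_holo = (holo - mean) @ pc1

# The two ensembles separate cleanly along the combined mode
print(abs(p_apo.mean() - p_holo.mean()))  # ~2
```

Because the inter-ensemble difference contributes strongly to the combined covariance, it surfaces as the top principal mode, which is exactly why subtle apo/holo or mutant effects become visible in this projection.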
3.3.4 Enhanced Sampling Techniques
Because only a small subset of collective degrees of freedom dominates the dynamics of biomolecules, enhanced sampling algorithms such as conformational flooding and essential dynamics sampling have been developed that search the essential subspace in either a systematic or an exploratory fashion [11–14].
4 Notes
1. Diagonalization routines exert great demands on memory. For example, the routine in AMBER requires 8 × 9N(3N − 1)/2 bytes of memory. A 400-atom system requires 1.7 Mbytes, but a 4,000-atom system requires 1.7 Gbytes. A number of methodologies have been devised to overcome this memory problem; these are usually used to calculate only the lowest-frequency eigenvectors. Another alternative is to perform dihedral angle space NMA, which reduces the number of variables by a factor of approximately 8 for proteins and approximately 11 for nucleic acids. These methods have been reviewed elsewhere.
2. The first six eigenvalues should be close to zero. No eigenvalues should be negative: negative eigenvalues indicate negative curvature of the energy surface and suggest insufficient minimization.
3. A major advantage of this method over the standard NMA is that energy minimization is not required. Because energy minimization does not normally bring about large changes in conformation, it is to be expected that there would be little difference between the results from the starting structure and an energy-minimized structure.
4. It seems that choosing a value for Rc is often problematic. It relates to the radius of the first coordination shell around the selected atoms. If one uses Cα atoms, then its value (7–10 Å) should be longer than when using all atoms, where a value of 3 Å would be more appropriate. Some reports suggest that results do not vary dramatically with small variations in the cut-off distance. Obviously, the shorter the cut-off, the greater the savings in the calculation of the energy.
5. The value of γ has no effect on the eigenvectors and, thus, if one is only interested in the character of the motions, its value is not important. However, an appropriate value is sometimes determined for x-ray structures by calculating atomic mean square fluctuations and matching them to experimentally determined B-factors. A value of 1.0 kcal·mol⁻¹·Å⁻² might be a reasonable starting value if no appropriate value is known.
6. Depending on the structure, some regions may be only loosely connected to the rest of the molecule, e.g., a terminal region in a protein. In such a case, the movement of this region could appear as a low-frequency mode. This may be undesirable if one is interested in global motions. Some programs (private communication from Dr. Atsushi Matsumoto) allow one to provide extra connections to these regions, thus effectively integrating them more with the rest of the structure.
7. Before a PCA, all structures should be superimposed onto a common reference structure. This can be problematic for very flexible systems such as peptides, where the fit may be ambiguous, leading to artificial structural transitions. In certain cases, such problems may be alleviated by using a progressive fit, where each structure is superimposed onto the previous one. It is also important to note that, when the results of different PCAs are to be compared with each other, each individual PCA should be based on the same reference structure used for superposition.
8. PCA is a linear analysis, i.e., only linear correlations between atomic displacements enter the covariance matrix. This means that nonlinear correlations between atom movements may be overlooked because they are spread out across multiple collective coordinates. In practice, this is usually not a big problem, except for systems that undergo large-scale rotations.
9. Similar to NMA, PCA can also be carried out in dihedral angle space [26, 43]. Although this has the advantage that it does not require superposition onto a reference structure (because it is based on internal coordinates), PCA in dihedral space has two main disadvantages. First, major collective dihedral transitions do not usually correspond to major transitions in Cartesian space. For example, a small change of one backbone dihedral in a central residue of a two-domain protein can result in a large-scale motion of the two domains with respect to each other; although such a motion would likely be relevant, it would easily be overlooked. Second, the metric of the configuration space cannot be retained in a straightforward way. This may lead to artificial correlations between the dihedral coordinates and complicates the translation back to Cartesian space for, e.g., visualization purposes.
Acknowledgments: The authors are grateful to Dr. Akio Kitao and Dr. Atsushi Matsumoto for helpful discussions, and to Oliver Lange and Helmut Grubmüller for kindly providing Fig. 4.