Computing spatially resolved rotational hydration entropies from atomistic simulations

For a first-principles understanding of macromolecular processes, a quantitative picture of the underlying free energy landscape, and in particular of its entropy contribution, is crucial. The stability of biomolecules, such as proteins, is governed by the hydrophobic effect, which arises from competing enthalpic and entropic contributions to the free energy of the solvent shell. While the statistical mechanics of liquids, as well as molecular dynamics simulations, have provided much insight, solvation shell entropies remain notoriously difficult to calculate, especially when spatial resolution is required. Here, we present a method for computing spatially resolved rotational solvent entropies via a non-parametric k-nearest-neighbor density estimator. We validated our method using analytic test distributions and applied it to atomistic simulations of a water box. With an accuracy of better than 9.6%, the obtained spatial resolution should shed new light on the hydrophobic effect and on the thermodynamics of solvation in general.


Introduction
Competing enthalpic and entropic contributions to solvation free energies give rise to the hydrophobic effect, 1 which is vital for protein function and folding. [2][3][4] Despite extensive theoretical work, 1,5 a quantitative understanding of the hydrophobic effect, particularly at heterogeneous surfaces such as those of proteins and mixed bilayers, remains elusive.
Because surface water shows significantly altered behaviour compared to bulk, 6,7 it is essential for our understanding of the thermodynamics and energetics of protein solvation to better characterize, e.g., the relative contributions of different solvation shells or the effect of individual protein side chains on the solvent. Molecular dynamics (MD) simulations describe the hydrophobic effect at an atomic level, 8,9 but a deeper understanding of the molecular driving forces requires a quantitative and spatially resolved picture of solvation shell thermodynamics, which poses considerable challenges.
Methods like thermodynamic integration (TI) 10,11 allow for the calculation of solvation entropies based on MD simulations, but their lack of spatial resolution precludes a detailed analysis of how local features of the solvent-surface interface contribute and interact. Various order parameters 12-16 assess both the local translational and the local rotational order of water molecules but yield only a qualitative picture of the thermodynamic entropy.
Here, we limit our analysis to absolute rotational water entropies and present a method to reach a spatial resolution from atomistic simulations or Monte Carlo ensembles. Our method employs a mutual information expansion (MIE) to calculate the total entropy of N water molecules based on the contributions of each molecule individually and the entropy loss due to correlations between molecule pairs and triples. A similar approach was taken by, e.g., the grid inhomogeneous solvation theory (GIST). 17-23 Rather than considering entropic contributions by correlations between individual molecules directly, GIST calculates discretised correlation-integrals within voxels, which causes severe sampling problems for higher order correlations. 3D-2-Phase-Thermodynamics (3D-2PT) 24-26 also uses voxels and approximates the system as a superposition of gas-like and solid-like components. Likewise, the Grid Cell Theory (GCT) 27 includes free energies and enthalpies, but it approximates rotational water correlation terms using a generalized Pauling's residual ice entropy model. 28,29 Here, we address these correlations directly, convergence of which is challenging, as they require sampling and density estimates in high-dimensional configuration spaces.
In our approach, all MIE terms were calculated using a k-nearest-neighbor (kNN) density estimator, typically used in Euclidean spaces, 30-32 which we modified and optimized for SO(3)^n, the n-fold Cartesian product of the group of rotations. We considered different metrics for the k-nearest neighbors in SO(3)^n, determined an optimal k-value, and provide a computationally efficient framework for rotational entropy calculation.
For easier notation, we will develop our method for water molecules, although it is general and applicable to any system with rotational degrees of freedom.
In the following sections, we will first provide the conceptual foundation and then describe our rotation entropy approach. Subsequently, we will apply it to analytical test distributions, as well as to MD water boxes.

Absolute entropy
Separating the entropy of water into rotational and translational contributions yields

S = S_translation + S_rotation − I_corr,

where S_rotation is the entropy of the phase space distribution after projection onto the rotational degrees of freedom, S_translation is the entropy arising from the translational degrees of freedom, and the mutual information (MI) term I_corr quantifies the correlations between translation and rotation. In this paper, we focus on the rotational contribution S_rotation.
Note that some authors 21,23,33 define the rotational entropy as a conditional entropy, in which case it includes the MI term −I corr .
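This decomposition can be checked on a toy system where every term is known in closed form. The sketch below (our illustration, not part of the method itself) uses a correlated bivariate Gaussian, whose two coordinates stand in for the translational and rotational degrees of freedom; the identity S = S_1 + S_2 − I_corr then reduces to the known Gaussian mutual information −½ log(1 − ρ²).

```python
import numpy as np

def gaussian_entropy(var):
    """Differential entropy (nats) of a 1-D Gaussian with variance var."""
    return 0.5 * np.log(2 * np.pi * np.e * var)

def joint_gaussian_entropy(cov):
    """Differential entropy (nats) of a multivariate Gaussian with covariance cov."""
    d = cov.shape[0]
    return 0.5 * np.log((2 * np.pi * np.e) ** d * np.linalg.det(cov))

rho = 0.6                              # correlation between the two degrees of freedom
cov = np.array([[1.0, rho], [rho, 1.0]])

S1 = gaussian_entropy(cov[0, 0])       # "translational" marginal entropy
S2 = gaussian_entropy(cov[1, 1])       # "rotational" marginal entropy
S12 = joint_gaussian_entropy(cov)      # full entropy
I_corr = S1 + S2 - S12                 # mutual information between the two
```

For the bivariate Gaussian, I_corr equals −½ log(1 − ρ²), so the decomposition closes exactly.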
Let the rotation of the N water molecules of the simulation system be described by a Hamiltonian H = T + V on the phase space of SO(3)^N, with the kinetic energy T and the potential energy V, the latter typically described by a molecular mechanics force field. The total entropy is

S = −k_B ∫ ρ log ρ dΓ,

with the Boltzmann constant k_B, Planck's constant h, the normalized and dimensionless phase space density ρ ∝ e^{−H/(k_B T)}, and the partition function Z = Z_T Z_V. Because ρ factorizes into a kinetic and a configurational part, the entropy can be split into a kinetic and a configurational term,

S = S_kin + S_conf.

S_kin, which depends on h_T, on h_V = h/h_T (with h_T > 0 arbitrary), and on the eigenvalues I_i of the moment-of-inertia tensor of a water molecule, can be solved analytically; the challenge is thus to estimate S_conf.

Entropy estimation
Because the rotational entropy integral in 3N dimensions usually cannot be computed directly, we used a truncated mutual information expansion 34-37 (see section 2.2.1) to expand the full high-dimensional integral into multiple low-dimensional integrals over marginal distributions, which can be calculated numerically, similarly to the Inhomogeneous Solvation Theory (IST), 19,20,38 underlying GIST. To obtain these marginal entropies, we used a k-nearest-neighbor estimator (see section 2.2.2), which estimates the density at each sample point by finding the k closest neighboring sample points and dividing by the volume of a ball that encloses them. Here, the orientations of N water molecules in n_f different samples, e.g., frames of a computer simulation trajectory, were represented by series of quaternions (see section 2.2.3). We then defined suitable distance metrics, as required by the kNN algorithm, which are not trivial in the curved spaces of rotations SO(3)^n (see section 2.2.4), and calculated the volumes of balls, as induced by the metrics (see section 2.2.5). We finally present a computationally efficient framework for finding the k neighbors of each sample point (see section 3.1). Figure 1A shows an example of an entropy expansion into mutual information (MI) terms for a system containing three subsystems, such as three water molecules, in a Venn diagram: the full entropy S is expanded into MI terms I_m, of which the first represents the entropies of each molecule individually, whereas the further terms are correlation terms of 2nd and 3rd order, respectively.

Mutual information expansion
The expansion reads

S = Σ_γ S(γ) − Σ_{γ_1<γ_2} I_2(γ_1, γ_2) + Σ_{γ_1<γ_2<γ_3} I_3(γ_1, γ_2, γ_3) − ...,    (1a)

with the pair and triple mutual information terms

I_2(γ_1, γ_2) = S(γ_1) + S(γ_2) − S(γ_1, γ_2),    (1b)

I_3(γ_1, γ_2, γ_3) = S(γ_1) + S(γ_2) + S(γ_3) − S(γ_1, γ_2) − S(γ_1, γ_3) − S(γ_2, γ_3) + S(γ_1, γ_2, γ_3).    (1c)

In this notation, S(γ_1, ..., γ_m) is the entropy of the marginal distribution with respect to the molecules with indices γ_1, ..., γ_m. For N water molecules, the expansion consists of N MI orders, of which the mth term involves (3m)-dimensional integrals and takes all possible m-molecule correlations into account. Approximating the full entropy by a truncated expansion thus leads to lower-dimensional integrals, which can be better sampled. Although there is no guarantee that the truncated orders are small and can be neglected, it has been shown that a truncated expansion provides accurate entropy estimates if the correlations are short-ranged, 39 as for water at physiological conditions.
Here, we took up to 3-molecule correlations into account by truncating after the 3rd order,

S ≈ Σ_γ S(γ) − Σ_{γ_1<γ_2} I_2(γ_1, γ_2) + Σ_{γ_1<γ_2<γ_3} I_3(γ_1, γ_2, γ_3),    (2)

where the 1st order term includes the kinetic entropy contribution and a correction of −N log 2 due to the two-fold symmetry of the water molecule. The three terms are akin to the terms in IST. 23 In fact, closer analysis shows that, in the thermodynamic limit, the 2nd and 3rd order terms in the IST expansion 17-19 of the molar entropy converge towards the respective terms in eq 2.
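The algebra of the truncated expansion can be verified on a small discrete system. For N = 3, truncation after the 3rd order is exact, so the expansion must reproduce the full joint entropy; the following sketch (illustrative Python, with tabulated probabilities in place of the kNN estimates) checks this.

```python
import numpy as np

def H(p):
    """Shannon entropy (nats) of a discrete distribution, ignoring zero entries."""
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mie_third_order(p):
    """Third-order MIE for a discrete 3-variable joint distribution p[i, j, k]."""
    # 1st order: marginal entropies of each variable.
    S1 = [H(p.sum(axis=(1, 2))), H(p.sum(axis=(0, 2))), H(p.sum(axis=(0, 1)))]
    pairs = [(0, 1), (0, 2), (1, 2)]
    # Pair marginal entropies: sum out the remaining axis (3 - i - j).
    S2 = {pr: H(p.sum(axis=3 - pr[0] - pr[1])) for pr in pairs}
    # 2nd order: pair mutual information terms.
    I2 = {pr: S1[pr[0]] + S1[pr[1]] - S2[pr] for pr in pairs}
    # 3rd order: triple (co-)information term.
    I3 = sum(S1) - sum(S2.values()) + H(p)
    return sum(S1) - sum(I2.values()) + I3

rng = np.random.default_rng(0)
p = rng.random((3, 4, 5))
p /= p.sum()                      # normalized random joint distribution
```

For N = 3 the truncation is exact by construction; for larger systems, the value of such a check lies in comparing the truncated sum to the full entropy.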

kNN entropy estimation
To evaluate eq 2 from a given sample of orientations {q_1, ..., q_{n_f}}_i with i = 1, ..., N, the marginal entropies from eq 1 are calculated using a kNN entropy estimator. 30-32,40,41 For SO(3)^1, the kth nearest neighbor with respect to the sample point q_i is defined by a metric d(q_i, q_j) (see Figure 1B), and the density ρ(q_i) is estimated as (n_f − 1)^{−1} k/V(r_{i,k}), where k is a fixed integer, V(r_{i,k}) is the volume of a ball with radius r_{i,k}, the distance between q_i and its kth neighbor, and (n_f − 1)^{−1} is a normalization constant. Results for SO(3)^2 and SO(3)^3 are obtained by generalizing the metric d and the volume V(r_{i,k}) to higher dimensions. The choice of metrics, on which the results may depend for finite sampling, and their corresponding volumes in SO(3)^n will be discussed in section 2.2.4 and section 2.2.5. The entropy is

S = −(k_B/n_f) Σ_{i=1}^{n_f} log ρ(q_i) − k_B γ_k,

where γ_k = ψ(k) − log k is a correction which accounts for the bias introduced by the kth neighbors lying, by definition, on the edges of the balls; 32 ψ is the digamma function. Because eqs 1b and 1c are sums and differences of integrals of different dimensionalities, biases are introduced: with increasing dimensionality and thus reduced sampling, the kNN estimator yields increasingly smoothed versions of the underlying true distributions.

[Figure 1: (B) Each dot on the sphere represents an orientation. For each point x_i, the kth neighbor according to a distance metric (e.g., d_quat or d_geo) is found; the density is estimated via the volume V(r) of a ball with radius r = d(·, ·). (C) Visualization of the fill mode approach: a correlated dataset (left) is decorrelated by applying a random permutation along one axis (right). The entropy of the decorrelated data is the sum of both "marginal entropies".]
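The estimator is easiest to illustrate in one dimension. The sketch below applies the same k = 1 scheme to angles on the circle, a stand-in for SO(3) chosen so that the nearest-neighbor search reduces to sorting (our simplification; the actual implementation uses quaternion metrics and SO(3)^n ball volumes). For uniformly distributed angles, the estimate should approach the exact value log(2π).

```python
import numpy as np

def knn1_entropy_circle(angles):
    """Kozachenko-Leonenko (k = 1) entropy estimate, in nats, for samples on [0, 2*pi).

    With the circular metric, the nearest neighbor of each point is one of its
    two neighbors in sorted order, so no pairwise distance matrix is needed.
    """
    n = len(angles)
    a = np.sort(angles)
    # Circular gaps between consecutive sorted points (last gap wraps around).
    gaps = np.diff(np.concatenate([a, [a[0] + 2 * np.pi]]))
    # Distance to the nearest neighbor: min of the gap to the right and to the left.
    r = np.minimum(gaps, np.roll(gaps, 1))
    psi_1 = -0.5772156649015329            # digamma(1) = -Euler-Mascheroni constant
    psi_n = np.log(n) - 1 / (2 * n)        # asymptotic expansion of digamma(n)
    # Ball "volume" on the circle is 2*r; psi(n) - psi(k) is the bias correction.
    return psi_n - psi_1 + np.mean(np.log(2 * r))

rng = np.random.default_rng(0)
S_est = knn1_entropy_circle(rng.uniform(0, 2 * np.pi, size=5000))
```

The circular metric sidesteps boundary effects, in the same way the SO(3) metrics respect the topology of the rotation group.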
The estimator therefore overestimates the entropies of distributions with higher-dimensional support more than those of distributions defined in lower-dimensional spaces, resulting in biases if entropies of different dimensionality are added or subtracted. To overcome this problem, the sampling space is expanded to equal dimensionality by using fill modes. 37,42 I_2, defined in eq 1b as the sum of integrals in SO(3)^1 and SO(3)^2, can be rewritten as a sum of two SO(3)^2 integrals if the corresponding joint distribution ρ(j, k) factorizes into ρ(j)ρ(k). To achieve this statistical independence, the sample points corresponding to index k were subjected to a random permutation, {q̃_{k,1}, ..., q̃_{k,n_f}} = perm{q_{k,1}, ..., q_{k,n_f}}, which decorrelates {q_{j,1}, ..., q_{j,n_f}} and {q_{k,1}, ..., q_{k,n_f}} but leaves the marginal distributions unchanged, as sketched in Figure 1C. The joint entropy of the decorrelated data is thus the sum of the initial marginal entropies, S(j) + S(k).
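The effect of the permutation is easy to demonstrate for real-valued samples (an illustration of the fill-mode idea, not the SO(3) code): the marginals are exactly preserved, while the correlation vanishes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 0.9 * x + 0.1 * rng.normal(size=n)   # strongly correlated with x

y_perm = rng.permutation(y)              # fill mode: decorrelate by random permutation

corr_before = np.corrcoef(x, y)[0, 1]
corr_after = np.corrcoef(x, y_perm)[0, 1]
```

Since the permutation only reorders samples, the set of values of y, and hence its marginal distribution, is exactly unchanged.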

Parametrization of orientations
Among the different parametrizations of orientations in 3D space, such as Euler angles, Tait-Bryan angles, Hopf coordinates, 43,44 and spherical coordinates, we used quaternions, 45 which, contrary to most other charts of SO(3), do not suffer from gimbal lock. They are defined as q = (q_1, q_2, q_3, q_4) = ±(cos(θ/2), u sin(θ/2)), where u and θ are a normalized rotation axis and a rotation angle, respectively. q can thus be interpreted as an element of the 3-sphere, i.e., ‖q‖ = 1. Because there is a one-to-one mapping of the 3-sphere to the special unitary group SU(2), which in turn provides a double covering of SO(3), each orientation is described by two equivalent quaternions, which differ only by their sign. 46
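The double covering can be verified numerically. The sketch below builds a quaternion from an axis-angle pair and rotates a vector via the standard quaternion rotation matrix; q and −q yield identical rotations.

```python
import numpy as np

def quat_from_axis_angle(u, theta):
    """Unit quaternion (cos(theta/2), u*sin(theta/2)) for axis u and angle theta."""
    u = np.asarray(u, dtype=float)
    u /= np.linalg.norm(u)
    return np.concatenate([[np.cos(theta / 2)], np.sin(theta / 2) * u])

def rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    # Standard rotation matrix of a unit quaternion; all entries are quadratic
    # in the components, so q and -q give the same matrix.
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
    return R @ v

q = quat_from_axis_angle([0.0, 0.0, 1.0], np.pi / 2)  # 90-degree rotation about z
v = np.array([1.0, 0.0, 0.0])
```

A 90° rotation about z maps (1, 0, 0) to (0, 1, 0), and −q produces the identical result.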

Choice of metrics in SO(3) n
We next considered the proper choice of metrics in SO(3) n . At first sight, one might think that, of many possible metrics in SO(3), 46,47 only one, e.g., the geodesic metric d geo (q 1 , q 2 ) = arccos (|q 1 · q 2 |) shown in Figure 1B, yields the correct entropy. However, in the limit of infinite sampling, kNN entropy estimation with any metric is possible if used with its induced ball-volumes (see section 2.2.5). 48 Our choice was therefore guided by the speed of convergence and computational efficiency.
We chose the quaternion metric 46,49

d_quat(q_1, q_2) = min(‖q_1 − q_2‖, ‖q_1 + q_2‖),

shown in Figure 1B, which defines a metric between two rotations as the minimum Euclidean distance between the unit quaternions, taking the sign ambiguity into account. In SO(3), the quaternion metric d_quat and the more natural geodesic metric d_geo yield identical nearest neighbors. They are functionally equivalent because a positive continuous strictly increasing function h with h ∘ d_geo = d_quat (and vice versa) exists. 46 Because d_quat does not require evaluation of the inverse cosine function, it is computationally more efficient and was therefore preferred over d_geo. Metrics in SO(3)^2 and SO(3)^3 were obtained by combining d_quat with the Euclidean norms in R^2 and R^3, respectively, e.g.,

d_quat,2((q_1, q_2), (q_1', q_2')) = [ d_quat(q_1, q_1')² + d_quat(q_2, q_2')² ]^{1/2}.

When combined with the Euclidean norms, the quaternion metric and the more natural geodesic metric are not functionally equivalent and hence yield, in general, different nearest neighbors. For small distances, i.e., for high sampling, the metrics are asymptotically identical.
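Both metrics and their functional equivalence in SO(3)^1 can be checked directly. Since ‖q_1 ∓ q_2‖² = 2 ∓ 2 q_1 · q_2, the relating function is h(d_geo) = 2 sin(d_geo/2), which the sketch below verifies, along with the identical nearest-neighbor ordering.

```python
import numpy as np

def d_geo(q1, q2):
    """Geodesic metric on SO(3): arccos of the absolute dot product of unit quaternions."""
    return np.arccos(np.clip(abs(np.dot(q1, q2)), 0.0, 1.0))

def d_quat(q1, q2):
    """Quaternion metric: minimum Euclidean distance, accounting for the sign ambiguity."""
    return min(np.linalg.norm(q1 - q2), np.linalg.norm(q1 + q2))

# Uniform random unit quaternions via normalized 4-D Gaussians.
rng = np.random.default_rng(0)
qs = rng.normal(size=(100, 4))
qs /= np.linalg.norm(qs, axis=1, keepdims=True)

# Distances from the first sample to all others under both metrics.
dg = np.array([d_geo(qs[0], q) for q in qs[1:]])
dq = np.array([d_quat(qs[0], q) for q in qs[1:]])
```

Because h is strictly increasing, sorting by d_quat and by d_geo returns the same neighbor ordering, which is all the kNN search requires.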
To test whether this choice of metrics impacts the accuracy of the MI results, we compared our choice to the composite metric combining d_quat with the maximum norm in R^2, which is functionally equivalent to the geodesic composite metric but slightly less efficient to evaluate than d_quat,2. For 10^5 frames, no significant difference between the MI values was observed.

Volumes of balls in SO(3) n
The volumes V(r) = ∫_{d(q_i, y) < r} dy (dark green in Figure 1B), enclosed by the kNN radius r, are induced by the respective metrics. The integrals were solved numerically for 10^4 equally spaced values of r using the software Mathematica 10.0 50 and its "MultidimensionalRule" integration method; the results were stored in a lookup table, and cubic interpolation was used to obtain volumes between the stored values.
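The lookup-table approach can be sketched for SO(3)^1 with the geodesic metric, where the normalized Haar volume of a ball has the closed form V(r) = (2r − sin 2r)/π (our derivation from the rotation-angle density (1 − cos θ)/π; the paper instead tabulates the metric-induced volumes numerically). The sketch validates the table against a Monte Carlo estimate and uses linear interpolation in place of the cubic scheme.

```python
import numpy as np

# Tabulate V(r) for the geodesic metric on SO(3), r in [0, pi/2],
# normalized so that V(pi/2) = 1 (the whole group).
r_grid = np.linspace(0.0, np.pi / 2, 10_000)
V_grid = (2 * r_grid - np.sin(2 * r_grid)) / np.pi

def ball_volume(r):
    """Lookup with interpolation (linear here; the paper uses cubic)."""
    return np.interp(r, r_grid, V_grid)

# Monte Carlo check: fraction of uniform random rotations within
# geodesic distance r of the identity quaternion (1, 0, 0, 0).
rng = np.random.default_rng(0)
qs = rng.normal(size=(20_000, 4))
qs /= np.linalg.norm(qs, axis=1, keepdims=True)
d = np.arccos(np.clip(np.abs(qs[:, 0]), 0.0, 1.0))
frac = np.mean(d < 0.5)
```

The same scheme, with numerically tabulated volumes, applies in SO(3)^2 and SO(3)^3, where no simple closed form is available.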

Nearest-neighbor search
Nearest-neighbor searches were performed using the Non-Metric Space Library 1.7.3.6 51 (NMSLIB) 1 and the metrics above. Each data set was indexed in a vantage-point tree 52,53 (VP-tree), which rests on the triangle inequality. Our version of the NMSLIB, modified to include the orientational metrics, is available online 2 .

Test distributions
To assess the accuracy of our method, we used a family of test distributions p^(µ)_1 on SO(3)^1, defined in terms of a quaternion q ∈ SO(3)^1, the first quaternion component q_1, the first azimuthal angle φ_1 ∈ [0, π/2) in spherical coordinates of the 3-sphere, and the appropriate normalization constant Z(µ) (Figure 2A). The analytical expression for the configurational entropy, S_conf = −∫ dq p^(µ)_1(q) log p^(µ)_1(q), involves the gamma function Γ; we set k_B = h = 1 for simpler notation. As illustrated in Figure 2A, the distribution depends on the localization parameter µ: a value of 0 yields a uniform distribution, and larger µ values yield increasingly narrow distributions.
To also assess the accuracy for correlated distributions, a second test distribution p^(µ)_2,corr on SO(3)^2 was used, which was designed such that the marginals with respect to q_1 and q_2 are p^(µ)_1(q_2) and p^(µ)_1(q_1), respectively. The localization parameter µ here controls the degree of correlation between q_1 and q_2, ranging from an uncorrelated uniform distribution (µ = 0) on SO(3)^2 to strongly correlated distributions for larger values. The entropy of this distribution is known analytically; in the uncorrelated limit it equals 2 log(8π²), where log(8π²) is the entropy of a free rotor.
Samples were obtained using a rejection method: first, a random point Q = (q_1, ..., q_n) ∈ SO(3)^n was drawn from a uniform distribution by drawing n quaternions from the uniform distribution on the 3-sphere. Next, a random number a was drawn from a uniform distribution between 0 and max(p^(µ)_n). Q was accepted if a < p^(µ)_n(Q) and rejected otherwise. This process was repeated until the desired number of samples was obtained.
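A minimal version of this sampler for n = 1 is given below. The target density p(q) ∝ q_1² is a hypothetical stand-in for the test distributions (not one of the densities used in the paper), chosen because its accepted-sample moment E[q_1²] = 1/2 is known exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_rejection(p, p_max, n_samples):
    """Rejection sampling on SO(3): draw uniform unit quaternions, accept if a < p(q)."""
    samples = []
    while len(samples) < n_samples:
        q = rng.normal(size=4)
        q /= np.linalg.norm(q)               # uniform point on the 3-sphere
        a = rng.uniform(0.0, p_max)          # uniform threshold in [0, max p]
        if a < p(q):
            samples.append(q)
    return np.array(samples)

# Hypothetical target density p(q) = q_1^2 (its maximum on the 3-sphere is 1).
# Under it, E[q_1^2] = E_uniform[q_1^4] / E_uniform[q_1^2] = (1/8) / (1/4) = 1/2.
qs = sample_rejection(lambda q: q[0] ** 2, 1.0, 4000)
```

The acceptance rate equals the mean of p over the sphere divided by p_max, here 1/4, so roughly four proposals are needed per accepted sample.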
The accuracy of our method was assessed for each test distribution for localization parameters between µ = 0 and 50, nearest-neighbor k-values of 1, 5, 9, and 13, and 10^2 to 10^5 frames (n_f). The computed entropy and MI values were compared to the analytical results. To obtain statistical error estimates, the calculations for each parameter set were repeated 1000 times.

Molecular dynamics simulations
All MD simulations were carried out using a modified version 3 of the software package Gromacs 2018 [54][55][56][57][58] with an additional center-of-mass motion (COM) removal 59 method, used to individually constrain all oxygen atoms. We furthermore made small additional changes to apply COM removal to individual atoms and to overcome the limit of 254 COM removal groups 4 . The CHARMM36m force field [60][61][62][63][64] and the CHARMM-TIP3P water model 65 were used. All water molecules were subjected to SETTLE 66 constraints (i.e., kept rigid), and the leap-frog integrator with a time step of 2 fs was used. Electrostatic forces were calculated using the Particle-Mesh Ewald (PME) method 67 with a 1.2 nm real-space cut-off; the same cut-off was used for Lennard-Jones potentials. 68 In all simulations, the V-rescale thermostat 69 with a time constant of 0.1 ps and, if applicable, the Parrinello-Rahman barostat 70,71 with a time constant of 1.0 ps and 1 bar pressure were used. A total of 1728 water molecules were placed within a cubic simulation box, and the system was equilibrated for 1 ns at 300 K as an NPT ensemble. From the equilibrated system, with a resulting box size of approximately 3.7 nm, three 1 µs production runs were started, as shown in the first column of Figure 3. Run m ("mobile") was carried out as described above. To benchmark our method against the established method of thermodynamic integration (TI), a system with only rotational degrees of freedom was constructed. To this end, in run p ("pinned"), all oxygens were position-constrained using COM removal, as shown in the first column of Figure 3, allowing only rotational movements around the oxygen atom under NVT conditions. The temperature was increased to 600 K, because at 300 K the pinned water molecules formed an almost rigid, ice-like hydrogen bond network showing only very little dynamics.
Run sp ("sliced & pinned") was simulated like p, but all water molecules within a slice of 0.5 nm width were removed to create a water-vacuum interface.

Entropy calculation
For all three test systems, the entropy of rotation was calculated as described in section 2, each using a 1 µs trajectory with 10 5 frames. For the MI terms, a cut-off depending on the distance between average molecule positions was used. Whereas including the MIs of many molecule pairs by using a large cut-off distance gave rise to a more accurate MIE, it also introduced larger noise due to limited sampling. For pairwise MI terms, the cut-off was chosen as 1.0 nm, because for larger distances, the MI terms vanished within statistical errors (see Figure 3B and D). Similarly, triple MI contributions were cut off at 0.45 nm.
Because the water molecules in system m were mobile, average positions across the obtained trajectory were unusable for defining a cut-off. Therefore, the water molecules were relabeled in each frame, such that they remained as close as possible to a simple-cubic reference structure, using permutation reduction, 72,73 which leaves the physics of the system unchanged. In systems p and sp, the molecules were immobilized, and the oxygen positions were used for applying the cut-off.
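Permutation reduction can be sketched as a linear assignment problem (a minimal analog using scipy's Hungarian solver; the cited implementations 72,73 use specialized algorithms for large systems). All names below, including the 3 × 3 × 1 reference lattice, are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Hypothetical setup: a small reference lattice and shuffled, slightly
# displaced "molecule" positions for one trajectory frame.
ref = np.array([[i, j, 0.0] for i in range(3) for j in range(3)], dtype=float)
perm_true = rng.permutation(9)
positions = ref[perm_true] + 0.05 * rng.normal(size=(9, 3))

# Relabel molecules so that each label stays as close as possible to its
# reference site: minimize the total squared displacement over all pairings.
cost = np.linalg.norm(positions[:, None, :] - ref[None, :, :], axis=2) ** 2
row, col = linear_sum_assignment(cost)     # molecule i is assigned to site col[i]
relabeled = positions[np.argsort(col)]     # reorder so that entry s sits at site s
```

Because the molecules are physically indistinguishable, this relabeling changes only the bookkeeping, not the ensemble.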
To quantify the precision of the method, the MD simulations and the subsequent entropy analyses were repeated in 10 independent calculations.

Thermodynamic integration reference
Reference entropy values for systems p and sp were obtained using thermodynamic integration 10,11,72 (TI). Interactions between water molecules were gradually switched off in a stepwise fashion to obtain the entropy difference between real water and non-interacting water. The absolute rotational entropy was obtained as the sum of the excess entropy, obtained via TI, and the ideal gas contribution of the rigid rotor,

S_ideal = N k_B { 3/2 + log[ (√π/2) (8π² k_B T/h²)^{3/2} (I_1 I_2 I_3)^{1/2} ] },

where the factor 1/2 is the symmetry number of the water molecule and I_i are the eigenvalues of the moment-of-inertia tensor of a water molecule. Both TI calculations were performed using the soft-core 74 parameters α = 0.5 and σ = 0.3. Coulomb interactions were linearly switched off in 80 windows of 20 ns each, and a further 10 windows were used to subsequently switch off the van der Waals interactions. The first nanosecond of each window was discarded.

Test distributions
We first assessed the accuracy of our method for the three uncorrelated analytical test distributions defined in SO(3)^1, SO(3)^2, and SO(3)^3. The localization parameter µ controls the width of the distributions, as shown in Figure 2A, and, for p^(µ)_2,corr, controls the strength of the correlation. As can be seen in Figure 2B, the kNN estimator largely agrees with the analytic results (dashed lines) for the uncorrelated distributions. For p^(µ)_3 and k = 1, the analytical result is matched within statistical errors (at most 0.28 nats, at µ = 50), whereas larger values of k lead to overestimated entropies of up to 0.7 nats (k = 13, µ = 50), caused by the limited sampling of just 100 frames and the increased dimensionality of SO(3)^3 compared to the other tested distributions.
Next, we assessed the accuracy for the correlated test distribution. The panels of Figure 2C and D show the entropy (calculated via the MIE as defined in eq 1b) and the MI of p^(µ)_2,corr for 1000 frames, respectively. For the uniform distribution (µ = 0), the algorithm yields the analytic values of 2 log(8π²) and 0 for entropy and mutual information, respectively. With increasing correlation µ, the entropy is increasingly overestimated as the MI is underestimated. Both effects are more pronounced for larger k-values: whereas for k = 1 the algorithm yields accurate values within statistical errors up to a correlation of µ ≈ 20, the results deviate significantly for k = 13 even for very small µ-values. Overall, small k-values, such as k = 1, yield high accuracy but reduced precision (i.e., larger statistical errors) compared to large k-values like 13, which give rise to smaller statistical errors but reduced accuracy.
To further assess this trade-off and the convergence properties of our method, we calculated the relative entropy errors for p (20) 2,corr for sampling between 10 2 and 10 5 frames, shown in Figure 2E. For k = 1 and only 100 frames, the method overestimates the true entropy by 5 to 10%, which quickly drops to below 1% for more than 2 · 10 3 frames. For larger k-values, the entropy errors increase and the convergence becomes slower, e.g., k = 13 requires 2·10 4 frames to achieve an entropy error of less than 1 %. The statistical errors at 10 5 frames are 0.11 % and 0.05 % for k-values of 1 and 13, respectively. Overall, k = 1 yields somewhat lower precision but significantly faster convergence compared to larger values, which becomes even more pronounced in higher dimensions. We therefore consider this value the optimal choice for the systems at hand and used it for all subsequent analyses.
The kNN entropy estimator rests on the assumption that the density is approximately constant and isotropic within each k-nearest-neighbor ball (see Figure 1B). This assumption implies that features of the true distribution that are smaller than the average distance between sample points are not resolved, which, in the case of poor sampling, inevitably leads to an overestimated entropy, as seen for p^(µ)_3 with large k or as shown in Figure 2E. The assumption of isotropy no longer holds for highly correlated datasets, such as p^(µ)_2,corr for large values of µ. In this case, the k-nearest neighbors of each sample point are also correlated and thus not isotropically distributed, which is not reflected by an isotropic kernel, i.e., a ball. For Euclidean spaces, this problem has been addressed by using anisotropic kernels. 75,76 Although this idea could also be applied in SO(3)^n, the correlation of water molecules at standard conditions is weak enough (Figure 3A) to allow for sufficiently accurate results under the isotropy assumption.
The trade-off between accuracy and precision with respect to the k-value is a general property of kNN entropy estimators, which has been characterized previously, 76,77 and is intuitively accessible: Whereas averaging over an increasing number of neighbors reduces statistical uncertainties and thus improves precision, the assumptions of approximately constant isotropic densities are applied to increasingly larger balls, resulting in increasingly overestimated entropies for distributions with small scale features or strong correlations.
Overall, the kNN method with k = 1 yields the most accurate results while being only slightly less precise than estimators with larger k. It retrieves the analytical entropies within statistical errors for the uncorrelated distributions, as well as for the correlated distribution with µ < 20, using just 100 and 1000 frames, respectively.

Entropy calculated from MD simulations
Having assessed our rotational entropy method against analytic test distributions, we tested its accuracy for more realistic systems of up to 1728 interacting water molecules. To this end, we simulated three atomistic water MD systems (Figure 3, left column), as described in section 3.3. For all systems, 10 independent MD simulations were performed, and for each system, entropies were calculated via a MIE as explained in section 3.3.1. System m ("mobile") comprises 1728 unconstrained water molecules.
As shown in Figure 3A, an absolute rotational entropy of (40.53 ± 0.04) J·mol −1 ·K −1 per molecule is obtained, to which the first, second, and third MI orders contribute (44.2349 ± 0.0007) J·mol −1 ·K −1 , (−4.550 ± 0.015) J·mol −1 ·K −1 , and (0.85±0.04) J·mol −1 ·K −1 , respectively. Note that the provided values are averages and standard deviations of the 10 independent calculations and that the uncertainties are too small to be shown as error bars in Figure 3. The pair-mutual information terms I 2 , shown in Figure 3B, reach a maximum of 0.8 J·mol −1 ·K −1 for very close water molecules and vanish monotonically for molecules that are, after permutation reduction and on average, separated by more than ≈ 0.8 nm. Note that the discrete nature of distances in Figure 3B is due to the choice of a simple cubic reference structure for permutation reduction.
To compare the obtained absolute entropies to TI 10,11 (described in section 3.3.2), the water movement was restricted to the rotational degrees of freedom in system p ("pinned") by pinning each molecule as described in section 3.3. Here, the rotational entropy, shown in panel C, is reduced to (29.53 ± 0.03) J·mol −1 ·K −1 . The 2nd and 3rd order mutual information terms contribute (−18.47 ± 0.01) J·mol −1 ·K −1 and (4.21 ± 0.02) J·mol −1 ·K −1 , respectively. Compared to the results from TI shown in gray, the entropy is underestimated by 9.6 % due to the limited sampling of the strongly correlated system. Similar to what we observe for the analytical test case depicted in Figure 2D, the MI terms are underestimated for strong correlations, of which the 3rd order is most severely affected due to the high dimensionality of the sampling space.
The I 2 terms, illustrated in Figure 3D, show a maximum of 7 J·mol −1 ·K −1 and indicate that water molecules decorrelate beyond ≈ 0.4 nm. The distribution shows secondary and tertiary peaks around 0.55 nm and 0.80 nm that arise from indirect coupling via one or two mediating water molecules, as indicated by the structures shown in Figure 3D. In this case, the correlations between the molecule pairs are not due to direct interactions; instead, mediating water molecules (orange) enhance distant orientation correlations via short hydrogen-bonded chains (shown in red). This finding demonstrates that the method is able to identify regions of locally coupled water molecules and to quantify the resulting entropy losses, thus providing a spatially resolved picture of entropy changes.
To further assess and demonstrate the accuracy of the method for systems with spatial features, we included a 0.5 nm vacuum slice in system sp ("sliced & pinned", Figure 3), such that the dynamics of water molecules at the surface differs from that of molecules in the bulk. For system sp, the accuracy of our entropy estimation relative to TI improves to 8.5 %, whereas the contributions by higher MI orders remain almost identical (see Figure 3E). We assume that the improved accuracy is due to the smaller number of molecules (1493 with the slice vs. 1728 without) and possibly because the vacuum slice limits the range of many-particle correlations that would not be captured by a 3rd order approximation. Figure 3F shows the I 2 terms of molecule pairs that are closer than 0.33 nm, i.e., within their first hydration shells, as a function of their distance to the slice. The correlations of pairs that are close to the vacuum interface are increased to 5.6 J·mol −1 ·K −1 on average, compared to 4.1 J·mol −1 ·K −1 in bulk. Although the entropy per molecule increases compared to system p, mainly due to the dominating 1st order term (see Figure 3C and E), the increased correlations at the surface and their associated entropy losses contribute to the thermodynamic unfavorability of water at a (hydrophobic) vacuum interface. The MIE approaches the TI values for systems p and sp to within 9.6 % and 8.5 %, respectively, and additionally yields information about individual correlations and their associated entropy losses, thus providing spatial resolution. Remarkably, about 25-fold less computer time was required for the MIE compared to TI for the shown examples.
The large 2nd and 3rd order contributions, illustrated in Figure 3C and E, show that both systems with pinned water exhibit strong correlations between water molecules. As for the test distributions illustrated in Figure 2, strong correlations result in systematically underestimated MI values. Due to their high dimensionality, and thus low sampling density, we expect the 3rd order MIE contributions for systems p and sp to be mostly affected, contributing to their overall underestimated entropy. For the same reason, we expect entropies calculated from more loosely coupled mobile water to yield markedly more accurate results.
Although a direct comparison to TI is impossible for system m, we expect the errors due to the truncation of higher-order MI terms, observed for the more tightly correlated systems p and sp, to be larger than for unconstrained water, so the truncated MIE should yield more accurate results for realistic solute systems. Taken together, the performances obtained for the more strongly correlated pinned-water systems provide upper bounds for the expected errors.

Conclusion
We developed an estimator for spatially resolved rotational solvent entropies based on a truncated mutual information expansion and the k-nearest-neighbor algorithm on SO(3) n . Accuracy and computational efficiency were assessed for both analytical test distributions and for systems of up to 1728 water molecules, described by atomistic MD simulations.
For the uncorrelated test distributions in SO(3)^1, SO(3)^2, and SO(3)^3, the estimator with k = 1 yields accurate entropies for as little as 100 sample points. For the correlated test distribution p^(µ)_2,corr, the entropies are increasingly overestimated with increasing coupling, caused by underestimated mutual information terms; this effect is especially pronounced for large k-values. Precision increased only marginally for larger k at the cost of decreased accuracy, which led us to conclude that k = 1 represents the best trade-off for the problem at hand. We furthermore demonstrated convergence within 2 · 10^3 frames for a correlated distribution (µ = 20) and therefore expect our approach to accurately describe correlations of water molecules already in relatively short MD trajectories of 100 ns to 1 µs.
For the considered MD systems, we find agreement within 9.6 % and 8.5 % with TI for pinned waters in systems p and sp, respectively, corresponding to energy deviations (−T ∆S) of 0.94 kJ·mol −1 and 0.84 kJ·mol −1 per water molecule at 300 K. The obtained rotational entropic contributions to the free energy are precise to within ±0.008 kJ·mol −1 and ±0.018 kJ·mol −1 , respectively. For the binding of a small ligand that displaces 10 water molecules at the binding pocket, we therefore expect to obtain absolute rotational entropy contributions to within about 10 kJ·mol −1 and to resolve rotational entropy differences as small as 0.06 kJ·mol −1 . As seen in the second column of Figure 3, fully mobile water exhibits considerably smaller correlations than pinned water, rendering the tests using pinned water a tough benchmark compared to realistic solute systems. For a protein/water system, we would therefore expect markedly smaller error margins.
The algorithm provides spatial resolution by assessing the mutual information contributions at the level of individual molecules, distinguishing it from, e.g., GIST. 17-22 For the hydrophobic vacuum interface, we calculated an entropy loss due to an increase in mutual information close to the surface. The ability to resolve the origin of entropy changes renders the method a promising tool to enhance our understanding of processes like the hydrophobic effect and the thermodynamics of solvated complex heterogeneous biomolecules in general.
Work on including the contributions of the translational entropy and the translation-rotation correlation to the overall entropy is in progress and will be published elsewhere. Our method can also be extended to include intramolecular entropy contributions of flexible solvents, e.g., water simulated without SETTLE 66 constraints. In this case, additional correlation terms would arise from pairwise correlations between the internal degrees of freedom, translation, and rotation, as well as from the respective triple-correlation terms, which might be challenging to converge.
Although in this study we restricted the application and assessment of our approach to water, generalization to other solvents is straightforward. An implementation is available for download 5 as a Python module 78,79 with a C++ backend for fast neighbor search.